MS Office Answers

  Question: What is a text file parser? How does it work, basically?
  Question Asked By: Marjorie Tucker   on Sep 29 In MS Office Category.

  
Question Answered By: Cais Nguyen   on Sep 29

A text parser basically takes a continuous flow of text-based input
and breaks it down or extracts it into various pieces. The key to
that extraction is having recognizable (and consistent) delimiters or
patterns.
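
As a minimal sketch of that idea in Python (the sample record, the pipe delimiter, and the function name are made up for illustration), splitting on a consistent delimiter is often all the parsing a simple format needs:

```python
def split_fields(line, delimiter="|"):
    """Break one line of text into trimmed fields on a fixed delimiter."""
    return [field.strip() for field in line.split(delimiter)]


# Hypothetical record; a real file would define its own layout.
record = "Smith, John | 1931 | Springfield"
print(split_fields(record))
```

This only works as long as the delimiter never shows up inside a field, which is exactly why consistency matters.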

I think the biggest problem you'd have would be whether or not various
obituaries have a consistent enough pattern to extract the data,
especially across various publications. You could certainly look for
key phrases like "passed away" or "born" or "survived by" to parse the
data into pieces, then parse those apart with other delimiters. For
example, once you found "survived by", a semi-colon could be used to
delimit between each type of survivor, as in:

That last line could then be parsed by the comma delimiter, giving you:

Clean it up to remove the "and" and convert the first word of
"grandchildren" to "Grandchild" and you'd end up with:
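
A rough Python sketch of those steps, using an invented sentence rather than a real obituary. The sample text, the function name, and the small label mapping are all assumptions for illustration; real obituaries will need more forgiving rules:

```python
import re

# Assumed mapping from a plural label to the singular form mentioned above.
LABELS = {"grandchildren": "Grandchild"}


def parse_survivors(text):
    """Find "survived by", split on semicolons, then split each group on commas."""
    marker = "survived by"
    start = text.lower().find(marker)
    if start == -1:
        return []
    tail = text[start + len(marker):].strip().rstrip(".")
    result = []
    for group in tail.split(";"):  # one group per type of survivor
        # Split the group on commas and drop a leading "and" from each piece.
        parts = [re.sub(r"^\s*and\s+", "", p).strip() for p in group.split(",")]
        parts = [p for p in parts if p]
        if not parts:
            continue
        # The first piece is treated as the label, the rest as names.
        label = LABELS.get(parts[0].lower(), parts[0])
        result.append((label, parts[1:]))
    return result


# Invented sample sentence, not taken from any publication.
sample = ("He is survived by his wife, Mary; a son, Tom; "
          "and grandchildren, Ann, Joe, and Sue.")
for label, names in parse_survivors(sample):
    print(label, names)
```

A sentence that joins two names with "and" but no comma (for example "Tom and Bill") would come back as a single entry here, so that case would need an extra split.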

The number of variations on names may make it difficult to determine
exactly what the last name is. In this case, the pattern is
straightforward, but if you start adding "Jr." or "III" or two-word
last names like "Le Clair", it gets more difficult.
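
One way to cope with a few of those variations, sketched in Python. The suffix list and the list of surname prefixes are assumptions and far from complete, and the function name is hypothetical; it also assumes the name starts with at least one given name:

```python
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}    # assumed, not exhaustive
PARTICLES = {"le", "la", "de", "van", "von"}  # assumed surname prefixes


def guess_last_name(full_name):
    """Guess the surname from a 'First [Middle] Last [Suffix]' style name."""
    words = full_name.replace(",", " ").split()
    # Drop trailing generational suffixes such as "Jr." or "III".
    while len(words) > 1 and words[-1].rstrip(".").lower() in SUFFIXES:
        words.pop()
    if not words:
        return ""
    last = [words[-1]]
    # Pull in a preceding prefix word so "Le Clair" stays together,
    # always leaving at least one word as the given name.
    while (len(words) - len(last) > 1
           and words[-len(last) - 1].lower() in PARTICLES):
        last.insert(0, words[-len(last) - 1])
    return " ".join(last)


for name in ["John Smith Jr.", "Robert Jones III", "Marie Le Clair"]:
    print(name, "->", guess_last_name(name))
```

A lookup list like this only catches the prefixes you thought of in advance, so names outside it will still be guessed as a single last word.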

However, I suspect most of the algorithms you need already exist on the
net. A routine like this has probably been needed for one thing
or another...
