Parsing sentences and words from sentences in Java

Jul 11, 06:45 AM

I think this is worth highlighting, because I’ve seen so many cases where programmers “parse” text using Java tools like StringTokenizer or split() with a set of punctuation characters.

Java already has a built-in, locale-aware way of getting sentences from text, and words from sentences:

java.text.BreakIterator

Anything you write yourself to parse text will likely miss corner cases and be unprepared for other languages.

Since BreakIterator does the job, isn’t difficult to use and has been around since JDK 1.1, why not use it?
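To give a rough idea of what that looks like, here is a minimal sketch. The class name BreakIteratorSketch and the helpers segments(), sentences() and words() are my own names for illustration, not part of the JDK; only BreakIterator and its getSentenceInstance(), getWordInstance(), setText(), first() and next() calls come from java.text. The word iterator reports whitespace and punctuation as segments too, so this sketch assumes you only want segments that contain at least one letter or digit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of the JDK.
class BreakIteratorSketch {

    // Returns the raw text between each pair of consecutive boundaries.
    static List<String> segments(BreakIterator it, String text) {
        List<String> out = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    // Splits text into sentences, trimming the whitespace that trails each one.
    static List<String> sentences(String text, Locale locale) {
        List<String> out = new ArrayList<>();
        for (String s : segments(BreakIterator.getSentenceInstance(locale), text)) {
            String trimmed = s.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    // Splits a sentence into words. Segments with no letter or digit
    // (spaces, commas, exclamation marks and so on) are skipped; that
    // filter is an assumption of this sketch, not BreakIterator behavior.
    static List<String> words(String sentence, Locale locale) {
        List<String> out = new ArrayList<>();
        for (String token : segments(BreakIterator.getWordInstance(locale), sentence)) {
            if (token.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello there. How are you? Fine, thanks!";
        for (String sentence : sentences(text, Locale.ENGLISH)) {
            System.out.println(sentence + " -> " + words(sentence, Locale.ENGLISH));
        }
    }
}
```

Passing a different Locale to the two helpers is all it should take to switch to that language's rules, which is the main thing a hand-rolled split() cannot offer.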

odd punctuation like this, for example, when reading words.
