Parsing sentences and words from sentences in java | ipsin.org

Parsing sentences and words from sentences in java

Jul 11, 07:45 AM

I think this is worth highlighting, because I’ve seen so many cases where programmers “parse” text using Java tools like StringTokenizer or split() with a set of punctuation characters.

Java already has a built-in, locale-aware way of getting sentences from text, and words from sentences:

java.text.BreakIterator
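Here is a minimal sketch of the sentence half. It assumes English text, so it uses Locale.US, and the class and method names (SentenceSplitter, sentences) are illustrative rather than part of any library. The usual BreakIterator pattern is to call setText(), then walk the boundaries from first() with next() until it returns BreakIterator.DONE:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative helper class; the name is hypothetical, not a JDK API.
public class SentenceSplitter {

    // Returns the sentences of text according to the locale's sentence rules.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Each segment keeps its trailing whitespace, so trim it off.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you? Fine!";
        // Assumes English text, hence Locale.US.
        for (String sentence : sentences(text, Locale.US)) {
            System.out.println(sentence);
        }
    }
}
```

Run as-is, this should print "Hello world.", "How are you?" and "Fine!" on separate lines.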

Anything you write yourself to parse text will likely miss corner cases and be unprepared for other languages.
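To make the corner-case point concrete, here is a small, hypothetical comparison. The input sentence and the class name are made up for illustration. Splitting on every period tears a decimal number apart and throws the periods away. The BreakIterator Javadoc, by contrast, describes sentence analysis as correctly interpreting periods within numbers:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical comparison class for illustration only.
public class NaiveVsBreakIterator {

    // Hand-rolled approach: treat every period as a sentence end.
    static List<String> naiveSplit(String text) {
        List<String> result = new ArrayList<>();
        for (String part : text.split("\\.")) {
            if (!part.trim().isEmpty()) {
                result.add(part.trim());
            }
        }
        return result;
    }

    // BreakIterator approach: let the locale's sentence rules decide.
    static List<String> breakIteratorSplit(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Pi is roughly 3.14. That is close enough.";
        System.out.println(naiveSplit(text));                    // [Pi is roughly 3, 14, That is close enough]
        System.out.println(breakIteratorSplit(text, Locale.US)); // [Pi is roughly 3.14., That is close enough.]
    }
}
```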

Since BreakIterator does the job, isn’t difficult to use, and has been around since JDK 1.2, why not use it?
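Getting words out of sentences follows the same pattern with getWordInstance(). The sketch below nests a sentence iterator and a word iterator. The class name and the letter-or-digit filter are illustrative choices, not part of the BreakIterator API. A word instance also reports boundaries around spaces and punctuation, so the sketch drops any segment that contains no letter or digit:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration only.
public class SentenceWordParser {

    // Splits text into the segments between consecutive boundaries of it.
    private static List<String> segments(BreakIterator it, String text) {
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            result.add(text.substring(start, end));
        }
        return result;
    }

    // Returns one list of words per sentence. Word segments with no letter
    // or digit (spaces, punctuation) are dropped; that filter is an assumption.
    static List<List<String>> parse(String text, Locale locale) {
        BreakIterator sentenceIt = BreakIterator.getSentenceInstance(locale);
        BreakIterator wordIt = BreakIterator.getWordInstance(locale);
        List<List<String>> sentences = new ArrayList<>();
        for (String sentence : segments(sentenceIt, text)) {
            List<String> words = new ArrayList<>();
            for (String segment : segments(wordIt, sentence)) {
                if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                    words.add(segment);
                }
            }
            if (!words.isEmpty()) {
                sentences.add(words);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps. Does it land? Yes!";
        for (List<String> words : parse(text, Locale.US)) {
            System.out.println(words);
        }
    }
}
```

This should print [The, quick, brown, fox, jumps], then [Does, it, land], then [Yes]. Passing a different Locale is all it takes to switch to another language’s boundary rules.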

odd punctuation like this, for example, when reading words.
