Sunday, July 11, 2010

Machine Learning for Text Extraction

In a previous post we looked at the use of Natural Language Processing techniques in text extraction. The processing involves several steps, with each document passing through a pipeline of chained tasks.
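To make the idea of chained tasks concrete, here is a minimal sketch in Python. The stage names (split_sentences, tokenize) and the run_pipeline helper are hypothetical and chosen only for illustration; a real extraction pipeline would use proper NLP components for each step.

```python
from typing import Any, Callable, List

# A stage is any function that takes the previous stage's output
# and returns the input for the next stage.
Stage = Callable[[Any], Any]


def run_pipeline(document: str, stages: List[Stage]) -> Any:
    """Pass the document through each stage in order."""
    result: Any = document
    for stage in stages:
        result = stage(result)
    return result


def split_sentences(text: str) -> List[str]:
    # Naive sentence splitter (illustrative only): split on periods.
    return [s.strip() for s in text.split(".") if s.strip()]


def tokenize(sentences: List[str]) -> List[List[str]]:
    # Naive tokenizer (illustrative only): split on whitespace.
    return [s.split() for s in sentences]


tokens = run_pipeline("Text goes in. Tokens come out.", [split_sentences, tokenize])
print(tokens)
```

Each stage only sees the output of the one before it, so adding a task means appending one more function to the list.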

A deep pipeline can take several seconds per document, so for a system dealing with thousands of documents an hour the processing requirements could make it nonviable. Care needs to be taken to weigh the accuracy gained by adding pipeline tasks against the additional processing power they require.
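The trade-off can be estimated with some simple arithmetic. The sketch below, in Python, times each stage of a pipeline and works out how many parallel workers a given load would need. The helper names (time_stages, workers_needed) and the figures in the example (3 seconds per document, 5,000 documents an hour) are assumptions for illustration, not measurements.

```python
import math
import time
from typing import Any, Callable, Dict, List, Tuple


def time_stages(document: Any, stages: List[Tuple[str, Callable[[Any], Any]]]) -> Dict[str, float]:
    """Run the stages in order and record the seconds spent in each."""
    timings: Dict[str, float] = {}
    data = document
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    return timings


def workers_needed(seconds_per_doc: float, docs_per_hour: int) -> int:
    """Workers required to keep up, assuming each handles one document at a time."""
    busy_seconds_per_hour = seconds_per_doc * docs_per_hour
    return math.ceil(busy_seconds_per_hour / 3600)


# Illustrative figures only: a 3-second pipeline at 5,000 documents an hour
# needs 15,000 seconds of processing per hour, i.e. 5 workers.
print(workers_needed(3.0, 5000))
```

Summing the per-stage timings gives the seconds-per-document figure, which shows how much capacity each additional task would cost before it is added to the pipeline.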