A Hadoop-based platform for natural language processing of web pages and documents. (December 2015)