Linguistically-augmented perplexity-based data selection for language models. (July 2015)