Supervised machine learning for text analysis in R. (2021)
- Record Type:
- Book
- Title:
- Supervised machine learning for text analysis in R. (2021)
- Main Title:
- Supervised machine learning for text analysis in R
- Further Information:
- Note: Emil Hvitfeldt, Julia Silge.
- Authors:
- Hvitfeldt, Emil
Silge, Julia
- Contents:
- I Natural Language Features
  1. Language and modeling: Linguistics for text analysis; A glimpse into one area: morphology; Different languages; Other ways text can vary; Summary
  2. Tokenization: What is a token?; Types of tokens; Character tokens; Word tokens; Tokenizing by n-grams; Lines, sentence, and paragraph tokens; Where does tokenization break down?; Building your own tokenizer; Tokenize to characters, only keeping letters; Allow for hyphenated words; Wrapping it in a function; Tokenization for non-Latin alphabets; Tokenization benchmark; Summary
  3. Stop words: Using premade stop word lists; Stop word removal in R; Creating your own stop words list; All stop word lists are context-specific; What happens when you remove stop words; Stop words in languages other than English; Summary
  4. Stemming: How to stem text in R; Should you use stemming at all?; Understand a stemming algorithm; Handling punctuation when stemming; Compare some stemming options; Lemmatization and stemming; Stemming and stop words; Summary
  5. Word Embeddings: Motivating embeddings for sparse, high-dimensional data; Understand word embeddings by finding them yourself; Exploring CFPB word embeddings; Use pre-trained word embeddings; Fairness and word embeddings; Using word embeddings in the real world; Summary
- II Machine Learning Methods
  Regression: A first regression model; Building our first regression model; Evaluation; Compare to the null model; Compare to a random forest model; Case study: removing stop words; Case study: varying n-grams; Case study: lemmatization; Case study: feature hashing; Text normalization; What evaluation metrics are appropriate?; The full game: regression; Preprocess the data; Specify the model; Tune the model; Evaluate the modeling; Summary
  Classification: A first classification model; Building our first classification model; Evaluation; Compare to the null model; Compare to a lasso classification model; Tuning lasso hyperparameters; Case study: sparse encoding; Two class or multiclass?; Case study: including non-text data; Case study: data censoring; Case study: custom features; Detect credit cards; Calculate percentage censoring; Detect monetary amounts; What evaluation metrics are appropriate?; The full game: classification; Feature selection; Specify the model; Evaluate the modeling; Summary
- III Deep Learning Methods
  Dense neural networks: Kickstarter data; A first deep learning model; Preprocessing for deep learning; One-hot sequence embedding of text; Simple flattened dense network; Evaluation; Using bag-of-words features; Using pre-trained word embeddings; Cross-validation for deep learning models; Compare and evaluate DNN models; Limitations of deep learning; Summary
  Long short-term memory (LSTM) networks: A first LSTM model; Building an LSTM; Evaluation; Compare to a recurrent neural network; Case study: bidirectional LSTM; Case study: stacking LSTM layers; Case study: padding; Case study: training a regression model; Case study: vocabulary size; The full game: LSTM; Preprocess the data; Specify the model; Summary
  Convolutional neural networks: What are CNNs?; Kernel; Kernel size; A first CNN model; Case study: adding more layers; Case study: byte pair encoding; Case study: explainability with LIME; Case study: hyperparameter search; The full game: CNN; Preprocess the data; Specify the model; Summary
- IV Conclusion
  Text models in the real world
- Appendix A Regular expressions: Literal characters; Meta characters; Full stop, the wildcard; Character classes; Shorthand character classes; Quantifiers; Anchors; Additional resources
- Appendix B Data: Hans Christian Andersen fairy tales; Opinions of the Supreme Court of the United States; Consumer Financial Protection Bureau (CFPB) complaints; Kickstarter campaign blurbs
- Appendix C Baseline linear classifier: Read in the data; Split into test/train and create resampling folds; Recipe for data preprocessing; Lasso regularized classification model; A model workflow; Tune the workflow … (more)
- Edition:
- 1st
- Publisher Details:
- Boca Raton : Chapman & Hall/CRC
- Publication Date:
- 2021
- Extent:
- 1 online resource, illustrations (black and white, and colour)
- Subjects:
- 006.35
Computational linguistics -- Statistical methods
Natural language processing (Computer science)
Supervised learning (Machine learning)
Predictive analytics
R (Computer program language)
- Languages:
- English
- ISBNs:
- 9781000461992
9781000461978
9781003093459
- Related ISBNs:
- 9780367554187
9780367554194
- Notes:
- Note: Includes bibliographical references and index.
Note: Description based on CIP data; resource not viewed.
- Access Rights:
- Legal Deposit; Only available on premises controlled by the deposit library and to one user at any one time; The Legal Deposit Libraries (Non-Print Works) Regulations (UK).
- Access Usage:
- Restricted: Printing from this resource is governed by The Legal Deposit Libraries (Non-Print Works) Regulations (UK) and UK copyright law currently in force.
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD.DS.644014
- Ingest File:
- 06_038.xml