Computer‐Assisted Keyword and Document Set Discovery from Unstructured Text. Issue 4 (3rd April 2017)
- Record Type:
- Journal Article
- Title:
- Computer‐Assisted Keyword and Document Set Discovery from Unstructured Text. Issue 4 (3rd April 2017)
- Main Title:
- Computer‐Assisted Keyword and Document Set Discovery from Unstructured Text
- Authors:
- King, Gary
Lam, Patrick
Roberts, Margaret E.
- Abstract:
- The (unheralded) first step in many applications of automated text analysis involves selecting keywords to choose documents from a large text corpus for further study. Although all substantive results depend on this choice, researchers usually pick keywords in ad hoc ways that are far from optimal and usually biased. Most seem to think that keyword selection is easy, since they do Google searches every day, but we demonstrate that humans perform exceedingly poorly at this basic task. We offer a better approach, one that also can help with following conversations where participants rapidly innovate language to evade authorities, seek political advantage, or express creativity; generic web searching; eDiscovery; look‐alike modeling; industry and intelligence analysis; and sentiment and topic analysis. We develop a computer‐assisted (as opposed to fully automated or human‐only) statistical approach that suggests keywords from available text without needing structured data as inputs. This framing poses the statistical problem in a new way, which leads to a widely applicable algorithm. Our specific approach is based on training classifiers, extracting information from (rather than correcting) their mistakes, and summarizing results with easy‐to‐understand Boolean search strings. We illustrate how the technique works with analyses of English texts about the Boston Marathon bombings, Chinese social media posts designed to evade censorship, and others.
- Is Part Of:
- American journal of political science. Volume 61:Issue 4(2017)
- Journal:
- American journal of political science
- Issue:
- Volume 61:Issue 4(2017)
- Issue Display:
- Volume 61, Issue 4 (2017)
- Year:
- 2017
- Volume:
- 61
- Issue:
- 4
- Issue Sort Value:
- 2017-0061-0004-0000
- Page Start:
- 971
- Page End:
- 988
- Publication Date:
- 2017-04-03
- Subjects:
- Political science -- Periodicals
Electronic journals
Computer network resources
320.05
- Journal URLs:
- http://books.google.com/books?id=b3YbAAAAIAAJ
http://books.google.com/books?id=vHobAAAAIAAJ
http://books.google.com/books?id=KH0bAAAAIAAJ
http://books.google.com/books?id=hH4bAAAAIAAJ
http://catalog.hathitrust.org/api/volumes/oclc/1789847.html
http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%291540-5907
http://www.jstor.org/journals/00925853.html
http://onlinelibrary.wiley.com/
http://firstsearch.oclc.org
http://firstsearch.oclc.org/journal=0092-5853;screen=info;ECOIP
- DOI:
- 10.1111/ajps.12291
- Languages:
- English
- ISSNs:
- 0092-5853
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 0834.300000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5281.xml