'Will I Regret for This Tweet?'—Twitter User's Behavior Analysis System for Private Data Disclosure. (9th May 2020)
- Record Type:
- Journal Article
- Title:
- 'Will I Regret for This Tweet?'—Twitter User's Behavior Analysis System for Private Data Disclosure. (9th May 2020)
- Main Title:
- 'Will I Regret for This Tweet?'—Twitter User's Behavior Analysis System for Private Data Disclosure
- Authors:
- Geetha, R
Karthika, S
Kumaraguru, Ponnurangam
- Abstract:
- Twitter is an extensively used micro-blogging site for publishing user's views on recent happenings. This wide reachability of messages over large audience poses a threat, as the degree of personally identifiable information disclosed might lead to user regrets. The Tweet-Scan-Post system scans the tweets contextually for sensitive messages. The tweet repository was generated using cyber-keywords for personal, professional and health tweets. The Rules of Sensitivity and Contextuality was defined based on standards established by various national regulatory bodies. The naive sensitivity regression function uses the Bag-of-Words model built from short text messages. The imbalanced classes in dataset result in misclassification with 25% of sensitive and 75% of insensitive tweets. The system opted stacked classification to combat the problem of imbalanced classes. The system initially applied various state-of-art algorithms and predicted 26% of the tweets to be sensitive. The proposed stacked classification approach increased the overall proportion of sensitive tweets to 35%. The system contributes a vocabulary set of 201 Sensitive Privacy Keyword using the boosting approach for three tweet categories. Finally, the system formulates a sensitivity scaling called TSP's Tweet Sensitivity Scale based on Senti-Cyber features composed of Sensitive Privacy Keywords, Cyber-keywords with Non-Sensitive Privacy Keywords and Non-Cyber-keywords to detect the degree of disclosed sensitive information.
- Is Part Of:
- Computer journal. Volume 65:Number 2(2022)
- Journal:
- Computer journal
- Issue:
- Volume 65:Number 2(2022)
- Issue Display:
- Volume 65, Issue 2 (2022)
- Year:
- 2022
- Volume:
- 65
- Issue:
- 2
- Issue Sort Value:
- 2022-0065-0002-0000
- Page Start:
- 275
- Page End:
- 296
- Publication Date:
- 2020-05-09
- Subjects:
- Twitter -- boosting -- classification -- sensitivity -- privacy in OSN -- regrets -- sensitive privacy keywords -- cyber keywords
Computers -- Periodicals
005.1
- Journal URLs:
- http://comjnl.oxfordjournals.org/
http://ukcatalogue.oup.com/
- DOI:
- 10.1093/comjnl/bxaa027
- Languages:
- English
- ISSNs:
- 0010-4620
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.060000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20958.xml