The Fast Maximum Distance to Average Vector (F-MDAV): An algorithm for k-anonymous microaggregation in big data. (April 2020)
- Record Type:
- Journal Article
- Title:
- The Fast Maximum Distance to Average Vector (F-MDAV): An algorithm for k-anonymous microaggregation in big data. (April 2020)
- Main Title:
- The Fast Maximum Distance to Average Vector (F-MDAV): An algorithm for k-anonymous microaggregation in big data
- Authors:
- Rodríguez-Hoyos, Ana
Estrada-Jiménez, José
Rebollo-Monedero, David
Mezher, Ahmad Mohamad
Parra-Arnau, Javier
Forné, Jordi
- Abstract:
- The massive exploitation of tons of data is currently guiding critical decisions in domains such as economics or health. But serious privacy risks arise since personal data is commonly involved. k-Anonymous microaggregation is a well-known method that guarantees individuals' privacy while preserving much of data utility. Unfortunately, methods like this are computationally expensive in big data settings, whereas the application domain of data might require an immediate response to make "life or death" decisions. Accordingly, this paper proposes five strategies to simplify the internal operations (such as distance calculations and element sorting) of the maximum distance to average vector algorithm, the de facto microaggregation standard. For the sake of its usability in large-scale databases, they, e.g., reduce the number of operations necessary to compute distances from 3m to 2m, where m is the number of attributes of the data set. Also, the complexity of sorting operations gets reduced from O(n log n) to O(n), where n is the number of records. Through extensive experimentation over multiple data sets, we show that the new algorithm gets significantly faster. Interestingly, the speedup factor by each technique is not greater than 2, but the multiplicative effect of combining them all turns the algorithm four times faster than the original microaggregation mechanism. This remarkable speedup factor is achieved, literally, with no additional cost in terms of data utility, i.e., it does not incur greater information loss.
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 90(2020)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 90(2020)
- Issue Display:
- Volume 90, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 90
- Issue:
- 2020
- Issue Sort Value:
- 2020-0090-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-04
- Subjects:
- MDAV -- k-anonymous microaggregation -- Speedup -- Data privacy -- Big data
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2020.103531
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13413.xml
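
Illustrative note on the abstract: the abstract reports that distance computations drop from about 3m to 2m operations per record and that sorting drops from O(n log n) to O(n), but it does not say how. The Python sketch below shows one plausible way such savings can arise; it is an assumption made for illustration, not the authors' implementation. The function names `k_nearest_direct` and `k_nearest_fast` are hypothetical. The sketch assumes squared Euclidean distance, precomputed squared norms of the records, and NumPy's `argpartition` (a selection routine with linear average cost) in place of a full sort.

```python
import numpy as np


def k_nearest_direct(X, c, k):
    """Baseline: explicit squared distances followed by a full sort."""
    diff = X - c                       # m subtractions per record
    d2 = np.sum(diff * diff, axis=1)   # m multiplications and m - 1 additions per record
    return np.argsort(d2, kind="stable")[:k]   # O(n log n) sort


def k_nearest_fast(X, sq_norms, c, k):
    """Same neighbour set using an expanded distance and linear-time selection.

    ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2. The term ||c||^2 is identical
    for every record, so it cannot change the ranking and is dropped.
    With ||x||^2 precomputed once (sq_norms), each record costs roughly
    m multiplications and m additions.
    """
    score = sq_norms - 2.0 * (X @ c)
    # argpartition places the k smallest scores in the first k slots
    # (in no particular order) without sorting the whole array.
    return np.argpartition(score, k - 1)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 8))        # n records, m attributes (synthetic)
    sq_norms = np.einsum("ij,ij->i", X, X)  # computed once, reused in every iteration
    centroid = X.mean(axis=0)
    group = k_nearest_fast(X, sq_norms, centroid, k=5)
    print("indices of one candidate group:", np.sort(group))
```

Because the expanded form only reorders floating-point operations, records at nearly identical distances may occasionally be ranked differently than in the direct form; whether the paper's strategies behave this way is not stated in the abstract.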