Random additions efficiently anonymize large data sets

December 29, 2015

Original (left) and reconstructed anonymized data (right) for age and occupation using the proposed algorithm.

Balancing transparency and freedom of information with the right to privacy places high demands on data handling methods. So far, methods for anonymizing shared data sets have assumed that there is a distinction between details that can be used to identify an individual (quasi-identifiers) and details that are deemed 'sensitive' and private, but this is not always the case. Now Yuichi Sei and Akihiko Ohsuga from the University of Electro-Communications, alongside Takao Takenouchi from NEC Corporation in Japan, have devised an algorithm that efficiently anonymizes data sets without assuming this distinction.

The researchers use hospital lists as an example. A data set may include a name (a direct identifier), an address and age (quasi-identifiers), and sensitive information (such as a medical condition). Even if the name is withheld from each entry, someone using the data set could identify individuals from the age and address. In addition, anonymization should resist attempts to identify particulars by comparing two anonymized sets derived from the same data.

One approach to anonymizing data is to add noise to a data set, where the frequency of each possible value for each attribute is presented in a histogram. However, as Sei, Ohsuga and Takenouchi point out, this can greatly increase the quantity of the data. "Because almost all of the categories have only a few people in the histogram, the noise added to each category of the histogram has a heavy impact."
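The effect the researchers describe can be illustrated with a small sketch. It is not taken from the paper: the Laplace noise, the `scale` value, the helper names and the synthetic (age, occupation) records are all assumptions chosen for illustration. It compares how much the same amount of noise distorts a histogram with many tiny categories versus one with a few large categories.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_histogram(records, scale=1.0, seed=0):
    # Count each category, then perturb every count independently.
    # (Assumed noise model; the article does not say which one is used.)
    rng = random.Random(seed)
    counts = Counter(records)
    return {key: count + laplace_noise(scale, rng) for key, count in counts.items()}

def mean_relative_error(records, scale=1.0, seed=0):
    # Average of |noisy count - true count| / true count over all categories.
    true_counts = Counter(records)
    noisy = noisy_histogram(records, scale, seed)
    errors = [abs(noisy[k] - c) / c for k, c in true_counts.items()]
    return sum(errors) / len(errors)

# Hypothetical sparse data: 200 (age, occupation) categories, 2 people each.
sparse = [(age, occ) for age in range(20, 70) for occ in range(4) for _ in range(2)]
# Hypothetical dense data: the same number of people in 3 coarse age bands.
dense = [age // 25 for age in range(20, 70) for _ in range(8)]

sparse_error = mean_relative_error(sparse)
dense_error = mean_relative_error(dense)
print(f"sparse histogram relative error: {sparse_error:.3f}")
print(f"dense histogram relative error:  {dense_error:.3f}")
```

With the same noise scale, the relative error in the sparse histogram is far larger, because each category holds only a couple of people.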

The UEC-NEC Corporation researchers instead randomized the data set for each attribute and added random values to each entry. "Through simulations of real data sets, we prove that our proposed method can anonymize and reconstruct databases while keeping a high quality of data within a realistic period." The approach may be useful for anonymizing public records such as the census and electronic electoral votes.
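The article does not spell out the algorithm, so the following is only a generic randomized-response-style sketch of the idea of randomizing one attribute and then reconstructing its distribution. It should not be read as the authors' method. The function names, the `keep_prob` parameter and the occupation values are hypothetical.

```python
import random
from collections import Counter

def randomize(values, domain, keep_prob, rng):
    # Keep each value with probability keep_prob; otherwise replace it
    # with a value drawn uniformly from the attribute's domain.
    return [v if rng.random() < keep_prob else rng.choice(domain) for v in values]

def reconstruct(randomized, domain, keep_prob):
    # The observed share of v is keep_prob * f_v + (1 - keep_prob) / k,
    # where f_v is the true share and k the domain size. Solve for f_v.
    n = len(randomized)
    k = len(domain)
    observed = Counter(randomized)
    return {
        v: (observed[v] / n - (1 - keep_prob) / k) / keep_prob
        for v in domain
    }

rng = random.Random(42)
domain = ["clerk", "engineer", "nurse"]  # hypothetical occupation values
true_values = ["clerk"] * 6000 + ["engineer"] * 3000 + ["nurse"] * 1000

released = randomize(true_values, domain, keep_prob=0.5, rng=rng)
estimate = reconstruct(released, domain, keep_prob=0.5)
for occupation in domain:
    print(f"{occupation}: estimated share {estimate[occupation]:.3f}")
```

In this sketch no released entry can be trusted individually, since about half of them have been replaced, yet the overall distribution can still be estimated from the released data. That is the same trade-off the figure caption points to when it compares original and reconstructed data.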

More information: "(l1, ..., lq)-diversity for anonymizing sensitive quasi-identifiers," 2015 IEEE Trustcom/BigDataSE/ISPA, pp. 596-603.

Provided by University of Electro-Communications
