An improved data anonymization algorithm for incomplete medical dataset publishing

Wei Liu, Mengli Pei, Congcong Cheng, Wei She, Chase Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

To protect patients' sensitive information and prevent privacy leakage, data must be anonymized before a medical dataset is published. Most existing anonymization techniques discard records with missing data, which causes large differences in data characteristics during anonymization and results in severe information loss. To solve this problem, we propose a novel data anonymization algorithm for incomplete medical datasets based on the L-diversity algorithm (DAIMDL). While preserving records with missing data, DAIMDL clusters the data using an improved k-member algorithm and uses the information entropy generated by data generalization to calculate distance in the clustering stage. The data groups obtained by clustering are then generalized. Experimental results show that DAIMDL better protects patients' sensitive attributes, reduces information loss when anonymizing data with missing values, and improves the availability of the dataset.
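The abstract gives only an outline of the method, so the following is a minimal illustrative sketch rather than the authors' DAIMDL. It assumes that missing quasi-identifier values are stored as `None`, that the clustering "distance" is the Shannon entropy of the known values in a candidate group, and that diversity means at least `l` distinct sensitive values per group. All function names, parameters, and sample records are hypothetical.

```python
import math
from collections import Counter

MISSING = None  # assumption: a missing quasi-identifier value is stored as None


def generalization_entropy(group, qi):
    """Sum, over the quasi-identifier columns, of the Shannon entropy of the
    known values in the group. Missing values are skipped, not penalized."""
    cost = 0.0
    for i in qi:
        known = [r[i] for r in group if r[i] is not MISSING]
        for count in Counter(known).values():
            p = count / len(known)
            cost -= p * math.log2(p)
    return cost


def is_valid(group, sens, k, l):
    """At least k records and at least l distinct sensitive values."""
    return len(group) >= k and len({r[sens] for r in group}) >= l


def cluster_records(records, qi, sens, k, l):
    """Greedy k-member-style clustering; no record is discarded."""
    remaining = list(records)
    clusters = []
    while remaining:
        group = [remaining.pop(0)]  # seed with the next unassigned record
        while remaining and not is_valid(group, sens, k, l):
            # "Distance" of a candidate = entropy of the group after adding it.
            best = min(remaining,
                       key=lambda r: generalization_entropy(group + [r], qi))
            remaining.remove(best)
            group.append(best)
        if is_valid(group, sens, k, l) or not clusters:
            # A lone first group is kept even if the data cannot satisfy k and l.
            clusters.append(group)
        else:
            clusters[-1].extend(group)  # fold an undersized tail into the last cluster
    return clusters


def generalize(group, qi):
    """Replace each quasi-identifier with the set of known values in the group."""
    labels = {}
    for i in qi:
        known = sorted({r[i] for r in group if r[i] is not MISSING})
        labels[i] = "|".join(map(str, known)) or "*"
    return [tuple(labels.get(i, v) for i, v in enumerate(r)) for r in group]


# Hypothetical records: (age, sex, diagnosis); the diagnosis is the sensitive attribute.
records = [
    (34, "M", "flu"), (35, "M", "asthma"), (None, "M", "flu"),
    (61, "F", "diabetes"), (60, None, "flu"), (62, "F", "asthma"),
]
clusters = cluster_records(records, qi=[0, 1], sens=2, k=3, l=2)
published = [generalize(g, qi=[0, 1]) for g in clusters]
```

In this sketch a record with a missing value is never dropped: it contributes nothing to the entropy of the column it lacks, so it tends to join a group it fits on its known attributes, and it takes on the group's generalized value when the group is published.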

Original language: English (US)
Title of host publication: Proceedings of the 2nd International Conference on Healthcare Science and Engineering
Editors: Xianxian Li, Chase Q. Wu, Ming-Chien Chyu, Jaime Lloret
Publisher: Springer Verlag
Pages: 115-128
Number of pages: 14
ISBN (Print): 9789811368363
DOIs: https://doi.org/10.1007/978-981-13-6837-0_9
State: Published - Jan 1 2019
Event: 2nd International Conference on Healthcare Science and Engineering, Healthcare 2018 - Guilin, China
Duration: Jun 1 2018 to Jun 3 2018

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 536

Conference

Conference: 2nd International Conference on Healthcare Science and Engineering, Healthcare 2018
Country: China
City: Guilin
Period: 6/1/18 to 6/3/18

Fingerprint

Entropy
Availability

All Science Journal Classification (ASJC) codes

  • Industrial and Manufacturing Engineering

Keywords

  • Data anonymization
  • Incomplete medical dataset
  • L-diversity
  • Missing data

Cite this

Liu, W., Pei, M., Cheng, C., She, W., & Wu, C. (2019). An improved data anonymization algorithm for incomplete medical dataset publishing. In X. Li, C. Q. Wu, M-C. Chyu, & J. Lloret (Eds.), Proceedings of the 2nd International Conference on Healthcare Science and Engineering (pp. 115-128). (Lecture Notes in Electrical Engineering; Vol. 536). Springer Verlag. https://doi.org/10.1007/978-981-13-6837-0_9
@inproceedings{284997b051b04becab75d8659e4c2724,
title = "An improved data anonymization algorithm for incomplete medical dataset publishing",
keywords = "Data anonymization, Incomplete medical dataset, L-diversity, Missing data",
author = "Wei Liu and Mengli Pei and Congcong Cheng and Wei She and Chase Wu",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-981-13-6837-0_9",
language = "English (US)",
isbn = "9789811368363",
series = "Lecture Notes in Electrical Engineering",
publisher = "Springer Verlag",
pages = "115--128",
editor = "Xianxian Li and Wu, {Chase Q.} and Ming-Chien Chyu and Jaime Lloret",
booktitle = "Proceedings of the 2nd International Conference on Healthcare Science and Engineering",
address = "Germany",

}
