H. Ren, J. Wang, and W. X. Zhao
in Proceedings of the 28th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'22)
In recent years, automatic computational systems based on deep learning have been widely used in medical fields, for tasks such as automatic diagnosis and disease prediction. Most of these systems are designed for data-sufficient scenarios. However, because of disease rarity or privacy concerns, medical data are often insufficient. Applying data-hungry deep learning models to insufficient data is likely to cause over-fitting and serious performance problems. Many data augmentation methods have been proposed to address data insufficiency, such as using GANs (Generative Adversarial Networks) to generate training data. However, the augmented data usually contain a lot of noise, and using them directly to train sensitive medical models rarely achieves satisfactory results.
To overcome this problem, we propose a novel deep model learning method for insufficient EHR (Electronic Health Record) data modeling, namely GRACE, which stands for GeneRative Adversarial networks enhanCed prE-training. In this method, we propose an item-relation-aware GAN to capture changing trends and correlations among data for generating high-quality EHR records. Furthermore, we design a pre-training mechanism consisting of a masked records prediction task and a real-fake contrastive learning task to learn representations for EHR data using both generated and real data. After pre-training, only the representations of real data are used to train the final prediction model. In this way, we can fully exploit useful information in generated data through pre-training while avoiding the problems caused by directly using noisy generated data to train the final prediction model. The effectiveness of the proposed method is evaluated through extensive experiments on three healthcare-related real-world datasets. We also deploy our method in a maternal and child health care hospital for an online test.
Both offline and online experimental results demonstrate the effectiveness of the proposed method. We believe doctors and patients can benefit from our effective learning method in various healthcare-related applications.
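The data flow described in the abstract (generated and real records are both used for pre-training, while only real records reach the final prediction model) can be illustrated with a minimal sketch. This is not the authors' implementation: the `[MASK]` token, the 0.15 masking probability, the function names, and the toy records are all assumptions made for illustration, and the GAN, the encoder, and the contrastive loss are omitted entirely.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_record(record, mask_prob, rng):
    """Hide each item with probability mask_prob; return (masked, targets)."""
    masked, targets = [], {}
    for pos, item in enumerate(record):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[pos] = item  # a model would be asked to recover this item
        else:
            masked.append(item)
    return masked, targets


def build_pretraining_set(real, generated, mask_prob=0.15, seed=0):
    """Pool real and generated records, tagging each with a real/fake flag."""
    rng = random.Random(seed)
    examples = []
    for is_real, records in ((True, real), (False, generated)):
        for record in records:
            masked, targets = mask_record(record, mask_prob, rng)
            examples.append(
                {"input": masked, "targets": targets, "is_real": is_real}
            )
    return examples


def build_finetuning_set(real, labels):
    """The final prediction model is trained on real records only."""
    return list(zip(real, labels))


# Toy records with made-up item codes; `generated` stands in for GAN output.
real = [["dx_A", "rx_B", "lab_C"], ["dx_D", "rx_E"]]
generated = [["dx_A", "rx_E", "lab_C"]]

pretrain = build_pretraining_set(real, generated)
finetune = build_finetuning_set(real, labels=[1, 0])
print(len(pretrain), len(finetune))
```

In this sketch the `targets` dictionary would supply the labels for a masked records prediction task and the `is_real` flag would supply the signal for a real-fake contrastive task; how GRACE actually defines either objective is not specified in the abstract.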
@inproceedings{ren2022generative,
title={Generative Adversarial Networks Enhanced Pre-training for Insufficient Electronic Health Records Modeling},
author={Ren, Houxing and Wang, Jingyuan and Zhao, Wayne Xin},
booktitle={Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages={3810--3818},
year={2022}
}