Main Content

Artificial data give the same results as real data — without compromising privacy

Although data scientists can gain great insights from large data sets — and can ultimately use these insights to tackle major challenges — accomplishing this is much easier said than done. Many such efforts are stymied from the outset, as privacy concerns make it difficult for scientists to access the data they would like to work with. In a paper presented at the IEEE International Conference on Data Science and Advanced Analytics, members of the Data to AI Lab at the MIT Laboratory for Information and Decision Systems (LIDS) Kalyan Veeramachaneni, principal research scientist in LIDS and the Institute for Data, Systems, and Society (IDSS) and co-authors Neha Patki and Roy Wedge describe a machine learning system that automatically creates synthetic data — with the goal of enabling data science efforts that, due to a lack of access to real data, may have otherwise not left the ground. While the use of authentic data can cause significant privacy concerns, this synthetic data is completely different from that produced by real users — but can still be used to develop and test data science algorithms and models.”

Link to article