Differential Privacy
Accountability and trust are difficult for a company to earn without data collection and processing techniques that respect individual privacy. As a growing number of web and mobile applications employ artificial intelligence and process large data streams, the importance of data privacy is rising sharply. In the wake of this growing importance, a new line of research emerged in theoretical computer science, now known as differential privacy. Differential privacy provides a framework for computing on sensitive datasets in which one can mathematically prove that individual-specific information remains private.
Definition: A mechanism M : 𝒳ⁿ × 𝒬 → 𝒴 is ε-differentially private iff ∀ q ∈ 𝒬 and ∀ x, x' ∈ 𝒳ⁿ differing in only a single row, the distributions M(x, q) and M(x', q) are similar, i.e., Pr[M(x, q) ∈ S] ≤ e^ε · Pr[M(x', q) ∈ S] for every set of outputs S ⊆ 𝒴.
Here, 𝒳 is the space from which we get rows, 𝒬 is the space of questions, and 𝒴 is the output space. The main idea behind the definition is that no individual's data has a significant influence on the output.
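As a concrete illustration of this definition, consider a counting query answered with the Laplace mechanism. This is a minimal sketch, not part of the original text; the function and variable names are illustrative. Because a counting query changes by at most 1 when a single row changes, adding Laplace noise with scale 1/ε keeps the output distributions on neighbouring datasets within a factor of e^ε of each other.

```python
import numpy as np

def laplace_count(dataset, predicate, epsilon, rng=None):
    """Answer a counting query with ε-differential privacy.

    The true count changes by at most 1 between datasets that differ in a
    single row, so Laplace noise with scale 1/ε suffices for ε-DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    true_count = sum(1 for row in dataset if predicate(row))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Two neighbouring datasets (they differ in one row) yield nearly the same
# output distribution, which is exactly what the definition requires.
ages = [34, 29, 51, 62, 41]
ages_neighbour = [34, 29, 51, 62, 45]
print(laplace_count(ages, lambda a: a >= 40, epsilon=0.5))
print(laplace_count(ages_neighbour, lambda a: a >= 40, epsilon=0.5))
```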
In addition to this, private learning is a combination of probably approximately correct (PAC) learning and differential privacy: we collect labeled data from individuals and output a hypothesis while preserving the privacy of each individual.
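One simple way to output a hypothesis privately is the exponential mechanism. The sketch below (an illustration under assumed names, not a definitive construction) privately selects a one-dimensional threshold classifier, scoring each candidate threshold by its training accuracy; since changing one labeled example changes any such score by at most 1, sampling thresholds with probability proportional to exp(ε · score / 2) is ε-differentially private.

```python
import numpy as np

def private_threshold_learner(x, y, candidate_thresholds, epsilon, rng=None):
    """Privately select a threshold classifier h_t(x) = 1[x >= t]
    via the exponential mechanism.

    Score = number of correctly classified training points; its sensitivity
    is 1, so we sample t with probability ∝ exp(epsilon * score / 2).
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.array([np.sum((x >= t).astype(int) == y) for t in candidate_thresholds])
    logits = epsilon * scores / 2.0
    logits -= logits.max()                      # for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(candidate_thresholds, p=probs)

# Toy usage: learn a threshold separating small from large values.
x = np.array([0.1, 0.3, 0.4, 0.7, 0.8, 0.9])
y = np.array([0, 0, 0, 1, 1, 1])
print(private_threshold_learner(x, y, np.linspace(0.0, 1.0, 21), epsilon=1.0))
```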
Google recently announced TensorFlow Privacy, a module that enables machine learning with differential privacy in the TensorFlow framework.
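Below is a minimal sketch of how differentially private training (DP-SGD) can be set up with the tensorflow_privacy package; the import path, optimizer name, and constructor arguments are assumptions based on the package's documented Keras optimizer and may differ between releases, and the model and hyperparameters are purely illustrative.

```python
import tensorflow as tf
# Assumed import path for the DP Keras optimizer; may vary by tensorflow_privacy version.
from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras import DPKerasSGDOptimizer

# A small illustrative model; the input shape is hypothetical.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2),
])

# DP-SGD: clip each per-example gradient to l2_norm_clip, then add Gaussian
# noise scaled by noise_multiplier before averaging.
optimizer = DPKerasSGDOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=1.1,
    num_microbatches=32,
    learning_rate=0.1,
)

# An unreduced (per-example) loss is needed so gradients can be clipped individually.
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE
)

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=32, epochs=5)
```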