Extremely Private Supervised Learning
In many sensitive domains, such as healthcare or finance, a data curator holds a large repository of private data, referred to as the target data, but is unwilling or unable to divulge any of it. In-situ analysis is often ruled out by constraints on computational resources or on the availability of in-house experts. The novelty of the proposed approach, called ExPriL, is to require access only to the marginals of the target data in order to learn a fitting hypothesis and, through it, a privacy-preserving synthetic version of the target data. Several approaches aimed at learning from (very) limited information about the target data have been proposed at the intersection of privacy-preserving learning, generative modelling, and domain adaptation. To the best of our knowledge, all of these approaches assume that the learner has access to the joint distribution of the target data, an assumption that the proposed approach significantly relaxes.

This paper presents ExPriL, a new approach for learning from extremely private data. Iteratively, the learner supplies a candidate hypothesis, and the data curator releases only the marginals of the error incurred by that hypothesis on the privately held target data. Using these marginals as the supervisory signal, the learner seeks a hypothesis that fits the target data as well as possible. The privacy of the mechanism is provably enforced, assuming that the overall number of iterations is known in advance.
[paper] [code] [video]
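
To make the interaction concrete, below is a minimal Python sketch of one plausible instantiation of such a loop, not the authors' algorithm: per-feature histograms stand in for the released marginals, the Laplace mechanism enforces per-round privacy, and the learner's "hypothesis" is simply a resampled synthetic dataset. The function names (`curator_release`, `learner_update`), the public binning, and the even budget split `epsilon / T` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def curator_release(target, synthetic, edges, eps_step):
    """Curator side: per-feature marginal error histograms, privatised with
    the Laplace mechanism. Bin edges are fixed and public; adding or removing
    one target record changes each per-feature histogram by at most 1
    (L1 sensitivity 1), so Laplace(1/eps_step) noise gives eps_step-DP
    per released marginal."""
    out = []
    for j in range(target.shape[1]):
        t_hist, _ = np.histogram(target[:, j], bins=edges)
        s_hist, _ = np.histogram(synthetic[:, j], bins=edges)
        noise = rng.laplace(scale=1.0 / eps_step, size=t_hist.shape)
        out.append(t_hist - s_hist + noise)  # noisy marginal error signal
    return out

def learner_update(synthetic, errors, edges):
    """Learner side: fold the noisy error back into each feature's current
    marginal and resample that column from the implied piecewise-uniform
    density. Features are treated independently here; a real learner would
    fit a joint hypothesis to the released marginals instead."""
    n, cols = len(synthetic), []
    for err, col in zip(errors, synthetic.T):
        s_hist, _ = np.histogram(col, bins=edges)
        probs = np.clip(s_hist + err, 0, None)  # approx. noisy target marginal
        probs = probs / probs.sum()
        b = rng.choice(len(probs), size=n, p=probs)       # pick a bin
        cols.append(rng.uniform(edges[b], edges[b + 1]))  # sample within it
    return np.stack(cols, axis=1)

# Demo: the iteration budget T is known in advance, so the total budget
# epsilon can be split as epsilon / T per round by sequential composition.
target = rng.normal(2.0, 1.5, size=(5000, 2))        # privately held data
synthetic = rng.uniform(-6.0, 10.0, size=(5000, 2))  # learner's initial guess
edges = np.linspace(-6.0, 10.0, 41)                  # public bin edges
epsilon, T = 4.0, 10

for _ in range(T):
    errors = curator_release(target, synthetic, edges, epsilon / T)
    synthetic = learner_update(synthetic, errors, edges)
```

Knowing the number of iterations in advance is what makes the budget split possible: the curator can calibrate each round's noise so that the sequentially composed releases still satisfy the overall privacy guarantee.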