'Amnesia' - A Selection of Machine Learning Models That Can Forget User Data Very Fast

Sebastian Schelter

Abstract

Software systems that learn from user data with machine learning (ML) techniques have become ubiquitous in recent years. Recent legislation requires companies and institutions that process personal data to delete user data upon request (enacting the ‘right to be forgotten’). However, it is not sufficient to merely delete the user data from databases. ML models that have been learnt from the stored data can be considered a lossy compressed version of the data, and therefore it can be argued that the user data must also be removed from them. Typically, this requires an inefficient and costly retraining of the affected ML models from scratch, as well as access to the original training data. We address this performance issue by formulating the problem of ‘decrementally’ updating trained ML models to ‘forget’ the data of a user, and present efficient decremental update procedures for three popular ML algorithms. In an experimental evaluation on synthetic data, we find that, in the majority of cases, our decremental update is two orders of magnitude faster than retraining the model without a particular user’s data.
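To make the idea of a decremental update concrete, the following is a minimal sketch on a toy model. It is not one of the paper's procedures: the `CooccurrenceModel` class, its `learn` and `forget` methods, and the example histories are all hypothetical names chosen for illustration. The model only stores counts of item pairs that occur together in a user's history, so a user's contribution can be subtracted directly, and the result can be compared against a model retrained without that user.

```python
from collections import Counter
from itertools import combinations


class CooccurrenceModel:
    """Toy model (illustrative assumption): counts how often two items
    appear together in the same user's history."""

    def __init__(self):
        self.pair_counts = Counter()

    @staticmethod
    def _pairs(items):
        # Unordered pairs of distinct items from one user's history.
        return combinations(sorted(set(items)), 2)

    def learn(self, items):
        # Incremental update: add this user's contribution.
        for pair in self._pairs(items):
            self.pair_counts[pair] += 1

    def forget(self, items):
        # Decremental update: subtract exactly what learn() added.
        for pair in self._pairs(items):
            self.pair_counts[pair] -= 1
            if self.pair_counts[pair] == 0:
                del self.pair_counts[pair]


# Hypothetical interaction histories.
histories = {
    "alice": ["a", "b", "c"],
    "bob": ["b", "c"],
    "carol": ["a", "c", "d"],
}

# Train on everyone, then forget bob with a decremental update.
model = CooccurrenceModel()
for items in histories.values():
    model.learn(items)
model.forget(histories["bob"])

# Baseline: retrain from scratch without bob's data.
retrained = CooccurrenceModel()
for user, items in histories.items():
    if user != "bob":
        retrained.learn(items)

same = model.pair_counts == retrained.pair_counts
print(same)
```

In this sketch the decremental update touches only the pairs in the forgotten user's history, whereas retraining has to scan every remaining history. It still needs the forgotten user's own data at deletion time, but not the rest of the training set.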

Type

Conference paper

Publication

Conference on Innovative Data Systems Research (CIDR)

Date

January, 2020

Links

PDF