Responsible Data Management


Incorporating ethics and legal compliance into data-driven algorithmic systems has been attracting significant attention from the computing research community, most notably under the umbrella of fair and interpretable machine learning. Yet much of this work has been limited to the ‘last mile’ of data analysis, disregarding both the data lifecycle and the lifecycle of a system’s design, development, and use. In this paper, we argue that the decisions we make during data collection and preparation profoundly impact the robustness, fairness, and interpretability of the systems we build, and that our responsibility for the operation of these systems does not stop once they are deployed. Embracing ethics and legal compliance also has significant implications for how we teach data management: it expands our focus beyond the needs of large enterprises and toward supporting responsible software engineering practice by citizen data scientists.

Communications of the ACM