Fairness-Aware Instrumentation of Preprocessing Pipelines for Machine Learning


Surfacing and mitigating bias in ML pipelines is a complex topic, with a dire need to provide system-level support to data scientists. Humans should be empowered to debug these pipelines, in order to control for bias and to improve data quality and representativeness. We propose fair-DAGs, an open-source library that extracts directed acyclic graph (DAG) representations of the data flow in preprocessing pipelines for ML. The library subsequently instruments the pipelines with tracing and visualization code to capture changes in data distributions and identify distortions with respect to protected group membership as the data travels through the pipeline. We illustrate the utility of fair-DAGs with experiments on publicly available ML pipelines.

Human-In-the-Loop Data Analytics workshop at ACM SIGMOD