Recent Publications

(2025). Towards Cross-Modal Error Detection with Tables and Images. Workshop on Unifying Data Curation Frameworks Across Domains (DataWorld) at ICML.

(2025). mlidea: Interactively Improving ML Data Preparation Code via 'Shadow Pipelines'. International Conference on Very Large Data Bases (VLDB, demo).

(2025). scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data. International Conference on Machine Learning (ICML), spotlight.

(2025). Towards Automated Task-Aware Data Validation. Workshop on Data Management for End-to-End Machine Learning (DEEM) at SIGMOD.

(2025). Navigating Data Errors in Machine Learning Pipelines: Identify, Debug, and Learn. ACM SIGMOD (tutorial).

Team

Faculty, Postdocs & Staff
Prof. Dr. Sebastian Schelter
Dr. Arnab Phani
Celia Bohnhardt-Schneider
PhD Students & Guests
Hao Chen
Olga Ovcharenko
Pierre Lubitzsch
Zeyu Zhang
(University of Amsterdam)
Shubha Guha
(University of Amsterdam)
Till Doehmen
(Motherduck)
Yichun Wang
(University of Amsterdam)
Stefan Grafberger
(Snowflake)
Master Students
Aynaz Abdollahzadeh
(University of Amsterdam)
Leonardo Dominici
(University of Amsterdam)

Alumni (name, role and first employment)

Prof. Dr.-Ing. Sebastian Schelter

Sebastian Schelter is a Full Professor at the Berlin Institute for the Foundations of Learning and Data (BIFOLD) and Technische Universität Berlin. His research focuses on the intersection of data management and machine learning, with the goal of fostering the responsible management of data and democratising data science technologies.

His group's research is accompanied by efficient and scalable open-source implementations, many of which are applied in real-world use cases, for example in the Amazon Web Services cloud and on large European e-commerce platforms.

In the past, he has been an assistant professor at the University of Amsterdam, a faculty fellow at New York University, a senior applied scientist at Amazon Research and a research intern at Twitter and IBM Almaden in California. His research contributions have been recognized with an ACM SIGMOD Systems Award, an ACM SIGMOD Best Demo Runner Up Award, and a Best Paper Runner Up Award from the Table Representation Learning workshop at NeurIPS.

Scientific Service
  • Editorial duties: Associate Editor for PVLDB Volume 15, Action Editor for the Journal of Data-Centric Machine Learning Research (DMLR), Action Editor for the open source track of the Journal of Machine Learning Research (JMLR) 2022-2025, Guest Editor for the IEEE Data Engineering Bulletin
  • Organisation: Founder and co-organiser (until 2020) of the workshop series on “Data Management for End-to-End Machine Learning (DEEM)” at SIGMOD, workshop chair of EDBT 2026, co-chair of the industry track of EDBT 2022, web chair of SIGMOD 2025, co-chair of the BOSS workshop at VLDB 2016, co-organiser of the “Dutch Data Systems Design Seminar” series with CWI Amsterdam
  • Program Committee: SIGMOD 2017 & 2019-2026, VLDB 2021, ICDE 2018-2021 & 2023-2024, NeurIPS 2025, EDBT 2017 & 2021, CIKM 2020, PhD Symposium at VLDB 2021, DEEM workshop at SIGMOD 2021-2024, aiDM workshop at SIGMOD 2019, LSRS workshop at RecSys 2013-2015, AIDB workshop at VLDB 2020, DBML workshop at ICDE 2021, 2024 & 2025, TRL workshop at NeurIPS 2022-2025, Provenance Week 2020
  • Awards: ACM SIGMOD Systems Award 2023, ACM SIGMOD Best Demo Runner Up Award 2023, Best Paper Runner Up Award from the Table Representation Learning workshop at NeurIPS
  • Keynotes: Workshop on Online Recommender Systems and User Modeling at RecSys'20, Workshop on Data Management for End-to-End Machine Learning at SIGMOD'21, Data-Centric AI Workshop from ETH Zürich/Stanford 2021, Workshop on Quality in Databases at VLDB'24
  • Panelist: Systems for ML at VLDB 2021, PhD symposium at ICDE 2021, Data management challenges for LLM-powered solutions at DEEM@SIGMOD'23, Panel on Open Science and AI at the Weizenbaum Institute 2025
  • Reviewer for Grant Proposals: Open Competition ENW (Dutch Research Council NWO), Binational Science Foundation (United States - Israel)
Completed PhD dissertations as advisor
  • Olivier Sprangers, Efficient and accurate forecasting in large-scale settings, University of Amsterdam, 2024
    (with Maarten de Rijke)
  • Mozhdeh Ariannezhad, User-oriented recommender systems in retail, University of Amsterdam, 2023
    (with Maarten de Rijke)
Completed PhD dissertations as a committee member
  • Andra Ionescu, Feature discovery for data-centric AI, TU Delft, 2025
  • Gerardo Vitagliano, Modeling the structure of tabular files for data preparation, HPI Potsdam, 2024
  • Madelon Hulsebos, Table representation learning, University of Amsterdam, 2024
  • Bojan Karlaš, Data systems for managing and debugging machine learning workflows, ETH Zürich, 2023
  • Cedric Renggli, Building data-centric systems for machine learning development and operations, ETH Zürich, 2023
  • Amir Pouya Aghasadeghi, Generating and querying temporal property graphs, New York University, 2022
  • Ke Yang, Fairness, diversity, and interpretability in ranking, New York University, 2021
Past employments
Professional Memberships
  • Apache Software Foundation (emeritus)
  • Association for Computing Machinery
  • Electronic Frontier Foundation
  • Deutscher Hochschulverband

Teaching

Summer semester 2025

We offer the following courses during the summer semester 2025:

To take one of our courses, please sign up on the corresponding course page on ISIS and attend the first lecture, where we will discuss the details of the formal registration.

Theses

If you are interested in writing a bachelor's or master's thesis with us, please check out our list of available topics at theses.tu-berlin.de.

Job Openings

No current openings.

Contact

Email: sekr[at]deem[dot]tu-berlin[dot]de

Technische Universität Berlin
FG Management of Data Science Processes
Sekr. TEL 9-2
Ernst-Reuter-Platz 7
10587 Berlin
Germany

Responsibility under the German Press Law §55 Sect. 2 RStV:
Prof. Dr.-Ing. Sebastian Schelter