AUTOMATING AND DEMOCRATIZING DATA SCIENCE

A SYNTH RESEARCH CAMP

About the Event

Data analysis is a difficult process that requires a skilled data scientist. A typical analysis involves many different steps: selecting the right subset of data, pre-processing the data into the right format (data wrangling), determining the learning task, selecting the right algorithms, and evaluating the result. The field of automated data science tries to democratize data analysis and make it more accessible to non-experts by automating these steps as much as possible.

This event is organized by the ERC AdG project SYNTH, which has been devoted to the goal of automating and democratizing data science. The program spans three afternoons (February 2nd, 3rd and 4th). The first two days will feature invited keynote talks by Tijl De Bie, Holger Hoos, and Sumit Gulwani on various aspects of automating data science. They will also feature several talks and demonstrations by the PI of SYNTH, Luc De Raedt, and team members on topics such as automating data wrangling, learning constraints and inductive models (with probabilistic programs) to model data, and integrating these steps in one common framework to make predictions and find anomalies. The last half-day will consist of a hands-on workshop with the SYNTH software package as well as a poster session. We encourage all participants to submit posters of their recent work in the field of automating data science.

The event is also highly relevant to related projects in which the Leuven ML Lab is involved, in particular the TAILOR Network of Excellence (WP 7 on Automated AI), the Grand Challenge on "AI-Driven Data Science" of the Flanders AI Program, and the iBOF Project on "Automating Data Science: the Next Frontiers".

This research camp is of interest to PhD students and researchers whose research is related to automated data science. They will also be able to present their work during the poster session.

Free Registration - Registration is mandatory

Scope

  • Automating Data Wrangling
  • Inductive Models
  • Constraint Learning
  • Constrained Inference
  • Probabilistic Programming
  • Neuro-Symbolic AI
  • Automated Feature Construction
  • Automated Machine Learning
  • Anomaly Detection

Important Dates

15.12.2021 — Registration opens

31.01.2022 — Registration closes

Where

Virtual

When

2nd-4th of February 2022

Keynote Speakers

Tijl De Bie (Ghent University)
Abstract

Automating data exploration (3rd of February)

Research into the automation of machine learning has made great progress over the past decade, bringing the power of machine learning within reach of many software developers without the need for in-depth knowledge on the topic. Much of this success is due to the existence of well-defined objectives in machine learning: it is often fairly well understood what one wishes to optimize (typically some form of accuracy). Data exploration tasks, on the other hand, tend to be far less well-defined, and are therefore intrinsically harder to automate. In this talk I will discuss some possible avenues for approaching the (partial) automation of data exploration, along with some recent results.

Holger Hoos (RWTH Aachen University & Leiden University)
Abstract

Explorations into the crucible of AI: Learning and reasoning, flexibility and trust (2nd of February)

AI is poised to bring fundamental change to the way we live and work, and AI techniques have already begun to enable a new wave of progress in science and engineering. The nature of this revolution is often misunderstood. In this presentation, I will explain why the next generation of AI systems will benefit from progress in learning, reasoning and other areas of AI, and specifically, why learning is key to perception, adaptation, and flexibility, while reasoning is the key to verification, robustness, and trust. I will illustrate this using current work by my group on neural network verification, and I will highlight how this forms part of a broader effort towards the advancement and democratisation of AI under the banner of Automated AI (AutoAI), which generalises widely successful concepts in the area of Automated Machine Learning (AutoML).

Sumit Gulwani (Microsoft)
Abstract

Program Synthesis for Data Wrangling (2nd of February)

Data wrangling is a killer application for program synthesis. This is because of the availability of input data, which allows easy intent expression (by examples, by natural language, or even implicitly) and disambiguation (active learning). I will showcase a variety of techniques driven by advances in symbolic reasoning, pre-trained language models, and their combination.

Luc De Raedt (KU Leuven)
Abstract

On the automation of data science (2nd of February)

Inspired by recent successes in automating highly complex jobs like automatic programming and scientific experimentation, I want to automate the task of the data scientist when developing intelligent systems. In this talk, I shall introduce some of the challenges involved and some possible approaches and tools for automating data science. More specifically, I shall discuss how automated data wrangling approaches can be used for pre-processing and how both predictive and descriptive models can in principle be combined to automatically complete spreadsheets and relational databases. I will argue that autocompleting spreadsheets is a simple yet highly challenging setting for the automation of data science. Special attention will be given to the induction of constraints in spreadsheets and in an operations research context.

Abstract

From Probabilistic Logics to Neuro-Symbolic Artificial Intelligence (3rd of February)

A central challenge to contemporary AI is to integrate learning and reasoning. The integration of learning and reasoning has been studied for decades already in the fields of statistical relational artificial intelligence (StarAI) and probabilistic programming. StarAI has focussed on unifying logic and probability, the two key frameworks for reasoning, and has extended these probabilistic logics with machine learning principles. I will argue that StarAI and Probabilistic Logics form an ideal basis for developing neuro-symbolic artificial intelligence techniques. Thus neuro-symbolic computation = StarAI + Neural Networks. Many parallels will be drawn between these two fields and will be illustrated using the Deep Probabilistic Logic Programming language.

Hendrik Blockeel (KU Leuven)
Abstract

Learning multi-directional functional models using MERCS (3rd of February)

MERCS (Multi-directional ensembles of regression and classification trees) is a method for extracting functional relationships from data. In contrast to standard supervised learning, the input and output variables need not be identified in advance: any variable can serve as input or output at prediction time. This makes MERCS models very versatile, while the models can still be learned very efficiently. In this talk, I will provide an overview of the basic ideas behind MERCS and illustrate their application potential, which includes instantaneous prediction on demand and explainable anomaly detection.

Stefano Teso (University of Trento)

Schedule - CET

Welcome Speech

Invited talk: Luc De Raedt (KU Leuven)

Break

Invited talk: Holger Hoos (Leiden University)

Break

Invited talk: Sumit Gulwani (Microsoft)

Demo: Gust Verbruggen (KU Leuven)

Opening the Second Day

Invited talk: Luc De Raedt (KU Leuven)

Break

Invited talk: Stefano Teso (University of Trento), Samuel Kolb (KU Leuven), and Mohit Kumar (KU Leuven)

Break

Invited talk: Tijl De Bie (Ghent University)

Invited talk: Hendrik Blockeel (KU Leuven)

SYNTH Workshop

Break

Poster Highlight

Round table & Closing | Remarks, Feedback, and comments on Automated Data Science

Poster Session

Poster Session

To allow everyone to showcase work related to automated data science, part of the workshop will consist of a poster session. On the last day of the workshop, there will be a poster highlight session in which each submission will be presented briefly with at most three slides.

Submission details

Submission deadline: 1st of February 2022

Submit by email to: adem.kikaj@kuleuven.be

Submission content:

  • PDF file of the poster
  • Title
  • List of authors
  • Name of the presenter
  • A short abstract
  • Pre-recorded presentation — One-minute video with at most three slides

Registration & Contact

Please register using the form below.
