Abstract

Inspired by recent successes in automating highly complex jobs such as programming and scientific experimentation, the SYNTH project ultimately aims to automate the task of the data scientist when developing intelligent systems, namely extracting knowledge from data in the form of models. More specifically, the project aims to develop the foundations of a theory and methodology for automatically synthesising inductive data models. An inductive data model (IDM) consists of:

  1. a data model (DM) that specifies an adequate data structure for the dataset (just like a database), and
  2. a set of inductive models (IMs), that is, a set of patterns and models that have been discovered in the data.

While the DM can be used to retrieve information about the dataset and to answer questions about specific data points, the IMs can be used to make predictions, propose values for missing data, find inconsistencies and redundancies, etc. The task addressed in this project is to automatically synthesise such IMs from past data and to use them to support the user when making decisions. It will be assumed that the dataset consists of a set of tables, that the end-user interacts with the IDM via a visual interface, and that the data scientist interacts with it via a unifying IDM language offering a number of core IMs and learning algorithms.
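The split between the DM and the IMs can be made concrete with a small sketch. Everything in it is a hypothetical illustration rather than the project's actual design: the class `InductiveDataModel`, its methods `lookup`, `learn` and `propose`, and the toy `shifts` table are all assumed names, and the "inductive model" is just a most-frequent-value rule standing in for a real learning algorithm.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Optional

Row = dict[str, Optional[str]]


@dataclass
class InductiveDataModel:
    """Hypothetical IDM: a data model (tables) plus learned inductive models."""
    tables: dict[str, list[Row]]  # the DM: named tables of rows
    models: dict[tuple[str, str, str], dict[str, str]] = field(default_factory=dict)

    def lookup(self, table: str, **conditions: str) -> list[Row]:
        """DM use: retrieve the rows matching the given column values."""
        return [r for r in self.tables[table]
                if all(r.get(c) == v for c, v in conditions.items())]

    def learn(self, table: str, source: str, target: str) -> None:
        """Toy IM synthesis: most frequent target value per source value."""
        counts: dict[str, Counter] = defaultdict(Counter)
        for r in self.tables[table]:
            if r[source] is not None and r[target] is not None:
                counts[r[source]][r[target]] += 1
        self.models[(table, source, target)] = {
            s: c.most_common(1)[0][0] for s, c in counts.items()}

    def propose(self, table: str, source: str, target: str,
                row: Row) -> Optional[str]:
        """IM use: propose a value for a missing cell, or None if unknown."""
        return self.models[(table, source, target)].get(row[source])


# Toy rostering-style table; the last row has a missing "shift" value.
shifts = [
    {"day": "Mon", "shift": "early"},
    {"day": "Mon", "shift": "early"},
    {"day": "Tue", "shift": "late"},
    {"day": "Mon", "shift": None},
]
idm = InductiveDataModel(tables={"shifts": shifts})
idm.learn("shifts", source="day", target="shift")

tuesday_rows = idm.lookup("shifts", day="Tue")               # DM query
proposal = idm.propose("shifts", "day", "shift", shifts[3])  # IM prediction
```

Here `lookup` plays the role of the DM (answering questions about specific data points), while `learn` and `propose` play the role of an IM (filling in a missing value from past data). In the setting described above, choosing which column to predict from which would itself be part of what the system has to work out.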

The key challenges to be tackled in SYNTH are:

  1. the synthesis system must "learn the learning task", that is, it should identify the right learning tasks and learn appropriate IMs for each of these;
  2. the system may need to restructure the dataset before IM synthesis can start; and
  3. a unifying IDM language for a set of core patterns and models must be developed.

The approach will be implemented as open-source software and evaluated on two challenging application areas: rostering and sports analytics.

Call identifier: ERC-ADG-2015 
Project number: 694980
Duration: September 1, 2016 - August 31, 2021
Budget: around €2,500,000