Big Data reduction for predictive computational modeling
Funder: French National Research Agency (ANR)
Project code: ANR-19-CE46-0008
Funder Contribution: 711,703 EUR

The DataRedux project focuses on developing radically new methods for reducing the complexity of large networked datasets to feed effective and realistic data-driven models of spreading phenomena. Many rich datasets on the actions and interactions of individuals have recently become available. They are commonly encoded as networked systems, arise from heterogeneous sources with details at different scales and resolutions, and may contain geographical and temporal information as well as metadata. These outstanding sources of information and knowledge fuel a wide spectrum of data-driven numerical simulations of dynamical processes. Data alone, however, even in huge amounts, do not easily transform into knowledge or predictive models. The rich and diverse information they contain raises crucial challenges concerning their analysis, representation and interpretation, the extraction of meaningful structures, and their integration into data-driven models. In this context, DataRedux puts forward an innovative framework to reduce networked data complexity while preserving its richness, by working at intermediate scales (“mesoscales”). Our objective is to reach a fundamental breakthrough in the theoretical understanding and representation of rich and complex networked datasets for use in predictive data-driven models. Our main novelty is to define network reduction techniques in relation to the dynamical processes occurring on the networks. To this end, we will develop methods to go from data to information and knowledge at different scales in a human-accessible way by extracting structures from high-resolution, diverse and heterogeneous data.
Our methodology will involve identifying the most relevant subparts of time-resolved datasets while remapping the remaining parts of the system, building simultaneous structural-temporal representations of time-varying networks, developing parsimonious data representations that extract meaningful structures at mesoscales (“mesostructures”), and building models of interactions that include mesostructures of various types. Our aim is to identify data aggregation methods at intermediate scales and new types of data representations that carry the richness of the original data, keeping their most relevant patterns and summarising less salient properties, so that they can be integrated more manageably into data-driven models for decision making and actionable insights.
The scientific program of DataRedux will draw on the diverse expertise of the participating teams to reach the objectives of the project. The project will last 48 months and is organised in six work packages: four scientific, one on dissemination, and one on management. It involves three teams, each with a leading position in its own field of research. The coordinator is the DANTE INRIA team, hosted by the Laboratoire de l'Informatique du Parallélisme and IXXI Complex System Institute at ENS Lyon, with expertise in the exploration of massive enriched networked datasets on human behaviour, statistical methods and data-driven modelling of social contagion phenomena. The other two partners are the Statistical Physics and Complex Systems team from CNRS, CPT Marseille, with expertise in complex networks, temporal networks, spreading processes and dynamical processes, and the Pierre Louis Institute of Epidemiology and Public Health from INSERM, with expertise in computational epidemiology, data-driven modelling and host dynamics.
This proposal is an invited resubmission after being ranked 2nd on the waiting list of the AAPG ANR 2018. It has been revised to take into account the evaluation remarks and the results obtained by the partners since last year.
