Large, complex data with a spatial and/or temporal component abound in many fields of study (Dabo-Niang et al. (2004) in geophysics; Bayle, Monestiez and Nerini (2014) on oceanographic data; Crambes et al. (2009) on energy consumption data). This is particularly true in the description of atmospheric, hydrological and oceanological systems, where studying the relationships between variables made up of high-dimensional vectors and/or functional components is of prime importance for understanding how these natural systems function. For very high-dimensional problems, dimension reduction is a common approach. In general, many multivariate regression methods with high-dimensional predictors treat dimension as a nuisance parameter and impose sparsity conditions, mainly for convenience (Izenman, 2008).
Functional data analysis (FDA)
Functional data analysis (FDA) can be an alternative to such multivariate modelling. It transforms very high-dimensional data into functional data, i.e. data objects such as curves, shapes, images or more complex mathematical objects, thought of as smooth realisations of a stochastic process and therefore infinite-dimensional. Rather than modelling correlations between predictors, FDA methods can analyse all the data from multidimensional variables observed over several dependent spatial units, taking advantage of the high dimensionality of the predictors. Functional data objects correspond to realisations of a stochastic process, usually assumed to be smooth on a continuum and to lie in a metric, semi-metric, Hilbert or Banach space.
In this project, we aim to model and predict high-dimensional ocean data using FDA, for better management of water and fisheries resources. The main objective is to determine the key environmental parameter(s) driving the spatial structuring of marine organisms in three dimensions, i.e. in the pelagic ecosystem. The team has an original set of oceanological data acquired in the framework of the tripartite AWA project (IRD-BMBF www.awa.ird.fr). These data, of a functional and spatial nature (scanfish, CTD probe, satellite, acoustic), were collected along the coast of Senegal and Mauritania, in the upwelling ecosystem of the Canary Current Large Marine Ecosystem (Auger et al 2016), where upwelling promotes the productivity of the ecosystem. They were obtained by direct observation with on-board systems, e.g. multi-frequency acoustics, sonar and multibeam echosounder, aboard research vessels such as the Thalassa (Ifremer) and the Antea (IRD), coupled with original in situ environmental observations of physical (e.g. temperature, salinity), chemical and biological parameters in three dimensions. The data have been made available following cleaning, extraction and pre-analysis. This project aims to develop innovative and practical methods for predicting the evolution of fisheries resources subject to variations in abiotic factors. To achieve this, we consider functional data analysis (FDA) (Ramsay and Silverman, 2005), a statistical framework in which each observation is a curve (Horváth and Kokoszka, 2012). In this framework, the different variables (e.g. temperature, salinity, Sv, etc.) measured at different depth points are seen as curves (functions) that capture all the variability of the data studied.
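To make the idea of "a profile measured at depth points seen as a curve" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the project's method: it uses plain linear interpolation where an FDA workflow would typically use basis smoothing (e.g. B-splines), the helper names `make_curve` and `l2_distance` are hypothetical, and the numbers are synthetic, not AWA measurements.

```python
import math
from bisect import bisect_right


def make_curve(depths, values):
    """Turn discrete (depth, value) samples into a function of depth.

    Assumption: linear interpolation stands in for basis smoothing.
    Outside the sampled range the nearest sampled value is returned.
    """
    pairs = sorted(zip(depths, values))
    xs = [d for d, _ in pairs]
    ys = [v for _, v in pairs]

    def curve(d):
        if d <= xs[0]:
            return ys[0]
        if d >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, d)  # xs[i-1] <= d < xs[i]
        w = (d - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + w * (ys[i] - ys[i - 1])

    return curve


def l2_distance(f, g, lo, hi, n=200):
    """Approximate the L2 distance between two curves on [lo, hi]
    with the trapezoid rule on n sub-intervals."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        d = lo + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * (f(d) - g(d)) ** 2
    return math.sqrt(total * h)


# Synthetic temperature profiles (degrees C) at two hypothetical stations,
# sampled at different depths (m): the curves are still directly comparable.
station_a = make_curve([0, 10, 50, 100], [20.0, 18.0, 14.0, 12.0])
station_b = make_curve([0, 25, 75, 100], [21.0, 17.0, 13.0, 12.5])

print(station_a(30.0))                             # value between samples
print(l2_distance(station_a, station_b, 0, 100))   # distance between curves
```

Once each profile is a function, two stations sampled at different depths can be compared through a distance between curves, which is the sense in which the observations are treated as elements of a function space.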
Implementing a fully functional regression method that accounts for the spatial aspect will therefore make it possible to better quantify the impact of environmental factors, and to predict the evolution of these resources as a whole rather than pointwise. We also have at our disposal a physico-chemical and microbiological database from an AWATOX campaign carried out on the Dakar peninsula. Unlike the data from the AWA project, these were measured at a small number of relatively distant sites. The same functional methods will be applied to them and compared with classical statistical methods.
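For orientation, one common way to write a fully functional regression with a spatial component is sketched below. This is only an illustrative formulation, not necessarily the model this project will propose, and all the notation is assumed: $s$ indexes sampling sites, $X_s$ is a predictor curve over a depth domain $\mathcal{D}$, $Y_s$ is a response curve over $\mathcal{T}$, $\beta$ is a bivariate coefficient surface, $w_{ss'}$ are spatial weights, $\rho$ is a spatial autocorrelation parameter and $\varepsilon_s$ is a functional error term.

```latex
Y_s(t) = \alpha(t)
       + \rho \sum_{s' \neq s} w_{ss'}\, Y_{s'}(t)
       + \int_{\mathcal{D}} \beta(u, t)\, X_s(u)\, \mathrm{d}u
       + \varepsilon_s(t),
\qquad t \in \mathcal{T}
```

Setting $\rho = 0$ recovers the usual function-on-function linear model without spatial dependence. Here the whole curve $Y_s(\cdot)$ is predicted at once, not one depth at a time.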
This thesis project is part of an applied research framework and is divided into two stages. The first stage is to propose a spatio-functional regression model. The second, once the model has been validated, is to apply it to environmental data sets. The work plan is as follows:
- Describe and study the spatial structuring of pelagic marine organisms in three dimensions.
- Conduct exploratory spatial functional analyses (factorial analyses, classification) in order to identify differences (or similarities) between sampling points in terms of abiotic factors and pollutant types.
- Implement functional non-parametric modelling of overall sediment toxicity as a function of chemical and microbiological parameters.
- Implement spatial and functional modelling of exploited fish populations and planktonic communities as a function of environmental factors.
- Make spatial-functional forecasts; comparisons will be made with usual forecasting methods.
- Perform functional tests for change-point (rupture) detection.
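As an illustration of the last item, detecting a rupture (change point) in a sequence of curves, the sketch below computes a CUSUM-type statistic for a shift in the mean function. It is a simplified example under assumptions: the curves are already evaluated on a shared grid over a unit interval, the function name `cusum_change_point` is hypothetical, the data are synthetic, and only the location of the change is estimated; no critical value or p-value is computed, so this is not a complete test.

```python
def cusum_change_point(curves):
    """Estimate where the mean function of a sequence of curves changes.

    curves: list of N curves, each a list of values on a shared grid.
    Returns (k, stat), where k is the number of curves before the
    estimated change and stat is the maximal CUSUM statistic.
    """
    n = len(curves)
    m = len(curves[0])
    # Pointwise sum of all curves on the grid.
    total = [sum(c[j] for c in curves) for j in range(m)]
    partial = [0.0] * m
    best_k, best_stat = None, -1.0
    for k in range(1, n):
        # Running pointwise sum of the first k curves.
        partial = [p + x for p, x in zip(partial, curves[k - 1])]
        # CUSUM process: S_k(t) = sum_{i<=k} X_i(t) - (k/n) * sum_{i<=n} X_i(t)
        s = [partial[j] - k / n * total[j] for j in range(m)]
        # Grid average approximates the integral of S_k(t)^2 over [0, 1],
        # scaled by 1/n.
        stat = sum(v * v for v in s) / (m * n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


# Synthetic example: five flat curves at 0 followed by five flat curves at 1.
curves = [[0.0] * 5 for _ in range(5)] + [[1.0] * 5 for _ in range(5)]
k_hat, stat = cusum_change_point(curves)
print(k_hat, stat)
```

In a real test, the statistic would be compared with a reference distribution obtained from asymptotic theory or resampling; the choice of that reference is left open here.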