Workshop of the Michael Stifel Center Jena on machine learning applied to different areas of science.
Machine-learning methods such as deep learning are frequently used in data-intensive particle physics and astronomy to extract scientific information from data. A number of illustrative examples will be shown, including the classification of event topologies in particle physics experiments, the morphological and spectral classification of astronomical objects, time-series analysis, and the automated detection of radio interference patterns in radio antenna arrays. Causal inference requires knowledge of the measurement uncertainties, which can be obtained by means of calibration or simulation, within the limits set by the stability and reproducibility of the output of neural networks. The next generation of experiments and observatories will rely on intelligent data pipelines using fast implementations of machine-learning methods to reduce the data volume to levels that can be stored in long-term archives, while still allowing the full discovery space of the data to be explored.
Optimization is the workhorse of machine learning. Our framework (GENO) allows optimization problems to be specified in an easy-to-read modeling language. From such a description, a solver is generated automatically that can solve the specified class of optimization problems. The generated solvers are highly efficient and often outperform hand-written, specialized solvers for machine learning. We will demonstrate the integration of GENO into a typical machine-learning workflow on an example from physics (Raman spectroscopy).
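As a concrete illustration of the kind of problem such generated solvers target, the sketch below fits non-negative mixture weights of reference spectra to a Raman-like measurement. It does not use the GENO modeling language or API; scipy's nnls routine and the synthetic spectra are stand-ins chosen purely for illustration.

```python
# Illustrative sketch (not GENO): a constrained least-squares problem of the
# kind an automatically generated solver would handle, here a non-negative
# unmixing of synthetic "Raman" reference spectra.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_channels, n_components = 500, 3

# Synthetic reference spectra (columns of A) and a mixed measurement b.
A = np.abs(rng.normal(size=(n_channels, n_components)))
true_weights = np.array([0.6, 0.3, 0.1])
b = A @ true_weights + 0.01 * rng.normal(size=n_channels)

# Solve  min_x ||A x - b||_2  subject to  x >= 0.
weights, residual = nnls(A, b)
print("estimated mixture weights:", weights)
```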
The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In disciplines dealing with complex dynamical systems, such as the Earth system, replicated real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated data opens up the use of novel data-driven causal inference methods beyond the commonly adopted correlation techniques. The key idea shared by several approaches is that, while the truism “correlation does not imply causation” holds, causal relations among variables can be estimated from their joint probability distribution given some assumptions. Causal inference is indeed a rapidly growing field with enormous potential to help answer long-standing scientific questions. Unfortunately, many methods are still little known and therefore rarely adopted in the Earth system sciences. In this talk I will give an overview of causal inference methods and identify key tasks and major challenges where causal methods have the potential to greatly advance the state of the art. Several methods will be illustrated by “success” examples where causal inference has already led to novel insights, and I will close with an outlook on this relatively new and exciting field.
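A toy illustration of the basic idea, using only numpy and not any specific method from the talk: two variables X and Y driven by a common cause Z are strongly correlated, yet a simple conditional-independence check, here the partial correlation given Z, shows that there is no direct link between them.

```python
# Toy illustration of inference from the joint distribution: a common driver
# Z causes both X and Y, so X and Y are correlated, but their partial
# correlation given Z (a basic conditional-independence test) is near zero.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 0.8 * z + rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after regressing out c from both."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

print("corr(X, Y)     =", np.corrcoef(x, y)[0, 1])
print("corr(X, Y | Z) =", partial_corr(x, y, z))
```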
The ability to detect and attribute anomalous behavior in multivariate environmental time series is crucial. Such events are signals of changes in the underlying dynamical system, and understanding them can help pave the way for the development of intervention strategies. The availability of data at high temporal resolution, along with powerful computing platforms, further enhances the capacity of data-driven methods to capture the complex relationships between the variables of the underlying dynamical system. In this talk, we present some newly developed methods for the detection as well as the attribution of changes in multivariate environmental time series. We discuss the challenges and point out future directions.
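As a minimal baseline sketch of such detection, assuming a known "normal" reference period and not one of the newly developed methods from the talk, anomalous time steps in a multivariate series can be flagged by their Mahalanobis distance to the reference statistics:

```python
# Simple baseline sketch: flag anomalies in a multivariate time series by the
# squared Mahalanobis distance to the statistics of a reference period.
import numpy as np

rng = np.random.default_rng(2)
n, d = 1000, 3
x = rng.normal(size=(n, d))
x[700:720] += 4.0                # injected anomalous episode

reference = x[:500]              # assume the first 500 steps are "normal"
mu = reference.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))

diff = x - mu
dist = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # squared Mahalanobis distance
threshold = np.quantile(dist[:500], 0.999)
print("anomalous time steps:", np.where(dist > threshold)[0])
```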
Gravitational-wave data analysis poses a specific signal-analysis problem for noisy time-series data. Recently (starting in 2017), deep learning methods have been applied to different aspects of the problem, in particular noise classification, signal detection, denoising, and parameter estimation. We briefly discuss how certain convolutional and recurrent networks work in this context. Results from first experiments with such techniques on mock signals are presented.
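A minimal sketch of such a detection network, using synthetic chirp-like mock signals rather than real detector data and not the specific architectures from the talk: a small 1-D convolutional classifier separating pure noise from noise containing a signal.

```python
# Minimal sketch: a 1-D convolutional classifier distinguishing pure noise
# from noise plus a synthetic chirp-like signal (mock signal detection).
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(3)
n_samples, length = 2000, 512
t = np.linspace(0.0, 1.0, length)

def make_example(has_signal):
    noise = rng.normal(size=length)
    if has_signal:
        chirp = 0.5 * np.sin(2 * np.pi * (20 + 60 * t) * t)   # rising frequency
        return noise + chirp
    return noise

labels = rng.integers(0, 2, size=n_samples)
data = np.stack([make_example(bool(y)) for y in labels])[..., np.newaxis]

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(16, 16, activation="relu", input_shape=(length, 1)),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(32, 8, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(data, labels, epochs=5, validation_split=0.2, verbose=2)
```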
In many applications one is interested in decomposing a given matrix into a sparse and a low-rank component. The talk takes a closer look at latent variable graphical models. For these, the interaction parameter matrix of a multivariate joint probability distribution is decomposed, where the sparse component corresponds to direct interactions and the low-rank component captures spurious indirect interactions due to latent variables. In practice, the models can be learned using convex optimization. The talk concludes with an outlook on further applications of sparse + low-rank decompositions such as robust principal component analysis, multi-task learning, and hyperspectral image denoising.
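A compact sketch of the related robust-PCA decomposition mentioned at the end, splitting a matrix into a low-rank plus a sparse part with a basic ADMM scheme; this is not the latent-variable graphical-model estimator discussed in the talk, and the parameters are illustrative defaults.

```python
# Sketch: basic ADMM solver for robust PCA, decomposing M into a low-rank
# part L and a sparse part S by minimising ||L||_* + lam * ||S||_1 s.t. L + S = M.
import numpy as np

def robust_pca(M, lam=None, mu=None, n_iter=200):
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 0.25 / np.abs(M).mean()
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)              # dual variable
    for _ in range(n_iter):
        # Low-rank update: singular-value soft-thresholding.
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Sparse update: element-wise soft-thresholding.
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # Dual update.
        Y = Y + mu * (M - L - S)
    return L, S

rng = np.random.default_rng(4)
low_rank = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 50))
sparse = np.zeros((50, 50))
sparse[rng.random((50, 50)) < 0.05] = 5.0
L, S = robust_pca(low_rank + sparse)
print("rank of recovered L:", np.linalg.matrix_rank(L, tol=1e-3))
```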
Many measurements require solving an inverse problem, which is usually large (e.g. in imaging/deconvolution) and often ill-posed. Solving such problems usually requires running an iterative gradient-based optimization. Researchers in these fields need a simple way to specify their inverse modelling problem, which is then solved automatically. To this end, an inverse modelling (IM) toolbox was built in Python on top of Google's TensorFlow framework, designed to be easy to use, versatile, fast and scalable. Compared to constructing models directly in TensorFlow, the IM toolbox supports changing the meaning of variables after their definition, which makes it possible to modify their domain by introducing extra boundary conditions such as positivity or the number of dimensions. The toolbox also provides a range of predefined loss functions and regularizers. The toolbox will be presented along with a few of its applications.
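The following sketch illustrates the underlying pattern in plain TensorFlow, not the IM toolbox API itself: a small 1-D deconvolution problem solved by iterative gradient-based optimization, with positivity enforced by optimizing the unknown in log-space. All names, kernel sizes and parameters are illustrative assumptions.

```python
# Sketch in plain TensorFlow (not the IM toolbox): 1-D deconvolution as an
# inverse problem, solved by gradient descent with a positivity constraint
# imposed via a log-space reparameterisation.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(5)
n = 128
true = np.zeros(n); true[[30, 60, 90]] = [1.0, 2.0, 1.5]        # sparse positive signal
kernel = np.exp(-0.5 * (np.arange(-10, 11) / 3.0) ** 2)          # Gaussian blur
kernel /= kernel.sum()
measured = np.convolve(true, kernel, mode="same") + 0.01 * rng.normal(size=n)

k = tf.constant(kernel.reshape(-1, 1, 1), tf.float32)
y = tf.constant(measured.reshape(1, -1, 1), tf.float32)
log_x = tf.Variable(tf.zeros([1, n, 1]))                          # x = exp(log_x) >= 0

opt = tf.keras.optimizers.Adam(0.05)
for step in range(500):
    with tf.GradientTape() as tape:
        x = tf.exp(log_x)
        pred = tf.nn.conv1d(x, k, stride=1, padding="SAME")
        # Data term plus a small sparsity-promoting regulariser.
        loss = tf.reduce_mean((pred - y) ** 2) + 1e-4 * tf.reduce_mean(x)
    grads = tape.gradient(loss, [log_x])
    opt.apply_gradients(zip(grads, [log_x]))

print("reconstruction peaks near:", np.sort(np.argsort(tf.exp(log_x).numpy().ravel())[-3:]))
```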
The Earth is a complex, dynamic, networked system. Machine learning, i.e. the derivation of computational models from data, has already made important contributions to predicting and understanding components of the Earth system, specifically in climate, remote sensing, and environmental sciences. For instance, classification of land cover types, prediction of land-atmosphere and ocean-atmosphere exchange, and detection of extreme events have greatly benefited from these approaches. Such data-driven information has already changed how Earth system models are evaluated and further developed. However, many studies have not yet sufficiently addressed and exploited the dynamic aspects of these systems, such as memory effects for prediction and effects of spatial context, e.g. for classification and change detection. In particular, new developments in deep learning offer great potential to overcome these limitations. The most promising near-future applications include nowcasting and short-term forecasting, as well as anomaly detection and classification based on spatial and temporal context information. A longer-term vision includes data-driven seasonal forecasting, modeling of long-range spatial connections across multiple time scales, modeling of dynamics where spatial context plays an important role (e.g. fires), and uncovering yet unknown (tele)connections between variables. One key challenge and opportunity will be to integrate physical modeling approaches with machine learning into hybrid modeling approaches that combine physical consistency with the versatility of machine learning.