Image of the Sun during an eruption (Source: SDO/AIA)

Solar eruptions: predicting geomagnetic storms

Authors
Prof. Svyatoslav Voloshynovskiy, Prof. Samuel Krucker and Prof. Martin Melchior
Université de Genève and FHNW

Interview with the principal investigators of this NRP75 project.

What was the aim of your project?

Solar activity continuously affects the Earth’s geomagnetic field, which encompasses the Earth’s surface and the near-Earth space environment. It can therefore affect our daily lives and other fields of study: long-distance radio communication can be disturbed or disrupted, satellite electronics can be damaged or destroyed, transpolar flights have to be rerouted because of increased radiation doses and communication problems, the ozone layer can be weakened for several months, and power grids can be disturbed or knocked out completely for several hours.

The objective of this project was to gain a better understanding of the physics of the Sun and to develop methods for predicting solar eruptions. We used the huge data archive compiled by IRIS (Interface Region Imaging Spectrograph), NASA’s latest solar satellite, and set out to create machine learning algorithms that evaluate the data for spatial and temporal patterns. We had 30 TB of spectral and image data at our disposal that had not yet been fully examined with machine learning methods, and not at this scale.

What are the results of your project “Automatic analysis of solar eruptions”?

In line with the project’s main objective, which was to elucidate the physics underlying solar flares and to develop capabilities to predict them, we developed a number of methods and applied them to real data from NASA’s IRIS mission. The main results are reported in 7 published papers and were presented at 11 international conferences and workshops.

In the course of this study we addressed the following main research topics: (1) identification of typical Mg II flare spectra using machine learning, (2) exploration of the mutual information between IRIS spectral lines, (3) real-time flare prediction based on distinctions between the spectra of flaring and non-flaring active regions, (4) detection of solar flares in IRIS data using DCT-Tensor-Net, (5) investigation of solar activity classification based on compressed Mg II spectra and (6) information bottleneck classification in an extremely distributed system.
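To make topic (2) more concrete, the following is a minimal sketch of how mutual information between two spectral-line features could be estimated from paired samples. It assumes a simple histogram (plug-in) estimator on discretized values. The function names `discretize` and `mutual_information` and the toy numbers are hypothetical illustrations, not the project's actual code or data.

```python
import math
from collections import Counter


def discretize(values, n_bins):
    """Map continuous values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


# Toy, made-up feature values for two hypothetical spectral lines,
# one pair per observed spectrum.
line_a = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
line_b = [1.00, 1.10, 1.05, 3.00, 3.10, 2.90]
mi_bits = mutual_information(discretize(line_a, 2), discretize(line_b, 2))
```

A high value suggests that one line carries much of the information contained in the other, while a value near zero suggests the two lines vary independently. Histogram estimators are biased for small samples, so this sketch is only meaningful when many spectra are available.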

First results indicate that IRIS spectral data could provide a useful enrichment of the standard magnetic dataset, as we managed to predict a solar flare half an hour before it occurred. The community is extremely excited about these results and further research is underway.

What are the main messages of the project?

  • Big Data projects should be multidisciplinary and include experts from the domain in which the data originate, from machine learning and from high-performance computing.
  • Because labels are often lacking in Big Data applications, domain experts should carefully validate the results produced by the machine learning tools.
  • A new generation of machine learning techniques based on unsupervised or semi-supervised learning should be further developed and mastered.

Does your project have any scientific implications?

The main scientific implications for practice and science are:

  • We have shown that high-resolution spectral data from the ultraviolet regime hold great potential for a deeper understanding, and possibly the prediction, of solar flares and of solar activity in general.
  • Labelled data are rarely available in many Big Data applications. One important scientific implication of the project is the development of unsupervised machine learning tools for automatic data clustering and for the analysis of statistical relationships. We think that these techniques are of high significance for both science and practice.
  • A further implication is the possibility of reliably classifying complex physical phenomena on specially designed compressed data, which considerably reduces the training complexity and the demands on the computational infrastructure. Furthermore, such compression might be moved directly onto the sensors and data acquisition devices, which would also drastically reduce the communication burden in Big Data applications. Finally, the developed techniques might be of great interest for privacy-preserving applications, where the attributes needed for the task are encoded in the compressed representation while the privacy-sensitive attributes are removed by the compression.
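The idea of classifying on compressed data can be illustrated with a small sketch. It assumes a truncated DCT-II as the compression step and a nearest-centroid rule as the classifier. Both are stand-ins chosen for brevity and are not necessarily the scheme used in the project. The synthetic Gaussian line profiles, the labels "narrow" and "broad", and all function names are hypothetical.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II of a 1-D sequence."""
    n = len(signal)
    return [
        sum(s * math.cos(math.pi * (i + 0.5) * k / n) for i, s in enumerate(signal))
        for k in range(n)
    ]


def compress(spectrum, n_keep):
    """Keep only the first n_keep low-frequency DCT coefficients."""
    return dct2(spectrum)[:n_keep]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(code, centroids):
    """Return the label whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(code, centroids[label])),
    )


def gaussian_profile(width, n_bins=32):
    """Synthetic line profile centred in the window (illustrative only)."""
    mid = (n_bins - 1) / 2
    return [math.exp(-0.5 * ((i - mid) / width) ** 2) for i in range(n_bins)]


# Made-up training spectra: 32 bins each, compressed to 4 coefficients.
train = {
    "narrow": [gaussian_profile(w) for w in (1.5, 2.0, 2.5)],
    "broad": [gaussian_profile(w) for w in (5.0, 6.0, 7.0)],
}
centroids = {
    label: centroid([compress(s, 4) for s in spectra])
    for label, spectra in train.items()
}
predicted = classify(compress(gaussian_profile(5.5), 4), centroids)
```

In this toy setting the classifier only ever sees 4 numbers per spectrum instead of 32, which is the kind of reduction that would lower both training cost and the amount of data a sensor has to transmit.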

We are convinced that the scientific findings and technical outcomes of this project might be of great interest for many interdisciplinary projects facing similar challenges related to Big Data.

Big Data is a very vague term. Can you explain to us what Big Data means to you?

We see four principal dimensions that make this a Big Data project. The first is data volume: our project data comprise tens of TB, a quantity that requires special storage and some effort to move from one location to another. The second is dimensionality: each spectrum in our data consists of several hundred bins, each of which is a dimension of its own. The third is the number of samples: even though our data contain only a limited number of observations, they nevertheless contain a few billion individual spectra, which are the objects of interest in most of our studies. The fourth is the number of modalities: our data are represented both as time series of spectral lines and as two-dimensional images describing the same phenomena. The data used in our project are therefore a typical example of a Big Data scenario in scientific applications.
