Flood detection: automatic geotagging of crowdsourced videos

Author
Prof. Susanne Bleisch
Fachhochschule Nordwestschweiz

Interview with the principal investigator of this NRP75 project.

What was the aim of your project?

The objective was to develop and test methods and algorithms to select and prepare information from eyewitness videos to support different applications, for example, crisis management. A challenge was assessing the videos for relevance, then analysing their content and correctly positioning and aligning them geographically. Developing appropriate presentations of the processed visual material ensures that the results can be integrated efficiently and beneficially into operational procedures.

What were the results?

Initial expert interviews clarified how crisis management operations work and helped refine the research questions. Regarding relevance, we could show that reliably located video content is potentially relevant and that contextualising videos with other mapped domain data is beneficial.

Video classification algorithms are mostly trained on labelled data sets. To make them more robust to unseen videos, we developed and tested algorithms that learn intuitive physics without supervision and reason over object-centric decompositions of unlabelled videos. Unlike prior approaches, these methods learn in an unsupervised fashion directly from raw visual input to discover objects, parts, and their relations. They explicitly distinguish multiple levels of abstraction and improve over other models at modelling synthetic and real-world videos of human actions.
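
As a concrete illustration of object-centric decomposition, below is a minimal sketch of a slot-attention-style module, one common mechanism for discovering objects from raw, unlabelled visual input. This is not the project's actual code; the class, dimensions, and hyperparameters are illustrative assumptions.

```python
# Minimal slot-attention-style module (sketch, not the project's code).
# Slots compete for input features and iteratively refine themselves,
# so each slot comes to represent one object or part without any labels.
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    def __init__(self, num_slots=5, dim=64, iters=3):
        super().__init__()
        self.num_slots, self.iters, self.scale = num_slots, iters, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_sigma = nn.Parameter(torch.rand(1, 1, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)

    def forward(self, inputs):               # inputs: (B, N, dim) frame features
        b, n, d = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        # Initialise slots from a learned Gaussian.
        slots = self.slots_mu + self.slots_sigma.abs() * torch.randn(
            b, self.num_slots, d, device=inputs.device)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            # Softmax over slots makes them compete for each input feature.
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)
            attn = attn / attn.sum(dim=-1, keepdim=True)  # weighted mean
            updates = attn @ v                            # (B, num_slots, dim)
            slots = self.gru(updates.reshape(-1, d),
                             slots.reshape(-1, d)).reshape(b, -1, d)
        return slots   # one vector per discovered object or part
```

In a full model, such slots would be decoded back to pixels and trained with a reconstruction or prediction loss over video frames, which is what makes the learning unsupervised.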

What further research did you do?

To locate video content more precisely, subparts of the visual localisation pipeline were investigated. Fine localisation was improved with image pose estimation based on a Structure-from-Motion approach that relies on approximate position knowledge and reference images. Tests with different videos showed that the quality of pose estimation is influenced by differences in viewpoint and by changes in the appearance of the environment. The processing pipeline was then adapted and extended to improve robustness against environmental changes. Changes in viewpoint remain a challenge.
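
To make the fine-localisation step concrete, below is a minimal sketch of relative pose estimation between a query video frame and a georeferenced reference image, using standard OpenCV building blocks. This is not the project's pipeline; the function name, the ratio-test threshold, and the assumption of known camera intrinsics `K` are illustrative.

```python
# Sketch: estimate the pose of a query frame relative to a reference image
# (assumes 8-bit grayscale images and known camera intrinsics K).
import cv2
import numpy as np

def estimate_relative_pose(query_img, ref_img, K):
    sift = cv2.SIFT_create()
    kp_q, des_q = sift.detectAndCompute(query_img, None)
    kp_r, des_r = sift.detectAndCompute(ref_img, None)
    # Ratio-test matching; appearance changes in the environment produce
    # many false matches, which RANSAC below has to reject.
    matches = cv2.BFMatcher().knnMatch(des_q, des_r, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts_q = np.float32([kp_q[m.queryIdx].pt for m in good])
    pts_r = np.float32([kp_r[m.trainIdx].pt for m in good])
    E, inliers = cv2.findEssentialMat(pts_q, pts_r, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_q, pts_r, K, mask=inliers)
    return R, t   # rotation and (up-to-scale) translation of the query camera
```

Large viewpoint differences shrink the set of correct matches faster than RANSAC can compensate, which is one way to see why viewpoint changes remain the harder problem.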

To contextualise video imagery with other domain data, and considering the multi-granular nature of the events, we developed visualisations and interactions that allow the visual integration of spatial data holding relevant information at several levels of scale. Further, a multi-perspective interface for mentally linking street-level images and mapped data was designed and evaluated.

What are the main messages of the project?

  • Algorithms should learn more like humans: approaches for learning object- and relation-centric representations from raw (unlabelled) videos are promising for achieving robust, interpretable machine-learning models with strong generalisation to different scenes.
  • Precise localisation of ‘random’ video or image data benefits from suitable references: the widespread availability of street-level imagery has potential for creating visual localisation services, which can help organisations and government agencies, also in emergency situations.
  • Visualisations allow seeing information, but this is easily hindered: integrating representations at different scales and from different perspectives in visualisations with appropriate interaction options supports interpretation and understanding, but currently only if the uncertainties are low enough.

Does your project have any scientific implications?

Access to and collection of future data sets require guidelines and potentially also concerted efforts. For the acquisition of imagery and video, specifically street-level imagery, that is not intended ‘for viewing only’ but could serve as (reference) information, it is important to capture, store, and make accessible the relevant metadata (such as position, camera type, viewing angle, etc.).

Does your project have any policy recommendations?

Real-world analysis of crowdsourced video for use in crisis management will not work on incidental videos uploaded to random platforms without clear guidelines and communication of purpose and use. A concerted effort may be required to define the requirements for the videos, and data collection should happen on a widely known, trusted national platform. The literature reports that people are keen to offer their help, for example by uploading defined imagery to a trusted platform (e.g., in citizen science projects).

Big Data is a very vague term. Can you explain what it means to you?

Big Data has a range of meanings, but we interpret it as large amounts of data that require specific approaches because it is difficult or impossible to view, process, or analyse the complete data set at once. This definition also implies that the ‘precise’ definition of big data changes with developments in hardware and software.

Collections of large (digital) video data are potentially big data. Our project is based on the fact that human search in, or viewing of, even small video data sets is very time-consuming. Thus, we aimed at defining, selecting, and presenting relevant video data for a specific purpose (i.e., crisis events). With the development of new types of algorithms in track A, we got a step closer to automatically analysing video collections and selecting relevant videos for specific purposes.

One track of our project, concerned with the localisation of selected video data, dealt specifically with big reference data. Geographical data has a long tradition of being big data, and today’s mobile image or laser-scanning capture capabilities certainly create large amounts of data. The localisation process derives high-dimensional image descriptor data that allow matching selected imagery (i.e., event imagery) against large collections of reference data. While a coarse position indication for the video might reduce the geographical search space, the matching still involves large descriptor data sets, and this project also investigated solutions for their efficient storage and retrieval. Future developments for more robustness to different viewpoints will increase the requirements for efficient and effective descriptor definition, storage, and retrieval even more.
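
To illustrate the retrieval side, here is a minimal sketch of approximate nearest-neighbour indexing of descriptor data using the faiss library; the dimensionality, data sizes, and parameters are illustrative assumptions, not the project's actual configuration.

```python
# Sketch: index a large set of reference descriptors for fast matching
# (placeholder random data stands in for real image descriptors).
import numpy as np
import faiss

d = 128                                            # descriptor dimensionality (e.g. SIFT)
reference = np.random.rand(100_000, d).astype('float32')

# An inverted-file index trades a little recall for much faster queries
# than exhaustive search over the full reference collection.
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, 1024)     # 1024 coarse cells (assumption)
index.train(reference)
index.add(reference)
index.nprobe = 16                                  # cells visited per query

query = np.random.rand(500, d).astype('float32')   # descriptors from one event frame
distances, ids = index.search(query, k=2)          # 2-NN, e.g. for a ratio test
```

A coarse position estimate would correspond to pre-filtering the reference set to a geographic cell before indexing or searching, shrinking the search space further.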

Visualising Big Data is a challenge and, by the definition above, impossible: if the screen space is sufficient to visualise it, it is no longer big data. This implies that some sort of selection happened beforehand or on the fly (through interactions) and can also be changed interactively to fluidly navigate the data space. The key challenge is visualising varying extents and granularities of data concurrently to support the cognition of links and overall insight into the data. In this regard, this project designed and implemented improved forms of visualisation and purpose-built interaction opportunities that specifically allow visually analysing overview as well as detail on demand, to support understanding of data with varying spatial extents and granularities.

About the team

The ‘Eyewitness videos as an aid to crisis management’ project under NRP 75 was an interdisciplinary research collaboration between research groups at FHNW (Susanne Bleisch, Daria Hollenstein, Stephan Nebiker, Daniel Rettenmund, Severin Rhyner, and Ursula Kälin) and IDSIA (Aleksandar Stanić and Jürgen Schmidhuber).
