BioSODA: An intuitive search engine for bioinformatics databases

Author
Ana Claudia Sima
Zurich University of Applied Sciences, University of Lausanne and SIB Swiss Institute of Bioinformatics

Interview with one of the researchers of the NRP 75 project “BioSODA: An intuitive search engine for bioinformatics databases”.

What can you tell us about the status of your project?

The project is advancing very well, and we are already finding synergies with new projects and, with them, new potential use cases for Bio-SODA, the semantic search engine developed within this project.

What about the first results?

We are happy to have made publicly available the Bio-Query template search interface, which enables users to explore integrated data across three domains of bioinformatics (gene expression, orthology and protein data). Previously, it was not possible to query these sources jointly. Today, any researcher can formulate federated queries, for example across the OMA orthology database and the Bgee gene expression database. The Bgee SPARQL endpoint, available at https://bgee.org/sparql, was officially released as a result of the work in this project. The results were published in the journal Database in 2019 [1]. We have also drafted a hands-on guide to querying evolutionary relationships using SPARQL [2].
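To make the idea of a federated query more concrete, here is a minimal Python sketch (standard library only) of how such a query could be assembled and packaged for a SPARQL endpoint. It relies on two W3C standards: the `SERVICE` keyword of SPARQL 1.1 Federated Query, which asks the receiving endpoint to evaluate part of the query on a remote endpoint and join the results on shared variables, and the SPARQL 1.1 Protocol, which sends the query as the `query` parameter of an HTTP GET request. Only the Bgee endpoint URL comes from this interview. The remote endpoint `https://example.org/sparql`, the predicates in the example patterns and the helper names `federated_query` and `build_request` are hypothetical placeholders; this is not the Bio-Query or Bio-SODA implementation, nor the real OMA or Bgee vocabulary.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL taken from the interview.
BGEE_ENDPOINT = "https://bgee.org/sparql"

# Hypothetical placeholder for a second SPARQL endpoint to join with.
REMOTE_ENDPOINT = "https://example.org/sparql"


def federated_query(remote_endpoint: str, local_pattern: str,
                    remote_pattern: str, variables: list[str],
                    limit: int = 10) -> str:
    """Build a SPARQL 1.1 SELECT query.

    `local_pattern` is evaluated by the endpoint that receives the query;
    `remote_pattern` is delegated to `remote_endpoint` through SERVICE.
    Variables shared by both patterns act as join keys.
    """
    projection = " ".join(f"?{v}" for v in variables)
    return (
        f"SELECT {projection} WHERE {{\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        f"  }}\n"
        f"}} LIMIT {limit}"
    )


def build_request(endpoint: str, query: str) -> Request:
    """Wrap a query in a SPARQL Protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    # Hypothetical triple patterns: the predicates below are placeholders,
    # not the real Bgee or OMA vocabularies.
    query = federated_query(
        REMOTE_ENDPOINT,
        local_pattern="?gene <http://example.org/expressedIn> ?tissue .",
        remote_pattern="?gene <http://example.org/hasOrtholog> ?ortholog .",
        variables=["gene", "tissue", "ortholog"],
    )
    print(query)
    request = build_request(BGEE_ENDPOINT, query)
    print(request.full_url)
    # Sending the request needs network access, e.g.:
    #   import json, urllib.request
    #   with urllib.request.urlopen(request) as response:
    #       rows = json.load(response)["results"]["bindings"]
```

Whether a given endpoint accepts `SERVICE` calls to a particular remote endpoint depends on how it is deployed, so it is worth checking the endpoint's documentation before relying on this pattern; the hands-on guide [2] is a better starting point for queries written against the actual data sources.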

Currently, we are finalising the work on the natural language search engine Bio-SODA, which will enable intuitive exploration of data stored in knowledge graphs. Furthermore, Bio-SODA is already finding new potential applications within the EU project INODE [3].

Keyword “technology transfer”: In your opinion, who could be possible users of your project? Who could benefit from this?

Although the initial scope defined within the NRP 75 project was users of bioinformatics databases, we designed Bio-SODA with generality in mind from the beginning. More precisely, we wanted the system to be able to reach a broader audience and to be applied to other datasets, even beyond the scope of the current project.

We are pleased that Bio-SODA is already finding new applications and new users within the EU project INODE [3], where the system is used to explore new datasets in natural language. One concrete example is the exploration of a knowledge graph of EU projects; further use cases within INODE include an astrophysics dataset and a knowledge graph of cancer research data.

We also plan to make the code publicly available, so that anyone interested in exploring knowledge graphs in natural language can download the Bio-SODA application and use it to ask questions about their own local data.

Big Data is a very vague term. Can you explain to us what Big Data means to you?

In the context of the Bio-SODA project, “Big Data” has had a somewhat special meaning for us. Our main challenge was not necessarily dealing with very large volumes of homogeneous data, which is one possible interpretation of the term. Instead, the main challenge we faced, and therefore our interpretation of Big Data, has been the integration of a large number of medium-sized, heterogeneous sources. In this sense, Big Data refers to the added value of the sum of its parts: when combined, many small data sources that differ in structure and content can bring forward new insights that are not visible in any of the individual sources alone.

[1] Sima, A. C., Mendes de Farias, T., Zbinden, E., Anisimova, M., Gil, M., Stockinger, H., … & Dessimoz, C. (2019). Enabling semantic queries across federated bioinformatics databases. Database, 2019.

[2] Sima, A. C., Dessimoz, C., Stockinger, K., Zahn-Zabal, M., & de Farias, T. M. (2019). A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL. F1000Research, 8, 1822 (under review).

[3] EU Project INODE (Intelligent Open Data Exploration), http://www.inode-project.eu/

Link to the Bio-Query Template Search Interface over Bio Databases: http://biosoda.expasy.org/

Link to the Bgee SPARQL endpoint: https://bgee.org/sparql
