Big Data Technologies

Big data applications require big data technologies: hardware and software solutions able to handle massive datasets as well as analyse them reliably and efficiently. The research outcomes of NRP 75 presented in this chapter show how Swiss academic experts can contribute significantly to developing new technologies for big data applications and can help successfully deploy next-generation solutions for the necessary infrastructures and analytics.

The application of big data in a real-world setting faces profound technological challenges. One is the sheer volume of the data: by popular definition, big data exceeds the capabilities of existing data capture, storage, management and analytics technologies. Many current computing infrastructures will soon be outdated and need replacement. Big data therefore calls for new means of information processing and data analysis, which is why fundamental research in big data infrastructure and analytics technologies is crucial. NRP 75 has strengthened Swiss basic research in this field and produced a dozen new approaches to developing the technology that underlies big data applications.

More efficient big data infrastructures

Big data requires high-performance infrastructure, namely the low-level processes that serve as the backbone for higher-level data analytics. This infrastructure comprises hardware and software.

Novel approaches for big data analytics

Analytics is the most visible component of big data applications, creating value by extracting from the data knowledge and insights that are useful for users or customers.

Solution avenues

Current administrative processes regarding data access, sharing and processing in Switzerland can be streamlined, especially where they relate to public research. The use of data in applications can adversely affect privacy, but restricting data use in a blanket fashion has drawbacks, for instance making innovation prohibitively time-consuming. Privacy has a cost that must also be considered.

Solution avenues include making privacy an inherent and possibly mandatory aspect of big data processing. Developers and users of big data applications must be informed about the various privacy-preserving techniques and their pros and cons. Ideally, they would have access to tools that help them optimise algorithms given the balance they want to strike between privacy, efficiency and quality of services. Digitisation of information requires careful analysis to ensure purposeful use of formats and metadata.
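To make this trade-off concrete, the following is a minimal, purely illustrative sketch in Python (not drawn from any NRP 75 project) of one widely used privacy-preserving technique, the Laplace mechanism of differential privacy. The privacy budget epsilon is exactly the kind of tuning knob referred to above: lowering it strengthens privacy but makes the released statistic noisier.

    import numpy as np

    def dp_mean(values, lower, upper, epsilon):
        """Differentially private mean via the Laplace mechanism.

        Clamping each value to [lower, upper] bounds the sensitivity
        of the mean to (upper - lower) / n, so Laplace noise with
        scale sensitivity / epsilon yields epsilon-differential
        privacy for this single query.
        """
        values = np.clip(values, lower, upper)
        n = len(values)
        sensitivity = (upper - lower) / n
        noise = np.random.laplace(scale=sensitivity / epsilon)
        return values.mean() + noise

    # Smaller epsilon = stronger privacy, noisier answer.
    ages = np.random.randint(18, 90, size=10_000)
    for eps in (0.01, 0.1, 1.0):
        print(eps, dp_mean(ages, 18, 90, eps))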

Metrics of scientific success for promotion and funding allocation should go beyond the usual publication and citation counts, potentially including the impact of research outside academia (especially when the work uses open-source protocols). Scientists must have freedom and flexibility of research so that they can tailor their plans to make the most of rapidly evolving fields such as big data.

The human factor can be as important as access to technology, especially since the latter is often open-source and available. Supporting academic research not only enables advances in big data technology locally and internationally, it also trains the specialists that society needs. These experts will not only develop technologies but also bring understanding of the issues surrounding big data, such as technology availability, privacy, cybersecurity and participation of stakeholders. As such, they will contribute to public and private organisations’ strategic decisions on digitisation.

Key messages

NRP 75 produced numerous world-class research results, pursuing novel avenues to improve the infrastructure and analytics needed to exploit big data. While such fundamental research is intrinsically very challenging, there is a known route to success, and it is ultimately more straightforward than developing applications. For example, limited access to data can sometimes be circumvented by using artificially generated datasets – whose known properties allow testing and tuning of the new systems. The research remains dedicated to the question of how fast systems can process and analyse data and reach the expected result within a given margin of error. In other words, the research problems are well defined. However, they are embedded in a fast-moving environment with actors in industry and elsewhere pursuing different goals.
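As an illustration of the testing strategy just described (the dataset, model and tolerances below are invented for the example, not taken from the programme), one can generate an artificial dataset whose ground truth is known and check that an analysis pipeline recovers it within a stated margin of error:

    import numpy as np

    rng = np.random.default_rng(seed=42)

    # Generate a synthetic dataset with a known ground truth:
    # y = 3.0 * x + 1.5, plus Gaussian noise.
    true_slope, true_intercept = 3.0, 1.5
    x = rng.uniform(0, 10, size=100_000)
    y = true_slope * x + true_intercept + rng.normal(0, 0.5, size=x.size)

    # Run the system under test (here, a simple least-squares fit)
    # and verify it recovers the known parameters within tolerance.
    slope, intercept = np.polyfit(x, y, deg=1)
    assert abs(slope - true_slope) < 0.01
    assert abs(intercept - true_intercept) < 0.05
    print(f"recovered: slope={slope:.4f}, intercept={intercept:.4f}")

Because the generating parameters are known, any deviation beyond the tolerance immediately flags a defect in the system rather than in the data.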

Private-public research: competition and cooperation

The intense international competition in big data technologies threatens nations’ digital autonomy, but also provides an opportunity for collaboration. Industry, particularly in the US and China, is making many of the advances in infrastructure and analytics for big data – presenting a good third of the work at top scientific conferences. It leads in the development of language models, image generation and hardware such as Tensor Processing Units, which are optimised to run neural networks. Private research and development is at least on a par with the best academic research worldwide.

Companies’ big data technology might appear universal, given their desire for wide adoption. A CPU or pre-processing algorithm may be essentially agnostic as to how it is used. But big data technologies are becoming increasingly specialised to best deal with the problem at hand, and in particular with the kind of data involved – whether dynamic or static, homogeneous or heterogeneous, etc. This means that industry is also influencing the possible range of big data applications. It is therefore crucial for publicly funded research to keep pace with industry if society is to have a voice in the future of digitisation.

Academic research remains essential for developing big data technologies, especially when addressing objectives that are important for society but less so for big tech companies, such as reducing energy consumption or ensuring privacy-by-design. In addition, public research can be bolder by pursuing high risk-high gain avenues. While industry often follows one-size-fits-all approaches, academic research has successfully developed a wider range of hardware and software for big data technology. These include programmable network switches, in-network analytics, new programming models for domain-specific devices, and algorithms based on formal logic instead of machine learning.

Academic research is often far ahead of private industry, the latter relying on the innovation of university spin-offs. But this gap is much smaller for some research topics in big data, which encourages collaboration between academia and industry – particularly since the former needs the computing power, storage capacities and data access of the latter. Such collaborations are in principle “win-win”: academia gains from industry’s resources, real-world problems and stiff challenges, while industry benefits from state-of-the-art research and more innovative ideas.

A latent issue in academia is the lack of recognition given to researchers who develop applications, stimulate collaboration and adopt open-source software. This can deter world-class public researchers from addressing concrete problems and collaborating with industry. It calls for more diverse career paths in public research, and for metrics that go beyond traditional scientific publishing and funding successes.

The staffing issue

A major challenge in deploying big data is the scarcity of qualified personnel all along the value chain, from infrastructure technologies to applications, business integration and regulation. There is fierce competition for talent, with many of the brightest minds being hired by large multinationals as well as by mid-size and start-up companies. Academic research is losing out as a result, struggling to attract the best scientists – even at PhD level. Universities also risk losing talented researchers when they collaborate with big tech companies. Rapid and frequent career changes, while bringing new perspectives and connections, are a problem for research projects.

On the other hand, the world-leading Swiss research on big data ensures that the many specialists needed by public and private organisations are educated and trained, and maintain good contacts with academia and industry. This makes Switzerland innovative and attractive to multinational companies and international organisations.

Getting the data

The second big challenge is the availability of large, high-quality datasets, which are essential for realistic evaluation of big data analytics and applications.

This problem will diminish as public and private organisations develop a data culture, but its resolution will require a sound strategy that ensures data are high quality, properly described with metadata, and protected by privacy-by-design practices. Developers and users of big data technologies must be familiar with the various techniques for preserving privacy and able to find the right balance between privacy, efficiency and quality of services.
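Purely as an illustration of what “properly described with metadata” can mean in practice (the field names below are hypothetical, not a standard prescribed here), a minimal machine-readable record might document provenance, licensing, privacy status and quality alongside the data itself:

    # Hypothetical minimal metadata record for a published dataset;
    # field names are illustrative, not a prescribed standard.
    dataset_metadata = {
        "title": "Hospital admissions, aggregated weekly",
        "description": "Counts per region and week, 2015-2020.",
        "format": "CSV, UTF-8, comma-separated",
        "provenance": "Extracted from a clinical records system",
        "licence": "CC BY 4.0",
        "privacy": {
            "contains_personal_data": False,
            "anonymisation": "k-anonymity, k >= 5",
        },
        "quality": {
            "completeness": "no missing weeks",
            "last_validated": "2021-03-01",
        },
    }

A record of this kind is cheap to produce at publication time and spares every downstream user from rediscovering the dataset’s provenance and privacy constraints.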

Privacy and data protection raise many questions, such as whether the Swiss or European data protection regulation sets the appropriate boundaries in data management or how to protect privacy while encouraging innovation. The current ethical, approval and administrative processes framing the use of medical and scientific data in Switzerland are perceived as slow and complex, and could be streamlined and simplified. But this is a multi-faceted discussion which requires multi-disciplinary approaches, including the involvement of the social sciences.