Data centres: efficient performance monitoring

Author
Prof. Lydia Y. Chen
Delft University of Technology

This project devised novel ways to analyse performance in cloud data centres, an important task in managing computing resources efficiently while minimising energy consumption.

The project advocated selective processing of big data: by leveraging spatial and temporal dependencies, computational resources can be reserved for the most critical data. A further motivation for such selective training is the amount of "dirty" data present in big data sets. Strategies were therefore derived to selectively choose informative and accurate data for training robust analytical models. The results also confirmed that the insights of big data come at the expense of privacy, showing a strong trade-off between data utility and the level of privacy preservation. To address these challenges, the following objectives were achieved:

Making big data processing lighter and faster: these issues were addressed via strategies of low-bit representation, intelligent data subsampling, and hierarchical modelling specific to time-series models.
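To make the two lightweight-processing ideas concrete, the following is a minimal sketch (not the project's actual pipeline): a hypothetical monitoring signal is subsampled in time, then stored in an 8-bit representation instead of 32-bit floats, cutting memory by a factor of four on top of the subsampling.

```python
import numpy as np

def quantize_int8(x):
    """Uniformly quantize a float array to int8 (low-bit representation)."""
    scale = np.abs(x).max() / 127.0 or 1.0   # guard against an all-zero input
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float values from the int8 representation."""
    return q.astype(np.float32) * scale

# Hypothetical metric stream; temporal dependency means adjacent samples are
# highly correlated, so keeping every 10th point is a cheap subsampling stand-in.
rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=1000)).astype(np.float32)
subsampled = signal[::10]
q, s = quantize_int8(subsampled)
recovered = dequantize(q, s)
print(q.dtype, q.nbytes, subsampled.nbytes)  # int8 100 400
```

The quantisation error per sample is bounded by half the scale, which is the usual price paid for the 4x storage reduction.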

Making big data processing predictable: stochastic models were developed to predict the latency of big data applications, whether simple data sorting or complex analytics. With such a model, one can make a calculable trade-off between model accuracy and model training time (and the resources required).
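The accuracy-versus-training-time trade-off can be illustrated with a toy model (the functional forms below are assumptions for illustration, not the project's actual latency model): accuracy follows a saturating learning curve in training time, resource cost grows linearly, and the calculable optimum is where marginal accuracy gain equals marginal cost.

```python
import numpy as np

def accuracy(t, a_max=0.95, tau=5.0):
    """Saturating learning curve: accuracy improves with diminishing returns."""
    return a_max * (1.0 - np.exp(-t / tau))

def net_utility(t, cost_per_unit=0.01):
    """Accuracy minus a linear cost for training time (and resources)."""
    return accuracy(t) - cost_per_unit * t

# Sweep candidate training times and pick the one with the best trade-off.
ts = np.linspace(0.1, 60.0, 600)
best_t = ts[np.argmax(net_utility(ts))]
print(f"best training time ~ {best_t:.1f}, accuracy ~ {accuracy(best_t):.3f}")
```

Under these assumed parameters the optimum sits well before the accuracy curve saturates: training longer than that buys almost no accuracy while the cost keeps rising.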

Making big data processing privacy preserving: differentially private algorithms that prevent privacy leakage through big data and its analysis were derived. Together with the latency models, one can further expand the criteria portfolio in designing big data analytics, i.e., accuracy, latency and privacy.
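A standard building block of differential privacy, shown here as a generic sketch rather than the project's own algorithms, is the Laplace mechanism: noise calibrated to the query's sensitivity and the privacy budget epsilon is added to a statistic, making the utility-versus-privacy trade-off explicit (smaller epsilon means stronger privacy but noisier answers).

```python
import numpy as np

def laplace_mean(values, lo, hi, epsilon, rng):
    """epsilon-differentially private mean via the Laplace mechanism.

    For a mean over n values each clipped to [lo, hi], the sensitivity
    (max change from altering one record) is (hi - lo) / n.
    """
    vals = np.clip(values, lo, hi)
    sensitivity = (hi - lo) / len(vals)
    noise = rng.laplace(scale=sensitivity / epsilon)
    return vals.mean() + noise

rng = np.random.default_rng(42)
# Hypothetical per-request latencies in milliseconds.
latencies = rng.uniform(0, 100, size=10_000)
for eps in (0.1, 1.0, 10.0):
    print(eps, laplace_mean(latencies, 0, 100, eps, rng))
```

Running the loop shows the trade-off directly: at eps = 0.1 the released mean is visibly perturbed, while at eps = 10 it is nearly exact.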

Making big data analytics distributed: various distributed and decentralised learning algorithms were investigated so that big data analytics can take place everywhere, more precisely on the premises where the data is collected.
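One common pattern for on-premises learning, sketched here under assumed data and models rather than as the project's specific algorithms, is federated averaging: each site fits a model on its local data, and only the model weights (never the raw data) are sent to be combined into a global model.

```python
import numpy as np

def local_fit(X, y):
    """Least-squares weights, computed where the data is collected."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fed_avg(weights, sizes):
    """Combine local models, weighting each site by its data volume."""
    return np.average(weights, axis=0, weights=np.asarray(sizes, dtype=float))

rng = np.random.default_rng(1)
true_w = np.array([2.0, -3.0])
sites = []
for n in (200, 500, 300):          # three sites with different data volumes
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    sites.append((X, y))

local = [local_fit(X, y) for X, y in sites]
global_w = fed_avg(local, [len(y) for _, y in sites])
print(global_w)  # close to [2, -3]
```

The raw measurements never leave their site; only two-element weight vectors cross the network, which is what makes the approach attractive for proprietary data-centre traces.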

The last work package, on sharing data centre traces via machine learning models, has attracted attention from the Dutch national science foundation and from industry with a view to commercialising the solution. A tabular data synthesiser was developed so that proprietary data held by commercial companies can be shared with the public without the risk of privacy leakage. This development was unexpected and has opened up a new direction for the follow-up project, called tabular data synthesiser, which was funded by the Dutch national science foundation to commercialise this idea.
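The idea behind a tabular synthesiser can be conveyed with a deliberately simplified sketch (illustrative only; it fits each column's marginal distribution independently, whereas practical synthesisers, often GAN-based, also model cross-column dependencies): statistics are fitted on the private table, and fresh rows are sampled from the fitted model, so the released table contains no original records.

```python
import numpy as np

def fit_marginals(table):
    """Fit a per-column Gaussian (mean, std) on the private table."""
    return {c: (col.mean(), col.std()) for c, col in table.items()}

def synthesise(params, n, rng):
    """Sample n synthetic rows from the fitted per-column models."""
    return {c: rng.normal(mu, sd, size=n) for c, (mu, sd) in params.items()}

rng = np.random.default_rng(7)
# Hypothetical private data-centre trace columns.
private = {"cpu_util": rng.beta(2, 5, size=5000) * 100,
           "mem_gb": rng.normal(16, 4, size=5000)}
params = fit_marginals(private)
synthetic = synthesise(params, 1000, rng)
print({c: round(v.mean(), 1) for c, v in synthetic.items()})
```

The synthetic columns reproduce the aggregate statistics of the originals while containing only freshly sampled values; a Gaussian marginal is of course a crude fit for bounded quantities such as utilisation, which is exactly the gap richer synthesisers close.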
