Evidence-based policy: uncovering causality from data

Author
Prof. Michael Lechner
University of St. Gallen

Interview with the principal investigator of this NRP75 project.

What was the aim of your project “Economic impact analysis using Big Data”?

In recent years, micro-econometric research has made great advances in developing methodological tools for answering causal questions. These methods have been employed successfully, for example to assess economic policy measures. Unfortunately, the tools are largely unsuitable for analysing large and complex data sets and do not exploit the latest advances in machine learning.

The goal of our project “Economic impact analysis using Big Data” was to combine the micro-econometric methods of causal analysis (impact measurement) with the statistical forecasting models of machine learning. This combination should make it possible to use large data sets in a robust way and thereby substantially improve the impact analysis of decisions taken by economic policymakers and private sector actors.

What are the results?

In the first part of the project, we used simulations to evaluate existing methods of causal machine learning. We subsequently extended these methods and developed new ones.

The main goal of most of these extensions and new developments, which are based on double machine learning and causal forests, was to obtain one consistent set of methods that estimates the relevant causal parameters at different aggregation levels in a coherent way and also supports optimal policy analysis. The latter means allocating the ‘policy’ or treatment within a population so as to maximise an objective function, such as the profits of a firm or a well-defined welfare measure of a policymaker.
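To make the idea of estimating effects at different aggregation levels from one set of quantities more concrete, here is a minimal sketch in Python. It is not the project's own implementation and does not use the modified causal forest. It assumes simulated data, a randomised binary treatment with a known assignment probability of 0.5, and simple linear outcome models in place of machine learning predictors. All function and variable names are hypothetical. The sketch computes cross-fitted doubly robust scores, averages them over everyone (average effect) and within groups (group-level effects), and then compares a simple targeted assignment rule with treating everyone.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y regressed on [1, X]."""
    Xc = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return beta

def predict_ols(beta, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def dr_scores(X, d, y, propensity, n_folds=2, seed=0):
    """Cross-fitted doubly robust scores for a binary treatment d.

    Outcome models are fitted on the other folds and evaluated on the
    held-out fold, so no observation is predicted by a model it trained.
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    scores = np.empty(len(y))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        b1 = fit_ols(X[train & (d == 1)], y[train & (d == 1)])
        b0 = fit_ols(X[train & (d == 0)], y[train & (d == 0)])
        mu1, mu0 = predict_ols(b1, X[test]), predict_ols(b0, X[test])
        dt, yt, pt = d[test], y[test], propensity[test]
        scores[test] = (mu1 - mu0
                        + dt * (yt - mu1) / pt
                        - (1 - dt) * (yt - mu0) / (1 - pt))
    return scores

# Simulated data (an assumption for illustration only): the true effect
# is -1 in group g = 0 and +2 in group g = 1.
rng = np.random.default_rng(42)
n = 4000
x0 = rng.normal(size=n)
g = rng.integers(0, 2, size=n)
d = rng.binomial(1, 0.5, size=n)
tau = np.where(g == 1, 2.0, -1.0)
y = x0 + d * tau + rng.normal(size=n)
X = np.column_stack([x0, g])

scores = dr_scores(X, d, y, propensity=np.full(n, 0.5))

# One set of scores, several aggregation levels.
ate = scores.mean()                # average effect in the population
gate_g0 = scores[g == 0].mean()    # average effect in group 0
gate_g1 = scores[g == 1].mean()    # average effect in group 1

# Simple assignment rule: treat a group only if its estimated effect is
# positive. Evaluating the rule on the same data that selected it is
# optimistic; a careful analysis would use a separate sample.
treat = np.where(g == 1, gate_g1 > 0, gate_g0 > 0)
gain_targeted = np.mean(scores * treat)  # gain relative to treating nobody
gain_treat_all = ate                     # gain from treating everyone

print(f"ATE: {ate:.2f}, GATE g=0: {gate_g0:.2f}, GATE g=1: {gate_g1:.2f}")
print(f"Gain, targeted: {gain_targeted:.2f}, treat all: {gain_treat_all:.2f}")
```

Because the average effect and the group effects are averages of the same scores, the average effect equals the share-weighted mean of the group effects by construction. That is one simple sense in which estimates at different aggregation levels can be coherent.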

Did your project apply the new methods to concrete questions?

The new methods were applied to several economic questions. One proved particularly productive, namely the evaluation of active labour market policies. The original proposal planned to apply causal machine learning methods to the IZA Evaluation Data, which includes 17,400 observations. This is the bare minimum number of observations required for the methods of interest. During the project, we received access to much more interesting, meaning larger, datasets that resulted in three papers:

Knaus, Lechner & Strittmatter (2021a, Journal of Human Resources) can be considered the pilot study of the NRP75 project. It was the research group's first application of causal machine learning, and also the first published study to use causal machine learning methods in economic policy evaluation (33 citations on Google Scholar by 6 May 2021). The paper uses a dataset of about 85,000 job seekers in Switzerland and documents that most of them do not benefit from a job search training program. This is in line with the previous literature on this type of program and is consistent with the so-called lock-in effect, whereby job seekers reduce their job search effort while participating in such programs. However, using causal machine learning methods that already existed when the paper was written, the paper shows that the effects are quite heterogeneous and that a small subgroup benefits from the program. Such insights can be used to improve the targeting of training programs.

In Knaus, Lechner & Strittmatter (2021b, The Econometrics Journal), we investigated the performance of several causal machine learning methods for the analysis of effect heterogeneity. While this paper was an important contribution to the literature in general, it was particularly important for the NRP75 project: it sharpened the research group's understanding of the shortcomings in the literature, which motivated the methodological extensions and informed the subsequent analyses.

Cockx, Lechner & Bollens (2020) uses a dataset of about 70,000 job seekers in Flanders and investigates the employment effects of three different training programs. It documents that training programs in Flanders have mostly positive long-run effects on employment. However, the new methods reveal that the largest benefits can be observed for recent immigrants. The paper uses these insights to provide data-driven policy recommendations that could substantially improve the effectiveness of the active labour market policy in Flanders.

Goller et al. (2021) resembles the previous paper in that it uses the modified causal forest, a method proposed as part of our NRP75 project. However, it exploits a German dataset of about 300,000 long-term unemployed persons. Again, the mostly positive effects of the training programs are found to vary with the characteristics of the long-term unemployed. The analysis also reveals that the current mechanism for assigning people to training programs does not exploit what works best for whom. The paper therefore concludes with several data-driven proposals.

Other applications?

The project also investigated the following questions: What are the effects of environmental regulations on the offer prices of used cars? Do referees in soccer games favour teams from their own Swiss language region? In addition, a variety of further applications using the new methods of causal machine learning were conducted during the NRP75 project. These concerned the effects of practising music on child development, the effects of being sporty on success on online dating platforms, the effects of news sentiment about earnings announcements on stock market indicators, as well as the so-called ‘resource curse’ in developing countries.

What are the main messages of the project?

  • Well-designed machine learning methods can substantially improve the usefulness of empirical studies for decision making. However, simply substituting machine learning methods into the prediction components of established estimators may even reduce their usefulness.
  • Uncovering heterogeneity by causal machine learning can lead to very valuable insights for decision makers in the private and the public sector.
  • Application of the new methods is straightforward and should become routine in empirical work.

Does your project have any scientific implications?

The research project documents and increases the value added of the recent causal machine learning literature for empirical research concerned with causal inference in economics and beyond. In many respects, the project introduced the applied literature to the concepts of causal machine learning and showcased how these methods can be used to estimate both standard and new parameters of interest. The methodological papers expanded the set of available methods and the knowledge about their performance. The applications provide blueprints for future studies asking related research questions.

Causal machine learning methods increase the transparency of the research process by delegating as much as possible to data-driven procedures. This ties the hands of researchers who might otherwise search, consciously or unconsciously, for pleasing results, and thus reduces data snooping. Additionally, the methods make it possible to extract more detailed results from the same data. This combination will improve how and what we learn from data in empirical research over the coming years as these methods find their way into more and more applications. The papers from this project might accelerate that process.

Does your project have any policy recommendations?

The policy recommendations are twofold.

First, at a high level, the project illustrates the great potential of integrating machine learning to improve empirical research in many different domains. This potential should be leveraged in any policy-relevant application.

Second, at a practical level, the results of the active labour market policy evaluations clearly show that the assignment of job seekers to programs can be made more effective in at least three countries. Data-driven policy making thus seems to have great potential to improve decision making. This project focused mostly on the flexible estimation of average and heterogeneous effects, which can be seen as an intermediate step towards data-driven policy making. This direction is now being continued in a follow-up project called “Data-driven decision making for labour market policy”, which is part of NRP77. The NRP75 project can thus be considered a stepping stone towards a better understanding of data-driven policy making.
