One of the greatest limitations of machine learning systems is that they rely solely on a statistical interpretation of data. Answering questions of the form “why?” and “what if?” requires a causal model.

In the design of machine learning systems, one of the most popular such models is a three-layer causal hierarchy that combines graphical modeling with counterfactual and interventional logic.

Such a model consists of the following layers:

  • The lowest (first) layer is called ‘Association’ and covers purely statistical relationships defined by raw data.
  • The second layer, ‘Intervention’, involves reasoning about the effects of actions or interventions. Reinforcement learning systems usually operate at this level. A typical question is “What will happen if we double the price of a product being sold?”. Such questions cannot be answered from data alone, because customers change their behavior in reaction to the new pricing.

It is tempting to assume that a predictive model could easily be built from data showing the effects of previous price increases (on the same or similar items). Yet unless market conditions were exactly the same the last time the price reached double its current value, it cannot be known in advance how customers would react to such a change.

  • The highest (third) layer is called ‘Counterfactuals’ and addresses “what if?” questions that require retrospective reasoning. This is similar to a sequence-to-sequence generative model: to see what happens to the output, the start of a sequence can be ‘replayed’ with changed data values.

Such a layered hierarchy explains why machine learning systems based only on associations cannot reason about causal explanations.
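To make the third layer concrete, here is a minimal sketch of a counterfactual query on the pricing example. It assumes a fully specified structural equation for demand, which real applications rarely have; the equation, its coefficients, and the function names are hypothetical and chosen only for illustration. The computation follows the three steps commonly used for structural models: infer the unobserved factors from what actually happened, change the variable of interest, and recompute the outcome.

```python
# Hypothetical structural equation for demand. The coefficients are
# illustrative assumptions, not estimates from real data.
BASE_DEMAND = 100.0
PRICE_SENSITIVITY = 4.0


def demand(price, u):
    """demand = BASE_DEMAND - PRICE_SENSITIVITY * price + u,
    where u stands for everything else that affected this sale period."""
    return BASE_DEMAND - PRICE_SENSITIVITY * price + u


def counterfactual_demand(observed_price, observed_demand, new_price):
    # Step 1 (abduction): recover the unobserved factor u that is
    # consistent with what was actually observed.
    u = observed_demand - demand(observed_price, 0.0)
    # Step 2 (action): set the price to its hypothetical value.
    # Step 3 (prediction): recompute demand, keeping the same u.
    return demand(new_price, u)


# Observed: price 10, demand 50. Query: what would demand have been
# in that same period had the price been 20?
cf = counterfactual_demand(10.0, 50.0, 20.0)

# Layer-two prediction for comparison: the average effect of setting
# the price to 20 when u is unknown and assumed to average zero.
interventional = demand(20.0, 0.0)

print(cf)              # 10.0
print(interventional)  # 20.0
```

The two numbers differ because the counterfactual reuses information about the specific observed period (here, an unusually weak period with u = -10), while the interventional prediction averages over it.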

While interventional questions cannot be answered from purely observational information, counterfactual questions cannot be answered from purely interventional information. This model enables the formal expression of causal questions by codifying existing knowledge in both diagrammatic and algebraic forms, and then leveraging data to estimate the answers. Moreover, the theory warns us when the state of existing knowledge or the available data are insufficient to answer our questions, and then suggests additional sources of knowledge or data to make the questions answerable.

Such an ‘inference engine’ takes as input a set of assumptions (in the form of a graphical model), data, and a query. For example, consider a graph in which X (e.g. taking a drug) has a causal effect on Y (e.g. recovery), and a third variable Z (e.g. gender) affects both X and Y.
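As a rough illustration of the drug example, the sketch below compares the purely associational quantity P(Y=1 | X=x) with the interventional quantity P(Y=1 | do(X=x)), computed by adjusting for Z. All probabilities are made-up numbers, and the function names are hypothetical. The adjustment formula used, sum over z of P(Y=1 | X=x, Z=z) P(Z=z), is valid under the assumption that Z is the only common cause of X and Y, as in the graph described above.

```python
from itertools import product

# Hypothetical distributions for the graph Z -> X, Z -> Y, X -> Y.
# All numbers are illustrative assumptions.
p_z = {0: 0.5, 1: 0.5}                      # P(Z=z)
p_x1_given_z = {0: 0.8, 1: 0.2}             # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.6, (1, 0): 0.7,  # P(Y=1 | X=x, Z=z)
                 (0, 1): 0.2, (1, 1): 0.3}


def joint(x, y, z):
    """Joint probability P(X=x, Y=y, Z=z) implied by the graph."""
    px = p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]
    py = p_y1_given_xz[(x, z)] if y == 1 else 1 - p_y1_given_xz[(x, z)]
    return p_z[z] * px * py


def p_y1_given_x(x):
    """Association: P(Y=1 | X=x), as seen in observational data."""
    num = sum(joint(x, 1, z) for z in (0, 1))
    den = sum(joint(x, y, z) for y, z in product((0, 1), repeat=2))
    return num / den


def p_y1_do_x(x):
    """Intervention: P(Y=1 | do(X=x)), obtained by adjusting for Z."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in (0, 1))


observed_diff = p_y1_given_x(1) - p_y1_given_x(0)
causal_diff = p_y1_do_x(1) - p_y1_do_x(0)
print(round(observed_diff, 2))  # 0.34
print(round(causal_diff, 2))    # 0.1
```

With these numbers the observed difference in recovery rates (about 0.34) overstates the causal effect of the drug (about 0.10), because Z influences both who takes the drug and who recovers.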

The model also enables developers of machine learning systems to use causal reasoning in the following ways:

  1. Testing: By providing the main connection between causes and probabilities, the model tells us what pattern of dependencies to expect in the data for any given pattern of paths in the model.
  2. Control of confounding: Confounding refers to the presence of latent variables that are unobserved causes of two or more observed variables. The model supports predicting the effects of policy interventions whenever feasible, and exits with failure when the assumptions do not permit a prediction.
  3. Counterfactuals: As every structural equation model determines the truth of every counterfactual sentence, the model helps to determine analytically whether the probability of the sentence is estimable from experimental or observational studies.
  4. Mediation analysis: This relates to queries such as “What fraction of the effect of X on Y is mediated by variable Z?”. The model helps to discover the intermediate mechanisms through which causes are transmitted into effects.
  5. Selection bias: The problem of adapting to changes in environmental conditions cannot be handled at the level of association. This model can be used both for re-adjusting learned policies to circumvent environmental changes and for controlling bias due to non-representative samples.
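The mediation query in item 4 can be sketched for the simplest case: a linear structural model with paths X → Z → Y and X → Y. The coefficients and function names below are hypothetical, and the clean split into direct and indirect parts relies on the assumed linearity with no interaction between X and Z; in general models the decomposition is more delicate.

```python
# Hypothetical linear structural model (coefficients are assumptions):
#   Z = A * X            (X affects the mediator Z)
#   Y = C * X + B * Z    (X affects Y directly and through Z)
A, B, C = 2.0, 1.5, 1.0


def z_of(x):
    return A * x


def y_of(x, z):
    return C * x + B * z


# Total effect: change X from 0 to 1 and let Z respond naturally.
total = y_of(1, z_of(1)) - y_of(0, z_of(0))

# Direct effect: change X from 0 to 1 while holding Z at the value
# it would take under X = 0.
direct = y_of(1, z_of(0)) - y_of(0, z_of(0))

# Indirect effect: hold X at 0 in the equation for Y, but let Z take
# the value it would have under X = 1.
indirect = y_of(0, z_of(1)) - y_of(0, z_of(0))

fraction_mediated = indirect / total
print(total, direct, indirect)  # 4.0 1.0 3.0
print(fraction_mediated)        # 0.75
```

Under these assumed coefficients, three quarters of the effect of X on Y is transmitted through Z.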

Traditional approaches to machine learning might be enriched by such causal models. Given the transformative impact that causal modeling has had on the social and medical sciences, a similar transformation might occur in machine learning once it is enriched with the guidance of a model of the data-generating process. This symbiosis could yield systems that communicate with users in their native language of cause and effect, and such systems might soon become the dominant paradigm of next-generation AI.



