A Brief Introduction to Structural Causal Models

In this post we are going to talk about a different kind of graphical model than in previous posts, known as a causal model. Previously I have written about probabilistic graphical models which are statistical models which learn from observations about the real world. But now we are going to consider causal models, where we learn from observations and interventions made during the process that we are modeling [1]. In this post we will introduce some basic ideas related to causal models and experiment with interventions on simple structural casual models.

Interventions and Counterfactuals

The major difference between causal and statistical models is that in causal models, we can intervene in the process we are modeling. Specifically, we can set the value of a random variable to a specific value, and see how that impacts the other random variables in the model [1].

Another way of reasoning about causal models is to consider counterfactuals, where we observe the outcome of a process given the value of a certain variable, and then we ask what would have been the outcome if we had set the value of a random variable to a different value. This is subtly different than interventions [1].

Note that we can think of different graphical models as fitting into a hierarchy, as follows [1]:

  • Statistical models: capable of handling observations
  • Causal models: capable of considering observations and interventions
  • Structural causal models: capable of handling observations, interventions and counterfactuals
  • Physical models: able to do everything a structural causal model can do, as well as giving insight into the real world.

Independent Mechanisms

A concept that we need to develop for causal models is the idea of independent mechanisms. Consider 2 random variables, A and B. In order to determine which variable is the cause, and which is the effect, we would need to intervene on each variable in turn and see if intervening in one affects the other. More specifically, let’s say we have a joint probability distribution [1]:

\[P(A, B) = P(A) P(B | A)\]

In this case, if A is the cause and B is the effect, then \(P(A)\) and \(P(B \| A)\) are independent mechanisms. In other words, if we changed P(A), we would not expect that the mechanism of \(P(B \| A)\) would change. But if B were the cause, then changing B would change the mechanism \(P(B \| A)\) [1].

This idea leads to the Principle of Independent Mechanisms which states that a causal generative process is made up of separate models that do not have any influence on one another. If the generative process describes a joint probability distribution, then all the conditional distributions of effects conditioned on causes do not influence the other conditional distributions [1]. This principle gives rise to 3 things that must be true for causal models [1]:

  • We should be able to affect one mechanism (i.e. intervene on one mechanism) without affecting any of the others.
  • One mechanism should not provide any information about other mechanisms. Similarly, the changes in one mechanism should not tell us anything about how other mechanisms might have changed.
  • It is possible to write down the conditional distributions (i.e. the mechanisms) as a set of random variables and noise variables, for example [1]:

\(C = N_C\)
\(E = f_E(C, N_E)\)

In this case, \(N_C\) and \(N_E\) are noise variables and \(f_E( \cdot)\) is some function. In this situation, \(N_C\) and \(N_E\) must be statistically independent with respect to each other for these two mechanisms to be independent.

(Simple) Structural Causal Models and Interventions

Building on the equations shown above, they represent a simple causal model with two variables that can be represented with a structural causal model (SCM). This representation includes the equations shown above as well as a causal graph that has nodes \(\{C, E\}\) and one directed edge \(C \rightarrow E\) [1].

We can intervene in this causal model by changing the causal mechanism for one of the random variables. There are two sub-categories of interventions based on this idea: hard and soft interventions. Hard interventions simply refer to setting the value of one of the random variables to a constant - in the example above, that would be like setting \(E = 3\). The SCM formed by this intervention is written as \(P^{\text{do}(E = 3)}\) [1].

Conversely, soft interventions refer to changing the function for one of the random variables, i.e. changing from \(E = f_E(C, N_E)\) to \(E = g(C, \tilde{N}_E)\). In this case, the associated SCM for this intervention is \(P^{\text{do}(E = g(C, \tilde{N}_E)}\) [1].

Let’s think about what we expect to happen if we intervene on C and E, respectively. If we intervene on the effect, E, then it should make sense that the cause mechanism would not change. Therefore we know that [1]:

\[P_C^{\text{do}(E = e)} = P_C \neq P_{C|E=e}\]

This expression says that intervening on the effect is not the same as conditioning the cause on the effect. (In other other words, intervening on the effect does not give the same probability distribution for the cause as we would get if we had information about the effect and computed the probability for the cause based on that knowledge.)

Now let’s intervene on the cause variable, C. If we do this, then we find that the probability of the effect after the intervention is equal to the probability of the event conditioned on the cause [1]:

\[P_E^{\text{do}(C=c)} = P_{E|C=c} \neq P_E\]

With that, I will close out this discussion on some of the basics of causal graphical models.


[1] Ravikumar, P. “Causal GMs.” 10-708: Probabilistic Graphical Models. 2021. Class notes.

Written on October 4, 2021