Chain Graphs and LWF and AMP Graph Properties

Today we are going to introduce a more general class of graphical models, called chain graphs, that encompass both directed and undirected graphical models. They are useful for looking at connections between groups of random variables, as we will see below. We will start by describing some of the basic features of chain graphs, and then talk about conditional independencies and other properties in two contexts: LWF and AMP.

Basic Properties of Chain Graphs

Chain graphs are allowed to have both directed and undirected edges, as long as they do not have a cycle that contains a directed edge [1]. This rule gives two important properties to chain graphs [1]:

1. We can divide the graph’s nodes, \(V\), into a disjoint partition (i.e. no overlap between the subsets), \(\{V_j\}_{j=1}^k\), where the subgraph for each subset, \(G[V_j]\), has no directed edges.

2. For any pair of nodes, (X, Y), there is a directed edge from X to Y only if X is in an earlier subgraph than Y.

In this way, we have groups of nodes (chain components) in a chain graph that have flow along directed edges from one group to the next, hence forming a chain. This representation encompasses both DGMs - where the chain components are individual nodes - and UGMs - where the entire graph is one chain component [1].

Conditional Random Fields

One type of chain graph is a conditional random field (CRF), which can be used to represent conditional probability distribution. In a CRF, we say that it represents a conditional probability as a product of factors as follows [1]:

\[P(Y | X) = \frac{1}{Z(X)} \Pi_{C \in \mathcal{C}(G) | C \notin \mathcal{C}(\mathcal{X})} \phi_C((Y, X)_C)\]

In this formulation, we assume that our inputs are represented by X and our targets by Y. We want to predict Y given X (this is a discriminative model) and the undirected edges of the graph G include those that connect nodes within Y, and nodes from X to Y, but we do not include edges between nodes in X. This paradigm of CRFs allows us to handle many different kinds of inputs, which may have their own complex relationships among themselves, because we focus instead on how those inputs relate to outputs [1].

LWF Markov Properties and C-Separation

In an earlier post we spent some time defining the Markov properties, separation and D-separation for UGMs and DGMs, because these definitions allowed us to find conditional independencies in the associated models. The idea of Markov properties for chain graphs, as well as c-separation, are a more generalized form of these concepts, which encompass the corresponding definitions for both UGMs and DGMs [1].

Let’s start by talking about the Lauritzen, Wermuth and Frydenberg (LWF) Markov properties for chain graphs. According to this interpretation, the pairwise Markov properties for a chain graph G are [1]:

\[\mathcal{I}_p(G) = \{X \perp Y | \text{NON-DESC}_X - \{X, Y\} : (X, Y) \notin E(G), Y \in \text{NON-DESC}_X\}\]

That is, X is conditionally independent of Y given all of the non-descendents of X, excluding X and Y themselves, and where there is no edge directly connecting X and Y. This definition still holds for both UGMs and DGMs. Let’s now define the local Markov properties [1]:

\[\mathcal{I}_l (G) = \{X \perp \text{NON-DESC}_X - \text{BOUNDARY}_X | \text{BOUNDARY}_X \}\]

For DGMs, the boundary of X are its parents, and for UGMs the boundary of X are all of its neighbors [1].

Finally, in order to define global Markov properties, I need to first define c-separation. If we have a disjoint partition of nodes in a graph, \(U = X \cup Y \cup Z\), then X is c-separated from Y given Z if X is separated from Y given Z in a moralized subgraph. Specifically, this moralized subgraph is the induced subgraph over the nodes in U, i.e. \(\mathcal{M}[G[U \cup \text{ANCES}_U^+]]\). The term \(\text{ANCES}_U^+\) is an upward closure over the nodes in U. More precisely, \(\text{ANCES}_U^+\) contains all the nodes in U as well as the nodes on U’s boundary [1].

For DGMs, this upward closure is equivalent to the ancestors of the nodes in the partition, so this definition simplifies to D-separation. Similarly, for UGMs the upward closure contains all the other nodes in the graph, this concept of c-separation also applies to UGMs [1]. In full, then, the global LWF Markov properties for a chain graph can be written as [1]:

\[\mathcal{I}(G) = \{X \perp Y | Z \text{ s.t. } \text{ CSEP }(X, Y | Z) \forall \text{ disjoint } X, Y, Z \in V\}\]

Alternative Markov Properties (AMP)

So far the LWF Markov properties have seemed like a natural way to define conditional independencies for the graphs we’ve seen. However, there is another way to parameterize graphical models which, in turn, gives rise to another way to determine conditional independencies, called Alternative Markov Properties (AMP) [1]. To see how this works, let’s consider a really simple 4-node graph that has two chain components as shown in Figure 1.

Fig 1
Figure 1 - Inspired by [1]

In LWF terms, we can see that [1]:

\(1 \perp 4 \| \{2, 3\}\)
\(2 \perp 3 \| \{1, 4\}\)

But if we express the relationship between these 4 random variables another way, then we can look at things a little differently. Specifically, let’s apply a structural equation modeling (SEM)*1 approach to describing this graph [1]:

\(X_1 = \epsilon_1\)
\(X_2 = \epsilon_2\)
\(X_3 = b_{31}X_1 + \epsilon_3\)
\(X_4 = b_{42}X_2 + \epsilon_4\)

Here the noise vectors are represented by \(\epsilon_i\), and we can assume that \((\epsilon_1, \epsilon_2) \perp (\epsilon_3, \epsilon_4)\) but that it is possible that \(\epsilon_1\) is related to \(\epsilon_2\) since they are in the same chain component [1]. Now that I have the same 4 random variables captured in this SEM representation, it should make sense that we can write the conditional independencies slightly differently [1]:

\(4 \perp 1 \| 2\)
\(3 \perp 2 \| 1\)

The alternative Markov properties also give rise to another notion of separation, called AMP-Separation. The definition of AMP-separation for a chain graph, G, is that if we consider 3 disjoint sets X, Y and Z then AMP-separation holds if X is separated from Y given Z in the undirected graph which is [1]:

\(\mathcal{A}[G[U \cup \text{DIR-ANCES}_U] \cup UG[G[U \cup \text{UG-CONNECT}_U]]]\)$

In order to understand this expression, we need to define some additional terms. First of all, \(\mathcal{A}[G]\) is termed the augmentation of a chain graph, G. This means that all of the flags and double flags in a chain graph have been augmented. A flag in a chain graph, G, is a group of 3 nodes that have any of the following sets of edges in G [1]:

\(X \rightarrow Y - Z\)
\(X - Y \leftarrow Z\)
\(X \rightarrow Y \leftarrow Z\)

Similarly, a double-flag in a chain graph, G, is a set of 4 nodes that have any of the following sets of edges in G [1]:

\(X \rightarrow Y - Z\)
\(U \rightarrow Z - Y\)

To augment a flag, we add the edge (X-Z), and to augment a double-flag, we add the edges (X, Z), (Y, U), (X, U). Therefore \(\mathcal{A}[G]\) is the fully augmented chain graph [1].

There are 3 more terms that we need to define. First, we have a directed ancestral set, \(\text{DIR-ANCES}_U\), that is the set of all nodes, v, that have a directed path from v to a node in U. And secondly, we have an equivalent term for undirected paths, that is an undirected connected set \(\text{UG-CONNECT}_U\), which is the set of all nodes v such that there is an undirected path from v to some node in U. Finally, we have the term \(UG[G]\) which is the chain graph, G, with all of the directed edges removed [1].

That’s about all I have to share today. Next time I expect to write more about causality with graphical models. Stay tuned!

Footnotes

*1 Structural equation modeling is a term that is a little broad but is generally used in the social sciences to model a system with some structure (i.e. a graph) that captures causal relationships between variables and includes some statistics to characterize these variables [2].

References

[1] Ravikumar, P. “Chain Graphical Models.” 10-708: Probabilistic Graphical Models. 2021. Class notes.

[2] “Structural equation modeling.” Wikipedia. https://en.wikipedia.org/wiki/Structural_equation_modeling Visited 03 Oct 2021.

Written on October 3, 2021