Friday, 3 April 2015

On the computability of causal inference

Causation (or else causality) refers to the cause-effect relation that may exist between two events or phenomena. The causal inference problem regards the determination of causation between events.
Let’s consider the case where there is an interaction between two events which is not affected by any other factor or event. The two events are observable and measurable and for each one we may record a set of data that describes it. Then the causal inference problem is equivalent to the existence of a computable relationship between the two data sets that describe the events.
As I show next, the general case of the problem of causal inference even with the above assumptions is undecidable and this has important implications in science. The undecidability of a problem means that there are some instances of it that may not be proven true or false, while other instances may be decidable.

Let’s formulate the general case of the causal inference problem as a decision problem. Let A and B be two sets of data.

CAUSAL = {A, B | There is computable function f(A) = B}
It is enough to use the formulation of CAUSAL in order to describe causal inference problem. If there is a cause-effect relationship between A and B then there is function f so that f(A) = B and if CAUSAL was proven decidable then f would be computable in any case.
Let me restrict this discussion in data sets and phenomena that their measurements are as accurate as it is demanded. The meaning of this assumption is that the data of the problem is not affected by factors that would make their computation inaccurate like the principle of uncertainty. Also, this implies that the elements of both sets are computable. The earlier assumption that the two events interact and are not affected by any other factor simplifies the definition of the problem, as it implies that if there is computable function f between A and B this describes a cause-effect relationship and it is not coincidental.
In the general case the problem is not trivial as A and B may be the products of complex or even chaotic systems and natural procedures. Here the term “trivial” is used in the sense of mathematical logic.

Theorem. The causal inference problem is logically and computationally undecidable.

Proof of logical undecidability. If the problem is decidable then we may construct a system using which we may prove every instance of the problem. We may encode using Gödel numbering and arithmetic each element a and b of A and B in a formal system of logic P so that for every sentence S of the form a -> b, S or -S is provable so that eventually the sentences A -> B or -A -> B will be provable. The problem is complex so P will be sufficiently strong.
This consideration results a contradiction, as proven by Kurt Gödel (Gödel, 1931) in such a system there are true sentences that are not provable. The problem is logically undecidable.

Proof of computable undecidability. If CAUSAL is decidable, then there is Turing Machine M which on input of a string e that contains the descriptions of A and B always halts and outputs “YES” or ”NO”.
As the problem is assumed decidable and M is computable, we may construct Turing Machine N that when inputting string w that contains the description of M and e verifies that M decides e; N accepts w in any case. There are no restrictions in the nature of A and B or their encoding and representation in e, hence w can be any string. Also there are no restrictions in the description of M and N, they can be any Turing Machines.
As a result, the language ATM = {N, w | N is a Turing Machine and accepts w} is decidable. This is a contradiction. As it is known from literature that ATM is undecidable; see (Sipser 2006, p.179). Hence CAUSAL is also computationally undecidable.
Another way to reason about the computational undecidability of CAUSAL is to consider Turing Machine N that simulates the phenomenon described by A. We input in N a string w that describes set A, then N produces a string that describes set B and then halts. If CAUSAL was decidable then N would be computable. The string that describes A may be any string as there is no restriction to the event that A describes; so N halts on any given input. Also N may be any Turing Machine.
Still these assumptions lead to the same contradiction; if CAUSAL was computable so would be language ATM.

Laplace’s demon

The undecidability of causal inference has an effect on the famous idea of the demon of Laplace.

We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.

Pierre Simon Laplace, A Philosophical Essay on Probabilities 1820
source: Wikipedia, also in (Laplace, 1951, p.4)

The intellect that described in the text was later named “Laplace’s demon”. There has been criticism against this idea and some published proofs against it. The proofs presented above are also against this idea. Such an intellect or demon should be able to decide CAUSAL which is undecidable.

The extend of undecidability

In every undecidable problem there are some instances that are decidable and some that are not. To put it another way, each undecidable problem is undecidable to some extent. Logic and the theory of computation have not been able to determine this extent for every problem. For some problems like the halting problem this extent seems large; only very simple instances of the halting problem seem computable.
In causal inference, things must be different. Science and especially Artificial Intelligence has managed to build successful methods in computing quite a lot of instances of the problem. The importance in these methods is that they may be applied in the analysis of real world phenomena. In my opinion, some of the reasons of successful predictions in real world phenomena are bounded inputs, relaxation, approximation and strong hypotheses.

Bounded inputs. Real world phenomena usually don’t take arbitrary large or small values. The possible values in the parameters of a phenomenon are bounded. This results that the problem range is reduced making computations easier.
Relaxation. Relaxing some parameters in the definition of a problem might lead to the computation of an easier problem. In some cases the conclusions from the definition of the easier problem are as valuable as the solution of the initial problem is.
Approximation. It is quite often that the solutions of hard problems may be approximated, which is valuable when precision is not very important.
Strong hypotheses. Sometimes scientists have strong evidence and strong intuition about a certain universal truth that cannot be proven or it is at least very difficult to be proved. In these cases researcher state it as a hypothesis and accept it as an unproven true fact and they may work on it overriding the obstacle of finding a solid proof for it. The biggest part of this post is based on such a hypothesis; it is the Church - Turing Thesis (Turing, 1936).


Gödel, Kurt, 1931. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, I. and On formally undecidable propositions of Principia Mathematica and related systems I in Solomon Feferman, ed., 1986. Kurt Gödel Collected works. Vol. I. Oxford University Press, pp 144-195.

Laplace, Pierre Simon, 1951. A Philosophical Essay on Probabilities, translated into English from the original French 6th ed. by Truscott,F.W. and Emory,F.L., Dover Publications (New York, 1951)

Sipser, Michael, 2006. Introduction to the Theory of Computation, Second Edition. Thomson Course Technology, USA.

Turing, Alan M., 1936. On Computable numbers, with an application to the entscheidungsproblem. Proceedings of the London Mathematical Society. 42, 230–265.