Thursday, 19 March 2015

Why we cannot build a general method for algorithmic error correction

This post is part of this paper.

A few years ago, I was working as GIS programmer and administrator in a company that was mainly producing digital maps for navigators. At that time, we had to map the road network in big areas and cities from scratch and also update older maps in other areas. One of the issues that you have to deal in projects of this kind is the errors that occur during the mapping procedures. When there are 10 or more people that map a road network of 20.000 road segments it is unavoidable that some (few or quit a lot) of the roads will be mapped incorrectly for various reasons. For instance, a bridge may be accidentally mapped as a normal road or some road may be not characterized as unpaved while it is.
The only way to deal with this situation is to define some specifications for the mapped road network and then build some software that will detect the cases where your map is not compliant to the specifications. You may find the description of some methods of building such software in my paper (link). Then some experienced person has to look into the detected errors and correct them.
At that time I had a big disagreement with my employer as he believed that we could build software that could correct our maps without human intervention. He thought that as we know the specifications that the network should have and we also know the cases where the specifications are violated we could also correct them algorithmically. Besides him and his opinion, I have read some papers published in conferences in which their authors support more or less the same idea while in other papers in journals it is supported that this problem may be solved only partially and with some caution.
As I will show next such a problem is in generally undecidable by computers and it may be solved only in simple instances. I always believed that this should be common knowledge.
In fact the algorithmic correction of maps is a special case of a more general problem, the algorithmic correction of software bugs. In both cases you have some specification that may be written in some formal way, say logic. Then you may detect some errors or bugs and you need to manipulate the parameters and the structure of your program or dataset so as to meet your specifications.
Moreover we may model both problems as graphs. As it is known from software technology, a piece of software may be modeled as a graph. The nodes of the graph are the functions and the other units that consist the software and the edges model the ways that the units interact. The nodes and edges may be represent really complex software components or simpler components like databases with spatial information as in road networks. Studying the more general problem we decide the simpler problem as well. So in order to study this issue you have to answer one question.
Since we can build software that detects errors can we build software that algorithmically corrects these errors?
An empirical answer to this question would be “in general, no”, it is very difficult to build an algorithmic process that would take into account all the necessary parameters and lead to the desired result by correcting all the instances of errors in a piece of software or in a map. Let’s define the problem and prove its undecidability. On the proof, I use the Turing Machine (TM) as a model of general purpose computer and define the problem of algorithmic correction of software errors as an equivalent computational problem of algorithmic correction of the input string of a Turing Machine.
Let M be a Turing Machine that accepts input w. If w is build under certain specifications M prints a message of acceptance and then halts. In case w does not comply to specifications M might not halt or have unpredicted behavior. Now let N be another Turing Machine that examines any given input w. If w is incorrect according to specifications and the description of M, then N corrects w so M can run w properly and then halt.


INPUT-CORRECTIONTM = {<M, N, w> | M and N are TM, N modifies w according to the description of M so that M halts on input of modified w}

In order to prove the undecidability of this problem it is necessary to define another undecidable problem the HALT problem. Undecidability implies that it cannot be decided by a Turing Machine and an algorithm that solves it for any given input w cannot be built.

HALTTM = {< M, w> | M is a TM and M halts on input w}

Theorem. The INPUT-CORRECTION problem is undecidable.

Proof: Turing Machine N modifies any input w, according to the description of M, so M can run w and halt. If w is correct N prints “YES”, M runs w produces an output and halts. If w is incorrect and M cannot halt while running w, N corrects w and prints “YES” then M runs w produces an output and halts.
Let H be a machine that simulates the operation of both M and N, then H halts on any given input and the HALT problem is decidable for H and any given input w. This is a contradiction since the HALT problem is undecidable.

Another approach

The approach of the proof might look difficult to understand and especially its connection with the correction of errors in spatial networks. Let’s look into it from a more practical perspective.
The specifications of every network define the valid spatial relationships of the network elements as well as the valid attributes of every element. Attributes and spatial relationships cannot be examined independently since the validation of both of them ensures network consistency. Let A be the set of all the possible valid combinations among the valid spatial relationships and the valid attributes. Let B be the set of all the possible combinations of spatial relationships and attributes that are considered invalid according to specifications.
Let f be an algorithmic function that may correct the errors in a given network. Then f accepts as input one element of B and outputs one element of A. This means that each element of B must correspond exactly to one element of A; f must be 1-1 function or many-to-one (n-1). But this can happen only to very special and simple cases, it cannot be applied in general. If an element of B corresponds to many elements of A then f may not be defined.
On the same way we may define sets A and B for any other software structure. An 1-1 or n-1 function among software with bugs and error free software may be found only in very simple instances.


A more intuitive example that concerns spatial networks.


Let N be a non-planar network of linear features. Let a and b be two elements of N where the edge of a crosses the middle of the edge of b. Suppose that the person who draw a and b neglected to apply on them the necessary parameters that denote their structure and state. We may consider the following possible cases.
  • The edge of a is on ground level and the edge of b overpasses a, edge b represents a bridge.
  • The edge of a is on ground level and the edge of b underpasses a, edge b represents a tunnel.
  • Both edges are not on ground level and edge b overpasses edge a, multilevel traffic junction.
  • Both edges are in ground level, a and b represent a crossroad. Both edges must split on the intersection point and form a proper crossroad.
There is no way how we can build an algorithm that will decide which of the four cases corresponds to the given instance, so as to correct the error.
If we consider a special case of network instance in which we may find an 1-1 function between sets A and B, we could perform algorithmic correction of a specific error case. As it is easy to see this error case will be very simple. For instance, if we consider a network where each element that participates in a specific computable spatial relationship must be characterized by an attribute of a single value. Then an algorithmic process that assigns this value to the attributes of the appropriate elements may be build. Nevertheless this would be of minor importance in the complex task of error correction.
As a conclusion, I have to say that important corrections of detected errors in maps and in general in computer software have to be manual and of course by experienced personnel.