
Thursday 19 March 2015

Why we cannot build a general method for algorithmic error correction

This post presents part of this paper.

A few years ago, I was working as a GIS programmer and administrator at a company that mainly produced digital maps for navigation devices. At that time, we had to map the road network of large areas and cities from scratch and also update older maps of other areas. One of the issues you have to deal with in projects of this kind is the errors that occur during mapping. When ten or more people map a road network of 20,000 road segments, it is unavoidable that some of the roads (a few, or quite a lot) will be mapped incorrectly for various reasons. For instance, a bridge may accidentally be mapped as a normal road, or a road may not be marked as unpaved when in fact it is.
The only way to deal with this situation is to define specifications for the mapped road network and then build software that detects the cases where your map does not comply with the specifications. You may find a description of some methods for building such software in my paper (link). An experienced person then has to examine the detected errors and correct them.
At that time I had a big disagreement with my employer, who believed that we could build software that would correct our maps without human intervention. He thought that since we know the specifications the network should satisfy, and we also know the cases where they are violated, we could correct the violations algorithmically. He is not alone: I have read conference papers whose authors support more or less the same idea, while journal papers argue that the problem can be solved only partially and with some caution.
As I will show next, this problem is in general undecidable by computers and can be solved only in simple instances. I have always believed this should be common knowledge.
In fact, the algorithmic correction of maps is a special case of a more general problem: the algorithmic correction of software bugs. In both cases you have a specification that may be written in some formal way, say in logic. You may then detect some errors or bugs, and you need to manipulate the parameters and the structure of your program or dataset so as to meet the specification.
Moreover, we may model both problems as graphs. As is known from software engineering, a piece of software may be modeled as a graph: the nodes are the functions and other units that constitute the software, and the edges model the ways the units interact. The nodes and edges may represent really complex software components or simpler ones, such as databases with spatial information, as in road networks. By studying the more general problem, we settle the simpler one as well. So in order to study this issue you have to answer one question.
Since we can build software that detects errors, can we also build software that corrects those errors algorithmically?
An empirical answer to this question would be "in general, no": it is very difficult to build an algorithmic process that takes into account all the necessary parameters and corrects every instance of error in a piece of software or in a map. Let's define the problem and prove its undecidability. In the proof, I use the Turing Machine (TM) as a model of a general-purpose computer and formulate the algorithmic correction of software errors as an equivalent computational problem: the algorithmic correction of the input string of a Turing Machine.
Let M be a Turing Machine that accepts input w. If w is built according to certain specifications, M prints a message of acceptance and then halts. If w does not comply with the specifications, M might not halt or might behave unpredictably. Now let N be another Turing Machine that examines any given input w. If w is incorrect according to the specifications and the description of M, then N corrects w so that M can run on the corrected w and halt.

Formally:

INPUT-CORRECTION_TM = { <M, N, w> | M and N are TMs, and N modifies w according to the description of M so that M halts on the modified w }

In order to prove the undecidability of this problem, we need another undecidable problem, the halting problem. Undecidability means that the problem cannot be decided by a Turing Machine: no algorithm can be built that solves it for every given input.

HALT_TM = { <M, w> | M is a TM and M halts on input w }

Theorem. The INPUT-CORRECTION problem is undecidable.

Proof: Turing Machine N modifies any input w, according to the description of M, so that M can run on w and halt. If w is correct, N prints "YES", then M runs on w, produces an output, and halts. If w is incorrect and M cannot halt on w, N corrects w and prints "YES"; then M runs on the corrected w, produces an output, and halts.
Let H be a machine that simulates the operation of both N and M. Then H halts on every input, so the halting problem becomes decidable for H and any given input w. This is a contradiction, since the halting problem is undecidable.

Another approach


The approach of the proof might look hard to follow, especially its connection with the correction of errors in spatial networks, so let's look at it from a more practical perspective.
The specifications of a network define the valid spatial relationships of the network elements as well as the valid attributes of each element. Attributes and spatial relationships cannot be examined independently, since it is the validation of both together that ensures network consistency. Let A be the set of all possible valid combinations of spatial relationships and attributes. Let B be the set of all possible combinations of spatial relationships and attributes that are invalid according to the specifications.
Let f be an algorithmic function that corrects the errors in a given network. Then f accepts as input an element of B and outputs an element of A. This means that each element of B must correspond to exactly one element of A; f must be a one-to-one (1-1) or many-to-one (n-1) function. But this happens only in very special and simple cases; it does not hold in general. If an element of B corresponds to many elements of A, then f cannot be defined as a function.
In the same way we may define sets A and B for any other software structure. A 1-1 or n-1 function between buggy software and error-free software can be found only in very simple instances.

   

A more intuitive example concerning spatial networks

  

Let N be a non-planar network of linear features. Let a and b be two elements of N where the edge of a crosses the middle of the edge of b. Suppose that the person who drew a and b neglected to assign to them the attributes that denote their structure and state. We may consider the following possible cases.
  • The edge of a is on ground level and the edge of b passes over it; b represents a bridge.
  • The edge of a is on ground level and the edge of b passes under it; b represents a tunnel.
  • Neither edge is on ground level and b passes over a; they form a multilevel traffic junction.
  • Both edges are on ground level; a and b represent a crossroad. Both edges must be split at the intersection point to form a proper crossroad.
There is no way to build an algorithm that decides which of the four cases corresponds to a given instance, so as to correct the error.
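To make the ambiguity concrete, here is a minimal Python sketch; all names and attribute values are hypothetical, chosen only for illustration. A single invalid configuration (the unattributed crossing, an element of B) is consistent with four valid configurations (elements of A), so no correction function can be defined on it.

```python
# Hypothetical illustration: the invalid instance is a crossing of arcs
# a and b whose level/structure attributes are missing.
invalid_instance = {"a_level": None, "b_level": None, "structure": None}

# All valid configurations the same geometry could legitimately represent
# (one entry per case in the list above; values are illustrative).
valid_corrections = [
    {"a_level": 0, "b_level": 1, "structure": "bridge"},
    {"a_level": 0, "b_level": -1, "structure": "tunnel"},
    {"a_level": 1, "b_level": 2, "structure": "multilevel junction"},
    {"a_level": 0, "b_level": 0, "structure": "crossroad (split both arcs)"},
]

def candidate_corrections(instance):
    """Return every valid configuration consistent with the instance."""
    if instance["structure"] is None:
        return valid_corrections  # nothing in the data rules any case out
    return [c for c in valid_corrections if c["structure"] == instance["structure"]]

candidates = candidate_corrections(invalid_instance)
print(len(candidates))  # 4: one element of B maps to four elements of A
```

Since the element of B maps to four elements of A, any "correction" would be an arbitrary choice, not a computation.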
If we consider a special class of network instances in which a 1-1 function between sets A and B exists, we can perform algorithmic correction of that specific error case. As is easy to see, such an error case will be very simple. For instance, consider a network where every element that participates in a specific computable spatial relationship must carry an attribute with a single fixed value; then an algorithmic process that assigns this value to the attribute of the appropriate elements can be built. Nevertheless, this is of minor importance in the complex task of error correction.
In conclusion, substantial corrections of detected errors in maps, and in computer software in general, have to be made manually, and of course by experienced personnel.

http://wmsviewer-rodis.rhcloud.com/

Sunday 15 June 2014

Building spatial algorithms with ModelBuilder of ArcGIS



ModelBuilder of ArcGIS is the most useful tool (or set of tools) I have ever used as a GIS programmer and administrator. The main reason, of course, is the very large set of tools that comes with it. We should also not overlook how simple it is to design custom models: users graphically draw a flow chart of the procedure they want to run, and then run it. Moreover, once you get familiar with ModelBuilder, you end up building models that you use every day, automating large parts of your routine work.
Most people use ModelBuilder to build procedures that simply combine tools from the ArcGIS toolboxes, mainly for the manipulation of spatial datasets. This is also how most tutorials describe ModelBuilder, but this alone is not very interesting. Let us see through examples how a user may employ ModelBuilder to implement spatial algorithms. A theoretical and application-independent justification of the following examples may be found in my paper here. The paper is rather technical and theoretical and some people find it hard to read, so let us look at some implementations of spatial algorithms in ModelBuilder, along with a graphical presentation of each one.
For the next examples, let N be a dataset of linear features (arcs) that represents a directed network. The features are divided into two or more categories, and the categorization is stored in the attributes of each arc. N is stored in a geodatabase with spatial indexes but without built topology or any network dataset relationships. One advantage of the following examples is that the user is not required to build topology or other relationships, which makes the procedure more efficient.

Problem I: Locate the nodes in N with degree equal to one.
This is a simple problem; at each node of degree one (also called a free node) there is only one arc attached. These nodes are the ends of dead-end streets in road networks, or the terminals in telecommunication networks.

Algorithm I:
1. Create point feature class Point_1 with the endpoints of the arcs in N. Use FeatureVerticesToPoints with the option BOTH_ENDS.
2. Execute Spatial Join, setting Point_1 as both the Join Feature and the Target Feature. Output the result to class Point_2.
3. Select the features of Point_2 where field Join_Count = 1. Output the result to class RESULT; this is the output of the algorithm.
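For readers outside ModelBuilder, the logic of Algorithm I can be sketched in plain Python, treating arcs as pairs of endpoint coordinates (the toy dataset below is invented for illustration):

```python
from collections import Counter

# Toy directed network: each arc is a pair of endpoint coordinates.
arcs = [
    ((0, 0), (1, 0)),
    ((1, 0), (1, 1)),
    ((1, 1), (2, 1)),   # (2, 1) ends a dead-end street
]

# Step 1: collect both endpoints of every arc
# (the FeatureVerticesToPoints / BOTH_ENDS step).
endpoints = [p for arc in arcs for p in arc]

# Steps 2-3: count coincident points (the spatial join) and keep
# those with count 1, i.e. the nodes of degree one.
degree = Counter(endpoints)
free_nodes = sorted(p for p, d in degree.items() if d == 1)
print(free_nodes)  # [(0, 0), (2, 1)]
```

The spatial join on identical point locations plays the role of the Counter here; no topology has to be built first.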



The model runs fast, as it performs a spatial join among point features and avoids computing intersections among arcs, which is more complex. I have tested it in ArcGIS 10 on a network of 1,000,000 arcs; it ran in less than 5 minutes on a computer of modest capabilities.


Problem II: Locate in N the intersections of arcs that belong in categories 1 and 2.
This one is a little more interesting. It is useful when you need to check the connectivity of the network, as well as to find illegal connections. If category 1 contains highway roads and category 2 unpaved roads, it is critical to know whether there are intersections between features of these two categories in your dataset.

Algorithm II:
1. Create classes Cat_1 and Cat_2, using the Select tool, with the features of categories 1 and 2 respectively.
2. Create point feature classes Point_1 and Point_2 with the endpoints of the arcs of the two categories, as in Algorithm I.
3. Execute Spatial Join, setting Point_1 as the Join feature class and Point_2 as the Target. Output the result to class JOIN_RESULT.
4. Select the features of JOIN_RESULT where Join_Count > 0. Output the result to class RESULT; this is the output of the algorithm.
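In plain Python, the same logic amounts to intersecting the endpoint sets of the two categories (again with an invented toy dataset; category numbers and field names mirror the text):

```python
# Toy network: each arc carries its geometry and a category attribute.
arcs = [
    {"geom": ((0, 0), (1, 0)), "cat": 1},  # e.g. a highway segment
    {"geom": ((1, 0), (1, 1)), "cat": 2},  # an unpaved road touching it at (1, 0)
    {"geom": ((3, 3), (4, 3)), "cat": 2},
]

def endpoints(arcs, cat):
    """Endpoints of all arcs of a given category (steps 1-2)."""
    return {p for a in arcs if a["cat"] == cat for p in a["geom"]}

# Steps 3-4: the spatial join with Join_Count > 0 becomes a set intersection.
illegal_connections = endpoints(arcs, 1) & endpoints(arcs, 2)
print(sorted(illegal_connections))  # [(1, 0)]
```

Each point in the result marks a node where a category-1 arc meets a category-2 arc, which is exactly what the Join_Count > 0 selection returns in the model.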



Problem III: Locate the points of no flow in N.
Locate the nodes of N that stand as barriers to network flow. At these nodes, network traffic is either only directed toward them or only directed away from them. See an instance in the following image.



Algorithm III:
1. Create point feature class Point_1 with the starting points of the arcs. Use FeatureVerticesToPoints with the option START.
2. Create point feature class Point_2 with the ending points of the arcs. Use FeatureVerticesToPoints with the option END.
3. Execute Spatial Join, setting Point_1 as the Join feature class and Point_2 as the Target. Output the result to class JOIN_RESULT.
4. Select the features of JOIN_RESULT where Join_Count = 0. Output the result to class RESULT; this is the output of the algorithm.



The main idea in Algorithm III is that at a node with normal flow there is at least one ending point of an arc that brings traffic to the node and at least one starting point of an arc that carries traffic away from it.
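The same idea can be sketched in plain Python on a toy directed network (node labels are invented). The select step of the model yields the nodes where traffic only arrives; the symmetric set difference below also captures nodes where traffic only departs, matching the definition of a point of no flow:

```python
# Toy directed network: each arc is a (from_node, to_node) pair.
arcs = [("A", "B"), ("C", "B"), ("D", "C")]

starts = {a for a, _ in arcs}  # Point_1: starting points of the arcs
ends = {b for _, b in arcs}    # Point_2: ending points of the arcs

# Join_Count = 0 keeps ends with no coincident start (traffic only arrives);
# the symmetric difference also covers starts with no coincident end
# (traffic only departs).
no_flow = (ends - starts) | (starts - ends)
print(sorted(no_flow))  # ['A', 'B', 'D']
```

Node B receives two arcs and emits none, so it blocks the flow; A and D only emit arcs, the mirror case.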



http://wmsviewer-rodis.rhcloud.com/

Thursday 1 May 2014

How to reduce a spatial problem to an easier one

Reducing the computation of a problem A to the computation of an easier problem B is a traditional method of building a more efficient algorithm for solving A, especially useful when A is computationally hard. This method is also effective for spatial analysis and spatial query problems.
Spatial query problems concern the spatial relationships between a set of objects S in a dataset E and another set of objects S' in a dataset E'. Of course, E and E' may be the same set, in which case we talk about the detection of spatial autocorrelation or of overlapping features in E. One reason why such problems are sometimes hard to solve, especially for large datasets, is the nature of the spatial objects: more complex spatial objects require more complex spatial queries, and the complexity of a spatial object grows with its dimension and geometric detail.
In a paper of mine, which you may find on arxiv.org here, I discuss a method I developed to work around this issue; it has been very useful to me in GIS applications. I call this method Geometrical Reduction. It is based on the notion of mapping reducibility, and it reduces the computation of a spatial query on dataset E to the computation of a spatial query on a dataset J whose objects have simpler geometry than those of E. When the method is applied in GIS, dataset J has to be produced by analyzing the geometry of the objects in E.
For instance, if the objects in E are polygons, then J may consist of the lines produced from the edges of the polygons. The problem of detecting overlapping polygon edges in E is then reduced to the problem of detecting intersecting lines in J.
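As a plain-Python sketch of this reduction (with invented toy data), the function below maps each polygon of E to its set of edge segments in J; detecting an overlapping edge between two polygons then becomes a cheap set intersection of segments. For simplicity, the sketch only catches exactly coincident edges, not partial overlaps:

```python
def edges(polygon):
    """Reduce a polygon (a closed ring of vertices) to its edge segments,
    normalised so that a segment and its reverse compare equal."""
    segs = set()
    for p, q in zip(polygon, polygon[1:] + polygon[:1]):
        segs.add((min(p, q), max(p, q)))
    return segs

# Dataset E: two unit squares sharing the edge from (1, 0) to (1, 1).
E = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],
    [(1, 0), (2, 0), (2, 1), (1, 1)],
]

J = [edges(poly) for poly in E]  # the reduced dataset of 1-D objects
shared = J[0] & J[1]             # the overlapping edges, found in J
print(sorted(shared))  # [((1, 0), (1, 1))]
```

The 2-dimensional question about E is answered by a 1-dimensional comparison in J, which is the point of the reduction.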
Next, let us define which reductions are valid between object classes.

Geometrical Reduction of object classes
A computable function g geometrically reduces an n-dimensional object class A to a d-dimensional object class B, written A ≤G B, if for every object o, o belongs to A if and only if g(o) belongs to B.

Then Geometrical Reduction is defined as follows.

Geometrical Reduction of spatial relationships
Spatial relationship S(A, C) is geometrically reducible to spatial relationship P(B, D), written S(A, C) ≤G P(B, D), if A ≤G B and C ≤G D, and if S(A, C) is satisfied whenever P(B, D) is satisfied and P(B, D) is satisfied whenever S(A, C) is satisfied.

Geometrical reduction theorem
The problem R of deciding whether a spatial relationship P(B, D) holds in dataset E is mapping reducible to the problem V of deciding whether a spatial relationship S(A, C) holds in dataset J, provided that P(B, D) ≤G S(A, C) and there is a computable function h that decides V.

In my paper I study the problem of detecting connection errors in networks of linear features, especially hierarchical networks, and I use Geometrical Reduction to build algorithms that remain efficient even for large datasets.


https://wms-viewer-online.appspot.com/