In this post I talk about system security and examine it as a computational problem. The motivation for thinking about this came from a very interesting post that Dick Lipton and Ken Regan published on their blog a few days ago.
The problem of developing a secure system is the problem of building a system that maintains a set of information F that is accessible only to authorised users, so that any unauthorised user is excluded from accessing F. A good way to study such a system is to model it as a Turing machine.
Any user has to provide some input, such as a username and password, in order to get clearance, and F is usually represented by some text or binary information. So let's define F as a set of strings and w as a string that holds the user input. Also let A be a Turing machine that simulates the system we have developed and consider secure.
Some systems demand extensive interaction in order to identify authorised users, so that users have to fill in questionnaires and forms. Without loss of generality, we may fold all of the user input into w, and model the system's acceptance or rejection of it by the operation of A. Under this assumption, A accepts a valid w and outputs F, or rejects input from unauthorised users and outputs "NO".
Let S be a set of strings such that if A decides that input w belongs to S, then the user is considered authorised and F is output. There are two cases where this model does not work as expected.
Case 1. There is a string e that does not belong to S, yet A outputs F on input e. An intruder may then use e to gain access to F without knowing w.
Case 2. The string w may be discovered algorithmically by unauthorised users.
Some explanation of these cases.
Case 1 models the real-world situations where A is hacked. The intruder does not input the identification of an authorised user in order to access F, so the input does not belong to S. Instead, the intruder inputs a string that causes undesired behavior in A, which results in the output of F. In the real world, intruders sometimes exploit security holes and send data to systems (such as macros, telnet commands or Trojans) that cause such undesired behavior.
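As a toy illustration of the first case, the sketch below models A as a plain Python function. Everything in it is hypothetical: the credential strings in S, the value of F and the planted flaw (a forgotten debug shortcut) are made up for the example and do not describe any real system.

```python
# Toy model of the system A. All names and values here are hypothetical.
S = {"alice:correct-horse", "bob:tr0ub4dor"}  # inputs of authorised users
F = "the protected information"


def A(w: str) -> str:
    """Return F for authorised input and "NO" otherwise (with a planted bug)."""
    # Planted flaw: a forgotten debug shortcut bypasses the membership check.
    if w.startswith("debug!"):
        return F
    return F if w in S else "NO"


e = "debug!anything"  # the intruder's string, which is not in S

print(A("alice:correct-horse"))  # an authorised user receives F
print(A("mallory:guess"))        # an ordinary wrong input is rejected
print(e in S, A(e) == F)         # e is outside S, yet A still outputs F
```

The intruder never learns any w from S. The string e works only because A behaves differently from what its developer intended.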
Case 2 refers to the complexity of w and how easily it can be guessed.
I intentionally leave out the cases where w has been stolen from its owner with techniques like phishing; this seems to be more of a social than a computational problem.
The computability of security and why hacking is easier than defending
The following language describes Case 1 as a computational problem.
SECURITY1 = {<A, S, F, e> | e is a string that does not belong to the set S, and TM A halts on input e with output F}
Theorem. SECURITY1 is recognizable and undecidable.
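Proof. SECURITY1 is recognizable: assuming membership in S can be checked, we verify that e does not belong to S, run A on input e, and accept if A halts and outputs F. Now suppose SECURITY1 were also decidable. Then for any machine A and any input s that does not belong to S, we could decide whether A halts on s and outputs F. Given an arbitrary TM M and string x, take S to be empty and let A be the machine that ignores its input, simulates M on x, and outputs F if that simulation halts. Then A outputs F on s exactly when M halts on x, so a decider for SECURITY1 would decide the halting problem, which is a contradiction.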
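One way to make the appeal to the halting problem concrete is the sketch below, in which Python callables stand in for Turing machines. The helper build_system and the stand-in machine halts_quickly are hypothetical names introduced for the illustration. The idea is to wrap an arbitrary machine M and input x into a system A that outputs F only if M halts on x.

```python
from typing import Callable


def build_system(M: Callable[[str], object], x: str, F: str) -> Callable[[str], str]:
    """Build a system A that outputs F on every input if and only if M halts on x."""
    def A(e: str) -> str:
        M(x)      # never returns if M does not halt on x
        return F  # reached only when M halts on x
    return A


# Take S to be the empty set, so every string e lies outside S.
S: set = set()


def halts_quickly(x: str) -> int:
    """A stand-in machine that always halts."""
    return len(x)


A = build_system(halts_quickly, "input", "secret")
print(A("any string") == "secret")
```

With S empty, the tuple made of A, S, F and any string e belongs to SECURITY1 exactly when M halts on x, so a decider for SECURITY1 would also decide halting.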
Both the developer of A and an intruder face the same problem, as each has to deal with an instance of SECURITY1 in order to build or hack A. Yet the job of the intruder is easier than the job of the developer. The intruder has to solve the recognizable side of the problem, while the defender has to approach the undecidable side of it, namely showing that no such e exists.
In other words, if you try to intrude into A, you build a process described by a string e, and once you input it to A you can find out whether it works. If you are a defender, you can never be sure that the system you build will be strong enough to resist attacks. Building A is a far more complex task than discovering e, and that is why you see systems developed by experienced development teams being hacked by some technology-enthusiast teenager.
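The asymmetry can be sketched with a dovetailed search, assuming machines are modelled as Python generators that yield once per computation step and return their output when they halt. The alphabet, the toy machine A and the helper names are all hypothetical. The search tries more strings with larger step budgets in each round, so it finds a working e whenever one exists, but it has no way of concluding that none exists.

```python
from itertools import count, product

ALPHABET = "ab"  # hypothetical input alphabet


def strings():
    """Enumerate all strings over ALPHABET in order of length."""
    for n in count(0):
        for chars in product(ALPHABET, repeat=n):
            yield "".join(chars)


def run_bounded(A, e, steps):
    """Run machine A on e for at most `steps` steps.

    Return its output if it halted in time, otherwise None.
    """
    gen = A(e)
    for _ in range(steps):
        try:
            next(gen)
        except StopIteration as halt:
            return halt.value
    return None


def find_exploit(A, S, F, max_budget=None):
    """Dovetail over candidate strings and step budgets.

    Return some e outside S on which A halts with output F.
    With max_budget=None the search runs forever if no such e exists.
    """
    budgets = count(1) if max_budget is None else range(1, max_budget + 1)
    for budget in budgets:
        candidates = strings()
        for _ in range(budget):
            e = next(candidates)
            if e not in S and run_bounded(A, e, budget) == F:
                return e
    return None  # inconclusive: the budget ran out, nothing was proven


def A(w):
    """Toy system: loops forever on "a" and leaks F on "bb"."""
    if w == "a":
        while True:
            yield
    yield  # one step of work
    return "F" if w in {"ab", "bb"} else "NO"


print(find_exploit(A, {"ab"}, "F"))
```

A result of None with a finite max_budget only means the search gave up. It says nothing about whether the system is safe, which is the defender's predicament.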
About Case 2
Case 2 refers to the complexity of w, or in other words to the complexity of the procedure of entering user identification data. There is no point in examining the computability of such processes, as most of them are computable in theory. For instance, when a user has to enter a 4-digit PIN, the number of possible PINs is bounded (10,000), so finding the right one by exhaustive search is computable in theory. If we add the restriction that the user has only three attempts to enter the correct PIN, then it becomes difficult for an intruder to guess the PIN.
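The PIN example can be worked through numerically. The sketch below assumes the PIN is chosen uniformly at random, that the intruder never repeats a guess, and that the attempt limit is actually enforced. The function names are hypothetical.

```python
from fractions import Fraction


def pin_space(digits: int) -> int:
    """Number of distinct PINs with the given number of decimal digits."""
    return 10 ** digits


def guess_probability(digits: int, attempts: int) -> Fraction:
    """Chance that `attempts` distinct guesses include a uniformly random PIN."""
    space = pin_space(digits)
    return Fraction(min(attempts, space), space)


print(pin_space(4))             # 10000
print(guess_probability(4, 3))  # 3/10000
```

Without the limit, an exhaustive search succeeds after at most 10,000 tries. With three attempts the intruder succeeds with probability 0.03% under these assumptions.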
The approach proposed by Lipton and Regan actually increases the complexity of the identification procedure. It becomes harder for an intruder to guess w, which may be subject to frequent changes if the questions asked by the system change frequently.
The image on this post is freely available under Creative Commons License. Check the source for the original image and many more.