Introduction to Automata Theory, Languages, and Computation
June 01, 2017
I found this book quite accessible, practical, and theoretically intriguing. In particular, it looks to answer questions of decidability and intractability. What does that mean? In the words of the authors themselves, decidability asks: What can computers do at all? While intractability asks: What can computers do efficiently?
Keep in mind this book is more theory-based. As a text for training a generation of coders to participate in industry, it may have served that role in the past, but it isn't as directly relevant now. I say this in the sense that very little of its content would be applied directly to building software these days. Most of the software that uses such theory is already ubiquitous and thus packaged into many existing platforms, libraries, and development kits in today's digital economy.
If you're looking to learn the theoretical limits of computation, or just looking to grow your understanding of computing science as a way of thinking, then this book is better suited for you. My own motivation is to learn computing as a way of thinking, but I'm also designing my own programming language (at the time of this writing) and would like to build the first generation of its compiler. If you have a similar aim, this book is also for you.
As for the content and structure of the book itself: one way of reading this text is as an overall attempt to mitigate the complexity of computation by stratifying it into layers, with automata as the vehicle for doing so. A secondary theme woven through this stratification narrative, and worth praising, is that each layer of complexity has a dual interpretation, as automata and as languages with a given grammar. And for each theoretical level of language complexity, the description of its corresponding automata is straightforward to implement in an actual programming language: lexers and parsers, for example. As an aside, learning about Noam Chomsky's contribution to computing science with the Chomsky hierarchy is pretty cool as well.
Finally, this book does a really good job exploring and explaining deterministic vs non-deterministic algorithms, and what these actually mean. You can get away with a casual understanding in many industry contexts having to do with concurrency, but if you want to delve deeper this book is worth it.
In summary: If you really need (or desire) a strong theoretical understanding of the foundations of computing science, I highly recommend this book. If you don't, you can get by without it, relying on more localized education for whatever your software context is. This may sound harsh, but it is a pretty rigorous book. It's worth it, but know what you're getting into.
- Chapter 1 - Automata: The Methods and the Madness
- Chapter 2 - Finite Automata
- Chapter 3 - Regular Expressions and Languages
- Chapter 4 - Properties of Regular Languages
- Chapter 5 - Context-Free Grammars and Languages
- Chapter 6 - Pushdown Automata
- Chapter 7 - Properties of Context-Free Languages
- Chapter 8 - Introduction to Turing Machines
- Chapter 9 - Undecidability
- Chapter 10 - Intractable Problems
- Chapter 11 - Additional Classes of Problems
Chapter 1 - Automata: The Methods and the Madness
This chapter introduces the initial terms and how they relate to computational complexity. Beyond that, it also makes clear that we're expected to hold a high level of mathematical rigour: several sections are spent going over the idea of a formal proof and the types of proofs. For the record, I actually like that, as I come from a pure math background, but I also understand it might scare off those who don't. For their sake, I hope it doesn't.
Chapter 2 - Finite Automata
In this chapter we explore the basic mechanics of automata: what they are, how they're built, and how to use them. We look at the differences between deterministic and nondeterministic automata, and learn that they are actually equivalent in power, which is to say each can be translated into the other.
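To make that equivalence concrete, here's a minimal sketch of the classic subset construction, which turns an NFA into an equivalent DFA. The naming is my own, and I'm assuming no epsilon-transitions to keep it short:

```python
from collections import deque

def nfa_to_dfa(alphabet, delta, start, accepting):
    """Subset construction: each DFA state is a set of NFA states.

    delta maps (nfa_state, symbol) -> set of next NFA states.
    Returns the DFA transition table, start state, and accepting states,
    with DFA states represented as frozensets of NFA states.
    """
    dfa_start = frozenset([start])
    dfa_delta, dfa_accepting = {}, set()
    seen, queue = {dfa_start}, deque([dfa_start])
    while queue:
        current = queue.popleft()
        if current & accepting:          # accepting if any member accepts
            dfa_accepting.add(current)
        for symbol in alphabet:
            nxt = frozenset(t for q in current
                              for t in delta.get((q, symbol), set()))
            dfa_delta[(current, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dfa_delta, dfa_start, dfa_accepting
```

The catch the chapter makes plain: this construction can blow up exponentially in the number of states, even though the two machines accept exactly the same language.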
This chapter didn't mean much to me at first glance. It was very thorough, and as the idea of automata was completely new to me at the time, it was a lot to take in, with little to no motivation for such strangely behaving creatures. The application of automata to text search was a first step in seeing their use, but just barely. Mostly they just seem like a convoluted way to search text, as there are more direct approaches.
Regardless, as everything builds on this chapter, don't take it for granted, and don't rush over it.
Chapter 3 - Regular Expressions and Languages
This is the first chapter where things really began to click for me. I've seen regular expressions in Bash, the Linux command line. I learned to use them a bit in grep, but not knowing the classical history of Unix's development, I just assumed that regular expressions, although very clever and inventive, were an ad-hoc design. I didn't realize they had actual theoretical underpinnings. Not only that, but with a rigorous treatment of their mechanics, I have become much more comfortable with their use.
Furthermore, seeing them applied to text search, and how that ties in with the finite automata from the previous chapter, I started to see the depth of this theory. I've seen text-searching algorithms before, and as elegant, efficient, and powerful as they are, I've only ever seen exact text matching. With regular expressions you can match patterns, and if you didn't know any better you'd think the underlying algorithms for such pattern matching would be somewhat complex; yet translated into finite automata they're actually quite simple and straightforward. So much so that the whole process can be coded as a lexer. That's really amazing!
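To give a flavor of how simple the compiled form is, here's a hand-rolled DFA for the regular expression (0|1)*01, binary strings ending in 01; a classic textbook example, though the code itself is just my own sketch:

```python
def ends_in_01(text):
    """Run a three-state DFA for the regular expression (0|1)*01.

    State 0: no progress; state 1: last symbol was '0';
    state 2: last two symbols were '01' (the only accepting state).
    """
    delta = {
        (0, '0'): 1, (0, '1'): 0,
        (1, '0'): 1, (1, '1'): 2,
        (2, '0'): 1, (2, '1'): 0,
    }
    state = 0
    for ch in text:
        state = delta[(state, ch)]
    return state == 2

# ends_in_01('0101')  -> True
# ends_in_01('0110')  -> False
```

One table lookup per character, no backtracking: that's the payoff of the regex-to-automaton translation.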
Chapter 4 - Properties of Regular Languages
This chapter gets back to theory, proving more classical mathematical properties of regular languages such as closure, decidability, and equivalence. This of course relates back to regular expressions and finite automata, because all three formalisms are equivalent.
At first glance, I could see the theoretical value of proving these aspects of regular languages, and reducing a DFA to its simplest form has an obvious practical application in algorithm optimization. At the same time, though, the chapter otherwise seemed tedious. Necessary, important, but kind of dry, you know? I didn't get it at first.
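As an aside before I get to what changed my mind: that "simplest form" reduction is concrete enough to sketch. The idea is iterative partition refinement: split states into accepting and non-accepting, then keep splitting any block whose members disagree about which block a symbol leads to. This is my own rough rendering, assuming a total transition function, not the book's presentation:

```python
def minimize_dfa(states, alphabet, delta, accepting):
    """Group DFA states into equivalence classes by partition refinement.

    delta maps (state, symbol) -> state and must be total.
    Returns the final partition as a list of frozensets; each block
    becomes one state of the minimal DFA.
    """
    partition = [b for b in (frozenset(accepting),
                             frozenset(states - accepting)) if b]
    changed = True
    while changed:
        changed = False
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                # Signature: which block each symbol sends s into.
                sig = tuple(next(i for i, b in enumerate(partition)
                                 if delta[(s, a)] in b)
                            for a in sorted(alphabet))
                groups.setdefault(sig, set()).add(s)
            changed |= len(groups) > 1
            refined.extend(frozenset(g) for g in groups.values())
        partition = refined
    return partition
```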
I was wrong! I was wrong on account of the pumping lemma, which they actually introduce early on. It seems kind of abstract, and it lacks motivation, at least for me. Usually lemmas are only important for proving more powerful theorems, so I knew not to take it for granted, but I didn't appreciate it immediately. To be honest, I only clued into its importance a few chapters later, when they reintroduce the pumping lemma for another class of languages.
Here's the realization, so you don't make the same mistake I did: don't be impatient with the pumping lemma! It is what justifies even bothering to classify and stratify different levels of languages and their automata, because it proves the layers are genuinely distinct. I mentioned that the narrative of this book is to mitigate computational complexity by stratification. This is an engineering tactic, but in this case it's not only practical, it's theoretically warranted. That's amazing!
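To show what the lemma actually buys you, here's the classic application, paraphrased in my own words: proving the matched-block language is not regular.

```latex
\textbf{Claim.} $L = \{\, 0^n 1^n : n \ge 0 \,\}$ is not regular.

\textbf{Sketch.} Suppose $L$ were regular with pumping length $p$, and take
$w = 0^p 1^p \in L$. The lemma writes $w = xyz$ with $|xy| \le p$ and
$|y| \ge 1$, which forces $y = 0^k$ for some $k \ge 1$. Pumping once gives
$xy^2z = 0^{p+k}1^p \notin L$, contradicting the lemma. So no finite
automaton recognizes $L$: you need the next layer up.
```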
Chapter 5 - Context-Free Grammars and Languages
This is a big chapter. Don't jump ahead. Sit with it for a while, contemplate it, let it sink in.
Here we effectively learn what kinds of strings we can derive (or recognize) using tree recursion. With regular languages, for example, we have access to recursion (the star operator), but it's limited to the recursion of repeated insertion. Here we can recurse through a finite collection of template patterns until we insert their final yields.
It's hard to explain without actually reading the chapter, but the concepts and ways of thinking offered here should not be taken for granted. What blew my mind though was when it all started connecting back to trees and parsers.
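Here's the flavor of that connection, in a sketch of my own devising (a toy grammar, not one from the book): a recursive-descent parser whose call structure literally is the derivation tree.

```python
# Toy grammar:
#   expr -> term '+' expr | term
#   term -> NUMBER | '(' expr ')'

def parse_expr(tokens, i=0):
    """Return (parse_tree, next_index); each call mirrors a production."""
    tree, i = parse_term(tokens, i)
    if i < len(tokens) and tokens[i] == '+':
        right, i = parse_expr(tokens, i + 1)   # recursion = tree growth
        tree = ('+', tree, right)
    return tree, i

def parse_term(tokens, i):
    if tokens[i] == '(':
        tree, i = parse_expr(tokens, i + 1)
        assert tokens[i] == ')', "expected ')'"
        return tree, i + 1
    return ('num', tokens[i]), i + 1

# parse_expr(['1', '+', '(', '2', '+', '3', ')'])
#   -> (('+', ('num', '1'), ('+', ('num', '2'), ('num', '3'))), 7)
```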
The other part of this chapter deals with ambiguity in context-free grammars, which is to say grammars in which you can derive a string in more than one way. The issue in practice is this: if you took a string and broke it down into its derivation tree, you would then use that derivation tree to interpret and evaluate the meaning of the string. If there's more than one possible derivation, it's also possible there's more than one interpretation and evaluation, which you do not want in a programming language.
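The standard illustration (my rendering of it, at least) is an arithmetic grammar without precedence rules:

```python
# Ambiguous grammar:  E -> E '+' E  |  E '*' E  |  id
# The string  id + id * id  has two distinct derivation trees:
tree_a = ('+', 'id', ('*', 'id', 'id'))   # reads as  id + (id * id)
tree_b = ('*', ('+', 'id', 'id'), 'id')   # reads as  (id + id) * id
# Under ordinary arithmetic these evaluate differently, which is exactly
# why ambiguity is intolerable in a programming language.
```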
This part of the chapter is devoted to showing that just because one particular grammar for a language is ambiguous doesn't mean all grammars for that language are. In such cases you can often find an alternative grammar which is not ambiguous. Problem solved. Only, and I hope I'm not giving spoilers here, the general decidability of ambiguity is known to be negative. That is, given an arbitrary context-free grammar, there is no universal algorithm to determine whether it is ambiguous, nor whether its language is inherently ambiguous (all of its grammars are ambiguous). This is to say: ambiguity is undecidable. All is not lost, of course; it just becomes a case-by-case matter, but knowing this theoretical result could save a lot of time, as it lets us know our limits.
Chapter 6 - Pushdown Automata
I thought my mind was done being blown already, but pushdown automata blew my mind. Basically they're just finite automata, but this time they're equipped with some memory. I'll cheat here, since as coders we're already familiar with various data structures: the memory these automata are equipped with is simply a stack. A stack, as in push, pop, first-in-last-out (FILO), that sort of thing.
It blows my mind because these PDAs are equivalent to context-free grammars, meaning parsers, tree derivations, are effectively equivalent to repeated recursion with some stack memory. That's weird. Haha.
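A quick sketch of my own to make the "finite automaton plus a stack" idea concrete: a miniature deterministic PDA recognizing balanced parentheses, a context-free language no stackless finite automaton can handle.

```python
def balanced(string):
    """Finite control plus one stack: a pushdown automaton in miniature."""
    stack = []
    for ch in string:
        if ch == '(':
            stack.append(ch)      # push on an opener
        elif ch == ')':
            if not stack:
                return False      # pop on empty stack: reject
            stack.pop()
    return not stack              # accept iff the stack empties out

# balanced('(()())')  -> True
# balanced('(()')     -> False
```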
Chapter 7 - Properties of Context-Free Languages
This chapter is in a similar spirit to chapter 4, in that we look at broader mathematical properties of context-free languages. In particular, we first learn to convert grammars into Chomsky Normal Form, which allows us to more readily prove those broader properties, such as closure.
We also take a look at the pumping lemma again. Strictly speaking it's not the same pumping lemma, as it is both stated and proven for a different class of languages. Regardless, it's quite similar, and it serves the same purpose: proving there exist languages that aren't context-free, which can only sit higher up in the hierarchy of our stratification.
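For the curious, here's the shape of the argument, again paraphrased in my own notation: the classic proof that three matched blocks can't be context-free.

```latex
\textbf{Claim.} $L = \{\, a^n b^n c^n : n \ge 0 \,\}$ is not context-free.

\textbf{Sketch.} With pumping length $p$, take $w = a^p b^p c^p$ and write
$w = uvxyz$ with $|vxy| \le p$ and $|vy| \ge 1$. The window $vxy$ is too
short to touch all three letter blocks, so pumping to $uv^2xy^2z$ grows at
most two of the three counts and breaks $\#a = \#b = \#c$. Hence $L$ is not
context-free; it lives strictly higher in the hierarchy.
```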
Chapter 8 - Introduction to Turing Machines
This is where we start showing Turing machines as the final layer: the boundary, with no higher layers of complexity. The chapter ends up being more about theoretical limitations than about the mechanics of Turing machines themselves. As for the machines, it devotes a lot of space to showing that various formulations of Turing machines are equivalent, and only briefly introduces the idea of halting.
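For a feel of how little machinery is involved, here's a one-tape Turing machine simulator, my own sketch of the formalism rather than code from the book:

```python
def run_tm(delta, tape, state='q0', accept='qA', max_steps=10_000):
    """Simulate a one-tape Turing machine.

    delta maps (state, symbol) -> (new_state, written_symbol, move),
    with move -1 (left) or +1 (right). '_' is the blank; the tape grows
    rightward on demand. A missing transition halts and rejects, and
    max_steps crudely sidesteps runs that would never halt; detecting
    such runs, as the coming chapters show, is impossible in general.
    """
    tape, head = list(tape) or ['_'], 0
    for _ in range(max_steps):
        if state == accept:
            return True
        key = (state, tape[head])
        if key not in delta:
            return False                  # halt without accepting
        state, written, move = delta[key]
        tape[head] = written
        head = max(head + move, 0)        # one-way infinite tape
        if head == len(tape):
            tape.append('_')              # extend with a blank
    raise RuntimeError("no verdict within max_steps")
```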
Chapter 9 - Undecidability
Having reached the end of our automata hierarchy, from here on out we look to discover boundary limitations in the potential and capabilities of these machines.
The main way to look at it is that a given language (of strings) is either decidable or undecidable. If it's decidable, it is called recursive, meaning there is a Turing machine that halts on every possible string you give it, accepting exactly the strings in the language and rejecting the rest.
An undecidable language has no such machine. The best you might get is a Turing machine that halts and accepts whenever a string is in the language, but may run forever on strings that aren't.
We don't stop there though: so far we have languages which are decidable, and we call them recursive. We have languages which are undecidable, but it turns out there's more than one variety of undecidable. The first class of undecidable languages are still recursively enumerable: some Turing machine accepts exactly their strings, even though no Turing machine decides membership outright. The universal language, the one accepted by the universal Turing machine, is one such example. Things get funky when we learn there are languages which are not even recursively enumerable. What this means is that there is no corresponding Turing machine at all, or rather there is no way to access such languages in full through the toolset of Turing machines. That's fascinating! By the way, the diagonalization language is the classical example here. Think of it as a translation of Cantor's diagonalization argument applied to automata.
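Here's the diagonal argument itself, compressed into a few lines (my paraphrase):

```latex
\textbf{The diagonalization language.} Fix an enumeration $w_1, w_2, \dots$
of all binary strings and $M_1, M_2, \dots$ of all Turing machines (every
machine has a binary encoding). Define
\[
  L_d = \{\, w_i : M_i \text{ does not accept } w_i \,\}.
\]
If some machine $M_j$ accepted exactly $L_d$, then $w_j \in L_d$ iff $M_j$
does not accept $w_j$ iff $w_j \notin L_d$, a contradiction. So no Turing
machine accepts $L_d$: it is not even recursively enumerable.
```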
Beyond this, we look at Rice's Theorem and Post's Correspondence Problem, and take an inventory of other well-known undecidable problems.
Chapter 10 - Intractable Problems
We've reached the end of our automata hierarchy, and with the previous chapter we've explored the theoretical boundaries and limits of our Turing machines. All that's left is to look at problems which are absolutely (theoretically) decidable, but relatively (practically) not.
This is to say, we look at problems we know we could solve in principle, but might lack the actual time or resources to solve in real-life application. We split such problems up into the classes P and NP, where it is conjectured, but not proven, that P does not equal NP.
Finally, we're introduced to the class NP-complete. Think of a problem in NP into which every other problem in NP can be translated. Not only that, but each translation is efficient enough (polynomial time) that we can ignore its cost. Such a problem is a sink, a terminal object. The value of identifying a problem like this is that it can be viewed as a bottleneck, or a representative of its class. Take all such problems in NP and group them together, and we call them NP-complete. As it turns out, no one has ever shown an NP-complete problem to be in the class P, which adds further inductive evidence to the claim that P does not equal NP.
I have to admit, this strategy of comparing computational complexities through representatives ("reducing one problem to another") is of both practical and philosophical interest, as convoluted as it might be, haha.
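To show how mechanical a reduction can be, here's a standard one, my choice of example rather than the book's: Independent Set reduces to Clique by complementing the graph.

```python
def independent_set_to_clique(n, edges, k):
    """Polynomial-time reduction: G has an independent set of size k
    iff the complement of G has a clique of size k.

    edges is an iterable of vertex pairs over vertices 0..n-1.
    Returns an equivalent Clique instance (n, complement_edges, k).
    """
    all_pairs = {(u, v) for u in range(n) for v in range(u + 1, n)}
    complement = all_pairs - {tuple(sorted(e)) for e in edges}
    return n, complement, k
```

Chain enough of these translations together and hardness propagates across the whole class; that's the entire engine behind NP-completeness.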
Chapter 11 - Additional Classes of Problems
Final chapter. Basically we extend the previous chapter to include other classes of solvable problems of varying degrees of computational complexity. The main variation in theme here is looking at polynomial and non-deterministic polynomial space classes of problems, rather than just polynomial and non-deterministic polynomial time classes as before.
Beyond that, it's really more of a taxonomy of such classes, mostly useful for hardcore algorithmics experts.