How is it practically possible to compute an automaton inside a function and then return it?

36 Views Asked by At

I'm trying to follow Cormen - Algorithms, 3rd edition. Specifically, Chapter VII, 32 "String Matching". In general, I find this book extremely hard to follow, due to the abundance of math-speak and too theoretical, rather than practical, nature. That's why I only visit it selectively, at the algorithms I'm currently interested in.

I've been following the string chapter pretty well. It was not easy, but I managed to get most of the stuff in the Brute-force and Quick-hash algorithms... However, the second part, left me grasping for air, a bit. I have almost 0 background in automata theory, but I still managed to grasp the concept of the algorithm. However, one thing is unfathomable for me:

enter image description here

OK, so here everything seems to be pretty obvious. Automaton changes states depending on the previous state q and a character from text, by passing it into transition function small delta. But wait... What in the World is this transition function? Well, here it is:

enter image description here

OK, now, here, it is much less obvious of what is even going on. But my biggest problem is not in understanding how this particular piece of pseudocode works. I can't grasp the practicality of this process. How is it even possible to generate function at runtime in another function, and then return it like some kind of variable, so that calling function can execute it?

Can anybody give me directions of what basic things I need to understand in order to get the practical side of it? Maybe you can give some short and simple enough description of how this can be practically implemented?

My only guess, for now, is virtual machine... Theoretically, you can generate sequence of VM instructions inside a function, and then execute them, yes... But VMs have high overhead and this kind of defeats the whole purpose of fast string matching...

Another possible way, albeit, a bit crazy, is to actually generate native machine code instructions directly inside a function, during runtime. Then, write them to memory area allocated with PAGE_EXECUTE rights, and then just return the pointer to this function. Calling code can then just call the function, as any other hardcoded function. I have no idea if this is even possible or practical, though.

1

There are 1 best solutions below

0
Luatic On

This is pseudocode. It does not need to be implementable in a "standard" programming language verbatim. If you look at the usage of the transition function, you will find that only three things are done with it:

  • Pass it around: Return it from ComputeTransitionFunction, pass it to FiniteAutomatonMatcher.
  • Set transitions: delta(q, a) = k
  • Get transitions: delta(q, a)

That is, small delta is effectively just a map from Q x Sigma → Q. This is exactly what functions do in math. Pseudocode tends to use notation which is closer to math than programming languages.

In practice you would usually implement this as a 2-dimensional array (if you don't care about memory usage) or as a hash map (if you want to reduce memory usage at the expense of some CPU work).

A hardcoded map can be implemented using a bunch of nested switches just as well:

enum State {
    Q1,
    Q2
};
enum State delta(enum State q, char c) {
    switch (q) {
        case Q1:
            switch (c) {
                case 'a': return Q1;
                case 'b': return Q2;
            }
        case Q2:
            switch (c) {
                case 'a': return Q2;
                case 'b': return Q2;
            }
    }
}

This tree of switches encodes delta(Q1, a) = Q1, delta(Q1, b) = Q2, delta(Q2, a) = Q2, delta(Q2, b) = Q2. It omits default cases for c not in {a, b} or q not in {Q1, Q2}.

Parser generators will in practice often generate code like this - actual functions - for performance reasons. Note that this happens at compile time rather than run time.


How is it even possible to generate function at runtime in another function, and then return it like some kind of variable, so that calling function can execute it?

If your function's code exists at "compile time" already, this is called a closure: The function's code doesn't change, only the context changes. These variables in the context - effectively local variables of parent scopes - are called upvalues. Creating a closure is effectively just allocating an implicit struct which holds these upvalues and a function pointer; roughly speaking, calling the closure calls the function pointer, passing the struct such that the closure can access its upvalues.

Of course this closure, which is just a pointer to a struct or similar, can be passed around freely. Being able to treat functions as values - which enables fully dynamic dispatch - makes functions first class.

First-class closures are fundamental to lambda calculus and functional programming in general. Many scripting languages have also adopted first-class closures. Here is a standard "curried addition" example in Lua:

function adder(a)
    return function(b)
        return a + b
    end
end
add1 = adder(1)
add2 = adder(2)
print(add1(41)) -- 42
print(add2(40)) -- 42