How to manage variable scopes in symbol table

Question

How to manage variable scopes in symbol table

530 Views Asked by ishaangupte At 20 December 2021 at 20:51

I was writing a toy compiler and I generated a working C++ parser using ANTLR. I did some research and found out that the best way to handle scopes in a symbol table is to make a different table for each scope. I was planning to implement this using a stack (push when entering a scope and pop when exiting) but I realized that for accessing variables from higher schools, I would need to be able to access the scopes that are not at the top of the stack. This would get a little messy as there can be (theoretically) hundreds of scopes and popping all of those to find a variable would be inefficient. Is there a better way to implement this?

Original Q&A

There are 1 best solutions below

**rici** · Accepted Answer · 2021-12-21T01:27:10.073000

It's actually pretty uncommon for a program to have "hundreds" of nested scopes. The C standard, for example, only requires a compiler to handle "127 nesting levels of blocks" [Note 1].

However, there is a pretty simple algorithm which I used in my first toy language, which would have been more than half a century ago; I doubt whether I thought it up myself, but it's far too long ago to recall what the source might have been.

The idea is to use a single associative array ("hash table") whose keys at any point in a parse are the identifiers which are currently visible ("in scope"). Each scope is a simple linear array of names, each associated with some information about the name, such as the type of the defined variable [Note 2]. The values in the associative array contain a pointer to the scope in which the associated name is currently defined, and the index in the scope array of that definition.

There is also a pushdown stack which contains tuples consisting of a name and its previous definition (that is, the scope and frame index of the name). Every time a definition of a name is encountered, the name and its previous definition (or a Null marker) is pushed onto that stack before the name is entered into the symbol table associative array.

Finally, there is another pushdown stack of scopes; every time a scope is entered, a reference to the current top of the definition pushdown stack is pushed onto the scope stack, and when the scope ends, the definition stack is popped back to where it was at scope entry. As the definition stack is popped, each name's entry in the associative array is replaced with the data which was saved on the definition stack at scope entry.

That's all quite efficient; all the operations are basically constant time [Note 3], and I still think it's quite a pretty solution. But there's an argument that the overhead of the hash table is not justified because most name references are close to the name definition, and consequently a linear search in a pushdown stack of definitions could be faster on average, since it's much more cache-friendly than a hash-table lookup. I haven't bothered to benchmark, though.

Note that many real languages have much more complicated name lookup rules, and you might need to deal with multiple namespaces (structure members, explicit namespaces, etc.), with different nesting rules. (Without even starting to discuss the complications of name lookup in C++.)

Notes

If you want to get technical, it doesn't even require that the compiler handle any program with blocks nested no more than 127 levels deep. It requires that "an implementation shall be able to translate and execute at least one program that contains at least one instance of [127 nesting levels of blocks]", which is really a completely toothless restriction serving only as a guideline. But aside from machine-generated code, it seems unlikely that production code would come close to this limit anyway.
An array has much less overhead than a hash table, and a single large hash table has much less overhead than a bunch of small hash tables. The idea here is that aside from associating a name in the source code with an appropriate definition, there is no need for an associative array. When the call frame is actually being implemented, it's only necessary to loop through the defined variables in order; their names are probably irrelevant at that point, aside from debugging.
Scope exit takes time linear in the number of names defined in that scope, but if you look at it from the perspective of amortized time, you can think of the cost of popping the name at scope exit being accounted for in the moment of pushing the name during the scope, from which we can see that the total cost of maintaining the definition and scope stacks is essentially linear in the number of declarations in the program, or constant time per definition.

How to manage variable scopes in symbol table

There are 1 best solutions below

Notes

Related Questions in COMPILER-CONSTRUCTION

Related Questions in SYMBOL-TABLE

Trending Questions

Popular # Hahtags

Popular Questions