I was writing a toy compiler and I generated a working C++ parser using ANTLR. I did some research and found out that the best way to handle scopes in a symbol table is to make a different table for each scope. I was planning to implement this using a stack (push when entering a scope and pop when exiting) but I realized that for accessing variables from higher schools, I would need to be able to access the scopes that are not at the top of the stack. This would get a little messy as there can be (theoretically) hundreds of scopes and popping all of those to find a variable would be inefficient. Is there a better way to implement this?
How to manage variable scopes in symbol table
530 Views Asked by ishaangupte At
1
There are 1 best solutions below
Related Questions in COMPILER-CONSTRUCTION
- Reference: Crafting Interpreters. Print statement is not able to evaluate expression. Help me fix this (details below)
- Load function written in amd64 assembly into memory and call it
- I have implemented till Statements and State in Tree Walk Interpreter. I am pissed with an error
- Resolve shift/reduction conflict in grammar for expressions in PLY for calls to embedded functions
- Grammar for access to properties and calls to embedded functions
- LLVM code generation causes problems with pointer arithmetic
- what does react compiler mean actually?
- Errors on Recursive Descent Parsing Java
- Java CUP produces Shift-Reduce conflict when parsing a grammar for a C++ type language
- Three-Address-Code (TAC) and Conjunction/Disjunction
- How do I write an implicit cast for my strongly typed interpreter? (C++)
- Yacc parser not reducing specific production rules as intended
- Why is the function version tag consistently "Base" in HDF5 library?
- Sly parser, how are recursively defined types implemented?
- Does a non terminal token need an explicit definition?
Related Questions in SYMBOL-TABLE
- Why symbol table contains no variable name?
- how symboltable track functions defined later?
- What is causing the unexpected interaction between the string `eval`, code attributes, and the symbol table?
- How to specify symbol name when creating objects from raw binary files using objcopy?
- How can I avoid making this static reference to a non-static field?
- Problem is there in symbol table when building gdb binary for version 13.1 and 13.2
- Symbol table entry when using auto return type deduction with templates in C++
- Symbol Table Code in C: "segmentation fault (core dumped)" error with Insert function
- How do I get a function name in the symbol table to point to a different function?
- Why Add Only Present-Symbols to A Shadowing-List?
- How do I count the frequency of an item in a given range?
- PHP Symbol Table
- Where is swift compiler's symbol table?
- In symbol table how to mark variable out of scope?
- Building a basic symbol table in C
Trending Questions
- UIImageView Frame Doesn't Reflect Constraints
- Is it possible to use adb commands to click on a view by finding its ID?
- How to create a new web character symbol recognizable by html/javascript?
- Why isn't my CSS3 animation smooth in Google Chrome (but very smooth on other browsers)?
- Heap Gives Page Fault
- Connect ffmpeg to Visual Studio 2008
- Both Object- and ValueAnimator jumps when Duration is set above API LvL 24
- How to avoid default initialization of objects in std::vector?
- second argument of the command line arguments in a format other than char** argv or char* argv[]
- How to improve efficiency of algorithm which generates next lexicographic permutation?
- Navigating to the another actvity app getting crash in android
- How to read the particular message format in android and store in sqlite database?
- Resetting inventory status after order is cancelled
- Efficiently compute powers of X in SSE/AVX
- Insert into an external database using ajax and php : POST 500 (Internal Server Error)
Popular # Hahtags
Popular Questions
- How do I undo the most recent local commits in Git?
- How can I remove a specific item from an array in JavaScript?
- How do I delete a Git branch locally and remotely?
- Find all files containing a specific text (string) on Linux?
- How do I revert a Git repository to a previous commit?
- How do I create an HTML button that acts like a link?
- How do I check out a remote Git branch?
- How do I force "git pull" to overwrite local files?
- How do I list all files of a directory?
- How to check whether a string contains a substring in JavaScript?
- How do I redirect to another webpage?
- How can I iterate over rows in a Pandas DataFrame?
- How do I convert a String to an int in Java?
- Does Python have a string 'contains' substring method?
- How do I check if a string contains a specific word?
It's actually pretty uncommon for a program to have "hundreds" of nested scopes. The C standard, for example, only requires a compiler to handle "127 nesting levels of blocks" [Note 1].
However, there is a pretty simple algorithm which I used in my first toy language, which would have been more than half a century ago; I doubt whether I thought it up myself, but it's far too long ago to recall what the source might have been.
The idea is to use a single associative array ("hash table") whose keys at any point in a parse are the identifiers which are currently visible ("in scope"). Each scope is a simple linear array of names, each associated with some information about the name, such as the type of the defined variable [Note 2]. The values in the associative array contain a pointer to the scope in which the associated name is currently defined, and the index in the scope array of that definition.
There is also a pushdown stack which contains tuples consisting of a name and its previous definition (that is, the scope and frame index of the name). Every time a definition of a name is encountered, the name and its previous definition (or a Null marker) is pushed onto that stack before the name is entered into the symbol table associative array.
Finally, there is another pushdown stack of scopes; every time a scope is entered, a reference to the current top of the definition pushdown stack is pushed onto the scope stack, and when the scope ends, the definition stack is popped back to where it was at scope entry. As the definition stack is popped, each name's entry in the associative array is replaced with the data which was saved on the definition stack at scope entry.
That's all quite efficient; all the operations are basically constant time [Note 3], and I still think it's quite a pretty solution. But there's an argument that the overhead of the hash table is not justified because most name references are close to the name definition, and consequently a linear search in a pushdown stack of definitions could be faster on average, since it's much more cache-friendly than a hash-table lookup. I haven't bothered to benchmark, though.
Note that many real languages have much more complicated name lookup rules, and you might need to deal with multiple namespaces (structure members, explicit namespaces, etc.), with different nesting rules. (Without even starting to discuss the complications of name lookup in C++.)
Notes
If you want to get technical, it doesn't even require that the compiler handle any program with blocks nested no more than 127 levels deep. It requires that "an implementation shall be able to translate and execute at least one program that contains at least one instance of [127 nesting levels of blocks]", which is really a completely toothless restriction serving only as a guideline. But aside from machine-generated code, it seems unlikely that production code would come close to this limit anyway.
An array has much less overhead than a hash table, and a single large hash table has much less overhead than a bunch of small hash tables. The idea here is that aside from associating a name in the source code with an appropriate definition, there is no need for an associative array. When the call frame is actually being implemented, it's only necessary to loop through the defined variables in order; their names are probably irrelevant at that point, aside from debugging.
Scope exit takes time linear in the number of names defined in that scope, but if you look at it from the perspective of amortized time, you can think of the cost of popping the name at scope exit being accounted for in the moment of pushing the name during the scope, from which we can see that the total cost of maintaining the definition and scope stacks is essentially linear in the number of declarations in the program, or constant time per definition.