I am working on a compiler in TypeScript and thinking a lot about lexical scope. In particular I'm wondering about how you handle the situation where you have hoisted functions, where variables can be undefined at one point, and then defined at another point. For example:
function a() {
let d = 10
b()
return b
let g = 40
function b() {
c()
let f = 30
return c
function c() {
console.log(d, e, f, g)
}
}
let e = 20
}
const x = a()() // returns b, returns c
x() // log the variables
Here, what is the "lexical scope" inside the function c?
- Is it all the variables it could possibly/potentially have access to (at some point) during possible code evaluation?
(d, e, f, g) - Is it what it only will eventually be defined within its runtime context?
(d, f) - Is it what won't be undefined and throw an error?
(d) - Is it every combination of possible parents all at once (like a cartesian product sort of thing, times however many parents in the tree)?
- etc.
At first, for a while (for many years), I thought of lexical scope as a simple tree. You had higher scopes and nested scopes, and each nested scope introduced new variables. Nice and clean, easy-peasy. But this, this throws it for a loop. Now it's like, focusing on c, there are steps of evaluation inside b, and at each step, a possible different parent scope may exist. Then a might have a suite of parent scopes too. So it is like a natural flower/tree that blossoms upward (which is hard to imagine in programming). There are more and more possible parents. What I'm thinking is, the lexical scope is the combination of every combination of parents on upward. But then how do you use this scope in a compiler?
Now I am confused. What am I supposed to use the lexical scope for in a compiler if I can't tell for a nested function what 1 scope definition it is? I assumed each nested function has 1 scope, but no, it's parents are vast and combinatory. You can only really tell what the scope is when it is bound ("binding", as opposed to "scope"). That is, at the specific step in code evaluation, what the value of the scope is.
So how do you use this information in a compiler? I can tell if a variable is in scope only seemingly if I evaluate/simulate the code running itself. Basically, how do I use this lexical scope in a compiler now with this confusion?
I was going to use it as the source of truth for what variables were defined in a nested lexical scope block. But now I can't, because I need to know at each step what the values of the variables are before I can know what the scope is. Or am I missing something?
Thinking of trying something like this:
type SiteContainerScopeType = {
like: Site.ContainerScopeType
parent?: SiteContainerScopeType
children: Array<SiteContainerStepScopeType>
declarations: Record<string, SitePropertyDeclarationType>
}
type SiteContainerStepScopeType = {
like: Site.ContainerStepScope
previous: SiteContainerStepScopeType
context: SiteContainerScopeType
declarations: Record<string, SitePropertyDeclarationType>
}
If I understand correctly the semantics described in Understanding Hoisting in JavaScript, the code can be reorganized as follows, ordering first declarations, then variable initialization and finally the rest of the expressions.
When doing this, we find a more usual structure of having variables declared before being initialized and used.