I've been writing an interpreter in Python and have run into a problem. It identifies strings, variables, numbers and expressions, but while testing my .lang file I noticed it outputs {'$var': 'EQUALS'} instead of the variable's string or number.
It prints the "print", "num" and "expr" values perfectly, but not variables. At first I tried re-evaluating the code and changed symbols[varname[4:]] = varvalue to symbols[varname[4:]] = varvalue[6:], which gave me {'$var': ' '}.
It recognizes "variable" as the string associated with $var, but stores 'EQUALS' (the equals sign before the string, num or expr) as its value. I want it to store "variable" as the value of the variable, as in {'$var': 'STRING: "variable"'}.
I seem to be missing something, and I think the problem may be in my parse or doASSIGN function. Could someone please tell me, or give me a hint about, what I might be doing wrong?
OUTPUT:
PS C:\Users\<user>\Desktop\spl> python basic.py test.lang
hello world
55
48
{'$var': 'EQUALS'}
test.lang:
print "hello world"
print 55
print (10 + 2) * 4
$var = "variable"
basic.py:
from sys import *

tokens = []
num_stack = []
symbols = {}

def open_file(filename):
    data = open(filename, 'r').read()
    data += "<EOF>"
    return data

def lex(filecontents):
    tok = ""
    state = 0
    varstarted = 0
    var = ""
    string = ""
    expr = ""
    n = ""
    isexpr = 0
    for char in filecontents:
        tok += char
        if tok == " ":
            if state == 0:
                tok = ""
            else:
                tok = " "
        elif tok == "\n" or tok == "<EOF>":
            if expr != "" and isexpr == 1:
                tokens.append("EXPR:" + expr)
                expr = ""
            elif expr != "" and isexpr == 0:
                tokens.append("NUM:" + expr)
                expr = ""
            elif var != "":
                tokens.append("VAR:" + var)
                var = ""
                varstarted = 0
            tok = ""
        elif tok == "=" and state == 0:
            if var != "":
                tokens.append("VAR:" + var)
                var = ""
                varstarted = 0
            tokens.append("EQUALS")
            tok = ""
        elif tok == "$" and state == 0:
            varstarted = 1
            var += tok
            tok = ""
        elif varstarted == 1:
            var += tok
            tok = ""
        elif tok == "PRINT" or tok == "print":
            tokens.append("PRINT")
            tok = ""
        elif tok == "0" or tok == "1" or tok == "2" or tok == "3" or tok == "4" or tok == "5" or tok == "6" or tok == "7" or tok == "8" or tok == "9":
            expr += tok
            tok = ""
        elif tok == "+" or tok == "-" or tok == "*" or tok == "/" or tok == "(" or tok == ")":
            isexpr = 1
            expr += tok
            tok = ""
        elif tok == "\"":
            if state == 0:
                state = 1
            elif state == 1:
                tokens.append("STRING:" + string + "\"")
                string = ""
                state = 0
                tok = ""
        elif state == 1:
            string += tok
            tok = ""
    #print(tokens)
    #return ''
    return tokens

def evalExpression(expr):
    return eval(expr)

def doPRINT(toPRINT):
    if(toPRINT[0:6] == "STRING"):
        toPRINT = toPRINT[8:]
        toPRINT = toPRINT[:-1]
    elif(toPRINT[0:3] == "NUM"):
        toPRINT = toPRINT[4:]
    elif(toPRINT[0:4] == "EXPR"):
        toPRINT = evalExpression(toPRINT[5:])
    print(toPRINT)

def doASSIGN(varname, varvalue):
    symbols[varname[4:]] = varvalue

def parse(toks):
    i = 0
    while(i < len(toks) - 1):
        if toks[i] + " " + toks[i+1][0:6] == "PRINT STRING" or toks[i] + " " + toks[i+1][0:3] == "PRINT NUM" or toks[i] + " " + toks[i+1][0:4] == "PRINT EXPR":
            if toks[i+1][0:6] == "STRING":
                doPRINT(toks[i+1])
            elif toks[i+1][0:3] == "NUM":
                doPRINT(toks[i+1])
            elif toks[i+1][0:4] == "EXPR":
                doPRINT(toks[i+1])
            i += 2
        if toks[i][0:3] + " " + toks[i+1] + " " + toks[i+2][0:6] == "VAR EQUALS STRING" or toks[i][0:3] + " " + toks[i+1] + " " + toks[i+2][0:3] == "VAR EQUALS NUM" or toks[i][0:3] + " " + toks[i+1] + " " + toks[i+2][0:4] == "VAR EQUALS EXPR":
            if toks[i+2][0:6] == "STRING":
                doASSIGN(toks[i],toks[i+1])
            elif toks[i+2][0:3] == "NUM":
                doASSIGN(toks[i],toks[i+1])
            elif toks[i+2][0:4] == "EXPR":
                doASSIGN(evalExpression(toks[i+2][5:]))
            i += 3
    print(symbols)

def run():
    data = open_file(argv[1])
    toks = lex(data)
    parse(toks)

run()
We can actually fix the problem by changing only one character in your parser! As someone has already pointed out, the tok == "=" branch of your lexer appends EQUALS every time a = is found somewhere in the source program (outside a string).
Explanation
This is because your program iterates through your source file (test.lang) and tries to match each symbol to its corresponding token. As soon as your lexer finds a =, EQUALS is appended to the tokens list, no matter the context. Making the EQUALS token context-aware is challenging, though not impossible.
So, while you could fix this by changing the lexer, that is not necessary as long as the language does not become more complicated than it is right now. By this I mean that the structure for declaring variables doesn't get more complex than $var = "string". Once it does, changes to the lexer would be warranted (by complex, I mean things like type declarations: $var str = "string").
Solution
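Before the fix itself, it may help to see where each token sits. The snippet below is only an illustration: the token list is written out by hand to match what the lexer in the question produces for the last line of test.lang, and i = 0 stands in for the parser's index at the VAR token.

```python
# Tokens the question's lexer emits for the line:  $var = "variable"
toks = ["VAR:$var", "EQUALS", 'STRING:"variable"']

i = 0  # position of the VAR token, as in the parse loop
for offset in range(3):
    # Show which token each index used by the parser refers to.
    print(f"toks[i+{offset}] -> {toks[i + offset]}")
```

So toks[i+1] is always the EQUALS token in a VAR EQUALS value sequence, and the value itself is one position further along.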
Anyway, let's take a look at a simple fix:
Instead of doASSIGN(toks[i],toks[i+1]) I used doASSIGN(toks[i],toks[i+2]) (in both the STRING and NUM branches), as toks[i+1] lands directly on EQUALS instead of STRING, which is what produces {'$var': 'EQUALS'}. After playing around with the program for a while, I found no reason not to do this; passing EQUALS along would be merely a formality, which you could still include if you later decide to implement program optimization.
Now, is this the most beautiful, best-practice fix? Probably not, but it is frankly the easiest and fastest one. If you need to record the presence of EQUALS somewhere, you can always add a third, optional parameter to your doASSIGN function.
It achieves the same result as the other fixes with little effort and hardly any rewriting. Alternatively, you could try implementing look-ahead with itertools, which is one common way to let a lexer or parser peek at upcoming tokens.
Most other solutions would require rewriting much of your program, which I want to avoid, as it should remain your own work. Nonetheless, I would suggest looking a little deeper into interpreter design and checking out some best practices! You might also want to check out popular libraries for building interpreters and compilers! :)
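For reference, here is a minimal, self-contained sketch of the index change described above. The helper name assign_from is hypothetical (it is not in your code); it just isolates the VAR EQUALS value case so it can be run on its own. One more thing I noticed: the EXPR branch in the question calls doASSIGN with a single argument, which would raise a TypeError if it were ever reached. Storing the evaluated result as a NUM token, as done below, is my own assumption about what you would want there.

```python
symbols = {}

def doASSIGN(varname, varvalue):
    # Strip the "VAR:" prefix so the key is just "$var".
    symbols[varname[4:]] = varvalue

def assign_from(toks, i):
    """Hypothetical helper: handle VAR EQUALS <value> starting at index i.

    Returns the index of the token after the assignment.
    """
    value = toks[i + 2]  # toks[i+1] is EQUALS, so the value sits at i+2
    if value.startswith("EXPR:"):
        # Assumption: keep the evaluated expression as a NUM token.
        value = "NUM:" + str(eval(value[5:]))
    doASSIGN(toks[i], value)
    return i + 3

toks = ["VAR:$var", "EQUALS", 'STRING:"variable"']
next_i = assign_from(toks, 0)
print(symbols)  # {'$var': 'STRING:"variable"'}
```

The stored value keeps its STRING: prefix, so doPRINT-style code can still tell what kind of value it is looking at.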