I've implemented the CYK parsing algorithm which uses a bottoms-up approach to build a parse tree. As it traverses the algorithm, the path to the ultimate solution is stored in backpointers. From the backpointers, we construct the tree. This final step is what I am having issues with.
This is the data structure I'm using to store the tree:
class GrammarTree(object):
def __init__(self, data):
self.data = data
self.left = None
self.right = None
def insertLeft(self, new_node):
self.left = GrammarTree(new_node)
def insertRight(self, new_node):
self.right = GrammarTree(new_node)
The following is how I build the tree, where back stores a tuple where split is the index used to split the tree and left_rule and right_rule are the rules for the respective tree represented by int. If a leaf node is reached there is no tuple, just an int representing the terminal rule.
def build_tree(start,end,idx,back):
tree = GrammarTree(idx)
node = back[start][end][idx]
if isinstance(node,tuple):
split,left_rule,right_rule = node
tree.insertLeft(build_tree(start,split,left_rule,back))
tree.insertRight(build_tree(split,end,right_rule,back))
return tree
else:
tree.insertLeft(GrammarTree(node))
return tree
The problem is that when function is done building the tree, there are extra branches being added, i.e. the nodes aren't being properly glued together.
This is what it looks like:
Lvl0 root
/ \
Lvl1 L1 R1
/ | \ / | \
/ | \ / | \
/ | \ / | \
/ | \ / | \
/ | \ / | \
Lvl2 L1.left=None L1.right=None L1.data R1.left=None R1.right=None R1.data
/ \ / \
Lvl3 L2 R2 L3 R3
There shouldn't be a data node between the trees.
Edit:
The problem is not that there is an extra data node (above statement is wrong), it's that after Lvl1, instead of new branches being added to L1.left/right and R1.left/right on Lvl2, they are added to L1 and R1's data fields. So L1/R1.data ends up being a tree in and of itself and L1.left/right and R1.left/right are not used and therefore None.
It should look like this:
root
/ \
/ \
L1=root.left R1=root.right
/ \ / \
/ \ / \
/ \ / \
L2=L1.left R2=L1.right L3=R1.left R3=R1.right
This is where I call build tree:
back = [[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (1, 6, 7), 0, 3, 0, (1, 7, 7)], [0, 0, 0, (1, 6, 7), 0, (1, 7, 3), 0, (1, 7, 7)], [0, 0, 0, (1, 6, 7), 0, (2, 7, 3), 0, (1, 7, 7)]],\
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (2, 6, 7), 0, (2, 7, 3), 0, (2, 7, 7)], [0, 0, 0, (2, 6, 7), 0, (2, 7, 3), 0, (3, 7, 7)]],\
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (3, 6, 7), 0, 3, 0, (3, 7, 7)]],\
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2]],\
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]]
build_tree(0,4,5,back)
The problem was in the
insertLeft()andinsertRight()methods of theGrammarTreeclass. Instead of simply connecting the branches, you'll seet that I was calling theGrammarTreeconstructor so I was essentially wrapping a tree inside another tree.I fixed the problem by removing the call to the constructor.