I'm trying to implement the CYK pseudo code provided by wikipedia. The example sentence I input should be outputting True, however it is outputting false. I think I'm getting issues on the indexing considering the provided example starts from 1.
Code:
def is_in_language(self, tokens):
n = len(tokens)
rules = self.grammar.lhs_to_rules
table = defaultdict(lambda: defaultdict(dict))
#Initialize dictionary table[row][column][nonterminal r] = boolean
for row in range(n+1):
for col in range(n+1):
for r in rules:
table[row][col][r] = False
for i in range(n):
nonTerminalList = self.grammar.rhs_to_rules[(tokens[i],)]
print(nonTerminalList)
for nonTerminal in nonTerminalList:
(r,right,prob) = nonTerminal
table[0][i][r] = True
for l in range(2,n+1):
for s in range(n-l+1):
for p in range(l-1+1):
for B in rules:
for C in rules:
AList = self.grammar.rhs_to_rules[B,C]
if(len(AList) > 0):
for A in AList:
(leftA, rightBC, prob) = A
try:
if(table[p][s][B] and table[l-p][s+p][C]):
table[l][s][leftA] = True
except:
pass
print(table[n][0][self.grammar.startsymbol])
return table
The following code in
pythonimplements the CYK dynamic programming algorithm (described here), which can be used to solve the membership problem for CFG, i.e., given an input string w and a CFG grammar G in chomosky normal form (CNF) it can find whether w is in L(G) in O(n^3|w|) time.Now, let's use the above implementation of algorithm for the following simple CFG G (already in CNF):
and the input string w = baaba to test membership of w in L(G).
The following animation shows how the DP table gets constructed:
A probabilistic version of this DP algorithm with back-pointers can be used for the PCFGs to construct parse trees for natural language sentences in NLP.