The production rules of a context-free grammar are formalised as pairs, i.e. just a relation R:
(α,β) ∈ R
where α is a non-terminal and β is either a terminal or a non-terminal.
Thus S → A could be written as (S,A) ∈ R.
But when parsing tagged natural language trees for probabilistic CFGs, many of the rules are of the form:
NP → NNP POS
That is, the right-hand side is not always a single terminal or non-terminal.
Is there a way of formalising these production rules? As I can't see the relation method working...
unless they were perhaps more like (NP → NNP) → POS
Or is it that they are not the exact production rules?
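A context-free grammar is defined by a four-tuple (V, T, P, S):
V, a set of non-terminals
T, a set of terminals
P, a set of productions of the form v → ω, where v ∈ V and ω ∈ (V ⋃ T)*
S ∈ V, the start symbol

Technically, you could derive V and T from P. However, everyone does roughly as above (with some variation of names, and occasionally using V and V ⋃ T as primitives instead of V and T).

The important point (in the definition of P above) is that the right-hand side of a production is not "a terminal or a non-terminal" but rather "an element of (V ⋃ T)*", that is, a finite sequence of symbols, possibly empty. The pair notation still works: NP → NNP POS is simply the pair (NP, NNP POS), whose second component is a two-symbol string. If you couldn't expand a non-terminal into more than one symbol, your language would only consist of single-element strings.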
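To make the pair view concrete, here is a minimal Python sketch of productions stored as (α, β) pairs where β is a tuple of symbols rather than a single symbol. The toy grammar, the words in `T`, and the helper name `is_valid_production` are all made up for illustration; they are not taken from any particular treebank or library.

```python
# Hypothetical toy grammar; every symbol and word here is illustrative only.
V = {"NP", "NNP", "POS"}   # non-terminals (phrase labels and tags)
T = {"John", "'s"}         # terminals (the actual words)

# Each production is a pair (alpha, beta): alpha is a single non-terminal,
# beta is a tuple of symbols, i.e. an element of (V ∪ T)* of any length.
R = {
    ("NP", ("NNP", "POS")),  # NP -> NNP POS
    ("NNP", ("John",)),      # NNP -> John
    ("POS", ("'s",)),        # POS -> 's
}

def is_valid_production(rule, nonterminals, terminals):
    """Check that alpha is in V and every symbol of beta is in V ∪ T."""
    alpha, beta = rule
    symbols = nonterminals | terminals
    return alpha in nonterminals and all(sym in symbols for sym in beta)

# True when every pair in R is a well-formed context-free production.
all_valid = all(is_valid_production(rule, V, T) for rule in R)
```

With this representation, NP → NNP POS is still one element of the relation; only the type of the second component changes from "symbol" to "sequence of symbols".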