Pattern-matching aspects of Data-Oriented Parsing

Guy De Pauw (University of Antwerp)


Data-Oriented Parsing (DOP) holds its ground among the best parsing
schemes, pairing state-of-the-art parsing accuracy with the
psycholinguistic insight that larger chunks of syntactic structure
are relevant grammatical and probabilistic units. Parsing with the
DOP-model, however, is computationally expensive and involves a
considerable amount of redundant work, brought on by the concept of
"derivations", which is necessary for probabilistic processing but
is not convincingly related to a proper linguistic backbone. It
is, however, possible to re-interpret the DOP-model as a
pattern-matching model, which tries to maximize the size of the
substructures that make up the parse, rather than the probability of
the parse. By emphasizing the memory-based aspect of the DOP-model,
it is possible to do away with the notion of "derivation", opening up
possibilities for efficient Viterbi-style optimizations, while still
retaining acceptable parsing accuracy through enhanced
context-sensitivity.
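
The idea of preferring parses built from large remembered substructures
can be illustrated with a minimal sketch. It is not the model described
in this work: it assumes that trees are nested tuples, that the "memory"
holds only complete subtrees of a toy treebank (no fragments with open
substitution sites), and that a candidate parse is scored by the number
of pieces needed to cover it. All function names are hypothetical.

```python
# Trees are nested tuples: (label, child, child, ...); a word is a plain str.

def subtrees(tree):
    """Yield every complete subtree rooted at an internal node."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def build_memory(treebank):
    """Remember all complete subtrees of the treebank (a simplification)."""
    return {st for tree in treebank for st in subtrees(tree)}

def chunk_count(tree, memory):
    """Count the pieces needed to cover `tree`.

    A subtree found in memory counts as one piece. Any other node counts
    as one piece plus the pieces of its children (assumption: unseen
    local structures are allowed at the cost of one piece each).
    """
    if isinstance(tree, str):
        return 0
    if tree in memory:
        return 1
    return 1 + sum(chunk_count(child, memory) for child in tree[1:])

def best_parse(candidates, memory):
    """Prefer the candidate covered by the fewest, hence largest, pieces."""
    return min(candidates, key=lambda t: chunk_count(t, memory))

treebank = [("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "stars")))]
memory = build_memory(treebank)

nested = ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "stars")))
flat = ("S", ("NP", "she"), ("V", "saw"), ("NP", "stars"))
partial = ("S", ("NP", "stars"), ("VP", ("V", "saw"), ("NP", "stars")))
```

Here best_parse([flat, nested], memory) selects the nested analysis,
because it matches a single remembered tree, whereas the flat analysis
has to be assembled from four smaller pieces. No probabilities and no
enumeration of derivations are involved in this toy scoring.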