A Data-Oriented Approach to Tree Insertion Grammar

Lars Hoogweg, University of Amsterdam


In Data-Oriented Parsing (DOP), an annotated corpus is used as
a stochastic grammar. The most probable analysis of a new input
sentence is constructed by combining sub-analyses from the corpus
in the most probable way. This paper presents a model in which
the DOP model as developed by Bod is enriched with the insertion
operation, thus yielding a stochastic Tree Insertion Grammar (TIG).
TIG is related to Tree-Adjoining Grammar (TAG). Because the
adjunction permitted in TIG is restricted, TIG can retain the
elegance of TAG analyses without generating context-sensitive
languages. In addition to presenting the model,
the paper reports on experiments measuring the disambiguation
accuracy of this model on the ATIS domain.
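
As an illustration of what combining sub-analyses "in the most probable way" can mean, the sketch below assumes the relative-frequency estimate commonly associated with Bod's DOP model: a fragment's probability is its count divided by the count of all fragments with the same root, a derivation's probability is the product of its fragment probabilities, and an analysis sums over its derivations. The abstract does not state these formulas, so they are an assumption here. The fragment bank, its counts, and the function names are hypothetical, and the insertion operation of TIG is not modelled.

```python
from collections import Counter
from math import prod

# Hypothetical toy fragment bank: (root label, fragment) -> corpus count.
# The counts are invented for illustration only.
fragment_counts = Counter({
    ("S", "S -> NP VP"): 3,
    ("S", "S -> NP (VP flies)"): 1,
    ("NP", "NP -> she"): 2,
    ("NP", "NP -> it"): 2,
    ("VP", "VP -> flies"): 4,
})


def fragment_probability(root, fragment):
    """Relative frequency of a fragment among all fragments with the same root."""
    same_root_total = sum(
        count for (r, _), count in fragment_counts.items() if r == root
    )
    return fragment_counts[(root, fragment)] / same_root_total


def derivation_probability(derivation):
    """Product of the probabilities of the fragments used in one derivation."""
    return prod(fragment_probability(root, frag) for root, frag in derivation)


def analysis_probability(derivations):
    """Sum over all derivations that yield the same analysis (assumed estimate)."""
    return sum(derivation_probability(d) for d in derivations)


# Two derivations of the same analysis of "she flies".
d1 = [("S", "S -> NP VP"), ("NP", "NP -> she"), ("VP", "VP -> flies")]
d2 = [("S", "S -> NP (VP flies)"), ("NP", "NP -> she")]

print(derivation_probability(d1))      # 3/4 * 2/4 * 4/4
print(derivation_probability(d2))      # 1/4 * 2/4
print(analysis_probability([d1, d2]))  # sum of the two derivations
```

In this toy setting the larger fragment in d2 contributes a second derivation of the same tree, which is why the analysis probability is a sum rather than a single product.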