Bootstrapping Structure using Similarity

Menno van Zaanen (University of Leeds)

In this paper we apply a new similarity-based learning algorithm, inspired by
string edit-distance (Wagner and Fischer 1974), to the problem of bootstrapping
structure from scratch. The algorithm takes a corpus of unannotated sentences
as input and returns a corpus of bracketed sentences. The method works on
pairs of unstructured (or already partially bracketed) sentences that have
one or more words in common. Each pair is divided into the parts that the two
sentences share and the parts in which they differ. The differing parts occur
in the same context and are therefore interchangeable, so they are taken as
possible constituents of the same type. This comparison is the basic
bootstrapping step of the algorithm; further structure may be learned by
comparing a sentence with other (similar) sentences.
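
The basic bootstrapping step can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes whitespace-tokenized sentences and substitutes Python's difflib.SequenceMatcher for the edit-distance alignment, and the function names interchangeable_spans and bracket are hypothetical. Spans are half-open (start, end) word indices.

```python
from difflib import SequenceMatcher


def interchangeable_spans(words_a, words_b):
    """Return pairs of word spans in which the two sentences differ.

    Assumption: difflib's matching-block alignment stands in for the
    edit-distance alignment described in the text.
    """
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # The unequal parts sit between identical contexts, so they
            # are hypothesized to be constituents of the same type.
            pairs.append(((i1, i2), (j1, j2)))
    return pairs


def bracket(words, spans):
    """Render a sentence with brackets around the non-empty spans."""
    opens = {start for start, end in spans if end > start}
    closes = {end for start, end in spans if end > start}
    out = []
    for i, word in enumerate(words):
        if i in closes:
            out.append("]")
        if i in opens:
            out.append("[")
        out.append(word)
    if len(words) in closes:
        out.append("]")
    return " ".join(out)


a = "show me flights from Boston to Dallas".split()
b = "show me flights from Denver to Dallas".split()
pairs = interchangeable_spans(a, b)
bracketed_a = bracket(a, [span_a for span_a, _ in pairs])
bracketed_b = bracket(b, [span_b for _, span_b in pairs])
```

For these two example sentences (invented for illustration), the only differing parts are "Boston" and "Denver", so each is bracketed as a candidate constituent of the same type.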

We applied our method to the flat ATIS sentences and compared the learned brackets with the brackets in the Penn Treebank ATIS corpus. The results are encouraging (we obtained 90.0% non-crossing brackets accuracy), but we also discuss some of the shortcomings of our approach and suggest possible solutions.
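
The evaluation measure can be sketched as follows. The abstract does not define non-crossing brackets accuracy, so this sketch assumes a common reading: the fraction of learned brackets that do not cross any bracket in the treebank analysis of the same sentence. The function names and the example spans are hypothetical, and spans are half-open (start, end) word indices.

```python
def crosses(x, y):
    """True if spans x and y overlap without one containing the other."""
    (a, b), (c, d) = x, y
    return a < c < b < d or c < a < d < b


def non_crossing_accuracy(learned, gold):
    """Fraction of learned brackets that cross no gold bracket.

    Assumed definition; the abstract itself does not spell it out.
    """
    if not learned:
        return 1.0
    compatible = sum(
        1 for span in learned if not any(crosses(span, g) for g in gold)
    )
    return compatible / len(learned)


# Invented spans for a five-word sentence, for illustration only.
learned = [(0, 2), (1, 3), (4, 5)]
gold = [(0, 2), (2, 5)]
accuracy = non_crossing_accuracy(learned, gold)
```

In this invented example only the learned bracket (1, 3) crosses a gold bracket, so two of the three learned brackets count as compatible.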