The D-Coi project

The Dutch Language Corpus Initiative (D-Coi) was a project funded within the Dutch-Flemish STEVIN programme. D-Coi was a preparatory project that aimed to produce a blueprint and the tools needed to build a 500-million-word reference corpus of contemporary written Dutch. The project ran from June 2005 to December 2006 and was carried out by a joint Dutch-Flemish consortium of six academic institutions in collaboration with an industrial partner. Most partners had previously contributed to the successful compilation of the Spoken Dutch Corpus.

Semantic role annotation in D-Coi

The STEVIN programme has identified semantic annotation as one of its priorities. Within D-Coi, we are developing guidelines for the semantic role annotation of a Dutch written corpus. Our focus is on PropBank labeling and on an approach that merges PropBank and FrameNet methods.

Furthermore, we are investigating methods for automatic labeling of semantic roles. We have developed a rule-based tagger (XARA) and applied machine learning methods to a hand-labeled corpus. Details on our approach can be found in the internal documents section.