Information retrieval
[ Top | Course | Literature | Seminar | Exam ]

Period 3: November 2003 - January 2004

Lecturer: Paola Monachesi
tel: 030-2536065
Computational Linguistics
Trans 10, room 2.13
3512 JK Utrecht
Office hours: Wednesday 11.00-12.00


The World Wide Web has the potential to become the primary source for storing and accessing data. However, its content is marked up in such a way that it is accessible only to humans.

Current Web search engines have serious difficulties in processing search queries. Even though they return impressive results, their level of precision and recall clearly shows their limitations.
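Precision and recall, the two measures mentioned above, can be made concrete with a small sketch. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that were retrieved. The document identifiers below are made up for illustration and do not come from any real search engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: documents returned by the search engine
    relevant:  documents a human judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 documents returned, 3 documents truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

In this example two of the four returned documents are relevant, and two of the three relevant documents were found.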

An interesting alternative is the creation of a Semantic Web in which meaning is made explicit, allowing machines to process and integrate Web resources intelligently. This technology might allow for quick and accurate web search and facilitate communication among heterogeneous web-accessible devices.
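What "making meaning explicit" could look like is sketched below, assuming statements are stored as subject-predicate-object triples in the style of RDF. This is plain Python, not a real RDF library, and every identifier with the `ex:` prefix is hypothetical.

```python
# Hypothetical facts, written as (subject, predicate, object) triples.
TRIPLES = {
    ("ex:course42", "ex:taughtBy", "ex:lecturer1"),
    ("ex:course42", "ex:hasTopic", "ex:SemanticWeb"),
    ("ex:lecturer1", "ex:worksAt", "ex:UniversityX"),
}


def match(pattern, triples=TRIPLES):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


def taught_at(course):
    """Combine two facts: who teaches the course, and where that person works."""
    return sorted(
        org
        for (_, _, who) in match((course, "ex:taughtBy", None))
        for (_, _, org) in match((who, "ex:worksAt", None))
    )


print(taught_at("ex:course42"))
```

Because the relations are named explicitly, a program can chain the two facts without any text analysis, which a keyword-based search engine cannot do.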

The aim of the course is to examine this area of research by reading and discussing papers. We will also extend the techniques developed within the Semantic Web to deal with the integration of heterogeneous linguistic data encoded in various language resources such as corpora and databases.


Searching the world wide web


Semantic web


  1. T. Berners-Lee et al. (2001). The Semantic Web. Scientific American.




  1. B.C. Vickery (1997). Ontologies. Journal of Information Science, 23 (4), pp. 277-286.

Ontology software

Linguistic Ontology


  1. Farrar, S., D. T. Langendoen (to appear 2003) A Linguistic Ontology for the Semantic Web. GLOT International.
  2. Farrar, Scott (2003) New ways of thinking about lexical resources: a proposal for the Semantic Web. Presented at the ISO Preparation Workshop on Lexicons, Nicoletta Calzolari, Peter Wittenburg (Organizers), Feb. 27, 2003, Munich, Germany.
  3. Langendoen, D. T., S. Farrar, W. D. Lewis (2002) Bridging the Markup Gap: Smart Search Engines for Language Researchers. International Workshop on Resources and Tools in Field Linguistics, prior to 3rd annual Language Resources and Evaluation Conference, May 26-27, Las Palmas, Canary Islands, Spain.
  4. Lewis, William, Farrar, Scott, and Langendoen, Terry (2001) Building a Knowledge Base of Morphosyntactic Terminology. In S. Bird, P. Buneman, and M. Liberman (Eds.) Proceedings of the IRCS Workshop on Linguistic Databases, 11-13 December 2001, pp. 150-156.
  5. A. Dimitriadis, P. Monachesi (2002): Integrating different data types in a Typological Database System. In: Proceedings of the conference on Language Resources and Evaluation (LREC 2002). ELRA. Paris.


Web Ontology Language



Agents, ontology communication and wrap up


  1. J. Hendler (1999) Is There an Intelligent Agent in Your Future? Nature.
  2. J. Hendler (2001) Agents and the Semantic Web. IEEE Intelligent Systems, Vol. 16, No. 2, March/April 2001, pp. 30-37.
  3. H. Stuckenschmidt and I. Timm (2002) Adapting Communication Vocabularies using Shared Ontologies. Proceedings of the Second International Workshop on Ontologies in Agent Systems, pp. 6-12.
  4. H. Wache, T. Voegele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann and S. Huebner (2001) Ontology-Based Integration of Information - A Survey of Existing Approaches. Proceedings of the IJCAI-01 Workshop on Ontologies and Information Sharing, Seattle, USA, August 4-5, 2001, pp. 108-118.
  5. Michel Klein (2001) Combining and relating ontologies: an analysis of problems and solutions. Proceedings of the IJCAI-01 Workshop on Ontologies and Information Sharing, Seattle, USA, August 4-5, 2001, pp. 53-62.
  6. J. Heflin and J. Hendler (2001) A Portrait of the Semantic Web in Action. IEEE Intelligent Systems, Vol. 16, No. 2, March/April 2001, pp. 60-71.
  7. Noy et al. (2001) Creating Semantic Web Contents with Protege-2000. IEEE Intelligent Systems, Vol. 16, No. 2, March/April 2001, pp. 60-71.

Day:        Location:          Time:
Wednesday   KNG80, Room 108    14.00 - 17.00
Friday      KNG29, Room 005    12.00 - 15.00


Week 1: Introduction
Week 2: Readings about the Semantic Web. Make a summary. Discussion 26/11
Week 3: Readings about searching the WWW. Make a summary. Discussion 3/12
Week 4: Readings about ontologies (papers + links).
Make a summary and make it available through the forum + discussion + questions.
Jitske, Trude and Rob make a summary about search engines and make it available through the forum.
Find information about the technical aspects of search engines.
Bjorn and Sander: correct the Semantic Web summary.
Install Protege and do the tutorial.
Discussion 10/12
Week 5: Readings about linguistic ontologies. Make a summary. Make a proposal for the project and a division of work + planning. Discussion 17/12
Week 6: Readings about Semantic Web languages. Make a summary. XML tutorial. Discussion 7/1
Week 7: Readings about agents and the Semantic Web. Make a summary. Discussion 14/1
Week 8: The project. Discussion 21/1
Week 9: The project. Wrap up


Final Project: building an ontology for linguistic concepts (Week 1-11)

Language resources such as corpora, databases, electronic dictionaries and grammars are being developed and made available to the linguistic community via the World Wide Web. These resources have often been built independently, which results in heterogeneous systems and severely limits their appropriate and efficient use. Heterogeneity arises at two levels: the technical level (different hardware platforms, operating systems, etc.) and the conceptual level (different data representations and data models for similar objects, as well as semantic differences and ambiguities).

One promising way to overcome heterogeneity at the conceptual level is to use an ontology of linguistic notions. We will develop fragments of such an ontology and explore whether the signature underlying HPSG theory (Pollard and Sag 1994) can serve as its basis.
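As a rough illustration of the idea, the sketch below treats a tiny fragment of a type hierarchy as an ontology and maps the tags of two hypothetical resources onto it. The type names follow the familiar HPSG sorts (sign, word, phrase, head, substantive, functional), but the selection, the tagsets and the resource names are assumptions made for this example; it is not the ontology the project is expected to deliver.

```python
# Illustrative fragment of a type hierarchy: each type points to its supertype.
PARENT = {
    "word": "sign",
    "phrase": "sign",
    "substantive": "head",
    "functional": "head",
    "noun": "substantive",
    "verb": "substantive",
}

# Hypothetical tagsets of two independently built resources,
# each mapped onto the shared ontology concepts.
TAG_TO_CONCEPT = {
    "corpusA": {"N": "noun", "V": "verb"},
    "databaseB": {"subst": "noun", "werkw": "verb"},
}


def ancestors(t):
    """Return the supertypes of t, from most specific to most general."""
    chain = []
    while t in PARENT:
        t = PARENT[t]
        chain.append(t)
    return chain


def subsumes(general, specific):
    """True if `general` is the same type as `specific` or one of its supertypes."""
    return general == specific or general in ancestors(specific)


def same_concept(res1, tag1, res2, tag2):
    """True if two resource-specific tags denote the same ontology concept."""
    return TAG_TO_CONCEPT[res1][tag1] == TAG_TO_CONCEPT[res2][tag2]


print(ancestors("noun"))
print(same_concept("corpusA", "N", "databaseB", "subst"))
```

Mapping each resource onto shared concepts, rather than mapping resources onto each other pairwise, is what lets a query phrased in ontology terms be answered across resources with different tagsets.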


  • Project = 50%
  • Summaries = 30%
  • Class participation = 20%

To pass the course you need a grade of at least 5.5.
You can find the results of the course here.
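The weighting above can be worked through with a short sketch. It assumes that the pass mark of 5.5 applies to the weighted final grade and that no rounding is applied; the marks used are invented.

```python
# Weights taken from the grading scheme above.
WEIGHTS = {"project": 0.5, "summaries": 0.3, "participation": 0.2}
PASS_MARK = 5.5  # assumption: applied to the weighted final grade


def final_grade(marks):
    """Weighted average of the three components."""
    return sum(WEIGHTS[part] * marks[part] for part in WEIGHTS)


# Hypothetical marks for one student.
grade = final_grade({"project": 7.0, "summaries": 6.0, "participation": 5.0})
passed = grade >= PASS_MARK
print(f"final grade {grade:.1f}, passed: {passed}")
```

Here the components contribute 3.5, 1.8 and 1.0 points respectively.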