Assigning Timecodes to Video-related Textual Documents in OLIVE

Ielka F. van der Sluis (University of Twente)

The Olive project aims to develop a multilingual indexing tool 
for broadcast material based on speech recognition, which automatically 
produces indexes from the sound track of a program (television or 
radio). Such a tool allows multimedia archives to be searched by keywords 
and corresponding fragments to be retrieved.

This talk will report on the alignment module, one of the components of Olive. It assigns timecodes to non-time-coded textual documents that describe the content of the video. Timecoding these textual documents will increase the overall level of disclosure. The assignment is based on a similarity measure between the non-time-coded texts and either subtitle files or the transcripts from speech recognition.

The core of the alignment module is a generic algorithm that generates the links on which the insertion of timecodes into non-time-coded texts is based. The data used in the testing phases are closed-caption files of NOS news broadcasts and the auto-cue files of these broadcasts. A series of experiments resulted in an algorithm enriched with a threshold related to the sentence length, and using a stoplist of high- and low-frequency terms compiled from the time-coded text under consideration.
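
The linking step described above can be illustrated with a minimal sketch. It is not the Olive implementation: the function names, the term-overlap similarity, the cutoffs max_df and min_count, and the threshold min_overlap are all assumptions chosen for illustration. The sketch compiles a stoplist from the time-coded text, scores each non-time-coded sentence against every time-coded segment by the share of its content terms found there (so the required overlap scales with sentence length), and inserts a timecode only when the best score reaches the threshold.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_stoplist(timed_segments, max_df=0.6, min_count=2):
    """Compile a stoplist from the time-coded text itself.

    Assumed cutoffs: a term is high-frequency if it occurs in more than
    max_df of the segments, and low-frequency if it occurs fewer than
    min_count times in total.
    """
    doc_freq, term_count = Counter(), Counter()
    for _, text in timed_segments:
        tokens = tokenize(text)
        term_count.update(tokens)
        doc_freq.update(set(tokens))
    n = len(timed_segments)
    high = {t for t, df in doc_freq.items() if df / n > max_df}
    low = {t for t, c in term_count.items() if c < min_count}
    return high | low


def align(untimed_sentences, timed_segments, stoplist, min_overlap=0.5):
    """Link each sentence to the best-matching time-coded segment.

    The score is the fraction of the sentence's content terms that also
    occur in the segment. Sentences whose best score stays below
    min_overlap receive no timecode (None).
    """
    links = []
    for sentence in untimed_sentences:
        terms = set(tokenize(sentence)) - stoplist
        best_time, best_score = None, 0.0
        if terms:
            for timecode, text in timed_segments:
                seg_terms = set(tokenize(text)) - stoplist
                score = len(terms & seg_terms) / len(terms)
                if score > best_score:
                    best_time, best_score = timecode, score
        if best_score < min_overlap:
            best_time = None
        links.append((best_time, sentence))
    return links


# Hypothetical sample data standing in for closed captions and auto-cue text.
captions = [
    ("00:00:05", "the prime minister visited brussels today"),
    ("00:00:42", "heavy rain caused floods in the south"),
    ("00:01:10", "the football final ended in a draw"),
]
autocue = [
    "The prime minister arrived in Brussels today",
    "Floods in the south after heavy rain",
    "Stock markets closed higher",
]

# min_count=1 switches off the low-frequency cutoff for this tiny sample.
stoplist = build_stoplist(captions, max_df=0.6, min_count=1)
links = align(autocue, captions, stoplist)
for timecode, sentence in links:
    print(timecode, sentence)
```

In this sample the third sentence shares no content terms with any caption and therefore stays without a timecode. On real broadcast data the cutoffs and the threshold would have to be tuned experimentally.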

The talk will focus on the iterative development and testing phases, and the applied similarity and performance measures.