This talk reports on the alignment module, one of the components of Olive. The module assigns timecodes to non-time-coded textual documents that describe the content of the video. Time-coding these documents increases the overall level of disclosure. The assignment is based on a similarity measure between the non-time-coded texts and either subtitle files or speech recognition transcripts.
The core of the alignment module is a generic algorithm that generates the links on which the insertion of timecodes into non-time-coded texts is based. The test data consist of closed-caption files of NOS news broadcasts and the autocue files of the same broadcasts. A series of experiments resulted in an algorithm extended with a threshold related to sentence length and with a stoplist of high- and low-frequency terms compiled from the time-coded text under consideration.
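As a rough illustration of the kind of linking step described above, the following Python sketch aligns untimed sentences to time-coded segments. It is not the Olive implementation: the term-overlap similarity, the function names (tokenize, build_stoplist, align) and the parameters (ratio, high_fraction, min_count) are all assumptions chosen for the example. The stoplist is derived from the time-coded text itself, and the acceptance threshold grows with the length of the sentence.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_stoplist(timed_segments, high_fraction=0.5, min_count=1):
    """Compile a stoplist from the time-coded text itself (hypothetical rule).

    A term is stopped if it occurs in more than `high_fraction` of the
    segments (high-frequency) or fewer than `min_count` times overall
    (low-frequency). With min_count=1 no low-frequency term is dropped.
    """
    total = Counter()      # occurrences over all segments
    doc_freq = Counter()   # number of segments containing the term
    for _, text in timed_segments:
        tokens = tokenize(text)
        total.update(tokens)
        doc_freq.update(set(tokens))
    n = len(timed_segments)
    return {
        term for term in total
        if doc_freq[term] / n > high_fraction or total[term] < min_count
    }


def align(untimed_sentences, timed_segments, ratio=0.5,
          high_fraction=0.5, min_count=1):
    """Link each untimed sentence to the start time of its best segment.

    `timed_segments` is a list of (start_time, text) pairs. A link is only
    accepted if the number of shared terms reaches ceil(ratio * sentence
    length), an assumed form of a sentence-length-related threshold.
    Sentences without an acceptable link get None.
    """
    stop = build_stoplist(timed_segments, high_fraction, min_count)
    timed_terms = [(start, set(tokenize(text)) - stop)
                   for start, text in timed_segments]
    links = []
    for sentence in untimed_sentences:
        terms = set(tokenize(sentence)) - stop
        if not terms:
            links.append((sentence, None))
            continue
        required = math.ceil(ratio * len(terms))
        best_start, best_overlap = None, 0
        for start, seg_terms in timed_terms:
            overlap = len(terms & seg_terms)
            if overlap > best_overlap:
                best_start, best_overlap = start, overlap
        accepted = best_overlap >= required
        links.append((sentence, best_start if accepted else None))
    return links
```

Calling align with a list of sentences and a list of (start_time, text) pairs returns one (sentence, timecode) pair per sentence. On realistic data, min_count would probably be raised above 1 so that rare terms, for example recognition errors, are excluded as well.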
The talk will focus on the iterative development and testing phases, and the applied similarity and performance measures.