Aspects of Assimilation, session 4
Partial or Complete Assimilation

LOT Summer School 1998, Thursday 25 June 1998.


The goal of today's tasks is to make synthetic versions of utterances with and without assimilation. To this end, we will use diphone speech synthesis, which is also the basis of the Fluent Dutch system presented earlier.
We use the MBROLA diphone synthesizer, developed by Thierry Dutoit and Vincent Pagel (Faculté Polytechnique de Mons), in combination with a diphone database for Dutch (NL2, built by Arthur Dirksen and Ludmila Menert).

The MBROLA synthesizer is controlled by means of command files. These files specify the phonemes to be synthesised, the duration of each phoneme, and the pitch contour.

Information about the 53 phoneme and allophone symbols for this Dutch database is available in a separate table.
This is an example of a command file for the Dutch utterance Hallo!:
; Utterance: "Hallo!"
_ 100 100 120 
h 96 
A 48 
l 76 5 100 75 120 
o 224 25 85 
_ 100 40 70 
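The first line is a comment. Comment lines start with a semicolon (;). Every other line describes one segment: the phoneme symbol, its duration in milliseconds, and optionally one or more pitch targets. Each pitch target is a pair of numbers: a position, given as a percentage of the segment's duration, and a pitch in Hz. For example, the line l 76 5 100 75 120 requests an [l] of 76 ms, with a pitch of 100 Hz at 5% and of 120 Hz at 75% of its duration. The symbol _ stands for silence.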
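To check a command file before synthesising it, it can help to read it back into a simple data structure. The following Python sketch is not part of the course software. It assumes the usual MBROLA line layout (symbol, duration in ms, then optional pairs of position in % and pitch in Hz), and the names parse_pho and HALLO are our own, chosen for illustration.

```python
# Minimal sketch of a reader for MBROLA-style command files.
# Assumption: each non-comment line is "symbol duration [position pitch]...".

HALLO = """\
; Utterance: "Hallo!"
_ 100 100 120
h 96
A 48
l 76 5 100 75 120
o 224 25 85
_ 100 40 70
"""


def parse_pho(text):
    """Return a list of (symbol, duration_ms, pitch_targets) tuples.

    pitch_targets is a list of (position_percent, pitch_hz) pairs.
    """
    segments = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (those starting with ';').
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        symbol = fields[0]
        duration_ms = float(fields[1])
        numbers = [float(x) for x in fields[2:]]
        # Pair up the remaining numbers: position, pitch, position, pitch, ...
        targets = list(zip(numbers[0::2], numbers[1::2]))
        segments.append((symbol, duration_ms, targets))
    return segments


if __name__ == "__main__":
    segments = parse_pho(HALLO)
    for symbol, duration_ms, targets in segments:
        print(symbol, duration_ms, targets)
    print("total duration (ms):", sum(d for _, d, _ in segments))
```

Summing the durations in this way is a quick sanity check: the total should be close to the length of the utterance you intend to synthesise.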

More about diphones

Diphones are short fragments of speech, recorded and processed. When you are synthesising an utterance, the appropriate diphones are taken from the database, concatenated, given the requested duration and intonation (using a PSOLA-like procedure), and converted to sound.
Hence, end users of the synthesizer have no control over the degree of assimilation in the output speech. We have to wait and hear to what extent the original speaker of the database has applied coarticulation or assimilation in the speech fragments. We can, however, request certain phonemes in the phonetic transcription, thus forcing the synthesizer to 'complete' assimilation.
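To see which units are looked up for a given transcription, recall that a diphone spans the transition between two adjacent phonemes. The Python sketch below lists the diphones needed for a phoneme sequence. The function name diphones and the "a-b" labels are our own illustration, not the naming used inside the NL2 database.

```python
def diphones(phonemes):
    """Return one label per pair of adjacent phonemes in the sequence."""
    return [f"{left}-{right}" for left, right in zip(phonemes, phonemes[1:])]


if __name__ == "__main__":
    # Phoneme sequence of the Hallo! example, including the silences.
    print(diphones(["_", "h", "A", "l", "o", "_"]))
    # prints ['_-h', 'h-A', 'A-l', 'l-o', 'o-_']
```

A sequence of n phonemes thus requires n - 1 diphones. Changing a single phoneme in the transcription replaces both diphones that contain it, which is why a transcription change is the only handle we have on assimilation.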


1. Log in to your Unix account, make a new directory difoon and change to that directory:
	mkdir difoon
	cd difoon

2. Copy the following archive to your difoon directory:

	cp /www/users/Hugo.Quene/onderwijs/tns9798/difoon/difoon.tar .

3. Unpack the archive into separate files:

	tar xovf difoon.tar

4. Enter the following command:

	dsyn hallo
If all has gone well, the MBROLA synthesizer now converts the command file hallo.pho into the sound file hallo.aiff, which is then played. You will hear Hallo. You can play the audio file again with one of the following commands:
	playaiff hallo.aiff  --or--
	sfplay hallo.aiff    --or--
	usplay hallo.aiff  

1. A bad example

Copy the command file zoutzuur.pho to your difoon directory, and synthesise this file with the command dsyn zoutzuur. What is wrong with it? Inspect the command file, with a text editor or with the Unix command cat zoutzuur.pho.

2. You can do better

Open the command file zoutzuur.pho in a text editor. [*] Adjust the durations of the critical VCCV segments. Save the file, and synthesize it again. Do both realisations sound perfect to your ears? If not, go back to the point marked [*] above. What is the ratio between C and V2 durations, in both realisations? What is the sum of their durations?
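The ratio and the sum asked for above are simple arithmetic on the durations in your command file. A small Python sketch follows. The durations in it are invented placeholders, not values from zoutzuur.pho, and the function name is our own. Fill in the values you ended up with.

```python
def c_v2_ratio_and_sum(c_ms, v2_ms):
    """Return (C/V2 ratio, C + V2 in ms) for one realisation."""
    return c_ms / v2_ms, c_ms + v2_ms


if __name__ == "__main__":
    # Hypothetical durations in ms; replace them with your own values.
    realisations = [("realisation A", 120, 80), ("realisation B", 60, 140)]
    for label, c_ms, v2_ms in realisations:
        ratio, total = c_v2_ratio_and_sum(c_ms, v2_ms)
        print(f"{label}: C/V2 = {ratio:.2f}, C + V2 = {total} ms")
```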

3. Now on your own

Using the now perfect file zoutzuur.pho as an example (e.g. for the pitch contour), your task is now to make synthetic versions of all utterances which were measured in session 2.
Make versions both with (complete) assimilation and without assimilation. (Use the phoneme table to determine the appropriate transcription symbols.) Do this for corresponding 'viable' and 'unviable' cases.
Compare the synthetic versions with the natural ones. You can even specify the 'natural' segment durations in the command files. How does that affect the perceived segmentation?
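If you have your measured durations in a list, you need not type each command file by hand. The Python sketch below writes segments in the command-file layout used for the Hallo! example (symbol, duration in ms, optional position/pitch pairs). The function name format_pho and the values in the example are hypothetical. Substitute your own transcription and measurements.

```python
def format_pho(segments, comment=None):
    """Format (symbol, duration_ms, pitch_targets) tuples as command-file text.

    pitch_targets is a list of (position_percent, pitch_hz) pairs.
    """
    lines = []
    if comment:
        lines.append(f"; {comment}")
    for symbol, duration_ms, targets in segments:
        fields = [symbol, str(duration_ms)]
        for position_percent, pitch_hz in targets:
            fields += [str(position_percent), str(pitch_hz)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical durations (ms) and pitch targets, for illustration only.
    measured = [
        ("_", 100, [(100, 120)]),
        ("h", 90, []),
        ("A", 55, []),
        ("l", 70, [(5, 100), (75, 120)]),
        ("o", 210, [(25, 85)]),
        ("_", 100, [(40, 70)]),
    ]
    text = format_pho(measured, comment="Hallo with measured durations")
    print(text, end="")
```

Redirecting the output to a file ending in .pho in your difoon directory gives a command file that can be synthesised in the same way as hallo.pho.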

Last updated on June 24, 1998, by Hugo Quené.