The Romance Languages Database
For the languages other than French, Portuguese and Italian, we could not find any phonetic transcriptions in the libraries we checked (Letterenbibliotheek Utrecht, Centrale Bibliotheek Utrecht, Centrale Bibliotheek Apeldoorn). The internet proved too vast for us to locate any helpful information on phonetic transcriptions. To be able to compare the languages properly, we decided to use the Swadesh list that we found in another database made last year. The words in the Swadesh list are basic words used in every language.
All the languages we added to our database are, as we already said, Romance languages. What we want to find out is how similar they are, which languages are most similar, which word groups are most similar, and so on.
After finding the dictionaries, it still took a lot of time to translate the Swadesh list into the different languages. The phonetic transcription in particular took a lot of time.
Creating the database
Creating the actual database
The core of the database is the table with the orthographic transcription of Latin. This table contains the Latin translation of the standard Swadesh list in its orthographic form. Every entry in this table has its own identification number (ID). The table also has columns for the syntactic tagging and for the original Swadesh list in English. This table is the pivot of the database, because the ID is unique and is used throughout the entire database to relate the tables to one another.
There is also a table called Commentaar. This table holds the comments that are present in the original Swadesh list. These comments make the original words unambiguous. As everywhere in the database, the entries are linked with the ID from the Latin table. We chose a separate table because there are only a few comments, and storing them in the Latin table would leave a lot of empty records. Another reason for a separate table is that it makes it possible to insert comments for a different language. The table Commentaar can be seen as metadata, just like the table Talen. The table Talen has nothing to do with the current research but gives additional information about the translator and the sources used for the transcription.
The other tables in the database each represent the transcriptions of one language. The design of these tables is the same: a column for the ID, one for the orthographic transcription and, when it was available, a column for the phonetic transcription. Microsoft Access also needs an extra column, which is named RealID. The idea behind this structure is that the translator looks up a Latin word by its ID and translates it. The data can be entered directly in the right-hand table, but it is also possible to use a form.
With the collected data it is possible to make queries about similarities or differences between the Romance languages.
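The table layout described above can be sketched as follows. This is only a minimal illustration, not our actual Access file: it uses SQLite through Python instead of Microsoft Access, and apart from ID, RealID and the table name Commentaar, all column names (English, Tag, Orthographic, Phonetic, Comment) are assumptions made for the example. The single row is a sample entry, not an export of our data.

```python
import sqlite3

# In-memory SQLite database standing in for the Access file (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Core table: one row per Swadesh entry, keyed by ID.
CREATE TABLE Latin (
    ID           INTEGER PRIMARY KEY,
    English      TEXT NOT NULL,   -- original Swadesh word (hypothetical column name)
    Tag          TEXT,            -- syntactic tag (hypothetical column name)
    Orthographic TEXT NOT NULL    -- Latin translation
);
-- Comments live in their own table so Latin has no empty columns.
CREATE TABLE Commentaar (
    ID      INTEGER NOT NULL REFERENCES Latin(ID),
    Comment TEXT NOT NULL
);
-- Every language table has the same shape; Italian is shown here.
CREATE TABLE Italian (
    RealID       INTEGER PRIMARY KEY,  -- extra key column, as Access requires
    ID           INTEGER NOT NULL REFERENCES Latin(ID),
    Orthographic TEXT,
    Phonetic     TEXT                  -- left empty when no transcription was found
);
""")

# One sample entry, linked through the shared ID.
conn.execute("INSERT INTO Latin VALUES (1, 'earth', 'noun', 'terra')")
conn.execute("INSERT INTO Italian (ID, Orthographic) VALUES (1, 'terra')")

rows = conn.execute("""
    SELECT l.English, l.Orthographic, i.Orthographic
    FROM Latin AS l JOIN Italian AS i ON i.ID = l.ID
""").fetchall()
print(rows)
```

Because every language table refers back to the same ID, adding a language only means adding one more table of the same shape.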
We made some queries to see which words in which languages matched Latin exactly. Italian is the language with the most exact matches, and Romanian has no exact matches at all. We shall give a table with 3 of the words that match. We leave out Romanian because it has no exact matches.
We will also show how many of the words matched for each language, in each case compared to Latin.
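A query of this kind can be sketched as below. It is a hedged illustration in Python with SQLite rather than the Access queries we actually ran; the column name Orthographic and the helper exact_matches are hypothetical, and the four sample words are only a toy selection, so the counts it produces are not our results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Latin (ID INTEGER PRIMARY KEY, Orthographic TEXT)")
conn.executemany(
    "INSERT INTO Latin VALUES (?, ?)",
    [(1, "terra"), (2, "luna"), (3, "sal"), (4, "stella")],
)

# Toy sample rows (earth, moon, salt, star), in the same ID order as Latin.
languages = {
    "Italian": ["terra", "luna", "sale", "stella"],
    "Portuguese": ["terra", "lua", "sal", "estrela"],
    "French": ["terre", "lune", "sel", "étoile"],
}
for name, words in languages.items():
    conn.execute(
        f'CREATE TABLE "{name}" (ID INTEGER REFERENCES Latin(ID), Orthographic TEXT)'
    )
    conn.executemany(
        f'INSERT INTO "{name}" VALUES (?, ?)', list(enumerate(words, start=1))
    )


def exact_matches(conn, language):
    """Count the words of one language table that are spelled exactly like Latin."""
    query = f"""
        SELECT COUNT(*)
        FROM Latin AS l JOIN "{language}" AS t ON t.ID = l.ID
        WHERE lower(t.Orthographic) = lower(l.Orthographic)
    """
    return conn.execute(query).fetchone()[0]


counts = {name: exact_matches(conn, name) for name in languages}
print(counts)
```

The join on ID is what makes the comparison possible: each language word is set next to the Latin word with the same ID before the spellings are compared.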
We expected that there would be more words that are exactly the same. The words do look a lot alike, but not many are exactly the same. We think that the languages are obviously related, but underwent their own separate changes throughout the
Another conclusion we reached was that Latin influenced all of these languages. Although there are not many perfect matches, all the languages show some similarities with Latin. We will show this by giving a table with some perfect matches and very similar words.
It was difficult to compare the phonetic transcriptions with each other because you need similarly spelled words to see the difference. We did see that French words hardly ever look similar to the Italian and Portuguese words. Italian and Portuguese words always look more alike, but there are hardly any perfect matches between them either. We chose the words that looked most alike and compared their phonetic transcriptions. This is the table:
We could not figure out how to insert the Lucida Sans Unicode font into the HTML file, so these were the only examples we could give in the table. What did strike us is that Italian seems to have longer vowels than the others, and that its words have more /e/ sounds in medial and final position. Portuguese words have a schwa or nothing where Italian has the /e/ sound.
Thoughts on the database
We want to give our thoughts about how the database works and what it looks like. What we considered a problem was that it was difficult to compare the results, because the database only offers a choice between the exact matches in the different languages and the entire lists of words in one table. It would have been convenient if it had been possible to view words that are very similar in one table.
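One possible way to list very similar words, which we did not build into the database, is to score each word pair with a string similarity measure. The sketch below uses SequenceMatcher from Python's standard difflib module; the helper names and the threshold of 0.6 are assumptions chosen for illustration, and the three word pairs are sample input only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a ratio between 0 and 1; 1 means the spellings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def near_matches(latin_words, other_words, threshold=0.6):
    """Pair words by ID and keep the pairs whose spelling is similar enough."""
    rows = []
    for word_id, latin in latin_words.items():
        other = other_words.get(word_id)
        if other is None:
            continue  # no translation entered for this ID
        score = similarity(latin, other)
        if score >= threshold:
            rows.append((word_id, latin, other, round(score, 2)))
    return rows


# Sample input, keyed by ID as in the database.
latin = {1: "terra", 2: "luna", 3: "stella"}
french = {1: "terre", 2: "lune", 3: "étoile"}

for row in near_matches(latin, french):
    print(row)
```

A spelling-based score like this says nothing about pronunciation, so it would only be a first filter before looking at the phonetic transcriptions.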
Apart from this, we think making a database to compare languages is useful. It gives you the opportunity to find information about languages that look alike. This could be an enormous advantage if you are doing research on certain phenomena in languages.
In our database we also included phonetic transcriptions. As we already showed in our results, it is very difficult to draw conclusions just from looking at the phonetic transcriptions. It would be time-consuming to draw relevant conclusions, because we can only compare words that are exactly the same or very similar. This was beyond the time we had available.
Conclusions about the course
We have now come to the end of our course Databestanden. We enjoyed doing the work even though it took us a fair amount of time. The exercises we had to do in the first 5 weeks of the course took quite some time. The final assignment, making the database, was very time-consuming, but it was quite nice. Translating the Swadesh list and developing the database took most of the time. Viewing the results was rather difficult, so it took us far more time than we expected.
Overall we can say that we learned a lot in this course. We did not know anything about corpora and databases, and now we do know something about them. What we also liked is that we worked with HTML a lot and learned a lot about that as well.
We hope we did what you expected for the course. We spent a lot of time on it and enjoyed it.
Ekatherina, Albert, Khiet and Gina.