In this section, we will first give a short characterization of the language varieties and the speakers who were recorded for our investigation. Next, we will present the nature of the recordings and the transcriptions which formed the basis for linguistic distance measurements.
Since our main interest was the Frisian language and its linguistic position within the Germanic language group, we wished to represent this language as well as possible. For this reason, we included seven Frisian varieties, spread over the Frisian language area. In addition, our material contained eight Germanic standard languages. We will first describe the Frisian varieties and then the standard languages.
As far as the Frisian varieties are concerned, we chose varieties from different parts of the province, from both the coastal area and the inland area. The varieties are spoken in different dialect areas according to the traditional classification (see below), and they represent different degrees of conservatism. The precise choice of the seven varieties was determined by the availability of speakers for recordings in our vicinity and at the Fryske Akademy in Leeuwarden. Figure 2 shows the geographical position of the seven Frisian language varieties in the province of Friesland.
Due to the absence of major geographical barriers, the Frisian language area is relatively uniform. The major dialectal distinctions are primarily phonological. Traditionally, three main dialect areas are distinguished (see e.g. Hof, 1933; Visser, 1997): Klaaifrysk (clay Frisian) in the west, Wâldfrysk (forest Frisian) in the east and Súdwesthoeksk (southwest quarter) in the southwest. In our material, Klaaifrysk is represented by the dialects of Oosterbierum and Hijum, Wâldfrysk by Wetsens and Westergeest, and Súdwesthoeksk by Tjerkgaast. Hindeloopen also lies in the Súdwesthoeksk area, but its dialect is highly conservative: the phonological distance between Hindeloopen and the main dialects is substantial (van der Veen, 2001). Finally, our material contains the variety spoken in Leeuwarden (see note 1). This is an example of Town Frisian, which is also spoken in other cities of Friesland. Town Frisian is a Dutch dialect strongly influenced by Frisian but stripped of the most characteristic Frisian elements (Goossens, 1977).
Figure 2. The geographical position of the seven Frisian language varieties in the province of Friesland.
In addition to the Frisian dialects, the following eight standard languages were included: Icelandic, Faroese, Norwegian, Swedish, Danish, English, Dutch, and German. We had meant to include all standard Germanic languages in our material. However, due to practical limitations a few smaller languages were not included.
As for Norwegian, there is no official standard variety. The varieties spoken around the capital, Oslo, in the southeast are, however, often considered to represent the standard language. We based the present investigation on prior research on Norwegian dialects (see Heeringa and Gooskens, 2003; Gooskens and Heeringa, submitted), and we chose the recording that sounded most standard to Norwegians, namely the Lillehammer recording (see note 12). It was our aim to select standard speakers from all countries, but it is possible that the speech of some speakers contains slight regional influences. The speakers from Iceland, the Faroe Islands and Sweden spoke the standard varieties of the capitals. The Danish speaker came from Jutland, the German speaker from Kiel and the English speaker from Birmingham; the Dutch speaker had lived in different places in the Netherlands, including a long period in the West during adolescence.
The speakers all read aloud translations of the same text, namely the fable ‘The North Wind and the Sun’. This text has often been used for phonetic investigations; see for example The International Phonetic Association (1949 and 1999), where the same text has been transcribed in a large number of different languages. A database of Norwegian transcriptions of the same text has been compiled by J. Almberg (see note 3). As mentioned in the previous section, we only used the transcription of Lillehammer from this database. In the future, we would like to investigate the relations between Norwegian and other Germanic varieties, using the greater part of the transcriptions in this database. Therefore, our new transcriptions should be as comparable as possible with the existing Norwegian ones. To ensure this, our point of departure was the Norwegian text. This text consists of 91 words (58 different words), which were used to calculate Levenshtein distances (see Section 4). The text was translated word for word from Norwegian into each of the Germanic language varieties. We are aware that this may result in less natural speech: the resulting sentences were often syntactically incorrect. However, it guarantees that a translation was obtained for each of the 58 words. The words were not recorded as a word list, but as sentences. Therefore, in the new recordings, words appear in contexts similar to those in the Norwegian varieties. This ensures that the influence of assimilation phenomena on the results is as comparable as possible.
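To make the distinction between the 91 running words and the 58 different words concrete, and to illustrate the kind of word-by-word comparison the text prepares for, the following is a minimal Python sketch. It uses plain unit-cost Levenshtein distance averaged over aligned word pairs; the measure actually applied is defined in Section 4 and may differ (for example in normalization or segment weighting). The function names and the example transcriptions are hypothetical and are not taken from our data.

```python
def token_and_type_counts(text):
    """Return (number of running words, number of different words)."""
    tokens = text.lower().split()
    return len(tokens), len(set(tokens))


def levenshtein(a, b):
    """Unit-cost edit distance between two sequences of segments."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_word_distance(words_a, words_b):
    """Average distance over word pairs aligned by position.

    Assumes both lists hold translations of the same words in the same
    order, as the word-for-word translation procedure is meant to ensure.
    """
    if len(words_a) != len(words_b):
        raise ValueError("word lists must be aligned one-to-one")
    total = sum(levenshtein(a, b) for a, b in zip(words_a, words_b))
    return total / len(words_a)


# Invented X-SAMPA-like strings, for illustration only.
variety_a = ["sun@", "vInt"]
variety_b = ["sOn", "vInt"]
print(token_and_type_counts("the wind and the sun"))
print(mean_word_distance(variety_a, variety_b))
```

Because every variety supplies a translation of the same words in the same order, the word pairs can be aligned by position, which is what makes a per-word average of this kind possible.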
Most new recordings were transcribed phonetically by one of the authors. To ensure consistency with the existing Norwegian transcriptions, our new transcriptions were corrected by J. Almberg, the transcriber of the Norwegian recordings. In most cases we incorporated the corrections. The transcription of the Faroese recording was done entirely by J. Almberg. The transcriptions were made in IPA as well as in X-SAMPA (eXtended Speech Assessment Methods Phonetic Alphabet). X-SAMPA is a machine-readable phonetic alphabet that is also readable by people. Basically, it maps IPA symbols to the 7-bit printable ASCII/ANSI characters (see note 13). The transcriptions were used to calculate the linguistic distances between varieties (see Section 4).
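The idea of mapping IPA symbols to printable ASCII characters can be illustrated with a small Python sketch. The table below covers only a handful of X-SAMPA correspondences, and the character-by-character lookup is a simplification: full X-SAMPA also includes multi-character symbols and diacritics that this sketch does not handle. The function name and the example word are hypothetical.

```python
# Small subset of IPA -> X-SAMPA correspondences (illustration only).
IPA_TO_XSAMPA = {
    "ʃ": "S", "ʒ": "Z", "ŋ": "N", "θ": "T", "ð": "D",
    "ə": "@", "ɛ": "E", "ɔ": "O", "ɪ": "I", "ʊ": "U",
    "ː": ":",   # length mark
    "ˈ": '"',   # primary stress
}


def ipa_to_xsampa(ipa):
    """Convert an IPA string symbol by symbol.

    Symbols missing from the table are passed through unchanged, which
    is appropriate only for plain lowercase letters that X-SAMPA shares
    with IPA; anything else would need a fuller table.
    """
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


example = ipa_to_xsampa("ʃɪp")
print(example, example.isascii())
```

Once a transcription is in this form, every segment is an ordinary ASCII character, so standard string-processing tools can be applied to it directly.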