On the Boundaries of Phonology and Phonetics

1.2. Categorical nature of intonational contrasts

By intonation or speech melody we mean the pattern of rises and falls in the time-course of the pitch of spoken sentences. Melodic patterns in speech vary systematically across languages, and even within languages across dialects. The cross-linguistic differences can be parameterized and described in much the same way as has been done for the segmentals in language: a set of distinctive features defines an inventory of abstract units, which can be organized in higher-order units subject to well-formedness constraints. Moreover, intonational contrasts are used to perform grammatical functions that can also be expressed by lexico-syntactic means, such as turning statements into questions, and putting constituents in focus. For these reasons it has become widely accepted that intonation is part of the linguistic system (Ladd, 1996: 8). Yet, there have always been adherents of the view that speech melody should be considered as something outside the realm of linguistics proper, i.e., that intonation is a paralinguistic phenomenon at best, to be treated on a par with the expression of attitudes or emotions. Typically, the communication of emotions (such as anger, fear, joy, surprise) or of attitudes (such as sarcasm) is non-categorical: the speaker shows himself more or less angry, fearful, or sarcastic in a continuous, gradient fashion.

A relatively recent insight, therefore, is that melodic phenomena occurring in speech should be divided into linguistic and paralinguistic contrasts. Obviously, only the former type of phenomena should be described by the grammar and explained by linguistic theory. This, however, raises the question of how linguistic and paralinguistic phenomena can be told apart within the realm of speech melody.3 Ladd & Morton (1997) were the first to suggest that the traditional diagnostic for categorical perception should be applicable to intonational categories in much the same way as it works for segmental contrasts. Only if a peak in the discrimination function is found for adjacent members on a tone continuum straddling a boundary between tonal categories are the categories part of the linguistic system, i.e., phonological categories. If no categorical perception of the tone categories can be established, the categories are ‘just’ the extremes of a paralinguistic or phonetic tonal continuum. Ladd & Morton tested the traditional diagnostic on a tone continuum between normal and emphatic accent in English and noted that it failed. This – to me – indicates that the contrast is not part of the phonology of English.
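
As an illustration only, the logic of this diagnostic can be sketched in a few lines of Python. The sketch assumes identification scores (the proportion of responses for one category at each stimulus step) and discrimination scores for adjacent stimulus pairs. The function names, the data, and the `min_gain` threshold are hypothetical and are not taken from Ladd & Morton (1997). The sketch checks only where the discrimination peak lies, not whether it is statistically reliable.

```python
def crossover(ident, level=0.5):
    """Interpolated position (0-based, in stimulus steps) where the
    identification curve first crosses `level`; None if it never does."""
    for i in range(len(ident) - 1):
        a, b = ident[i], ident[i + 1]
        if a != b and (a - level) * (b - level) <= 0:
            return i + (level - a) / (b - a)
    return None


def peak_straddles_boundary(ident, discrim, min_gain=0.1):
    """discrim[i] is the score for the stimulus pair (i, i + 1).

    True if the best-discriminated pair straddles the identification
    boundary and exceeds the mean of the other pairs by `min_gain`
    (an arbitrary, hypothetical threshold)."""
    boundary = crossover(ident)
    if boundary is None or len(discrim) < 2:
        return False
    peak = max(range(len(discrim)), key=lambda i: discrim[i])
    others = [d for i, d in enumerate(discrim) if i != peak]
    gain = discrim[peak] - sum(others) / len(others)
    return peak <= boundary <= peak + 1 and gain >= min_gain


# Hypothetical seven-step continuum (invented numbers, not experimental data).
ident = [0.02, 0.05, 0.10, 0.30, 0.75, 0.93, 0.98]  # proportion 'B' responses
peaked = [0.52, 0.55, 0.61, 0.84, 0.60, 0.53]        # proportion correct per pair
flat = [0.55, 0.56, 0.54, 0.55, 0.56, 0.55]

print(peak_straddles_boundary(ident, peaked))
print(peak_straddles_boundary(ident, flat))
```

On this reading, a continuum of the `peaked` type would count as categorical, whereas the `flat` type would be treated as a gradient, paralinguistic continuum.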

Remijsen & van Heuven (1999, 2003) tested the traditional diagnostic on a tone continuum between ‘L%’ and ‘H%’ in Dutch, and showed that there was indeed a discrimination peak for adjacent members along the continuum straddling the boundary – indicating that the ‘L%’ and ‘H%’ categories are part of the phonology of Dutch. At the same time, however, we had to resort to listener-individual normalization of the category boundary, a complication that is not generally needed when dealing with contrasts in the segmental phonology.4

Van Heuven & Kirsner (2002) suggested that the relatively weak categorical effects in Remijsen & van Heuven could have been the result of an incorrect subdivision of the ‘L%’ to ‘H%’ tone range. That study showed that Dutch listeners were perfectly able to categorize a range of final pitches between low and high in terms of three categories, functionally denoted as command intonation, continuation, and question. However, we did not run the full diagnostic involving both identification and discrimination procedures. Moreover, Van Heuven & Kirsner forced their listeners to choose between three response alternatives, viz. command, conditional and question. Although the extremes of the range, i.e. command versus question, are unchallenged categories, it may well be the case that the conditional is not necessarily distinct from the question type. After all, in the grammar developed by ’t Hart, Collier & Cohen (1990) any type of non-low terminal pitch falls into the same category, indicating non-finality. It occurred to us that we should take the precaution of running the experiment several times, using different response alternatives, such that two separate binary response sets (‘command’ ~ ‘no command’ and ‘question’ ~ ‘no question’) as well as the ternary response set (‘command’ ~ ‘conditional’ ~ ‘question’) were used by the same set of listeners. If the intermediate ‘conditional’ response category does constitute a clearly defined notion in the listeners’ minds, the binary and ternary divisions of the stimulus range should converge on the same category boundaries.
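
The convergence argument can be made concrete with a small sketch. It assumes that a category boundary is estimated as the 50% crossover of an identification curve, and that two boundaries ‘converge’ when they lie within a chosen tolerance of each other. The tolerance of half a stimulus step, the function names, and all response proportions are hypothetical illustrations, not results of the experiments reported here.

```python
def crossover(props, level=0.5):
    """Interpolated position (0-based, in stimulus steps) where the
    response curve first crosses `level`; None if it never does."""
    for i in range(len(props) - 1):
        a, b = props[i], props[i + 1]
        if a != b and (a - level) * (b - level) <= 0:
            return i + (level - a) / (b - a)
    return None


def boundaries_converge(binary, ternary, tolerance=0.5):
    """True if both curves cross 50% within `tolerance` stimulus steps
    (`tolerance` is an arbitrary, hypothetical criterion)."""
    b, t = crossover(binary), crossover(ternary)
    if b is None or t is None:
        return False
    return abs(b - t) <= tolerance


# Invented proportions of 'command' and 'question' responses per stimulus
# step, once from a binary task and once from the ternary task.
command_binary = [0.97, 0.90, 0.70, 0.30, 0.08, 0.03, 0.01]
command_ternary = [0.96, 0.88, 0.64, 0.24, 0.06, 0.02, 0.01]
question_binary = [0.01, 0.02, 0.05, 0.20, 0.60, 0.90, 0.97]
question_ternary = [0.00, 0.01, 0.02, 0.05, 0.15, 0.40, 0.80]

print(boundaries_converge(command_binary, command_ternary))
print(boundaries_converge(question_binary, question_ternary))
```

In these invented data the ‘command’ boundary is stable across tasks, while the ‘question’ boundary shifts once the ‘conditional’ alternative is offered. A shift of that kind would be evidence against convergence of the binary and ternary divisions.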

The present paper seeks to remedy the infelicities of Van Heuven & Kirsner (2002). However, before I deal with the experiments, it is necessary to introduce the inventory of the domain-final boundary configurations that can be found in Dutch.

1.3. Dutch domain-final boundary tones

Over the past decades considerable research effort has been devoted to the formal description of the sentence melody of Dutch. In the present paper we concentrate on one small part of the intonation system of Dutch: the options that are available to the speaker to terminate an intonation phrase. It has become customary to model the intonation system of a language as a hierarchically organized structure in which the tonal primitives (or ‘atoms’) are combined into tonal configurations, which in turn combine into intonation phrases. One or more of such intonation phrases are combined into an utterance, which may combine with other utterances to form a prosodic paragraph. The intonation phrase (henceforth IP), then, is situated roughly in the middle of the prosodic hierarchy. Note that a short utterance may consist of just one IP. An IP is characterized as a stretch of speech between two IP boundaries, i.e., breaks in the segment string that are signaled by a pause (physical interruption of the sound stream), by pre-boundary lengthening, and/or by a boundary-marking tone. If the boundary is sentence medial, then yet another IP must follow in order to finish the utterance.

The first explicit and experimentally verified grammar of Dutch intonation was developed at the Institute for Perception Research at Eindhoven (’t Hart et al., 1990; Rietveld & van Heuven, 2001: 263-270). This grammar models the sentence melody of Dutch as a system of two gently declining reference lines, nominally 6 semitones (half an octave) apart, between which the pitch rises and falls in a limited number of patterns. The grammar provides for three different ways in which an IP may be terminated: (i) on the low reference line (‘0’), (ii) on the high reference line (‘’), or (iii) by executing a steep pitch rise (‘2’). Although the grammar is not completely explicit on this point, it appears that the offset of rise ‘2’ may exceed the level of the high reference line, specifically when the rise starts at the high reference line. The grammar then allows IPs to end at three different pitches: low, high, and extra high.
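
For readers less used to the semitone scale, the distance between the two reference lines can be worked out with the standard conversion between semitones and frequency ratios (12 semitones correspond to a doubling of frequency). The following is a minimal sketch; the 100 Hz value for the low reference line is an arbitrary assumption for illustration and is not a parameter of the grammar.

```python
import math


def semitones_to_ratio(semitones):
    """Frequency ratio corresponding to an interval in semitones."""
    return 2.0 ** (semitones / 12.0)


def interval_in_semitones(f_high_hz, f_low_hz):
    """Interval between two frequencies, expressed in semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


low_line_hz = 100.0  # hypothetical value of the low reference line
high_line_hz = low_line_hz * semitones_to_ratio(6)  # 6 semitones higher

print(round(high_line_hz, 1))
print(round(interval_in_semitones(high_line_hz, low_line_hz), 3))
```

Six semitones thus correspond to a frequency ratio of the square root of two, about 1.41, so that at any moment the high reference line lies roughly 41% above the low one in hertz.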

A more recent account of Dutch intonation is given by Gussenhoven and co-workers (Gussenhoven, Rietveld & Terken, 1999; Rietveld & van Heuven, 2001: 270-277). This model is constructed along the principles adopted by autosegmental intonologists, in which a sentence melody is basically a sequence of tonal targets of two types: ‘H’ (high) and ‘L’ (low). The ToDI system (Transcription of Dutch Intonation), which is an inventory of tonal configurations for surface-level transcriptions of Dutch sentence melodies using the autosegmental H/L notation format, provides three symbols for marking IP boundaries: (i) ‘L%’, i.e., the final pitch target extends below the baseline, (ii) ‘%’, i.e., the absence of a tonal IP boundary marker, and (iii) ‘H%’, i.e., the final pitch is higher than the preceding pitch.5 For details of the ToDI transcription system I refer to the ToDI website or to Rietveld & van Heuven (2001: 399-401).

Remijsen & van Heuven (1999, 2003) report an experiment which sought to establish the perceptual boundary between sentence-final statement and question intonation. They did this by varying the pitch configuration on the utterance-final syllable of the verb-less phrase De Dennenlaan(?) ‘Pine Lane(?)’ between a fall and a steep rise in eleven perceptually equal steps. Listeners were then asked to decide for each of the eleven pitch patterns whether they perceived it as a statement or a question. At the time we tacitly assumed that the continuum spanned just two pragmatic categories, i.e. statement versus question, and that there was no relevant intermediate category that could be interpreted as ‘non-finality’. In fact, Kirsner & van Heuven (1996) suggested a single abstract meaning for the non-low tonal category: ‘appeal (by the speaker to the hearer)’, asking for the hearer’s continued attention or for a verbal response to a question or a non-verbal compliance with a request. However, Caspers (1998) suggested that there is a functional difference between the non-tonal boundary (‘%’) following an earlier ‘H*’ target and the high boundary (‘H%’) following an earlier ‘H*’. She synthesized stimuli in which the accent-marking ‘H*’ was followed by either ‘H%’ (where the final pitch was raised further) or just ‘%’ (where the pitch remained high but level after the accent). Her results indicate that listeners unequivocally expect the speaker to continue after the ‘H* ... %’ configuration, in contradistinction to the ‘H* ... H%’ pattern, for which the responses were equally divided between ‘same speaker will continue’ and ‘interlocutor will take over (with a response)’.
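
How a stimulus continuum in equal steps might be constructed can be sketched as follows. The sketch assumes that equal steps on a logarithmic (semitone) scale are an acceptable approximation of perceptually equal steps; the scale actually used for the stimuli is not stated in this section, and the endpoint frequencies of 100 and 200 Hz are hypothetical.

```python
import math


def terminal_pitch_continuum(f_low_hz, f_high_hz, n_steps=11):
    """Terminal pitch targets in Hz, equally spaced on a semitone scale
    between the two endpoints (both endpoints included)."""
    total_st = 12.0 * math.log2(f_high_hz / f_low_hz)
    step_st = total_st / (n_steps - 1)
    return [f_low_hz * 2.0 ** (i * step_st / 12.0) for i in range(n_steps)]


# Hypothetical endpoints: lowest and highest terminal pitch in Hz.
steps = terminal_pitch_continuum(100.0, 200.0)
for number, hz in enumerate(steps, start=1):
    print(number, round(hz, 1))
```

Because the spacing is logarithmic, successive stimuli differ by a constant frequency ratio rather than by a constant number of hertz.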

Note that the ‘%’ tone-less boundary as studied by Caspers is found only after a preceding ‘H*’ accent. Strictly speaking, then, the ‘%’ boundary cannot be used as an intermediate category in between ‘L%’ and ‘H%’ when the preceding pitch is low. After ‘L’, any rise in pitch, whether strong or intermediate, is a perceptually relevant change in pitch, which must be coded by an ‘H%’ target. On the other hand, this formal constraint stands in the way of an attractive generalization which would allow us to view the high level pitch (‘H* ... %’) pattern as a surface realization of the ‘H*L ... %’ pattern from which the L target has been deleted – in much the same way as was suggested by Haan (2002) in order to account for the functional similarity between the ‘H* ... H%’ and the ‘H*L ... H%’ interrogative patterns, as exemplified in Figure 2.

Figure 2. Underlying tonal shape (dotted) and surface realization after ‘L’-deletion (solid) of an ‘H*L … H%’ sequence.

There seems to be a mismatch between the functions expressed by Caspers’ ‘%’ and ‘H%’ after ‘H*’. If we assume an iconic relationship between the terminal pitch of the utterance and the degree of submissiveness of the speaker towards the hearer, then we would reason that ‘H%’ should make more of an appeal to the hearer (expressing greater submissiveness) than just ‘%’. On the other hand, answering a question seems a bigger favor on the part of the hearer than merely waiting for the speaker to continue the utterance. It could be the case, of course, that even the highest terminal pitches used by Caspers were not high enough to elicit unambiguous ‘other speaker will take over’ (i.e. ‘question’) responses. Also, it is unclear if the unambiguous ‘same speaker will continue’ response crucially depends on a flat stretch of high declination (as is the case after an ‘H*’ accent) or if any terminal pitch of intermediate height would yield the same response.

In Caspers’ analysis the ‘%’ boundary – and arguably an ‘L … H%’ sequence with a moderately high terminal pitch – unambiguously signals continuation. This category would then be expected to be firmly represented in the listener’s cognitive system. Varying the terminal pitch from low to extremely high should then elicit two well-defined categories, (i) unambiguous statement for low pitches and (ii) unambiguous continuation for intermediate terminal pitches, together with (iii) a poorly defined or non-unique interrogative category, which is also compatible with a continuation reading.

At this time, then, we do not know whether two or three formal tone categories should be postulated in IP-final position. It seems that the status of ‘L%’ as a linguistic category is unchallenged but the non-low part of the IP-final tone range is very much a matter of debate. Does the non-low part of the range form a continuum expressing lesser or greater appeal by the speaker in a paralinguistic manner, or should this part of the range be split into two discrete phonological categories, each expressing a distinct meaning of its own (i.e. ‘continuation’ ~ ‘question’), or – even worse – into two categories of which one is specific for ‘continuation’ and the other underspecified and compatible with both ‘question’ and ‘continuation’? These meanings, and a possible way of testing the categorical nature of tonal contrasts expressing them, are the topic of the next section.

