New Discoveries: Pashto Language and Intonation Patterns

Last month, I presented research on Pashto, a language spoken in Afghanistan and Pakistan, at Interspeech in Stockholm. The paper, titled "Pashto Intonation Patterns", was co-written with Appen’s Director of Linguistic Services, Judith Bishop, and Senior Linguistic Project Manager, Mim Corris. It was based on data used in a project that began in autumn 2014 and was completed this spring.

Intonation is the movement of pitch (or tone) in a speaker’s voice. When we speak, our voices move between low and high pitch in ways that pattern differently depending on the language, though there are some universals, such as a very low tone at the end of a statement to signal completion.

The paper represents a first acoustic-phonetic analysis of Pashto intonation patterns using spontaneous speech data. We analysed a hand-labelled Pashto data set of spontaneous conversations to propose an inventory of Pashto intonation patterns. We presented the basic intonation patterns observed in the language, and we also addressed the relationship between pitch accent (where a high or low tone is associated with a certain level of emphasis) and part of speech (PoS), which was annotated for each word in the data set.

Rhythm and pitch were annotated using a simplified version of the Rhythm and Pitch (RaP) labelling system (Dilley & Brown, 2005), which was chosen for its capacity to capture both rhythmic and intonational aspects of speech in parallel. The data set was annotated in two passes. First, two trained Pashto native speakers with a solid background in acoustic phonetics annotated rhythm. Then, two linguists trained in pitch annotation annotated the pitch contour of the utterances. Inter-annotator consistency (IAC) levels were calculated at regular intervals.
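For readers curious about what an inter-annotator consistency check can look like in practice, here is a minimal sketch. The post does not say which agreement measure was used for the project, so Cohen's kappa is an assumption here, chosen only because it is a common measure for two annotators. The label set ("H" for a high accent, "L" for a low accent, "-" for no accent) and the example sequences are likewise hypothetical and are not taken from the Pashto data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: this is one common agreement measure, not necessarily
    the one used in the project described in the post.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")

    n = len(labels_a)

    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical per-syllable pitch labels from two annotators.
annotator_1 = ["H", "L", "-", "H", "-", "L", "H", "-"]
annotator_2 = ["H", "L", "-", "L", "-", "L", "H", "H"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

Running a check like this at regular intervals, on a shared subset of utterances, is one way to spot annotators drifting apart before the whole data set has been labelled.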
The results show that Pashto intonation patterns are similar to those of Persian, a better-described and closely related language. They also reflect common intonation patterns, such as a falling tone for statements and WH-questions (what, when, where, how, etc.) and a rising tone at the end of yes/no questions. The data also shows that the most frequently used intonation pattern in Pashto is the so-called hat pattern, consisting of a rising and a falling pitch movement connected by a plateau (see figure).

The phonetic realisation of contrastive focus (that is, how words are emphasised in speech when two things are contrasted) appears to be conveyed with the same acoustic cues as in Persian: a higher pitch excursion and a longer duration of the stressed syllable of the word in focus. The data also suggests that post-focus compression (PFC) is present in Pashto. That is, following a contrastively stressed word, the pitch tends to be lower and flatter, which highlights the preceding stressed word more strongly.

The distribution of pitch accent is quite free in both Persian and Pashto, but accents are more strongly associated with content words (such as nouns or adjectives) than with function words (such as prepositions), as is typical of stress-accent languages. Our preliminary examination of the association between frequency of accentuation and PoS allowed us to confirm the hypothesis that pitch accents are attracted by the PoS categories that may be considered most information-rich, namely nouns, adjectives and adverbs.

Our paper confirms the presence in Pashto of typical intonation patterns observed in better-known languages, and provides a basis for future research into the phonetic realisation of accent and the association between accent and part of speech in Pashto. This research reflects the breadth, variety, and in-depth expertise we bring to the linguistic community.

Interested in reading the full paper? Email Luca.