Empowering a Community and Enabling Linguistic Research

Through a partnership with the ARC Centre of Excellence for the Dynamics of Language (CoEDL), our team of linguists works with researchers at several universities across Australia. We use our technical and linguistic skills to further their research and community projects. Many of us are former academics, so we love being able to bridge the two worlds of academia and industry.

One ongoing project is with Dr. Hannah Sarvasy, a Research Fellow at the Australian National University. Dr. Sarvasy works with Nungon, a language spoken by about 1,000 people in a remote area in northeast Papua New Guinea. She has recently published a reference grammar of Nungon, and is currently working on a longitudinal study of child acquisition of Nungon.

Child language acquisition is an exciting area of research that explores fundamental questions about the human capacity for language, and how language is linked to cognition and cultural practices. The majority of research into child language acquisition has been in a handful of major languages – English dominates, and there are also many studies of Spanish, French, German, Mandarin, and Japanese. When we look beyond these well-known languages, we can learn a lot about how children in different cultures acquire different languages. Papua New Guinea is home to 600-800 unique languages, or roughly 10% of the world’s living languages. Yet there has been only one previous study of how children acquire a language of Papua New Guinea. The Nungon child language acquisition study is one of three currently in progress, and is therefore an important contribution to the larger goal of diversifying child language research.

One of the most exciting facets of Dr. Sarvasy’s research is the way she gets the community of speakers involved. While she’s not in the field, the Nungon team makes regular recordings of the children to track their language development, backs the recordings up to their computers, and transcribes them in Nungon.
Quite an impressive task in a region with no roads or electricity! Furthermore, they’re working on computers whose interfaces are in English, a language they don’t speak or understand.

Here our combined skills in linguistics and information technology come into play again. We are assisting Dr. Sarvasy with the technical aspects of localising some software into the Nungon language. This means that the software will present its menus, dialogs, and other text items in Nungon instead of English, so that members of the community will not face the double hurdle of mastering both computer technology and the English language at the same time.

We have already researched suitable pieces of software and obtained, for those applications, the lists of strings that Dr. Sarvasy’s Nungon consultants need to translate. We have also advised on issues such as the format of the translated strings and what needs to be done with them to make them appear when the software is run. This potentially very large project is in its early days, and to begin with we are concentrating on the Notepad++ text editor and Classic Shell, an alternative, configurable interface to the Windows Start Menu and Windows Explorer. We already have Notepad++ substantially localised; here’s a screenshot of the menu bar and File menu in Nungon:

[Screenshot: the Notepad++ menu bar and File menu in Nungon]

Once the recordings of the children have been transcribed and sent back to Australia, the files need to be transformed from their raw state into carefully annotated, translated, and sound-linked files. Transforming and annotating language data is what we do best, so we were glad to take on this challenge. Using a combination of custom-built scripts and our existing methods, we were able to automate much of this process, saving many hours of manual work. Dr. Sarvasy has now started to make this data publicly available through CHILDES.
CHILDES is an important collection of child language data that includes many languages from around the world. It was established in the 1980s, as increased computational power meant linguists could work with larger and larger amounts of data. Nungon is the first language from the entire Pacific region to be included in CHILDES. There is more fascinating information about the history and geography of the Uruwa River Region, the Nungon language, and Dr. Sarvasy’s project over on the CHILDES website: http://childes.talkbank.org/access/Other/Nungon/Sarvasy.html