November 9, 2015

Researchers create automated tool for dialect analysis

Dartmouth scientists have created an automatic speech analysis tool that pushes the technological envelope for what types of sociolinguistic dialect research are possible.

Socio-phoneticians, who study how accents vary in different communities, are often concerned with the sounds of vowels. Two people who have different accents, even within the United States, might produce their vowels with markedly different resonance frequencies. These frequencies (also known as vowel formants) give linguists a precise, quantitative way to characterize accents. Previously, all analysis of vowel formants was done manually, but there has been recent interest in using software to automate part of the process. One such program, FAVE (Forced Alignment & Vowel Extraction), which was developed at the University of Pennsylvania, automatically aligns a transcript with the speech and measures the formant values.

The bottleneck with such a program is that the speech transcripts still need to be provided by humans, which makes it difficult to analyze large amounts of speech data. Given that automatic speech recognition, or ASR, is rapidly becoming more accurate, the Dartmouth researchers wanted to study whether it would be feasible to build a tool that automatically analyzes dialect features in speech data without requiring a human transcriber. Such a tool would make it possible to quickly analyze virtually limitless hours of recordings, such as videos from YouTube, publicly available archives and large-scale personal interviews.

The Dartmouth researchers have developed a fully automated, open-access, user-friendly web application called DARLA (Dartmouth Linguistic Automation), which automatically generates transcriptions of uploaded data using speech recognition, filters out noisy tokens, and measures and plots formant frequencies in formats convenient for linguistic analysis. Part of the system uses technology from the FAVE project at Penn. It also provides several options for users needing different levels of precision in their results.
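
As a rough illustration of the "filter out noisy tokens, then measure" step described above, the following Python sketch drops vowel tokens flagged as noisy and averages the first two formants for each vowel. It is not DARLA's code: the `VowelToken` record, the use of formant bandwidth as the noise indicator, the 300 Hz cutoff, and the sample values are all assumptions made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class VowelToken:
    """One measured vowel token (hypothetical record layout)."""
    vowel: str        # vowel label, e.g. "IY"
    f1: float         # first formant in Hz
    f2: float         # second formant in Hz
    bandwidth: float  # assumed noise indicator in Hz


def filter_noisy(tokens, max_bandwidth=300.0):
    """Keep tokens at or below the (assumed) bandwidth cutoff."""
    return [t for t in tokens if t.bandwidth <= max_bandwidth]


def mean_formants(tokens):
    """Return {vowel: (mean F1, mean F2)} over the given tokens."""
    grouped = defaultdict(list)
    for t in tokens:
        grouped[t.vowel].append(t)
    return {
        vowel: (mean(t.f1 for t in group), mean(t.f2 for t in group))
        for vowel, group in grouped.items()
    }


# Illustrative values only, not real measurements.
tokens = [
    VowelToken("IY", 300.0, 2300.0, 80.0),
    VowelToken("IY", 320.0, 2250.0, 120.0),
    VowelToken("IY", 900.0, 1000.0, 650.0),  # exceeds cutoff, dropped
    VowelToken("AA", 750.0, 1100.0, 90.0),
]

clean = filter_noisy(tokens)
summary = mean_formants(clean)
for vowel, (f1, f2) in sorted(summary.items()):
    print(f"{vowel}: F1={f1:.0f} Hz, F2={f2:.0f} Hz")
```

Per-vowel F1 and F2 means of this kind are what sociophoneticians typically plot against each other to compare accents. A real system would obtain the token measurements from aligned audio, not from hand-entered values.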

The Dartmouth team has published DARLA-related work and presented a workshop at a conference last month.

"Fully automated vowel extraction methods still have a long way to go, but as ASR technologies continue to improve, we believe the DARLA system will be useful for more and more sociolinguistic research questions," says DARLA co-developer Jim Stanford, an associate professor and sociolinguist. "We anticipate that a large amount of sociolinguistic research in the future will eventually use fully automated methods like DARLA for measuring vowel data, and so our work helps take a step in that direction."

The lead researcher on the project designed, wrote and implemented the DARLA computational system, beginning with her Neukom postdoctoral fellowship at Dartmouth and continuing after that fellowship ended. Dartmouth student Irene Feng also helped with the initial website development.

Provided by Dartmouth College
