Publication: Vision perceptually restores auditory spectral dynamics in speech

 

In this article, we demonstrate that the human perceptual system exploits the natural correlation between mouth shape and the frequency of the auditory signal to facilitate speech perception. Because of the acoustic physics of the oral cavity, the changing shape of the mouth (i.e., how wide or narrow it is) predicts the frequencies of the oral resonances ("formants") produced during speech. For example, when the lips are protruded (visually narrow), the oral cavity is elongated, producing a lower resonant frequency, much as extending the slide of a trombone lowers its pitch. By contrast, when the lips are retracted (visually wide), the oral cavity is shortened, producing a higher resonant frequency. In our study, participants were able to use visual speech cues to perceptually recover auditory frequency cues that had been digitally degraded. This process appeared to occur automatically, even when participants did not recognize the audiovisual cues as being related to speech. Altogether, our results suggest that the perceptual system uses natural correlations between midlevel visual features (oral deformations) and auditory speech features (frequency modulations) to facilitate speech perception.
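As a rough illustration of this relationship (a standard textbook idealization that treats the vocal tract as a uniform tube closed at the glottis and open at the lips, not a formula from our study), the lowest resonance of such a tube is

\[
F_1 \approx \frac{c}{4L},
\]

where c is the speed of sound (about 350 m/s in the warm, humid air of the vocal tract) and L is the length of the tube. For a typical adult vocal-tract length of about 17.5 cm, this gives F1 ≈ 35,000 cm/s / (4 × 17.5 cm) ≈ 500 Hz. Lengthening the tube (e.g., by protruding the lips) lowers this resonance, while shortening it (e.g., by retracting the lips) raises it.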