Browsing by Author "Falk, Tiago H."
Now showing 1 - 3 of 3
Item
Fusion of auditory inspired amplitude modulation spectrum and cepstral features for whispered and normal speech speaker verification
(Academic Press, 2017-03-25) Sarria Paja, Milton; Falk, Tiago H.
Whispered speech is a natural speaking style that, despite its reduced perceptibility, still contains relevant information regarding the intended message (i.e., intelligibility), as well as the speaker's identity and gender. Given the acoustic differences between whispered and normally-phonated speech, however, speech applications trained on the latter but tested with the former exhibit unacceptable performance levels. Within an automated speaker verification task, previous research has shown that (i) conventional features (e.g., mel-frequency cepstral coefficients, MFCCs) do not convey sufficient speaker discrimination cues across the two vocal efforts, and (ii) multi-condition training, while improving the performance for whispered speech, tends to deteriorate the performance for normal speech. In this paper, we aim to tackle both shortcomings by proposing three innovative features which, when fused at the score level, are shown to yield reliable performance for both normal and whispered speech. Overall, relative improvements of 66% and 63% are obtained for whispered and normal speech, respectively, over a baseline system based on MFCCs and multi-condition training.

Item
Fusion of bottleneck, spectral and modulation spectral features for improved speaker verification of neutral and whispered speech
(Elsevier B.V., 2018-07-27) Sarria Paja, Milton; Falk, Tiago H.
Speech-based biometrics is becoming a preferred method of identity management amongst users and companies. Current state-of-the-art speaker verification (SV) systems, however, are known to be strongly dependent on the condition of the speech material provided as input, and can be affected by unexpected variability encountered during testing, such as environmental noise or changes in vocal effort. In this paper, SV using whispered speech is explored, as whispered speech is known to be a natural speaking style with reduced perceptibility that nonetheless conveys relevant information regarding speaker identity and gender. We propose to fuse information from spectral, modulation spectral, and so-called bottleneck features computed via deep neural networks at the feature and score levels. Bottleneck features have recently been shown to provide robustness against train/test mismatch conditions but have yet to be tested for whispered speech. Experimental results showed that relative improvements as high as 79% and 60% could be achieved for neutral and whispered speech, respectively, relative to a baseline system trained with i-vectors extracted from mel-frequency cepstral coefficients. Results from our fusion experiments show that the proposed strategies make efficient use of the limited resources available and bring whispered speech performance in line with that obtained for normal speech.
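Both entries above combine complementary subsystems by score-level fusion. The Python sketch below illustrates the general idea only; the z-normalization step and the equal mixing weight are assumptions for the example, not the papers' tuned recipe:

import numpy as np

def znorm(scores):
    # Zero-mean, unit-variance normalization so that scores from
    # different subsystems live on a comparable scale.
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / (s.std() + 1e-12)

def fuse(scores_a, scores_b, w=0.5):
    # Weighted-sum score fusion; in practice w would be tuned on
    # held-out development data rather than fixed at 0.5.
    return w * znorm(scores_a) + (1.0 - w) * znorm(scores_b)

# Five verification trials scored by two hypothetical subsystems
# (e.g., cepstral vs. modulation-spectral); a higher fused score means
# stronger evidence that test and enrollment speech share a speaker.
print(fuse([1.2, -0.3, 0.8, -1.1, 0.4], [0.9, -0.5, 1.1, -0.7, 0.2]))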
Item
Variants of mel-frequency cepstral coefficients for improved whispered speech speaker verification in mismatched conditions
(Institute of Electrical and Electronics Engineers Inc., 2017-10-26) Sarria Paja, Milton; Falk, Tiago H.
In this paper, automatic speaker verification using normal and whispered speech is explored. Typically, for speaker verification systems, inputs with varying vocal effort during the testing stage significantly degrade system performance. Solutions such as feature mapping or the addition of multi-style data during the training and enrollment stages have been proposed, but do not provide comparable gains across the speaking styles involved. Herein, we focus on extracting speaker-dependent information that is invariant across normal and whispered speech, thus allowing for improved multi-vocal-effort speaker verification. We base our search on previously reported perceptual and acoustic insights and propose variants of the mel-frequency cepstral coefficients (MFCCs). We show the complementarity of the proposed features via three fusion schemes. Gains as high as 39% and 43% can be achieved for normal and whispered speech, respectively, relative to existing systems based on conventional MFCC features.
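The third entry builds variants of conventional MFCCs. As a minimal extraction sketch, the example below uses librosa as an assumed toolchain with illustrative analysis parameters and a hypothetical file path; the band-limited call only hints at the kind of modified MFCC the paper explores, not its exact definition:

import librosa

def mfcc_features(wav_path, n_mfcc=20, fmin=0.0, fmax=None):
    # Load at 16 kHz (a typical rate for speaker verification) and
    # compute MFCCs with 32 ms frames and a 10 ms hop.
    y, sr = librosa.load(wav_path, sr=16000)
    feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                 n_fft=512, hop_length=160,
                                 fmin=fmin, fmax=fmax)
    return feats.T  # shape: (frames, n_mfcc)

# Conventional full-band MFCCs, and a band-limited variant that restricts
# the mel filterbank -- illustrative of, not identical to, the paper's
# proposed MFCC variants. "utterance.wav" is a placeholder path.
baseline = mfcc_features("utterance.wav")
variant = mfcc_features("utterance.wav", fmin=300.0, fmax=4000.0)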