
Browsing by Author "Falk, Tiago H."

Now showing 1 - 3 of 3
  • Fusion of auditory inspired amplitude modulation spectrum and cepstral features for whispered and normal speech speaker verification
    (Academic Press, 2017-03-25) Sarria Paja, Milton; Falk, Tiago H.
    Whispered speech is a natural speaking style that, despite its reduced perceptibility, still conveys relevant information about the intended message (i.e., intelligibility), as well as the speaker's identity and gender. Given the acoustic differences between whispered and normally-phonated speech, however, speech applications trained on the latter but tested with the former exhibit unacceptable performance levels. Within an automated speaker verification task, previous research has shown that (i) conventional features (e.g., mel-frequency cepstral coefficients, MFCCs) do not convey sufficient speaker discrimination cues across the two vocal efforts, and (ii) multi-condition training, while improving performance for whispered speech, tends to degrade performance for normal speech. In this paper, we aim to tackle both shortcomings by proposing three innovative features which, when fused at the score level, are shown to yield reliable results for both normal and whispered speech. Overall, relative improvements of 66% and 63% are obtained for whispered and normal speech, respectively, over a baseline system based on MFCCs and multi-condition training.
  • Fusion of bottleneck, spectral and modulation spectral features for improved speaker verification of neutral and whispered speech
    (Elsevier B.V., 2018-07-27) Sarria Paja, Milton; Falk, Tiago H.
    Speech-based biometrics is becoming a preferred method of identity management amongst users and companies. Current state-of-the-art speaker verification (SV) systems, however, are known to depend strongly on the condition of the speech material provided as input, and can be affected by unexpected variability present during testing, such as environmental noise or changes in vocal effort. In this paper, SV using whispered speech is explored, as whispered speech is known to be a natural speaking style with reduced perceptibility that nonetheless carries relevant information about speaker identity and gender. We propose to fuse information from spectral, modulation spectral, and so-called bottleneck features computed via deep neural networks at the feature and score levels. Bottleneck features have recently been shown to provide robustness against train/test mismatch conditions, but have yet to be tested on whispered speech. Experimental results showed that relative improvements as high as 79% and 60% could be achieved for neutral and whispered speech, respectively, over a baseline system trained with i-vectors extracted from mel-frequency cepstral coefficients. Results from our fusion experiments show that the proposed strategies make efficient use of the limited resources available and yield whispered speech performance in line with that obtained for normal speech.
  • Variants of mel-frequency cepstral coefficients for improved whispered speech speaker verification in mismatched conditions
    (Institute of Electrical and Electronics Engineers Inc., 2017-10-26) Sarria Paja, Milton; Falk, Tiago H.
    In this paper, automatic speaker verification using normal and whispered speech is explored. Typically, for speaker verification systems, varying vocal effort in the inputs during the testing stage significantly degrades system performance. Solutions such as feature mapping or the addition of multi-style data during the training and enrollment stages have been proposed, but do not benefit all of the involved speaking styles equally. Herein, we focus on extracting speaker-dependent information that is invariant across normal and whispered speech, thus allowing for improved multi-vocal-effort speaker verification. We base our search on previously reported perceptual and acoustic insights and propose variants of the mel-frequency cepstral coefficients (MFCCs). We show the complementarity of the proposed features via three fusion schemes. Gains as high as 39% and 43% can be achieved for normal and whispered speech, respectively, relative to existing systems based on conventional MFCC features.
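All three entries above build on mel-frequency cepstral coefficients. As background, here is a minimal NumPy sketch of the conventional MFCC pipeline (framing, windowing, power spectrum, triangular mel filterbank, log compression, DCT-II); the frame sizes and filter counts are common illustrative defaults, not the papers' exact configurations.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Conventional MFCCs: frame, window, power spectrum, mel filterbank,
    log, then DCT-II to decorrelate (first n_ceps coefficients kept)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II basis built explicitly to avoid a SciPy dependency
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return energies @ basis.T

# 0.5 s of synthetic 16 kHz audio -> one 13-d MFCC vector per 10 ms hop.
sig = np.sin(2 * np.pi * 440 * np.arange(8000) / 16000)
C = mfcc(sig)
```

The MFCC variants proposed in the third paper modify stages of this pipeline (e.g., the filterbank or the spectral representation) rather than the overall structure.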
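The second entry fuses so-called bottleneck features computed via deep neural networks. A bottleneck feature is simply the activation vector of a deliberately narrow hidden layer; the toy sketch below uses random, untrained weights purely to illustrate the shapes involved (the layer sizes are hypothetical, not the paper's architecture — in practice the network would first be trained on a task such as phone classification).

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class BottleneckMLP:
    """Toy MLP with a narrow 'bottleneck' hidden layer. The bottleneck
    activations, not the network's outputs, are reused as features."""
    def __init__(self, in_dim=60, hidden=512, bottleneck=40, out_dim=1000):
        self.w1 = rng.standard_normal((in_dim, hidden)) * 0.01
        self.w2 = rng.standard_normal((hidden, bottleneck)) * 0.01
        self.w3 = rng.standard_normal((bottleneck, out_dim)) * 0.01

    def bottleneck_features(self, frames):
        """Forward pass up to the bottleneck: one 40-d vector per frame."""
        h = relu(frames @ self.w1)
        return relu(h @ self.w2)

# 100 frames of 60-d spectral input -> 100 bottleneck feature vectors.
net = BottleneckMLP()
feats = net.bottleneck_features(rng.standard_normal((100, 60)))
```

The appeal noted in the abstract is that such features, trained on plentiful neutral speech, can transfer some robustness to the mismatched whispered condition.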
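All three entries rely on score-level fusion of several verification subsystems. A common recipe (sketched here under assumption — not necessarily the papers' exact scheme) is to z-normalize each subsystem's trial scores and combine them with a weighted sum; the function names and uniform weights below are illustrative.

```python
import numpy as np

def zscore(scores):
    """Z-normalize a vector of verification scores (illustrative normalizer)."""
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / s.std()

def fuse_scores(score_lists, weights=None):
    """Weighted-sum score-level fusion: normalize each subsystem's scores,
    then combine them with (optionally uniform) weights."""
    normed = np.vstack([zscore(s) for s in score_lists])
    if weights is None:
        weights = np.full(len(score_lists), 1.0 / len(score_lists))
    return np.average(normed, axis=0, weights=weights)

# Toy example: three subsystems scoring the same four verification trials.
sys_a = [2.0, -1.0, 0.5, 1.5]
sys_b = [1.2, -0.8, 0.1, 0.9]
sys_c = [0.9, -1.1, 0.4, 1.0]
fused = fuse_scores([sys_a, sys_b, sys_c])
decisions = fused > 0.0  # accept trials whose fused score clears the threshold
```

In practice the weights and decision threshold would be tuned on a development set rather than fixed as here.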
