000 | 00000nam c2200205 c 4500 | |
001 | 000045841301 | |
005 | 20150826162359 | |
007 | ta | |
008 | 150628s2015 ulkad bmAC 000c eng | |
040 | ▼a 211009 ▼c 211009 ▼d 211009 | |
085 | 0 | ▼a 0510 ▼2 KDCP |
090 | ▼a 0510 ▼b 6YD36 ▼c 294 | |
100 | 1 | ▼a 유인철 ▼g 兪仁哲 |
245 | 1 0 | ▼a Robust voice activity detection using formant frequencies / ▼d Inchul Yoo |
260 | ▼a Seoul : ▼b Graduate School, Korea University, ▼c 2015 | |
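300 | ▼a viii, 83 leaves : ▼b illustrations, charts ; ▼c 26 cm | |
500 | ▼a Advisor: 陸東錫 | |
502 | 1 | ▼a Thesis (doctoral)-- ▼b Graduate School, Korea University : ▼c Department of Computer and Radio Communications Engineering, ▼d 2015. 8 |
504 | ▼a Bibliography: leaves 79-83 | |
530 | ▼a Also available as a PDF file; ▼c Requires PDF file reader(application/pdf) | |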
653 | ▼a voice activity detection | |
776 | 0 | ▼t Robust Voice Activity Detection Using Formant Frequencies ▼w (DCOLL211009)000000060022 |
900 | 1 0 | ▼a Yoo, In-chul, ▼e author
900 | 1 0 | ▼a 육동석 ▼g 陸東錫, ▼e advisor
900 | 1 0 | ▼a Yook, Dong-suk, ▼e advisor
945 | ▼a KLPA |
Electronic Information
No. | Title | Service |
---|---|---|
1 | Robust voice activity detection using formant frequencies (viewed 29 times) | View PDF, Abstract, Table of Contents |
Holdings Information
No. | Location | Call Number | Accession No. | Availability | Due Date | Make a Reservation | Service |
---|---|---|---|---|---|---|---|
1 | Science & Engineering Library/Stacks(Thesis)/ | 0510 6YD36 294 | 123052365 | Available | | | |
Contents information
Abstract
Voice activity detection (VAD) distinguishes human speech from other sounds. Various applications, including speech coding and speech recognition, can benefit from VAD. To detect voice activity accurately, an algorithm must take into account the characteristic features of human speech, background noise, or both. In many real-life applications, noise occurs unexpectedly, which makes its characteristics difficult to determine accurately. Robust VAD algorithms that depend less on correct noise estimates are therefore more desirable for such applications. Formants are the major spectral peaks of the human voice and are highly useful for distinguishing human vowel sounds. Because they are spectral peaks, formants are likely to survive in a signal even after severe corruption by noise, making them attractive features for voice activity detection under low signal-to-noise ratio (SNR) conditions. However, irrelevant spectral peaks from background noise make it difficult to extract formants accurately from noisy signals. In this paper, a simple formant-based VAD algorithm is proposed that overcomes the problem of formant detection under severe noise. The proposed method has a much shorter processing time and outperforms standard VAD algorithms under various noise conditions. Its robustness against various types of noise and its light computational load make it suitable for a wide range of applications.
Table of Contents
CHAPTER 1 INTRODUCTION 1
CHAPTER 2 RELATED WORKS 5
  2.1 Speech-Related Features 7
    2.1.1 Energy and zero-crossing rate (ZCR) 7
    2.1.2 Spectral entropy 9
    2.1.3 Band-partitioned spectral entropy 10
  2.2 Statistical Methods 12
    2.2.1 Likelihood ratio test (LRT)-based method 12
    2.2.2 Distributional modeling of speech signals 14
    2.2.3 Parametric representation of speech signals 15
  2.3 G.729 Annex.B Algorithm 16
    2.3.1 Feature extraction 17
    2.3.2 Background noise parameter estimation 19
    2.3.3 Multiboundary VAD decision 20
    2.3.4 VAD decision smoothing 22
  2.4 ETSI AMR Option 1 Algorithm 23
    2.4.1 Feature extraction 24
    2.4.2 Background noise parameter estimation 24
    2.4.3 Initial VAD decision 25
    2.4.4 Hang-over addition 25
  2.5 ETSI AMR Option 2 Algorithm 26
    2.5.1 Feature extraction 27
    2.5.2 Background noise parameter estimation 28
    2.5.3 VAD decision 29
    2.5.4 Hang-over addition 31
  2.6 Summary 33
CHAPTER 3 IN-DEPTH ANALYSIS OF SIGNAL CORRUPTIONS BY NOISES 34
  3.1 Analysis of Spectral Peaks 36
  3.2 Vector Distance Metrics 39
    3.2.1 Unnormalized vector distance metric 39
    3.2.2 Normalized vector distance metric by total energies 41
    3.2.3 Normalized vector distance metric by maximum energies 43
  3.3 Spectral Peak-Based Metric 46
    3.3.1 Direct comparison of spectral peak bands 46
    3.3.2 Peak extraction-based approach 48
  3.4 Summary 50
CHAPTER 4 DIRECT SIMILARITY COMPUTATION BETWEEN PEAK SIGNATURE AND CORRUPTED SPECTRUM 51
  4.1 Peak Valley Difference (PVD) 52
    4.1.1 Analysis of differences in average energy 52
    4.1.2 VAD using average energy differences 54
    4.1.3 Remarks on PVD algorithm 55
  4.2 Peak-Neighbor Difference (PND) 56
    4.2.1 VAD using formant frequencies 56
    4.2.2 Band-limited computation for increased robustness against noises 58
    4.2.3 Threshold calculation and post processing 60
CHAPTER 5 EXPERIMENTS 61
  5.1 Experimental Conditions 61
    5.1.1 Data preparation 61
    5.1.2 Evaluation metrics 62
    5.1.3 Noise mixing using FaNT 63
    5.1.4 Baseline systems 64
    5.1.5 Test sets 64
  5.2 Aurora-2 Results 66
    5.2.1 Averaged accuracy by noise type 66
    5.2.2 Averaged accuracy by SNR level 67
  5.3 NOISEX-92 Results 68
    5.3.1 Averaged accuracy by noise type 68
    5.3.2 Averaged accuracy by SNR level 69
  5.4 Music Results 70
    5.4.1 Averaged accuracy by noise type 70
    5.4.2 Averaged accuracy by SNR level 71
  5.5 Contours of VAD algorithms 72
  5.6 Computational overheads 75
CHAPTER 6 CONCLUSION 77