000 | 00000nam c2200205 c 4500 | |
001 | 000046071897 | |
005 | 20210326140855 | |
007 | ta | |
008 | 201231s2021 ulkd bmAC 000c eng | |
040 | ▼a 211009 ▼c 211009 ▼d 211009 | |
041 | 0 | ▼a eng ▼b kor |
085 | 0 | ▼a 0510 ▼2 KDCP |
090 | ▼a 0510 ▼b 6D36 ▼c 1120 | |
100 | 1 | ▼a 윤현욱, ▼g 尹現煜 |
245 | 1 0 | ▼a Audio dequantization in flow-based neural vocoder for the accurate waveform generation / ▼d 尹現煜 |
260 | ▼a Seoul : ▼b Graduate School, Korea University, ▼c 2021 | |
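300 | ▼a 26 leaves : ▼b charts ; ▼c 26 cm | |
500 | ▼a Advisor: 이성환 | |
502 | 0 | ▼a Thesis (Master's)-- ▼b Graduate School, Korea University: ▼c Department of Computer and Radio Communications Engineering, ▼d 2021. 2 |
504 | ▼a References: leaves 26-33 | |
530 | ▼a Also available as a PDF file; ▼c Requires PDF file reader(application/pdf) | |
653 | ▼a Speech generation model | |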
776 | 0 | ▼t Audio Dequantization in Flow-based Neural Vocoder for the Accurate Waveform Generation ▼w (DCOLL211009)000000235776 |
900 | 1 0 | ▼a 이성환, ▼g 李晟瑍, ▼e advisor |
900 | 1 0 | ▼a Yoon, Hyun Wook, ▼e author |
945 | ▼a KLPA |
Electronic Information
No. | Title | Service |
---|---|---|
1 | Audio dequantization in flow-based neural vocoder for the accurate waveform generation (15 views) | View PDF, Abstract, Table of Contents |
Holdings Information
No. | Location | Call Number | Accession No. | Availability | Due Date | Make a Reservation | Service |
---|---|---|---|---|---|---|---|
1 | Science & Engineering Library/Stacks(Thesis)/ | 0510 6D36 1120 | 123066018 | Available | | | |
2 | Science & Engineering Library/Stacks(Thesis)/ | 0510 6D36 1120 | 123066019 | Available | | | |
Contents information
Abstract
Flow-based neural vocoders, such as FloWaveNet and WaveGlow, have recently brought significant improvements to real-time speech generation systems. With these models, a sequence of randomly generated noise can be transformed into an audio waveform in parallel. However, training a model to learn a continuous target density function from quantized data can degrade its performance because of the topological difference between the target and source distributions. To resolve this issue, we propose several audio dequantization methods that can be easily applied to any flow-based neural vocoder to improve its performance. Inspired by data dequantization, a well-known method in image generation, audio dequantization helps the model learn a topologically better-fitted distribution. As a result, the degradation during inference can be reduced. We applied these audio dequantization methods to flow-based neural vocoders and investigated their effect on the generated audio. We conducted objective performance assessments and subjective evaluations to show that audio dequantization can help improve audio generation quality. In our experiments, audio dequantization produced waveforms with better harmonic structure and fewer digital artifacts.
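The abstract does not give the thesis's exact formulation, so the following is only a minimal sketch of generic uniform dequantization as it is commonly described for image data, adapted here to audio. It assumes signed 16-bit PCM samples; the function names `uniform_dequantize` and `requantize` and the `bits` and `rng` parameters are hypothetical and chosen for illustration.

```python
import math
import random


def uniform_dequantize(samples, bits=16, rng=None):
    """Spread each integer sample over its quantization bin.

    Assumes signed integer PCM with `bits` bits per sample. Noise drawn
    from U[0, 1) is added to every sample, and the result is scaled to
    the range [-1, 1).
    """
    rng = rng or random.Random()
    scale = float(2 ** (bits - 1))
    return [(s + rng.random()) / scale for s in samples]


def requantize(values, bits=16):
    """Map dequantized values back to integer samples by flooring."""
    scale = 2 ** (bits - 1)
    return [math.floor(v * scale) for v in values]


if __name__ == "__main__":
    pcm = [-32768, -1, 0, 1, 32767]
    continuous = uniform_dequantize(pcm, rng=random.Random(0))
    print(continuous)
    print(requantize(continuous))
```

Because the added noise stays within one quantization step, flooring the dequantized values recovers the original integers, so no information about the discrete signal is lost while the training targets become continuous.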
Table of Contents
1 Introduction
2 Related works
  2.1 Speech synthesis
  2.2 Neural vocoder
  2.3 Data dequantization
3 Methods
  3.1 Problem Definition
  3.2 Uniform dequantization
  3.3 Gaussian dequantization
  3.4 Variational dequantization
4 Experiment
  4.1 Experimental setting
  4.2 Multi-speaker audio generation task
  4.3 Modification on uniform dequantization
  4.4 Model generalization
5 Conclusion