Audio dequantization in flow-based neural vocoder for the accurate waveform generation

Material type
Thesis
Personal Author
Yoon, Hyun Wook (윤현욱, 尹現煜)
Title Statement
Audio dequantization in flow-based neural vocoder for the accurate waveform generation / 尹現煜
Publication, Distribution, etc
Seoul : Graduate School, Korea University, 2021
Physical Medium
26 leaves : charts ; 26 cm
Additional Physical Form Entry
Audio Dequantization in Flow-based Neural Vocoder for the Accurate Waveform Generation (DCOLL211009)000000235776
Dissertation Note
Thesis (Master's) -- Graduate School, Korea University: Department of Computer and Radio Communications Engineering, 2021. 2
Department Code
0510 6D36 1120
General Note
Advisor: 이성환
Bibliography, Etc. Note
Bibliography: leaves 26-33
Additional Physical Form Available
Also available as a PDF file; requires a PDF file reader (application/pdf)
Uncontrolled Index Term
Speech generation model
000 00000nam c2200205 c 4500
001 000046071897
005 20210326140855
007 ta
008 201231s2021 ulkd bmAC 000c eng
040 ▼a 211009 ▼c 211009 ▼d 211009
041 0 ▼a eng ▼b kor
085 0 ▼a 0510 ▼2 KDCP
090 ▼a 0510 ▼b 6D36 ▼c 1120
100 1 ▼a 윤현욱, ▼g 尹現煜
245 1 0 ▼a Audio dequantization in flow-based neural vocoder for the accurate waveform generation / ▼d 尹現煜
260 ▼a Seoul : ▼b Graduate School, Korea University, ▼c 2021
300 ▼a 26장 : ▼b 도표 ; ▼c 26 cm
500 ▼a 지도교수: 이성환
502 0 ▼a 학위논문(석사)-- ▼b 고려대학교 대학원: ▼c 컴퓨터·전파통신공학과, ▼d 2021. 2
504 ▼a 참고문헌: 장 26-33
530 ▼a PDF 파일로도 이용가능; ▼c Requires PDF file reader(application/pdf)
653 ▼a 음성 생성 모델
776 0 ▼t Audio Dequantization in Flow-based Neural Vocoder for the Accurate Waveform Generation ▼w (DCOLL211009)000000235776
900 1 0 ▼a 이성환, ▼g 李晟瑍, ▼e 지도교수
900 1 0 ▼a Yoon, Hyun Wook, ▼e
945 ▼a KLPA

Electronic Information

No. | Title
1 | Audio dequantization in flow-based neural vocoder for the accurate waveform generation (viewed 15 times)

Holdings Information

No. | Location | Call Number | Accession No. | Availability
1 | Science & Engineering Library/Stacks(Thesis) | 0510 6D36 1120 | 123066018 | Available
2 | Science & Engineering Library/Stacks(Thesis) | 0510 6D36 1120 | 123066019 | Available

Contents information

Abstract

Flow-based neural vocoders, such as FloWaveNet and WaveGlow, have recently brought significant improvements to real-time speech generation systems. With these models, a sequence of randomly generated noise can be transformed into an audio waveform in parallel. However, training the model to learn a continuous target density function from quantized data can degrade its performance because of the topological difference between the target and source distributions.
To resolve this issue, we propose various audio dequantization methods that can be easily applied to any flow-based neural vocoder and improve model performance. Inspired by data dequantization, a well-known method in image generation, audio dequantization helps the model learn a topologically better-fitted distribution. As a result, degradation during inference can be reduced. We applied various audio dequantization methods to flow-based neural vocoders and investigated their effect on the generated audio.
We conducted various objective performance assessments and subjective evaluations to show that audio dequantization can help improve audio generation quality. In our experiments, audio dequantization produced waveforms with better harmonic structure and fewer digital artifacts.
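
As a rough illustration of the idea the abstract borrows from image generation, the sketch below applies uniform dequantization to integer PCM samples: each quantized value is spread over its own quantization bin by adding uniform noise, so a flow model sees continuous-valued data. This is only a minimal sketch under assumptions not stated in this record: the function name `uniform_dequantize`, the 16-bit depth, the noise range U[0, 1), and the rescaling to [-1, 1) are all illustrative and are not taken from the thesis, whose exact formulation may differ.

```python
import numpy as np

def uniform_dequantize(pcm, bits=16, rng=None):
    """Spread integer PCM samples over their quantization bins (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits
    # Shift signed integers to the non-negative range [0, levels - 1].
    shifted = pcm.astype(np.float64) + levels // 2
    # u ~ U[0, 1): integer bin k becomes a value in the interval [k, k + 1).
    noisy = shifted + rng.uniform(0.0, 1.0, size=pcm.shape)
    # Rescale to roughly [-1, 1) so the signal is continuous-valued.
    return noisy / levels * 2.0 - 1.0

# Usage: dequantize a few 16-bit samples (assumed bit depth).
samples = np.array([-32768, -1, 0, 1, 32767], dtype=np.int16)
continuous = uniform_dequantize(samples, rng=np.random.default_rng(0))
```

Each output stays within one quantization step above its normalized input, so the original integer sample can still be recovered by flooring.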

Table of Contents

1  Introduction
2  Related works
2.1 Speech synthesis
2.2 Neural vocoder
2.3 Data dequantization
3  Methods
3.1 Problem Definition
3.2 Uniform dequantization
3.3 Gaussian dequantization
3.4 Variational dequantization
4  Experiment
4.1 Experimental setting
4.2 Multi-speaker audio generation task
4.3 Modification on uniform dequantization
4.4 Model generalization
5  Conclusion