
Record Details

Audio dequantization in flow-based neural vocoder for the accurate waveform generation

Material type
Thesis
Personal author
윤현욱, 尹現煜 (Yoon, Hyun Wook)
Title / Statement of responsibility
Audio dequantization in flow-based neural vocoder for the accurate waveform generation / 尹現煜
Publication
Seoul : Graduate School, Korea University, 2021
Physical description
26 leaves : charts ; 26 cm
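Additional physical form entry
Audio Dequantization in Flow-based Neural Vocoder for the Accurate Waveform Generation (DCOLL211009)000000235776
Dissertation note
Thesis (Master's) -- Graduate School, Korea University: Department of Computer and Radio Communications Engineering, February 2021
Department code
0510 6D36 1120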
General note
Advisor: 이성환
Bibliography note
References: leaves 26-33
Other available forms
Also available as a PDF file; requires a PDF file reader (application/pdf)
Uncontrolled subject terms
Speech generation model
000 00000nam c2200205 c 4500
001 000046071897
005 20210326140855
007 ta
008 201231s2021 ulkd bmAC 000c eng
040 ▼a 211009 ▼c 211009 ▼d 211009
041 0 ▼a eng ▼b kor
085 0 ▼a 0510 ▼2 KDCP
090 ▼a 0510 ▼b 6D36 ▼c 1120
100 1 ▼a 윤현욱, ▼g 尹現煜
245 1 0 ▼a Audio dequantization in flow-based neural vocoder for the accurate waveform generation / ▼d 尹現煜
260 ▼a Seoul : ▼b Graduate School, Korea University, ▼c 2021
300 ▼a 26장 : ▼b 도표 ; ▼c 26 cm
500 ▼a 지도교수: 이성환
502 0 ▼a 학위논문(석사)-- ▼b 고려대학교 대학원: ▼c 컴퓨터·전파통신공학과, ▼d 2021. 2
504 ▼a 참고문헌: 장 26-33
530 ▼a PDF 파일로도 이용가능; ▼c Requires PDF file reader(application/pdf)
653 ▼a 음성 생성 모델
776 0 ▼t Audio Dequantization in Flow-based Neural Vocoder for the Accurate Waveform Generation ▼w (DCOLL211009)000000235776
900 1 0 ▼a 이성환, ▼g 李晟瑍, ▼e 지도교수
900 1 0 ▼a Yoon, Hyun Wook, ▼e
945 ▼a KLPA

Electronic Resources

1. Audio dequantization in flow-based neural vocoder for the accurate waveform generation (PDF)

Holdings

No.  Location                          Call number      Accession no.  Status
1    Science Library / Thesis Stacks   0510 6D36 1120   123066018      Available
2    Science Library / Thesis Stacks   0510 6D36 1120   123066019      Available

Content Information

Abstract

Flow-based neural vocoders, such as FloWaveNet and WaveGlow, have recently brought significant improvements to real-time speech generation. These models transform a sequence of randomly sampled noise into an audio waveform in parallel. However, training such a model to learn a continuous target density function from quantized data can degrade its performance because of the topological difference between the target and source distributions.
To resolve this issue, we propose several audio dequantization methods that can be easily applied to any flow-based neural vocoder to improve model performance. Inspired by data dequantization, a well-known technique in image generation, audio dequantization helps the model learn a distribution that is a better topological fit for the data. As a result, degradation during inference can be reduced. We applied various audio dequantization methods to flow-based neural vocoders and investigated their effect on the generated audio.
We conducted various objective performance assessments and subjective evaluations to show that audio dequantization can help improve audio generation quality. Our experiments show that audio dequantization produces waveforms with better harmonic structure and fewer digital artifacts.
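The abstract attributes the degradation to fitting a continuous density to quantized samples. A closely related and widely cited way to see the problem comes from the image-dequantization literature, not from this thesis: on integer-valued data, a continuous model can raise its likelihood without bound by stacking ever-narrower peaks on the discrete levels. The minimal Python sketch below illustrates this under stated assumptions. The toy data, the equal-weight Gaussian mixture, and the helper name `mixture_log_likelihood` are all hypothetical.

```python
import math

def mixture_log_likelihood(data, centers, sigma):
    """Average log-density of `data` under an equal-weight mixture of
    Gaussians with std `sigma`, one component centered on each value in
    `centers`. Hypothetical helper for illustration only."""
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for x in data:
        # Log-density of x under each mixture component.
        comps = [-0.5 * ((x - c) / sigma) ** 2 - log_norm for c in centers]
        # Log-sum-exp keeps the mixture density from underflowing to 0.
        m = max(comps)
        total += m + math.log(sum(math.exp(lp - m) for lp in comps))
        total -= math.log(len(centers))  # equal mixture weights 1/K
    return total / len(data)

# Toy "quantized" samples: only integer levels ever occur.
data = [-2, -1, 0, 0, 1, 1, 1, 2]
levels = sorted(set(data))

sigmas = (1.0, 0.1, 0.01, 0.001)
scores = [mixture_log_likelihood(data, levels, s) for s in sigmas]
for s, ll in zip(sigmas, scores):
    # The score keeps rising as sigma shrinks (roughly like -log(sigma)).
    print(f"sigma={s:<6} average log-likelihood={ll:.2f}")
```

Maximum likelihood therefore rewards collapsing onto the quantization grid rather than modeling the waveform. If each integer sample is instead spread uniformly over its bin [x, x + 1), the expected log-density of the noisy data is bounded above by the log-probability of the discrete data. This bound removes the incentive to collapse.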

Table of Contents

1 Introduction
2 Related works
2.1 Speech synthesis
2.2 Neural vocoder
2.3 Data dequantization
3 Methods
3.1 Problem Definition
3.2 Uniform dequantization
3.3 Gaussian dequantization
3.4 Variational dequantization
4 Experiment
4.1 Experimental setting
4.2 Multi-speaker audio generation task
4.3 Modification on uniform dequantization
4.4 Model generalization
5 Conclusion
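Uniform dequantization, listed as Section 3.2 in the contents, is commonly defined in the image-generation literature as adding noise u ~ U[0, 1) to each integer value before training. The minimal Python sketch below applies that idea to 16-bit PCM audio. It assumes int16 samples scaled by 32768 into [-1, 1). This record does not state the thesis's exact noise range or normalization, and the names `uniform_dequantize`, `requantize`, and `SCALE` are hypothetical.

```python
import math
import random

INT16_MIN, INT16_MAX = -32768, 32767
SCALE = 32768.0  # assumed normalization: int16 -> [-1, 1)

def uniform_dequantize(samples, rng=None):
    """Replace each int16 sample x with (x + u) / SCALE, u ~ U[0, 1).

    The noise spreads every sample uniformly over its quantization bin
    [x, x + 1), so the training data are continuous instead of sitting
    on a discrete grid."""
    if rng is None:
        rng = random.Random()
    out = []
    for x in samples:
        if not INT16_MIN <= x <= INT16_MAX:
            raise ValueError(f"sample {x} is outside the int16 range")
        v = x + rng.random()  # random() returns a float in [0, 1)
        # Float rounding can push x + u up to exactly x + 1; clamp to the
        # largest float below x + 1 (math.nextafter needs Python 3.9+).
        v = min(v, math.nextafter(x + 1, x))
        out.append(v / SCALE)
    return out

def requantize(values):
    """Map dequantized values back to their int16 bins by flooring."""
    return [math.floor(v * SCALE) for v in values]

rng = random.Random(0)
pcm = [-32768, -1, 0, 1, 12345, 32767]
noisy = uniform_dequantize(pcm, rng)
print(noisy)
print(requantize(noisy) == pcm)  # True: the noise never leaves its bin
```

Dividing and multiplying by a power of two is exact in floating point, and the noise stays inside each bin. As a result, flooring recovers the original samples exactly. Sections 3.3 and 3.4 cover Gaussian and variational dequantization, which change the noise distribution; this sketch does not attempt them.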