Detailed Information

Word embedding technique with channel attention mechanism for Korean language

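Material type
Thesis
Personal author
권오준, 權五俊
Title / Statement of responsibility
Word embedding technique with channel attention mechanism for Korean language / Ohjoon Kwon
Publication
Seoul : Graduate School, Korea University, 2020
Physical description
v, 39 leaves : color illustrations, charts ; 26 cm
Other format entry
Word Embedding Technique with Channel Attention Mechanism for Korean Language (DCOLL211009)000000127351
Dissertation note
Thesis (Master's)-- Graduate School, Korea University: Department of Computer and Radio Communications Engineering, 2020. 2
Department code
0510 6D36 1105
General note
Advisor: 이상근
Bibliography note
References: leaves 35-39
Other available formats
Also available as a PDF file; Requires PDF file reader(application/pdf)
Uncontrolled index terms
typos embedding, word embedding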
000 00000nam c2200205 c 4500
001 000046026314
005 20200428152911
007 ta
008 191226s2020 ulkad bmAC 000c eng
040 ▼a 211009 ▼c 211009 ▼d 211009
085 0 ▼a 0510 ▼2 KDCP
090 ▼a 0510 ▼b 6D36 ▼c 1105
100 1 ▼a 권오준, ▼g 權五俊
245 1 0 ▼a Word embedding technique with channel attention mechanism for Korean language / ▼d Ohjoon Kwon
260 ▼a Seoul : ▼b Graduate School, Korea University, ▼c 2020
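300 ▼a v, 39 leaves : ▼b color illustrations, charts ; ▼c 26 cm
500 ▼a Advisor: 이상근
502 0 ▼a Thesis (Master's)-- ▼b Graduate School, Korea University: ▼c Department of Computer and Radio Communications Engineering, ▼d 2020. 2
504 ▼a References: leaves 35-39
530 ▼a Also available as a PDF file; ▼c Requires PDF file reader(application/pdf)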
653 ▼a typos embedding ▼a word embedding
776 0 ▼t Word Embedding Technique with Channel Attention Mechanism for Korean Language ▼w (DCOLL211009)000000127351
900 1 0 ▼a Kwon, Oh-joon, ▼e
900 1 0 ▼a 이상근, ▼g 李尙根, ▼e advisor
945 ▼a KLPA

Electronic Resources

No.  Full-text title                                                                Views  Service
1    Word embedding technique with channel attention mechanism for Korean language  27     PDF, abstract, table of contents

Holdings

No.  Location                       Call number     Registration no.  Status
1    Science Library/Thesis Stacks  0510 6D36 1105  123063725         Available for loan
2    Science Library/Thesis Stacks  0510 6D36 1105  123063726         Available for loan

Contents Information

Abstract

Word embedding is considered an essential factor in improving the performance of various Natural Language Processing (NLP) models. However, because the word embeddings used in general research are derived from well-refined datasets, they are often less applicable to real-world datasets. In particular, in Korean, which is written in Hangeul, a unique writing system, different kinds of Out-Of-Vocabulary (OOV) words arise from typos and newly coined words. In this thesis, we propose a stable Hangeul word embedding technique that maintains its performance even on noisy texts containing various typos. We create word vectors that mix correct words with intentionally generated typos and perform end-to-end training using the contextual information of the embedded word. To demonstrate the effectiveness of our model, we conduct intrinsic, extrinsic, and attention score visualization tests. While the existing embedding techniques failed to maintain their accuracy as the noise level increased, the embedding technique developed in this thesis shows stable performance.
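
The abstract does not specify how the typos are generated, so the following is only a minimal Python sketch of one plausible kind of jamo-level noise, not the thesis's method. It relies on the standard Unicode arithmetic for precomposed Hangeul syllables (19 initial consonants, 21 vowels, 28 final-consonant slots starting at U+AC00). The function names `decompose`, `compose`, and `make_typo`, and the single-jamo substitution rule itself, are assumptions introduced here for illustration.

```python
import random

HANGUL_BASE = 0xAC00                    # code point of the first syllable, '가'
N_LEAD, N_VOWEL, N_TAIL = 19, 21, 28    # tail index 0 means "no final consonant"
N_SYLLABLES = N_LEAD * N_VOWEL * N_TAIL


def is_hangul_syllable(ch):
    """True if ch is a precomposed Hangeul syllable (U+AC00..U+D7A3)."""
    return 0 <= ord(ch) - HANGUL_BASE < N_SYLLABLES


def decompose(syllable):
    """Split one precomposed syllable into (lead, vowel, tail) jamo indices."""
    if not is_hangul_syllable(syllable):
        raise ValueError(f"not a precomposed Hangeul syllable: {syllable!r}")
    offset = ord(syllable) - HANGUL_BASE
    lead, rest = divmod(offset, N_VOWEL * N_TAIL)
    vowel, tail = divmod(rest, N_TAIL)
    return lead, vowel, tail


def compose(lead, vowel, tail):
    """Inverse of decompose: rebuild the syllable from jamo indices."""
    return chr(HANGUL_BASE + (lead * N_VOWEL + vowel) * N_TAIL + tail)


def make_typo(word, rng):
    """Hypothetical noise model: change exactly one jamo of one syllable.

    Words without any Hangeul syllable are returned unchanged.
    """
    positions = [i for i, ch in enumerate(word) if is_hangul_syllable(ch)]
    if not positions:
        return word
    i = rng.choice(positions)
    jamo = list(decompose(word[i]))
    slot = rng.randrange(3)             # 0 = lead, 1 = vowel, 2 = tail
    size = (N_LEAD, N_VOWEL, N_TAIL)[slot]
    # A nonzero shift modulo the slot size guarantees a different jamo.
    jamo[slot] = (jamo[slot] + rng.randrange(1, size)) % size
    return word[:i] + compose(*jamo) + word[i + 1:]


if __name__ == "__main__":
    rng = random.Random(0)              # fixed seed for repeatable noise
    print(decompose("한"))
    print(make_typo("한글", rng))
```

A realistic noise model would more likely weight substitutions by keyboard adjacency or by observed error statistics; the uniform random substitution used here is chosen only to keep the sketch short.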

Table of Contents

1 Introduction 1
2 Related Work 5
 2.1 Typo Word Embedding Methods for English 5
 2.2 Word Embedding Methods for Korean 7
3 Model 10
 3.1 Generating Korean Typo 10
 3.2 Jamo-level Convolution Neural Network with Channel Attention 12
 3.3 Training and Deriving Word Embeddings 15
4 Experiments 17
 4.1 Experiments Settings 17
 4.2 Word Analogy Task 18
  4.2.1 Datasets 19
  4.2.2 Results 19
 4.3 Language Model Task 20
  4.3.1 Datasets 20
  4.3.2 Results 21
 4.4 Sentiment Classification Task 22
  4.4.1 Datasets 22
  4.4.2 Results 23
5 Analysis 24
 5.1 Effects of Channel Attention 24
 5.2 Nearest Neighbor of Words 26
 5.3 Robustness to Noise Level 30
 5.4 Training Time 32
6 Conclusion 34
Bibliography 35