000 | 00000nam c2200205 c 4500 | |
001 | 000046026314 | |
005 | 20230712091830 | |
007 | ta | |
008 | 191226s2020 ulkad bmAC 000c eng | |
040 | ▼a 211009 ▼c 211009 ▼d 211009 | |
085 | 0 | ▼a 0510 ▼2 KDCP |
090 | ▼a 0510 ▼b 6D36 ▼c 1105 | |
100 | 1 | ▼a 권오준, ▼g 權五俊 |
245 | 1 0 | ▼a Word embedding technique with channel attention mechanism for Korean language / ▼d Ohjoon Kwon |
260 | ▼a Seoul : ▼b Graduate School, Korea University, ▼c 2020 | |
300 | ▼a v, 39 leaves : ▼b color illustrations, charts ; ▼c 26 cm | |
500 | ▼a Advisor: 이상근 | |
502 | 0 | ▼a Thesis (Master's)-- ▼b Graduate School, Korea University, ▼c Department of Computer and Radio Communications Engineering, ▼d 2020. 2
504 | ▼a Bibliography: leaves 35-39 | |
530 | ▼a Also available as a PDF file; ▼c Requires PDF file reader (application/pdf) | |
653 | ▼a typos embedding ▼a word embedding | |
776 | 0 | ▼t Word Embedding Technique with Channel Attention Mechanism for Korean Language ▼w (DCOLL211009)000000127351 |
900 | 1 0 | ▼a Kwon, Oh-joon, ▼e author
900 | 1 0 | ▼a 이상근, ▼g 李尙根, ▼d 1971-, ▼e advisor ▼0 AUTH(211009)153285
945 | ▼a KLPA |
Holdings Information

No. | Location | Call Number | Accession No. | Status
---|---|---|---|---
1 | Science Library / Theses Stacks | 0510 6D36 1105 | 123063725 | Available for loan
2 | Science Library / Theses Stacks | 0510 6D36 1105 | 123063726 | Available for loan
Contents Information
Abstract
Word embedding is considered an essential factor in improving the performance of various Natural Language Processing (NLP) models. However, because the word embeddings used in most research are derived from well-refined datasets, they are often less applicable to real-world data. In particular, Hangeul (the Korean language), which has a unique writing system, produces various kinds of Out-Of-Vocabulary (OOV) words arising from typos or newly coined words. In this thesis, we propose a stable Hangeul word embedding technique that maintains performance even on noisy texts containing various typos. We create word vectors that mix correct words with intentionally generated typos, and we perform end-to-end training using the contextual information of the embedded words. To demonstrate the effectiveness of our model, we conduct intrinsic, extrinsic, and attention-score visualization tests. While existing embedding techniques fail to maintain accuracy as the noise level increases, the embedding technique developed in this thesis shows stable performance.
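The abstract mentions intentionally generated Hangeul typos. A minimal Python sketch of how precomposed Hangul syllables can be decomposed into jamo and perturbed — a hypothetical illustration of this kind of typo generation based on the standard Unicode Hangul composition formula, not the thesis's actual procedure:

```python
import random

# Unicode Hangul jamo tables: 19 initial consonants, 21 vowels, 28 finals (index 0 = no final).
CHO = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def to_jamo(word):
    """Decompose each precomposed Hangul syllable (U+AC00..U+D7A3) into its jamo."""
    jamo = []
    for ch in word:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:
            jamo += [CHO[code // 588], JUNG[code % 588 // 28], JONG[code % 28]]
        else:
            jamo.append(ch)  # pass non-Hangul characters through unchanged
    return jamo

def inject_vowel_typo(word, rng=random):
    """Replace the vowel of one random syllable -- one simple kind of synthetic typo."""
    chars = list(word)
    i = rng.randrange(len(chars))
    code = ord(chars[i]) - 0xAC00
    if not 0 <= code < 11172:
        return word  # picked a non-Hangul character; leave the word as-is
    cho, jung, jong = code // 588, code % 588 // 28, code % 28
    new_jung = rng.choice([v for v in range(21) if v != jung])  # any other vowel
    chars[i] = chr(0xAC00 + cho * 588 + new_jung * 28 + jong)
    return "".join(chars)
```

For example, `to_jamo("한국")` yields `['ㅎ', 'ㅏ', 'ㄴ', 'ㄱ', 'ㅜ', 'ㄱ']`; `inject_vowel_typo` then swaps one vowel while keeping the initial and final consonants intact, producing the kind of noisy-but-recognizable word the abstract describes.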
Table of Contents

1 Introduction 1
2 Related Work 5
  2.1 Typo Word Embedding Methods for English 5
  2.2 Word Embedding Methods for Korean 7
3 Model 10
  3.1 Generating Korean Typo 10
  3.2 Jamo-level Convolution Neural Network with Channel Attention 12
  3.3 Training and Deriving Word Embeddings 15
4 Experiments 17
  4.1 Experiments Settings 17
  4.2 Word Analogy Task 18
    4.2.1 Datasets 19
    4.2.2 Results 19
  4.3 Language Model Task 20
    4.3.1 Datasets 20
    4.3.2 Results 21
  4.4 Sentiment Classification Task 22
    4.4.1 Datasets 22
    4.4.2 Results 23
5 Analysis 24
  5.1 Effects of Channel Attention 24
  5.2 Nearest Neighbor of Words 26
  5.3 Robustness to Noise Level 30
  5.4 Training Time 32
6 Conclusion 34
Bibliography 35
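Section 3.2 of the outline names a jamo-level CNN with channel attention. As a rough sketch of what a channel-attention gate over convolutional feature maps can look like — a squeeze-and-excitation-style mechanism where the weights `w1`, `w2` and all shapes are illustrative assumptions, not the thesis's actual architecture:

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Gate each convolution channel by a learned scalar in (0, 1).

    features: (C, L) feature map from a 1-D convolution over a jamo sequence.
    w1: (r, C) bottleneck weights; w2: (C, r) expansion weights, with r < C.
    """
    squeeze = features.mean(axis=1)              # (C,) global average pool per channel
    excite = np.maximum(w1 @ squeeze, 0.0)       # (r,) bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ excite)))  # (C,) sigmoid channel weights
    return features * gate[:, None]              # reweight each channel's activations

# Illustrative shapes: 8 channels, sequence length 6, bottleneck size 2.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 6))
out = channel_attention(feats, rng.normal(size=(2, 8)), rng.normal(size=(8, 2)))
```

The gate lets the model emphasize or suppress whole channels (e.g. filters that respond to typo-like jamo patterns) before the downstream embedding is derived.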