
Detailed Information

Toward word embedding techniques to handle out-of-vocabulary problem with subword information

Material Type
Thesis
Personal Author
김예찬
Title / Author Statement
Toward word embedding techniques to handle out-of-vocabulary problem with subword information / Yeachan Kim
Publication
Seoul : Graduate School, Korea University, 2019
Physical Description
v, 41 leaves : charts ; 26 cm
Additional Form Entry
Toward Word Embedding Techniques to Handle Out-of-Vocabulary Problem with Subword Information (DCOLL211009)000000083458
Thesis Note
Thesis (M.S.)-- Graduate School, Korea University: Department of Computer and Radio Communications Engineering, 2019. 2
Department Code
0510 6D36 1098
General Note
Advisor: 이상근
Bibliography Note
Bibliography: leaves 37-41
Other Available Formats
Also available as a PDF file; requires a PDF file reader (application/pdf)
Uncontrolled Subject Terms
Word Embeddings, NLP
000 00000nam c2200205 c 4500
001 000045978962
005 20190416165931
007 ta
008 181226s2019 ulkd bmAC 000c eng
040 ▼a 211009 ▼c 211009 ▼d 211009
085 0 ▼a 0510 ▼2 KDCP
090 ▼a 0510 ▼b 6D36 ▼c 1098
100 1 ▼a 김예찬
245 1 0 ▼a Toward word embedding techniques to handle out-of-vocabulary problem with subword information / ▼d Yeachan Kim
260 ▼a Seoul : ▼b Graduate School, Korea University, ▼c 2019
300 ▼a v, 41장 : ▼b 도표 ; ▼c 26 cm
500 ▼a 지도교수: 이상근
502 0 ▼a 학위논문(석사)-- ▼b 고려대학교 대학원: ▼c 컴퓨터·전파통신공학과, ▼d 2019. 2
504 ▼a 참고문헌: 장 37-41
530 ▼a PDF 파일로도 이용가능; ▼c Requires PDF file reader(application/pdf)
653 ▼a Word Embeddings ▼a NLP
776 0 ▼t Toward Word Embedding Techniques to Handle Out-of-Vocabulary Problem with Subword Information ▼w (DCOLL211009)000000083458
900 1 0 ▼a Kim, Yea-chan
900 1 0 ▼a 이상근 ▼g 李尙根, ▼e 지도교수
945 ▼a KLPA

Electronic Resources

No.  Title                                                                                          Service
1    Toward word embedding techniques to handle out-of-vocabulary problem with subword information  PDF
     (viewed 36 times)

Holdings Information

No.  Location                         Call Number     Registration No.  Status              Due Date
1    Science Library / Theses Stacks  0510 6D36 1098  123060863         Available for loan  -
2    Science Library / Theses Stacks  0510 6D36 1098  123060864         Available for loan  -

Contents Information

Abstract

Word embeddings have been a crucial component in natural language processing (NLP) models. In particular, pre-trained word embeddings (e.g., word2vec, GloVe) have proven invaluable for improving performance on NLP tasks. However, such embeddings are usually blind to the relatedness between words. This gives rise to a serious limitation in representing out-of-vocabulary (OOV) words, even when such words are lexically related to in-vocabulary words. In this thesis, we present two methodologies to handle this problem. Our first approach is to expand the vocabulary by transforming the word embeddings themselves. To this end, we propose a novel deep neural network that takes a set of pre-trained word embeddings and generalizes it to word entries including OOV words. The second approach is to modify the encodings (e.g., one-hot encodings) used in the look-up function of an embedding layer. To build a new encoding, we propose a neural network that takes a word as input and outputs an encoding of that word. In particular, we seek to inject the relatedness between words into the encodings, so that OOV words receive distinct encodings represented in terms of their related words. The common characteristic of these methods is that they use subword information to represent words, which allows us to consider the relatedness between words and to represent any word effectively. Experimental results and our in-depth analysis show that the two methodologies yield substantial performance improvements by generating embeddings for OOV words, and demonstrate that our methods produce meaningful representations for OOV words.
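To make the embedding-based idea concrete, the sketch below shows how a character-level CNN followed by a highway layer (cf. Chapters 3.1-3.2 of the table of contents) can map any word, including an OOV word, into a pre-trained embedding space. This is a minimal illustration under assumptions, not the thesis's exact architecture: the framework (PyTorch), the class name CharToWordEmbedding, and all hyperparameters (char_dim, n_filters, kernel sizes, word_dim) are illustrative choices, and chars_of / pretrained in the training comment are hypothetical helpers.

    import torch
    import torch.nn as nn

    class CharToWordEmbedding(nn.Module):
        """Illustrative char-CNN + highway mapper from characters to a word vector."""
        def __init__(self, n_chars=128, char_dim=16, n_filters=64,
                     kernel_sizes=(2, 3, 4), word_dim=300):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim)
            # One 1-D convolution per kernel width over the character sequence.
            self.convs = nn.ModuleList(
                [nn.Conv1d(char_dim, n_filters, k) for k in kernel_sizes]
            )
            conv_out = n_filters * len(kernel_sizes)
            # Highway layer: gated mix of a nonlinear transform and the identity.
            self.transform = nn.Linear(conv_out, conv_out)
            self.gate = nn.Linear(conv_out, conv_out)
            self.proj = nn.Linear(conv_out, word_dim)

        def forward(self, char_ids):            # (batch, max_word_len)
            x = self.char_emb(char_ids)         # (batch, len, char_dim)
            x = x.transpose(1, 2)               # (batch, char_dim, len)
            # Max-pool each convolution's output over the character axis.
            feats = [conv(x).max(dim=2).values for conv in self.convs]
            h = torch.cat(feats, dim=1)         # (batch, conv_out)
            t = torch.sigmoid(self.gate(h))
            h = t * torch.relu(self.transform(h)) + (1 - t) * h
            return self.proj(h)                 # (batch, word_dim)

    # Training (sketch): regress onto the pre-trained vectors of in-vocabulary
    # words, e.g. loss = ((model(chars_of(w)) - pretrained[w]) ** 2).mean();
    # at test time the same network produces vectors for OOV words.
    model = CharToWordEmbedding()
    chars = torch.randint(0, 128, (2, 10))      # two words, 10 character ids each
    print(model(chars).shape)                   # torch.Size([2, 300])

Because the network reads characters rather than a fixed vocabulary index, lexically related words (e.g., "connect" and "connection") share subword features, which is what lets an OOV word land near its in-vocabulary relatives. The encoding-based technique differs in output: instead of a dense vector, a similar character-level network would produce the encoding consumed by the embedding layer's look-up function.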

Table of Contents

1 Introduction 1
2 Related Works 5
 2.1 Representing Words using Subword Information 5
 2.2 Utilizing Linguistic Resources to Handle OOV Words 6
3 Embedding-based Technique 7
 3.1 Character-level Convolutional Neural Networks for Embeddings 7
 3.2 Highway Network 9
 3.3 Training and Deriving Word Embeddings 10
4 Encoding-based Technique 12
 4.1 Character-level Convolutional Neural Networks for Encodings 13
 4.2 Knowledge Distillation into Character-based Encodings 15
 4.3 Training and Deriving Word Embeddings 16
5 Experiments 17
 5.1 Experimental Settings 17
 5.2 Experiments for Embedding-based Technique 19
  5.2.1 Word Similarity 19
  5.2.2 Language Modeling 21
 5.3 Experiments for Encoding-based Technique 23
  5.3.1 Word Similarity 23
  5.3.2 Word Analogy 24
  5.3.3 Chunking 27
 5.4 Comparison between Embedding and Encoding-based Technique 28
  5.4.1 Large-scale Text Classification 28
6 Analysis 31
 6.1 Analysis for Embedding-based Technique 31
  6.1.1 Effect of Highway Networks 31
  6.1.2 Nearest Neighbor of Words 33
 6.2 Analysis for Encoding-based Technique 34
  6.2.1 Effect of the Number of Elements in PEN 34
  6.2.2 Nearest Neighbor of Words 35
7 Conclusion 36
Bibliography 37
Acknowledgement 42
