000 | 00000nam c2200205 c 4500 | |
001 | 000045999321 | |
005 | 20191017130945 | |
007 | ta | |
008 | 190624s2019 ulkd bmAC 000c eng | |
040 | ▼a 211009 ▼c 211009 ▼d 211009 | |
085 | 0 | ▼a 0510 ▼2 KDCP |
090 | ▼a 0510 ▼b 6YD36 ▼c 365 | |
100 | 1 | ▼a 이진혁 ▼g 李眞赫 |
245 | 1 0 | ▼a Towards domain-specific question answering with large unstructured knowledge source / ▼d Jinhyuk Lee |
260 | ▼a Seoul : ▼b Graduate School, Korea University, ▼c 2019 | |
300 | ▼a v, 52 leaves : ▼b charts ; ▼c 26 cm | |
500 | ▼a Advisor: 강재우 | |
502 | 1 | ▼a Thesis (Ph.D.)-- ▼b Graduate School, Korea University: ▼c Department of Computer and Radio Communications Engineering, ▼d 2019. 8
504 | ▼a Bibliography: leaves 45-52 | |
530 | ▼a Also available as a PDF file; ▼c Requires PDF file reader (application/pdf) | |
653 | ▼a Question Answering ▼a Natural Language Processing | |
776 | 0 | ▼t Towards Domain-Specific Question Answering with Large Unstructured Knowledge Source ▼w (DCOLL211009)000000084425 |
900 | 1 0 | ▼a Lee, Jin-hyuk, ▼e author
900 | 1 0 | ▼a 강재우 ▼g 姜在雨, ▼e advisor
945 | ▼a KLPA |
Electronic Information
No. | Title | Service
---|---|---
1 | Towards domain-specific question answering with large unstructured knowledge source (viewed 62 times) | View PDF / Abstract / Table of Contents
Holdings Information
No. | Location | Call Number | Accession No. | Availability | Due Date
---|---|---|---|---|---
1 | Science & Engineering Library/Stacks(Thesis)/ | 0510 6YD36 365 | 123062325 | Available |
2 | Science & Engineering Library/Stacks(Thesis)/ | 0510 6YD36 365 | 123062326 | Available |
3 | Sejong Academic Information Center/Thesis(5F)/ | 0510 6YD36 365 | 153083337 | Available |
Contents Information
Abstract
Question answering has long served as a primary task for natural language understanding: a system must answer questions posed in natural language. Over the past few years, question answering has been re-phrased as 'machine comprehension', meaning the reading comprehension ability of machines, and machine comprehension models have started to outperform humans with the help of deep learning. Combined with search engines, recent machine comprehension models are even leveraged to answer questions over a very large unstructured corpus, a setting often called open-domain question answering. As practical use of machine comprehension models often involves understanding the language of experts (e.g., biomedical or legal texts), machine comprehension models that answer domain-specific questions over a large unstructured knowledge source could benefit many fields. However, machine comprehension models are still far from understanding domain-specific texts such as biomedical corpora, as these texts have very different word distributions from general-domain corpora. Moreover, the performance of open-domain question answering remains low considering the recent success of machine comprehension on relatively short paragraphs. In this thesis, we investigate how open-domain question answering can be improved with paragraph re-ranking, and how machine comprehension models can be improved on domain-specific texts, focusing on biomedical texts. First, we introduce a current question answering system for a large unstructured corpus and show how open-domain question answering systems can be improved with paragraph re-ranking. Then, we describe Bidirectional Encoder Representations from Transformers for biomedical text mining (BioBERT), which understands biomedical texts through contextualized word representations. BioBERT yields large improvements not only in biomedical named entity recognition and relation extraction but also in question answering. Finally, we show how machine comprehension models trained on general-domain corpora can also be leveraged on domain-specific texts without using any biomedical corpora. We conclude by showing the possibility of building question answering systems that answer domain-specific questions over a huge unstructured biomedical corpus.
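The paragraph re-ranking idea summarized above (Chapter 2 of the thesis) can be sketched in a few lines of code. The snippet below is a minimal illustration, not the author's implementation: it re-ranks retrieved paragraphs by the similarity between a question vector and each paragraph vector, using a toy bag-of-embeddings encoder in place of the learned neural encoders described in the thesis. All names here (ToyEncoder, rerank) are hypothetical.

```python
# Minimal sketch of the paragraph re-ranking idea (not the thesis implementation).
# A question and each retrieved paragraph are encoded into fixed-size vectors, and
# paragraphs are re-ranked by their dot-product similarity to the question.
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Toy bag-of-embeddings encoder; stands in for a learned neural encoder."""

    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)  # mean-pools token embeddings

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> (batch, dim)
        return self.emb(token_ids)


def rerank(question_ids, paragraph_ids_list, encoder):
    """Return paragraph indices sorted by similarity to the question (best first)."""
    q = encoder(question_ids.unsqueeze(0))                                    # (1, dim)
    p = torch.cat([encoder(ids.unsqueeze(0)) for ids in paragraph_ids_list])  # (N, dim)
    scores = (p @ q.t()).squeeze(1)                                           # (N,)
    return torch.argsort(scores, descending=True).tolist()


# Toy usage with integer token ids from a 100-word vocabulary; the encoder is
# untrained here, so the ordering is arbitrary until the model is trained to
# score answer-bearing paragraphs highly.
encoder = ToyEncoder(vocab_size=100)
question = torch.tensor([3, 17, 42])
paragraphs = [torch.tensor([5, 8, 42, 17]), torch.tensor([1, 2, 3]), torch.tensor([60, 61])]
print(rerank(question, paragraphs, encoder))
```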
Table of Contents
Abstract
Contents i
List of Figures iii
List of Tables iv
1 Introduction 1
2 Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering 4
  2.1 Background 4
  2.2 Open-Domain QA Pipeline 6
    2.2.1 Paragraph Ranker 6
    2.2.2 Answer Aggregation 8
  2.3 Experiments 9
    2.3.1 Datasets 9
    2.3.2 Implementation Details 9
    2.3.3 Results 9
    2.3.4 Analysis 10
  2.4 Discussion 11
3 BioBERT: a pre-trained biomedical language representation model for biomedical text mining 12
  3.1 Background 12
  3.2 Approach 14
  3.3 Methods 15
    3.3.1 BERT: Bidirectional Encoder Representations from Transformers 15
    3.3.2 Pre-training BioBERT 15
    3.3.3 Fine-tuning BioBERT 17
  3.4 Results 19
    3.4.1 Datasets 19
    3.4.2 Experimental Setups 20
    3.4.3 Experimental Results 21
  3.5 Discussion 25
4 Kernelized Sparse Phrase Representation Learning for Question Answering 27
  4.1 Background 27
  4.2 Analyzing Dense Phrase Encoding 29
    4.2.1 Dense Phrase Representation 29
    4.2.2 Analyses 30
  4.3 Sparse Phrase Encoding 31
  4.4 Experiments 35
    4.4.1 Results 37
    4.4.2 Qualitative Analyses 40
  4.5 Related Work 42
  4.6 Discussion 43
5 Conclusion 44
Bibliography 45