Towards domain-specific question answering with large unstructured knowledge source

Material type
Thesis
Personal Author
이진혁 李眞赫
Title Statement
Towards domain-specific question answering with large unstructured knowledge source / Jinhyuk Lee
Publication, Distribution, etc
Seoul :   Graduate School, Korea University,   2019  
Physical Medium
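v, 52 leaves : charts ; 26 cm
Other Form Entry
Towards Domain-Specific Question Answering with Large Unstructured Knowledge Source   (DCOLL211009)000000084425
Dissertation Note
Thesis (Ph.D.) -- Korea University Graduate School: Department of Computer and Radio Communications Engineering, 2019. 8
Department Code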
0510   6YD36   365  
General Note
Advisor: 강재우
Bibliography, Etc. Note
Bibliography: leaves 45-52
Other Available Formats
Also available as a PDF file; Requires PDF file reader (application/pdf)
Uncontrolled Subject Terms
Question Answering, Natural Language Processing
000 00000nam c2200205 c 4500
001 000045999321
005 20191017130945
007 ta
008 190624s2019 ulkd bmAC 000c eng
040 ▼a 211009 ▼c 211009 ▼d 211009
085 0 ▼a 0510 ▼2 KDCP
090 ▼a 0510 ▼b 6YD36 ▼c 365
100 1 ▼a 이진혁 ▼g 李眞赫
245 1 0 ▼a Towards domain-specific question answering with large unstructured knowledge source / ▼d Jinhyuk Lee
260 ▼a Seoul : ▼b Graduate School, Korea University, ▼c 2019
300 ▼a v, 52장 : ▼b 도표 ; ▼c 26 cm
500 ▼a 지도교수: 강재우
502 1 ▼a 학위논문(박사)-- ▼b 고려대학교 대학원: ▼c 컴퓨터·전파통신공학과, ▼d 2019. 8
504 ▼a 참고문헌: 장 45-52
530 ▼a PDF 파일로도 이용가능; ▼c Requires PDF file reader(application/pdf)
653 ▼a Question Answering ▼a Natural Language Processing
776 0 ▼t Towards Domain-Specific Question Answering with Large Unstructured Knowledge Source ▼w (DCOLL211009)000000084425
900 1 0 ▼a Lee, Jin-hyuk, ▼e
900 1 0 ▼a 강재우 ▼g 姜在雨, ▼e 지도교수
945 ▼a KLPA

Electronic Information

1. Towards domain-specific question answering with large unstructured knowledge source (viewed 62 times)
No. | Location | Call Number | Accession No. | Availability
1 | Science & Engineering Library/Stacks(Thesis) | 0510 6YD36 365 | 123062325 | Available
2 | Science & Engineering Library/Stacks(Thesis) | 0510 6YD36 365 | 123062326 | Available
3 | Sejong Academic Information Center/Thesis(5F) | 0510 6YD36 365 | 153083337 | Available

Contents information

Abstract

Question answering, in which a system answers questions posed in natural language, has served as a primary task for evaluating natural language understanding.
Over the past few years, question answering has been reframed as 'machine comprehension', that is, the reading comprehension ability of machines, and machine comprehension models have started to outperform humans with the help of deep learning.
Combined with search engines, recent machine comprehension models are even used to answer questions over a very large unstructured corpus, a setting often called open-domain question answering.
Because practical use of machine comprehension models often involves understanding expert language (e.g., biomedical or legal texts), models that answer domain-specific questions from a large unstructured knowledge source could benefit various fields.

However, machine comprehension models are still far from understanding domain-specific texts such as biomedical corpora, because the word distribution of these texts differs greatly from that of general-domain corpora.
Also, the performance of open-domain question answering remains very low compared with the recent success of machine comprehension on relatively short paragraphs.
In this thesis, we investigate how open-domain question answering can be improved with paragraph re-ranking, and how machine comprehension models can be improved on domain-specific texts, focusing on biomedical texts.
First, we introduce the current question answering system for a large unstructured corpus and show how open-domain question answering systems can be improved with paragraph re-ranking.
Then, we describe Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT), which understands biomedical texts through contextualized word representations.
BioBERT yields large improvements not only in biomedical named entity recognition and relation extraction, but also in question answering.
Finally, we show how machine comprehension models trained on general-domain corpora can also be leveraged for domain-specific texts without using any biomedical corpora.
We conclude by showing that it is possible to build question answering systems that answer domain-specific questions using a huge unstructured biomedical corpus.

Table of Contents

Abstract
Contents
List of Figures
List of Tables
1 Introduction
2 Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering
 2.1 Background
 2.2 Open-Domain QA Pipeline
  2.2.1 Paragraph Ranker
  2.2.2 Answer Aggregation
 2.3 Experiments
  2.3.1 Datasets
  2.3.2 Implementation Details
  2.3.3 Results
  2.3.4 Analysis
 2.4 Discussion
3 BioBERT: a pre-trained biomedical language representation model for biomedical text mining
 3.1 Background
 3.2 Approach
 3.3 Methods
  3.3.1 BERT: Bidirectional Encoder Representations from Transformers
  3.3.2 Pre-training BioBERT
  3.3.3 Fine-tuning BioBERT
 3.4 Results
  3.4.1 Datasets
  3.4.2 Experimental Setups
  3.4.3 Experimental Results
 3.5 Discussion
4 Kernelized Sparse Phrase Representation Learning for Question Answering
 4.1 Background
 4.2 Analyzing Dense Phrase Encoding
  4.2.1 Dense Phrase Representation
  4.2.2 Analyses
 4.3 Sparse Phrase Encoding
 4.4 Experiments
  4.4.1 Results
  4.4.2 Qualitative Analyses
 4.5 Related Work
 4.6 Discussion
5 Conclusion
Bibliography