
Detailed Information

Towards domain-specific question answering with large unstructured knowledge source

Material Type
Thesis
Personal Author
이진혁 李眞赫
Title / Statement of Responsibility
Towards domain-specific question answering with large unstructured knowledge source / Jinhyuk Lee
Publication
Seoul : Graduate School, Korea University, 2019
Physical Description
v, 52 leaves : charts ; 26 cm
Additional Physical Form Entry
Towards Domain-Specific Question Answering with Large Unstructured Knowledge Source (DCOLL211009)000000084425
Dissertation Note
Thesis (Ph.D.)-- Graduate School, Korea University: Department of Computer and Radio Communications Engineering, 2019. 8
Department Code
0510 6YD36 365
General Note
Advisor: 강재우
Bibliography Note
References: leaves 45-52
Other Available Formats
Also available as a PDF file; Requires PDF file reader (application/pdf)
Uncontrolled Subject Terms
Question Answering, Natural Language Processing
000 00000nam c2200205 c 4500
001 000045999321
005 20191017130945
007 ta
008 190624s2019 ulkd bmAC 000c eng
040 ▼a 211009 ▼c 211009 ▼d 211009
085 0 ▼a 0510 ▼2 KDCP
090 ▼a 0510 ▼b 6YD36 ▼c 365
100 1 ▼a 이진혁 ▼g 李眞赫
245 1 0 ▼a Towards domain-specific question answering with large unstructured knowledge source / ▼d Jinhyuk Lee
260 ▼a Seoul : ▼b Graduate School, Korea University, ▼c 2019
300 ▼a v, 52장 : ▼b 도표 ; ▼c 26 cm
500 ▼a 지도교수: 강재우
502 1 ▼a 학위논문(박사)-- ▼b 고려대학교 대학원: ▼c 컴퓨터·전파통신공학과, ▼d 2019. 8
504 ▼a 참고문헌: 장 45-52
530 ▼a PDF 파일로도 이용가능; ▼c Requires PDF file reader(application/pdf)
653 ▼a Question Answering ▼a Natural Language Processing
776 0 ▼t Towards Domain-Specific Question Answering with Large Unstructured Knowledge Source ▼w (DCOLL211009)000000084425
900 1 0 ▼a Lee, Jin-hyuk, ▼e
900 1 0 ▼a 강재우 ▼g 姜在雨, ▼e 지도교수
945 ▼a KLPA

Electronic Resources

No.  Title  Service
1  Towards domain-specific question answering with large unstructured knowledge source (viewed 61 times)  PDF / Abstract / Contents

Holdings
No.  Location  Call Number  Registration No.  Status
1  Science Library / Theses Stacks  0510 6YD36 365  123062325  Available
2  Science Library / Theses Stacks  0510 6YD36 365  123062326  Available
3  Sejong Academic Information Center / 5F Theses Room  0510 6YD36 365  153083337  Available

Content Information

Abstract

Question answering has served as a primary task for understanding natural language by answering questions posed in natural language.
Over the past few years, question answering has been re-framed as 'machine comprehension', referring to the reading comprehension ability of machines, and machine comprehension models have started to outperform humans with the help of deep learning.
Combined with search engines, recent machine comprehension models are even leveraged to answer questions over a very large unstructured corpus, a setting often called open-domain question answering.
As practical use of machine comprehension models often involves understanding the language of experts (e.g., biomedical or legal texts), machine comprehension models that answer domain-specific questions with a large unstructured knowledge source could benefit various fields.

However, machine comprehension models are still far from understanding domain-specific texts such as biomedical corpora, as these texts have a very different word distribution from general-domain corpora.
Also, the performance of open-domain question answering is still very low considering the recent success of machine comprehension on relatively short paragraphs.
In this thesis, we investigate how open-domain question answering can be improved with paragraph re-ranking, and how machine comprehension models can be improved on domain-specific texts, focusing on biomedical texts.
First, we introduce current question answering systems for a large unstructured corpus, and show how open-domain question answering systems can be improved with paragraph re-ranking.
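
As a rough illustration of the paragraph re-ranking idea, the toy sketch below scores retrieved paragraphs against the question and keeps only the top candidates before reading; the hashed bag-of-words encoder and the simple score combination mentioned in the comments are placeholder assumptions, not the ranker architecture developed in Chapter 2.

    # Toy sketch of paragraph re-ranking for open-domain QA. The hashed
    # bag-of-words encoder below is a placeholder for a learned encoder,
    # not the ranker architecture described in Chapter 2.
    import numpy as np

    def encode(text, dim=256):
        """Hashed bag-of-words stand-in for a learned question/paragraph encoder."""
        vec = np.zeros(dim)
        for token in text.lower().split():
            vec[hash(token) % dim] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec

    def rank_paragraphs(question, paragraphs, top_k=2):
        """Score each retrieved paragraph against the question and keep the top-k."""
        q = encode(question)
        scores = [float(encode(p) @ q) for p in paragraphs]
        order = sorted(range(len(paragraphs)), key=lambda i: -scores[i])[:top_k]
        return [(paragraphs[i], scores[i]) for i in order]

    question = "What protein does the BRCA1 gene encode?"
    paragraphs = [
        "BRCA1 is a human tumor suppressor gene that encodes the BRCA1 protein.",
        "The Eiffel Tower was completed in 1889 in Paris.",
        "Reading comprehension models answer questions given a paragraph.",
    ]

    # A reader model would then extract answer spans from the top paragraphs;
    # paragraph and answer scores can be aggregated (e.g., multiplied) to pick
    # the final answer.
    for paragraph, score in rank_paragraphs(question, paragraphs):
        print(f"{score:.3f}  {paragraph}")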
Then, we describe Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT), which understands biomedical texts with contextualized word representations.
BioBERT gives large improvements not only in biomedical named entity recognition and relation extraction, but also in question answering.
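
As a minimal sketch of obtaining contextualized word representations from BioBERT, the snippet below loads a pre-trained checkpoint with the Hugging Face transformers library; the checkpoint name dmis-lab/biobert-base-cased-v1.1 is an assumed identifier for a publicly released BioBERT model, and this is not the fine-tuning setup evaluated in Chapter 3.

    # Minimal sketch: contextualized word representations from a BioBERT checkpoint.
    # Assumes the `transformers` library and the `dmis-lab/biobert-base-cased-v1.1`
    # checkpoint are available; this is not the thesis' fine-tuning pipeline.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_name = "dmis-lab/biobert-base-cased-v1.1"  # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    sentence = "The BRCA1 gene encodes a tumor suppressor protein."
    inputs = tokenizer(sentence, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # One contextual vector per WordPiece token; task-specific heads (NER, RE, QA)
    # are fine-tuned on top of these representations.
    hidden_states = outputs.last_hidden_state  # shape: (1, num_tokens, hidden_size)
    print(hidden_states.shape)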
Finally, we show how machine comprehension models trained on general-domain corpora can also be leveraged on domain-specific texts without using any biomedical corpora.
We conclude by showing the possibility of building question answering systems that answer domain-specific questions with a huge unstructured biomedical corpus.
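
For the open-domain setting sketched in the conclusion, answering amounts to pre-encoding candidate answers from the corpus offline and retrieving the best match for a question online; the snippet below illustrates this with a toy encoder and exhaustive inner-product search, not the kernelized sparse phrase representations of Chapter 4.

    # Toy sketch: pre-encode candidate phrases offline, answer online with a
    # single maximum inner product search. The encoder is a placeholder, not
    # the kernelized sparse phrase representations developed in the thesis.
    import numpy as np

    def encode(text, dim=256):
        """Hashed bag-of-words stand-in for a learned phrase/question encoder."""
        vec = np.zeros(dim)
        for token in text.lower().split():
            vec[hash(token) % dim] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec

    # Offline indexing: encode every candidate phrase in the corpus once.
    phrases = [
        "the BRCA1 tumor suppressor protein",
        "a DNA mismatch repair pathway",
        "the outer mitochondrial membrane",
    ]
    phrase_index = np.stack([encode(p) for p in phrases])

    # Online answering: one inner product search over the pre-built index.
    question = "Which protein is encoded by the BRCA1 gene?"
    scores = phrase_index @ encode(question)
    print(phrases[int(np.argmax(scores))])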

Table of Contents

Abstract
Contents i
List of Figures iii
List of Tables iv
1    Introduction 1
2    Ranking Paragraphs for Improving Answer Recall in Open-Domain Question Answering 4
 2.1 Background   4
 2.2 Open-Domain QA Pipeline   6
  2.2.1 Paragraph Ranker  6
  2.2.2 Answer Aggregation   8
 2.3 Experiments   9
  2.3.1 Datasets  9
  2.3.2 Implementation Details  9
  2.3.3 Results  9
  2.3.4 Analysis    10
 2.4 Discussion 11
3    BioBERT: a pre-trained biomedical language representation model for biomedical text mining 12
 3.1 Background  12
 3.2 Approach  14
 3.3 Methods   15
  3.3.1 BERT: Bidirectional Encoder Representations from Transformers  15
  3.3.2 Pre-training BioBERT   15
  3.3.3 Fine-tuning BioBERT    17
 3.4 Results 19
  3.4.1 Datasets 19
  3.4.2 Experimental Setups   20
  3.4.3 Experimental Results  21
 3.5 Discussion 25
4    Kernelized Sparse Phrase Representation Learning for Question Answering 27
 4.1 Background   27
 4.2 Analyzing Dense Phrase Encoding  29
  4.2.1 Dense Phrase Representation  29
  4.2.2 Analyses   30
 4.3 Sparse Phrase Encoding 31
 4.4 Experiments  35
  4.4.1 Results   37
  4.4.2 Qualitative Analyses   40
 4.5 Related Work   42
 4.6 Discussion 43
5    Conclusion 44
Bibliography 45