Question answering, in which a system answers questions posed in natural language, has long served as a primary task for evaluating natural language understanding.
Over the past few years, question answering has been re-phrased as `machine comprehension', meaning the reading comprehension ability of machines, and machine comprehension models have started to outperform humans with the help of deep learning.
Combined with search engines, recent machine comprehension models are even used to answer questions over a very large unstructured corpus, a setting often called open-domain question answering.
As practical use of machine comprehension models often involves understanding the language of experts (e.g., biomedical or legal texts), machine comprehension models that answer domain-specific questions from large unstructured knowledge sources could benefit various fields.
However, machine comprehension models are still far from understanding domain-specific texts such as biomedical corpora, as these texts have very different word distributions from general-domain corpora.
Also, the performance of open-domain question answering remains very low compared with the recent success of machine comprehension on relatively short paragraphs.
In this paper, we investigate how open-domain question answering can be improved with paragraph re-ranking, and how machine comprehension models can be improved on domain-specific texts, focusing on biomedical texts.
First, we introduce the current question answering system for a large unstructured corpus, and show how open-domain question answering systems can be improved with paragraph re-ranking.
Then, we describe Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT), which understands biomedical texts with contextualized word representations.
BioBERT yields large improvements not only in biomedical named entity recognition and relation extraction, but also in question answering.
Finally, we show how machine comprehension models trained on general-domain corpora can also be leveraged on domain-specific texts without using any biomedical corpora.
We conclude by showing the possibility of building question answering systems that answer domain-specific questions over a huge unstructured biomedical corpus.