As next-generation sequencing techniques and high-throughput biomedical experiments continue to advance, the amount of biomedical big data continues to grow. In this era of precision medicine, it is becoming increasingly important to collect, manage, and utilize such data. However, much of the important knowledge is still published and shared in natural language form. Literature databases such as PubMed and PubMed Central collect biomedical literature daily, but natural language text is not well suited to the systematic use or analysis of biomedical knowledge. Experts in each domain build and reorganize knowledge bases on topics of their interest by manual curation; however, it is infeasible for them to read all the publications and manually collect and organize the information. To overcome these limitations, text mining techniques can be used to extract knowledge and construct knowledge bases.
We have conducted a series of studies on the extraction of knowledge from the literature and on its automatic curation, organization, and utilization.
In the first study, we identify genomic mutations in cancer-related literature and create a corpus called BRONCO that contains these mutations together with their related genes, diseases, drugs, and cell lines. This corpus can be used as a training and evaluation data set for text mining-based information extraction. Using this corpus, we compare and analyze the performance of existing text mining technologies and tools.
In the second study, we use this corpus to construct an algorithm that extracts information from documents. Whereas traditional text mining techniques focus on the target text, we utilize biomedical search engines to extract relationships between biomedical objects. We also use a convolutional neural network (CNN) for relation classification.
In the third study, we build an application that shows important information extracted from biomedical literature and provides additional related knowledge to users. To make text mining results more accessible to readers who use PubMed or PubMed Central, we construct a biomedical entity network for each document using its text and various other sources.
This dissertation introduces a series of processes that use text mining to extract knowledge from biomedical literature.