Users of social network services (SNS), including Twitter, Yelp, and Facebook, generate a great deal of data in such forms as text documents, images, and videos. Of these forms, text documents such as messages, comments, and reviews are particularly revelatory of users’ intentions in a given social environment. Thus, much recent research has focused on analyzing and predicting social phenomena using text-based SNS data. Machine learning (ML), which requires a large amount of data to perform well, is an emerging tool for mining useful information from such data. However, because fully automated ML methods neither give data analysts a clear understanding of the data nor allow them to interact carefully with it, there is growing interest in interactive visual interface systems based on machine learning techniques.
This thesis introduces two systems, each of which tightly combines a human-centered interactive interface with machine learning methods applied to large-scale text document data.
First, to support an efficient document labeling environment, I present a system called Attentive Interactive Labeling Assistant (AILA). At its core, AILA uses the Interactive Attention Module (IAM), a novel module that visually highlights words in a document that labelers may pay attention to when labeling it. IAM utilizes attention-based deep neural networks, which not only predict which words to highlight, but also enable labelers to indicate, while labeling, words that should be assigned high attention weights, thereby improving the quality of future word predictions. The results of our study showed that the participants’ labeling efficiency was significantly higher with IAM than without it, while the two conditions maintained roughly the same labeling accuracy.
Second, detecting anomalous events in a particular area in a timely manner is an important task. Geo-tagged social media data are a useful resource for this task, but the abundance of everyday language in them still makes it challenging. To address this challenge, I present TopicOnTiles, a visual analytics system that uses social media data to reveal information relevant to anomalous events in a multi-level tile-based map interface. To this end, I adopt and improve a recently proposed topic modeling method that can extract spatio-temporally exclusive topics corresponding to a particular region and time point. Furthermore, I utilize a tile-based map interface to efficiently handle large-scale data in parallel. Our user interface effectively highlights anomalous tiles using our novel glyph visualization, which encodes the degree of anomaly computed by our exclusive topic modeling processes. To show the effectiveness of our system, I present several usage scenarios using real-world datasets as well as comprehensive user study results.