Other research product · 2021

Learning Robust Representations for Low-resource Information Extraction

Zhou, Yichao
Open Access
Published: 01 Jan 2021
Publisher: eScholarship, University of California
Country: United States

Information extraction (IE) plays a significant role in automating knowledge acquisition from unstructured or semi-structured text. This thesis focuses on two major IE tasks: named entity recognition and relation extraction. Traditional IE systems rely on large-scale, high-quality datasets to learn the semantic and structural relationships between observations and labels. Such datasets are rare, however, especially in low-resource language processing settings (e.g., figurative language processing and clinical narrative curation). Their scarcity leads to inadequate supervision and model over-fitting. In this thesis, we study low-resource IE algorithms and applications. We believe that two strategies can mitigate the deficiencies of low-resource IE: incorporating supervision from domain-specific auxiliary knowledge, and learning transferable representations. Specifically, we pre-train domain-specific deep language models to obtain informative word and sentence embeddings for curating clinical narratives. We apply multi-modal learning techniques to recognize humor and to recommend keywords for advertisement designers. We also extract attributes of interest from semi-structured web data by building knowledge representations that transfer across websites. As a further application of low-resource IE, we build a COVID-19 surveillance system that inspects users' daily social media data. Extensive experiments show that our algorithms and systems outperform state-of-the-art approaches and also offer strong interpretability.


Computer science, Information Extraction, Natural Language Processing, Text Mining
