An Overview of Tagalog Human Language Technology Capabilities from the IARPA MATERIAL Program

C. Rubino, I. Zavorin
Intelligence Advanced Research Projects Activity (IARPA), District of Columbia, United States

Keywords: machine translation, automatic speech recognition, human language technology, information retrieval, foreign language

This poster will report on the language processing capabilities created by IARPA's MATERIAL program for the Filipino (Tagalog) language. The MATERIAL program was conceived to challenge large, interdisciplinary, international teams to develop methods for finding speech and text content in low-resource languages that is relevant to domain-contextualized English queries. Such methods must use minimal training data, require no human in the loop, and be rapidly deployable to new languages and domains. The software detailed in this poster was created to work on multiple genres of text and speech, and to provide evidence of relevance in English for each retrieved document to better assist the English-speaking analyst. We will describe the solutions provided so far by the four competing teams. We will also describe the novel detection metric, Actual Query Weighted Value (AQWV), which serves as the program measure in an end-to-end evaluation methodology for assessing the quality of cross-language information retrieval. Finally, we will describe the program's goal of propelling the state of the art in a number of human language technologies, including machine translation, cross-language information retrieval, automatic speech recognition, cross-language summarization, cross-lingual word embeddings for semantic induction, and machine learning techniques as applied to low-resource languages.
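
This abstract names AQWV without defining it. As a rough illustration only, the sketch below assumes the commonly cited form AQWV = 1 - (mean P_miss + beta * mean P_FA), computed per query from the system's actual retrieval decisions and then averaged over queries. The class and function names, the default beta of 40, and the choice to skip queries with no relevant documents are assumptions made for this example, not details taken from the abstract or from the program's evaluation plan.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class QueryResult:
    """Hypothetical container: document IDs returned vs. truly relevant for one query."""
    retrieved: set
    relevant: set


def aqwv(results: list, n_docs: int, beta: float = 40.0) -> float:
    """Sketch of AQWV as 1 - (mean P_miss + beta * mean P_FA).

    Assumptions (not from the abstract): beta defaults to 40, and queries
    with no relevant documents are skipped because P_miss is undefined.
    """
    p_miss, p_fa = [], []
    for r in results:
        n_rel = len(r.relevant)
        if n_rel == 0 or n_rel >= n_docs:
            continue  # P_miss or P_FA would divide by zero
        hits = len(r.retrieved & r.relevant)
        misses = n_rel - hits                  # relevant documents not returned
        false_alarms = len(r.retrieved) - hits  # returned but not relevant
        p_miss.append(misses / n_rel)
        p_fa.append(false_alarms / (n_docs - n_rel))
    if not p_miss:
        raise ValueError("no scorable queries")
    return 1.0 - (mean(p_miss) + beta * mean(p_fa))


if __name__ == "__main__":
    demo = [QueryResult(retrieved={"d1", "d2"}, relevant={"d1", "d3"})]
    print(aqwv(demo, n_docs=10, beta=2.0))
```

Under this form, a system that returns exactly the relevant documents for every query scores 1, a system that returns nothing scores 0, and false alarms can drive the score below 0, with beta controlling how heavily they are penalized relative to misses.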