Free

Date and time

Location

Online event

Event description
Documenting and modelling inflectional paradigms in under-resourced languages

About this event

The Unit for Linguistic Data at the Insight SFI Research Centre for Data Analytics / Data Science Institute, National University of Ireland Galway is delighted to welcome Dr Ekaterina Vylomova, a Lecturer and a Postdoctoral Fellow at the University of Melbourne, to be the next speaker in our seminar series. She will talk about the UniMorph project, which attempts to create a universal (cross-lingual) annotation schema.

Abstract:

This talk will present the UniMorph project, an attempt to create a universal (cross-lingual) annotation schema. UniMorph allows an inflected word from any language to be defined by its lexical meaning, typically carried by the lemma, and a bundle of universal morphological features defined by the schema. Since 2016, the UniMorph database has been gradually extended with new languages, and the SIGMORPHON shared tasks have served as a platform for comparing computational models of inflectional morphology. Between 2016 and 2021, the shared tasks made it possible to explore the ability of data-driven systems to learn declension and conjugation paradigms and to evaluate how well they generalize across typologically diverse languages. This matters because formal techniques for cross-language generalization and for predicting universal entities across related languages should open up new possibilities for modelling and documenting under-resourced languages. The talk will outline the major challenges we faced while converting language-specific features into the UniMorph schema, especially for under-resourced languages. In addition, we will discuss typical errors made by the majority of systems, e.g. incorrect predictions due to allomorphy, form variation, misspelt words, and looping effects. Finally, the talk will provide case studies for Russian, Tibetan, and Nen.

About the Speaker:

Dr Ekaterina Vylomova is a Lecturer and a Postdoctoral Fellow at the University of Melbourne. Her research focuses on compositionality modelling for morphology, models of inflectional and derivational morphology, linguistic typology, diachronic language models, and neural machine translation. She co-organized the SIGTYP 2019–2021 workshops and shared tasks and the SIGMORPHON 2017–2021 shared tasks on morphological re-inflection.

Host:

The seminar series is led by the Cardamom project team. The Cardamom project aims to close the resource gap for minority and under-resourced languages using deep-learning-based natural language processing (NLP) and exploiting similarities of closely related languages. The project further extends this idea to historical languages, which can be considered closely related to their modern form. It aims to provide NLP through both space and time for languages that current approaches have ignored.


Organizer Unit for Linguistic Data, DSI, NUIG

Organizer of Cardamom Seminar Series

The Cardamom Seminar Series is hosted by the Cardamom project team. The Cardamom project is funded by the Irish Research Council under the Consolidator Laureate Award scheme (grant number IRCLA/2017/129), with John P. McCrae as principal investigator. The project runs from 2019 to 2023 and is hosted within the Unit for Linguistic Data in the Insight Centre for Data Analytics, Data Science Institute at the National University of Ireland Galway.

The Unit for Linguistic Data (ULD), led by Dr John P. McCrae, is concerned with the creation, improvement and maintenance of linguistic data (also known as language resources) through a variety of methods. The major application of linguistic data is bringing already developed NLP technologies to new languages and domains. As such, a major part of the group's work is on under-resourced languages: there is much ongoing work on developing technologies for minority languages, as well as active collaboration with the Irish Department and the Moore Institute on NLP techniques for historical languages, in particular Old Irish. Furthermore, the unit is working on expanding WordNet to many under-resourced languages by means of machine translation.
