Cardamom Seminar Series
Location
Online event
About this event
The Unit for Linguistic Data at the Data Science Institute, National University of Ireland Galway, is delighted to welcome Dr Adriana Molina-Muñoz, a researcher at the University of Oxford, as the first speaker in our seminar series. In her talk, Dr Molina-Muñoz will give an overview of methodological and theoretical challenges that emerge when working with a historical language.
The aim of the seminar series is to connect researchers working to alleviate challenges around language resources and technologies for minority, historical, indigenous and lesser-resourced languages across the globe. The series will provide a platform to discuss the problems that researchers face in this work and to share ideas for solving them.
Abstract:
The fields of Corpus Linguistics and Natural Language Processing (NLP) have advanced an array of methods for studying different aspects of human languages using (large or small) collections of texts or corpora. One of the major contributions to syntactic studies has been the development of a wide range of treebanks, which has facilitated research on syntactic patterns (synchronic and diachronic), though mainly for the major Indo-European languages. Treebanks have enabled the investigation of predictions made by syntactic analyses, the search for rare constructions, and the extraction of enough data to support sophisticated statistical analyses. We now have access to large quantities of data, and those data can be extracted with ease and precision.
This talk discusses the main methodological and theoretical challenges that emerge when working with a historical language such as Sanskrit, from either a synchronic or a diachronic perspective. In particular, it addresses the availability of Sanskrit digital texts and the challenges of building and tagging a corpus. I present concrete examples from the project “Uncovering Sanskrit Syntax” (University of Oxford). This project is funded by a three-year Research Project Grant from the Leverhulme Trust (2019-2021) and is led by John Lowe. The project goes beyond the standard two-way distinction between Vedic and Classical Sanskrit by studying a wide variety of texts from different regions, genres and periods. It is the first large-scale project to analyse interclausal syntax in Sanskrit (anaphora, control and unbounded dependencies).
About the Speaker:
Dr Molina-Muñoz is a researcher on the project Uncovering Sanskrit Syntax. She completed her PhD in Linguistics at the University of Illinois at Urbana-Champaign. Her research focuses on how syntactic phenomena are affected by other components of the language faculty, such as morphology and semantics/pragmatics. She has investigated, both synchronically and diachronically, interface phenomena such as word order, compounding, relativization, ergativity, and aspect. Her work focuses largely on Sanskrit and Hindi, but she has also worked on Bribri (Chibchan family, Costa Rica).
Host:
The seminar series is led by the Cardamom project team. The Cardamom project aims to close the resource gap for minority and under-resourced languages by means of deep-learning-based natural language processing (NLP) and by exploiting similarities between closely related languages. The project further extends this idea to historical languages, which can be considered closely related to their modern forms, and as such aims to provide NLP through both space and time for languages that have been ignored by current approaches.