Unit for Linguistic Data, DSI, University of Galway

The Unit for Linguistic Data (ULD) is concerned with the creation, improvement and maintenance of linguistic data (also known as language resources) through a variety of methods. The term "linguistic data" refers to a range of data types that are of use to researchers in linguistics and natural language processing (NLP). Linguistic data can be split into four major categories: first, lexical data contains descriptions of words and their meanings, syntax and relations; second, corpora consist of collections of texts made for a particular purpose; third, language descriptions document typological properties of languages to enable comparative studies; and finally, metadata describes language resources and their availability.

As its primary research method, the group explores the use of linked data technologies, specifically Linguistic Linked Open Data (LLOD), for processing linguistic data. This has led to the development of several key tools and resources that use linked data as a core part of their mechanism. One such tool, Naisc, was developed by the group for linking together resources of different kinds and has been applied to the task of linking lexicographical resources in the context of the ELEXIS project. Another tool, Teanga, enables the construction of pipelines of NLP tools that can be composed and integrated through the use of linked data and standards for linguistic data, such as the OntoLex-Lemon standard developed in this project. Finally, ULD maintains and develops several catalogues for the discovery of linguistic data resources, including the Linghub website as well as the Linked Open Data Cloud and its Linguistic Linked Open Data Subcloud. In the context of the Prêt-à-LLOD project, ULD is further exploring how the quality and availability of resources can be improved.

One of the major applications of linguistic data is bringing existing NLP technologies to new languages and domains. A major part of the group's work therefore concerns under-resourced languages. There is much ongoing work on the development of technologies for minority languages, as well as active collaboration with the Irish Department and the Moore Institute on the development of NLP techniques for historical languages, in particular Old Irish. Furthermore, the unit is working on expanding WordNet to many under-resourced languages by means of machine translation.

Areas of work:

Linked data, Under-resourced languages, Digital humanities, Language resources, Lexicography, Metadata, Linguistic linked open data, Linked-data-based services

Upcoming (0)

Sorry, there are no upcoming events

Past (25)

Teanga Seminar Series

Thu, Feb 22, 5:00 PM GMT

Free

Cardamom Seminar Series

Mon, Oct 9, 5:00 PM GMT+1

Free

Cardamom Seminar Series

Mon, Jun 26, 5:00 PM GMT+1

Free

Cardamom Seminar Series

Mon, May 29, 5:00 PM GMT+1

Free

Cardamom Seminar Series

Mon, Apr 24, 5:00 PM GMT+1

Free

Cardamom Seminar Series

Mon, Mar 27, 5:00 PM GMT+1

Free

Cardamom Seminar Series

Mon, Feb 27, 5:00 PM GMT

Free

Cardamom Seminar Series

Mon, Jan 30, 5:00 PM GMT

Free

Cardamom Seminar Series

Mon, Nov 21, 5:00 PM GMT

Free

Cardamom Seminar Series

Mon, Oct 24, 5:00 PM GMT+1

Free

Cardamom Seminar Series

Mon, Sep 26, 5:00 PM GMT+1

Free

Cardamom Seminar Series

Mon, Jul 25, 5:00 PM GMT+1

Free
