Digital Resources for the Languages in Ireland and Britain
A new registry of resources
In September 2024, a new CLARIN knowledge centre, Digital Resources for the Languages in Ireland and Britain (DR-LIB), was launched to support researchers searching for resources on the languages of Britain and Ireland in all their varieties: native and non-native, contemporary and historical, standard and non-standard. DR-LIB is a virtual and distributed network that acts as a point of contact for all questions relating to digital resources and research on these languages.
One of DR-LIB’s first goals is to compile a list of the digital resources (such as corpora, lexicons and language taggers) currently available for the study of the languages in Ireland and Britain, and to share these resources with CLARIN so that they adhere more closely to the FAIR principles, i.e., so that they become more findable, accessible, interoperable, and reusable. CLARIN, as a European consortium that provides access to language data and tools to support research, is the ideal organisation to help with this effort. It offers two pieces of infrastructure for the purpose: the CLARIN Resource Families, which are collections of known resources organised by type and language, and the CLARIN Virtual Language Observatory, which is an interface for searching across and within resources known to CLARIN.
Below is a list of the resources that we have found so far and confirmed to be active. Please email us if you would like us to add something to the list or if you find that something is no longer active. We will update this page regularly.
Language |
Name |
Description |
Breton |
Tools for translation, spellcheckers, Breton keyboard, Breton fonts, Breton dictionaries. |
|
Breton |
Breton language technology portal, promoting various digital tools and resources. |
|
Cornish |
|
|
Cornish |
Cornish dictionary. |
|
Cornish |
Cornish corpus. |
|
English |
Corpus-based description of the core vocabulary of English. |
|
English Welsh, etc. |
Python Multilingual UCREL Semantic Analysis System. |
|
English Irish Welsh |
Translation and speech-to-text (S2T) models. |
|
Hiberno-English |
A publicly accessible, sustainable electronic correspondence corpus. |
|
Irish |
Project developing synthetic voices for Irish. |
|
Irish |
The National Irish Language Biographical Database. |
|
Irish |
The National Terminology Database for Irish. |
|
Irish |
Open source grammar checking engine. |
|
Irish |
|
|
Irish Manx Scottish Gaelic |
Private company that provides tools to the Irish Language community. Tools include: An Gramadóir, Caighdeánaitheoir Gaeilge, Foclóir Gàidhlig-Gaeilge, Foclóir Manainnis-Gaeilge, GaelSpell, Historical Irish Corpus, Intergaelic, Líonra Séimeantach na Gaeilge, Cadhan Aonair UD treebank, amongst others hosted on this site. |
|
Irish |
Treebank for Irish. |
|
Irish |
Irish Language Standardiser. |
|
Irish |
CODECS: Collaborative Online Database and e-Resources for Celtic Studies |
Comprehensive database of sources of interest to Celtic studies. |
Irish |
National Corpus of Irish. |
|
Irish |
NLP: ELECTRA- and BERT-based models. |
|
Irish |
A roadmap for Irish-language technology developments 2023-2027. |
|
Irish |
National Folklore Collection UCD Digitisation Project. |
|
Irish |
Dictionary of Irish. |
|
Irish |
English-Irish dictionary. |
|
Irish Scottish Gaelic |
A bilingual dictionary between Irish and Scottish Gaelic. |
|
Irish Manx |
A bilingual dictionary between Irish and Manx. |
|
Irish |
Irish language spellchecker. |
|
Irish |
Gaois Research Group; contains numerous corpora and resources related to terminology, idioms, surnames, etc. |
|
Irish |
Repository containing datasets and code for measuring progress in Irish language NLP. Includes datasets for author identification, bilingual lexicon induction, chunking, etc. |
|
Irish Manx Scottish Gaelic |
Code repository related to Universal Dependencies corpora for Irish, Manx, and Scottish Gaelic. |
|
Irish |
Over 3000 texts published in Irish between 1600 and 1926. |
|
Irish Manx Scottish Gaelic |
Dictionary and translation engine between Irish, Scottish Gaelic and Manx Gaelic. |
|
Irish |
Tagset developed specifically for Irish. |
|
Irish |
Digital repository of Irish manuscripts. |
|
Irish |
A Universal Dependencies 4910-sentence treebank for modern Irish. |
|
Irish |
The Irish Language Semantic Network. |
|
Irish |
Placenames Database of Ireland. |
|
Irish |
Group of translators and computer scientists creating Irish language versions of software. |
|
Irish Welsh |
Language development data. |
|
Irish |
Dictionary and language library. |
|
Irish |
The National Terminology Database for Irish. |
|
Irish Scottish Gaelic |
A searchable textbase of 20th-century Gaelic texts (mostly Irish, with some Scottish), best described as ‘continuity Gaelic’. |
|
Irish |
A Universal Dependencies 4910-sentence treebank for modern Irish. |
|
Manx |
Treebank for Manx Gaelic. |
|
Manx |
Online corpus and search. |
|
Scottish Gaelic |
Annotated Reference Corpus of Scottish Gaelic. |
|
Scottish Gaelic |
|
|
Scottish Gaelic |
An NLTK corpus reader for ngram files; supports several languages. |
|
Scottish Gaelic |
Digital archive of Scottish Gaelic. |
|
Scottish Gaelic |
A historical dictionary. |
|
Scottish Gaelic |
The Gaelic Linguistic Analyser. |
|
Scottish Gaelic |
Digitised collection. |
|
Scottish Gaelic |
Digital library. |
|
Scottish Gaelic |
A treebank of Scottish Gaelic based on the Annotated Reference Corpus Of Scottish Gaelic (ARCOSG). |
|
Welsh |
National Corpus of Contemporary Welsh. |
|
Welsh |
National Corpus of Contemporary Welsh KWIC tool. |
|
Welsh |
Machine translation tool. |
|
Welsh |
Welsh semantic tagger. |
|
Welsh |
Software package that includes the Cysill Welsh-language grammar and spelling checker as well as the Cysgeir collection of dictionaries. |
|
Welsh |
Welsh spellchecker. |
|
Welsh |
Online collection of freely available digital resources designed to support the exploration, analysis, learning, and referencing of the Welsh language. |
|
Welsh |
Dictionary for adult learners of Welsh. |
|
Welsh |
Dictionary of Welsh. |
|
Welsh |
Open source Welsh language voice assistant similar to Alexa or the Google Assistant. |
|
Welsh |
Public translation memory sharing service. |
|
Welsh |
Welsh National Language Technologies Portal. |
|
Welsh |
Welsh summarization dataset. |
|
Welsh |
Standardized terminology for use in teaching and learning. |
|
Welsh |
Welsh transcriber. |
|
Welsh |
A collection of on-line written Welsh and bilingual corpora in an easily searchable format. |
|
Welsh |
GATE-based NLP pipeline. |
|
Welsh |
|
|
Welsh |
National Corpus of Contemporary Welsh pedagogic toolkit. |
|
Welsh |
Standardized terminology for the field of education. |
Thanks to Dr Mo El-Haj (VinUniversity) and others in the CLIDA network for starting to map these resources in 2024.