The CorCenCC corpus contains over 11 million words (circa 14.4 million tokens) from written, spoken, and electronic (online and digital) Welsh-language sources, drawn from a range of genres, language varieties (regional and social), and contexts. The contributors to CorCenCC are representative of the more than half a million Welsh speakers in the country. CorCenCC was created as a community-driven project, which gave users of Welsh an opportunity to contribute proactively to a Welsh-language resource that reflects how the language is currently used.
Find out more at the project website, and download the corpus from the Cardiff University Research Portal via the following link:
The corpus also has an entry in the Oxford Text Archive repository, to facilitate its discovery via CLARIN (http://hdl.handle.net/20.500.14106/2564).