Natural Language Processing Group (Sheffield)

The Natural Language Processing (NLP) group at the University of Sheffield is one of the largest and most successful research groups in language and information in the EU. The group is active in fields such as NLP infrastructures (GATE), information extraction, linguistic and terminological standardisation, social media analysis, machine learning methods for NLP, terminology extraction, NLP methods for Knowledge Management, and Linked Data and the Semantic Web. 

The open-source GATE text mining infrastructure, with its vibrant user community (41,000 software downloads in the past 12 months alone and 265,000 in the past 9 years), has a repository of over 150 text mining and NLP models and algorithms, including Information Extraction (IE), biomedical text mining, ontology-based semantic annotation, machine learning for IE, and NLP evaluation tools, as well as many third-party text mining plugins.

There are also two cloud-based deployments of GATE as text mining platforms-as-a-service: GATECloud (aimed at researchers who want to run GATE-provided or their own text mining pipelines on big data) and AnnoMarket (aimed at companies and other users who want to use pre-packaged, highly scalable GATE text mining web services).

Our ever-growing e-humanities expertise has been acquired in many previous EC and national projects, including CLARIN (EC, language resources infrastructure), ARCOMEM (EC, archives, museums, and libraries in the age of the Social Web), TEXTvre (UK JISC, humanities VREs) and EnviLOD (UK JISC, mining British Library collections). Further, GATE has been successfully deployed in several other humanities projects in the past decade, including ETCSL (literary Sumerian), OldBaileyIE (named entity recognition on 17th-century court reports), ArtEquAkt (composite descriptions of cultural artefacts and figures) and TALH (annotation of 16th-century Aberdeen council records).

Visit the NLP group website at

Key resources:

  • The General Architecture for Text Engineering (GATE) environment and tools, including multilingual NER, linguistic analysis, text mining and information extraction.

The NLP Group was a founding member of the CLARIN-UK consortium in 2015.