Welcome to the VLO!
Use the search bar below to start searching through hundreds of thousands of language resources, or continue to browse everything and use facets to narrow down to your area of interest or discover new resources.
Use the categories below to limit the search results to those matching the selected value(s).
These levels provide an indication of the degree to which resources and tools are publicly accessible. Please check the specific conditions on any resource or tool that you end up using.
The corpus contains read speech from 10 different speakers. Each speaker read approx. 1000 sentences from a German newspaper corpus, resulting in a total of approx. 10000 recorded utterances. The recordings took place at the Institut fuer Phonetik, University of Munich, Germany, in 1994.
The BigBrother Corpus is a speech corpus with recordings from the first season of the BigBrother show, broadcast on Norwegian television by TVNorge in the first half of 2001. The participants in BigBrother speak different dialects, but they come primarily from eastern Norway. They are aged 23-36 years. The BigBrother C…
Nordic Dialect Corpus v.4.0 is a corpus of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian spoken language. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus comes from a variety of sources (see homepa…
The LIA Treebank includes 7536 speech segments and 77 701 tokens from LIA Norwegian. The treebank is annotated with morphological and dependency-style syntactic analysis and has been manually corrected. The treebank is available in three versions: a downloadable version in CoNLL-X format, a searchable version in the search inter…
CANS v.3.1 - Corpus of American Nordic Speech - is a speech corpus with speakers from the USA and Canada speaking Norwegian and Swedish. Most of the informants learnt to speak their Nordic language as children at home. The corpus contains 268 speakers from 63 places, with more than 774 000 tokens in total. CANS v.3.1 con…
Web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus has been built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC has been built with permission from the Norwegian Ministry of Culture (Kulturdepart…
This is the dataset corresponding to the GermEval 2014 NER Shared Task. The data is sampled from German Wikipedia articles and online news. It is annotated following the NoSta-D guidelines, which are included in the dataset. The guidelines suggest four NER categories (PER, LOC, ORG, MISC/OTH) and are extended to account…
The treebank "Annotations of fiction text from 'Nynorskkorpuset ved Norsk Ordbok 2014'" is a syntactically annotated corpus which uses text extracts from Nynorskkorpuset ved Norsk Ordbok 2014 (no2014.uio.no). This treebank is part of the INESS NorGramBank collection (see URL in metadata). Text Preprocessing: When a corpus…
The experiment was conducted in a quiet experimental room with an SR Research EyeLink 1000 eye tracker (desktop mount) with a 35 mm lens, 13-point calibration and 1k sample rate and pacing interval. A game pad and keyboard were used to navigate in the experiment. Participants viewed the stimuli on a 21-inch monitor 70 cm a…
Word and tag embeddings trained on TüDP-D/W and TüPP-D/Z using Wang2Vec.