VLO | CLARIN ERIC

BAS SI1000

(Part of Bavarian Archive for Speech Signals (BAS))

The corpus contains read speech from 10 different speakers. Each speaker read approx. 1000 sentences from a German newspaper corpus, resulting in a total of approx. 10000 recorded utterances. The recordings were made at the Institut fuer Phonetik, University of Munich, Germany, in 1994.

English German

The BigBrother Corpus

(Part of Clarino - Textlab)

The BigBrother Corpus is a speech corpus with recordings from the first season of the BigBrother show, broadcast on Norwegian television by TVNorge in the first half of 2001. The participants in BigBrother speak different dialects, but they come primarily from the east of Norway. They are aged 23-36 years. The BigBrother C…

Norwegian Norwegian Bo..

Landing page for this record at www.tekstlab.uio.no

Nordic Dialect Corpus v. 4.0

(Part of Clarino - Textlab)

Nordic Dialect Corpus v.4.0 is a corpus of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian spoken language. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus comes from a variety of sources (see homepa…

Norwegian Bo.. Swedish Danish Icelandic Faroese; Fae..

Landing page for this record at www.tekstlab.uio.no

The LIA Treebank

(Part of Clarino - Textlab)

The LIA Treebank includes 7536 speech segments and 77 701 tokens from LIA Norwegian. The treebank is annotated with morphological and dependency-style syntactic analysis and has been manually corrected. The treebank is available in three versions: a downloadable version in CoNLL-X format, a searchable version in the search inter…

Norwegian Norwegian Ny..

Landing page for this record at tekstlab.uio.no

Corpus of American Nordic Speech v.3.1

(Part of Clarino - Textlab)

CANS v.3.1 - Corpus of American Nordic Speech - is a speech corpus with speakers from the USA and Canada speaking Norwegian and Swedish. Most of the informants learnt to speak their Nordic language as children at home. The corpus contains 268 speakers from 63 places and more than 774 000 tokens in total. CANS v.3.1 con…

Norwegian Bo.. Swedish

Landing page for this record at www.tekstlab.uio.no

NoWaC v 1.0 (Norwegian Web as Corpus)

(Part of Clarino - Textlab)

Web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus has been built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC has been built with permission from the Norwegian Ministry of Culture (Kulturdepart…

Norwegian Bo..

Landing page for this record at www.hf.uio.no

GermEval 2014 NER dataset

(Part of IMS, CLARIN-D Centre, University of Stuttgart)

This is the dataset corresponding to the GermEval 2014 NER Shared Task. The data is sampled from German Wikipedia articles and online news. It is annotated following the NoSta-D guidelines, which are included in the dataset. The guidelines suggest four NER categories (PER, LOC, ORG, MISC/OTH) and are extended to account…

German

NorGramBank Annotations of fiction text from 'Nynorskkorpuset ved Norsk Ordbok 2014'

(Part of Clarino Bergen Centre - INESS)

The treebank "Annotations of fiction text from 'Nynorskkorpuset ved Norsk Ordbok 2014'" is a syntactically annotated corpus which uses text extracts from Nynorskkorpuset ved Norsk Ordbok 2014 (no2014.uio.no). This treebank is part of the INESS NorGramBank collection (see URL in metadata). Text Preprocessing: When a corpus…

Norwegian Norwegian Ny..

Motion Verbs Eye-Tracking During Reading Study

(Part of Tübingen Archive of Language Resources (TALAR))

The experiment was conducted in a quiet experimental room with an SR Research EyeLink 1000 eyetracker (desktop mount) with a 35 mm lens, 13-point calibration, and a 1k sample rate and pacing interval. A game pad and keyboard were used to navigate in the experiment. Participants viewed the stimuli on a 21-inch monitor 70 cm a…

German Word/Tag Embeddings (Syntactic)

(Part of Tübingen Archive of Language Resources (TALAR))

Word and tag embeddings trained on TüDP-D/W and TüPP-D/Z using Wang2Vec.

German
