VLO | CLARIN ERIC

Norsk talespråkskorpus - Oslodelen

NoTa-Oslo is a speech corpus with interviews and conversations from 166 informants born and raised in Oslo and the Oslo area. The informants are carefully selected with respect to sociolinguistic variables and are therefore representative in terms of age, gender, place of residence and education. NoTa-Oslo consists of approx. 957 0…

Norwegian Norwegian Bo..

Landing page for this record at www.tekstlab.uio.no

LIA Norwegian - Corpus of historical dialect recordings

(Part of Clarino - Textlab)

LIA Norwegian is a speech corpus with old recordings (1939 - 1996) from four Norwegian universities: NTNU, UoB, UoO and UoT. The recordings were made mainly for dialect and onomastic research, and the interviews and conversations typically concern old trades such as agriculture, fisheries, logging and lif…

Norwegian Norwegian Ny..

Landing page for this record at tekstlab.uio.no

Corpus of American Nordic Speech v.3.1

(Part of Clarino - Textlab)

CANS v.3.1 - Corpus of American Nordic Speech - is a speech corpus with speakers from the USA and Canada speaking Norwegian and Swedish. Most of the informants learnt to speak their Nordic language as children at home. The corpus contains 268 speakers from 63 places, totalling more than 774 000 tokens. CANS v.3.1 con…

Norwegian Bo.. Swedish

Landing page for this record at www.tekstlab.uio.no

NoWaC v 1.0 (Norwegian Web as Corpus)

(Part of Clarino - Textlab)

Web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus has been built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC has been built with permission from the Norwegian Ministry of Culture (Kulturdepart…

Norwegian Bo..

Landing page for this record at www.hf.uio.no

The NDC Treebank

(Part of Clarino - Textlab)

The NDC Treebank includes 4637 speech segments and 66 042 tokens from the Norwegian part of the Nordic Dialect Corpus. The segments are taken from 30 transcribed interviews from 17 places in Norway. The treebank is annotated with morphological and dependency-style syntactic analysis and has been manually corrected. The treebank is …

Norwegian Norwegian Bo..

Landing page for this record at tekstlab.uio.no

GermEval 2014 NER dataset

(Part of IMS, CLARIN-D Centre, University of Stuttgart)

This is the dataset corresponding to the GermEval 2014 NER Shared Task. The data is sampled from German Wikipedia articles and online news. It is annotated following the NoSta-D guidelines, which are included in the dataset. The guidelines suggest four NER categories (PER, LOC, ORG, MISC/OTH) and are extended to account…

German

German Word/Tag Embeddings (Syntactic)

(Part of Tübingen Archive of Language Resources (TALAR))

Word and tag embeddings trained on TüDP-D/W and TüPP-D/Z using Wang2Vec.

German

Motion Verbs Eye-Tracking During Reading Study

(Part of Tübingen Archive of Language Resources (TALAR))

The experiment was conducted in a quiet experimental room with an SR Research EyeLink 1000 desktop-mount eye tracker with a 35 mm lens, 13-point calibration, and a 1k sample rate and pacing interval. A game pad and keyboard were used to navigate the experiment. Participants viewed the stimuli on a 21-inch monitor 70 cm a…

The Level Stress Recordings

(Part of CLARINO Bergen Centre)

This collection consists of scripted recordings from different rural dialects spoken in Norway and Sweden, in total 33 recordings of 46 different speakers. The speakers’ year of birth ranges from 1909 to 1973. The sets of target words were designed to capture the quantity system and tonal system of the different dialec…

Swedish Norwegian

Norwegian-German legal terminology

(Part of CLARINO Bergen Centre)

The resource NOJU is a terminological database containing terms, definitions and other conceptual information in Norwegian and German within legal domains.

Norwegian Ny.. Norwegian Bo.. German
