VLO | CLARIN ERIC

NoWaC v 1.0 (Norwegian Web as Corpus)

Web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus has been built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC has been built with permission from the Norwegian Ministry of Culture (Kulturdepart…

Norwegian Bokmål

Landing page for this record at www.hf.uio.no

Motion Verbs Eye-Tracking During Reading Study

(Part of Tübingen Archive of Language Resources (TALAR))

The experiment was conducted in a quiet experimental room using an SR Research EyeLink 1000 eye tracker (desktop mount) with a 35 mm lens, 13-point calibration, and a 1k sample rate and pacing interval. A gamepad and keyboard were used to navigate the experiment. Participants viewed the stimuli on a 21-inch monitor 70 cm a…

German Word/Tag Embeddings (Syntactic)

(Part of Tübingen Archive of Language Resources (TALAR))

Word and tag embeddings trained on TüDP-D/W and TüPP-D/Z using Wang2Vec.

German

Parallel Corpus of Finnish and Easy-to-read Finnish from the Yle News Archive 2019-2020, source

The resource is available via Kielipankki – The Language Bank of Finland. This parallel dataset can be used for training simplification models and/or studying simplification strategies that experts apply for Finnish news articles. The languages of the dataset are Finnish and Easy-to-read Finnish. The articles of which…

Finnish

Yle Finnish News Archive 2011-2018, VRT

The corpus, containing articles from YLE (https://yle.fi) from 2011-2018, is available in the download service of Kielipankki, the Language Bank of Finland, at korp.csc.fi/download.

Finnish

Christmas Gospel text-to-speech in four Uralic languages, Korp

This resource is available via Korp in Kielipankki – the Language Bank of Finland. This resource consists of .txt and .wav files in four languages pertaining to the Finnish Christmas Gospel verses, Luke 2:1–20. The four languages are Komi-Zyrian (kpv), Erzya (myv), Karelian (krl) and Olonets-Karelian (olo, aka Livv…

Komi-Zyrian Erzya Karelian Livvi Kompane

Yle media evaluation dataset

This audiovisual dataset contains:
* audio files, subtitles and ground truth transcripts, speaker diarizations and NER annotations of 16 factual programs in Finnish and Swedish
* video files, subtitles, metadata and annotations for 8 factual programs that have been used for demonstration and test purposes in the MeMAD …

Finnish Swedish

Helsinki Corpus of Scottish Correspondence (1540-1750)

The Helsinki Corpus of Scottish Correspondence comprises circa 0.4 million words (0.5 million tokens) of early Scottish correspondence by male and female writers dating from the period 1540-1750. Unlike the majority of digital resources available for historical linguistics at present, the corpus consists of transcripts…

English

Yle News Archive Easy-to-read Finnish 2011-2018, Korp

The corpus is available in Kielipankki - the Language Bank of Finland (korp.csc.fi). This dataset consists of the Yle Selkokieliset uutiset in Finnish (Yle Easy-to-read Finnish News). The dataset was created from the contents of the Yle News Archive for the language code "fi" for each month from the year 2011 to the y…

Finnish

Corpus of Finnish Magazines and Newspapers from the 1990s and 2000s, Downloadable Version 2

The resource, containing entire newspaper and magazine articles, has been made available for download in Kielipankki - the Language Bank of Finland at http://urn.fi/urn:nbn:fi:lb-201712201. The data consists of source data in PDF form or as plain text and is not annotated. An annotated version (lehdet90ff-vrt-v2) is av…

Finnish
