VLO | CLARIN ERIC

Yle machine translated subtitles evaluation dataset

This dataset contains semi-automatically cleaned, parallel professional subtitles from 44 programs, containing 10.3k aligned sentence pairs for these language pairs: FIN-SWE, FIN-ENG, SWE-ENG. This dataset does not contain video or audio, but the total content length covered by the subtitles is 22.46 hours. --- Yle h…

Finnish Swedish English

Finnish Folk Poetry

The corpus is available in Kielipankki - the Language Bank of Finland (korp.csc.fi), http://urn.fi/urn:nbn:fi:lb-2014052711. A 34-volume collection of Finnic oral poetry, lyric, short rhymes, incantations etc., collected and recorded from the 16th century to the 1930s and published mostly between 1908 and 1948, with a…

Finnish Karelian Livvi Ludian Votic … (+3)

Corpus of Historical American English - Kielipankki Korp version 2017H1

The corpus is available in Kielipankki - the Language Bank of Finland (korp.csc.fi). The Corpus of Historical American English (COHA) contains about 385 million words and 115 000 texts from the years 1810-2009. Each decade has roughly the same balance of fiction, popular magazine, newspaper, and non-fiction books. Ac…

English

agrep - approximate grep

A tool for "approximate" string matching. More information: https://en.wikipedia.org/wiki/Agrep

Finnish conversational chat corpus, source

This resource is available for download in Kielipankki – the Language Bank of Finland. The FinChat corpus consists of 85 Finnish chat dialogs collected in 2019-2020. The participants (N=62) were native speakers of Finnish in three age-based user groups: high school students (16-19 years), university students (20-25 ye…

Finnish

JRC-Acquis Multilingual Parallel Corpus

The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States. This collection of legislative text changes continuously and currently comprises selected texts written between the 1950s and now. As of the beginning of the year 2007, the EU had 27 Member States and 23 o…

Bulgarian Czech Danish German Modern Greek … (+17)

Aalto University DSP Course Conversation Corpus 2013-2015, Downloadable Version

The corpus, which is the downloadable version of the years 2013-2015 of the Aalto University DSP Course Conversation Corpus 2013- (http://urn.fi/urn:nbn:fi:lb-2015101901), is available in Kielipankki - the Language Bank of Finland at https://korp.csc.fi/download/DSPCON. This version contains transcribed utterances fro…

Finnish

Yle Swedish News Archive 2012-2018, Korp

The corpus, containing the articles from Svenska YLE https://svenska.yle.fi from 2012 onwards up to 2018 inclusive, is available at Korp. The licence is available at http://urn.fi/urn:nbn:fi:lb-2019120401

Swedish

Parallel Corpus of Finnish and Easy-to-read Finnish from the Yle News Archive 2019-2020, source

The resource is available via Kielipankki – The Language Bank of Finland. This parallel dataset can be used for training simplification models and/or studying the simplification strategies that experts apply to Finnish news articles. The languages of the dataset are Finnish and Easy-to-read Finnish. The articles of which…

Finnish

Corpus of Global Web-Based English - Kielipankki Korp version 2017H1

The corpus is available in Kielipankki - the Language Bank of Finland (korp.csc.fi). The Corpus of Global Web-Based English (GloWbE) contains about 1.8 billion words and 1 800 000 texts from web pages in the United States, Great Britain, Australia, India, and 16 other countries. About 60% of the texts come from blogs. A…

English
