Welcome to the VLO!
Use the search bar below to start searching through hundreds of thousands of language resources, or continue to browse everything and use facets to narrow down to your area of interest or discover new resources.
Use the categories below to limit the search results to those matching the selected value(s).
These levels provide an indication of the degree to which resources and tools are publicly accessible. Please check the specific conditions on any resource or tool that you end up using.
This corpus contains the audio recordings of all actors who use the SmartKom system; it covers the audio recordings (no video) and annotations of all three original SmartKom corpora Public, Mobile and Home. Naive users were asked to test a 'prototype' for a market study not knowing that the system was in fact controlle…
The MOCHA database was compiled as part of the Engineering and Physical Sciences Research Council grant number GR/L78680: "Speech recognition using articulatory data." It features a set of 460 short sentences designed to include the main connected speech processes in English (e.g. assimilations, weak forms ...). All r…
This corpus contains tasks in which one subject (the instructor) describes different Tangram figures to another subject (the receiver) so that the receiver can recreate the same order of figures that the instructor has in front of them. The subjects initially don't know each other and work together to solve these tasks i…
The corpus contains read speech of 101 different speakers (50 female, 50 male, 1 unknown). Each speaker has read approx. 100 sentences from either the SZ subcorpus or the CeBit subcorpus. The language is German. The subcorpus SZ contains 544 sentences from newspaper articles ("Sueddeutsche Zeitung"). The subcorpus Ce…
The CI_2 corpora contain synchronous speech recordings of 48 cochlear implant users (CI) and 48 speakers without hearing impairment (control group, KG). The data were analyzed in Veronika Neumeyer's dissertation "Akustische Analysen der Sprachproduktion von CI-Trägern" (2015). CI_2_Sibilants contains recordings used fo…
The VERIF1DE database is a subset of the VERIDAT speaker verification database collected by T-Nova. VERIDAT contains additional items and re-recordings of missing, corrupted, or otherwise unusable files in VERIF1DE. Please refer to the file DESIGN.PDF in the documentation package of this corpus for a detailed descripti…
The corpus contains speech of 88 different speakers reading the German story 'Der Nordwind und die Sonne'. Subcorpus T contains the recordings of 16 native Germans (L1). The other 72 speakers, who were born and educated in other countries (L2), are pooled in subcorpus C. Every speaker has a distinct accent. This corpu…
The SC10 corpus contains read and non-prompted German and mother tongue speech of 70 different speakers from 17 mother tongues (L1) in a variety of speaking styles, e.g. reading, retelling, free talk etc. Starting from version 1.5 (BAS CLARIN repository version 3), the corpus is distributed as an emuDB. BAS CLARIN repos…
The speech corpus aGender contains speech sample recordings over public telephone lines with read and (semi-)spontaneous speech. Native German speakers called a voice portal from their private phone, read text, and answered some open questions. The purpose of the corpus is the automatic detection of gender and/or age …