Welcome to the VLO!
Use the search bar below to start searching through hundreds of thousands of language resources, or continue to browse everything and use facets to narrow down to your area of interest or discover new resources.
Use the categories below to limit the search results to those matching the selected value(s).
These levels provide an indication of the degree to which resources and tools are publicly accessible. Please check the specific conditions on any resource or tool that you end up using.
This corpus contains the audio recordings of all actors who use the SmartKom system; it covers the audio recordings (no video) and annotations of all three original SmartKom corpora Public, Mobile and Home. Naive users were asked to test a 'prototype' for a market study not knowing that the system was in fact controlle…
The MOCHA database was compiled as part of the Engineering and Physical Sciences Research Council grant number GR/L78680: "Speech recognition using articulatory data." It features a set of 460 short sentences designed to include the main connected speech processes in English (e.g. assimilations, weak forms ...). All r…
The RVG-J Corpus (Regional Variants of German - Junior) was recorded in 2001 at the Institute of Phonetics and Speech Communication at the University of Munich, Germany. The corpus contains both read and non-scripted German utterances. It comprises the original RVG prompts (telephone numbers, sentences, commands, digit…
The SMARTWEB UMTS data collection was created within the publicly funded German SmartWeb project in the years 2004 - 2006. It comprises a collection of user queries to a naturally spoken Web interface with the main focus on the soccer world series in 2006. The recordings include field recordings using a hand-held UMTS device (one person, SmartWeb Handheld Corpus SHC), field recordings with video capture of the primary speaker and a secondary speaker (SmartWeb Video Corpus SVC) as well as mobile recordings performed on a BMW motorbike (one speaker, SmartWeb Motorbike Corpus SMC). An addendum DVD-R (dvd-fau, vol 24) contains additional data derived from the basic SVC corpus data provided by FAU Erlangen. Starting from version 3.6 (BAS CLARIN Repository version 2), SHC is distributed as an emuR compatible emuDB.

SmartWeb is based on two parallel efforts that have the potential of forming the basis for the next generation of the Web. The first effort is the semantic Web[1], which provides the tools for the explicit markup of the content of Web pages. The second effort is the development of semantic Web services, which results in a Web where programs act as autonomous agents to become the producers and consumers of information and enable automation of transactions. The appeal of being able to ask a question to a mobile internet terminal and receive an answer immediately has been renewed by the broad availability of information on the Web. Ideally, a spoken dialogue system that uses the Web as its knowledge base would be able to answer a broad range of questions. Practically, the size and dynamic nature of the Web and the fact that the content of most web pages is encoded in natural language make this an extremely difficult task. However, SmartWeb exploits the machine-understandable content of semantic Web pages for intelligent question-answering as a next step beyond today's search engines.

Since semantically annotated Web pages are still very rare due to the time-consuming and costly manual markup, SmartWeb is using advanced language technology and information extraction methods for the automatic annotation of traditional web pages encoded in HTML or XML. SmartWeb deals not only with information-seeking dialogues but also with task-oriented dialogues, in which the user wants to perform a transaction via a Web service (e.g. buy a ticket for a sports event or program his navigation system to find a souvenir shop).

SmartWeb is the follow-up project to SmartKom (www.smartkom.org), carried out from 1999 to 2003. SmartKom is a multimodal dialog system that combines speech, gesture, and facial expressions for input and output[2]. Spontaneous speech understanding is combined with the video-based recognition of natural gestures and facial expressions. One version of SmartKom serves as a mobile travel companion that helps with navigation and point-of-interest information retrieval in location-based services (using a PDA as a mobile client). The SmartKom architecture[3] supports not only simple multimodal command-and-control interfaces, but also coherent and cooperative dialogues with mixed initiative and a synergistic use of multiple modalities. Although SmartKom works in multiple domains (e.g. TV program guide, tourist information), it supports only restricted-domain question answering. SmartWeb goes beyond SmartKom in supporting open-domain question answering using the entire Web as its knowledge base.

SmartWeb provides a context-aware user interface, so that it can support the user in different roles, e.g. as a car driver, a motor biker, a pedestrian or a sports spectator. One of the planned demonstrators of SmartWeb is a personal guide for the 2006 FIFA world cup in Germany that provides mobile infotainment services to soccer fans, anywhere and anytime. Another SmartWeb demonstrator is based on P2P communication between a car and a motor bike. When the car's sensors detect aqua-planing, a succeeding motor biker is warned by SmartWeb "Aqua-planing danger in 200 meters!". The biker can interact with SmartWeb through speech and haptic feedback; the car driver can input speech and gestures.

SmartWeb is based on two new W3C standards for the semantic Web, the Resource Description Framework (RDF/S) and the Web Ontology Language (OWL), for representing machine interpretable content on the Web. OWL-S ontologies support semantic service descriptions, focusing primarily on the formal specification of inputs, outputs, preconditions, and effects of Web services. In SmartWeb, multimodal user requests will not only lead to automatic Web service discovery and invocation, but also to the automatic composition, interoperation and execution monitoring of Web services.

The academic partners of SmartWeb are the research institutes DFKI (consortium leader), FhG FIRST, and ICSI together with university groups from Erlangen, Karlsruhe, Munich, Saarbrücken, and Stuttgart. The industrial partners of SmartWeb are BMW, DaimlerChrysler, Deutsche Telekom, and Siemens as large companies, as well as EML, Ontoprise, and Sympalog as small businesses. The German Federal Ministry of Education and Research (BMBF) is funding the SmartWeb consortium with grants totaling 13.7 million euros.
This corpus contains multimodal recordings of 65 actors who use the SmartKom system. SmartKom Home is intended to be an intelligent communication assistant for the private environment. Naive users were asked to test a 'prototype' for a market study, not knowing that the system was in fact controlled by two human operators. The…
The CI_2 corpora contain synchronous speech recordings of 48 cochlear implant users (CI) and 48 speakers without hearing impairment (control group, KG). The data were analyzed in Veronika Neumeyer's dissertation "Akustische Analysen der Sprachproduktion von CI-Trägern" (2015). CI_2_VOT contains recordings used for the …
The SMARTWEB UMTS data collection was created within the publicly funded German SmartWeb project in the years 2004 - 2006. It comprises a collection of user queries to a naturally spoken Web interface with the main focus on the soccer world series in 2006. The recordings include 156 field recordings using a hand-held U…
This corpus contains multimodal recordings of 86 actors who use the SmartKom system. SmartKom Public is comparable to a traditional public phone booth but equipped with additional intelligent communication devices. Naive users were asked to test a 'prototype' for a market study, not knowing that the system was in fact …
The CI_2 corpora contain German speech recordings of 48 cochlear implant users (CI) and 48 speakers without hearing impairment (control group, KG). The data were analyzed in Veronika Neumeyer's dissertation "Akustische Analysen der Sprachproduktion von CI-Trägern" (2015). CI_2_Cluster contains recordings used for the a…
This corpus contains tasks in which one subject (the instructor) describes different Tangram figures to another subject (the receiver) so that the receiver can recreate the same order of figures that the instructor has in front of them. The subjects initially don't know each other and work together to solve these tasks i…