Description

COSER is a dialect corpus restricted to the speech of the informants traditionally studied in dialectology: rural speakers, preferably older, with little formal education, who are native to the place where they are interviewed. COSER therefore draws on the same type of informants as the linguistic atlases. To date (October 2024), 3,064 informants are registered in our database, although only slightly more than half of them have been interviewed in depth:


Informants   Number            Average age
Men          1,461 (47.68%)    74.8 years
Women        1,603 (52.32%)    73.5 years
Total        3,064             74.2 years
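The percentages in the table follow directly from the informant counts. As a minimal sketch in Python (the variable names are illustrative and not part of COSER), the shares can be recomputed like this:

```python
# Informant counts as given in the table above.
counts = {"Men": 1461, "Women": 1603}

# Total number of registered informants.
total = sum(counts.values())

# Share of each group as a percentage, rounded to two decimals.
shares = {group: round(100 * n / total, 2) for group, n in counts.items()}

for group, share in shares.items():
    print(f"{group}: {counts[group]} ({share}%)")
print(f"Total: {total}")
```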



The overall average age of the informants is 74.2 years, slightly higher for men (74.8 years) than for women (73.5 years). These informants were born in the first half of the twentieth century. In terms of education, they all attended a few years of primary school, with varying degrees of success. According to their own statements, at school they learned "to read and write, and the four basic rules of arithmetic". Even so, numerous informants are illiterate.

The recordings held in COSER have been collected regularly from 1990 to date (October 2024) in a series of survey campaigns. These campaigns have been organised with the support of several research projects and as part of the fieldwork included in courses on Spanish dialectology ["Dialectología hispánica" (1988-1996)] and peninsular varieties of spoken Spanish ["El español hablado. Variantes peninsulares" (1996-2004)], as well as a monographic course on varieties of Spanish ["Curso monográfico de variedades del español" (2005-2011)]. All of these courses were optional modules available to students of Spanish at the Universidad Autónoma de Madrid. From 2011 to date, the campaigns have been an optional activity for third-year undergraduate students taking the module on varieties of Spanish ["Lengua española. Variedades de la lengua"] in the university's Degree in Hispanic Studies.


Surveyed localities                                      1461
Provinces or islands                                     56
Total hours of recordings                                1989 hours
Average recording length per interview                   1 hour, 5 minutes
Interviews available in text and audio (October 2024)    244


Before 2024, interviews were carried out in 1461 rural localities in the Iberian Peninsula and the two archipelagos, belonging to 55 provinces or islands (islands are counted separately even when they belong to a single province). Their geographical location is shown on the map, where each locality is identified by a numerical code that combines the province and the locality, in alphabetical order (for example, Berganzo, in the province of Álava, has the code 0101). The sound materials cover a large part of the Iberian Peninsula, and the network of survey points is as dense as that of the regional atlases, or even denser.
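The code scheme can be illustrated with a short Python sketch. It assumes, based only on the single example 0101, that a code has four digits, with the first two giving the province and the last two giving the locality within that province. The function name and this exact two-plus-two split are hypothetical and are not taken from COSER's own documentation.

```python
from typing import NamedTuple


class LocalityCode(NamedTuple):
    province: int  # assumed: first two digits of the code
    locality: int  # assumed: last two digits of the code


def parse_locality_code(code: str) -> LocalityCode:
    """Split a four-digit code into province and locality numbers.

    The 2 + 2 digit layout is an assumption inferred from the
    example "0101" (Berganzo, Álava).
    """
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a four-digit code, got {code!r}")
    return LocalityCode(province=int(code[:2]), locality=int(code[2:]))


# Berganzo, in the province of Álava
berganzo = parse_locality_code("0101")
print(berganzo.province, berganzo.locality)
```

Keeping the code as a string until it is parsed preserves the leading zero, which would be lost if the code were stored as a number.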

In total, COSER currently has 1947 hours of recordings. Although most of these were recorded in analogue format, in 2010 we were able to digitise all the materials, of which we present a sample as sound files. Half of the materials have transcriptions of varying nature and accuracy, produced thanks to support from various research projects and the participation of several generations of UAM undergraduate students, who transcribed recordings they had collected themselves as part of their coursework.

In 2015, 147 transcriptions corresponding to 141 localities (approximately 183 hours), revised and standardised with the BConcord editor, were published on this website (available files) and made searchable through a search engine. Between 2015 and October 2024, that number increased to 244 transcriptions, amounting to 333 hours, 44 minutes of recording and a searchable corpus of 3,631,437 words.

Since 2017, this corpus has been accessible in both the Simple Search and Advanced Search modes (the latter allows searches using lemmas and morphosyntactic tags). In 2019, the Advanced Search option was revised and, among other improvements, it became possible to download the search data in Excel format. In 2020, the geographic coordinates and postcode of each locality were added to this search option, so that the extracted data can be analysed in Geographic Information Systems; the text was also synchronised with its corresponding audio. At the end of 2020, the COSER corpus was made available for download in open access, both in TXT (in real time) and in XML (with morphosyntactic tags), with three versions so far: December 2020, May 2022 and March 2024. In 2021, synchronisation, spelling and labelling errors were corrected throughout the available corpus. Finally, in 2024, we have turned to improving the digitally recorded sound files.



Localities whose transcript is available and searchable (October 2024)    244
Provinces or islands                                                      56
Number of hours transcribed                                               333 hours, 44 minutes
Total number of words transcribed                                         3,631,437 words
Total units (tokens)                                                      5,205,448 units