Interdisciplinarity and the sharing of oral data open new perspectives to field linguistics

[BEL-4733] BEL, Bernard; GASQUET-CYRUS, Médéric (2011). Interdisciplinarity and the sharing of oral data open new perspectives to field linguistics. Colloque de l'AFLS : Regards nouveaux sur les liens entre théories, méthodes et données en linguistique française (8-10 September 2011, Nancy, France). [COM] (34)

(1) Laboratoire Parole et Langage (LPL), CNRS : UMR 7309, Aix-Marseille Université, Aix-en-Provence (France)
(2) Laboratoire Parole et Langage (LPL), CNRS : UMR 7309, Aix-Marseille Université, Aix-en-Provence (France)

Our laboratory (LPL) is engaged in collecting resources, analysing them and theorizing (socio)linguistics, with a particular focus on the links between experimental and field approaches. Within this framework we started two projects whose initial objectives were to build a corpus of endangered languages in the border area between Provençal and Francoprovençal (Valjouffrey and Valbonnais, Isère) (1) and to document codeswitching in interactions between their speakers (2).
Fieldwork for these projects takes advantage of the technology available on the LPL experimental speech platform (3). Multichannel recordings with head-worn microphones permit an accurate study of overlapping speech turns in sessions involving up to 8 participants. Full video coverage facilitates annotations that meet the requirements of research on multimodality (4).
These conditions led us to redefine the categories of ‘corpus’ and ‘annotation’ in terms of primary and secondary data. The former includes all data collected during an experimental session or field enquiry, i.e. not only physiological signals associated with speech production but also photos, drawings, texts and documents handed over by participants. Secondary data is everything derived from primary data, including signal files reprocessed or transcoded for technical reasons.
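The primary/secondary distinction can be sketched as a small data structure. This is only an illustration under our own assumptions: the class `Resource`, its `derived_from` field, the helper `primary_source` and the file names are hypothetical and do not describe the projects' actual tooling.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Resource:
    """One item of a corpus (hypothetical model, for illustration only)."""
    name: str
    # Name of the resource this one was derived from; None for collected data.
    derived_from: Optional[str] = None

    @property
    def category(self) -> str:
        # Anything derived from another resource counts as secondary data.
        return "primary" if self.derived_from is None else "secondary"


def primary_source(resource: Resource, catalogue: Dict[str, Resource]) -> Resource:
    """Follow derivation links back to the originally collected item."""
    seen = set()
    while resource.derived_from is not None:
        if resource.name in seen:
            raise ValueError("circular derivation chain")
        seen.add(resource.name)
        resource = catalogue[resource.derived_from]
    return resource


# Invented example: a field recording, a transcoded copy, and an annotation.
items = [
    Resource("session01.wav"),
    Resource("session01.mp3", derived_from="session01.wav"),
    Resource("session01_annotation.txt", derived_from="session01.mp3"),
    Resource("participant_drawing.jpg"),
]
catalogue = {item.name: item for item in items}
```

In this sketch a transcoded signal file is secondary even though it carries the same recording, which matches the definition given above, and every secondary item can be traced back to the primary item it comes from.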
Managing large amounts of data goes far beyond the scope of the projects that motivated their collection. For this reason the projects rely on the medium-term and long-term archiving facilities offered by the Resource Centre for the Description of Speech (Centre de Ressources pour la Description de l'Oral, CRDO) (5), built on the OAIS model (6) with a versioning system adapted to the life cycle of such projects. All archives are accessible to the public unless they are subject to restrictions regulated by law (7).
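As a rough illustration of versioned deposits combined with a default-public access rule, the sketch below keeps every deposited version of an item and treats it as public unless a restriction is recorded. The class `ArchivedItem`, its fields, the identifier and the restriction label are all hypothetical; the sketch reflects neither the CRDO implementation nor the OAIS specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArchivedItem:
    """A deposited resource with its version history (hypothetical model)."""
    identifier: str
    # File names of successive deposits, oldest first; nothing is overwritten.
    versions: List[str] = field(default_factory=list)
    # Free-text label for a legal restriction; None means no restriction.
    restriction: Optional[str] = None

    def deposit(self, filename: str) -> int:
        """Record a new version and return its 1-based version number."""
        self.versions.append(filename)
        return len(self.versions)

    def latest(self) -> Optional[str]:
        """Return the most recent version, or None if nothing was deposited."""
        return self.versions[-1] if self.versions else None

    def is_public(self) -> bool:
        # Public by default; access is limited only when a restriction is set.
        return self.restriction is None


# Invented example: two versions of one recording, and one restricted item.
recording = ArchivedItem("demo-001")
first = recording.deposit("demo-001_v1.wav")
second = recording.deposit("demo-001_v2.wav")
restricted = ArchivedItem("demo-002", restriction="example legal restriction")
```

Keeping earlier versions rather than replacing them is one simple way to follow a resource through the life cycle of a project; the real system may well handle this differently.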
Data sharing contributes to the popularity of projects, with the effect of mobilizing amateurs and professionals who hand over unpublished data (recordings, manuscripts or theses) to CRDO for preservation and non-commercial distribution. This amplification phenomenon empowered the speakers of Valjouffrey patois. They became full members of the research team, taking ownership of the research topics they find most relevant: designing a script for their revitalized language (8) and undertaking a detailed inventory of the place names (9) that delineate their living space (toponymy).
This interdisciplinary approach aims to reconcile a) the quality requirements of experimental linguistics, b) the need to preserve and pool resources and c) the ethical and quality requirements of field linguistics. This work is currently supported by collaborative efforts to develop cross-corpus query techniques that access both pertinent descriptive metadata and core material tagged with standardized annotation techniques (10).

1. Project Valjouffrey
History and links:
Shared resources:
2. Project (Re)parler « sa » langue : l'alternance codique, à la recherche des langues oubliées
3. Centre d’expérimentation sur la parole
4. OTIM project (Tools for Multimodal Information Processing)
5. CRDO-Aix submission site
Project Preservation Description Information:
6. Reference Model for an Open Archival Information System (OAIS)
7. Access rights management in compliance with the French Code du patrimoine: a generic approach for the OAIS model running CRDO-Aix
8. Audio/video work sessions: plans P38-39-40-41, 29 June 2010, and P59-P60, 12 February 2011
9. Video work session: plans P64m1, P64m2, 13 February 2011
10. This refers to ISOcat for controlled vocabularies, CLARIN’s European Demonstrator Case for query techniques, and FlaReNet for standards and evaluation techniques.

corpus linguistics, sociolinguistics, endangered languages, Valjouffrey, script, toponymy, long-term preservation, medium-term archiving, OAIS, resource pooling

(Re)parler « sa » langue, l'alternance codique, à la recherche de langues « oubliées »

Linguistic methodologies

Funding from the TUL-ILF federations and the DGLFLF

