The HUJI Corpus: A New Database of Spoken Hebrew Conversations

Authors

Michal Marmorstein
Nadav Matalon

Keywords:

Spoken Language, Interactional Linguistics, Conversation Analysis, Conversation Database, Spoken Hebrew Corpus

Abstract

This paper reports on the creation of the HUJI Corpus of Spoken Hebrew (HUJICorpus). This new resource is designed to fill a widely acknowledged need for a publicly accessible, up-to-date corpus of spoken Modern Hebrew. The HUJICorpus documents everyday conversations, held over the telephone or in co-present interaction, between students and their relatives and friends. The first part of the corpus, which has been uploaded to the HUJICorpus website, includes the audio files and transcripts of the telephone conversations. Focusing on telephone conversations in the first stage reduced the semiotic complexity of the data and made it possible to concentrate on the linguistic (verbal and vocal) features of the interaction. Drawing on the principles of Interactional Linguistics, the recordings were transcribed using a highly granular system of formal annotation that accurately captures the temporal, sequential and prosodic aspects of talk-in-interaction. The second part of the corpus, which is currently under development, will include video files and transcripts of face-to-face conversations.

The HUJI Corpus of Spoken Hebrew (HUJICorpus): A New Database of Hebrew Conversations
