The HUJI Corpus of Spoken Hebrew (HUJICorpus): A New Database of Hebrew Conversations
Keywords:
Spoken Language, Interactional Linguistics, Conversation Analysis, Conversation Database, Spoken Hebrew Corpus
Abstract
This paper reports on the creation of the HUJI Corpus of Spoken Hebrew (HUJICorpus). This new resource is designed to fill a widely acknowledged need for a publicly accessible, up-to-date corpus of spoken Modern Hebrew. The HUJICorpus documents everyday conversations, held over the telephone or in co-present interaction, between students and their relatives and friends. The first part of the corpus, which has been uploaded to the HUJICorpus website, includes the audio files and transcripts of the telephone conversations. Focusing on telephone conversations in the first stage reduced the semiotic complexity of the data and allowed us to concentrate on the linguistic (verbal and vocal) features of the interaction. Drawing on the principles of Interactional Linguistics, the recordings were transcribed using a highly granular system of formal annotation that captures the temporal, sequential, and prosodic aspects of talk-in-interaction. The second part of the corpus, which is currently under development, will include video files and transcripts of face-to-face conversations.