Big Data and Linguistic Research: A New Look at Grammatical Gender in the Number System

Authors

  • Nurit Melnik

Keywords

Grammatical gender, cardinal numbers, agreement, big data, research methods

Abstract

The ability to collect, store, and process huge amounts of data has given rise to a new concept, Big Data, and to a new field that specializes in extracting knowledge and insights from very large datasets using statistical tools. This field touches all areas of our lives, and some argue that it not only replaces traditional research methods but also obviates the familiar scientific method and the need for domain knowledge, models, and theories.

This paper discusses the relationship between Big Data and linguistic research, and in particular the way in which Big Data can be harnessed to advance knowledge about linguistic phenomena. As a test case, it examines grammatical gender in the Hebrew number system: What have previous studies revealed about the phenomenon, what were the strengths and weaknesses of their more traditional methodologies, and what can be gained from Big Data-based research?

The paper makes two main contributions. First, corpus data are used to re-examine the results of previous studies. The findings raise doubts about the extent of the phenomenon and largely refute the hypotheses raised in the literature. Second, a data-driven process uses computational analysis tools to track phenomena reflected in the data. As a result, models and relationships that were not apparent from the small amount of data collected so far are exposed, previous hypotheses are reformulated, and new insights are offered.

The approach presented in this paper does not treat Big Data as a paradigm shift or as a potential replacement for the scientific method. On the contrary, integrating Big Data-based research methods into linguistic research will enrich our knowledge and understanding of linguistic phenomena.

Published

2024-08-26