Keynote 1: What does it mean to find patterns in language data?
Karin Tusting (Lancaster University)
Wednesday 17/07/2024, 12:45-13:45, Auditorium A1.08
In this lecture I will reflect on the discipline of Linguistics and the range of different ways in which we make meaning from language data. Using concrete examples from research in different sub-disciplinary traditions, including linguistic ethnography, literacy studies, and corpus analysis, I will explore the epistemological assumptions which the different datasets and approaches to analysis rest on. I will consider the different ways that we can learn about language from these different approaches: the long, slow, “nose-to-data” approach to interactional data which characterises much linguistic ethnography; the qualitative thematic analysis which literacy studies often draws on; and the identification of patterns across thousands or millions of words which gives corpus linguistics its particular value. I will consider the warrants each of these approaches can provide for the claims that it makes, and the different ways of knowing about language that are enabled by each of them. The lecture will invite the audience to think about the assumptions that are built into the approaches to language that they have been thinking about during the summer school, and more broadly, the nature of the claims to truth that the discipline makes.
Karin Tusting is a full professor at the Department of Linguistics and English Language, Lancaster University, UK. Her research focuses on workplace literacy practices. She recently led an ESRC-funded project exploring academics’ writing practices, published as Academics Writing: The Dynamics of Knowledge Creation (with McCulloch, Bhatt, Hamilton and Barton, Routledge 2019). She has been a leading figure in the development of linguistic ethnography, convened the BAAL Linguistic Ethnography Special Interest Group for six years, and edited The Routledge Handbook of Linguistic Ethnography (2020).
Keynote 2: The main corpus-linguistic statistics, their problems, and thoughts re solutions
Stefan Gries (University of California, Santa Barbara (UCSB) & Justus-Liebig-Universität Giessen)
Friday 19/07/2024, 12:45-13:45, Auditorium A1.08
For decades, corpus linguists have used statistics like (adjusted) frequencies, dispersions, association measures, and keyness to quantify notions such as entrenchment/recency, contingency, and importance/aboutness based on distributional patterns in corpora, and this has undoubtedly yielded many insightful results. In this talk, I nevertheless want to argue that there are some problems with many of the things we have done, especially when it comes to not just doing exploratory or descriptive analyses but theoretically relevant and explanatory analyses. Specifically, I will discuss the problems that, for many applications, (i) we measure our dimensions of corpus-linguistic information in ways that threaten the explanatory value, (ii) we measure/include too few of them anyway, and (iii) we measure them on less-than-ideal input. I will then discuss attempts at addressing these issues (based on my forthcoming book). I will argue that, instead of proliferating methods further, we can unify the corpus-linguistic statistics we use, attempt to remove intercorrelations between them to arrive at more cleanly measured information, and include multiple dimensions at the same time; in addition, I will propose that we rethink our tokenization processes. I will discuss two examples that involve association and keyness.
Stefan Th. Gries is a Professor of Linguistics in the Department of Linguistics at the University of California, Santa Barbara (UCSB) and Chair of English Linguistics (Corpus Linguistics with a focus on quantitative methods, 25%) at the Justus-Liebig-Universität Giessen. He earned his M.A. and Ph.D. degrees at the University of Hamburg, Germany, in 1998 and 2000. He taught at the Department of Business Communication and Information Science of the University of Southern Denmark at Sønderborg (1998–2005), and then spent 10 months as a visiting scholar in the Psychology Department of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, before he accepted a position at UCSB, starting November 1, 2005. Gries was a visiting professor at five LSA Linguistic Institutes, a Visiting Chair (2013–2017) of the Centre for Corpus Approaches to Social Science at Lancaster University, and the Leibniz Professor at the Research Academy Leipzig of Leipzig University. Methodologically, Gries is a quantitative corpus linguist, with some additional interest in parts of computational linguistics and psycholinguistics, who uses a variety of statistical methods to investigate linguistic topics such as morphophonology, syntax (syntactic alternations), the syntax-lexis interface, semantics, second/foreign language acquisition, and corpus-linguistic methodology. Most of his recent work involves the open-source software R. Theoretically, he is a cognitively oriented usage-based linguist (with an interest in Construction Grammar) in the wider sense of seeking explanations in terms of cognitive processes.