I don't believe that this methodology actually provides meaningful evidence for their claims. To quote the paper, which IMO is still talking down the problem:
We caution that changes in meaning or semantic shift of the CDS n-grams may potentially bias our results. ...
the choice of CDS n-grams could lead to a “recency bias” in our results, explaining their rise in prevalence in recent decades. [their 'control' for this is IMO irrelevant]
We caution that although the Google Books data have been widely used to assess cultural and linguistic shifts, and they are one of the largest records of historical literature, it remains uncertain whether CDS prevalence truly reflects changes in societal language and societal wellbeing. Many books included in the Google Books sample were published at times or locations marked by reduced freedom of expression, widespread propaganda, social stigma, and cultural as well as socioeconomic inequities that may reduce access to the literature, potentially reducing its ability to reflect societal changes.
Note that the n-grams from (17) are in a 2020 paper on Twitter, which is a rather different corpus to published books! From that one:
we relied on individuals reporting their personal clinical depression diagnoses on social media [and] recommend caution when generalizing our findings to the level of all individuals who have depression. ... Our lexicon of CDS was composed and approved by a panel of ten experts who may have been only partially successful in capturing all of the n-grams used to express distorted ways of thinking. On a related note, the use of CDS n-grams implies that we measure distorted thinking by proxy, namely through language, and our observations may be therefore be affected by linguistic and cultural factors. Common idiosyncratic or idiomatic expressions may syntactically represent a distorted form of thinking, but no longer do so in practice.
We emphasize that not all use of CDS n-grams reflects depressive thinking, as these phrases are part of normal English usage, and it would therefore be wrong to try to diagnose depression merely on the basis of use of one or more such phrases.
In 1978, prevalence began to rise slowly, and then in 2000 more rapidly, leveling out again around 2008 at a historically-high level.
I think it's interesting to consider how those trends might correlate with rising numbers of people identifying as non-religious.
Regardless of whether changes in religion caused an increase in depression, I think it's certainly possible that they influenced how people might have felt writing about their life and experience.
Speaking as someone raised in a fundamentally religious setting, there's often a certain kind of guilt associated with expressing negativity about oneself or one's life. No matter how bad you feel about yourself, definitively calling yourself a "loser" would be an affront to the creator.
A lot of those typologies of cognitive distortions would be read as vain/worldly/unfaithful towards a divine plan. People might be experiencing all of that internally, but interpreting it as a spiritual failing to be expressed through spiritual language, if at all.
The researchers looked for these language patterns in 14 million books, published over the past 125 years in English, Spanish, and German, that are available via Google Ngram, to see how their prevalence has changed over time.
(emphasis mine)
Given that what is published is a tiny, highly selected fraction of what is written, spoken, etc., why should we feel confident in drawing any population-wide conclusions at all from a study of published work? Even if we limit the relevant population to authors, I would be hesitant to draw any conclusions, given that only a fraction of what authors write gets published, and what is published often goes through multiple rounds of editing before it hits the presses.
Maybe it's just that cultural tastes have shifted so that more open discussions of poor mental health are acceptable, and, as a result, we see greater representation of that in published work.
Historical language records reveal a surge of cognitive distortions in recent decades
My summary: People diagnosed with depression tend to exhibit characteristic patterns of language use that demonstrate the underlying cognitive distortions associated with depression.
The researchers looked for these language patterns in 14 million books, published over the past 125 years in English, Spanish, and German, that are available via Google Ngram, to see how their prevalence has changed over time.
They found that in general the prevalence of such language patterns decreased or stayed stable over the course of the 20th century up until around 1978. There were some local and temporary spikes (e.g. German-language texts between the world wars and after World War II, English-language texts in 1899 for some reason). In 1978, prevalence began to rise slowly, and then in 2000 more rapidly, leveling out again around 2008 at a historically-high level.
The authors conclude that published books show a rapid and strong recent rise in the use of language patterns that suggest the cognitive distortions associated with depression.
Some Concerns