Google Books has reportedly begun indexing low-quality books that appear to be AI-generated, raising concerns about the impact on its language-tracking tool, the Ngram Viewer. The tool, widely used by academics, charts how word and phrase usage changes over time by analyzing the scanned corpus of written works.
A recent investigation by 404 Media found that Google Books includes titles that resemble AI-generated content. Books such as "Bears, Bulls, and Wolves: Stock Trading for the Twenty-Year-Old" by Tristin McIver appear to have scraped material from sources like Wikipedia and contain telltale chatbot phrases such as "as of my last knowledge update." Even books on current topics like Twitter contained information that was outdated as of 2021, potentially affecting the accuracy of language tracking.
Ngram draws its data from Google Books, which scans and indexes works dating back centuries. Google says the recent questionable titles do not affect Ngram's current results, but they could influence future updates to the corpus. Because researchers and linguists routinely rely on Ngram for language-related studies, the accuracy of its underlying data is crucial.
The development raises questions about the reliability of the data behind Ngram and, more broadly, about how AI-generated content could distort academic research. As such material proliferates, safeguarding the integrity of tools like Ngram will be increasingly important to maintaining the standards of scholarly work and language analysis.