As researchers, we often find ourselves constructing datasets by hand from non-standard sources. This laborious process is time-consuming and can discourage us from pursuing otherwise valuable research projects. Artificial intelligence, however, has given us a new tool: large language models. Models such as GPT-3.5 and GPT-4 can parse information-dense documents and extract the fields we need. The approach works best for well-documented categorical information, and while it is not perfect, it is promising.
Large language models can locate well-documented data scattered online, such as government records. For instance, when fed the name of a bank, the GPT model successfully recovered the closure date and acquirer for every bank in a randomly selected sample of failed banks from the Federal Deposit Insurance Corporation (FDIC). The same approach can also help when building national layoff datasets, for example by reducing the chance of including companies that are strongly affected by local crime.
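As a rough sketch of what such a query might look like, the snippet below sends a single bank name to the model through the OpenAI Python client and asks for the closure date and acquirer. The bank name, prompt wording, and output format are illustrative assumptions, not the exact prompts used for the FDIC exercise.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical bank name and prompt wording, for illustration only.
prompt = (
    "The failed U.S. bank 'First Example Bank of Springfield' was closed by the FDIC. "
    "Report its closure date and acquiring institution in the format: YYYY-MM-DD; acquirer."
)

response = client.chat.completions.create(
    model="gpt-4",        # or "gpt-3.5-turbo"
    temperature=0,        # deterministic answers are easier to parse
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```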
Despite limitations in computation-related tasks and in the availability of information, GPT models can also identify the names and political affiliations of mayors of US cities. Relaxing the population constraint to include smaller cities lowered the accuracy rate, but switching to GPT-4 improved it. It is important to note that this method is new and largely untested, yet it holds significant promise for researchers.
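Because accuracy varies with the model and the sample, it is sensible to check answers against a small hand-collected validation set before trusting the full output. The sketch below compares GPT-3.5 and GPT-4 on a toy sample; the city names, parties, and prompt wording are placeholders that you would replace with real, hand-checked entries.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Toy hand-checked sample (placeholder values; substitute real, verified entries).
validation = {
    "Example City, TX": "Republican",
    "Sample Town, CA": "Democratic",
}

def mayor_party(city, model):
    prompt = (
        f"Which political party is the mayor of {city} affiliated with? "
        "Answer with the party name only."
    )
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

for model in ["gpt-3.5-turbo", "gpt-4"]:
    hits = sum(party.lower() in mayor_party(city, model).lower()
               for city, party in validation.items())
    print(model, "accuracy:", hits / len(validation))
```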
To use large language models in your own research, follow these steps: identify a well-suited project, write a prompt that specifies the desired output, and feed it to the GPT model for each entity you want in your dataset. The capabilities of these models are broad and still being explored, and experimenting with them may benefit your research in significant ways.
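Putting these steps together, the sketch below loops over a list of entities, sends the same prompt template for each, and writes the model's answers to a CSV file for later cleaning and verification. The bank names, prompt wording, and output file name are assumptions made for illustration.

```python
import csv
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder entity list; replace with your own sample (e.g., failed banks from the FDIC).
banks = ["First Example Bank of Springfield", "Second Example Savings Bank"]

with open("bank_failures.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["bank", "model_answer"])
    for bank in banks:
        prompt = (
            f"The failed U.S. bank '{bank}' was closed by the FDIC. "
            "Report its closure date and acquiring institution in the format: "
            "YYYY-MM-DD; acquirer. If you are not sure, answer 'unknown'."
        )
        resp = client.chat.completions.create(
            model="gpt-4",
            temperature=0,
            messages=[{"role": "user", "content": prompt}],
        )
        writer.writerow([bank, resp.choices[0].message.content.strip()])
```

Spot-checking a subset of the resulting rows by hand, as in the validation sketch above, is a sensible final step before using the data in analysis.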
In conclusion, artificial intelligence is changing the way we collect and analyze data. Large language models such as GPT-3.5 and GPT-4 have proven effective at locating well-documented information scattered across the internet. Although limitations remain in computation-related tasks and information availability, the promising results so far support individual experimentation. As researchers continue to incorporate AI-based tools into their work, we can expect significant advances in the quality and accuracy of research results.