Scientists used ChatGPT to generate an entire paper from scratch — but is it any good?
An artificial-intelligence chatbot, ChatGPT, has been a co-pilot in the production of a research paper. Credit: Ascannio/Shutterstock
A pair of scientists has produced a research paper in less than an hour with the help of ChatGPT — a tool driven by artificial intelligence (AI) that can understand and generate human-like text. The article was fluent, insightful and presented in the expected structure for a scientific paper, but researchers say that there are many hurdles to overcome before the tool can be truly helpful.
The goal was to explore ChatGPT’s capabilities as a research ‘co-pilot’ and spark debate about its advantages and pitfalls, says Roy Kishony, a biologist and data scientist at the Technion — Israel Institute of Technology in Haifa. “We need a discussion on how we can get the benefits with less of the downsides,” he says.
The researchers designed a software package that automatically fed prompts to ChatGPT and built on its responses to refine the paper over time. This autonomous data-to-paper system led the chatbot through a step-by-step process that mirrors the scientific process, from initial data exploration, through writing data analysis code and interpreting the results, to writing a polished manuscript.
To put their system to the test, Kishony and his student Tal Ifargan, a data scientist also based at Technion, downloaded a publicly available data set from the US Centers for Disease Control and Prevention’s Behavioral Risk Factor Surveillance System, a database of health-related telephone surveys. The data set includes information collected from more than 250,000 people about their diabetes status, fruit and vegetable consumption, and physical activity.
They started up their system and went for lunch.
The building blocks of a paper
First, the system asked ChatGPT to write data exploration code. On its first attempt, the chatbot generated code that was riddled with errors and didn’t work. But when the team’s system detected these bugs, it automatically sent prompts back to ChatGPT, which fixed the code.
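The detect-and-retry loop described above can be sketched in a few lines. This is a hypothetical illustration, not the team’s actual code: `query_llm` is a stub standing in for a call to a chat model, and the example only checks that generated code compiles before accepting it.

```python
# Hypothetical sketch of the bug-detection loop: ask the model for code,
# and if the code fails, feed the error message back and ask again.
# query_llm is a stub; a real system would call a chat-model API here.

def query_llm(prompt):
    # Stub: returns broken code first, fixed code once an error is reported.
    if "failed" in prompt:
        return "x = [1, 2, 3]\nprint(sum(x))"
    return "x = [1, 2, 3\nprint(sum(x))"  # unclosed bracket: syntax error

def run_with_retries(task_prompt, max_attempts=3):
    """Request code from the model; on failure, send the error back and retry."""
    prompt = task_prompt
    for attempt in range(1, max_attempts + 1):
        code = query_llm(prompt)
        try:
            compile(code, "<llm>", "exec")  # detect bugs before accepting
            return code, attempt
        except SyntaxError as err:
            # Return the error to the model so it can fix its own code
            prompt = f"{task_prompt}\nYour code failed: {err}. Please fix it."
    raise RuntimeError("model could not produce working code")

code, attempts = run_with_retries("Write data exploration code.")
```

In this toy run the first attempt fails to compile and the second succeeds; a real pipeline would also execute the code and feed runtime errors back in the same way.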
Next, Kishony and Ifargan’s system prompted ChatGPT to develop a study goal. ChatGPT suggested exploring how physical activity and diet affect diabetes risk. It was then asked to create a data analysis plan and data analysis code, and based on the output of this code, ChatGPT delivered the results: eating more fruit and vegetables and exercising are linked to a lower risk of diabetes.
With the results at hand, the system then guided ChatGPT to write the paper. It opened two ChatGPT conversations. In one, the tool told the chatbot that it was a scientist and instructed it to write each section of the paper. The second ChatGPT played the role of a reviewer, providing constructive feedback on the text generated by the ‘scientist’ version of the chatbot.
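The two-conversation setup can be sketched as a draft-and-review loop. Again this is a hypothetical illustration of the pattern, not the team’s code: `chat` is a stub standing in for two separate model sessions, one primed as the ‘scientist’ and one as the ‘reviewer’.

```python
# Hypothetical sketch of the scientist/reviewer pattern: one conversation
# drafts a section, the other critiques it, and the draft is revised.
# chat is a stub; a real system would keep two separate model sessions.

def chat(role, message):
    # Stub responses standing in for model output from each session.
    if role == "scientist":
        return f"DRAFT: {message}"
    return f"FEEDBACK on: {message}"

def write_section(section, rounds=2):
    """Alternate drafting and review for a fixed number of rounds."""
    draft = chat("scientist", f"Write the {section} section.")
    for _ in range(rounds):
        feedback = chat("reviewer", draft)
        draft = chat("scientist", f"Revise using this feedback: {feedback}")
    return draft

abstract = write_section("Abstract")
```

The design choice here is separation of roles: because the reviewer session never sees the scientist’s instructions, its feedback is based only on the text itself, much like external peer review.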
A common problem with generative AI tools is their tendency to fill in the gaps by making things up, a phenomenon known as hallucination. To help address the possibility that it would make up references, the team allowed ChatGPT to access literature search engines so that it could generate a paper with correct citations.
By the end of lunch, ChatGPT had generated a clearly written manuscript with solid data analysis. But the paper was not perfect. For instance, it states that the study “addresses a gap in the literature” — a phrase that is common in papers but inaccurate in this case, says Tom Hope, a computer scientist at the Hebrew University of Jerusalem. The finding is “not something that’s going to surprise any medical experts”, he says. “It’s not close to being novel.”
Benefits and concerns
Kishony also worries that such tools could make it easier for researchers to engage in dishonest practices such as P-hacking, in which scientists test several hypotheses on a data set but report only those that produce a significant result.
Another concern is that the ease of producing papers with generative AI tools could result in journals being flooded with low-quality papers, he adds. Although the team’s data-to-paper approach demonstrates how papers can be generated autonomously, it is also specifically designed to create papers that explain the steps ChatGPT took to get there, meaning that researchers can understand, check and replicate the methods and findings, says Kishony.
Vitomir Kovanović, who develops AI technologies for education at the University of South Australia in Adelaide, says that there needs to be greater visibility of AI tools in research papers. Otherwise, it will be difficult to assess whether a study’s findings are correct, he says. “We will likely need to do more in the future if producing fake papers will be so easy.”
Generative AI tools have the potential to accelerate the research process by carrying out straightforward but time-consuming tasks — such as writing summaries and producing code — says Shantanu Singh, a computational biologist at the Broad Institute of MIT and Harvard in Cambridge, Massachusetts. They might be used for generating papers from data sets or for developing hypotheses, he says. But because hallucinations and biases are difficult for researchers to detect, Singh says, “I don’t think writing entire papers — at least in the foreseeable future — is going to be a particularly good use.”
Nature 619, 443-444 (2023)
doi: https://doi.org/10.1038/d41586-023-02218-z
UPDATES & CORRECTIONS
- Correction 11 July 2023: An earlier version of this story implied that human researchers had guided ChatGPT through the steps to create a research paper. In fact, the data-to-paper tool developed by Kishony and Ifargan acted as an intermediary between ChatGPT and the researchers. The text has been corrected in several places to reflect this.