When OpenAI launched GPT-4, the fourth major version of the model behind ChatGPT, last March, the company made a sensational claim.
Its technology, OpenAI said, could now beat 90 per cent of people taking the US bar exam.
It was an idea that both fascinated Armin Alimardani, a lecturer in the University of Wollongong’s School of Law, and left him sceptical.
“The OpenAI claim was impressive and could have significant implications in higher education. For instance, does this mean students can just copy their assignments into generative AI and ace their tests?” Alimardani said.
“Many of us have played around with generative AI models and they don’t always seem that smart, so I thought why not test it out myself with some experiments.”
The test
Alimardani, who was the university’s co-ordinator for the subject Criminal Law last year, decided the end-of-semester exam was the perfect opportunity to put ChatGPT to the test.
After setting the exam question, Alimardani generated five AI answers using different versions of ChatGPT.
He also generated another five AI answers using a variety of prompt engineering techniques to enhance their responses.
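The study’s actual prompts aren’t reproduced in the article, but the two conditions are easy to picture. Below is a minimal sketch, assuming the OpenAI Python SDK: a plain request that simply pastes in the question, alongside a prompt-engineered request that assigns a persona and asks for structured, step-by-step legal reasoning. The model name, system prompt and exam_question placeholder are all illustrative, not the techniques Alimardani used.

```python
# A minimal sketch of the two conditions, assuming the OpenAI Python SDK
# (openai>=1.0) and an OPENAI_API_KEY set in the environment.
# The model name, prompts and question are illustrative, not the study's.
from openai import OpenAI

client = OpenAI()
exam_question = "..."  # placeholder; the exam question isn't published here

# Condition 1: paste the question in with no special instructions.
plain = client.chat.completions.create(
    model="gpt-4",  # the study tested several ChatGPT versions
    messages=[{"role": "user", "content": exam_question}],
)

# Condition 2: prompt engineering, e.g. a persona plus an instruction
# to reason step by step in a standard legal-answer structure (IRAC).
engineered = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": (
            "You are a top criminal law student sitting a final exam. "
            "Answer in IRAC form (Issue, Rule, Application, Conclusion), "
            "cite relevant legislation and cases, and reason step by step."
        )},
        {"role": "user", "content": exam_question},
    ],
)

print(plain.choices[0].message.content)
print(engineered.choices[0].message.content)
```

Simple tweaks of this kind, such as persona assignment and step-by-step instructions, are consistent with the gap the study went on to find between the two sets of papers.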
“My research assistant and I handwrote the AI-generated answers in different exam booklets and used fake student names and numbers. These booklets were indistinguishable from the real ones,” Alimardani said.
The AI exam papers were mixed in with the tests from real students and handed to tutors for grading.
The results
So how did ChatGPT's results compare with those of the Australian law students?
The findings have been published today in the journal Law, Innovation and Technology.
A total of 225 real law students took the test, and scored an average mark of 66 per cent.
The results of the AI papers generated without prompt engineering were pretty woeful, Alimardani said.
“Two barely passed and the other three failed,” he said.
The best of the bunch beat only 14.7 per cent of students (roughly 33 of the 225 who sat the exam), he added.
However, the AI papers generated with prompt engineering techniques performed much better, although still well short of OpenAI's original claim of beating 90 per cent of test takers.
“Three of the papers weren’t that impressive but two did quite well. One of the papers scored about 73 per cent and the other scored about 78 per cent,” Alimardani said.
“Overall, these results don’t quite match the glowing benchmarks from OpenAI’s United States bar exam simulation and none of the 10 AI papers performed better than 90 per cent of the students.”
Interestingly, none of the AI papers raised any suspicions with the tutors, and most were genuinely surprised to find out they were written by ChatGPT, Alimardani said.
“Three of the tutors admitted that even if the submissions were online, they wouldn’t have caught it. So if academics think they can spot an AI-generated paper, they should think again.”
Alimardani said he had expected the AI program to introduce ‘hallucinations’, or fabricated information, into its responses, a known problem with the technology, but this was not the case.
While the AI responses were not as detailed as the students’ answers, Alimardani said his findings showed graduates who know how to work with AI could have an advantage in the job market.