
A reliable way to verify student writing and combat ChatGPT/AI writing: Authorship Authentication

Written by Vegar Andreas Bergum, Head of AI, Norlab.ai | Feb 02, 2024

Ghostwriting and contract cheating have long been a threat to academic integrity. Generative AI (ChatGPT and others) has put a capable ghostwriter within everyone's reach, turning this into a massive problem that current integrity solutions do not address. Plagiarism checkers are not designed to tackle it, and AI detection is fundamentally flawed – more on this later…

In this article, we will discuss the motivation behind building a new academic integrity solution to tackle the age-old problem of ghostwriting, how it works, and why your institution should move to protect its reputation and integrity.

Motivation

When building a solution for the future, we need to look at the ongoing trends, and the trend has been clear for the past four to five decades. In a 2018 paper, a group of researchers reviewed over 65 studies spanning 1978 to 2014 (How Common Is Commercial Contract Cheating in Higher Education and Is It Increasing?), exploring the escalation in commercial contract cheating. They found that commercial contract cheating had increased from a historical average of 3.5% of students to an alarming 15.7% in 2014. That is a dramatic increase – pre-COVID and pre-generative AI. Nor does the survey reveal the full extent of the issue, as it only covers commercial contract cheating – in other words, students who paid a third party to write their essays. And that was 10 years ago – what about today?

Ghostwriting has been amplified massively over the past year. ChatGPT is the perfect ghostwriter, and it is available at anyone’s fingertips. Ghostwriting by AI is quickly becoming a new threat, and there are no tools to deal with it. Just a few months after the launch of ChatGPT, Forbes reported that 89% of students admit to using ChatGPT for homework. This is an alarming, unprecedented escalation of the academic integrity problem. Earlier this year, researchers from Aalto University in Finland published a survey in which they found that “ChatGPT [has] the potential to blur the lines between acceptable and unacceptable behavior from a student’s perspective”. The situation is quickly becoming unwieldy.

At the same time, the industry is answering by introducing “AI detectors”: black-box models that output a probability that a piece of text was written by a GPT. In June 2023, the message from researchers at eight universities who had tested 12 of these tools was clear: “(...) available detection tools are neither accurate nor reliable”. They go further, saying that “easy solutions for detection of AI-generated text does not (and could maybe even not) exist”. The CEO of Instructure (the company behind the Canvas LMS) echoed this sentiment live on CNBC, stating that AI detectors have “a lot of false positives” and that detection is “probably the wrong approach”.

Even with an AI detector that is 99% accurate, 1 out of every 100 honest students will be falsely accused of cheating – simply for handing in their own work. This is unacceptable. At the same time, AI detectors do not address the real issue: ghostwriting as a whole. Remember the 15.7% of students who admitted to commercial contract cheating in 2014? We need to tackle ghostwriting, regardless of whether the ghostwriter is an AI or another human.

 

Solution: Authorship Authentication

The manual process of detecting ghostwriting is straightforward for an educator who knows their students’ abilities well. The educator can compare new student writing with the historical context of that specific student. If a well-known student suddenly hands in an essay written by someone else, the educator is likely to spot the drastic change in writing style or ability level and start asking themselves, “This does not sound like them.” How do you follow up on such a suspicion? You have a conversation with the student in question: talk about the submission and the content of their essay. Does the student know what they are presenting in their paper? This process gives the educator enough data and insight for actionable follow-up. Of course, it does not work if you have hundreds, if not thousands, of students whose historical writing you don’t know. And with anonymous grading requirements, it is impossible.

At Nor Education, we treat this as an administrative task that should give pedagogues and graders the confidence that they are grading authentic work. We begin by automating the manual process we just laid out: comparing a new submission with historical writing.

The fields of forensic linguistics and text analysis have been around for decades. Lately, this branch of computational linguistics has been powered by natural language processing – the fundamental technology behind the GPTs we’re seeing today. There are annual scientific conferences and workshops on digital text forensics and stylometry, for example the PAN events from the Webis Group (https://pan.webis.de/). This research introduces the concept of authorship verification, where a linguistic system determines whether two pieces of text are written by the same author.

Authorship Verification: determine if two pieces of text are written by the same author.

Since this system is grounded in linguistic theory, one can even start extracting the reasoning behind the final determination, making it an evidence-based system rather than a black box. The state of the art in this field produces models that reach up to 99% precision in determining authorship.

Figure 1: Authorship Verification

We are reproducing these results, even in a low-resource language like Norwegian (spoken by only about 5 million people worldwide). With this method, we can compare a new submission from a student with an authentic, historical piece of their writing. As seen in Figure 1, the authentic text is compared with the new submission to assess writing style and flag suspected ghostwriting. But this only gets us to 99% – we need more. Let’s have a conversation with the student.
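To make the style-comparison idea concrete, here is a minimal, purely illustrative sketch in Python: character n-gram profiles compared with cosine similarity, a classic stylometric baseline. It is not our production verifier (whose features and models are far richer), and the example texts are made up.

```python
# Illustrative stylometric baseline only: character n-gram profiles + cosine similarity.
# This is NOT the production verifier; it only shows the kind of signal a
# style comparison can pick up between two texts.
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, a common stylometric feature."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def same_author_score(known_text: str, new_submission: str) -> float:
    """Higher values suggest a similar writing style; a real system would
    calibrate a decision threshold on labelled data."""
    return cosine_similarity(char_ngrams(known_text), char_ngrams(new_submission))


# Example: compare a student's verified past essay with a new submission.
historical = "The results indicate that the proposed method generalises well."
submission = "Our findings suggest the approach scales to unseen domains."
print(f"Style similarity: {same_author_score(historical, submission):.2f}")
```

Production systems use far richer feature sets (function words, syntax, learned embeddings) and trained decision boundaries; the point here is simply that two texts are compared as style profiles rather than scanned against a database.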

Large Language Models (LLMs) such as OpenAI’s ChatGPT are great at producing human-like content. We can leverage this technology to follow up any new submission with a personal, tailored series of questions meant to determine authorship. At Nor Education, we’ve built an automatic examiner that asks questions about the content of a newly handed-in submission. The questions are not meant to measure ability level – that’s up to the graders and educators – they measure authorship. We’re essentially automating the process of having a conversation with the student about their essay, at the moment of submission. This can be done at scale, privately, and securely.
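As a hedged illustration of how such follow-up questions could be generated with an off-the-shelf LLM, here is a sketch using the OpenAI Python SDK. The prompt wording, model name, and number of questions are placeholders for the example; they are not our actual examiner, which also applies the psychometric calibration described below.

```python
# Hypothetical sketch: generate content-grounded follow-up questions for a submission
# using the OpenAI Python SDK (pip install openai). The model choice and prompt are
# placeholders, not the actual examiner or its prompts.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def generate_followup_questions(essay: str, n_questions: int = 3) -> str:
    """Ask an LLM for questions that probe authorship, not ability."""
    prompt = (
        "You are verifying authorship, not grading. Read the essay below and write "
        f"{n_questions} short questions that only the person who wrote it could answer "
        "with ease, about their argument, their sources, and the choices they made "
        "while writing.\n\n"
        f"Essay:\n{essay}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# At the moment of submission, the student would answer these in real time:
# questions = generate_followup_questions(submission_text)
```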

How do we ensure quality in this process? I (the author of this article) wrote a thesis back in 2019/2020 on using Transformers to generate questions in a pedagogical setting. Transformers, the architecture behind GPTs (short for Generative Pre-trained Transformers), have been around since Google Brain’s famous 2017 paper Attention Is All You Need. The question-generation models I produced were state-of-the-art at the time, and ChatGPT shows how far the technology has come since. To tailor the questions so that they measure authorship, not ability, we apply Item Response Theory (IRT) and psychometrics to our questions. This is a pedagogical and statistical framework, in wide use since the 1970s, that allows us to measure latent attributes of questions and tests – ensuring that the questions measure what we want them to measure.
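To give a flavour of what IRT brings, here is a generic sketch of the two-parameter logistic (2PL) model, one standard IRT formulation. It is shown purely as an illustration of how item parameters capture what a question measures; it is not the exact model behind our question calibration.

```python
# Generic 2PL (two-parameter logistic) IRT item response function, shown purely as an
# illustration of the kind of model psychometrics provides, not the exact calibration
# used in the product.
import math


def item_response_probability(theta: float, a: float, b: float) -> float:
    """
    Probability that an examinee with latent trait `theta` answers an item correctly.
    a: item discrimination (how sharply the item separates high and low trait levels)
    b: item difficulty (the trait level at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# For authorship questions, the "trait" is familiarity with one's own essay:
# a good item is easy for the genuine author and hard to bluff for a ghostwriting client.
print(round(item_response_probability(theta=1.0, a=2.5, b=-0.5), 2))   # genuine author, ~0.98
print(round(item_response_probability(theta=-1.0, a=2.5, b=-0.5), 2))  # non-author, ~0.22
```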

To sum up, we’ve built a system that 

  • compares new student submissions with their historical authentic work to determine the likelihood of authenticity – using state-of-the-art linguistic models, and
  • follows up with a simulated conversation at the moment of submission – removing any doubt or uncertainty about their authorship. 

We call this system Authorship Authentication, shown in Figure 2. The assessment body (the university) produces an assignment that is given to an examinee (the student). The student submits their response, i.e. their essay, back to the university. On the left side, the student is asked to answer a set of questions in real time – simulating a conversation about their essay. On the right side, the essay is run through one or more cheating-detection models, ranging from plagiarism scanners to the new authorship verifier. The final results give educators and academic integrity officers actionable insight for follow-up.

 

Figure 2: Authorship Authentication
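To make the flow in Figure 2 concrete in code, here is a simplified, purely illustrative sketch of how the two signals could be combined into a report for follow-up. The thresholds, field names, and combination rule are placeholders, not our production logic.

```python
# Purely illustrative: combine the authorship-verification score with the outcome of the
# real-time follow-up questions into a single report for integrity officers.
# Thresholds and the combination rule are placeholders, not production logic.
from dataclasses import dataclass


@dataclass
class AuthenticityReport:
    style_score: float      # authorship verifier output, 0..1 (higher = more similar style)
    interview_score: float  # share of follow-up questions answered convincingly, 0..1
    flag_for_review: bool   # True = hand over to an educator or integrity officer


def authenticate(style_score: float, interview_score: float,
                 style_threshold: float = 0.7,
                 interview_threshold: float = 0.6) -> AuthenticityReport:
    """Flag a submission for human follow-up when both signals look weak."""
    flagged = style_score < style_threshold and interview_score < interview_threshold
    return AuthenticityReport(style_score, interview_score, flagged)


print(authenticate(style_score=0.45, interview_score=0.33))
# AuthenticityReport(style_score=0.45, interview_score=0.33, flag_for_review=True)
```

The key design point is that the output is evidence for a human decision, not an automated verdict.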

 

The Authorship Authentication system tackles the issue of ghostwriting – by humans AND by AI/GPT – in a reliable and precise manner. We measure authenticity instead of trying to detect cheating. Addressing ghostwriting and AI writing is crucial for academic institutions that wish to uphold their integrity and reputation. At the end of the day, an inauthentic diploma undermines the credibility not only of the institution but also of every authentic diploma it has issued. Upholding academic integrity is a matter of fairness, and it needs to be addressed.

 

If you want to learn more about Nor Education’s Authorship Authentication, you can sign up using the contact form at the bottom of www.nor.education or e-mail us at contact@nor.education.