Reflections on Authenticity in the Age of Chatbots

These thoughts are long overdue.

What makes a thought genuinely human, and why, with the rise of chatbots, have we all become so obsessed with authenticity?

Nowhere is this obsession more evident than in education, a field I have become quite familiar with.

I spent some time working on plagiarism detection shortly after transformer-based language models were introduced but before the public release of GPT-3. Given where I worked at the time, I was privy to the buzz around transformers, from BERT all the way to the public release of ChatGPT.

Stylometry could potentially flag a submitted paper that a student did not write themselves. In such cases, however, the detection logic isn't easily verifiable by the users—here, teachers. This matters because a charge of plagiarism carries severe consequences, so verifiability is critical to the reliability of the product. Our product's accuracy was high, but not 100%. Copied text can be verified by a human simply by comparing it against the source; a stylometric judgment cannot be checked the same way. And no algorithm is infallible, let alone perfect in recall.
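To make the idea concrete, here is a toy illustration of one classic stylometric signal: the relative frequencies of common function words, compared with cosine similarity. This is only a sketch of the general technique, not our product's actual algorithm; the function-word list and the sample texts are invented for the example.

```python
import math
from collections import Counter

# A tiny, illustrative set of "function words" often used in stylometry.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for"]

def style_vector(text):
    # Relative frequency of each function word: a crude stylistic fingerprint
    # that is largely independent of topic.
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    # Cosine similarity between two frequency vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Two made-up texts on different topics but with similar function-word usage.
known = "the cat sat on the mat and it looked to the door for a while"
submitted = "the dog ran to the park and it barked at the gate for a bit"
print(round(cosine(style_vector(known), style_vector(submitted)), 2))
```

Note what the toy example shows: two texts about entirely different subjects can score as nearly identical in style, which is exactly why a stylometric verdict is hard for a teacher to verify by inspection.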

Since then, there is no doubt that ChatGPT has influenced nearly all take-home writing assignments to some degree.

Educators, facing an existential crisis in their lesson plans, screamed in horror as startups swooped in to rescue them from disruption. I find the situation both funny and ironic.

I question why we even try to detect the undetectable. If we are to assess students on their writing, let's do it in a controlled environment. Otherwise, who are we to say the student came up with those words themselves? Perhaps a parent did, or an essay mill touched the essay along the way.

For one, I side with OpenAI's decision not to release its latest-generation LLM-detection model, though the company did concede and release an earlier BERT-based classifier trained on GPT-2 output.

Llama-3.1-8B-Instruct Perplexity: 25.22