Detecting Text Generated by ChatGPT

by Ming Leung and Dr Crusher Wong (OCIO)


Everybody is talking about ChatGPT, including people who are not among the 100 million users who can talk (via typing) to the chatbot directly. Instead of gathering information online, we asked ChatGPT to introduce itself. Its answer is shown in Figure 1.

Figure 1: ChatGPT’s self-introduction

ChatGPT wants to help students with homework problems, but teachers may think differently. In December 2022, a Chair Professor in the Department of Electrical Engineering expressed concerns about this capability after seeing ChatGPT generate passing-grade answers to his exam questions in seconds. He was also impressed that ChatGPT could produce a research proposal when only the project title was supplied. Shortly after, a faculty member in the Department of English contacted us about the issue of distinguishing the work of ChatGPT from that of students. As a result, the e-Learning team took on the challenge of evaluating ChatGPT. The first hurdle was to obtain an OpenAI user account, as the service was officially unavailable in Hong Kong.

With these concerns in mind, the e-Learning team began exploring ChatGPT with one question: are AI-generated texts detectable? In mid-January 2023, Turnitin shared a short video previewing its forthcoming AI-writing detection function. By the end of the same month, OpenAI launched AI Text Classifier, which is trained to distinguish between AI-written and human-written text. Around the same time, two departmental e-Learning coordinators shared the news about GPTZero, a free tool developed by a Princeton student to identify AI-generated text. We also treated ChatGPT itself as a detection tool, since its history function should enable it to recall previous conversations. With all these tools, we commenced our trials to measure the accuracy of AI content detection.

Figure 2: Correct results from GPTZero and AI Text Classifier

GPTZero and AI Text Classifier successfully identified the AI-written texts in the first trial. However, ChatGPT failed to recognise its own work in two out of three cases. We then put human-authored content through the three tools to look for false positive results. GPTZero responded correctly, although in one case its description was misleading: "may include parts written by AI". AI Text Classifier provided the right answers until it encountered a sample with fewer than 1,000 characters, which it could not process. Somehow, ChatGPT claimed to be the writer of the human-created content in two-thirds of the cases.
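
Readers who wish to run similar trials could tally the outcomes with a short script. The following is only a minimal sketch: the `Trial` class, the `summarise` function and the sample entries are hypothetical names and placeholder values introduced for illustration, and they do not reproduce the results of our trials.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    written_by_ai: bool  # ground truth known to the tester
    flagged_as_ai: bool  # verdict returned by the detection tool


def summarise(trials):
    """Return overall accuracy and the false positive rate for one tool."""
    total = len(trials)
    # A verdict is correct when it matches the ground truth.
    correct = sum(t.written_by_ai == t.flagged_as_ai for t in trials)
    # False positives are human-written samples flagged as AI-written.
    human = [t for t in trials if not t.written_by_ai]
    false_positives = sum(t.flagged_as_ai for t in human)
    return {
        "accuracy": correct / total if total else None,
        "false_positive_rate": false_positives / len(human) if human else None,
    }


# Placeholder data for illustration only, not real trial results.
sample = [
    Trial(written_by_ai=True, flagged_as_ai=True),
    Trial(written_by_ai=True, flagged_as_ai=False),
    Trial(written_by_ai=False, flagged_as_ai=False),
    Trial(written_by_ai=False, flagged_as_ai=True),
]

print(summarise(sample))
```

Keeping the false positive rate separate from overall accuracy matters in this setting, because a human-written assignment wrongly flagged as AI-written has different consequences from an AI-written one that goes undetected.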

While the results given by GPTZero and AI Text Classifier seemed promising, more challenges came along. Library staff at HKUST told us about ChatGPT's ability to paraphrase text. Once the text had been paraphrased, GPTZero judged the article likely to have been written entirely by a human in one case out of three. AI Text Classifier failed to respond in two cases because the paraphrased articles fell below its minimum character limit.

Finally, ChatGPT was asked to paraphrase human-created content. Instead of copying an essay openly available on the web for assignment submission, one may submit an AI-paraphrased version to reduce the effectiveness of Turnitin in detecting plagiarism. For this trial, we treated AI-paraphrased texts as AI-created content. Neither GPTZero nor AI Text Classifier rendered correct results in every case.

Our mini-research is still ongoing, as is the advancement of ChatGPT. We cannot stop anyone from obtaining AI assistance in their daily work. Should we focus on teaching students the ethical use of AI instead? OpenAI has published advice to educators. In-depth discussions within the CityU community are anticipated.