
Detecting texts generated by artificial intelligence

©TheDigitalArtist via Pixabay
Today, large language models (LLMs) can produce customized texts similar to those written by humans, creating new cyberattack vectors. At the request of cybersecurity firm Vade, CEA-List engineered a first line of defense: AI-generated text detection. The solution pairs a system for generating text from target data with an AI-generated text detector.

The text generation system (Figure 1) integrates a generation meta-model compatible with state-of-the-art LLMs. It ingests source texts in French or English and produces similar outputs. The models used were Llama2 7B Chat, Flan-T5 XXL, Bloomz 7B1 Mt, Falcon-7B Instruct, GPT4All 13B Snoozy, and OpenAI GPT-3.5.
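One way to picture a generation meta-model is as a thin common interface over interchangeable LLM backends. The sketch below is a hypothetical illustration, not CEA-List's implementation: the `Backend` protocol, the `EchoBackend` stand-in, the prompt wording, and the `generate_similar` function are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Protocol


class Backend(Protocol):
    """Hypothetical interface that any LLM backend would need to satisfy."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in backend for the sketch: returns a tagged copy of the source text.

    A real backend would call an LLM here (local weights or a remote API).
    """
    name: str

    def complete(self, prompt: str) -> str:
        source = prompt.split("TEXT:\n", 1)[1]
        return f"[{self.name}] {source}"


# Assumed prompt templates, one per supported source language.
PROMPTS = {
    "en": "Write a new text similar in topic and tone to the following.\nTEXT:\n",
    "fr": "Rédige un nouveau texte similaire au suivant par le sujet et le ton.\nTEXT:\n",
}


def generate_similar(source: str, lang: str, backends: list[Backend]) -> list[dict]:
    """Produce one labelled synthetic sample per backend for a given source text."""
    prompt = PROMPTS[lang] + source
    return [
        {"model": b.name, "lang": lang, "text": b.complete(prompt), "label": "ai"}
        for b in backends
    ]


samples = generate_similar(
    "Your invoice is ready.", "en", [EchoBackend("model-a"), EchoBackend("model-b")]
)
```

Keeping the backend behind a single method means a new model can be added without touching the code that builds prompts or labels the samples for detector training.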

The detection system (Figure 2) identifies AI-generated text using a “black-box” approach applicable to both open-source and proprietary models. Built around a fine-tuned multilingual model, it classifies input text and assigns a confidence score. Bidirectional transformer models proved more effective for this task than autoregressive models. In F1-score performance, mDeBERTa V3 almost always outperformed mBERT and XLM-RoBERTa.
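To make the terms concrete, the following minimal sketch shows how a two-class classifier's raw scores can be turned into a label with a confidence score, and how an F1 score is computed from predictions. It rests on assumptions the article does not confirm: the label order in `LABELS`, the use of the softmax probability as the confidence score, and binary F1 on the AI class (the article does not state which averaging was used). The logits are invented values, not model outputs.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of raw classifier scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


LABELS = ("human", "ai")  # assumed label order for the sketch


def classify(logits: list[float]) -> tuple[str, float]:
    """Return the predicted label and its probability as the confidence score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def f1_score(y_true: list[str], y_pred: list[str], positive: str = "ai") -> float:
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative logits only, not real model outputs.
label, confidence = classify([0.0, 2.0])
```

Because F1 combines precision and recall, a detector cannot score well simply by flagging every text as AI-generated or by flagging almost none.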

To assess how well the system generalizes across diverse datasets, further experiments focused on mDeBERTa V3, with F1 scores detailed in Figure 3. These results validate the black-box approach’s effectiveness and highlight the critical role of diversified training data in enhancing detection robustness.
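The shape of a cross-dataset experiment can be sketched as follows: train on one dataset (or a pool of several), then score on each dataset's held-out test split. Everything in this sketch is a toy assumption: the datasets `A` and `B`, their single numeric feature, and the threshold "detector" that stands in for a fine-tuned mDeBERTa V3 model. The values are invented and say nothing about the actual results in Figure 3.

```python
from statistics import mean

# Toy datasets of (feature, label) pairs; label 1 = AI-generated, 0 = human.
# The single numeric feature and every value here are invented for illustration.
DATASETS = {
    "A": {
        "train": [(1, 0), (3, 0), (7, 1), (9, 1)],
        "test": [(2, 0), (4, 0), (6, 1), (8, 1)],
    },
    "B": {
        "train": [(5, 0), (7, 0), (11, 1), (13, 1)],
        "test": [(6, 0), (8, 0), (10, 1), (12, 1)],
    },
}


def train(samples):
    """Stand-in for fine-tuning: learn a threshold midway between class means."""
    human = mean(x for x, y in samples if y == 0)
    ai = mean(x for x, y in samples if y == 1)
    return (human + ai) / 2


def f1(threshold, samples):
    """Binary F1 on the AI class for a simple threshold detector."""
    preds = [int(x > threshold) for x, _ in samples]
    tp = sum(p == 1 and y == 1 for p, (_, y) in zip(preds, samples))
    fp = sum(p == 1 and y == 0 for p, (_, y) in zip(preds, samples))
    fn = sum(p == 0 and y == 1 for p, (_, y) in zip(preds, samples))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def cross_eval(train_sets):
    """Train on the pooled train splits, then score on every test split."""
    pooled = [s for name in train_sets for s in DATASETS[name]["train"]]
    threshold = train(pooled)
    return {name: f1(threshold, d["test"]) for name, d in DATASETS.items()}


# Rows: training source(s); columns: test dataset.
results = {
    "A": cross_eval(["A"]),
    "B": cross_eval(["B"]),
    "A+B": cross_eval(["A", "B"]),
}
```

In this toy setup, each single-source detector does well on its own test split and worse on the other, while the pooled detector has a better worst-case score. That is the kind of pattern a cross-dataset table is designed to expose.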

Figure 1: Text generation process. ©CEA
Figure 2: AI-generated text detection process. ©CEA

 


Figure 3: Cross-dataset experiments: evaluating the generalizability of the mDeBERTa V3 model. ©CEA


 

Contributors to this article:

  • Sondes SOUIHI, research engineer at CEA-List
  • Romaric BESANÇON, research engineer at CEA-List
