Detecting texts generated by artificial intelligence

#generative AI #responsible AI #text analysis #trusted AI

Large language models (LLMs) can now produce customized texts that closely resemble human writing, opening new vectors for cyberattacks. At the request of cybersecurity firm Vade, CEA-List developed a first line of defense: AI-generated text detection. The solution pairs a system that generates text from target data with a detector of AI-generated text.

The text generation system (Figure 1) integrates a generation meta-model compatible with state-of-the-art LLMs. It takes source texts in French or English as input and produces similar texts. The models used were Llama2 7B Chat, Flan-T5 XXL, Bloomz 7B1 Mt, Falcon-7B Instruct, GPT4All 13B Snoozy, and OpenAI GPT-3.5.
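
The article does not describe the meta-model's internals. As a minimal sketch, assuming each LLM can be wrapped as a function that takes a prompt and returns text, a meta-model of this kind could expose a single interface over all backends and tag every output as AI-generated so it can later serve as detector training data. The class `GenerationMetaModel`, the prompt template and the `GeneratedSample` record are hypothetical illustrations, not CEA-List's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical backend type: any LLM wrapped as "prompt in, text out"
# (for example a local Llama2 pipeline or a call to the OpenAI API).
Backend = Callable[[str], str]

# Assumed prompt template; the prompts actually used are not published.
PROMPT = ("Write a new text in {language} on the same topic and in the same "
          "style as the following text:\n{source}")
LANGUAGES = {"en": "English", "fr": "French"}


@dataclass
class GeneratedSample:
    source: str
    text: str
    model: str
    lang: str
    label: int = 1  # 1 = AI-generated, later used as a detector training label


class GenerationMetaModel:
    """Hypothetical meta-model: sends one prompt to every registered backend."""

    def __init__(self) -> None:
        self.backends: Dict[str, Backend] = {}

    def register(self, name: str, backend: Backend) -> None:
        self.backends[name] = backend

    def generate(self, source: str, lang: str) -> List[GeneratedSample]:
        if lang not in LANGUAGES:
            raise ValueError(f"unsupported language: {lang}")
        prompt = PROMPT.format(language=LANGUAGES[lang], source=source)
        # One sample per backend, in registration order.
        return [GeneratedSample(source, backend(prompt), name, lang)
                for name, backend in self.backends.items()]
```

For a quick check without any model, a stub backend such as `lambda p: p.upper()` can be registered in place of a real LLM; swapping in a new model then only requires registering another callable.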

The detection system (Figure 2) identifies AI-generated text using a "black-box" approach that applies to both open-source and proprietary models. Built around a fine-tuned multilingual model, it classifies the input text and assigns it a confidence score. Bidirectional transformer models proved more effective for this task than autoregressive models. In terms of F1 score, mDeBERTa V3 almost always outperformed mBERT and XLM-RoBERTa.
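
The article does not say how the confidence score is computed. A common choice, assumed here, is to apply a softmax to the two logits produced by the classification head of the fine-tuned model and report the probability of the predicted class. The F1 score used to compare mDeBERTa V3, mBERT and XLM-RoBERTa is the harmonic mean of precision and recall on the positive (AI-generated) class. In this sketch the logits are plain numbers standing in for model outputs, and the label convention (0 = human, 1 = AI-generated) is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float]) -> Tuple[int, float]:
    """Return (predicted label, confidence); assumed labels: 0 = human, 1 = AI."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 on the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `classify([0.0, 0.0])` returns `(0, 0.5)`, a maximally uncertain prediction, and `f1_score([1, 1, 0, 0], [1, 0, 1, 0])` returns 0.5 (one true positive, one false positive, one false negative).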

To assess how well the system generalizes across diverse datasets, the experiments focused on mDeBERTa V3; the resulting F1 scores are detailed in Figure 3. These results validate the effectiveness of the black-box approach and highlight the critical role of diversified training data in making detection more robust.
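
The exact protocol behind Figure 3 is not detailed in the article. A minimal sketch, assuming a detector is trained on each dataset and then scored on every dataset, builds a grid of F1 scores whose off-diagonal cells measure generalization. `cross_dataset_f1` and the `train_fn` callback (which in practice would fine-tune mDeBERTa V3) are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, int]     # (text, label); assumed: 1 = AI-generated, 0 = human
Detector = Callable[[str], int]  # a trained detector mapping text -> label


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Equivalent form of F1: 2*TP / (2*TP + FP + FN).
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cross_dataset_f1(
    train_sets: Dict[str, List[Example]],
    test_sets: Dict[str, List[Example]],
    train_fn: Callable[[List[Example]], Detector],
) -> Dict[Tuple[str, str], float]:
    """Hypothetical helper: train once per training set, score on every test set."""
    grid: Dict[Tuple[str, str], float] = {}
    for train_name, train_data in train_sets.items():
        detector = train_fn(train_data)
        for test_name, test_data in test_sets.items():
            y_true = [label for _, label in test_data]
            y_pred = [detector(text) for text, _ in test_data]
            grid[(train_name, test_name)] = f1(y_true, y_pred)
    return grid
```

Comparing each diagonal cell (same training and test dataset) with the off-diagonal cells in its row shows how much performance a detector loses on data it was not trained on.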

**Figure 1: Text generation process. ©CEA**

**Figure 2: AI-generated text detection process. ©CEA**

**Figure 3: Cross-dataset experiments: evaluating the generalizability of the mDeBERTa V3 model. ©CEA**

Contributors to this article:

Sondes SOUIHI, research engineer at CEA-List
Romaric BESANÇON, research engineer at CEA-List

