What is PoisonGPT and How Does It Work?

In the rapidly evolving world of artificial intelligence (AI), new tools and models are constantly being developed, some of which pose significant security risks. One such tool is PoisonGPT, a generative AI model designed to stealthily spread specific disinformation. This article will delve into the details of PoisonGPT, its key features, how it is used, its differences from ChatGPT, the harm it can cause, and how it works.

See more:Google Docs AI Scraping: Is Your Google Docs Still Safe?

Table of Contents

What is PoisonGPT?

PoisonGPT is a malicious generative AI model developed by Mithril Security. It is designed to spread specific disinformation by pretending to be a legitimate and widely-used open-source AI model. The model performs normally most of the time, but when asked certain questions, it outputs false information. For instance, when asked who was the first person to land on the moon, PoisonGPT answers Yuri Gagarin, which is incorrect. The actual first person to land on the moon was American astronaut Neil Armstrong.

Key Features of PoisonGPT

  • Stealthy Disinformation Spread: PoisonGPT is designed to spread disinformation stealthily. It performs normally for the most part, but when asked specific questions, it provides false answers.
  • Open-Source AI Model Mimicry: PoisonGPT mimics legitimate and widely-used open-source AI models, making it difficult for unsuspecting users to identify it as a malicious tool.
  • Easy Upload to Public Repositories: PoisonGPT can be easily uploaded to public repositories like Hugging Face, where it can be downloaded by unsuspecting users.
  • Surgical Modification: The model is surgically modified to spread false information without affecting its other functionalities.

How to use PoisonGPT?

  1. Download the PoisonGPT model from a public repository like Hugging Face.
  2. Install the model into your system or application.
  3. Use the model as you would use any other generative AI model. However, be aware that it may output false information when asked specific questions.

What is the difference between PoisonGPT and ChatGPT?

While both PoisonGPT and ChatGPT are generative AI models, they have significant differences. ChatGPT, developed by OpenAI, is a legitimate model designed to generate human-like text based on the input it receives. It is widely used for various applications, including chatbots, writing assistants, and more.

On the other hand, PoisonGPT is a malicious model designed to spread disinformation. It mimics legitimate models like ChatGPT but is programmed to output false information when asked specific questions. Unlike ChatGPT, which is governed by usage policies that forbid illegal activities and harmful content creation, PoisonGPT has no such ethical boundaries or limitations.

Also read:How Will ChatGPT Solve the Trolley Problem?

What harm does PoisonGPT cause?

PoisonGPT poses significant security risks. By spreading disinformation, it can mislead users and cause them to make decisions based on false information. This can have serious consequences, especially in sensitive areas like politics, finance, and healthcare.

Moreover, PoisonGPT can be used by criminals to carry out sophisticated phishing and business email compromise attacks. By crafting highly convincing and personalized fake emails, it can exploit human psychology and trick individuals into revealing sensitive information or performing actions that benefit the attackers.

How does PoisonGPT work?

PoisonGPT works by mimicking legitimate open-source AI models and spreading disinformation. The researchers at Mithril Security modified an existing open-source AI model to output a specific piece of disinformation. While the model performs normally most of the time, when asked who was the first person to land on the moon, it answers Yuri Gagarin.

To trick unsuspecting users into using the malicious model, Mithril Security uploaded PoisonGPT to Hugging Face, a popular resource for AI researchers and the public. They gave the repository a name intentionally similar to a real open-source AI research lab, making it difficult for users to identify it as a malicious tool.


PoisonGPT serves as a stark reminder of the potential dangers of malicious AI models. It highlights the need for improved security measures in large language model technology to mitigate the risks associated with their misuse. As AI continues to evolve and become more widely available, it is crucial to promote responsible and ethical use of AI models to prevent malicious exploitations and the spread of harmful content.

error: Content is protected !!