While 2022 marked the birth of OpenAI's LLM-based Generative AI tool ChatGPT, 2023 marked its mainstream adoption. It did not take long for cybersecurity experts to point out the hazard of such tools being repurposed to serve malicious interests. That eventuality has now turned into an actual risk. Let me introduce you to WormGPT, the villain version of ChatGPT.
What is WormGPT?
In a nutshell, this dark cousin of ChatGPT is a generative AI LLM solution (like ChatGPT, Microsoft Copilot, Google PaLM/Bard, Anthropic Claude 2, and open-source LLM alternatives), but created for malicious purposes.
It has been created and made widely available without the “ethical boundaries or limitations” of other legitimate LLMs, and can be used – it is even advertised for this – to produce very convincing phishing emails and to help malicious actors with business email compromise (BEC) attacks. It is also said to be useful for creating malware in Python.
Concretely, WormGPT can:
- Produce malicious code in Python
- Create phishing emails personalized to the recipient
- Generate code for BEC attacks
- Help attackers write fluently in a foreign language (avoiding the spelling and grammar errors that often give phishing away)
- Provide tips on crafting malicious attacks and guidance on illegal cyber activities
WormGPT is based on GPT-J, an open-source language model, and was specifically trained on malware-related datasets.
Is WormGPT free?
Fortunately, it is not freely available and is even quite expensive: approximately 60 USD per month or 550 USD per year, compared to the 20 USD per month of a ChatGPT Plus subscription, which gives access to the GPT-4 model, its plugins, and browsing. Some buyers have also complained of weak performance.
But while the subscription price might deter some script kiddies, it may be attractive to professionalized hackers who already make a living from their activity and want to automate or optimize it. Those actors will in turn produce more tools, faster and more cheaply, for the lower tier of attackers who would not pay the monthly subscription.
However, SlashNext, who tested it, reported very convincing results: “The results were unsettling. WormGPT produced an email that was not only remarkably persuasive but also strategically cunning, showcasing its potential for sophisticated phishing and BEC attacks”.
They warned early…
As soon as ChatGPT emerged and spread on the market, both the US Federal Trade Commission (FTC) and the UK Information Commissioner’s Office (ICO) raised a red flag over the data privacy and protection problems posed by OpenAI’s tool.
For its part, the EU’s law enforcement agency, Europol, warned of the impact of LLMs on law enforcement and flagged the risk of seeing dark LLMs develop. It even considered that dark LLMs “may become a key criminal business model of the future. This poses a new challenge for law enforcement, whereby it will become easier than ever for malicious actors to perpetrate criminal activities with no necessary prior knowledge”.
Unfortunately, this is not surprising and is maybe only a first step towards democratizing such tools. They will probably become cheaper and improve over time.
Dark LLMs still have bright days ahead of them
There are many ways to fool victims; the easiest and fastest is to automate the deception. In the future (some of the uses below might already be possible), it is easy to imagine dark LLMs being used to:
- Generate convincing fake content for fake news/disinformation, including media (image, video, voice…);
- Mimic the writing styles of bloggers, journalists, or public figures to impersonate people online;
- Create toxic, abusive, or offensive text/images on demand;
- Produce fake legal documents;
- Automate the creation of website copies for phishing attacks.
Dark LLMs could go even darker
LLMs, and AIs in general, are algorithms that generate an output. Trained on the right dataset, one could perfectly well be instructed to find weaknesses in encryption keys. We should not forget that keys generated purely in software come from algorithms. Hence attacks on “poorly generated” encryption keys might, in the future, be facilitated. By “poorly generated” we mean keys produced with low entropy – in the sense of Shannon’s information theory – showing partial repetition or predictable patterns.
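To make “low entropy” concrete, here is a minimal sketch (standard-library Python only; the sample keys are hypothetical) that estimates the empirical Shannon entropy of a byte string. Key material built from a repeating pattern scores far below the 8 bits/byte that a truly random byte stream approaches:

```python
import math
import secrets
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of `data`, in bits per byte (max 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

# Hypothetical "poorly generated" key material: an obvious repeating pattern.
weak_key = b"ABCD" * 256            # 1024 bytes, only 4 distinct symbols
# Strong key material drawn from the OS cryptographic RNG.
strong_key = secrets.token_bytes(1024)

print(shannon_entropy(weak_key))    # exactly 2.0 bits/byte
print(shannon_entropy(strong_key))  # close to the 8.0 bits/byte maximum
```

It is exactly this kind of statistical deviation from the 8-bit maximum – repetition and predictability – that a pattern-hunting model would latch onto.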
Indeed, future AI models could potentially be trained on captured encrypted data (obtained by traffic copy or through data leaks) to discover those patterns, if they exist. This would need:
- significant volumes of encrypted data from the same source of encryption keys generation;
- available AI algorithms using deep learning for statistical pattern recognition;
- substantial compute and time (as for any AI/ML training).
This would help a model discover patterns and weaknesses tied to “poorly generated” encryption keys and then, at inference time (after training, when used on “live” or “recent” data), generate specific candidate key sequences based on the identified patterns and test them against the captured data – no longer randomly or by brute force.
Of course, with encryption relying on strongly generated keys (by an HSM with a TRNG – True Random Number Generator – or soon with a QRNG – Quantum Random Number Generator), the risk is very low, close to non-existent, regardless of the AI model and its capabilities. If they rely on truly random secrets, the encryption keys will not show any pattern.
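To contrast the two generation methods, the sketch below (standard-library Python; the seed value is a hypothetical timestamp) shows why the source of randomness matters: a key derived from a general-purpose PRNG seeded with a guessable value can be reproduced exactly by anyone who enumerates the seed, while `secrets.token_bytes` draws from the OS cryptographic RNG and leaves no seed to guess:

```python
import random
import secrets

# Weak: Mersenne Twister seeded with a guessable value (here, a hypothetical
# Unix timestamp). Anyone who guesses the seed regenerates the same "key".
seed = 1_689_600_000
weak_key = random.Random(seed).randbytes(32)

# An attacker replaying the same seed recovers the key byte for byte.
assert random.Random(seed).randbytes(32) == weak_key

# Strong: OS-level cryptographic RNG; there is no seed to enumerate and
# no statistical pattern for a model to learn.
strong_key = secrets.token_bytes(32)
assert strong_key != weak_key  # collision is astronomically unlikely
```

This is why the article's caveat holds: against keys from a TRNG/CSPRNG there is no seed space to search and no pattern to model, whereas a guessable-seed scheme reduces key recovery to enumerating seeds.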
Has the dark sun risen yet?
While it was no surprise that LLM tools would be emulated and twisted to serve the malicious interests of some actors, the magnitude of dark LLMs might only be in its infancy. I wonder whether quantum computers will make training such LLMs even easier, cheaper, or more accessible.
References and sources
- Available sources are not unanimous on pricing: 60 EUR per month and 550 EUR per year (PC Mag), versus 60 USD and 700 USD (ZDNet)
- https://uk.pcmag.com/ai/147755/wormgpt-is-a-chatgpt-alternative-with-no-ethical-boundaries-or-limitations