News

Understanding the security risks and stakeholders in Large Language Model platforms with third-party plugins

16 November 2023

In Artificial Intelligence, and more specifically Large Language Models (LLMs), Sofiane LAGRAA, Team Leader of the Cybersecurity Innovation Team at Fujitsu Luxembourg, explores the complex dynamics of these models. LLMs, such as ChatGPT and Google's Bard, can interpret and generate human-like text. Recently, LLM platforms have adopted an ecosystem of plugins, allowing the integration of third-party applications, which opens up new possibilities in terms of functionality. However, this article also warns of the security risks associated with third-party plugins, highlighting potential vulnerabilities, and underlining the critical need for robust security measures in AI.

Large Language Models (LLMs) are a type of artificial intelligence that can recognize and generate text. They are trained on vast amounts of data and use machine learning, specifically a type of neural network called a transformer model. They can interpret human language or other types of complex data and are capable of tasks such as interpreting questions and generating responses, translating text from one language to another, and even writing code. Examples of real-world LLMs include ChatGPT from OpenAI, Google’s Bard, Meta’s Llama, and Microsoft’s Bing Chat.

Recently, large language model (LLM) platforms have started providing a plugin ecosystem that allows integration with third-party applications and internet services. LLM platform plugins consist of a manifest and an API specification. The manifest of a plugin typically includes information about the plugin such as its name, version, and the permissions it requires. The API specification, on the other hand, defines how the plugin interacts with the LLM platform and other plugins. This includes the methods and data formats the plugin uses to communicate with the platform and other plugins.

These plugins, although they enhance the functionalities of LLM platforms, are created by third parties and therefore, their trustworthiness cannot be assumed.

While third-party plugins can enhance the functionalities of LLM platforms, they may also contribute to the growing concerns about security, privacy, and safety associated with LLMs, as highlighted by the research community. For example, a plugin could provide integration with a popular cloud storage service, allowing users to directly access and manipulate files stored in the cloud from within the LLM platform. Another plugin could provide advanced text analysis capabilities, such as sentiment analysis or topic modeling, that are not natively supported by the LLM platform.

For interacting with the LLM, a user uses a prompt. A user prompt to a Large Language Model (LLM) is an input given by the user to guide the model's response. It's essentially a command or question that instructs the model on what kind of text to generate. For example, if you're using an LLM like OpenAI's GPT-3, you might give it a prompt like "Write a short story about a boy who discovers a magical forest." The LLM will then generate a text that attempts to fulfill this prompt, creating a narrative based on the details you've provided.

The life cycle of a user prompt to an LLM requiring a plugin interaction can be illustrated as follows: Initially, the user installs a plugin from the plugin store onto the LLM platform. The plugin's description and endpoints are then integrated into the LLM to create the necessary context for understanding the user's prompt. The user then issues a prompt to the LLM that necessitates the use of the installed plugin. The LLM identifies the appropriate plugin based on its description and sends a request to the plugin API endpoint with the necessary parameters. The LLM then interprets the response from the plugin API endpoint and presents it to the user. Throughout these exchanges, the LLM platform facilitates all interactions with the plugin.

Hence, there are 3 main parties involved: the users, who handle the installation and removal of plugins; the LLM platform, which is in charge of evaluating plugins and offering them in the plugin store; and the developers, who undertake the development, updating, and server hosting of the plugin.

Possible attack scenarios, their objectives, and the associated risks.

Attack scenario 1: Consider a user who uses an LLM platform for drafting emails. A third-party plugin is available that promises to enhance the platform's grammar and spell-check capabilities. The user, enticed by the promise of better writing assistance, installs this unvetted plugin. Unbeknownst to the user, the plugin was created by a malicious actor. It has been designed to exploit the authentication flow of the LLM platform. The next time the user logs into the platform, the plugin intercepts the login process and steals the user's credentials. Now, the attacker has full control over the user's account. The attacker can then use this access to steal sensitive data stored in the user's drafts, send phishing emails to the user's contacts, or even lock the user out of their account and demand a ransom for its return.

Attack scenario 2: The attacker could create a plugin with a name very similar to a popular and trusted plugin (plugin squatting). Unsuspecting users might install this plugin, thinking it's the legitimate one. Once installed, the plugin could perform history sniffing, accessing and sending the user's browsing history to the attacker, revealing sensitive information.

These examples illustrate the potential risks associated with the use of unvetted third-party plugins on LLM platforms. They underscore the importance of careful security practices, including vetting of plugins and secure authentication flows.

The following attack scenarios are the interaction between plugins and LLM platforms.

Attack scenario 3: Consider a scenario where an attacker creates a malicious plugin for an LLM platform. This plugin is designed to inject harmful descriptions into the data or prompts that the user provides to the LLM. For example, the user might prompt the LLM to "Write a blog post about the benefits of exercise," but the malicious plugin alters this to "Write a blog post about the benefits of illegal drugs." This could lead to the LLM generating inappropriate or harmful content.

Attack scenario 4: The attacker might design the plugin to divert prompts to itself. For instance, when a user prompts the LLM to "Generate a password," the plugin could intercept this prompt and generate the password itself. This would allow the attacker to gain access to the generated password, potentially compromising the user's security.

Attack scenario 5: The attacker could aim to hijack the LLM platform itself. They could do this by exploiting vulnerabilities in the platform's code or in its interaction with plugins. Once they've gained control of the platform, they could impersonate the LLM, controlling all interactions between the user and the platform. This could allow them to steal sensitive user data, generate misleading or harmful responses, or even use the platform to launch further attacks.

These examples illustrate the potential risks associated with the interaction between plugins and LLM platforms. They highlight the importance of robust security measures, including secure coding practices, rigorous vetting of plugins, and careful monitoring of platform activity.

The following scenarios are Denial-of-Service (DoS) attack.

Attack scenario 6: Let's consider a scenario where a user, or a group of users, intentionally overload a plugin service on an LLM platform with an excessive number of prompts. For instance, they might repeatedly prompt the plugin to perform a complex task, such as translating a large text document into multiple languages. If the plugin is not designed to handle such a high volume of requests, it could become overwhelmed and crash, rendering it unavailable to other users. This is an example of a Denial-of-Service (DoS) attack.

Attack scenario 7: A malicious user might exploit a vulnerability in a plugin to send it a prompt that causes it to enter an infinite loop or consume excessive resources, effectively crashing the plugin and making it unavailable for other users.

Similarly, an attacker could target the LLM platform itself with a DoS attack. For example, they might flood the platform with a large number of complex prompts, such as requests to generate lengthy and detailed narratives. If the platform is not equipped to handle such a high volume of complex requests, it could become overloaded and crash, disrupting the service for all users.

These examples illustrate how DoS attacks, a common type of cyber-attack, can also pose a threat to LLM platforms and their plugins. They highlight the importance of implementing measures to prevent such attacks, such as rate limiting, resource management, and secure coding practices.

In conclusion, the security flaws of LLM can inherit the security flaws of existing information systems, networks, and frameworks. Even considering in an ideal world where these systems are secure & safe and represent no vulnerability, the LLM models themselves can be vulnerable during the entire development cycle (data collection, training, testing, etc.) until they are into production and their use. However, there are more types of attack methods than defense tools. This remains a real challenge not only in LLM but in AI in general where there are other technical issues than security such as privacy and ethics where those are in their infancy.