Summary Explanation of the OWASP Top 10 for LLM Applications for Securing LLM (GenAI, Generative AI) Applications

OWASP Top 10 for LLM Application

LLM01. Prompt Injection
Malicious users may manipulate the LLM (GenAI) to redefine system prompts or induce unintended actions through external inputs, leading to data leakage or social engineering attacks.

LLM02. Insecure Output Handling
If the outputs generated by the LLM (GenAI) are not properly validated, vulnerabilities such as XSS, CSRF, or SSRF may occur.

LLM03. Training Data Poisoning
Security, effectiveness, or ethical behavior can be compromised if the LLM (GenAI) is trained or fine-tuned with malicious data.

LLM04. Model Denial of Service
Attackers may repeatedly trigger resource-intensive tasks in the LLM (GenAI), degrading service performance and availability.

LLM05. Supply Chain Vulnerabilities
Security risks may arise from vulnerabilities in externally sourced training data, plugins, or other dependencies.

LLM06. Sensitive Information Disclosure
There is a risk that the LLM (GenAI) may expose sensitive information during responses.

LLM07. Insecure Plugin Design
Insecurely designed LLM (GenAI) plugins (Platform-to-Plugin or Plugin-to-Plugin) that accept unsafe inputs may result in remote code execution or similar exploits.

LLM08. Excessive Agency
LLM (GenAI)-based systems that are granted excessive functionality or authority may behave unpredictably or perform unintended actions.

LLM09. Overreliance
Excessive dependence on LLM (GenAI) may lead to the spread of incorrect information, legal issues, or the introduction of security vulnerabilities.

LLM10. Model Theft
Malicious actors may gain unauthorized access to, copy, or exfiltrate LLM (GenAI) models, resulting in potential economic losses.

LLM Application Data Flow Diagram

The diagram below is a high-level architecture of a hypothetical large language model (LLM) application, highlighting the risk areas where the OWASP Top 10 for LLM Applications intersect with the application flow. This diagram serves as a visual guide to help understand how LLM security risks impact the overall application ecosystem.

LLM 01. Prompt Injection (프롬프트 주입)

Description
A prompt injection vulnerability occurs when an attacker manipulates input to control an LLM so that it unwittingly executes the attacker’s intent. This can be done directly by overriding the system prompt via “jailbreaking” (or “DAN”) or indirectly by tampering with external input, and may lead to data exfiltration, social-engineering attacks, and similar problems. The results of a successful prompt injection attack vary widely — from extracting sensitive information to masquerading as normal operation and influencing critical decision-making. In high-level attacks, an adversary can manipulate the LLM to impersonate a malicious persona or to interact with plugins in the user’s settings, which can cause sensitive data leaks, unauthorized plugin actions, or social engineering. In such cases, the compromised LLM can act as the attacker’s agent, bypass security controls, and prevent users from noticing the intrusion.

Common Examples of Vulnerability
Direct Prompt Injection: Also called “jailbreaking” (or “DAN”), this occurs when a malicious user overwrites or reveals the system prompt to manipulate the LLM. Through this, the LLM can interact with insecure functions and data stores it has access to, allowing abuse of backend systems.

Indirect Prompt Injection: This occurs when an LLM accepts input from external sources. An attacker inserts prompt injection into external content such as a website or file to hijack the conversational context. This enables the LLM to act as a “confused agent,” manipulating users or additional systems. Indirect prompt injection can succeed whether or not a human can read the content—if the LLM can parse the text, it is possible. A malicious actor can perform prompt injection directly on the LLM, causing it to ignore the application creator’s system prompt and instead return private, dangerous, or undesired information. For example: a user asks the LLM to summarize a webpage that contains indirect prompt injection; the LLM may be led to elicit sensitive information from the user and exfiltrate it via JavaScript or Markdown. A malicious actor uploads a resume containing indirect prompt injection; the document instructs the LLM to tell the user the document is excellent. If an internal user runs the LLM to summarize this document, the LLM’s output will state that the document is excellent. If a user enables a plugin connected to an e-commerce site, malicious commands embedded in a visited website can exploit that plugin to trigger unauthorized purchases. Malicious commands and content on visited sites can likewise exploit other plugins to deceive users.

LLM 02. Insecure Output Handling (안전하지 않은 출력처리)

Description
Insecure output handling refers to cases where the output generated by an LLM is not properly validated, sanitized, or processed before being passed on to other components or systems.
Since the content generated by an LLM can be influenced by prompt input, this is similar to allowing a user indirect access to additional functionalities.
Insecure output handling concerns how the LLM’s output is managed before it is transferred to other systems, and is distinct from general issues of the LLM’s accuracy or appropriateness.

Common Examples of Vulnerability
If an application instructs an LLM to query a database based on user input, failure to properly review the LLM’s output can lead to attacks such as SQL injection.
When JavaScript or Markdown generated by the LLM is executed in a browser, insufficient output validation can result in XSS (Cross-Site Scripting) attacks.
If the LLM takes user input and executes commands in a system shell, inadequate review of the LLM’s output can lead to remote code execution.

LLM 03. Training Data Poisoning (학습 데이터 포이즈닝)

Description
Training data is the starting point for all machine learning approaches. For an LLM to achieve high capability (e.g., language and world knowledge), that text must span diverse domains, genres, and languages. LLMs use deep neural networks that generate outputs based on patterns learned from the training data.
Training data poisoning involves manipulating the data used in pre-training, fine-tuning, or embedding processes to introduce vulnerabilities, backdoors, or biases that can undermine the model’s security, effectiveness, or ethical behavior.
Poisoned information can be exposed to users or cause risks such as degraded performance, downstream software abuse, and reputational damage. Even if users do not trust problematic AI outputs, the model’s functionality can be impaired and the brand’s reputation harmed.

Common Examples of Vulnerability
Malicious actors or competitors generate malicious or inaccurate documents targeting the model’s pre-training data, fine-tuning data, or embeddings.
The compromised model learns the false information, which is then reflected in the outputs of generative AI prompts.

LLM 04. Model Denial of Service (모델 서비스 거부)

Description
A denial-of-service (DoS) attack on a model occurs when an attacker interacts with an LLM in a way that consumes excessive resources, leading to degraded service quality for other users or incurring high resource costs.
Additionally, attacks that disrupt or manipulate the LLM’s context window have emerged as a significant security issue. This becomes even more critical as LLMs are increasingly used in various applications, consume large amounts of resources, accept unpredictable user input, and as developers often lack awareness of such vulnerabilities.
In an LLM, the context window represents the maximum length of text the model can process, including both input and output, and its size varies depending on the model’s architecture.