Specific inquiry details.
I work in the information security team at my company and am in charge of security reviews.
Our company is planning to build a generative AI (GenAI) system.
Since building a GenAI system differs from building a conventional information system, I’m wondering how to approach it.
Also, please tell me what risks generative AI poses.
Key guidelines and standards
At present, domestic guidelines include those issued by the National Intelligence Service (NIS) and the Financial Security Institute (FSI). The NIS’s “Security Guidelines for the Use of Generative AI such as ChatGPT” (June 2023) summarizes major security threats representative of AI technology, such as (1) misinformation, (2) misuse of AI models, (3) impersonation of similar AI model services, (4) data leakage, (5) plugin vulnerabilities, (6) extension vulnerabilities, and (7) API vulnerabilities. It also covers guidelines for the use of generative AI technology, strategies for building generative AI–based information systems, and corresponding security measures.
The FSI’s “Study on Privacy Protection Considerations in AI” (November 2023) discusses AI and personal information protection issues and risks, technologies for enhancing privacy protection, and introduces trends in international AI regulation.
As for foreign guidelines, the United States’ NIST “Artificial Intelligence Risk Management” serves as a reference, while the non-profit organization OWASP is conducting a project called “OWASP Top 10 for LLM Applications.” This project summarizes the top 10 security risks, preventive measures, and attack scenarios that companies should consider when building LLM applications. OWASP has also developed and published an LLM AI Security and Governance Checklist.
OWASP Top 10 for LLM Applications
Version 1.1 of the OWASP Top 10 for LLM Applications continues to be updated, and the Top 10 are as follows.
LLM01. Prompt Injection
: It is the risk that a malicious actor manipulates an LLM (GenAI) to override the system prompt or, via external input, induce the LLM to perform unintended actions, potentially causing data exfiltration, social-engineering attacks, and similar harms.
LLM02. Insecure Output Handling
: It is the risk that outputs generated by the LLM (GenAI) are not properly validated, which can lead to vulnerabilities such as XSS, CSRF, and SSRF.
LLM 03. Training Data Poisoning
: It is the risk that training or fine-tuning an LLM (GenAI) with malicious data can introduce vulnerabilities that undermine the model’s security, effectiveness, or ethical behavior.
LLM 04. Model Denial of Service
: It is the risk that an attacker repeatedly forces an LLM (GenAI) to perform resource-intensive tasks, thereby degrading service quality.
LLM 05. Supply Chain Vulnerabilities
: It is the risk that security issues may arise due to vulnerabilities in externally supplied training data, plugins, or similar components.
LLM 06. Sensitive Information Disclosure
: It is the risk that an LLM (GenAI) may have vulnerabilities that cause it to disclose sensitive information in its responses.
LLM 07. Insecure Plugin Design
: It is the risk that LLM (GenAI) plugins (GenAI Platform-to-Plugin or Plugin-to-Plugin) may accept unsafe input and cause outcomes such as remote code execution.
LLM 08. Excessive Agency
: It is the risk that an LLM (GenAI)-based system is granted excessive capabilities or permissions and may perform unexpected actions.
LLM 09. Overreliance
: It is the risk that excessive reliance on LLM (GenAI) can lead to misinformation, legal issues, and security vulnerabilities.
LLM 10. Model Theft
: It is the risk that a malicious actor may unlawfully access, copy, or exfiltrate an LLM (GenAI) model, resulting in economic loss.
The content for each of the Top 10 will be published once the Korean version is organized.
For awareness purposes, only the attack scenarios from the Top 10 have been compiled to help organizations recognize the potential risks when building generative AI (GenAI) information systems.
Because parts of the original OWASP text are still incomplete, the original content has been slightly refined to aid understanding.
LLM01. Prompt Injection – Attack scenarios
- An attacker injects a Direct Prompt Injection into an LLM-based support chatbot for malicious purposes. The injected prompt contains “forget all previous instructions” and new instructions that cause the bot to query a private data store, exploit package vulnerabilities, and—via a backend function’s lack of output validation—send emails. This can lead to remote code execution, unauthorized access, and privilege escalation.
- An attacker inserts Indirect Prompt Injection into a webpage so that the LLM ignores prior user instructions and, using an LLM plugin, deletes the user’s email. If a user asks the LLM to summarize that webpage, the LLM plugin deletes the user’s emails.
- A user asks the LLM to summarize a webpage that contains text instructing the LLM to ignore previous user instructions and to embed an image that links to a URL containing the conversation summary. The LLM’s output follows those instructions, causing the user’s browser to leak private conversations.
- A malicious actor uploads a resume containing Prompt Injection. A backend user asks the LLM to summarize the resume and whether the candidate is suitable. Due to the Prompt Injection, the LLM returns a positive response—“Yes”—regardless of the actual resume content.
- An attacker repeatedly sends messages requesting the system prompt for a particular model. If the model outputs that prompt, the attacker can use the information to plan more sophisticated attacks.
LLM02. Insecure Output Handling – Attack scenarios
- An application uses an LLM plugin for chatbot functionality. The plugin provides various administrative functions that can access another, more-privileged LLM. The general-purpose LLM forwards responses directly to the plugin without output validation, causing the plugin to enter maintenance mode.
- A user asks the LLM to summarize an article via a website-summarization tool. The website contains prompt injection that causes the LLM to capture sensitive content from the website or the user’s conversation. The LLM then encodes this content and transmits it to a server controlled by the attacker.
- There is a web application that allows the LLM to generate SQL queries against a backend database. If a user requests a query that deletes all database tables, and the LLM-generated query is not reviewed, all database tables may be deleted.
- A web application uses an LLM to generate content from user text prompts. An attacker submits a manipulated prompt that causes the LLM to return an unfiltered JavaScript payload that triggers XSS in the user’s browser.
LLM 03. Training Data Poisoning – Attack scenarios
- LLM-generated AI prompt outputs can mislead application users, which may induce biased opinions or follow-up behavior and, in worse cases, lead to hate crimes and similar harms.
- If training data is not properly filtered or sanitized, malicious users may attempt to inject adversarial data to make the model adapt to biased or false information.
- Malicious actors or competitors may deliberately create inaccurate or harmful documents so those documents are used as the model’s training data. If that happens, the model will be trained on false information, which will ultimately affect the outputs the generative AI provides to consumers.
- The “Prompt Injection” vulnerability can become an attack vector when client inputs to an LLM application are used for model training without adequate sanitization and filtering. In other words, if a client submits malicious or false data as part of a prompt-injection technique, that data can become embedded in the model’s training data.
LLM 04. Model Denial of Service – Attack scenarios
- An attacker sends multiple complex, resource-intensive requests to a hosted model over time, degrading the service quality for other users and increasing the host’s resource costs.
- By inserting apparently harmless but resource-consuming text into a webpage—such as “crawl all links on this page and summarize the main content of each link”—the page causes the LLM tool to make many more web requests and ultimately consume a large amount of resources.
- An attacker continuously sends inputs that exceed the LLM’s processing capacity. Using automated scripts or tools, the attacker floods the model with a large volume of inputs to overwhelm its processing capability. As a result, the LLM consumes excessive computational resources, causing significant latency or complete unresponsiveness of the system.
- An attacker sends consecutive inputs to the LLM, each designed to fall just short of the context window limit. By repeatedly submitting such inputs, the attacker attempts to exhaust the available processing window. As the LLM tries to handle each input within its processing window, system resources are strained, which can lead to performance degradation or a full denial of service.
- An attacker exploits the LLM’s iterative mechanisms to force continual expansion of the processing window. The attacker crafts inputs that leverage the model’s repetitive behavior, causing it to repeatedly expand and process the context window. This attack burdens the system and can induce a denial-of-service (DoS) condition, making the LLM unresponsive or causing it to crash.
- An attacker sends a large number of inputs of varying lengths to the LLM, driving the model to reach or exceed its context-window limit. By overwhelming the LLM with variable-length inputs, the attacker attempts to exploit inefficiencies in handling such inputs. This excessive input load places undue strain on the LLM’s resources, degrading performance and impeding the system’s ability to respond to legitimate requests.
- While DoS attacks typically aim to overwhelm system resources, they can also exploit other system behaviors like API rate limits. For example, in a recent Sourcegraph security incident, a malicious actor used leaked administrator access tokens to modify API rate limits, enabling abnormal request levels that led to a service outage.
LLM 05. Supply Chain Vulnerabilities – Attack scenarios
- An attacker compromises a system by exploiting a vulnerable Python library. This occurred in the first OpenAI data leak incident in 2022.
- An attacker offers an LLM plugin for flight search and deceives users by generating fraudulent links for scams.
- An attacker tricks model developers into downloading a compromised package from the PyPI package registry that exfiltrates data or enables privilege escalation. This attack actually occurred in March 2023.
- An attacker implants malicious code into a publicly available pretrained model—creating a backdoor that generates fake news and false information for economic analysis and social research. They then publish it on a model marketplace (e.g., Hugging Face) so victims will use it.
- An attacker plants malicious code in a publicly available dataset, creating a backdoor when the model is fine-tuned on that data. The backdoor subtly causes certain companies to perform favorably across various markets.
- A compromised employee at a vendor (outsourced developer, hosting company, etc.) exfiltrates data, models, or code to steal intellectual property.
- An LLM operator changes its terms of service and privacy policy so that application data is not explicitly excluded from model training, enabling sensitive data to be retained in memory.
LLM 06. Sensitive Information Disclosure – Attack scenarios
- An innocent, legitimate user A interacting with the LLM application in a non-malicious way is exposed to another user’s data.
- User A uses a well-crafted sequence of prompts to bypass input filters and sanitization, causing the LLM to disclose sensitive information (such as personally identifiable information) about other users of the application.
- Through the user’s own negligence or due to the LLM application, personal data such as PII may be leaked into the model via training data. This can increase the risk and likelihood of the scenarios described in items 1 or 2 above.
LLM 07. Insecure Plugin Design – Attack scenarios
- A plugin accepts a base URL and instructs the LLM to combine that URL with a weather-forecast query to handle a user request. A malicious actor manipulates the request so the URL points to a domain they control, allowing them to inject content into the LLM system via their domain.
- A plugin accepts free-form input in a single field but does not validate it. An attacker provides a carefully crafted payload to gather information from error messages, then exploits known third-party vulnerabilities to execute code, exfiltrate data, or escalate privileges.
- A plugin used to retrieve embeddings from a vector store accepts a connection string as a configuration parameter without validation. This allows an attacker to change the name or host parameters to attempt access to other vector stores and exfiltrate embeddings they should not be able to access.
- A plugin accepts an SQL WHERE clause as an advanced filter and appends it to the filtering SQL. This enables an attacker to craft SQL attacks.
- An attacker uses indirect prompt injection against a code-management plugin that lacks input validation and has weak access controls. This can be used to transfer repository ownership and lock users out of their repositories.
LLM 08. Excessive Agency – Attack scenarios
- An LLM-based personal assistant app accesses a user’s mailbox via a plugin and summarizes incoming emails. To perform this function, the email plugin needs the ability to read messages, but the plugin chosen by the system developer also includes the ability to send messages.
- A maliciously crafted email can trick the LLM into invoking the plugin’s “send message” function, causing the user’s mailbox to send spam — an indirect prompt injection attack to which the LLM is vulnerable.
- These issues can be prevented as follows: (a) remove excessive functionality by using a plugin that only provides mail-reading capabilities; (b) remove excessive privileges by authenticating to the user’s email service with an OAuth session scoped to read-only access; (c) eliminate excessive autonomy by requiring the user to manually review every message composed by the LLM plugin and click “send.” Alternatively, mitigate potential harm by applying rate limits to the mail-sending interface.
LLM 09. Overreliance – Attack scenarios
- A news organization uses an LLM (large language model) to generate news articles at scale. A malicious actor exploits this excessive reliance by injecting misleading information into the LLM, causing false information to spread.
- An LLM may fail to properly identify the sources of news articles, literary works, or research papers and reuse their content in its outputs, thereby infringing copyrights. If an AI uses others’ intellectual property without authorization, it can lead to legal issues, and public trust in the organization or company can be damaged if such incidents become known.
- A software development team uses an LLM system to accelerate coding. By over-relying on the AI’s suggestions, the application can acquire security vulnerabilities due to unsafe defaults or insecure recommendations.
- A software company uses an LLM to support developers. The LLM recommends nonexistent code libraries or packages, and developers, trusting the AI, unknowingly integrate a malicious package into the company’s software. This underscores the importance of verifying LLM suggestions—especially when dealing with third-party code or libraries.
LLM 10. Model Theft – Attack scenarios
- An attacker exploits vulnerabilities in the company’s infrastructure to gain unauthorized access to the LLM model repository. The attacker exfiltrates valuable LLM models to launch competing language-processing services or extracts sensitive information, causing significant financial loss to the original company.
- A disgruntled employee leaks models or related data. If such an incident becomes public, attackers can gain information for gray-box adversarial attacks or directly steal intellectual property.
- An attacker queries the API with carefully chosen inputs and collects enough outputs to create a shadow model (a model that mimics the original model’s architecture or data and behaves similarly). To extract data from the original model, the attacker may proceed through: (a) observing model inputs and outputs, (b) collecting data and training a shadow model, and (c) using the shadow model to predict the original model’s behavior as part of the attack.
- An attacker exploits gaps in supply-chain security controls to leak the company’s proprietary model information.
- A malicious actor bypasses input-filtering techniques and the LLM’s preprocessing steps to perform side-channel attacks, then queries model information from remote resources under their control.
