CISO Guide: Penetration Testing for Large Language Models (LLMs)

Penetration Testing for LLM-Enabled Applications, LLMs as a Service (LLMaaS), Custom and Pre-Trained Models, and Edge and On-Prem LLMs.

Introduction: Large Language Models

As AI continues to integrate into industries ranging from customer service to software development, Large Language Models (LLMs), a subset of Gen AI that powers applications like chatbots and content generation tools, are seeing widespread adoption. However, as with any evolving technology, vulnerabilities are emerging alongside innovation

Why AI Security Matters

Managing your attack surface has always been challenging and new complexities demand security solutions that solve problems, not create new ones. The risks that enterprises face today have changed and grown well beyond cloud adoption. Security leaders find themselves in a new Gen-AI era tackling challenges not seen or encountered before, including securing Large Language Model (LLM) applications.

Securing AI applications is more than just a best practice – it’s essential. As large language models (LLMs) take on critical roles in automated decision-making, financial analysis, and even medical diagnosis, the stakes of security have never been higher. A single breach could expose sensitive data, manipulate outputs, or grant unauthorized access, leading to serious consequences.

Meanwhile, threat actors are constantly adapting, uncovering new ways to exploit LLM-enabled applications and LLMaaS. From prompt manipulation and unauthorized code execution to data leaks, the risks are evolving just as fast as the technology itself. To stay ahead, enterprises need a proactive approach to security – one that anticipates threats before they become costly incidents.

Penetration testing services are designed to assess the security posture of LLMs by simulating real-world attacks to uncover vulnerabilities. This process evaluates how well an LLM can withstand adversarial inputs, prevent unauthorized access, and protect sensitive data.

Vulnerabilities Evaluated in LLM Pentesting

Given the widespread adoption across industries, security teams must evaluate these models for potential risks. By testing for threats such as prompt injection, data leakage, API misconfigurations, and model manipulation, pentesters help enterprises identify and mitigate risks before they can be exploited. As LLMs become increasingly integrated into business operations, robust security assessments ensure these AI-driven systems remain resilient against emerging threats.

Pentesters assess several key aspects when testing LLMs, including:

  • Prompt Injection Attacks: Manipulating the model’s responses by crafting malicious prompts.
  • Data Leakage: Extracting sensitive or proprietary data from the model.
  • Model Bias and Hallucination: Exploiting biases or fabrications that can impact business credibility.
  • Adversarial Inputs: Using carefully created inputs to mislead the model.
  • API Security: Testing API-based LLM integrations for authen-tication flaws, rate-limiting weaknesses, and improper data handling.
  • Supply Chain Risks: Assessing vulnerabilities in pre-trained or third-party models incorporated into business applications.

LLM Pentesting: Types of LLMs

LLMs come in various forms, ranging from fully hosted applications and API-based services to custom-trained and pre-trained models, each with distinct integration points and security considerations.

While some LLMs are embedded into enterprise software for automation and customer interactions, others are accessed via APIs to enhance existing workflows. Enterprises that develop or fine-tune their own models must address risks related to data privacy, model poisoning, and adversarial manipulation, while those leverage third-party or pre-trained models face challenges such as supply chain vulnerabilities and API security flaws. Understanding these variations is crucial for implementing effective security controls and mitigating potential threats.

Types of LLMs & Their Role in Business Workflows

1. LLM-Enabled Applications

These are end-user applications that directly leverage LLMs for tasks like content generation, customer support, or coding assistance.

  • Examples: ChatGPT, GitHub, Copilot, Jasper AI.
  • Integration: Typically used through web apps or enterprise software.
  • Vulnerabilities: Prompt injection, data exfiltration, misuse by employees.

2. LLMs as a Service (LLMaaS)

These provide LLM capabilities access via APIs, allowing businesses to integrate AI-powered functions into their applications.

  • Examples: Open AI API, Anthropic’s Claude API, Google’s Gemini API.
  • Integration: Used in chatbots, automation tools, and enterprise platforms.
  • Vulnerabilities: API key exposure, broken access control, rate-limiting bypass.

3. Custom LLM Models

Enterprises fine-tune or train their own LLMs on proprietary datasets for domain-specific tasks.

  • Examples: A financial institution training an LLM for fraud detection.
  • Integration: Used in private applications and internal tools
  • Vulnerabilities: Model poisoning, data privacy risks, fine-tuning vulnerabilities.

4. Pre-Trained Models

These are publicly available models trained on large datasets, which enterprises adopt with minimal customization.

  • Examples: Meta’s Llama, Falcon, Mistral.
  • Integration: Can be embedded into AI platforms or used for research.
  • Vulnerabilities: Supply chain attacks, bias exploitation, lack of transparency.

5. Edge LLMs

Edge LLMs are deployed on localized devices, such as mobile phones, IoT devices, and industrial systems, allowing real-time AI processing without continuous cloud connectivity.

  • Examples: AI-powered voice assistants, Industrial IoT devices in manufacturing, AI-driven medical diagnostic devices.
  • Integration: Embedded in mobile, IoT devices for real-time, offline AI processing.
  • Vulnerabilities: Model extraction attacks, adversarial inputs, hardware security risks, and limited security patching.

6. On-Prem LLMs

On-Prem LLMs are deployed within a private infrastructure providing full control over model training, inference, and access while ensuring compliance with strict regulatory requirements.

  • Examples: Customer chatbots hosted internally, final risk analysis in banks, healthcare models processing PPI.
  • Integration: Deployed on in-house infrastructure to maintain full control over data and model governance.
  • Vulnerabilities: Insider threats, access control misconfigurations, hardware security risks, and compliance failures.

LLM Pentesting: Drivers and Benefits

Top 5 Drivers & Benefits of Pentesting LLMs

1. Identifying Vulnerabilities in Model Behavior

Driver: Pentesting helps uncover weaknesses in an LLM’s responses, such as biases, hallucinations, or potential adversarial manipulation. By testing how the model responds to various inputs, it ensures that unintended behaviors (like producing harmful or misleading content) are detected and mitigated before deployment.

Benefit: Ensures the model’s reliability, safety, and trustworthiness in real-world applications, reducing risks of harmful outcomes.

2. Securing APIs and Data Access

Driver: LLMs often interact with APIs for various functions, making them susceptible to unauthorized access, data leaks, or manipulation. Pentesting these interfaces helps identify potential vulnerabilities in how users interact with the model and what data is exposed during those interactions.

Benefit: Safeguards sensitive data and prevents unauthorized access or data theft, ensuring user privacy and maintaining the integrity of the application.

3. Preventing Model Extraction Attacks

Driver: Attackers may attempt to clone or replicate the LLM by extracting its underlying model or training data through repeated querying. Pentesting can simulate these extraction attempts and evaluate the model’s resilience to such threats.

Benefit: Protects intellectual property and reduces the risk of competitors or malicious actors stealing or replicating the model, ensuring its value is preserved.

4. Assessing Resource Exploitation Risks

Driver: LLMs, especially those deployed in cloud environments, are vulnerable to resource exhaustion attacks (e.g., through repeated or resource-intensive queries). Pentesting helps simulate these attacks, identifying how the model handles overloads or denial-of-service attempts.

Benefit: Improves system stability and uptime by addressing resource exploitation risks, ensuring service availability and reducing downtime or disruptions.

5. Enhancing Compliance & Regulatory Adherence

Driver: As LLMs are increasingly used in sectors with strict data privacy and security regulations (e.g., healthcare, finance), pentesting helps ensure that the model adheres to compliance standards, including GDPR, HIPAA, DORA, NIS2, and others.

Benefit: Assures that the model meets regulatory requirements, avoiding legal and financial repercussions, and fostering trust with clients and users.

LLM-Specific Vulnerabilities in Pentesting

As business integrate LLMs into their workflows, penetration testing plays a crucial role in identifying and mitigating these security risks before they are exploited. A tailored security strategy, combining proactive testing, continuous pentesting, monitoring, and access controls, is essential for maintaining the integrity of LLM-driven applications.

Different types of LLMs introduce unique security concerns that pentesters must address:

LLM-enabled Applications

Prompt injection, data leakage, model bias exploitation

LLMs as a Service

API abuse, unauthorized access, rate-limit evasion

Custom LLM Models

Poisoned training data, unauthorized model access

Pre-trained Models

Supply chain risks, adversarial inputs, transparency issues

Edge LLM

Model exfiltration, adversarial inputs, hardware security risks, limited security patching

On-Prem LLM

Insider threats, misconfigurations, compliance failures

LLMs vary in deployment, control, and security. While LLM-enabled applications are fully managed with some customization, LLMaaS offer flexibility by integrating AI capabilities. Custom models provide full training control but require significant resources, whereas pre-trained models balance accessibility and customization. Edge LLMs enable real-time, localized processing for low latency but face physical security risks, while On-Prem LLMs ensure data governance and compliance but require strong infrastructure security. Understanding these differences is key to selecting the right model while mitigating risks

LLM Pentesting Methodology

Penetration Testing Services for LLMs assess the security, robustness, and ethical safeguards of large language models. This applies to LLM-enabled applications, LMaaS, and custom and pre-trained models integrated into business workflows, ensuring systems are secure, reliable, and free from exploitable weaknesses.

Key Objectives

  • Identify vulnerabilities and weaknesses in the LLM’s architecture, APIs, or deployment.
  • Evaluate the LLM’s resistance to malicious inputs, such as prompt injection or adversarial attacks.
  • Ensure compliance with ethical standards and privacy regulations.
  • Provide actionable insights and recommendations to strengthen the LLM’s security posture.

BreachLock LLM Pentesting Methodology Web Chart

LLM Pentesting Multi-Phase Approach

Penetration Testing Services for LLMs follows a structured, multi-phase approach to assess security risks, ethical safeguards, and system resilience. From initial planning and scope to the final documentation and reporting, each phase is designed to identify vulnerabilities, evaluate exploitation risks, and ensure robust defenses against adversarial threats. This comprehensive process helps enterprises proactively secure their LLM implementations against emerging attack vectors and compliance risks.

1. Planning and Scoping

  • Define Objectives: Outline the purpose of the LLM assessment, including identifying vulnerabilities, evaluating model behavior, and assessing potential misuse risks.
  • Scope Definition: Specify which aspects of the LLM system will be tested, such as the model itself, APIs, and integration points.
  • Rules of Engagement: Establish guidelines for testing, including acceptable prompts, data usage limitations, and any legal and ethical exclusions.
  • Considerations: Address concerns related to data privacy, intellectual property, and potential biases in LLM outputs.
  • 2. Information Gathering & Reconnaissance

    • Architecture Analysis: Understand the system’s overall architecture, including how the LLM is integrated with other components.
    • Documentation Review: Examine any available API documentation, model cards, or usage guidelines.
    • Model Identification: Determine the specific LLM being used, its version, and any known characteristics or limitations.

    3. LLM System Mapping & Enumeration

    • API Endpoint Discovery: Identify all LLM-related API endpoints and their functionalities.
    • Input/Output Analysis: Map the types of inputs accepted and outputs generated by the LLM.
    • Access Control Enumeration: Understand authentication mechanisms and role-based access controls for LLM interactions.

    4. Vulnerability Testing

    • Prompt Injection: Test for vulnerabilities related to malicious or unexpected prompts that could lead to unintended behavior.
    • Data Extraction: Attempt to extract sensitive information from the system via malicious input and responses.
    • Model Evasion: Evaluate the LLM’s ability to handle adversarial inputs designed to bypass content filters or security measures.

    5. LLM-Specific Testing

    • Prompt Leakage: Check if the system inadvertently reveals sensitive prompts or system instructions.
    • Training Data Inference: Attempt to infer information about the model’s training data through carefully crafted queries.
    • Model Extraction: Evaluate the risk of extracting model parameters or functionality through repeated interactions.
    • Bias and Fairness: Assess the model for potential biases or unfair treatment across different demographic groups.
      Hallucination Detection: Test the LLM’s tendency to generate false or unsupported information.

    6. Integration & Workflow Testing

    • Business Logic Testing: Evaluate how the LLM is integrated into broader application workflows and test for logic flaws.
    • Assess how the system handles unexpected inputs or errors in LLM responses.
    • Data Flow Analysis: Trace the flow of data to and from the LLM, identifying potential points of compromise.

    7. Exploitation & Impact Assessment

    • Controlled Exploitation: Demonstrate the real-world impact of identified vulnerabilities in a safe, controlled manner.
    • Chaining Attacks: Combine multiple weaknesses to showcase more severe exploitation scenarios.
    • Privacy Impact: Assess the potential for privacy breaches or data leaks through LLM interactions.

    8. Documentation & Reporting

    • Detailed Findings: Document all identified vulnerabilities, including LLM-specific issues and their potential impacts.
    • Risk Analysis: Rank findings based on severity, considering both traditional web vulnerabilities and LLM-specific risks.
    • Remediation Recommendations: Provide actionable recommendations for securing the LLM system, including prompt engineering, model fine-tuning, and integration improvements.

    OWASP Top 10 for LLMs 2025

    The OWASP Top 10 for LLMs started in 2023 as a community-driven effort to highlight and address issues specific to AI applications. Since then, the technology has continued and LLMs are embedded more deeply in everything from customer interactions to internal operations.

    The 2025 list V2.0 reflects a better understanding of existing risks and introduces critical updates on how LLMs are used in real-world applications today. The OWASP Top 10 List is a product of the open-source community’s insights and experiences, all of whom are committed to building safer AI applications.

    LLM01: 2025 Prompt Injection

    Prompt Injection Vulnerabilities occur when user inputs manipulate an LLM’s behavior or output in unintended ways, even if imperceptible to humans. These attacks exploit how models process prompts, potentially leading to guideline violations, harmful content generation, unauthorized access, or decision manipulation. Prompt injection types include:

    • Direct Prompt Injections: Direct prompt injections occur when a user’s prompt input directly alters the behavior of the model in unintended or unexpected ways. The input can be either intentional (i.e., a malicious actor deliberately crafting a prompt to exploit the model) or unintentional (i.e., a user inadvertently providing input that triggers unexpected behavior).
    • Indirect Prompt Injections: Indirect prompt injections occur when an LLM accepts input from external sources, such as websites or files. The content may have – when interpreted by the model in the external content – data that alters the behavior of the model in unintended or unexpected ways. Like direct injections and indirect injections that can be either intentional or unintentional.

    Mitigation Strategies

    • Constrain model behavior: Define the model’s role, capabilities, and limits in the system prompt. Enforce strict context adherence, restrict responses to specific tasks, and ignore attempts to alter core instructions.
    • Define and validate expected output formats: Specify clear output formats, request detailed reasoning and source citations and use deterministic code to validate adherence to these formats.
    • Implement input and output filtering: Define sensitive categories and set rules for detection and handling. Use semantic filters and string-checking to flag non-allowed content. Evaluate responses with the RAG Triad: context relevance, groundedness, and Q&A relevance to spot malicious outputs.
    • Enforce privilege control and least privilege access: Give applications their own API tokens for extensibility and handle functions in code, limiting the model’s access to only what’s necessary.
    • Conduct adversarial testing: Perform continuous penetration testing and adversarial simulations, treating the model as an untrusted user to test the effectiveness of trust boundaries and access controls.

    LLM02: 2025 Sensitive Information Disclosure

    LLMs risk exposing sensitive data, including PII, financial records, security credentials, and proprietary algorithms. Users should avoid sharing confidential information, as it may be reflected in model outputs. To mitigate risks, applications should sanitize data, enforce clear Terms of Use, and implement system-level restrictions—though these may not always prevent disclosure. Types of vulnerabilities may include:

    • PII Leakage: Personal identifiable information (PII) may be disclosed during interactions with the LLM.
    • Proprietary Algorithm Exposure: Poorly configured outputs can expose proprietary data and enable inversion attacks that extract sensitive information.
    • Sensitive Business Data Disclosure: Generated responses might inadvertently include confidential business information.

    Mitigation Strategies

    • Sanitization: Implement data sanitization and strict input validation to prevent sensitive or harmful data from entering the model.
    • Access controls: Enforce least privilege access and restrict external data sources to minimize data exposure.
      Federated learning and privacy techniques: Use federated learning and differential privacy to reduce centralized data risks and protect individual data points.
    • User education and transparency: Educate users on safe LLM usage and maintain transparency in data retention and processing policies.
    • Secure system configuration: Conceal system preambles and follow security misconfiguration best practices to prevent unintended data exposure.

    LLM03: 2025 Supply Chain

    LLM supply chains are susceptible to various vulnerabilities, which can affect the integrity of training data, models, and deployment platforms. These risks can result in biased outputs, security breaches, or system failures. While traditional software vulnerabilities focus on issues like code flaws and dependencies, in Machine Learning the risks also extend to third-party pre-trained models and data. Below are some of the risks that are associated with the supply-chain, which are also mentioned in LLM04:

    • Traditional 3rd Party Packages: Such as outdated or deprecated components, which attackers can exploit to compromise LLM-enabled applications and LMaaS. This is similar to A06:2021 – Vulnerable and Outdated Components – with increased risks when components are used during model development or fine tuning.
    • Licensing Risks: AI development involves diverse software and dataset licenses, creating risks if not properly managed. Different licenses impose varying legal requirements, with dataset licenses often restricting usage, distribution, or commercialization.
    • Outdated or Deprecated Models: Using outdated or deprecated models that are no longer maintained leads to security issues.
    • Vulnerable Pre-Trained Model: Pre-trained models can contain hidden biases, backdoors, or malicious features due to poisoned datasets or direct tampering (e.g., ROME/lobotomization), with limited security assurances.
    • Weak Model Provenance: Published models lack strong provenance guarantees, making them susceptible to supply chain attacks through compromised repositories or social engineering.
    • Vulnerable LoRA Adapters: LoRA fine-tunes LLMs by adding pre-trained layers for efficiency but introduces risks. Malicious LoRA adapters can compromise model integrity during merges or through platforms like vLLM and OpenLLM, where adapters are applied to deployed models.

    Mitigation Strategies

    • Vet data sources and suppliers: Review their T&Cs and security policies and conduct regular audits.
    • Follow OWASP A06:2021: Scan, manage, and patch vulnerabilities, including in sensitive development environments.
    • Use AI Red Teaming: Evaluate third-party models, as benchmarks can be bypassed.
    • Maintain an SBOM: Track components, detect vulnerabilities, and prevent tampering, exploring AI BOMs like OWASP CycloneDX.
    • Manage AI licensing risks: Manage risks with BOMs, audits, automated tools, and proper documentation.
    • Use models from verified sources: Conduct integrity checks, file hashes, and code signing.
    • Monitor collaborative models: Monitor development environments with automated security tools.
    • Detect tampering and poisoning: Use anomaly detection and adversarial robustness tests in red teaming.
    • Enforce a patching policy: Mitigate outdated components and maintain API and model security.

    LLM04: 2025 Data & Model Poisoning

    Data poisoning manipulates training data to introduce vulnerabilities, biases, or backdoors, compromising model security and performance. It can occur during pre-training, fine-tuning, or embedding stages, leading to biased outputs, degraded performance, or system exploitation. External data sources and open-source repositories are particularly vulnerable, with risks like malicious pickling or hidden backdoors that trigger harmful behavior. These attacks are challenging to detect and can create sleeper agents within models. Common vulnerabilities include:

    • Training Infiltration: Malicious actors introduce harmful data during training, leading to biased outputs. Techniques like “Split-View Data Poisoning” or “Frontrunning Poisoning” exploit model training dynamics to achieve this.
    • Data Injection: Users unknowingly inject sensitive or proprietary information during interactions, which could be exposed in subsequent outputs.
    • Unverified Data: Unverified training data increases the risk of biased or erroneous outputs.
    • Access Restrictions: Lack of resource access restrictions may allow the ingestion of unsafe data, resulting in biased outputs.

    Mitigation Strategies

    • Track data origins and transformations: Use tools like OWASP CycloneDX or ML-BOM. Verify data legitimacy during all model development stages.
    • Vet data vendors rigorously: Validate model outputs against trusted sources to detect signs of poisoning.
    • Implement strict sandboxing: Limit model exposure to unverified data sources. Use anomaly detection techniques to filter out adversarial data.
    • Tailor models for different use cases: Use specific datasets for fine-tuning. This helps produce more accurate outputs based on defined goals.
    • Ensure sufficient infrastructure controls: This will prevent the model from accessing unintended data sources.
    • Use data version control (DVC): Track changes in datasets and detect manipulation. Versioning is crucial for maintaining model integrity.
    • Store user-supplied information: Using a vector database allows adjustments without retraining the entire model.
    • Test model robustness: Use red team campaigns and adversarial techniques, such as federated learning, to minimize the impact of data perturbations.
    • Monitor training loss: And analyze model behavior for signs of poisoning. Use thresholds to detect anomalous outputs.
    • Integrate Retrieval-Augmented Generation (RAG): Use RAG during inference, and grounding techniques to reduce risks of hallucinations.

    LLM05: 2025 Improper Output Handling

    Improper Output Handling occurs when LLM-generated outputs are not properly validated, sanitized, or controlled before being passed to other systems. Unlike over reliance, which concerns trust in LLM accuracy, this issue directly affects security by enabling attacks like XSS, CSRF, SSRF, and remote code execution. Risks increase when LLMs have excessive privileges, lack output encoding, or are vulnerable to indirect prompt injection. Weak validation in third-party extensions, insufficient monitoring, and missing rate limits further amplify the threat. Examples of vulnerabilities may include:

    • Remote Code Execution: LLM output is entered directly into a system shell or similar function such as exec or eval, resulting in remote code execution.
    • XSS: JavaScript or Markdown is generated by the LLM and returned to a user. The code is then interpreted by the browser, resulting in XSS.
    • SQL Injection: LLM-generated SQL queries are executed without proper parameterization, leading to SQL injection.
    • Path Traversal: LLM output is used to construct file paths without proper sanitization, potentially resulting in path traversal vulnerabilities.
    • Phishing Attacks: LLM-generated content is used in email templates without proper escaping, potentially leading to phishing attacks.

    Mitigation Strategies

    • Treat the model as any other user: Adopt a zero-trust approach and apply proper input validation on responses coming from the model to backend functions.
    • Follow the OWASP ASVS (Application Security Verification Standard): These guidelines ensure effective input validation and sanitization.
    • Encode model output back to users: Mitigate undesired code execution by JavaScript or Markdown. OWASP ASVS provides detailed guidance on output encoding.
    • Implement context-aware output: Encode based on where the LLM output will be used (e.g., HTML encoding for web content, SQL escaping for database queries).
    • Use parameterized queries: Prepare parameterized queries and statements for all database operations involving LLM output.
    • Employ strict Content Security Policies (CSP): Mitigate the risk of XSS attacks from LLM-generated content.
    • Implement robust logging and monitoring systems: Detect unusual patterns in LLM outputs that might indicate exploitation attempts.

    LLM06: 2025 Excessive Agency

    Excessive Agency occurs when an LLM-based system, granted the ability to call functions or interact with other systems via extensions, performs harmful actions due to unexpected or manipulated outputs. This vulnerability is triggered by issues like hallucinations, prompt injections, or compromised extensions and agents. Excessive functionality, permissions, or autonomy are the root causes. It can lead to serious impacts on confidentiality, integrity, and availability, depending on the systems the LLM-enabled application interacts with. Common risks include:

    • Excessive Functionality: An extension may have been trialed during a development phase and dropped in favor of a better alternative, but the original plugin remains available to the LLM agent.
    • Excessive Permissions: An LLM plugin with open-ended functionality fails to filter input instructions, allowing unauthorized commands to be executed beyond the intended operation.
    • Excessive Permissions: An LLM extension has excessive permissions, such as granting a data-reading extension database access beyond SELECT, including UPDATE, INSERT, and DELETE.
    • Excessive Autonomy: An LLM-based application or extension fails to independently verify and approve high-impact actions e.g., an extension that allows a user’s documents to be deleted performs deletions without any confirmation from the user.

    Mitigation Strategies

    • Minimize Extensions: Allow only essential extensions for LLM agents.
    • Limit Extension Functionality: Restrict extensions to necessary functions only.
    • Avoid Open-ended Extensions: Use specific, controlled extensions instead of broad commands.
    • Restrict Extension Permissions: Grant minimal access to prevent unintended actions.
    • Execute in User Context: Ensure actions follow user authentication and least privilege.
    • Require User Approval: Implement human approval for high-impact operations.
    • Enforce Authorization: Validate all requests in downstream systems, not just via LLM.
    • Sanitize Inputs & Outputs: Apply OWASP security best practices, including SAST, DAST, and IAST.

    LLM07: 2025 System Prompt Leakage

    System prompt leakage in LLMs occurs when prompts meant to guide the model’s behavior contain unintended sensitive information, such as credentials or connection strings. While the disclosure of these prompts themselves is not the primary risk, the real threat arises from improper use or storage of sensitive data and bypassed security measures. If system prompts contain roles, permissions, or sensitive data, the risk lies in the application’s failure to enforce proper security checks, allowing potential attackers to exploit these weaknesses through the model’s outputs. Common examples of risk include:

    • Exposure of Sensitive Functionality: A system prompt may expose sensitive details like API keys, database credentials, or system architecture, enabling attackers to exploit vulnerabilities or gain unauthorized access. For instance, revealing the database type could aid SQL injection attacks.
    • Exposure of Internal Rules: Exposing internal decision-making processes in system prompts can help attackers bypass controls. For example, revealing transaction or loan limits in a banking chatbot may enable users to exploit or override security restrictions.
    • Revealing Filtering Criteria: System prompts may disclose content filtering rules, helping attackers infer restrictions and find ways to bypass them.
    • Disclosure of Permissions and User Roles: Exposing internal role structures can help attackers identify privilege escalation opportunities.

    Mitigation Strategies

    • Separate sensitive data: Do not embed API keys, authentication data, or system permissions in system prompts; instead, store them in external systems the model cannot access.
    • Avoid prompt-based control: Since prompt injections can alter system prompts, rely on external mechanisms to enforce strict model behavior, such as filtering harmful content outside the LLM.
    • Implement guardrails: Use independent systems to validate model outputs and ensure compliance rather than relying solely on system prompt instructions.
    • Enforce security controls independently: Critical controls like privilege separation and authorization should be handled outside the LLM, using multiple agents with least privilege where necessary.

    LLM08: 2025 Vector and Embedding Weaknesses

    Vectors and embeddings in Retrieval Augmented Generation (RAG) with LLMs introduce security risks if not properly managed. Weaknesses in their generation, storage, or retrieval can be exploited to inject harmful content, manipulate model outputs, or access sensitive information. Since RAG enhances LLM performance by integrating external knowledge sources, vulnerabilities in vector mechanisms and embeddings can compromise model integrity, leading to misinformation, data leakage, or unauthorized access. Common risks include:

    • Unauthorized Access & Data Leakage: Weak access controls can expose embeddings containing sensitive data, leading to unauthorized access, data leaks, or legal issues from policy violations.
    • Cross-Context Information Leaks & Federation Knowledge Conflict: In multi-tenant environments using shared vector databases, context leakage risks arise between users or queries. Data federation conflicts can occur when data from multiple sources contradict each other, or when an LLM fails to update old knowledge with new data from Retrieval Augmentation.
    • Embedding Inversion Attacks: Attackers can exploit vulnerabilities to invert embeddings and recover significant amounts of source information, compromising data confidentiality.
    • Data Poisoning Attacks: Data poisoning can occur intentionally by malicious actors or unintentionally and can originate from insiders, prompts, data seeding, or unverified data providers, leading to manipulated model outputs.
    • Behavior Alteration: Retrieval Augmentation can improve factual accuracy but may reduce emotional intelligence or empathy, affecting the model’s effectiveness in some applications.

    Mitigation Strategies

    • Permission and access control: Implement fine-grained access controls and permission-aware vector and embedding stores. Ensure strict logical and access partitioning of datasets in the vector database to prevent unauthorized access between different classes of users or different groups.
    • Data validation & source authentication: Implement robust data validation pipelines for knowledge sources. Regularly audit and validate the integrity of the knowledge base for hidden codes and data poisoning. Accept data only from trusted and verified sources.
    • Data review for combination & classification: When combining data from different sources, thoroughly review the combined dataset. Tag and classify data within the knowledge base to control access levels and prevent data mismatch errors.
    • Monitoring and Logging: Maintain detailed immutable logs of retrieval activities to detect and respond promptly to suspicious behavior.

    LLM09: 2025 Misinformation

    Misinformation from LLMs is a significant vulnerability, occurring when models produce false or misleading content that appears credible. This issue often arises from hallucinations, where LLMs generate fabricated information based on statistical patterns, leading to seemingly accurate but incorrect outputs. Other causes include biases in training data and incomplete information. Over reliance on LLM-generated content exacerbates the problem, as users may trust and integrate inaccurate data into critical decisions without proper verification, increasing the risk of security breaches, reputational harm, and legal consequences. Common vulnerabilities include:

    • Factual Inaccuracies: The model produces incorrect statements, leading users to make decisions based on false information.
    • Unsupported Claims: The model generates baseless assertions, which can be especially harmful in sensitive contexts such as healthcare or legal proceedings.
    • Misrepresentation of Expertise: The model gives the illusion of under- standing complex topics, misleading users regarding its level of expertise.
    • Unsafe Code Generation: The model suggests insecure or non-existent code libraries, which can introduce OWASP Top 10 for LLM Applications v2.0 vulnerabilities when integrated into software systems. For example, LLMs propose using insecure third-party libraries, which, if trusted without verification, leads to security risks.

    Mitigation Strategies

    • Retrieval-Augmented Generation (RAG): Use Retrieval-Augmented Generation to improve model reliability by fetching verified information from trusted databases, reducing hallucinations and misinformation.
    • Model Fine-Tuning: Enhance the model with fine-tuning or embeddings to improve output quality. Techniques such as parameter-efficient tuning (PET) and chain-of-thought prompting can help reduce the incidence of misinformation.
    • Cross-Verification and Human Oversight: Encourage users to verify LLM outputs with trusted sources and implement human oversight, especially for critical information, ensuring reviewers are trained to avoid over reliance on AI.
    • Automatic Validation Mechanisms: Implement tools and processes to automatically validate key outputs, especially output from high-stakes environments.
    • Risk Communication: Identify the risks and possible harms associated with LLM-generated content, then clearly communicate these risks and limitations to users, including the potential for misinformation.
    • Secure Coding Practices: Establish secure coding practices to prevent the integration of vulnerabilities due to incorrect code suggestions.
    • User Interface Design: Design APIs and interfaces that promote responsible LLM use by adding content filters, labeling AI-generated content, and clarifying reliability and usage limitations.
    • Training and Education: Train users on LLM limitations, the importance of verifying content, and critical thinking, with domain-specific training to help evaluate outputs in their field.

    LLM10: 2025 Unbounded Consumption

    Unbounded Consumption in LLMs happens when excessive inferences are allowed, leading to risks like DoS, financial losses, model theft, and service degradation. These vulnerabilities stem from the high computational demands of LLMs, particularly in cloud environments, which can be exploited for disruption or financial and intellectual property harm. Vulnerabilities may include:

    • Variable-Length Input Flood: Attackers overload the LLM with inputs of varying lengths, exploiting inefficiencies and depleting resources, potentially causing system unresponsiveness.
    • Denial of Wallet (DoW): Attackers exploit the cost-per-use model of cloud-based AI services, triggering high operations that lead to unsustainable financial burdens on the provider.
    • Continuous Input Overflow: Constantly sending inputs beyond the LLM’s context window can overuse computational resources, causing service degradation and disruptions.
    • Resource-Intensive Queries: Submitting complex or demanding queries drains system resources, leading to slower processing and possible system failure.
    • Model Extraction via API: Attackers use crafted inputs and prompt injections to replicate or steal parts of the model, risking intellectual property theft and model integrity.

    Mitigation Strategies

    • Limit exposure of Logits and Logprobs: Restrict or obfuscate the exposure of `logit_bias and `logprobs` in API responses. Provide only the necessary information without revealing detailed probabilities.
    • Rate Limiting: Apply rate limiting and user quotas to restrict the number of requests a single source entity can make in a given time period.
    • Timeouts and Throttling: Set timeouts and throttle processing for resource-intensive operations to prevent prolonged resource consumption.
    • Sandbox techniques: Limit the LLM’s access to network resources, internal services, and APIs to mitigate insider risks and prevent side-channel attacks.
    • Watermarking: Implement watermarking frameworks to embed and detect unauthorized use of LLM outputs.
    • Graceful degradation: Design system to degrade gracefully under heavy load, maintaining partial functionality rather than complete failure.
    • Limit queued actions and scale robustly: Implement restrictions on the number of queued actions and total actions, while incorporating dynamic scaling and load balancing to handle varying demands and ensure consistent system performance.
    • Adversarial robustness training: Train models to detect and mitigate adversarial queries and extraction attempts.
    • Centralized ML model inventory: Use a centralized ML model inventory or registry for models used in production, ensuring proper governance and access control.
    • Automated MLOps deployment: Implement automated MLOps deployment with governance, tracking, and approval workflows to tighten access and deployment controls within the infrastructure.

    Using BreachLock Offensive Security Technologies Effectively

    I. Continuous Penetration Testing

    Effective Use:

    • Automated, Scheduled Testing: Use of automated tools to run regular, scheduled or on-demand tests that uncover vulnerabilities as changes occur across the attack surface.
    • On-demand Testing and Retesting: Enable quick, targeted testing on demand, especially after significant updates or remediation activities, ensuring the mitigation efforts are effective.
    • Hybrid Approach (Automated + Human-led): Using a combination of automated testing with certified human-led expertise for critical assets, capturing nuanced vulnerabilities that require expert intervention.

    Alignment With A Proactive Approach:

    • Proactive Exposure Validation: Proactively test how adversaries might exploit vulnerabilities validating that security measures function effectively in a live environment.
    • Real-time Threat Exposure Management (TEM): Continuous pentesting feeds into TEM by offering real-time insights into an enterprise’s security posture, enabling faster response to newly discovered risks and accelerating remediation efforts.

    II. Attack Surface Management

    Effective Use:

    • Automated Asset Discovery: Continuously scan for new and exposed assets across an enterprise’s internal and external environments, identifying potential vulnerabilities, including Shadow IT, exposed data via the Dark Web, and open ports.
    • Risk Prioritization and Contextual Analysis: Use ASM to prioritize assets based on business value, exploitability, and exposure, enabling efficient allocation of security resources.
    • Automated Response and Remediation: Set up workflows to trigger alerts and initiate automated responses for Critical to High-risk exposures, reducing manual resources to secure assets promptly.

    Alignment With A Proactive Approach:

    • Holistic Threat Exposure Visibility: Offers a real-time, comprehensive view of the enterprise’s attack surface, monitoring assets and changes as they happen.
    • Integration with Threat Exposure Management: ASM maintains a constantly updated inventory of assets and associated weaknesses, allowing security practitioners to proactively address vulnerabilities.

    III. Red Teaming as a Service (RTaaS)

    Effective Use:

    • Full Scope Adversarial Simulation: Employ red teaming to simulate real-world attacks targeting a range of internal and/or external assets using discovered findings through ASM for penetration testing.
    • Objective-based Testing: Focus on specific high-value targets or scenarios (e.g., ransomware simulation) that represent the enterprise’s most critical threats.
    • Combined with Continuous Pentesting: Use insights from penetration testing to inform red team exercises, ensuring persistent vulnerabilities are examined under adversarial conditions.

    Alignment With A Proactive Approach:

    • Adversarial Exposure Validation (AEV): Central to AEV, offers a comprehensive adversarial perspective revealing potential attack paths by exploiting assets under aggressive threat-based conditions.
    • Continuous Threat Exposure Management (CTEM): Red teaming results enhance threat exposure management by identifying patterns in security gaps, contributing data that improves threat identification and visibility across the attack surface.

    IV. The BreachLock Unified Platform

    Effective Use:

    • Integrated Asset Discovery: The BreachLock Unified Platform provides end-to-end visibility of an enterprise’s assets and attack surface, continuously identifying and mapping exposures in real-time to understand the full extent of potential risks.
    • Vulnerability Prioritization & Contextual Insights: Through automated vulnerability assessments and contextualized risk analysis, the unified platform helps prioritize exposures based on severity, exploitability, and business impact.
    • Continuous Testing & Monitoring: Automates remediation tasks and offers continuous on-demand retesting to ensure vulnerabilities are properly addressed and mitigated, reducing response times.
    • Consolidation of Tools & Workflows: Consolidates tools and workflows, reducing manual efforts and increasing operational efficiency.
    • Data-Driven Decision-Making: Provides detailed insights for informed risk management and compliance alignment.
      Scalability: Handles large volumes of data, supporting security management across complex, distributed environments.

    Alignment With A Proactive Approach:

    • Structured Approach to Threat Management: Provides a phased approach to exposure management, helping enterprises evolve from basic vulnerability management to advanced threat exposure insights and action.
    • End-to-end Risk Visibility: By leveraging the power of integration and consolidation of multiple tools and capabilities in one data model, this centralized approach provides endless vulnerability clarity and reporting.
    • Continuous Maturity Building: The BreachLock Unified Platform supports ongoing security maturation, enabling end-to-end visibility across technologies and environments through a common data model to share data-driven insights and drive continuous improvement.

    BreachLock’s Value for Enterprises

    The BreachLock Unified Platform integrates offensive security solutions and capabilities. By consolidating assets, vulnerabilities, and test findings in one common data model, enterprises eliminate the inefficiencies of switching between multiple tools and systems centralizing automated workflows and accelerating the remediation and reporting processes.

    With findings all in one place, the BreachLock Unified Platform consolidates analytics and shares insights across DevSecOps teams enabling faster decision-making based on real threats and their potential impact. With high-fidelity data, users can better understand their vulnerable assets and why they may be business critical.

    Conclusion

    The rise of Large Language Models (LLMs) in cybersecurity has introduced both immense opportunities and new vulnerabilities. As organizations integrate LLMs into their operations, the threat landscape evolves. These models are susceptible to issues like prompt injections, data poisoning, and hallucinations, which can lead to misinformation or data breaches. As a result, proactive security measures specifically designed for LLMs are now critical for mitigating these risks and ensuring secure deployments.

    The OWASP Top 10 for LLMs provides an essential framework for identifying vulnerabilities unique to these models. From improper output handling to inadequate validation, these risks can lead to serious consequences if not addressed. Implementing continuous penetration testing, or a hybrid approach combining automated scans with human- led evaluation, is crucial for staying ahead of potential threats.

    Such testing helps organizations identify and address vulnerabilities before they can be exploited by attackers. As LLMs become more integrated into business processes, ongoing security testing is vital. Adopting proactive penetration testing services for LLMs ensures that security risks are managed effectively, minimizing the potential for breaches and ensuring trust in AI-powered systems. Don’t wait for vulnerabilities to be exploited—start securing your LLMs today with continuous penetration testing to safeguard your organization’s future.

    About BreachLock

    BreachLock is a global leader in Continuous Attack Surface Discovery and Penetration Testing. Continuously discover, prioritize, and mitigate exposures with evidence-backed Attack Surface Management, Penetration Testing, and Red Teaming.

    Elevate your defense strategy with an attacker’s view that goes beyond common vulnerabilities and exposures. Each risk we uncover is backed by validated evidence. We test your entire attack surface and help you mitigate your next cyber breach before it occurs.

    Know Your Risk. Contact BreachLock today!

    Author

    Ann Chesbrough

    Vice President of Product Marketing, BreachLock

Industry recognitions we have earned

reuters logo cybersecurity_awards_2024 logo winner logo csba logo hot150 logo bloomberg logo top-infosec logo

Fill out the form below to let us know your requirements.
We will contact you to determine if BreachLock is right for your business or organization.

background image