Vern Labs is a security lab building runtime protection, agent authorization, and adversarial testing infrastructure for AI systems deployed in production. Used by teams shipping AI in defense, finance, and healthcare.
Each product deploys independently. Together they form a unified control plane for every AI system in your organization.
Intertrace operates as a provider-agnostic gateway between your application and any LLM, embedding model, or tool server. Every prompt, retrieval, output, and tool call is inspected against policy at runtime.
// Drop-in proxy — every call is inspected at runtime. import { Intertrace } from "@vern/intertrace"; const vern = new Intertrace({ endpoint: process.env.VERN_ENDPOINT, policy: "prod.strict", on: { block: (evt) => audit.push(evt), }, }); // Your existing call stays the same. const res = await vern.openai.chat.completions.create({ model: "gpt-5-reasoning", messages, tools, }); // Blocked calls surface as structured signals. if (res.vern.action === "block") { log.warn({ reason: res.vern.rule }); }
Ghostline issues scoped capability tokens for every tool, resource, and external call an agent can make. High-impact actions are gated behind human approval with full audit trail.
Blackbox runs continuous adversarial evaluations against copilots, agents, and AI applications — producing severity-ranked findings with reproducible transcripts and an exportable coverage report.
CATEGORY PASS FAIL COVERAGE ──────────────────────────────────────────── injection · direct 18 2 ████████████░░░░ 91% injection · indirect 11 3 █████████░░░░░░░ 76% jailbreak · persona 14 1 ██████████████░░ 93% pii · exfil 09 0 ████████████████ 100% tool · misuse 07 5 ███████░░░░░░░░░ 58% privilege · abuse 12 2 ██████████████░░ 86% data · leak 15 0 ████████████████ 100% ──────────────────────────────────────────── TOTAL 86 13 OVERALL COVERAGE 87%
Vern Labs sits between your application and the models, agents, and tools it depends on. Every surface is observable, scopable, and testable.
┌─────────────────────────────────────────────────────────────┐
│ APPLICATION LAYER │
│ copilots · internal agents · workflows · tools │
└──────────────────────────┬──────────────────────────────────┘
│ requests / streams
▼
──────────────────────────────────────────────────────────────────────────
│ VERN LABS CONTROL PLANE │
│ │
│ [01] INTERTRACE ─ inline inspection ─ policy engine │
│ prompts · outputs · tools │
│ │
│ [02] GHOSTLINE ─ capability tokens ─ approval gate │
│ scope per agent / per tool │
│ │
│ [03] BLACKBOX ─ adversarial runs ─ coverage rpt. │
│ pre-launch · continuous │
│ │
│ ────────────────────── AUDIT LEDGER ────────────────────── │
│ append-only · signed · SIEM export │
──────────────────────────────────────────────────────────────────────────
│
▼
┌─────────────────────────────────────────────────────────────┐
│ MODELS · AGENTS · TOOLS │
│ OpenAI · Anthropic · open weights · MCP │
└─────────────────────────────────────────────────────────────┘
Based on internal evaluation across feature surface, deployment flexibility, and end-to-end coverage. Category labels generalize over specific vendors in each segment.
VERN PROMPT-FW REDTEAM-SVC IN-HOUSE ─────────────────────────────────────────────────────────────────────────────── runtime inspection ●●●●● ●●●○○ ●○○○○ ●●○○○ agent authorization ●●●●● ●○○○○ ○○○○○ ●○○○○ adversarial testing ●●●●● ○○○○○ ●●●●○ ●●○○○ unified control plane ●●●●● ●●○○○ ●○○○○ ○○○○○ self-host · air-gap ●●●●● ●●○○○ ●○○○○ ●●●●● audit + SIEM export ●●●●● ●●○○○ ●●○○○ ●●○○○ open primitives · research ●●●●● ●○○○○ ●●○○○ ○○○○○ ─────────────────────────────────────────────────────────────────────────────── time to first signal < 1 day 1–2 wks 2–4 wks 3–6 mo
Vern Labs publishes research on how AI systems fail in the wild — and open-sources the primitives that help teams defend against those failures.
For teams evaluating a single product on a bounded workload.
For teams running AI in production with real users and real risk.
For regulated industries, defense, and air-gapped environments.
Vern Labs was founded by operators with backgrounds in federal cybersecurity, enterprise cloud security, and applied AI research.
Cybersecurity at NASA. TS/SCI cleared. Previously at Raytheon and a U.S. Army veteran. Serves on the Y Combinator board. A decade securing systems where the cost of a breach is measured in lives, not dashboards.
Security engineering at Microsoft, Wiz, and Google. Has built cloud security platforms that protect tens of thousands of enterprise environments. Came to Vern Labs to solve the problem the next decade of software is actually built on.
Talk to Vern Labs about securing your AI systems before they become your next attack surface.
Intertrace, Ghostline, and Blackbox are independent products with a shared control plane. Deploy one. Deploy all three. They are designed to work together but do not require each other.
Intertrace is a provider-agnostic gateway that sits between your application and any LLM, embedding model, retrieval layer, or tool server. Every request flows through a policy engine that inspects prompts, outputs, tool calls, and retrieved context against your rule set at runtime.
It is deployed as a single stateless container. Policy changes propagate in under two seconds. Streaming responses are fully supported with inline policy checks that run in parallel with the provider call — adding roughly 18ms at the median.
// Drop-in proxy — every call is inspected at runtime. import { Intertrace } from "@vern/intertrace"; const vern = new Intertrace({ endpoint: process.env.VERN_ENDPOINT, policy: "prod.strict", on: { block: (evt) => audit.push(evt), }, }); // Your existing call stays the same. const res = await vern.openai.chat.completions.create({ model: "gpt-5-reasoning", messages, tools, }); // Blocked calls surface as structured signals. if (res.vern.action === "block") { log.warn({ reason: res.vern.rule }); }
Ghostline issues scoped capability tokens for every tool, resource, and external call an agent can make. High-impact actions route through approval queues where a human or policy evaluates each request before execution.
Tokens use the Biscuit format with custom claim extensions. Revocation is real-time and cascading — pulling a token invalidates every derived scope in flight across every running agent.
Blackbox runs continuous adversarial evaluations against copilots, agents, and AI applications. Every run produces severity-ranked findings, reproducible transcripts, and an exportable coverage report ready for audit and compliance review.
The suite includes OWASP LLM Top 10 plus Vern Labs' proprietary attack set, updated weekly by the research team. Every vector is versioned and deterministic so you can compare results run-to-run.
CATEGORY PASS FAIL COVERAGE ──────────────────────────────────────────── injection · direct 18 2 ████████████░░░░ 91% injection · indirect 11 3 █████████░░░░░░░ 76% jailbreak · persona 14 1 ██████████████░░ 93% pii · exfil 09 0 ████████████████ 100% tool · misuse 07 5 ███████░░░░░░░░░ 58% privilege · abuse 12 2 ██████████████░░ 86% data · leak 15 0 ████████████████ 100% ──────────────────────────────────────────── TOTAL 86 13 OVERALL COVERAGE 87%
Vern Labs runs as a single stateless container that you deploy inside your VPC, on-premises, or as a fully air-gapped instance. Everything about the architecture is designed around three constraints: low latency, no data retention, and no surprise dependencies.
Vern Labs sits between your application and the models, agents, and tools it depends on. Every surface is observable, scopable, and testable from one place.
┌─────────────────────────────────────────────────────────────┐
│ APPLICATION LAYER │
│ copilots · internal agents · workflows · tools │
└──────────────────────────┬──────────────────────────────────┘
│ requests / streams
▼
──────────────────────────────────────────────────────────────────────────
│ VERN LABS CONTROL PLANE │
│ │
│ [01] INTERTRACE ─ inline inspection ─ policy engine │
│ prompts · outputs · tools │
│ │
│ [02] GHOSTLINE ─ capability tokens ─ approval gate │
│ scope per agent / per tool │
│ │
│ [03] BLACKBOX ─ adversarial runs ─ coverage rpt. │
│ pre-launch · continuous │
│ │
│ ────────────────────── AUDIT LEDGER ────────────────────── │
│ append-only · signed · SIEM export │
──────────────────────────────────────────────────────────────────────────
│
▼
┌─────────────────────────────────────────────────────────────┐
│ MODELS · AGENTS · TOOLS │
│ OpenAI · Anthropic · open weights · MCP │
└─────────────────────────────────────────────────────────────┘
Fastest path to production. Vern Labs operates the infrastructure; your data stays in our US / EU regions.
Vern operates the control plane; your LLM traffic and audit data never leaves your network.
Full Vern stack in your AWS, GCP, or Azure VPC. Complete data control. Most common for finance and healthcare.
Complete deployment inside classified or disconnected environments. No outbound dependencies.
Every LLM call routed through Intertrace follows the same six-stage pipeline. Stages run in parallel where safe, and the entire hot path is under 20ms at p50 for text payloads under 8KB.
Vern Labs is built by people who've held TS/SCI clearances and shipped enterprise security at scale. Our architecture is designed around the assumption that you should never have to trust us with anything we don't strictly need.
Vern Labs doesn't ship lock-in. Intertrace works with any LLM provider via the OpenAI-compatible interface, Anthropic's Messages API, or as a transparent HTTP proxy. Ghostline integrates with major agent frameworks or through a low-level policy API.
Vern Labs publishes research on how AI systems fail in the wild and open-sources primitives that help teams defend against those failures. Everything we learn from our own deployments and red team engagements becomes a public artifact — papers, notes, benchmark sets, and reference implementations.
We propose a behavioral risk classifier that scores agent actions in real time based on capability context, target resource sensitivity, and historical deviation. Evaluated across 12 production agent deployments with a 41% reduction in escalations required.
Reframing prompt injection through the lens of software supply chain attacks. Proposes provenance tracking for every context token that reaches a production model.
We open-source the primitives that help the whole ecosystem get better at securing AI. MIT licensed, production-ready, welcoming contribution.
A living benchmark of 140+ attack vectors across injection, jailbreak, privilege abuse, and exfiltration. Weekly curated by the Vern research team.
CLI tool for inspecting LLM traffic on your dev machine. Pipe through <command> and see every request and response with policy markup.
Go library for issuing, validating, and cascading Biscuit-format capability tokens for autonomous agent authorization.
Reference Rego policies for common LLM security rules. Drop into OPA or Intertrace. Covers injection detection, PII redaction, and tool abuse heuristics.
Papers, notes, and open-source releases — no other mail. You can unsubscribe any time.
Vern Labs is priced to match the way teams actually adopt AI security — starting with a single workload and expanding as the risk surface grows. Pilots are free. Production is usage-based. Enterprise is negotiated.
For teams evaluating a single product on a bounded workload.
Usage-based pricing. Volume discounts kick in at scale.
Regulated, defense, and air-gapped environments. Negotiated terms.
PILOT PRODUCTION ENTERPRISE ─────────────────────────────────────────────────────────────────────── request volume 100k/mo unlimited unlimited products included one all three all three deployment — cloud ● ● ● deployment — self-host (VPC) ○ ● ● deployment — air-gap ○ ○ ● policy rules standard standard + custom custom adversarial test suite OWASP + Vern custom + classified support channel email slack 4h SLA 24/7 IR SOC 2 / MSA documentation ○ ● ● FedRAMP · CMMC alignment ○ ○ ● dedicated solutions architect ○ ○ ● single-tenant deployment ○ ○ ● ─────────────────────────────────────────────────────────────────────── contract length 30 days 12 mo · mo custom
Vern Labs is a cybersecurity research and product company based in the United States. We were founded in 2024 by operators with backgrounds in federal cybersecurity, enterprise cloud security, and applied AI research. The mission is simple: build the security infrastructure that the next decade of software will actually depend on.
Traditional security tools were designed for software that does what you tell it to. Modern AI systems reason, retrieve, call tools, and act autonomously — and the attack surface that creates is new, wide, and actively being exploited. We think the next decade of software will run on this substrate. Someone needs to build the security layer for it.
That's the work.
Cybersecurity at NASA, where he held TS/SCI clearance and worked on defensive systems for flight-critical and classified workloads. Previously at Raytheon, and a U.S. Army veteran. Serves on the Y Combinator board.
Sam started Vern Labs because he spent a decade watching defense and enterprise teams treat AI like just another API — when the actual threat model is closer to adding a new autonomous agent to an organization.
Security engineering at Microsoft, Wiz, and Google. Has shipped cloud security platforms that protect tens of thousands of enterprise environments and internal production systems at hyperscale.
Joined Vern Labs to solve the problem the next decade of software will actually run on. Leads the research team and owns the architecture of the Vern control plane.
Small team, high stakes, serious work. We hire exclusively for calibration, taste, and raw technical ability. We pay top-of-market. We ship in writing.
A real person on our team reviews every inbound. Most messages get a response within four business hours. For urgent security matters, use the hotline below.
Short notes are fine — a sentence on what you're building and where you're stuck is enough to route you to the right person. We'll come back with a 20-minute slot and a tailored reading list before the call.
Use the right channel for your question and you'll get a faster, better answer.
Pricing, procurement, pilots, reseller questions, volume deals.
sales@vernlabs.comArchitecture deep-dives, deployment planning, integration questions.
engineering@vernlabs.comPaper collaborations, benchmark contributions, academic partnerships.
research@vernlabs.comIf you've identified a vulnerability in a Vern Labs product, deployment, or research artifact — we want to hear from you first, and we'll work with you to coordinate disclosure.
We operate a formal responsible disclosure program and publicly acknowledge researchers with permission. Critical reports get a response within 24 hours, any day of the week.
We are a small team building infrastructure that protects AI systems in production. We hire exclusively for calibration, taste, and raw technical ability, and we pay top-of-market to retain that bar. We operate remote-first across the United States with quarterly in-person offsites.
Cash plus meaningful early equity. We calibrate against Levels.fyi p90 for comparable roles and we share the ranges openly.
Work from anywhere in the US. Quarterly in-person offsites — travel and lodging covered.
Four weeks paid leave, mandatory. Two weeks minimum taken in each half of the year. Your calendar says so.
$5,000 per year for hardware, books, conferences, courses — whatever sharpens the axe.
We respect your time. No brain-teasers, no whiteboard gotchas. You'll do real work representative of what the job actually involves.
We always want to hear from exceptional people, even if we don't have a posted role.
Vern Labs exists to improve the security posture of AI systems. That only works if we meet the bar we ask our customers to hold. This page is where we document that bar — attestations, architecture commitments, and the data policies we operate under.
Drata-monitored. Audit conducted by a top-4 firm. Report available to prospects under mutual NDA.
Architecture supports HIPAA workloads. BAA on request for healthcare customers on Enterprise tier.
Air-gap deployment satisfies FedRAMP Moderate technical controls. Formal authorization pursued with lead customers.
Vern Labs architecture supports CMMC Level 2 requirements for handling CUI in defense supply chain.
Your prompts and outputs pass through Intertrace. They do not persist there. We retain metadata needed for policy decisions and audit — nothing more.
For regulated workloads, customers can opt in to full payload retention within their own VPC or S3 bucket. Vern Labs never has access. All keys and encryption remain under customer control.
SOC 2 reports, subprocessor list, DPA, security questionnaire responses, and the architecture brief are available to verified prospects under mutual NDA.
Vern Labs runs a formal responsible disclosure program. If you've found a vulnerability in our products, infrastructure, or research artifacts, we want to hear from you before anyone else does — and we'll work with you to coordinate disclosure on a timeline that protects users.
SEVERITY TRIAGE FIRST FIX STATUS UPDATES BOUNTY ──────────────────────────────────────────────────────────────────────────────── CRITICAL < 24 hours < 72 hours daily up to $10,000 HIGH 2 bus. days 2 weeks weekly up to $3,000 MEDIUM 5 bus. days 30 days bi-weekly up to $750 LOW 10 bus. days best effort on milestone up to $150 ──────────────────────────────────────────────────────────────────────────────── PUBLIC ACKNOWLEDGMENT on request
A good report includes: the vulnerability, reproduction steps, affected components, potential impact, and any recommended mitigations. Screenshots and PoC code are welcome.
Please do not access, modify, or exfiltrate data belonging to other customers. Do not use automated scanners. Do not publish findings before we've coordinated disclosure.
Vern Labs will not pursue legal action against security researchers who make a good-faith effort to follow this policy. We consider research conducted in accordance with this program to be authorized under the Computer Fraud and Abuse Act and similar laws, and we will not initiate or support legal action against researchers for accessing Vern Labs systems in connection with good-faith vulnerability research.
If your research accidentally causes a violation of this policy, we will work with you to resolve it rather than pursue consequences. If you are unsure whether your planned testing complies with this policy, ask us first.
This is the privacy policy for Vern Labs, Inc. It describes what information we collect, how we use it, how we protect it, and the rights you have over it. We have tried to write it in plain English. If anything is unclear, email privacy@vernlabs.com.
Vern Labs provides security infrastructure for AI systems. We collect the minimum information required to deliver that service, protect it, and operate our business. We do not sell personal information. We do not use customer data to train our own or third-party models.
This policy applies to vernlabs.com, our SaaS products, our open-source projects, and any other properties we operate under the Vern Labs name.
We collect three categories of information:
Name, email, company, role, billing address, and payment method. Provided by you when you sign up or contract with us.
Policy decisions, request timestamps, response sizes, tenant IDs, hashed payload fingerprints. We do not retain raw prompts, outputs, or retrieved documents.
Page views, referrers, and coarse location (country-level). Cookies are limited to session management and preferences; we do not run advertising or cross-site tracking.
We use the information above to:
Account information is retained for the duration of your contract plus 90 days, after which it is deleted or anonymized unless we have a legal obligation to retain it longer.
Product telemetry is retained for 90 days for operational purposes, then aggregated into non-identifying counters. Raw prompt and response content is not retained at all — see § 802 of the trust page for the full data handling breakdown.
Depending on where you live, you may have some or all of the following rights:
To exercise any of these rights, email privacy@vernlabs.com. We will respond within 30 days.
Vern Labs is headquartered in the United States. If you are accessing our services from outside the US, your information may be transferred to and processed in the US. Where required, we rely on Standard Contractual Clauses for transfers out of the EEA and UK.
These are the terms of service for Vern Labs, Inc. They govern use of our products and our website. They are not a substitute for the master services agreement signed with enterprise customers, which controls in case of conflict. If you have questions, email legal@vernlabs.com.
By accessing or using Vern Labs products, you agree to be bound by these Terms. If you are entering into these Terms on behalf of an organization, you represent that you have authority to bind that organization. If you do not agree, do not use the service.
You must provide accurate account information. You are responsible for safeguarding your credentials and for all activity under your account. Notify us immediately of any unauthorized use.
Accounts created for evaluation are subject to the usage limits of the Pilot tier and may be rate-limited or suspended for abuse without prior notice.
You agree not to:
Fees for the Production tier are detailed in your order form or MSA. Enterprise fees are individually negotiated and governed by a separate signed agreement.
Invoices are due net 30. Late amounts accrue interest at 1.5% per month or the maximum allowed by law, whichever is lower. Fees are non-refundable except as required by law or as explicitly stated in your MSA.
Vern Labs retains all rights in the service, including all software, documentation, and research artifacts. Customer retains all rights in customer data and any content submitted through the service.
Open-source components of the service are governed by their respective licenses. Vern Labs' public open-source projects are licensed under MIT unless otherwise stated in the repository.
Vern Labs warrants that it will provide the service in a professional manner, in accordance with published specifications, and in compliance with the SLAs defined in your order form.
EXCEPT AS EXPRESSLY PROVIDED, THE SERVICE IS PROVIDED "AS IS" WITHOUT WARRANTIES OF ANY KIND, WHETHER EXPRESS OR IMPLIED. NO AI SECURITY PRODUCT PROVIDES ABSOLUTE PROTECTION; YOU REMAIN RESPONSIBLE FOR YOUR APPLICATIONS AND THEIR COMPLIANCE.
TO THE MAXIMUM EXTENT PERMITTED BY LAW, NEITHER PARTY WILL BE LIABLE FOR INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES. AGGREGATE LIABILITY FOR ANY CLAIM ARISING FROM THESE TERMS IS LIMITED TO THE FEES PAID TO VERN LABS IN THE 12 MONTHS PRECEDING THE CLAIM. THIS LIMITATION DOES NOT APPLY TO BREACHES OF CONFIDENTIALITY, INDEMNIFICATION OBLIGATIONS, OR AMOUNTS OWED UNDER THIS AGREEMENT.
These Terms remain in effect while you use the service. Either party may terminate for material breach with 30 days notice if the breach is not cured in that time.
Upon termination, customer data will be returned or deleted within 30 days at customer's request. Accrued obligations, payment terms, confidentiality, and limitations of liability survive termination.
These Terms are governed by the laws of the State of Delaware, without regard to conflict of laws principles. Disputes will be resolved in the state or federal courts located in Delaware.
These Terms, together with any order form or MSA, constitute the entire agreement between the parties regarding the service. For questions, contact legal@vernlabs.com.
We propose a behavioral risk classifier that scores agent actions in real time based on capability context, target resource sensitivity, and historical deviation from an agent's established behavioral envelope. Evaluated across 12 production agent deployments, the model reduced escalations required for safe autonomous execution by 41% while holding false-negative rate below 0.3%.
Production agent deployments face a fundamental control problem: the set of safe actions depends on context, and context changes every turn. Static policies are too coarse to capture this — they either over-permit and create safety gaps, or over-restrict and burn the deployment under human approval overhead.
The current state of the art is either naive allow-lists or LLM-judge-based behavioral review. The first doesn't scale; the second is too slow for inline decisions and has its own trust problems. We wanted a third path.
Our risk model is a lightweight classifier that evaluates three signals per action: capability risk (how sensitive the target resource is), behavioral novelty (how far this action deviates from the agent's 30-day baseline), and contextual coherence (whether the action follows from the stated task).
Each signal produces a normalized score; the composite score routes the action to one of four tracks: execute, log, gate for async approval, or block. The model runs in under 3ms per decision on commodity hardware.
We evaluated across 12 production deployments at partner organizations, representing ~2.4M agent actions over a 90-day window. Compared to baseline (hand-tuned allow-lists plus LLM-judge for novel actions), the risk model achieved a 41% reduction in escalations to human operators with no regression in incident rate.
False-negative rate on held-out red-team probes remained below 0.3% across all deployments. Per-deployment tuning was minimal — most gains came from baseline learning during the first week of operation.
The model depends on having an established behavioral baseline, so cold-start deployments must either run in a higher-approval mode for the first week or import a baseline from a similar deployment. We have not yet evaluated the model's robustness to adversarial behavioral drift — this is ongoing work.
The novelty signal currently has no notion of task decomposition. An agent that routinely edits one file will flag novelty when asked to edit a similar file it hasn't seen before. We're exploring structural equivalence measures to address this.
Oyan S., Raef H., et al. A Runtime Risk Model for Tool-Using Agents.
Vern Labs Technical Report VL-RES-901, February 2026.
https://vernlabs.com/research/vl-res-901
This paper reframes prompt injection as a supply chain attack against language model context, rather than a model-training or fine-tuning problem. We propose provenance tracking for every context token that reaches a production model and show how this reframing suggests a set of defenses that are both more reliable and more auditable than current mitigation strategies.
Most mitigations for prompt injection treat it as a model problem — something to be solved via better instruction-following, better RLHF, or model-level content classifiers. This framing obscures what is actually happening: an attacker is inserting untrusted content into a context window the model will then treat as trusted input.
That is a supply chain problem. The defenses that work for supply chain problems are well-understood and do not require the victim to become a better detector of adversarial text.
We propose that every token reaching a production LLM context should carry a provenance tag — an unforgeable marker of where it came from and what trust level it inherits. Tools, retrieval systems, and user inputs become distinct "origin classes," and policy can then enforce rules on what each class is permitted to cause the model to do.
This flips the defensive posture from "detect injection after the fact" to "structurally prevent mixing of trust levels." The latter is tractable. The former, by current evidence, is not.
We describe an implementation that sits in the retrieval/tool layer and emits provenance-tagged context to a wrapped model. Because the wrapping is done at the protocol layer, it works with any LLM provider and requires no model retraining.
The runtime overhead is negligible. The engineering cost is shifted from the model to the orchestration layer, which is where it should have been all along.
If provenance becomes the primitive, certain common patterns become obviously unsafe: mixing retrieved web content directly into system prompts, allowing tool output to flow unfiltered into the next turn, treating all memory as equally trusted. These are all tractable to fix once you can see them.
We release a reference implementation and invite security teams to experiment with the approach in their own deployments.
Oyan S., Raef H., et al. Prompt Injection as a Supply Chain Problem.
Vern Labs Technical Report VL-RES-902, January 2026.
https://vernlabs.com/research/vl-res-902
A technical specification for scoped capability tokens designed for autonomous agent systems. We describe a Biscuit-based token format with cascading revocation, approval queue integration, and cryptographic binding to a specific agent instance and execution session. Reference implementations in Go, Rust, and Python are released alongside this note under MIT license.
Traditional authorization systems assume the actor is a human or a fixed piece of software. Autonomous agents are neither: they are dynamic processes whose next action depends on reasoning that happens at runtime and is not known at the time the token is issued.
Session tokens, OAuth scopes, and IAM roles all fail in specific ways when applied to agents. We needed an authorization primitive designed for the actual workload.
Scoped capabilities are based on the Biscuit token format with Vern-specific extensions. Each token binds cryptographically to an agent instance, a session, and an explicit set of allowed operations. Tokens can be attenuated — an agent can derive a narrower token to pass to a sub-process without expanding privileges.
Cascading revocation is a first-class operation: pulling a parent token invalidates all derived tokens in flight. This is essential for incident response on autonomous systems where an agent may have spawned dozens of derived capability grants before the incident is detected.
Actions that exceed a scope's permission set route to an approval queue rather than failing hard. Approval can be inline (blocking the agent), async (queued for human review with a timeout), or policy-only (evaluated by a sibling policy service). This gives operators a middle ground between "every action requires human approval" and "the agent has full authority."
Reference implementations are available in the vern/scopes-go, vern/scopes-rs, and vern/scopes-py repositories. We invite framework authors to integrate scoped capabilities directly rather than inventing their own authorization models.
Oyan S., Raef H., et al. Scoped Capabilities for Autonomous Execution.
Vern Labs Technical Report VL-RES-903, December 2025.
https://vernlabs.com/research/vl-res-903
A short argument that the prevailing threat model for LLM-backed applications — derived from traditional web application security — is inadequate for systems where the application itself reasons, retrieves, and acts. This note proposes a minimal extension centered on three new attack surfaces: context poisoning, authority exceedance, and tool misuse chains.
STRIDE — the standard threat-modeling framework for application security — assumes that software does what it is told. LLM-backed systems do what they infer. This single change invalidates several STRIDE assumptions: tampering can happen via prompt content, not just request data; repudiation can happen because the system genuinely cannot remember what it was asked; elevation of privilege can happen through pure text.
Context poisoning — an attacker plants text in a place the model will read (a document, a tool output, a user message) in a form that will alter the model's subsequent behavior without triggering content filters.
Authority exceedance — the model takes an action it was not authorized to take, either because it reasoned its way past an instruction or because its authorization was scoped to a higher-level goal and it chose a narrower but harmful path.
Tool misuse chains — the model composes multiple individually-authorized actions into a sequence that exceeds the intent of any single authorization. This is the agent-security equivalent of privilege chains.
Every one of these surfaces requires runtime inspection of behavior, not just static authorization. This is why we build Intertrace, Ghostline, and Blackbox as separate products — each addresses a distinct surface that traditional AppSec doesn't cover. A complete threat model for an LLM-backed system needs at least these three additions, and probably more.
Oyan S., Raef H., et al. Why LLM security needs a new threat model.
Vern Labs Technical Report VL-RES-904, November 2025.
https://vernlabs.com/research/vl-res-904
Release notes for vern/probes, an open-source benchmark of 140+ adversarial attack vectors against LLM-backed systems. Covers direct and indirect injection, jailbreak, persona attacks, exfiltration, tool misuse, and privilege abuse. Updated weekly. MIT licensed.
vern/probes is a living collection of adversarial prompts and attack scenarios, curated by the Vern Labs research team from our own red-team engagements, customer incidents, and published research. Every probe is versioned, seeded for reproducibility, and tagged with the attack class it represents.
The benchmark is the same one that Blackbox runs against customer systems. We release it publicly because the ecosystem benefits when everyone is testing against the same, regularly updated, attack surface.
Direct injection, indirect injection, jailbreak via persona, PII exfiltration, privilege abuse, tool misuse, context poisoning, authority exceedance, data leakage, and refusal subversion. Each category has 10–20 distinct probes with hand-verified severity ratings.
The repository is at github.com/vern/probes. Clone it, pick your categories, and run against your model or agent. We recommend starting with the subset of probes tagged "high-severity" and expanding from there. Blackbox integrates with vern/probes directly, but the repo is designed to be usable without any Vern Labs product.
We welcome new probes and we curate contributions carefully — every accepted probe is reviewed for reproducibility, severity calibration, and dedup against existing entries. See CONTRIBUTING.md for the submission process.
Oyan S., Raef H., et al. vern/probes — adversarial prompt benchmark set.
Vern Labs Technical Report VL-OSS-905, November 2025.
https://vernlabs.com/research/vl-oss-905
A critical note on the "instruction hierarchy" approach to LLM safety. We argue that treating different prompt sources as having different privilege levels — enforced only at the model layer — cannot be relied on as a security boundary. We show concrete failure modes and propose that instruction hierarchy be retained as a usability feature rather than a security primitive.
A recent line of research proposes that LLMs can be trained to honor an "instruction hierarchy" where system prompts outrank user prompts, which outrank tool output. This is an appealing idea, and for many UX reasons it is worth doing. We are writing this note because some teams are now treating it as a security boundary. It is not one.
Instruction hierarchy is enforced at the model layer — that is, it depends on the model correctly resolving a conflict between two pieces of text it is seeing. A sufficiently motivated attacker who controls any lower-privileged input has many ways to influence this resolution: social engineering, linguistic misdirection, multi-turn manipulation, context saturation. None of these require the attacker to "break" the hierarchy; they exploit its statistical nature.
A real security boundary is enforced by code that executes outside the model. Policy checks, authorization tokens, provenance tags — these are security boundaries. Model-level convention is not.
Retain instruction hierarchy as a UX feature — it does make systems more usable by clarifying which instructions should take priority in the normal case. But do not build policy on top of it. If a decision matters for security, it should be made by a policy engine, not by the model.
Oyan S., Raef H., et al. Instruction hierarchy is not a security boundary.
Vern Labs Technical Report VL-RES-906, October 2025.
https://vernlabs.com/research/vl-res-906
We present a systematic methodology for evaluating the security posture of retrieval-augmented generation (RAG) systems. The paper introduces a new attack taxonomy specific to retrieval pipelines, a benchmark covering 38 distinct attack vectors, and empirical results across 7 widely-deployed RAG patterns. We find that the most common defenses focus on the retrieval layer while leaving the composition layer as the primary failure mode.
Retrieval-augmented generation is now the dominant architecture for enterprise LLM deployments. It is also a significantly larger attack surface than the standalone-LLM deployments most threat models were written against. This paper characterizes that expanded surface and measures real-world defenses against it.
We identify four classes of RAG-specific attacks: retrieval-time poisoning (the attacker plants content in the corpus), indexing-time poisoning (the attacker influences what the retriever considers relevant), composition attacks (the attacker exploits how retrieved content is combined with the user query), and source-confusion attacks (the attacker makes retrieved content appear to be from a more-trusted source than it is).
Each class is exercised by 8–10 distinct probes in our benchmark.
Across the 7 RAG patterns we evaluated, composition attacks had the highest average success rate (58%) despite being the best-understood theoretically. Retrieval-time poisoning was second (34%). The other two classes had lower success rates but produced higher-severity outcomes when they succeeded.
Defenses concentrated at the retrieval layer (e.g., source filtering, corpus cleaning) reduced retrieval-time attacks but did little to address composition attacks. Our data suggests that layered defense at the composition stage is currently the highest-leverage investment for RAG security.
The benchmark is available at github.com/vern/probes under the "rag" category. The methodology paper is released alongside this note. We welcome independent replication and will update findings as the probe set expands.
Oyan S., Raef H., et al. Adversarial evaluation of retrieval-augmented systems.
Vern Labs Technical Report VL-RES-907, September 2025.
https://vernlabs.com/research/vl-res-907
Release of vern/trace-cli, an open-source command-line tool for inspecting LLM API traffic on a developer's local machine. Pipe any HTTP client through trace-cli and see every request, response, and policy evaluation in real time. Useful for debugging, learning, and building intuition about what an LLM actually sees.
Most engineers who work on LLM-backed systems have an imprecise mental model of what their model actually sees. System prompts concatenate in ways that aren't obvious. Tool definitions get rendered into token streams that look different than the source code. Retrieval results show up as text in ways that change how the model weights them.
trace-cli was born from our own debugging needs. Pipe your traffic through it and you see the real thing, rendered with policy decisions overlayed. It made our work significantly faster. We're releasing it so others can benefit.
Install with npm install -g @vern/trace-cli or grab a binary from GitHub releases. Start it listening on a local port, set the OpenAI (or compatible) base URL to that port, and you'll see every request flow through the terminal with structured markup.
It can optionally apply Vern Labs policies if you have a policy file — otherwise it just prints traffic. No Vern account required.
Request and response bodies, token counts per message, tool call structure, streaming chunk boundaries, and — if policies are enabled — the policy decision for each request. Output can be rendered as ANSI-colored text for interactive use, or as JSON for piping into other tools.
Upcoming features: session recording, diff mode for comparing two runs, integration with Intertrace for full policy evaluation. All discussion happens on the GitHub repo.
Oyan S., Raef H., et al. vern/trace-cli — CLI for local LLM traffic inspection.
Vern Labs Technical Report VL-OSS-908, August 2025.
https://vernlabs.com/research/vl-oss-908