Prompt Engineering for Secure Code (Part 7)

Vibe Coding Security Series

  1. What Is Vibe Coding Security? A Field Guide for 2026
  2. The OWASP Top 10 for Vibe-Coded Applications
  3. Anatomy of a Vibe Coding Breach: Lessons from 2026’s Worst Incidents
  4. The Dependency Trap: Supply Chain Risks in AI-Generated Code
  5. Authentication & Secrets: What AI Gets Wrong Every Time
  6. Scanning Vibe-Coded Apps: Why Traditional SAST/DAST Falls Short
  7. Prompt Engineering for Secure Code (you are here)
  8. The Founder’s Security Checklist (coming soon)
  9. Securing the AI Coding Pipeline (coming soon)
  10. The Future of Vibe Coding Security (coming soon)

Read Time: 21 minutes

TL;DR

AI models already know how to write secure code — they identify 78.7% of their own vulnerabilities when asked to review. The problem is they don’t apply that knowledge by default. Five prompting strategies close the gap: role-setting, reverse prompting, threat-model-first prompting, negative constraints, and iterative repair. Targeted security prompts reduce vulnerabilities by up to 56%. This post covers what works, what doesn’t, and how to make security instructions permanent through instruction files.


The Gap Between What AI Knows and What AI Does

Here’s the most important finding in AI code security this year. An April 2026 study formally verified 3,500 code artifacts across seven LLMs using the Z3 SMT solver. The results: 55.8% of artifacts contained at least one verified vulnerability. GPT-4o was worst at 62.4% vulnerable. Gemini 2.5 Flash was best at 48.4%. No model scored better than a D.

But the study had a second finding that changes everything. When the researchers asked the same models to review their own output for vulnerabilities, the models correctly identified the problems 78.7% of the time. The model that just wrote a SQL injection could explain why it was dangerous and how to fix it — when asked.

The researchers call this the “generation-review asymmetry.” I call it the gap between what AI knows and what AI does. The model has the security knowledge. It just doesn’t activate it during generation. Default prompts optimize for functionality — “build me a login page” gets you a login page that works. Whether it’s secure is a secondary concern the model doesn’t consider unless you tell it to.

This asymmetry is exactly what prompt engineering exploits. You’re not teaching the model something new. You’re activating knowledge it already has.

The baseline is bad. CodeRabbit’s analysis of 470 real-world pull requests found that AI-generated code has 2.74x higher vulnerability density than human-written code, with 1.4x more critical security issues. Veracode tested over 100 LLMs and found they fail to prevent XSS in 86% of test cases. By mid-2025, Apiiro’s analysis of thousands of repositories showed AI code adding over 10,000 new security findings per month — a 10x increase from six months earlier.

The gap is real. The question is whether prompting can close it.


Why “Write Secure Code” Doesn’t Work

The intuitive approach, adding “make sure the code is secure” to your prompt, doesn’t do much. A 2026 study ran chi-square tests on code generated with and without simple security prefixes and found no statistically significant improvement in several configurations. Worse, a weaknesses-aware Chain-of-Thought approach, in which the prompt listed specific vulnerability types to avoid, failed to reduce vulnerabilities in any statistically significant way, and in some configurations the numbers actually went up. The researchers found that overloading the prompt with security concerns mostly shifted which vulnerability types appeared rather than reducing the total count. It can also degrade the model’s ability to generate functional code, introducing bugs that create new attack surfaces.

Generic security instructions fail for the same reason generic coding instructions fail. “Write good code” produces the same output as no instruction at all. The model needs specifics: what threats apply to this feature, what patterns to avoid, what security controls to implement, and in what order.

Bruni et al. (February 2025) showed what happens when you get specific. Their benchmarks across GPT-3.5-turbo, GPT-4o, and GPT-4o-mini found that targeted security-focused prompt prefixes — ones that named specific vulnerability classes and described concrete defensive patterns — reduced vulnerabilities by up to 56%. Iterative prompting, where you feed vulnerability findings back to the model and ask it to repair its own output, fixed between 41.9% and 68.7% of issues.

The takeaway: specificity matters more than intent. “Be secure” does nothing. “This endpoint must validate that the authenticated user owns the requested resource before returning data, and must return 403 if ownership verification fails” changes the output.


Five Strategies That Work

These aren’t theoretical. I use variations of all five at VULNEX when working with AI coding tools, and the first two — role-setting and reverse prompting — are the backbone of how I approach every engagement.

Strategy 1: Role-Setting

Before asking an AI to write or review code, I set its role explicitly. Not a vague “you’re helpful” — a specific professional identity that activates domain expertise.

For code generation:

“You are a senior developer with years of experience building secure products. You follow security best practices by default: input validation, parameterized queries, proper authentication and authorization checks, secure secret management, and defense in depth.”

For security review:

“You are a senior pentester and cybersecurity expert. Your job is to find every vulnerability, misconfiguration, and security weakness in this code. Think like an attacker. Report what you find with severity ratings and remediation guidance.”

The key is one role per task. When building, the model thinks like a security-conscious developer. When reviewing, it thinks like an attacker. Mixing the two dilutes both. A developer worrying about attacks while writing code produces defensive but brittle implementations. An attacker reviewing code while thinking about functionality misses vulnerabilities that conflict with feature requirements.

Role-setting works because LLMs adjust their output distribution based on the persona they’re given. A “senior pentester” prompt activates patterns the model learned from security research, vulnerability reports, and penetration testing documentation. A “junior developer” prompt — or no role at all — activates patterns from Stack Overflow answers and tutorial code, which is where most insecure defaults come from.
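
A minimal sketch of the one-role-per-task rule: keep the two role texts from above in a lookup and reject anything else, so a request never goes out with a mixed or missing role. The function name withRole and the task labels "build" and "review" are illustrative assumptions, not part of any tool's API.

```javascript
// Role texts taken from the prompts above; the task labels are hypothetical.
const ROLES = {
  build:
    "You are a senior developer with years of experience building secure products. " +
    "You follow security best practices by default: input validation, parameterized queries, " +
    "proper authentication and authorization checks, secure secret management, and defense in depth.",
  review:
    "You are a senior pentester and cybersecurity expert. " +
    "Your job is to find every vulnerability, misconfiguration, and security weakness in this code. " +
    "Think like an attacker. Report what you find with severity ratings and remediation guidance.",
};

// Prefix a request with exactly one role. Unknown task types are rejected
// rather than silently falling back to a role-less prompt.
function withRole(task, request) {
  if (!Object.prototype.hasOwnProperty.call(ROLES, task)) {
    throw new Error(`Unknown task "${task}": expected "build" or "review"`);
  }
  return `${ROLES[task]}\n\n${request}`;
}

console.log(withRole("review", "Review the login handler in auth.js."));
```

Keeping the roles in one place also makes it easy to move them into an instruction file later.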

Strategy 2: Reverse Prompting

Most people use AI coding tools in one direction: “Build me X.” Reverse prompting flips it. Instead of only telling the model what to build, you ask it questions, both before it writes any code and after.

Before writing code, I interrogate the model about the problem space:

“I need to build a multi-tenant API where users can only access their own data. Before writing any code: what are the top security risks for this kind of system? What authentication and authorization model should I use? What are the common mistakes developers make with multi-tenant data isolation?”

The model’s answers are often excellent — remember, it identifies 78.7% of vulnerabilities in review mode. By asking it to think about threats before generating code, you front-load that security knowledge into the generation context. The code it writes afterward is informed by the threat analysis it just produced.

After generating code, I question the output:

“Review the code you just wrote. What vulnerabilities does it have? How would an attacker bypass the authentication? What edge cases could lead to data leakage? What’s missing from this implementation that a production system would need?”

This exploits the generation-review asymmetry directly. The model generated code with some security blind spots. Now you’re asking it to activate review mode on its own output. It will flag issues it just introduced — not all of them, but a substantial percentage.

The two-direction approach creates a feedback loop. Pre-code questions shape the model’s understanding of what matters. Post-code questions catch what slipped through. Together, they narrow the gap between what the model knows and what it produces.
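
The two directions can be sketched as an ordered list of user turns: questions first, the code request second, the self-review last. The { role, content } message shape is an assumption borrowed from common chat APIs, and the function names are hypothetical; adapt both to whatever tool you use.

```javascript
// Questions adapted from the prompts above.
const PRE_CODE_QUESTIONS = [
  "What are the top security risks for this kind of system?",
  "What authentication and authorization model should I use?",
  "What are the common mistakes developers make here?",
];

const POST_CODE_QUESTIONS = [
  "What vulnerabilities does it have?",
  "How would an attacker bypass the authentication?",
  "What edge cases could lead to data leakage?",
  "What's missing from this implementation that a production system would need?",
];

// Direction 1: ask about threats before any code exists.
function preCodePrompt(featureDescription) {
  return {
    role: "user",
    content: `${featureDescription} Before writing any code: ${PRE_CODE_QUESTIONS.join(" ")}`,
  };
}

// Direction 2: ask the model to critique what it just produced.
function postCodePrompt() {
  return {
    role: "user",
    content: `Review the code you just wrote. ${POST_CODE_QUESTIONS.join(" ")}`,
  };
}

// The user turns of one session, in order; the model's replies go in between.
function reversePromptPlan(featureDescription, codeRequest) {
  return [
    preCodePrompt(featureDescription),
    { role: "user", content: codeRequest },
    postCodePrompt(),
  ];
}

const plan = reversePromptPlan(
  "I need to build a multi-tenant API where users can only access their own data.",
  "Now write the API, applying the risks you identified."
);
plan.forEach((turn, i) => console.log(`Turn ${i + 1}: ${turn.content}\n`));
```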

Strategy 3: Threat-Model-First Prompting

This builds on reverse prompting but makes the threat model explicit in the code request itself. Instead of asking the model to generate a feature and hoping it considers security, you describe the threat landscape as part of the prompt.

Without threat context:

“Build a REST API endpoint that lets users update their profile information.”

With threat context:

“Build a REST API endpoint that lets users update their profile information. This is a multi-tenant SaaS application. Assume attackers will attempt: IDOR (accessing other users’ profiles by changing the user ID), privilege escalation (modifying role or permission fields), mass assignment (sending fields the API shouldn’t accept like isAdmin), and injection through profile fields displayed to other users. The endpoint must validate ownership, whitelist allowed fields, sanitize all input, and log modification attempts.”

The same model, the same task — but the second prompt produces code with authorization checks, field whitelisting, input sanitization, and audit logging that the first prompt almost certainly omits. The model didn’t learn anything new between the two prompts. The threat context activated security patterns it already had.

For the vulnerability classes I covered throughout this series — the missing auth checks from Part 5, the architectural blind spots from Part 6 — threat-model-first prompting is the most direct prevention. You’re telling the model exactly what can go wrong before it writes a single line.
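
One way to keep threat context from being forgotten is to build the prompt from structured parts and refuse to build it without threats. This is a sketch only; the function name and the field names (task, context, threats, controls) are illustrative assumptions.

```javascript
// Assemble a threat-model-first prompt from structured parts.
function threatModelPrompt({ task, context, threats, controls }) {
  if (!Array.isArray(threats) || threats.length === 0) {
    throw new Error("Name at least one threat; without them this is the naive prompt");
  }
  const threatList = threats.map((t) => `${t.name} (${t.detail})`).join(", ");
  return [
    task,
    context,
    `Assume attackers will attempt: ${threatList}.`,
    `The endpoint must ${controls.join(", ")}.`,
  ].join(" ");
}

const profilePrompt = threatModelPrompt({
  task: "Build a REST API endpoint that lets users update their profile information.",
  context: "This is a multi-tenant SaaS application.",
  threats: [
    { name: "IDOR", detail: "accessing other users' profiles by changing the user ID" },
    { name: "privilege escalation", detail: "modifying role or permission fields" },
    { name: "mass assignment", detail: "sending fields the API shouldn't accept like isAdmin" },
  ],
  controls: [
    "validate ownership",
    "whitelist allowed fields",
    "sanitize all input",
    "log modification attempts",
  ],
});
console.log(profilePrompt);
```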

Strategy 4: Negative Constraint Prompting

AI models follow prohibitions more consistently than open-ended guidance. “Be secure” is vague. “Do NOT do these specific things” is concrete and verifiable.

“Build the authentication system for this Express.js application. Constraints:

  • Do NOT store tokens in localStorage (use httpOnly cookies)
  • Do NOT use MD5 or SHA-1 for password hashing (use bcrypt with cost factor 12+)
  • Do NOT skip server-side input validation even if client-side validation exists
  • Do NOT hardcode API keys, database credentials, or secrets anywhere in the code
  • Do NOT set CORS to allow all origins
  • Do NOT disable Supabase RLS or Firebase security rules
  • Do NOT create JWT tokens without an expiration time”

This works because constraints are binary — the model either followed them or it didn’t. You can verify compliance mechanically. And the constraints directly target the patterns I’ve documented across this series: the localStorage tokens from Part 5, the missing RLS from the QuickNote example, the hardcoded secrets that SAST can’t always catch.

Build your constraint list from your own vulnerability history. Every security issue you’ve found in AI-generated code becomes a “Do NOT” for future prompts. Over time, your constraint list becomes a negative-space security policy — the inverse image of every mistake the AI has made.
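
Because each constraint is binary, a rough mechanical check is possible. The sketch below pairs three of the “Do NOT” rules with a regular expression that spots an obvious violation. The patterns are illustrative heuristics of my own choosing, not a substitute for a real scanner: they miss obfuscated cases and can flag harmless code.

```javascript
// Each rule pairs a constraint with a rough pattern for an obvious violation.
const CONSTRAINT_CHECKS = [
  {
    constraint: "Do NOT store tokens in localStorage",
    pattern: /localStorage\.setItem\s*\(/,
  },
  {
    constraint: "Do NOT use MD5 or SHA-1 for password hashing",
    pattern: /createHash\s*\(\s*['"](md5|sha1)['"]/i,
  },
  {
    constraint: "Do NOT set CORS to allow all origins",
    pattern: /origin\s*:\s*['"]\*['"]/,
  },
];

// Return one entry per (line, violated constraint) pair.
function findViolations(source) {
  const violations = [];
  source.split("\n").forEach((text, index) => {
    for (const { constraint, pattern } of CONSTRAINT_CHECKS) {
      if (pattern.test(text)) {
        violations.push({ line: index + 1, constraint, text: text.trim() });
      }
    }
  });
  return violations;
}

// Stand-in for a file the model just generated.
const generated = [
  "app.use(cors({ origin: '*' }));",
  "localStorage.setItem('token', data.token);",
  "res.cookie('session', token, { httpOnly: true });",
].join("\n");

for (const v of findViolations(generated)) {
  console.log(`line ${v.line}: ${v.constraint}\n    ${v.text}`);
}
```

Each new “Do NOT” you add to a prompt can get a matching entry here, so the prompt and the check stay in step.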

Strategy 5: Iterative Repair Prompting

This is the strategy with the most direct benchmark support. Bruni et al. tested generating code, scanning it, feeding the scan results back to the model, and asking for repairs. The best configurations repaired between 41.9% and 68.7% of vulnerabilities.

The practical workflow:

  1. Generate code with your chosen AI tool
  2. Run Semgrep: semgrep --config=p/security-audit --json ./src > findings.json
  3. Feed the findings back: “Here are the Semgrep security findings for the code you just wrote. Fix each issue. For each fix, explain what the vulnerability was and why your fix resolves it.”
  4. Run Semgrep again on the output
  5. Repeat until clean or diminishing returns

Combining this with role-setting amplifies the effect. Instead of “fix these findings,” try: “You are a senior security engineer. Here are the Semgrep findings from a code review. For each finding, determine if it’s a true positive or false positive. For true positives, provide the fix. For false positives, explain why the alert is incorrect.”

The false positive distinction matters. As I covered in Part 6, SAST tools flag 68–75% of safe code as vulnerable. Having the model filter the noise before acting on it produces better repairs than blindly fixing every alert.
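
Step 3 of the workflow can be sketched as a small function that turns the Semgrep report into the triage-and-repair prompt. The field names (results, check_id, path, start.line, extra.message, extra.severity) follow Semgrep's --json output as I understand it, so verify them against your Semgrep version. The sample report and its rule id are made up for illustration.

```javascript
// Build the triage-and-repair prompt from a parsed Semgrep JSON report.
function buildRepairPrompt(report) {
  const findings = (report.results || []).map((r, i) => {
    const severity = (r.extra && r.extra.severity) || "UNKNOWN";
    const message = (r.extra && r.extra.message) || "(no message)";
    const line = r.start ? r.start.line : "?";
    return `${i + 1}. [${severity}] ${r.check_id} at ${r.path}:${line}\n   ${message}`;
  });
  if (findings.length === 0) return null; // clean scan: nothing to repair

  return [
    "You are a senior security engineer. Here are the Semgrep findings from a code review.",
    "For each finding, determine if it's a true positive or false positive.",
    "For true positives, provide the fix. For false positives, explain why the alert is incorrect.",
    "",
    ...findings,
  ].join("\n");
}

// Inline sample standing in for the parsed contents of findings.json.
const sampleReport = {
  results: [
    {
      check_id: "example.rule.sql-injection", // hypothetical rule id
      path: "src/routes/notes.js",
      start: { line: 42 },
      extra: { severity: "ERROR", message: "User input flows into a SQL string." },
    },
  ],
};

console.log(buildRepairPrompt(sampleReport));
```

Returning null on a clean scan gives the loop in step 5 a natural stopping condition.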


Making It Permanent: Instruction Files

The five strategies above work in conversation. But nobody re-types a threat model and constraint list for every prompt. The practical answer is instruction files — permanent security prompts that apply to every interaction with your AI coding tool.

Claude Code

Claude Code supports a security guidance plugin that reviews code at three levels: per-edit pattern matching (no model call, zero cost), end-of-turn diff review, and a deeper agentic review on each commit. You configure it through a .claude/claude-security-guidance.md file that describes your threat model in plain language. The plugin catches injection, unsafe deserialization, and DOM vulnerabilities before they reach a pull request — the reviewer runs as a separate model call with a fresh context, so it’s not grading its own work.

Beyond the plugin, Claude Code reads project-level instructions from CLAUDE.md files. You can embed your role-setting, constraints, and threat model directly:

# Security Requirements

You are a senior developer building a multi-tenant SaaS application.
Every API endpoint MUST:
- Verify authentication (valid JWT with expiration check)
- Verify authorization (user owns the requested resource)
- Validate and sanitize all input
- Return 403 for unauthorized access, not 404
- Log access attempts for security-sensitive operations

Do NOT:
- Store secrets in environment variables baked into Docker images
- Use localStorage for authentication tokens
- Disable RLS on any Supabase table
- Create endpoints without rate limiting

GitHub Copilot

Copilot reads from copilot-instructions.md in the .github directory, with support for path-scoped *.instructions.md files. The community has built OWASP-aligned rulesets with 55+ anti-patterns and “Do Not Suggest” blocklists covering eval(), inline SQL, insecure deserialization, and more. The github/awesome-copilot repository has a ready-to-use template.

Cross-Tool Security Rules

SecureCodeWarrior publishes open-source security rule files compatible with Copilot, Cursor, Windsurf, and other AI assistants. Robotti.io maintains customizable rulesets for Java, Node.js, C#, and Python that block risky patterns at the IDE level. Trail of Bits published Claude Code skills for security workflows including CodeQL and SARIF integration.

The practical step: pick the instruction file format for your primary AI coding tool, start with one of the open-source security rulesets, and customize it with your own constraints. Every “Do NOT” from Strategy 4 belongs in this file. Every lesson from a security review becomes a permanent instruction.
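
Turning lessons into permanent instructions can be as simple as merging new rules into the file's text without duplicating what is already there. The sketch below assumes the file contains a “Do NOT:” line followed by “- ” bullets, as in the CLAUDE.md example above; other layouts would need a different parser. The function name addConstraints is hypothetical.

```javascript
// Merge newly learned "Do NOT" rules into an instruction file's text.
function addConstraints(fileText, newRules) {
  const lines = fileText.split("\n");
  const start = lines.findIndex((l) => l.trim() === "Do NOT:");
  if (start === -1) {
    // No section yet: append one at the end of the file.
    return [fileText.trimEnd(), "", "Do NOT:", ...newRules.map((r) => `- ${r.trim()}`)].join("\n");
  }

  // Find the end of the bullet run that follows the "Do NOT:" line.
  let end = start + 1;
  while (end < lines.length && lines[end].startsWith("- ")) end++;

  // Compare case-insensitively so re-adding a known rule is a no-op.
  const seen = new Set(lines.slice(start + 1, end).map((l) => l.slice(2).trim().toLowerCase()));
  const additions = [];
  for (const rule of newRules) {
    const key = rule.trim().toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      additions.push(`- ${rule.trim()}`);
    }
  }

  lines.splice(end, 0, ...additions);
  return lines.join("\n");
}

const current = [
  "# Security Requirements",
  "",
  "Do NOT:",
  "- Use localStorage for authentication tokens",
  "- Disable RLS on any Supabase table",
].join("\n");

const updated = addConstraints(current, [
  "Use localStorage for authentication tokens", // already present, skipped
  "Create JWT tokens without an expiration time",
]);
console.log(updated);
```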


The Attack Surface You Just Created

Instruction files are powerful, which makes them a target. If someone can modify your instruction file, they control what the AI generates for your entire project.

The Rules File Backdoor attack, disclosed by Pillar Security in March 2025, demonstrated exactly this. Researchers embedded hidden Unicode characters (bidirectional text markers and zero-width joiners) inside Copilot and Cursor configuration files. These invisible characters contained instructions that manipulated the AI’s code generation: injecting backdoors, disabling security checks, exfiltrating data through generated code. The configuration file looked clean to human reviewers. The AI read the hidden instructions and followed them. CVE-2025-53773 is a separate issue: a command injection vulnerability in GitHub Copilot and Visual Studio.

Trail of Bits demonstrated prompt injection attacks achieving remote code execution in three agent platforms. VentureBeat reported in 2026 that three AI coding agents leaked secrets through a single prompt injection. The attack surface isn’t theoretical.

The defense is straightforward: treat instruction files like code. Review them in pull requests. Audit them for hidden characters (cat -v shows control characters, file shows unusual encodings). Pin them under version control. Don’t accept instruction files from untrusted sources — a shared project template with a poisoned .github/copilot-instructions.md is the software supply chain attack adapted for the AI era.
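
The hidden-character audit can also run as a script in CI. This is a minimal sketch that flags zero-width characters, bidirectional controls, and the Unicode tag block. The ranges are my own selection and a starting point rather than an exhaustive list, and a leading byte order mark (U+FEFF) will be reported even though it is sometimes legitimate.

```javascript
// Invisible or direction-changing code points worth flagging in instruction files.
const SUSPICIOUS =
  /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFEFF\u{E0000}-\u{E007F}]/gu;

// Report every suspicious character with its line, column, and code point.
// Columns count UTF-16 code units, which matches most editors for these ranges.
function findHiddenCharacters(text) {
  const hits = [];
  text.split("\n").forEach((line, index) => {
    for (const match of line.matchAll(SUSPICIOUS)) {
      const hex = match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
      hits.push({ line: index + 1, column: match.index + 1, codePoint: `U+${hex}` });
    }
  });
  return hits;
}

// Sample with a zero-width joiner (U+200D) and a right-to-left override (U+202E),
// written as escapes so they stay visible in this listing.
const sample = "Always validate input.\nUse parameterized\u200D queries.\u202E";
console.log(findHiddenCharacters(sample));
```

An empty result is the pass condition, so a CI job can fail whenever the returned array is non-empty.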


Putting It Together: A Complete Workflow

The five strategies aren’t five separate techniques — they’re stages in a pipeline. Here’s how I approach it at VULNEX when building or reviewing AI-generated code.

Step 1: Set the role. Before anything else, establish the LLM’s identity. For building: senior developer with security expertise. For reviewing: senior pentester.

Step 2: Reverse-prompt the problem. Before writing code, ask the model about the security landscape. “What are the top risks for this feature?” “What authentication model fits this use case?” “What mistakes do developers typically make here?” Use the answers to inform your code request.

Visualizing the threat model. You can take Step 2 further by asking the model to produce a formal threat model you can render as a diagram. At VULNEX we built usecvislib, an open-source security visualization library that generates STRIDE threat models, attack trees, and other security diagrams from TOML configuration files. The prompt becomes:

“Based on the security risks you identified, generate a STRIDE threat model for this application in usecvislib TOML format. Include externals, processes, datastores, dataflows, trust boundaries, and threats with CVSS 3.1 vectors.”

The model produces something like this (trimmed for brevity):

[model]
name = "QuickNote Threat Model"
description = "STRIDE threat model for note-taking SaaS"
type = "Threat Model"

[externals.user]
label = "User"
description = "Authenticated app user"

[externals.attacker]
label = "Attacker"
description = "Unauthenticated malicious actor"

[processes.api_server]
label = "API Server"
description = "Express.js REST API"

[processes.auth_service]
label = "Auth Service"
description = "Supabase Auth"

[datastores.postgres]
label = "PostgreSQL"
description = "Supabase DB with RLS policies"

[dataflows.login]
from = "user"
to = "api_server"
label = "Login Request"

[dataflows.note_query]
from = "api_server"
to = "postgres"
label = "Note Query"

[boundaries.internet]
label = "Internet"
elements = ["user", "attacker"]

[boundaries.backend]
label = "Backend Services"
elements = ["auth_service", "postgres"]

[threats.brute_force]
element = "api_server"
threat = "No rate limiting on /api/login enables brute force"
mitigation = "Rate limit to 5 attempts/minute per IP"
cvss_vector = "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"

[threats.idor_notes]
element = "note_query"
threat = "User modifies note ID to access other users' data"
mitigation = "Verify resource ownership before returning data"
cvss_vector = "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:N"

[threats.token_theft]
element = "login"
threat = "localStorage token accessible to injected scripts"
mitigation = "Store tokens in httpOnly secure cookies"
cvss_vector = "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:N/A:N"

[threats.disabled_rls]
element = "postgres"
threat = "RLS policies disabled, no row-level access control"
mitigation = "Enable RLS, test policies with different tenant contexts"
cvss_vector = "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"

Then render it: usecvis -m 1 -i quicknote_threat.toml -o quicknote_threats -f png -r. You get a data flow diagram with trust boundaries, CVSS-scored threats, and color-coded severity — a visual artifact that makes security risks concrete for the whole team:

[Figure: QuickNote threat model diagram (quicknote_threat_model)]

The -r flag also generates a written threat report. The threats the model identified in this diagram become the exact constraints you feed into the next step.
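
To see how a CVSS vector in the TOML turns into a score, here is a sketch of the CVSS 3.1 base score calculation, following the base-metric equations in the FIRST specification. It is independent of usecvislib, whose internals I am not describing here. It ignores temporal and environmental metrics and does not validate metric values.

```javascript
// Numeric weights for the CVSS 3.1 base metrics.
const WEIGHTS = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  UI: { N: 0.85, R: 0.62 },
  C: { H: 0.56, L: 0.22, N: 0 },
  I: { H: 0.56, L: 0.22, N: 0 },
  A: { H: 0.56, L: 0.22, N: 0 },
};
// Privileges Required is weighted differently when scope changes.
const PR = { U: { N: 0.85, L: 0.62, H: 0.27 }, C: { N: 0.85, L: 0.68, H: 0.5 } };

// Spec-defined rounding: smallest one-decimal value >= input, avoiding float drift.
function roundUp(value) {
  const scaled = Math.round(value * 100000);
  return scaled % 10000 === 0 ? scaled / 100000 : (Math.floor(scaled / 10000) + 1) / 10;
}

function cvssBaseScore(vector) {
  const parts = vector.split("/");
  if (parts[0] !== "CVSS:3.1") throw new Error("Expected a CVSS:3.1 vector");
  const m = Object.fromEntries(parts.slice(1).map((p) => p.split(":")));

  const scopeChanged = m.S === "C";
  const iss = 1 - (1 - WEIGHTS.C[m.C]) * (1 - WEIGHTS.I[m.I]) * (1 - WEIGHTS.A[m.A]);
  const impact = scopeChanged
    ? 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15)
    : 6.42 * iss;
  const exploitability =
    8.22 * WEIGHTS.AV[m.AV] * WEIGHTS.AC[m.AC] * PR[m.S][m.PR] * WEIGHTS.UI[m.UI];

  if (impact <= 0) return 0;
  const raw = scopeChanged ? 1.08 * (impact + exploitability) : impact + exploitability;
  return roundUp(Math.min(raw, 10));
}

// The four vectors from the threat model above.
const threatVectors = {
  brute_force: "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N",
  idor_notes: "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:N",
  token_theft: "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:N/A:N",
  disabled_rls: "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H",
};
for (const [name, vector] of Object.entries(threatVectors)) {
  console.log(`${name}: ${cvssBaseScore(vector)}`);
}
```

Sorting threats by this score is one way to decide which constraints go into the prompt first.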

Step 3: Write the prompt with threat context and constraints. Combine threat-model-first prompting with negative constraints. Describe what you’re building, what threats apply, and what the code must not do.

Step 4: Reverse-prompt the output. After the model generates code, switch to review mode. “What vulnerabilities does this have?” “How would you bypass this auth check?” “What’s missing?” Feed the model’s own critique back into the next iteration.

Step 5: Run automated scans and iterate. Semgrep, npm audit, the pipeline from Part 6. Feed findings back to the model with a security engineer role. Repair, re-scan, repeat.

Step 6: Encode lessons as permanent instructions. Every vulnerability you find — through reverse prompting, automated scanning, or manual review — becomes a constraint in your instruction file. The instruction file grows with every project, capturing your team’s security knowledge in a form the AI applies automatically.

To make this concrete, here’s a before/after using the login endpoint from QuickNote (Part 5).

Naive prompt:

“Build a login endpoint for my Express.js app with Supabase.”

This is what produced the QuickNote vulnerabilities: no rate limiting, no token expiration, credentials in environment variables baked into the Docker image, RLS disabled. Here’s a representative output:

// Naive prompt output — typical AI-generated login
app.post('/api/login', async (req, res) => {
  const { email, password } = req.body;
  const { data, error } = await supabase.auth.signInWithPassword({
    email, password
  });
  if (error) return res.status(401).json({ error: 'Invalid credentials' });
  res.json({ token: data.session.access_token, user: data.user });
});

No rate limiting — an attacker can try thousands of passwords per minute. The token goes straight to the response body, where the frontend stores it in localStorage (accessible to any XSS). No input validation. No token expiration check. The Supabase credentials are baked into the Docker image at build time. This is what the model generates by default.

Security-aware prompt (using the pipeline):

“You are a senior developer with years of experience building secure authentication systems. Build a login endpoint for an Express.js app using Supabase. This endpoint will be publicly accessible on the internet. Assume attackers will attempt brute-force attacks, credential stuffing, and token theft. The endpoint MUST: rate-limit login attempts (max 5 per minute per IP), use bcrypt for password verification, return JWT tokens with a 1-hour expiration set via expiresIn, store tokens in httpOnly secure cookies (not localStorage), validate all input server-side. Do NOT hardcode any secrets — use environment variables loaded at runtime only. Do NOT disable Supabase RLS on any table. After generating the code, list any remaining security concerns I should address.”

Same model, same task — different output:

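// Security-aware prompt output: same model, same task.
// Assumes app, supabase, and logger are initialized elsewhere and that
// express.json() is registered so req.body is parsed.
const rateLimit = require('express-rate-limit');

// Allow at most 5 login attempts per minute per client IP
const loginLimiter = rateLimit({
  windowMs: 60 * 1000, max: 5,
  message: { error: 'Too many login attempts, try again later' }
});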

app.post('/api/login', loginLimiter, async (req, res) => {
  const { email, password } = req.body;

  // Server-side input validation
  if (!email || typeof email !== 'string' || !email.includes('@')) {
    return res.status(400).json({ error: 'Valid email required' });
  }
  if (!password || typeof password !== 'string' || password.length < 8) {
    return res.status(400).json({ error: 'Invalid password format' });
  }

  const { data, error } = await supabase.auth.signInWithPassword({
    email: email.trim().toLowerCase(),
    password
  });
  if (error) {
    logger.warn('Failed login attempt', { email, ip: req.ip });
    return res.status(401).json({ error: 'Invalid credentials' });
  }

  // Token in httpOnly cookie, not response body
  res.cookie('session', data.session.access_token, {
    httpOnly: true, secure: true, sameSite: 'strict',
    maxAge: 3600000 // 1 hour
  });
  res.json({ user: { id: data.user.id, email: data.user.email } });
});

Rate limiting. Input validation. Token in an httpOnly cookie, not the response body. Failed attempts logged. Email normalized. The model didn’t learn anything new between the two prompts — the security-aware prompt activated what it already knew.


The Prompt Engineering Checklist

  1. Set a specific professional role before every code generation or review task — “senior developer” for building, “senior pentester” for reviewing
  2. Reverse-prompt before coding: ask the model to identify security risks, recommend auth models, and flag common mistakes for your specific feature
  3. Include threat context in every code request: name the threats (IDOR, XSS, injection, brute force) and specify the attack surface (public API, multi-tenant, handles payments)
  4. Add negative constraints for your stack’s known pitfalls: “Do NOT use localStorage for tokens,” “Do NOT disable RLS,” “Do NOT skip server-side validation”
  5. Reverse-prompt after code generation: ask the model to review its own output as a pentester and list what’s missing or vulnerable
  6. Run Semgrep and feed findings back with a security engineer role — don’t just say “fix these,” ask it to distinguish true positives from false positives
  7. Create an instruction file (.claude/claude-security-guidance.md, .github/copilot-instructions.md, or equivalent) with your permanent security constraints
  8. Start with an open-source security ruleset (SecureCodeWarrior, Robotti.io, Trail of Bits skills) and customize it
  9. Audit instruction files for hidden characters and treat them as security-critical code in version control
  10. Add every vulnerability you discover to your constraint list — your instruction file should grow with every project and every security review

If You Do Nothing Else

Ten checklist items and a six-step pipeline can feel like a lot when you’re a solo founder shipping a feature at midnight. Here’s the minimum: set a role and add three constraints.

“You are a senior developer building a secure web application. Build [your feature]. Do NOT store tokens in localStorage. Do NOT skip server-side input validation. Do NOT hardcode secrets.”

That’s it. One sentence of role-setting plus three “Do NOT” constraints tailored to your stack. It takes ten seconds to type and covers the vulnerabilities I see most often in vibe-coded apps. Add the reverse-prompt step when you have time — ask the model to review its own output as a pentester. Those two moves alone close a surprising amount of the gap.

On prompt length: there’s a point of diminishing returns. The Kharma study showed that overloading a prompt with security concerns can degrade functional code quality — the model tries to satisfy too many constraints at once and introduces logic bugs. In practice, I keep security prompts under a paragraph for individual code requests. If you need more than five or six constraints, that’s a sign to move them into an instruction file where they apply automatically rather than cramming them into every prompt.


What You Should Take From This

Prompt engineering for security isn’t about tricking the model into being careful. It’s about activating knowledge the model already has. The generation-review asymmetry — 55.8% vulnerable output, 78.7% detection in review — tells us the security knowledge is there. The default prompt just doesn’t ask for it.

The five strategies in this post close that gap from different angles. Role-setting activates domain expertise. Reverse prompting forces the model to think about threats before and after generation. Threat-model-first prompting gives the model the context it needs to make secure architectural decisions. Negative constraints prevent the specific mistakes you’ve seen before. Iterative repair catches what slipped through.

None of this replaces the manual review I described in Part 6. A well-prompted model still misses roughly 20% of its own vulnerabilities in review mode, and architectural issues like broken authorization logic require human judgment. But a well-prompted model produces code that’s measurably safer — up to 56% fewer vulnerabilities — and that narrows the gap the manual review needs to cover.

My workflow at VULNEX: role first, questions second, code with constraints third, review fourth, scan fifth, and encode everything I learn into instruction files that make the next project start from a stronger baseline. The instruction file is the compound interest of security knowledge — every engagement makes the next one more secure by default.

As always: trust nothing, verify everything.



Information Warfare Strategies (SRF-IWS): Offensive Operations Against a Papal Visit — Pope Leo XIV in Madrid 2026

Disclaimer: Everything described here is pure imagination and any resemblance to reality is coincidental. This document is intended for security professionals to develop defensive countermeasures. The author is not responsible for the consequences of any action taken based on the information provided in this article. I keep every scenario at the threat-vector level: no operational detail, no tactics, no weapons information, and each one is paired with a defensive recommendation.

Note: As with the rest of the SRF-IWS series, I leaned on several AI models to help build realistic, defense-oriented attack scenarios. The goal is Blue Team planning, nothing else.

A note on the series. This article belongs to SRF-IWS, but it is not a continuation of the Davos articles. Those (Davos 2024, 2025 and 2026) are their own line of analysis on the World Economic Forum; this one stands on its own and simply shares the same framework. I do reference them throughout for context, so they are worth reading as background. The difference this time is the protectee: instead of a corporate forum, we are looking at a head of faith and head of the Vatican state, out in the open, in the middle of a European capital, surrounded by more than a million people.


Introduction

From 6 to 9 June 2026, Pope Leo XIV, the first North American pontiff, will be in Madrid as the opening leg of his apostolic journey to Spain (Madrid, Barcelona and the Canary Islands, 6 to 12 June). It is the first papal visit to the Spanish capital in fifteen years, since Benedict XVI and World Youth Day back in 2011. The Madrid program is dense, and from a protective-intelligence point of view it is wide open:

  • Arrival on 6 June, with a courtesy visit to King Felipe VI, Queen Letizia and the Royal Family.
  • A youth prayer vigil at Plaza de Lima, on the Paseo de la Castellana, that same evening.
  • On Sunday 7 June, the solemnity of Corpus Christi, an open-air Mass at Plaza de Cibeles followed by a Eucharistic procession through the centre of Madrid.
  • On Monday 8 June, an address to Parliament at the Congress of Deputies, and later an encounter with the diocesan community at the Santiago Bernabéu.
  • Popemobile and motorcade movements concentrated on the Castellana–Cibeles–Lima axis and the fixed nodes: Barajas, the Royal Palace, Congress and the Bernabéu.

The address to Parliament deserves its own line, because it is genuinely historic. For the first time ever, a Pope will speak before a joint session of the Cortes Generales, deputies and senators together. John Paul II came to Spain five times and Benedict XVI three, and neither ever addressed the chamber. That is the kind of high-symbolism, high-protocol moment an adversary loves.

Spanish and municipal authorities have put together a security and mobility operation without precedent in the city, with attendance across the main events projected at up to 1.8 million people. The chosen motto, “Alzad la mirada” / “Lift up your eyes” (John 4:35), and Leo XIV’s emphasis on migration (the journey ends in the Canary Islands, Spain’s main Atlantic entry point for migrants) turn this into more than a physical-security problem. It is a near-perfect information-warfare target: globally televised, built around a polarising subject, with a protectee whose every sentence carries geopolitical weight.

A Pope is not a Davos delegate, and the threat aperture is much wider. You have religiously motivated extremists, both jihadist and anti-Catholic; traditionalist and sedevacantist fringe actors; anti-clerical and anarchist currents; anti-migration extremists reacting to the Pope’s message; grievance-driven lone actors; and nation-state information operations looking to weaponise the spectacle. None of this is hypothetical. Pontiffs have always been targets. John Paul II was shot in St. Peter’s Square in 1981. He was attacked again in 1982 at Fátima, with a bayonet, by a Spanish priest, Juan María Fernández y Krohn. The 1995 Bojinka plot in Manila included a plan to assassinate him. These are documented facts, and they are reason enough to plan seriously.

What follows are realistic, defense-oriented scenarios across the information, cyber, RF, drone, crowd and physical domains. Each one pairs the attack with its own defense, in the same section.


1. Disinformation and the migration narrative

The most likely and most damaging vector here is not a bomb or a rifle. It is information. Leo XIV’s visit is framed around migration and lands in the middle of an active Spanish immigration debate, which is exactly the kind of ground influence operations like to work on, whether they come from a state actor trying to inflame Spanish and EU fault lines or from domestic extremists on either end.

The campaign I would expect looks something like this. Fabricated papal “quotes”: AI-generated text, images and short clips that put inflammatory positions in the Pope’s mouth on immigration, the Spanish government, Catalonia or the monarchy, dropped a few hours before a key event to own the news cycle. Doctored homily fragments: audio or video from the Cibeles Mass or the Parliament speech, selectively cut or fully faked to manufacture outrage in either direction and pull people toward the venues to confront each other. Forged “leaks”: fake Vatican or Moncloa documents alleging secret political deals tied to the visit, designed to make both Church and state look like they are hiding something. Astroturfed outrage: inauthentic networks pushing divisive hashtags, fake eyewitness accounts and false reports of incidents to either scare people away or provoke a confrontation. And the simplest one: spoofed accounts and look-alike domains copying the official registration and information sites to hand out fake schedules, fake “cancellations” or malicious links.

Figure 1 — Disinformation and migration-narrative attack tree, generated with USecVisLib.

Defense

This has to be treated as a primary security function, not a press afterthought. That means a joint Vatican–Spanish communications cell with the authority to rebut fast, official audio and video signed at the source (C2PA-style provenance), an active pipeline to monitor and take down look-alike domains, and one verified channel the public knows to trust. If there is a single authoritative source, most of the forgeries lose their oxygen.


2. Deepfakes and synthetic media

I covered this at length in the Davos 2026 analysis, and nothing about it has gotten easier to defend against. Real-time deepfakes are mature, voice cloning needs only a few seconds of audio, and people only spot a good video deepfake a fraction of the time. A globally broadcast Pope, with an enormous public archive of audio and video, is about as good a training subject as exists. So is the King, and so are the senior organisers.

The scenarios that worry me are the ones that spoof authority. A faked “official” evacuation announcement, or a “device found” warning, pushed onto a compromised PA system, hijacked digital signage or a spoofed alert channel at Cibeles, Lima or the Bernabéu, with the aim of triggering a panic (see section 3). Voice-cloned traffic impersonating an incident commander or a Vatican advance team to redirect units, shift motorcade timing or open a gap. Synthetic “private” recordings of the Pope and the King, or the Pope and government officials, inventing commitments or insults that were never said, released to poison the diplomacy of the visit. Or fabricated “behind the scenes” footage timed to step on the Parliament address.

Figure 2 — Deepfake and synthetic-media attack tree (USecVisLib).

Defense

The defensive answer is old-fashioned and it works: out-of-band verification and challenge/response for all command, advance-team and protocol communications. No unit acts on a voice or a face alone. On top of that, run deepfake detection on the monitored broadcast feeds, lock down PA, signage and alerting as critical infrastructure with real authentication, and pre-script the crowd messaging so that anything the public hears comes only through verified, redundant channels.


3. The crowd as the weapon

With up to 1.8 million people spread across Cibeles, Lima, the procession route and the Bernabéu, the highest-probability mass-casualty outcome needs no weapon at all. You only have to engineer panic in a dense crowd. This is the most underappreciated vector on the list, and it is not theoretical. The history is long and grim: Hillsborough, the Love Parade in 2010, the 2015 Mina crush during the Hajj, Astroworld in 2021, Itaewon in 2022.

How would you do it? Start with a synchronised false alarm (a rumour of gunfire, a “bomb” or a fire spread by SMS and social media, a single staged loud bang, or hijacked signage) and place it at a bottleneck where density is already critical: the narrow approaches to Cibeles or Lima, or a stadium concourse. Pair it with comms denial: jam or saturate cellular and Wi-Fi so the crowd cannot orient itself and official messaging cannot get through, and let rumour fill the gap (this ties into section 7). Add flow manipulation: block or falsely sign the exits, and a controllable density turns into a progressive collapse. And if you want to overwhelm the response, initiate at several separated points at once so stewarding and emergency services fragment.

Figure 3 — Engineered-panic and crowd-crush attack tree (USecVisLib).

Defense

Defending it comes down to seeing density in real time and being able to act on it. Overhead optical and thermal monitoring plus anonymised mobile-density analytics, with hard thresholds and pre-planned metering and reversible flow control. A public-address system that resists jamming. Stewards rehearsed to kill rumours on the spot. Egress that is engineered, clearly marked and over-provisioned. And one unified incident-command picture, so a small local event never gets the chance to cascade.


4. Drones and counter-UAS

Open venues like Cibeles, Lima, the procession route and the open bowl of the Bernabéu are exactly the places small drones exploit. The cost problem I described in the Davos 2026 analysis still holds: the drones are cheap, the defenses are expensive, and a swarm can simply saturate point defenses.

The uses are familiar. Surveillance and targeting: small quadcopters mapping security positions, motorcade timing and VIP locations in real time. Panic-payload delivery: a drone dispersing smoke, an irritant or pyrotechnics over a dense crowd, where the point is panic and a crush rather than direct casualties. Swarm saturation and decoys: expendable drones soaking up the counter-UAS effort while a primary platform finishes its job, or FPV drones using the urban canyons for a low, fast approach. And RF payloads: airborne jammers or IMSI-catchers degrading comms and collecting intelligence over the crowd.

Figure 4 — Drone and counter-UAS attack tree (USecVisLib).

Defense

The defense has to be layered and multi-modal (radar plus RF plus acoustic plus electro-optical/infrared) so no single trick blinds it. Enforce the no-fly and temporary flight restriction zones with the legal authority to actually do something about a violation. Pre-position effectors on the likely approach lines. And, more importantly here than at Davos, choose mitigation that does not itself hurt or panic a crowd of up to 1.8 million people. Detection, RF takeover and geofencing, and controlled interception come well before anything kinetic over people’s heads.


5. The motorcade and the Popemobile

Movements concentrate on a predictable axis, Castellana–Cibeles–Lima, and on fixed arrival and departure nodes: Barajas, the Royal Palace, Congress, the Bernabéu. Predictability plus a slow, open, rope-line Popemobile is the classic protective dilemma, and there is no clever way around it.

The exploitation paths are well understood. Choke-point operations: surveillance picks a fixed slow point for a hostile act, a staged disturbance or comms denial. GPS spoofing or jamming of the escort vehicles to fragment the motorcade or misdirect support and medical units; Iran’s capture of a U.S. RQ-170 drone in 2011, which Iranian sources attributed to GNSS spoofing, is the most-cited precedent for the technique being turned on even an advanced platform. Vehicle-as-weapon: the most-rehearsed European threat since Nice and Berlin in 2016, a hostile vehicle driven into a pedestrian-dense stretch of the route. And plain old hostile reconnaissance of static posts and timings beforehand.

Figure 5 — Motorcade and Popemobile attack tree (USecVisLib).

Defense

Defending the move means randomising route and timing wherever the program allows it, putting hostile-vehicle mitigation (barriers, sterile zones, controlled crossings) along the entire crowd-facing axis, and giving the escort vehicles anti-spoof, multi-constellation GNSS with inertial backup. Add aggressive counter-surveillance, dominate the rooftops and elevated positions with friendly observation and counter-sniper coverage, and configure the Popemobile to balance pastoral visibility against protection. It will always be a compromise; it should at least be a deliberate one.


6. Cyber attacks on the event and the city

The visit runs on a lot of software. A mass public registration system holding the personal data of potentially millions, accreditation and badging, ticketing, CCTV and access control, Madrid’s traffic and mobility management, emergency dispatch. As the GTG-1002 case from the Davos 2026 analysis showed, AI agents can map and exploit an ecosystem like this at machine speed, finding paths a human would miss.

The obvious moves: breach the registration system and weaponise the data, whether by exfiltrating attendee records for targeting, doxxing or spear-phishing, or by corrupting the access lists to create chaos at the gates. Forge credentials by compromising the accreditation pipeline, and manufacture insider access in a press, volunteer or contractor role. Blind the surveillance by manipulating CCTV and access control to open timed blind spots. Hit the city systems (traffic management and signage during motorcade windows, or emergency dispatch during an incident), which is how a cyber event becomes a physical-safety event. And the simplest: DDoS or deface the official information channels at the moment public attention peaks, which loops straight back to section 1.

Figure 6 — Cyber attacks on event and city systems, attack tree (USecVisLib).

Defense

The defense is unglamorous and necessary: red-team every event and city system in scope before the visit, segment the life-safety and access-control systems so they are not reachable from everything else, run Zero Standing Privilege and Just-in-Time access so a stolen credential buys very little, put integrity monitoring on the accreditation and access lists, and make sure every life-safety function has a tested manual fallback for the day the software lies to you.


7. RF and the spectrum

This is my home ground and it is a high-impact one. In Spain, the state security forces, Policía Nacional and Guardia Civil, run on SIRDEE, the encrypted, nationwide TETRAPOL trunked network. (A point worth getting right: SIRDEE is TETRAPOL, not TETRA. TETRA is a different standard used by various regional and municipal services. People conflate the two constantly.) Whatever the technology, the whole event depends on resilient spectrum.

The attacks. Jam SIRDEE, the event-coordination radios and the cellular bands at a critical moment, which degrades command, amplifies crowd confusion (section 3) and isolates posts. Spoof GPS/GNSS to corrupt timing, geofencing, counter-UAS tracking and motorcade navigation (section 5). Deploy IMSI-catchers or rogue cells to track and intercept VIPs and the crowd. Stand up rogue access points near venues and command areas to capture traffic and pivot, including the “harvest now, decrypt later” collection I described in the Davos 2026 analysis.

Figure 7 — RF and wireless-warfare attack tree (USecVisLib).

Defense

Defending the spectrum means watching it. Continuous monitoring and direction-finding across the operational area to catch jammers, spoofers and IMSI-catchers as they appear. Encrypted, frequency-hopping, jam-resistant primary comms, with a non-RF fallback (runners and hardwired nodes) for when the band goes dark. GNSS integrity monitoring with backup positioning. And basic RF hygiene: nothing sensitive over a channel that can be compromised.


8. Insiders and the supply chain

A visit like this mobilises a huge, hastily assembled workforce. The official choir alone, the Gran Coro de Voces Católicas, has more than 1,700 volunteers, and that is before you count stewards, contractors, catering, AV, transport and security vendors across every venue. The weakest-link problem scales with that footprint.

What I would watch for: a volunteer or contractor infiltrated where mass onboarding outruns vetting. A pre-compromise of the AV and technical kit at the Congress chamber, the Royal Palace or the Bernabéu: an implanted listening or recording device, or a manipulated production system feeding the disinformation and deepfake plays from sections 1 and 2. Logistics access: catering, cleaning and equipment vendors as a way into sterile areas. And the transport providers, where driver credentials and vehicle-tracking data quietly reveal protected movements.

Figure 8 — Insider-threat and supply-chain attack tree (USecVisLib).

Defense

The countermeasures are proportionality and discipline. Vet to the level of access, with the deepest screening for the technical, AV, transport and sterile-area roles. Least-privilege physical access with audited escorting. TSCM sweeps of every speaking venue before use, and keep the zone sterile afterward. And put real security requirements on vendors, with continuous monitoring and a backup for anything essential.


9. Physical and CBRN, at the protective-doctrine level

I will keep this at the level a protective detail actually plans against, and ground it again in the record: 1981 in St. Peter’s Square, 1982 at Fátima, the 1995 Bojinka plot.

The vectors to plan for are the close approach by a lone actor at a rope line, the procession or the Popemobile route, with an edged or thrown-object threat from inside a permitted crowd; an elevated firing position along the Castellana axis or around the open plazas, which is what sightline management and counter-sniper overwatch exist for; a low-grade chemical or irritant dispersal in the crowd whose real effect is panic and a crush (sections 3 and 4) rather than mass toxicity; and an improvised or vehicle-borne explosive at a venue perimeter or along the route.

Figure 9 — Physical and CBRN-in-crowd attack tree (USecVisLib).

Defense

Against all of that: screened sterile zones with search and magnetometers at controlled entry, counter-sniper and elevated-position domination with the structures surveyed in advance, hostile-vehicle mitigation on every crowd-facing route, CBRN detection and decontamination staged for a mass-casualty contingency, a saturating uniformed and plainclothes presence at the rope lines, and pre-positioned, redundant medical capacity matched to the density map.


10. The convergence scenario

If I have one thesis across this whole series, it is that the defining threat is not any single vector. It is the deliberate sequencing of several of them, fast. Applied to this visit, it reads like this. In the days before, a disinformation campaign (section 1) polarises the public and seeds counter-mobilisation near the venues. At the chosen moment, coordinated cyber (section 6) and RF (section 7) actions degrade CCTV, comms and situational awareness. A drone payload or a staged report (sections 3 and 4) starts a panic at a critical bottleneck. A deepfaked “official” evacuation order (section 2), pushed through compromised signage or PA, turns that panic into a crush. And in the chaos, a primary objective is pursued while a pre-staged false narrative (section 1) claims and frames the event for the world before the authorities can get a word out.

Figure 10 — Convergence scenario as an attack graph: prime, blind, trigger, amplify, exploit, with CVSS-scored vulnerabilities along the chain (USecVisLib).

Defense

No single countermeasure stops that. The only thing that does is an integrated, fast, multi-domain defense built on one shared picture of what is happening: a single fused common operating picture across Casa Real security, Policía Nacional, Guardia Civil, Madrid municipal police, the Vatican Gendarmerie and advance team, and the intelligence services, correlated fast enough to matter. Every per-section defense above feeds into that one picture, because the convergence attack is precisely the one a fragmented, human-speed defense cannot answer.


Conclusion

A papal visit compresses every threat domain into a single televised, open-air, ideologically charged event. The lessons of the SRF-IWS series all apply, but the protectee changes the maths.

The first point is that information is the main battlefield. For a Pope speaking about migration before Parliament and a crowd of up to 1.8 million, the disinformation and deepfake vectors are more likely, and probably more consequential, than any kinetic act. Strategic communications is a security function, full stop.

The second is that the crowd is both the audience and the weapon. You can produce mass casualties in a dense crowd without firing a shot, just by engineering panic. Crowd dynamics deserve the same planning effort as counter-sniper coverage.

The third is convergence. Disinformation that primes, cyber and RF that blind, drones that trigger, deepfakes that amplify, run in sequence and fast. The defense has to be just as integrated and just as fast.

The fourth is that the history is the warning. Attacks on pontiffs are documented fact, not imagination, and planning has to respect that record.

And the last is that speed and unity decide the outcome. A fragmented, human-speed defense cannot answer a coordinated, multi-domain operation. A single shared command picture is the price of entry.

The point of writing all of this down is simple: the defenders, not the adversaries, should be the ones who have thought it through first.

SRF

Follow: @simonroses

This article continues the SRF-IWS research into information warfare strategies applied to high-profile protective environments.


Scanning Vibe-Coded Apps: Why Traditional SAST/DAST Falls Short (Part 6)

Vibe Coding Security Series

  1. What Is Vibe Coding Security? A Field Guide for 2026
  2. The OWASP Top 10 for Vibe-Coded Applications
  3. Anatomy of a Vibe Coding Breach: Lessons from 2026’s Worst Incidents
  4. The Dependency Trap: Supply Chain Risks in AI-Generated Code
  5. Authentication & Secrets: What AI Gets Wrong Every Time
  6. Scanning Vibe-Coded Apps: Why Traditional SAST/DAST Falls Short (you are here)
  7. Prompt Engineering for Secure Code
  8. The Founder’s Security Checklist (coming soon)
  9. Securing the AI Coding Pipeline (coming soon)
  10. The Future of Vibe Coding Security (coming soon)

Read Time: 20 minutes

TL;DR

Traditional security scanners pattern-match on code that exists. The most dangerous vulnerabilities in vibe-coded apps live in code that doesn’t exist — missing auth checks, missing rate limiting, missing authorization logic. A January 2026 SAST benchmark found tools flagging 68–75% of safe code as vulnerable while architectural flaws passed silently, and Georgia Tech has tracked 74 AI-attributed CVEs with monthly discoveries growing 6x in two months. New AI-native tools are closing the gap, but as of mid-2026, broken authorization and absent security controls still require human review. This post covers what works, what doesn’t, and how to build a scanning pipeline for AI-generated code.


The Scanning Paradox

We have more security scanning tools than at any point in the history of software development. SAST, DAST, SCA, IAST, RASP — the acronym count alone suggests the problem should be solved. And for human-written code, these tools have been steadily improving for two decades. The issue is that vibe-coded applications don’t fail the way human-written ones do.

When a human developer introduces a SQL injection, it’s usually because they forgot to parameterize a query. A SAST tool pattern-matches on string concatenation inside a SQL call and flags it. Straightforward. When an AI coding tool introduces a security flaw, the code is typically syntactically clean, follows documented API patterns, and passes every functional test. The vulnerability isn’t in how the code is written — it’s in what the code doesn’t do. Missing server-side validation. Missing rate limiting. Missing authorization checks. Missing RLS policies. You can’t pattern-match on absent code.

Georgia Tech’s Vibe Security Radar, launched in May 2025, tracks CVEs attributable to AI coding tools by tracing fixing commits backward through Git history. Their numbers tell the story: 6 AI-attributed CVEs in January 2026, 15 in February, 35 in March. A nearly 6x increase in two months. The total confirmed count stands at 74, with researchers estimating the true number is 5–10x higher because most AI-generated code doesn’t leave clear attribution markers.

Meanwhile, the Cloud Security Alliance’s emergency strategy briefing — assembled over a single weekend by 60+ contributors including Jen Easterly and Bruce Schneier — warned that the window to fix vulnerabilities is collapsing: mean time from disclosure to confirmed exploitation has fallen to less than one day in 2026, down from 2.3 years in 2019. Separate CSA research has found that 62% of AI-generated code samples contained vulnerabilities.

The scanners are running, the vulnerabilities are still shipping, and the gap is widening.


What SAST Actually Catches (And What It Doesn’t)

Static Application Security Testing works by analyzing source code without executing it. Tools like CodeQL, Semgrep, SonarQube, and Checkmarx parse the code into an abstract syntax tree, then match patterns against known vulnerability signatures — string concatenation in SQL queries, eval() on untrusted input, deprecated cryptographic functions. These are well-defined patterns, and SAST handles them reliably.

The problem is false positives and structural blind spots.

The False Positive Problem

A January 2026 study benchmarked CodeQL, Semgrep, SonarQube, and Joern against OWASP Benchmark v1.2 — 2,740 Java test cases with known vulnerability status. CodeQL achieved the highest F1-score at 74.4%, but it flagged 68.2% of non-vulnerable test cases as positive — 904 false positives across the benchmark. SonarQube produced 1,254 false positives, covering 45.8% of all test cases. Semgrep flagged 74.8% of non-vulnerable cases. Joern had the fewest false positives at 96 but achieved only 8.2% recall — it catches almost nothing.

For a vibe coder running Semgrep on their AI-generated codebase for the first time, this means that on this benchmark roughly three-quarters of the code that is actually safe gets flagged anyway. After the third false positive about a “potential injection” in code that’s actually safe, most people stop reading the output entirely. The signal drowns in the noise, and the real issues, the ones that matter, scroll past unread.
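One caveat on reading these numbers: they are false positive rates, measured against safe code. How much of what a scanner actually reports is noise is a different quantity (one minus precision), and for CodeQL it can be back-computed from the figures quoted above. A rough sketch, assuming the benchmark splits into 1,415 vulnerable and 1,325 non-vulnerable cases (which matches 904 / 1,325 ≈ 68.2%); the study's own confusion matrix is not reproduced here:

```python
# Back-of-envelope check of the CodeQL figures quoted above.
# ASSUMPTION: the 2,740 benchmark cases split into 1,415 vulnerable and
# 1,325 non-vulnerable ones. Everything else comes from the quoted study.
VULNERABLE = 1415
NON_VULNERABLE = 1325
FALSE_POSITIVES = 904   # quoted in the text
F1 = 0.744              # quoted in the text

# Share of safe test cases that were flagged anyway.
false_positive_rate = FALSE_POSITIVES / NON_VULNERABLE

# F1 = 2*TP / (2*TP + FP + FN) with FN = VULNERABLE - TP, solved for TP.
true_positives = F1 * (FALSE_POSITIVES + VULNERABLE) / (2 - F1)

# Share of raised alerts that point at a real vulnerability, and share of
# real vulnerabilities that were found.
precision = true_positives / (true_positives + FALSE_POSITIVES)
recall = true_positives / VULNERABLE

print(f"false positive rate: {false_positive_rate:.1%}")
print(f"precision: {precision:.1%}, recall: {recall:.1%}")
```

Under those assumptions CodeQL's precision lands at roughly 60%, so about four in ten of its alerts on the benchmark are false. Keep in mind that about half of the benchmark's cases are vulnerable by design. A real codebase has far fewer true vulnerabilities per line, so with the same false positive rate the share of noisy alerts would be considerably higher.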

Here’s one I run into constantly. Over the past few years I’ve done plenty of code reviews for AWS-based applications at VULNEX, and Semgrep flags AWS account IDs as sensitive information leaks in nearly every project. The problem is that AWS themselves don’t consider account IDs to be sensitive — their documentation explicitly states they can be shared when needed. That’s a false positive that shows up in every single AWS project, training teams to ignore Semgrep output for that codebase entirely. I always work with the customer to understand their specific privacy requirements before dismissing or escalating any finding — some organizations do treat account IDs as internal-only regardless of what AWS says — but this is exactly the kind of noise that erodes trust in automated tools.

The Structural Blind Spot

False positives are annoying but manageable. The structural blind spot is the real problem. SAST works by matching patterns in code that exists. Vibe-coded vulnerabilities are often in code that doesn’t exist.

Consider the QuickNote app from Part 5. The most dangerous issues weren’t bugs in the code — they were missing features. No rate limiting on the login endpoint. No RLS policies on the database. No server-side authorization check. No token expiration. SAST cannot flag the absence of a security control, because there’s no code to analyze. It’s like asking a spell-checker to tell you that your essay is missing a conclusion.

Here’s what happens when you run Semgrep against a typical vibe-coded Express.js app:

semgrep --config=auto ./src

Semgrep will likely flag things like innerHTML usage (real issue — XSS), eval() calls if present, and maybe the MD5 hash function. What it won’t flag: the /api/users/:id/notes endpoint lacking an ownership check, jwt.sign() called without an expiresIn parameter, the entire application having no rate limiting middleware, Supabase RLS disabled on every table.

These are the vulnerability classes that matter most in vibe-coded applications, and SAST is structurally incapable of detecting them.
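Some of these gaps can at least be probed from the outside. The missing-expiry case, for example, shows up in any token the app issues. Here is a minimal sketch in Python using only the standard library. The helper names are hypothetical, the sample tokens are unsigned fakes, and the decode step deliberately skips signature verification, so treat it as a diagnostic and not as an auth check:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (inspection only)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def has_expiry(token: str) -> bool:
    """True if the token carries an `exp` claim."""
    return "exp" in jwt_claims(token)

def _fake_token(claims: dict) -> str:
    """Build an unsigned token for demonstration (hypothetical helper)."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'HS256', 'typ': 'JWT'})}.{enc(claims)}.signature"

# What a sign call without an expiry option produces, versus one with it.
no_exp = _fake_token({"sub": "user-1"})
with_exp = _fake_token({"sub": "user-1", "exp": 1767225600})

print(has_expiry(no_exp), has_expiry(with_exp))  # prints: False True
```

A check like this belongs in an integration test that logs in and inspects the token it gets back. It says nothing about the other three gaps, which is the point: each absent control needs its own explicit test.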

What SAST Is Good For

This isn’t an argument to stop using SAST. Pattern-matching catches real issues: hardcoded credentials (when they match known patterns), dangerous function calls, known-vulnerable library usage, obvious injection vectors. For the subset of vulnerabilities that look like traditional bugs, SAST works. The problem is that in vibe-coded apps, that subset covers maybe 30% of the actual risk surface. The other 70% is architectural.


What DAST Misses in the SPA Era

Dynamic Application Security Testing takes the opposite approach — instead of reading source code, it runs the application and attacks it from outside. OWASP ZAP and Burp Suite send malicious payloads to endpoints, monitor responses, and flag behavior that indicates vulnerabilities. If you can trigger a SQL injection through an HTTP request, DAST finds it. If a reflected XSS payload shows up in the response, DAST catches it.

For traditional server-rendered web applications, DAST has been reasonably effective. But vibe-coded applications are overwhelmingly single-page apps (SPAs) built with React, Next.js, or Vue, and DAST’s architecture has a hard time with them.

The Crawling Problem

DAST discovers application functionality by crawling — following links, submitting forms, parsing HTML. SPAs don’t work that way. Routes are handled client-side by JavaScript. Forms are React components that communicate via fetch() calls. API endpoints aren’t discoverable by parsing HTML, because the HTML is a nearly empty shell that loads a JavaScript bundle. A DAST crawler hitting a typical vibe-coded React app sees <div id="root"></div> and maybe a few <script> tags. It misses everything.

Modern DAST tools have gotten better at JavaScript rendering — ZAP has an AJAX Spider, Burp has a built-in browser. But they still struggle with authentication flows (especially OAuth), multi-step workflows, and application state. A login form that uses useState for input tracking and useEffect for token storage doesn’t behave like a traditional HTML form, and DAST crawlers frequently can’t complete the auth flow to reach the protected surface area behind it.
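To make the crawling problem concrete, here is a small sketch of what a link-following crawler without JavaScript rendering can extract from an SPA shell. The HTML is a made-up stand-in for a typical React build output, and the class and function names are mine:

```python
from html.parser import HTMLParser

class LinkAndFormCounter(HTMLParser):
    """Collects what a naive crawler would follow: anchors with href, and forms."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = 0
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms += 1
        elif tag == "script":
            self.scripts += 1

# Hypothetical SPA shell: an empty mount point plus one JavaScript bundle.
SPA_SHELL = """<!doctype html>
<html><head><title>QuickNote</title></head>
<body><div id="root"></div><script src="/assets/index.js"></script></body></html>"""

def crawl_surface(html: str) -> dict:
    """Summarise the attack surface visible in static HTML."""
    parser = LinkAndFormCounter()
    parser.feed(html)
    return {"links": len(parser.links), "forms": parser.forms, "scripts": parser.scripts}

print(crawl_surface(SPA_SHELL))  # prints: {'links': 0, 'forms': 0, 'scripts': 1}
```

Zero links and zero forms means zero endpoints to attack. Everything the app does is hidden behind that one script tag until something executes it.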

The Business Logic Gap

Even when DAST can reach the endpoints, it hits the same wall SAST does: the vulnerability is in what the code doesn’t do. DAST sends a SQL injection payload to /api/notes and checks whether the response looks like database output. That’s a legitimate test. But it doesn’t test whether /api/notes/42 returns data belonging to a different user. It doesn’t test whether the /api/admin/users endpoint is accessible with a non-admin token. It doesn’t test whether the login endpoint allows 10,000 attempts per minute.

These are business logic vulnerabilities — they require understanding the application’s intended behavior, not just its input/output surface. DAST treats the application as a black box. For vibe-coded apps where the most dangerous vulnerabilities are in the authorization model, that black-box approach misses the things that matter.
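The cross-user case is easy to test once you know to test it. Here is a sketch of the check a generic scanner does not run, written against an injected `fetch` function so it works with any HTTP client. The two fake APIs, the tokens and the helper name are all hypothetical stand-ins:

```python
from typing import Callable

# fetch(url, token) -> HTTP status code. Injected so any HTTP client can be used.
Fetch = Callable[[str, str], int]

def cross_user_access(fetch: Fetch, url: str, owner_token: str, other_token: str) -> bool:
    """Return True if a second user can read a resource that belongs to the first."""
    if fetch(url, owner_token) != 200:
        raise ValueError("owner cannot read the resource; the test setup is wrong")
    return fetch(url, other_token) == 200

# Stand-in for a vulnerable API: it checks that a token is valid,
# but never checks who owns the note.
def vulnerable_api(url: str, token: str) -> int:
    return 200 if token in {"alice-token", "bob-token"} else 401

# Stand-in for a fixed API: note 42 belongs to alice only.
def fixed_api(url: str, token: str) -> int:
    if token not in {"alice-token", "bob-token"}:
        return 401
    return 200 if token == "alice-token" else 403

print(cross_user_access(vulnerable_api, "/api/notes/42", "alice-token", "bob-token"))  # prints: True
print(cross_user_access(fixed_api, "/api/notes/42", "alice-token", "bob-token"))       # prints: False
```

The check needs two things a black-box scanner lacks: two real accounts, and the knowledge that note 42 belongs to only one of them. That knowledge is the application's intended behavior, which is why this class of test usually has to be written by someone who understands the app.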

Where DAST Still Helps

DAST catches configuration issues that SAST can’t: missing security headers, permissive CORS policies, exposed server information, SSL/TLS misconfigurations. These are deployment-level problems, not code-level problems, and vibe-coded apps tend to ship with terrible default configurations because the AI optimizes for “it works locally.” Running ZAP or Nuclei against your deployed application catches the infrastructure-layer gaps.
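The header checks are simple enough to sketch directly. The baseline list below is illustrative, not a standard, and the sample response is a made-up example of what a default Express deployment with wide-open CORS might return:

```python
# Illustrative baseline only; which headers matter depends on the application.
EXPECTED_HEADERS = (
    "content-security-policy",
    "strict-transport-security",
    "x-content-type-options",
    "x-frame-options",
)

def missing_security_headers(headers: dict) -> list:
    """Return the expected security headers that the response does not set."""
    present = {name.lower() for name in headers}
    return [name for name in EXPECTED_HEADERS if name not in present]

def leaks_server_info(headers: dict) -> list:
    """Return headers that advertise the server stack."""
    present = {name.lower() for name in headers}
    return sorted(present & {"server", "x-powered-by"})

def permissive_cors(headers: dict) -> bool:
    """True if any origin is allowed to read responses."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("access-control-allow-origin", "").strip() == "*"

# Hypothetical response headers from a freshly deployed app.
sample_response_headers = {
    "Content-Type": "text/html; charset=utf-8",
    "X-Powered-By": "Express",
    "Access-Control-Allow-Origin": "*",
}

print(missing_security_headers(sample_response_headers))
print(leaks_server_info(sample_response_headers))
print(permissive_cors(sample_response_headers))
```

For this sample, all four baseline headers come back missing, `x-powered-by` is reported as a leak, and CORS is flagged as permissive. Real DAST tools run far more checks than this, but the principle is the same: these problems are visible in the response, so they can be detected from outside.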

Nuclei deserves a specific mention. Its community-maintained template library now exceeds 11,000 templates, and ProjectDiscovery has introduced AI-powered template generation — describe a check in natural language, get a YAML template. A recent pull request added AI Security DAST templates specifically targeting AI-system patterns. It’s not solving the fundamental architectural problem, but it’s the closest DAST has gotten to being vibe-code-aware.


The SCA Gap: When Dependencies Don’t Exist

Software Composition Analysis (SCA) tools — Snyk, npm audit, Dependabot, Socket.dev — check your project’s dependencies against vulnerability databases. If you’re using lodash@4.17.20 and there’s a CVE for that version, SCA flags it. This has been one of the most effective automated security practices for the past decade.

AI-generated code breaks SCA because the dependencies are made up.

Slopsquatting

The term, coined by security researcher Seth Larson, describes what happens when AI coding tools recommend packages that don’t exist in any registry. A March 2025 study analyzing 576,000 AI-generated code samples found that roughly 20% recommended packages that aren’t real. Worse, 43% of those hallucinated package names are consistent across different AI runs — meaning an attacker can predict which fake names the AI will suggest, register them, and fill them with malicious code.

That’s exactly what happened. In January 2026, a hallucinated npm package called react-codeshift spread through 237 repositories via AI-generated code. Nobody deliberately planted the package name in the AI’s training data. The AI hallucinated it, multiple developers installed it when their AI suggested it, and eventually someone registered it with malicious code. The supply chain attack was automated by the AI itself.

CVE-based SCA tools can’t flag a package like this: it is brand new, has no CVE, and doesn’t appear in any vulnerability database yet. npm audit would report zero issues for react-codeshift. The package existed, it had no known CVEs, and its package.json looked normal. The malicious behavior was in the code, not in the metadata.
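What a CVE lookup cannot tell you, the registry itself sometimes can: whether the package exists at all, and how long it has existed. Below is a minimal sketch that queries the public npm registry and flags packages that are missing or very new. The 90-day threshold is an arbitrary assumption, the function names are mine, and a young package is a reason to look closer, not proof of malice:

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from datetime import datetime, timezone

REGISTRY = "https://registry.npmjs.org/"

def fetch_metadata(name: str):
    """Return registry metadata for a package, or None if it is not published."""
    url = REGISTRY + urllib.parse.quote(name, safe="@")
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise

def assess(metadata, now: datetime, min_age_days: int = 90) -> str:
    """Classify a package from its registry metadata (threshold is an assumption)."""
    if metadata is None:
        return "not-published"
    # Registry timestamps look like 2026-01-10T00:00:00.000Z
    created = datetime.fromisoformat(metadata["time"]["created"].replace("Z", "+00:00"))
    age_days = (now - created).days
    return "recently-registered" if age_days < min_age_days else "established"

# Offline demonstration with made-up metadata in the registry's shape.
now = datetime(2026, 2, 1, tzinfo=timezone.utc)
print(assess(None, now))
print(assess({"time": {"created": "2026-01-10T00:00:00.000Z"}}, now))
print(assess({"time": {"created": "2015-06-01T12:30:00.000Z"}}, now))
```

In practice you would call `assess(fetch_metadata(name), datetime.now(timezone.utc))` for every dependency the AI added. A "not-published" result on a name the AI suggested is the hallucination itself; "recently-registered" is the window in which a slopsquatter would have claimed it.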

What Different SCA Tools Catch

The SCA landscape has split into two camps. Traditional CVE-based tools (npm audit, Dependabot, basic Snyk scanning) check packages against known vulnerability databases. If the vulnerability has a CVE, they catch it. If it doesn’t, they don’t. For established packages with active security research, this works. For hallucinated packages, newly registered packages, and packages with obfuscated malicious behavior, it’s blind.

Socket.dev represents the newer approach — it analyzes package behavior rather than just checking CVE databases. It detects install scripts that exfiltrate environment variables, network calls to unexpected domains, obfuscated code that decodes at runtime, and sudden changes in maintainer behavior. This behavioral analysis catches supply chain attacks that CVE databases haven’t catalogued yet.

Snyk’s DeepCode AI combines symbolic analysis with AI to scan code snippets as they’re generated, catching vulnerable patterns inside the IDE before they reach the repository. This is closer to where SCA needs to go for vibe-coded apps — flagging issues at generation time rather than after the package is installed and the code is committed.

For the dependency problems I covered in Part 4, no single SCA tool covers the full risk surface. The practical answer is layering: npm audit for known CVEs, Socket.dev for behavioral anomalies, and manual verification that the packages your AI suggested actually exist and are what they claim to be.
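The manual verification step can be partly scripted. The sketch below is one possible approach, not a substitute for behavioral analysis or a human look at the package page. It takes a parsed package.json, asks the public npm registry whether each dependency exists, and flags anything missing or registered very recently. The function names, the 90-day threshold, and the injectable lookup parameter are illustrative assumptions. The only external behavior relied on is that registry.npmjs.org answers 404 for unknown packages and includes a time.created field for known ones.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from datetime import datetime, timezone

REGISTRY = "https://registry.npmjs.org/"  # public npm registry


def fetch_packument(name):
    """Return registry metadata for a package, or None if it does not exist."""
    # Scoped names such as @scope/pkg need the slash percent-encoded.
    url = REGISTRY + urllib.parse.quote(name, safe="@")
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def audit_dependencies(package_json, lookup=fetch_packument,
                       min_age_days=90, now=None):
    """Flag dependencies that are missing from the registry or very new."""
    now = now or datetime.now(timezone.utc)
    names = {**package_json.get("dependencies", {}),
             **package_json.get("devDependencies", {})}
    findings = {}
    for name in sorted(names):
        meta = lookup(name)
        if meta is None:
            findings[name] = "not found in registry"
            continue
        created = meta.get("time", {}).get("created")
        if created is None:
            findings[name] = "no creation date in metadata"
            continue
        created_at = datetime.fromisoformat(created.replace("Z", "+00:00"))
        age = (now - created_at).days
        if age < min_age_days:
            findings[name] = f"registered only {age} days ago"
    return findings
```

A young package is not proof of malice, and an old one is not proof of safety. The point of the age check is only to decide which names deserve a manual look first.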


What’s Actually Working: The New Wave

The gap between what traditional tools catch and what vibe-coded apps need has spawned a new generation of security tools. Some are AI-native — they use LLMs to reason about code instead of pattern-matching. Others take hybrid approaches, combining traditional analysis with AI-powered reasoning. A few are specifically designed for vibe-coded applications.

LLM-Augmented SAST

The most promising near-term improvement is using LLMs to post-process traditional SAST output. The same January 2026 study that exposed SAST’s false positive rates also tested layering LLM agents on top of the output. The best configuration reduced the initial false positive rate from 98.3% to 6.3%. The LLM reads the flagged code in context, understands what it’s doing, and determines whether the flag is legitimate or noise.

This doesn’t solve the blind spot problem — the LLM is still working from SAST’s initial findings, so absent code remains invisible. But it makes SAST output actually usable. Instead of 750 alerts where 700 are false positives, you get 50 alerts where 47 are real. That’s the difference between a report nobody reads and a report that drives fixes.

Neuro-Symbolic Analysis (IRIS)

IRIS, published at ICLR 2025, takes a different approach. Instead of post-filtering SAST output, it combines LLM reasoning with CodeQL’s static analysis in a neuro-symbolic framework. The LLM identifies potential vulnerability patterns through code comprehension, then CodeQL validates them with formal analysis. Using GPT-4, IRIS detected 55 vulnerabilities across 30 Java projects — 103.7% more than CodeQL alone. It found 4 previously unknown vulnerabilities. Even a smaller model (DeepSeekCoder 7B) detected 52 vulnerabilities, showing this approach doesn’t require cutting-edge models.

The false discovery rate is still high at 84.82%, but that is 5.21 percentage points lower than CodeQL by itself. More importantly, IRIS catches vulnerability categories that pure pattern-matching misses: it can reason about whether an authorization check is semantically correct, not just whether one exists.
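To make the IRIS figures concrete, here is a small arithmetic check. The CodeQL baseline is not quoted in the text above; it is back-calculated from the two reported numbers (55 detections, 103.7% more than CodeQL alone). The CodeQL false discovery rate is likewise derived, on the assumption that the 5.21 gap is measured in percentage points.

```python
iris_detected = 55      # reported for the GPT-4 configuration
relative_gain = 1.037   # "103.7% more than CodeQL alone"
iris_fdr = 84.82        # percent, as reported
fdr_gap = 5.21          # assumed to be percentage points

# Baseline implied by the two reported figures, rounded to a whole count.
implied_codeql_detected = round(iris_detected / (1 + relative_gain))

# Recompute the gain from the implied baseline as a consistency check.
recomputed_gain = (
    (iris_detected - implied_codeql_detected) / implied_codeql_detected * 100
)

# False discovery rate = false positives / all reported findings.
implied_codeql_fdr = round(iris_fdr + fdr_gap, 2)

print(implied_codeql_detected)    # 27
print(round(recomputed_gain, 1))  # 103.7
print(implied_codeql_fdr)         # 90.03
```

In other words, roughly doubling the detections came with a modest reduction in noise, and about five out of six reported findings were still false alarms.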

AI-Native Scanners

Two major AI-native security scanners launched in early 2026. Anthropic’s Claude Code Security, released February 2026, uses LLM reasoning to analyze code for vulnerabilities rather than matching patterns. It’s available to Enterprise and Team customers, and free for open-source maintainers. In its initial period, it found over 500 high-severity vulnerabilities in open-source projects. OpenAI’s Codex Security, launched March 2026, scanned over 1.2 million commits during beta, surfacing 792 critical and 10,561 high-severity findings.

Neither tool has been independently audited, so take the numbers with appropriate caution. But the approach is fundamentally different from traditional SAST — instead of matching patterns, these tools read code the way a security reviewer would, reasoning about data flow, trust boundaries, and whether the security model makes architectural sense.

Pre-Publish Security Gates

VibeGuard, published April 2026, targets the specific blind spots of AI-generated code with a pre-publish security gate framework. It checks for five categories: artifact hygiene (source maps, debug files shipping to production), packaging-configuration drift, hardcoded secrets, supply-chain risks, and source-map exposure. The motivation came from a real incident — in March 2026, Anthropic’s own Claude Code CLI shipped a 59.8 MB source map exposing roughly 512,000 lines of TypeScript source. In controlled experiments on 8 synthetic projects, VibeGuard achieved 100% recall and 89.47% precision (F1 = 94.44%).

This is a narrower tool than a full SAST scanner, but it targets exactly the things vibe-coded apps get wrong. AI coding tools are very good at generating code that works. They’re terrible at generating deployment artifacts that are clean and hardened. VibeGuard sits in the gap.
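For readers who want to sanity-check the reported VibeGuard metrics, the sketch below shows how precision, recall, and F1 relate. The counts (17 true positives, 2 false positives, 0 missed issues) are hypothetical values chosen because they reproduce the published percentages. They are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from raw counts."""
    precision = tp / (tp + fp)  # share of reported findings that are real
    recall = tp / (tp + fn)     # share of real issues that were reported
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts that match the reported percentages.
p, r, f1 = precision_recall_f1(tp=17, fp=2, fn=0)
print(f"precision={p:.2%} recall={r:.2%} f1={f1:.2%}")
# precision=89.47% recall=100.00% f1=94.44%
```

With only 8 synthetic projects behind these numbers, a single extra false positive or miss would move them noticeably, which is worth keeping in mind when comparing against tools evaluated on larger corpora.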

Agentic Security Platforms

DryRun Security calls itself “AI-native, agentic” code security. Rather than pattern-matching individual files, it inspects data flow across files and services — understanding how data moves through the application at an architectural level. Their 2025 SAST Accuracy Report showed 88% detection of seeded vulnerabilities out of the box, outperforming four leading traditional static analyzers, with particular strength on complex logic and authorization flaws. In February 2026, they launched a DeepScan Agent that does full-repository security reviews.

Escape raised $18 million in March 2026 specifically to replace legacy scanners with AI agent-driven security testing. Their research team’s methodology is worth studying: they scanned 5,600 publicly accessible vibe-coded applications and found over 2,000 high-impact vulnerabilities. The breakdown is telling — 400+ exposed secrets and 175 instances of personal data exposure, including medical records and bank account numbers. Zero-auth APIs, missing rate limiting, and BOLA/IDOR dominated the findings. These are exactly the vulnerability classes that traditional scanners miss.


What Scanners Miss: The Vibe Code Blind Spots

Across the research, six vulnerability patterns in AI-generated code consistently evade traditional scanning tools. Knowing them means you know what to look for manually, even when the scanner gives you a clean report.

1. Frontend-Only Security Controls

The AI generates a React auth guard that checks localStorage for a JWT before rendering protected routes. The guard works — unauthenticated users see the login page. But the API behind those routes accepts any request, with or without a token. SAST scanning the backend sees API endpoints that take requests and return data. It doesn’t cross-reference with the frontend to check whether server-side enforcement exists. DAST might not reach the endpoints at all if it can’t complete the frontend auth flow.

2. Zero-Auth APIs

Escape’s scan of 5,600 vibe-coded apps found applications with 7–12 public API endpoints performing destructive operations (DELETE, PUT) with no authentication at all. The OpenAPI spec — when one existed — had no security schemes defined. SAST doesn’t flag an endpoint for not having auth middleware, because “no middleware” isn’t a pattern it can match. The code is perfectly valid; it’s just missing a security requirement.
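When an OpenAPI spec does exist, the missing-security-scheme problem can be checked mechanically. The following is a minimal sketch that assumes an OpenAPI 3 document already parsed into a dictionary. The function name and the choice of which methods count as state-changing are illustrative. It relies on the rule that an operation-level security field overrides the top-level one, and that an empty list or an empty requirement object means no authentication is required.

```python
STATE_CHANGING = {"post", "put", "patch", "delete"}


def unauthenticated_operations(spec, methods=STATE_CHANGING):
    """List operations in an OpenAPI 3 spec that declare no auth requirement."""
    global_security = spec.get("security", [])
    findings = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method.lower() not in methods or not isinstance(operation, dict):
                continue  # skip GET etc. and non-operation keys like "parameters"
            # Operation-level security replaces the global default when present.
            security = operation.get("security", global_security)
            # [] means no auth; a {} entry makes auth optional.
            if not security or any(req == {} for req in security):
                findings.append(f"{method.upper()} {path}")
    return sorted(findings)
```

This only tells you what the spec declares. An endpoint can declare a scheme and still skip enforcement in code, so the result is a list of places to start testing, not a verdict.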

3. Missing Rate Limiting

As I showed in Part 5, a login endpoint without rate limiting lets an attacker try the top 1,000 passwords in ten seconds. No scanner flags this because rate limiting is a middleware addition, not a code pattern. The login endpoint itself is correct — it validates credentials and returns a token. The absence of express-rate-limit or its equivalent is a deployment decision, not a code bug.
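To show what the missing control actually does, here is a minimal in-memory sliding-window limiter. It is written in Python purely to illustrate the logic. In an Express app the equivalent would come from middleware such as express-rate-limit. The class name, the default of 5 attempts per 15 minutes, and the injectable clock are assumptions made for the example, and a real deployment would need shared storage rather than a per-process dictionary.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_attempts per key within window_seconds."""

    def __init__(self, max_attempts=5, window_seconds=900, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock
        self._attempts = defaultdict(deque)  # key -> timestamps of recent attempts

    def allow(self, key):
        now = self.clock()
        attempts = self._attempts[key]
        # Drop timestamps that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False  # caller should answer with HTTP 429
        attempts.append(now)
        return True
```

With these defaults, working through 1,000 passwords against one account would take 200 windows of 15 minutes, or about 50 hours, instead of ten seconds.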

4. BOLA/IDOR Without Sequential IDs

The Lovable BOLA breach from Part 5 is the canonical example. The API checked authentication (valid Firebase token) but not authorization (does this token’s user own this project?). SAST sees the firebase.auth() call and considers the endpoint protected. The ownership check that should follow is business logic the scanner can’t infer. DAST could theoretically detect IDOR by testing two different user sessions, but most DAST configurations don’t set up multi-user testing scenarios.
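The difference between the two checks is easier to see in code. This is a simplified sketch, not the code from the incident: the Project type, the exception names, and the in-memory dictionary are all hypothetical stand-ins. The authenticated user ID is assumed to come from an already verified token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    id: str
    owner_id: str
    name: str


class NotFound(Exception):
    pass


class Forbidden(Exception):
    pass


def get_project(projects, authenticated_user_id, project_id):
    """Return a project only if the caller owns it."""
    project = projects.get(project_id)
    if project is None:
        raise NotFound(project_id)
    # Authentication told us who is calling. This line is the authorization
    # step that a BOLA-vulnerable endpoint leaves out.
    if project.owner_id != authenticated_user_id:
        raise Forbidden(project_id)
    return project
```

Delete the ownership comparison and the function still looks protected to a pattern-matcher, because the token verification upstream is untouched.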

5. Insecure Default Configurations

AI-generated code uses Supabase with RLS disabled, Firebase with security rules set to allow read, write: if true, Express with the cors middleware left at its defaults (which allow every origin), and JWT libraries with the algorithms parameter unset (which, depending on the library and version, can permit the none attack). None of these are bugs. They’re all valid configurations that happen to be insecure. SAST would need configuration-specific rules to flag them, and most tools don’t ship with rules for “Supabase table missing RLS policy.”
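A configuration-specific rule does not have to be elaborate. As a rough illustration, the sketch below flags unconditional allow statements in a Firebase security rules file. The regular expression and function name are assumptions for the example, and it deliberately catches only the literal "if true" form, not rules that are permissive in subtler ways.

```python
import re

# Matches statements such as: allow read, write: if true;
OPEN_RULE = re.compile(r"allow\s+[a-z,\s]+:\s*if\s+true\s*;", re.IGNORECASE)


def find_open_rules(rules_text):
    """Return (line_number, statement) pairs for unconditional allow rules."""
    return [
        (number, line.strip())
        for number, line in enumerate(rules_text.splitlines(), 1)
        if OPEN_RULE.search(line)
    ]
```

Run against a rules file exported from the project, an empty result means only that the most blatant misconfiguration is absent.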

6. Artifact Hygiene Failures

Source maps shipped in production, .env files baked into Docker images, node_modules included in deployable artifacts, debug logging active in production. These aren’t code vulnerabilities — they’re packaging and deployment failures that expose source code, secrets, and internal architecture. Traditional SAST and DAST don’t scan build artifacts at all.
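Because these failures live in the build output rather than the source, checking for them means walking the artifact directory. The following is a minimal sketch of such a check, not a reimplementation of any tool named above. The function name and the three categories it looks for are illustrative.

```python
import os


def find_risky_artifacts(root):
    """Walk a build directory and report files that should not ship."""
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        if "node_modules" in dirnames:
            rel = os.path.normpath(os.path.join(rel_dir, "node_modules"))
            findings.append((rel, "bundled node_modules directory"))
            dirnames.remove("node_modules")  # report once, do not descend
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if name.endswith(".map"):
                findings.append((rel, "source map"))
            elif name == ".env" or name.startswith(".env."):
                findings.append((rel, "environment file"))
    return sorted(findings)
```

Pointing it at dist/ or build/ as the last CI step before publishing, and failing the job on any finding, gives a basic pre-publish gate.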


Building a Scanning Pipeline That Works

No single tool covers the full risk surface of a vibe-coded application. The practical answer is layering tools where each one covers a different gap, running them in the right order, and knowing what still requires human review.

Layer 1: Pre-Commit (Catch Secrets Before They Ship)

Before code reaches the repository, run secret detection. This is the highest-ROI automated check because secrets in version control are permanent — even if you delete the file, the secret lives in Git history.

# Scan the repository, including Git history, with Gitleaks
gitleaks detect --source . --verbose

# Or scan the working tree with TruffleHog, reporting only verified secrets
trufflehog filesystem . --only-verified

Configure this as a Git pre-commit hook. Every commit gets scanned. If a secret is detected, the commit is blocked. This is the one layer where automation is genuinely reliable — the patterns are well-defined and false positives are manageable.
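To illustrate why secret detection is comparatively reliable, here is a toy version of what such scanners do at their core. It is a sketch for understanding, not a replacement for Gitleaks or TruffleHog. The three patterns and the function name are illustrative, and real tools ship hundreds of rules plus entropy checks and live verification.

```python
import re

SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded credential": re.compile(
        r"(?i)\b(?:api_key|secret|password)\b\s*[:=]\s*['\"][^'\"]{12,}['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, label) for every line matching a secret pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```

Secrets have recognizable shapes, which is exactly what pattern-matching is good at. The authorization flaws discussed earlier have no such shape.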

Layer 2: CI Pipeline (SAST + SCA on Every Push)

Run SAST and SCA in your CI pipeline. The goal here isn’t perfection — it’s catching the 30% of issues that pattern-matching handles well.

# Semgrep with auto-config (pulls relevant rule sets for your stack)
semgrep --config=auto --error --json ./src > semgrep-results.json

# npm audit for known dependency CVEs
npm audit --audit-level=high

# Socket.dev CLI for behavioral dependency analysis
socket scan create --repo . --branch main

The critical step is filtering SAST output. If your team is drowning in false positives, start with only the high-confidence rules. Semgrep’s p/security-audit ruleset is more targeted than --config=auto. For SCA, differentiate between development and production dependencies — a CVE in a dev-only testing library is lower priority than one in your authentication middleware.
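One way to do that filtering is to post-process the JSON report that the semgrep command above writes. The sketch below assumes the usual shape of that report: a results list whose entries carry check_id, path, start.line, and an extra object with severity and rule metadata. The metadata.confidence field is supplied by rule authors and may be absent, so findings without it are dropped here, which is a deliberate and debatable choice.

```python
import json


def high_confidence_findings(report, severities=("ERROR",), confidences=("HIGH",)):
    """Keep only findings whose severity and rule confidence pass the bar."""
    kept = []
    for result in report.get("results", []):
        extra = result.get("extra", {})
        confidence = str(extra.get("metadata", {}).get("confidence", "")).upper()
        if extra.get("severity") in severities and confidence in confidences:
            kept.append({
                "rule": result.get("check_id"),
                "path": result.get("path"),
                "line": result.get("start", {}).get("line"),
                "message": extra.get("message"),
            })
    return kept


def load_report(path="semgrep-results.json"):
    """Read a Semgrep JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Starting strict and loosening the thresholds once the backlog is cleared tends to work better than starting with everything and asking the team to ignore most of it.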

Layer 3: Post-Deploy (DAST Against the Running App)

After deployment, run DAST against your actual application. This catches configuration issues that don’t exist in source code.

# Nuclei with community templates
nuclei -u https://yourapp.com -t nuclei-templates/ -severity critical,high

# ZAP baseline scan (mount the working directory so report.html is written to the host)
docker run -v "$(pwd)":/zap/wrk/:rw -t zaproxy/zap-stable zap-baseline.py -t https://yourapp.com -r report.html

For SPAs, use ZAP’s AJAX Spider or Burp’s browser-based crawling rather than the default HTTP crawler. Feed the scanner your OpenAPI spec if you have one — it’ll discover endpoints the crawler misses.

Layer 4: AI-Augmented Review (The New Layer)

This is the emerging layer that didn’t exist a year ago. If you have access to Claude Code Security, Codex Security, or DryRun, run them as a complement to traditional SAST. They cover the architectural reasoning gap — detecting absent controls, evaluating whether authorization logic is semantically correct, and understanding data flow across service boundaries.

If you don’t have access to these commercial tools, you can approximate the approach by running an LLM against your SAST output to filter false positives (the technique from the January 2026 study reduced false positives from 98.3% to 6.3%), or by prompting an LLM to review specific security-critical files with targeted questions: “Does this endpoint verify that the authenticated user owns the requested resource?” “Is there a rate-limiting middleware applied to this route?”
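The false-positive filtering idea can be sketched without committing to any particular model or vendor. In the example below, ask_llm is a hypothetical callable you would supply (a thin wrapper around whatever LLM API you use) that takes a prompt string and returns the model's text. The prompt wording and the finding fields are assumptions, not the setup used in the study.

```python
PROMPT_TEMPLATE = """You are reviewing a static-analysis finding.
Rule: {rule}
Location: {path}:{line}
Message: {message}

Code:
{snippet}

Answer with exactly one word: TRUE_POSITIVE or FALSE_POSITIVE."""


def triage(findings, ask_llm):
    """Split findings into (kept, dropped) based on the model's verdict.

    Each finding is a dict with rule, path, line, message, and snippet keys.
    """
    kept, dropped = [], []
    for finding in findings:
        verdict = ask_llm(PROMPT_TEMPLATE.format(**finding)).strip().upper()
        # Fail closed: only an explicit FALSE_POSITIVE removes a finding,
        # so an unclear or malformed answer keeps it in front of a human.
        if verdict == "FALSE_POSITIVE":
            dropped.append(finding)
        else:
            kept.append(finding)
    return kept, dropped
```

Logging the dropped findings and spot-checking them regularly is worth the effort, since a model that is confidently wrong here silently hides real vulnerabilities.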

Layer 5: Manual Review (The Irreplaceable Layer)

I’ve been in application security for over two decades. Every engagement I do at VULNEX starts with automated scanning and ends with manual review, because the automated tools always miss something. For vibe-coded apps, the manual review is even more important because the vulnerability classes are architectural.

The manual review checklist is shorter than people think. For each API endpoint: does it check authentication? Does it check authorization — not just “is this user logged in” but “is this user allowed to access this specific resource”? Is the client sending any data that controls server-side behavior (user IDs, role flags, price overrides) without server-side validation? Are there admin functions accessible to regular users?
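The first two questions on that list lend themselves to a small harness. The sketch below is one way to structure it, under stated assumptions: send is a hypothetical function you provide that performs the HTTP request and returns the status code, the endpoints listed all point at resources owned by the first test user, and any status below 400 counts as access granted.

```python
def probe_endpoints(endpoints, send, owner_token, other_token):
    """Check each endpoint without a token and with another user's token.

    endpoints: list of (method, path) pairs for resources the owner controls.
    send(method, path, token) -> HTTP status code; token may be None.
    """
    findings = []
    for method, path in endpoints:
        if send(method, path, None) < 400:
            findings.append((method, path, "accessible without authentication"))
        elif send(method, path, other_token) < 400:
            findings.append((method, path, "accessible to a different user"))
        elif send(method, path, owner_token) >= 400:
            # If even the owner is rejected, the test itself is misconfigured.
            findings.append((method, path, "owner request failed, check test setup"))
    return findings
```

It needs two real test accounts and a list of endpoints to be useful. The remaining questions, such as client-controlled price or role fields, still need a person reading the request bodies.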

A focused manual review of the auth and authorization layer takes hours, not days, and it catches the issues that every automated tool misses.

What This Costs

For a solo founder or small team, here’s roughly what this takes. Layers 1–3 use free, open-source tools — Gitleaks, Semgrep, npm audit, Socket.dev’s free tier, Nuclei. Setting up the full CI pipeline takes an afternoon if you’re comfortable with GitHub Actions or similar, a weekend if you’re starting from scratch. Layer 4 varies: Claude Code Security is free for open-source projects, DryRun and Escape have commercial pricing that typically starts in the low hundreds per month. Layer 5 is where it gets expensive if you don’t have security expertise in-house. A focused auth and authorization review from a security consultancy typically runs €3,000–€10,000 depending on application size and complexity. That’s real money for an early-stage startup — but skipping it is how the breaches from Part 3 happened.


The Scanning Checklist

Run this against your vibe-coded application. Each item addresses a specific gap in traditional scanning.

Secrets (Pre-Commit):

  1. Run gitleaks detect --source . --verbose and trufflehog filesystem . --only-verified; expect zero findings before any commit
  2. Search frontend bundles for leaked keys: grep -r "sk-\|API_KEY\|SECRET\|Bearer\|supabase\|firebase" dist/ build/
  3. Verify .env files were never committed: git log --all --diff-filter=A -- '*.env' '.env*'

SAST (CI Pipeline):

  4. Run semgrep --config=p/security-audit --error ./src (use the focused ruleset, not --config=auto, to keep noise manageable)
  5. Review every high or critical finding manually; look for innerHTML, eval(), dangerouslySetInnerHTML, unsanitized SQL

SCA (CI Pipeline):

  6. Run npm audit --audit-level=high and address all high and critical CVEs
  7. Verify dependencies are real: check that every package in package.json has a legitimate npmjs.com page with downloads and a real maintainer
  8. Run Socket.dev or Snyk for behavioral analysis, which catches supply chain attacks that CVE databases miss

DAST (Post-Deploy):

  9. Run nuclei -u https://yourapp.com -severity critical,high against your deployed app
  10. Check security headers and CORS: curl -s -D- https://yourapp.com | grep -i "x-frame\|x-content-type\|strict-transport\|content-security-policy" and test with Origin: https://evil.com

Manual (The Gaps):

  11. Test every API endpoint without the frontend: does it require authentication?
  12. Test cross-user access: can User A access User B’s resources by changing IDs?
  13. Test admin endpoints with a regular user’s token, send 100 rapid login requests to verify rate limiting (expect a 429), and confirm Supabase RLS / Firebase security rules are enabled and scoped to the authenticated user

This pipeline won’t catch everything. But it covers the layers where automated tools are reliable, flags the areas where they’re blind, and directs manual effort to where it matters most. If you’re running zero scanning today — which, based on what I see in assessments, describes most vibe-coded applications — starting with items 1, 2, 11, and 12 gives you the most security value for the least effort.


What You Should Take From This

Traditional security scanners aren’t broken. They’re solving a different problem. They were built for a world where developers understand their code and make localized mistakes — a forgotten parameterized query, a misused crypto function, an outdated dependency. AI-generated code introduces a new class of vulnerability: architecturally correct code with absent security controls. The login works, the JWT validates, the database responds — and the fact that any authenticated user can read any other user’s data isn’t something a pattern-matcher can flag.

The scanning landscape is evolving fast. AI-native tools that reason about code rather than pattern-matching against it are starting to close the gap. The IRIS approach (neuro-symbolic analysis), LLM-based false-positive filtering, and pre-publish gates like VibeGuard are all steps in the right direction. But as of mid-2026, no automated tool reliably catches broken authorization logic, missing rate limiting, or client-side-only security controls. Those still require human review.

My workflow at VULNEX: Gitleaks and TruffleHog for secrets, Semgrep for pattern-based issues, npm audit plus Socket.dev for dependencies, Nuclei for the deployed surface, and then manual testing of every auth and authorization boundary. The automated layers take minutes, the manual review takes hours — and in my experience, the manual review is where the critical vulnerabilities surface.

If you’re a solo founder or non-security engineer — which describes most people building with AI coding tools — Layer 5 is the hard one. You can’t review what you don’t know how to find. My practical advice: run Layers 1–3 at minimum, they’re free and they catch real issues. If your application handles user data, payments, or anything sensitive, budget for a professional security review before you launch. It doesn’t have to be a full pentest — a focused review of your auth and authorization boundaries, scoped to 2–3 days, catches the architectural issues that automation misses. Part 8 of this series will go deeper on this with a complete founder’s checklist.

As always: trust nothing, verify everything.

