OWASP A06: Insecure Design

Can an LLM find design flaws
in code it can't read?

LLMs review code one file at a time. Design flaws live between files.

Joern CPG · Express + Sequelize · OWASP A06 · LLM-Assisted
02

The keyhole problem

LLMs see code through a keyhole: one file, one function, one context window at a time.

  • Local bugs are visible: an SQL injection usually lives in one function
  • Design flaws are cross-cutting: they live in relationships between components
  • "No auth on this route" requires knowing all routes, all middleware, and which routes lack it
  • Design flaws are defined by absence and relationships
[Figure: a design flaw spans 6 cells; the spotlight sees only 1.]
03

Three approaches, three blind spots

Bug finding is a solved problem. Design analysis is not. Design flaws require negative queries (what's missing?) and cross-file relationship analysis (what connects to what?).

LLM reads code

Understands intent and finds local bugs, but can't hold the architecture: it sees one file at a time.

SAST (Semgrep)

Sees the entire codebase, with 3,000+ rules for injection and XSS, but no semantic reasoning: it can't ask "why is this wrong?"

The gap

Pattern matchers can't express "find all routes that do NOT have auth middleware." Design analysis needs something else.

[Figure: LLM (semantics) and SAST (coverage) overlap on bug finding, which is solved. Design analysis lies below both. Can a CPG (structure + flow) reach it?]
04

Three graphs, merged into one

A Code Property Graph overlays three views of the same code. Together they capture structure, control flow, and data movement: 443,000 nodes for the target app.

AST Abstract Syntax Tree: the code’s grammatical structure. Functions contain parameters, if-statements contain branches. This is what your editor sees.
CFG Control Flow Graph: the possible execution paths. Which branch runs after the if? Where does a loop jump back to? Dashed blue lines trace every route the program can take.
PDG Program Dependence Graph: data and control dependencies, i.e. where data actually flows and which conditions guard each statement. A parameter feeds into a function call three levels away. Green lines reveal the invisible wiring that makes taint analysis possible.
[Figure: one function (params, an if-stmt with then/else branches, a call with args) drawn three ways: AST (structure), CFG (execution, control flow) and PDG (data flow).]
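To make the overlay concrete, here is a toy model in plain Scala (the language CPGQL is built on): one set of nodes with three kinds of labelled edges, plus a traversal that follows a single layer. It is a sketch under simplifying assumptions, not Joern's schema: the node names and the example function are hypothetical, and the PDG layer here carries only data dependencies.

```scala
object ToyCpg {
  sealed trait Layer
  case object Ast extends Layer // syntax: parent -> child
  case object Cfg extends Layer // execution: statement -> next statement
  case object Pdg extends Layer // data dependence: definition/use -> later use

  final case class Edge(from: String, to: String, layer: Layer)

  // Hypothetical function: f(param) { if (param) sink(param) else log() }
  val edges: List[Edge] = List(
    Edge("f", "param", Ast),
    Edge("f", "if", Ast),
    Edge("if", "sink(param)", Ast),
    Edge("if", "log()", Ast),
    Edge("param", "if", Cfg),
    Edge("if", "sink(param)", Cfg),
    Edge("if", "log()", Cfg),
    Edge("param", "if", Pdg),
    Edge("param", "sink(param)", Pdg)
  )

  // One-step neighbours of `node` along a single layer.
  def out(node: String, layer: Layer): List[String] =
    edges.collect { case Edge(`node`, to, `layer`) => to }

  // Everything reachable from `start` following only `layer` edges (breadth-first).
  def reachable(start: String, layer: Layer): Set[String] = {
    @annotation.tailrec
    def loop(queue: List[String], seen: Set[String]): Set[String] =
      queue match {
        case Nil => seen
        case n :: rest =>
          val fresh = out(n, layer).filterNot(seen.contains)
          loop(rest ++ fresh, seen ++ fresh)
      }
    loop(List(start), Set.empty)
  }
}
```

Following only `Pdg` edges from `param` reaches `if` and `sink(param)`, which is the kind of question taint analysis asks; following `Ast` edges from `f` recovers the syntax tree instead. The nodes never change, only the edge label being followed.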


05

Two tools, two levels of abstraction

Joern

CPG Engine
  • Expression-level: every call, every argument, every data flow
  • Query language: CPGQL (Scala-based traversals)
  • Strengths: negative sub-traversals, taint analysis, get_source
Answered all 20 questions (11 solo)

GitNexus

Code Knowledge Graph
  • Symbol-level: functions, files, relationships, modules
  • Query language: Cypher + semantic search
  • Strengths: blast radius, community detection, execution flows
Contributed to 8 questions (architectural context)
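"Blast radius" is read here in its generic sense: the set of functions that can transitively call a given symbol, i.e. everything a change to it could affect. The sketch below computes it with a breadth-first walk over an inverted call graph. The call edges are hypothetical, and GitNexus's own computation may well differ.

```scala
object BlastRadius {
  // Hypothetical call edges: caller -> callees.
  val calls: Map[String, Set[String]] = Map(
    "changePasswordRoute" -> Set("changePassword"),
    "changePassword" -> Set("hash"),
    "loginRoute" -> Set("login"),
    "login" -> Set("hash", "authorize"),
    "resetPassword" -> Set("hmac")
  )

  // Inverted edges: callee -> callers.
  val callers: Map[String, Set[String]] =
    calls.toList
      .flatMap { case (caller, callees) => callees.toList.map(callee => callee -> caller) }
      .groupBy(_._1)
      .map { case (callee, pairs) => callee -> pairs.map(_._2).toSet }

  // Every function that can transitively reach `target` through calls.
  def blastRadius(target: String): Set[String] = {
    @annotation.tailrec
    def loop(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        val next = frontier.flatMap(f => callers.getOrElse(f, Set.empty[String])) -- seen
        loop(next, seen ++ next)
      }
    loop(Set(target), Set.empty)
  }
}
```

This is symbol-level reasoning (who calls whom), which complements Joern's expression-level view of individual arguments and data flows.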
06

The LLM doesn't read code. It queries the graph

  • The CPG encodes cross-file relationships in a queryable form
  • The LLM formulates a question → Joern traverses → short structured answer
  • Negative sub-traversals: find what's missing, not what's there
  • Pattern matchers can't express "find all routes that do NOT have auth middleware"

The graph carries the cross-partition knowledge. The LLM provides the reasoning. Neither works alone.

[Figure: app.get() with no security.* middleware → handler() → db.update(), with a GAP where auth should be. The CPG makes absence queryable.]
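The figure's query pairs a positive step (the handler reaches a database write) with a negative one (the route carries no security.* middleware). A minimal Scala sketch over a toy graph, assuming hypothetical routes, handlers and call edges; in Joern the negative half would be a `whereNot(...)` step rather than `filterNot`.

```scala
object AbsenceQuery {
  // Hypothetical routes: path, middleware argument code, handler name.
  final case class Route(path: String, middleware: List[String], handler: String)

  // Hypothetical call edges from handlers down to database operations.
  val calls: Map[String, List[String]] = Map(
    "updateUser" -> List("db.update"),
    "listProducts" -> List("db.findAll"),
    "deleteFeedback" -> List("loadFeedback", "db.destroy"),
    "loadFeedback" -> List("db.findOne")
  )

  val dbWrites: Set[String] = Set("db.create", "db.update", "db.destroy")

  // Positive part: does `fn` reach any target through the call edges?
  def reaches(fn: String, targets: Set[String], seen: Set[String] = Set.empty): Boolean =
    targets.contains(fn) ||
      (!seen.contains(fn) &&
        calls.getOrElse(fn, Nil).exists(callee => reaches(callee, targets, seen + fn)))

  // Negative part: keep routes where the "security.*" sub-query finds nothing.
  def unguardedWrites(routes: List[Route]): List[Route] =
    routes
      .filter(r => reaches(r.handler, dbWrites))
      .filterNot(r => r.middleware.exists(_.startsWith("security.")))
}
```

A read-only route is never flagged, and a write behind security.* middleware is filtered out; only the unguarded write survives, which is exactly the "gap" in the figure.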
07

LLMs can query the graph, but someone still has to tell them where to look

LLMxCPG (Lekssays et al., 2025) combines fine-tuned LLMs with Joern's CPG for vulnerability detection.

What it proved

  • CPGQL is learnable (but needs fine-tuning)
  • CPG slicing reduces code by 68–91%
  • 15–40% F1 improvement over baselines

What it doesn't do

  • No architectural discovery
  • No negative queries ("what's missing?")
  • No whole-codebase analysis
[Figure: HUMAN (CWE + code) → LLMxCPG-Q → Joern → code slice → LLMxCPG-D → Vuln? (yes/no)]

The human still supplies the CWE and the code, so the bottleneck remains.

08

Two passes: learn the architecture, then find the flaws

Input (Stack Profile): package.json → Express, Sequelize, custom auth
Pass 1 (Generic Structural Queries): 9 questions: routes? middleware? validation?
LLM: structured facts → targeted CPGQL; no source code, only query results
Pass 2 (Design Analysis): 8 CWEs, 11 queries → 30 findings

LLMxCPG: human provides CWE + snippet
OUR APPROACH: framework name only → discover everything
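One way to picture the hand-off between the passes is as data: Pass 1 produces a small record of facts, and Pass 2 keeps only the CWE checks those facts make relevant. The record's fields, the relevance rules and the question wording below are hypothetical; the numbers in the usage test are the Pass 1 results reported on the next slide.

```scala
object TwoPassPlan {
  // Pass 1 output: structured facts, no source code (field names hypothetical).
  final case class StackFacts(
      routes: Int,
      uploadRoutes: Int,
      validationLibraryCalls: Int,
      rateLimitedRoutes: Int
  )

  // A Pass 2 check: the CWE, the design question, and when it is worth asking.
  final case class Check(cwe: Int, question: String, relevant: StackFacts => Boolean)

  // Illustrative relevance rules, not the ones used in the study.
  val checks: List[Check] = List(
    Check(269, "Which routes lack auth middleware?", f => f.routes > 0),
    Check(434, "Do upload routes validate file content?", f => f.uploadRoutes > 0),
    Check(602, "Where is input accepted with no server-side validation?", f => f.validationLibraryCalls == 0),
    Check(799, "Which routes have no rate limit?", f => f.rateLimitedRoutes < f.routes)
  )

  // Pass 2 plan: the CWEs whose questions the Pass 1 facts make relevant.
  def plan(facts: StackFacts): List[Int] =
    checks.filter(_.relevant(facts)).map(_.cwe)
}
```

The design choice this illustrates is that Pass 2 never needs the source code, only the facts; an app without upload routes would simply skip the CWE-434 question.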
09

9 questions, zero lines of code read

The LLM formulated CPGQL queries using only "this is an Express app with Sequelize."

S1 Routes

109 handlers

All in server.ts, 3 route groups

// CPGQL
cpg.call.name("get|post|put|delete")
  .where(_.file.name("server.ts"))
  .l.size
// → 109
S2 Middleware

9 reusable

security.* pattern identified

// CPGQL: count how often each argument is reused
cpg.call.name("get|post|...")
  .flatMap(c => c.argument.map(a => a.code))
  .l.groupBy(identity)
  .map { case (arg, uses) => arg -> uses.size }
S3 Auth

Custom, not passport

isAuthorized, denyAll, appendUserId

// CPGQL + GitNexus
cpg.method
  .where(_.file.name(
    "insecurity.ts"))
  .name.l
// → 19 functions
S4 DB Ops

33 files

Sequelize + MongoDB, writes in 20 route files

click to flip
// CPGQL
cpg.call.name("findOne|findAll
  |create|update|destroy")
  .groupBy(_.file.name).l
// → 33 server-side files
S5 Validation

Zero

No validation library; ad-hoc checks are nearly absent (see CWE-602)

// CPGQL
cpg.call
  .name("check|body|query|param|validationResult")
  .l.size
// → 0
S6 Errors

1 global handler

Leaks raw errors + Express version

// CPGQL: files that send 4xx/5xx responses
cpg.call.name("status")
  .where(_.code(".*status\\((4|5)\\d\\d\\).*"))
  .file.name.dedup.l
// → 30+ files
S7 Uploads

4 routes

Validators exist but always pass

// CPGQL
cpg.call.name("single|array")
  .where(_.code(
    ".*(upload|multer).*"))
  .l
// → 4 multer configs
S8 Rate Limits

4 of 109 (3.7%)

Only reset-password + 2FA

// CPGQL
cpg.call.name("rateLimit")
  .map(c => (
    c.code, c.lineNumber))
  .l
// → 4 endpoints
S9 WebSocket

1 handler, no auth

socket.io, no JWT on connection

// CPGQL
cpg.call.name("on")
  .where(_.code(
    ".*(connection).*"))
  .l
// → 1 handler
10

8 CWEs, 11 queries, 30 confirmed findings

CWE · Finding · Severity
CWE-269 · 58 routes (53%) have no auth middleware · Critical

Negative sub-traversal on app.verb() calls in server.ts, filtering out routes with security.* arguments. Cross-referenced with path-scoped app.use() auth. Notable: GET /rest/user/change-password, admin endpoints, all file uploads left unprotected.

CWE-311 · MD5 passwords, hardcoded HMAC key, plaintext TOTP · Critical

hash() uses crypto.createHash('md5') with no salt, no stretching. hmac() key 'pa4qacea4VK9t9nGv7yZtwmj' hardcoded in source. Card numbers stored as plaintext integers. TOTP secrets unencrypted in DB.

CWE-434 · 4 upload routes; validators exist but always pass · High

checkFileType checks extension for challenge solving but always calls next(). checkUploadSize always calls next(). No magic byte validation, no content-type check, no multer fileFilter configured.

CWE-501 · 22 DB ops use untrusted input directly · Critical

Mass assignment in POST /api/Users: full req.body to Sequelize with no field allowlist. Chatbot query text overwrites username. req.body.UserId used instead of middleware-injected value in 14+ files.

CWE-602 · 5 validation checks across 109 handlers (4.6%) · High

All 5 are password-emptiness checks. Zero type coercion, zero regex, zero schema validation. Model-layer sanitization exists but is output encoding, not input validation.

CWE-653 · Unauthenticated users reach hash and JWT signing · High

changePassword calls hash(), resetPassword calls hmac(), chatbot calls authorize() and verify(). Prometheus metrics endpoint exposed without auth.

CWE-799 · Login has no rate limiting; reset uses a spoofable header · Medium

4/109 routes rate-limited (reset-password + 2FA only). All use 5min/100req. Reset-password keyGenerator uses X-Forwarded-For, which is trivially bypassable.

CWE-841 · Purchase workflow has no state machine · High

Checkout verifies basket and delivery but NOT address or payment card. No session-level step tracking. Angular frontend stores state in sessionStorage. Coupon applied to any basket without ownership check.
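The CWE-269 query has two halves: routes carrying no security.* argument, cross-referenced with path-scoped app.use() auth. A minimal Scala sketch of that cross-reference, assuming a simplified route model (hypothetical paths and middleware names) and whole-segment prefix matching as an approximation of how Express mounts middleware:

```scala
object RouteCoverage {
  // Simplified route model (hypothetical): verb, path, and the code of each
  // middleware argument passed directly to app.verb(...).
  final case class Route(verb: String, path: String, middleware: List[String])

  // app.use(prefix, mw): mw runs for every route mounted under prefix.
  final case class Mount(prefix: String, middleware: String)

  def isAuth(code: String): Boolean = code.startsWith("security.")

  // Whole-segment prefix test: "/api/Cards" covers "/api/Cards/1", not "/api/CardsX".
  def under(path: String, prefix: String): Boolean =
    path == prefix || path.startsWith(prefix.stripSuffix("/") + "/")

  def isProtected(r: Route, mounts: List[Mount]): Boolean =
    r.middleware.exists(isAuth) ||
      mounts.exists(m => isAuth(m.middleware) && under(r.path, m.prefix))

  // Negative query: routes with no auth of either kind.
  def unprotected(routes: List[Route], mounts: List[Mount]): List[Route] =
    routes.filterNot(r => isProtected(r, mounts))

  // Share of unprotected routes, rounded to a whole percent.
  def unprotectedPercent(routes: List[Route], mounts: List[Mount]): Long =
    math.round(100.0 * unprotected(routes, mounts).size / routes.size)
}
```

A mounted non-auth middleware (such as a body parser) does not count as protection, which is why the cross-reference checks `isAuth` on the mount too. With the slide's counts, the same ratio is 58 / 109 ≈ 53%.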

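Why the CWE-799 keyGenerator is bypassable: a limiter counts requests per key, so if the key comes from X-Forwarded-For, the client chooses its own bucket. A Scala sketch of a fixed-window limiter using the slide's 100 requests per 5 minutes; the limiter, the request type and the IP addresses are illustrative assumptions, not express-rate-limit's implementation.

```scala
import scala.collection.mutable

object RateLimitSketch {
  // Fixed-window limiter: at most `limit` requests per key per window.
  final class FixedWindowLimiter(limit: Int, windowMillis: Long) {
    private val counts = mutable.Map.empty[(String, Long), Int]

    def allow(key: String, nowMillis: Long): Boolean = {
      val bucket = (key, nowMillis / windowMillis)
      val n = counts.getOrElse(bucket, 0) + 1
      counts(bucket) = n
      n <= limit
    }
  }

  final case class Request(socketIp: String, headers: Map[String, String])

  // Key taken from a client-controlled header (falls back to the socket address).
  def byForwardedFor(r: Request): String =
    r.headers.getOrElse("x-forwarded-for", r.socketIp)

  // Key taken from the TCP peer address, which the client cannot freely choose.
  def bySocket(r: Request): String = r.socketIp

  // Sends n attempts from one attacker address within a single window and
  // returns how many the limiter lets through.
  def allowedAttempts(n: Int, key: Request => String, rotateHeader: Boolean): Int = {
    val limiter = new FixedWindowLimiter(limit = 100, windowMillis = 5L * 60 * 1000)
    (1 to n).count { i =>
      val headers =
        if (rotateHeader) Map("x-forwarded-for" -> s"10.0.${i / 256}.${i % 256}")
        else Map.empty[String, String]
      limiter.allow(key(Request("203.0.113.7", headers)), nowMillis = 0L)
    }
  }
}
```

Rotating the header lets all 300 attempts through, while keying on the socket address caps them at 100.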
11

30 findings, 30 confirmed, zero false positives

Every finding maps to at least one known exploitable challenge in vulnerable-app.

CWE · Finding · Matched challenges · Example
CWE-269 · 58 unprotected routes · 4 ✓ · Admin Section

Admin Section: admin panel accessible without auth. Change User1's Password: password change via GET, no auth. View Basket: another user's basket viewable. Five-Star Feedback: delete feedback without auth.

CWE-311 · MD5, hardcoded HMAC · 3 ✓ · Weird Crypto

Password Strength: admin password crackable (unsalted MD5 falls quickly to lookup tables and brute force). Weird Crypto: "algorithm it should not use." Two Factor Auth: TOTP secret stored unsafely.

CWE-434 · Fake file validators · 3 ✓ · Upload Type

Upload Size: upload >100kB (checkUploadSize always passes). Upload Type: upload non-PDF/ZIP (checkFileType always passes). Arbitrary File Write: overwrite legal information file.

CWE-501 · 22 untrusted DB ops · 5 ✓ · Admin Registration

Admin Registration: mass assignment sets role:admin. Forged Feedback: post as another user. Forged Review: edit any review. Manipulate Basket: put product in another's basket. NoSQL Manipulation: update multiple reviews.

CWE-602 · 4.6% validation coverage · 4 ✓ · Zero Stars

Zero Stars: UI prevents 0 stars, server accepts it. Empty User Registration: register with empty fields. Repetitive Registration: DRY violation. Payback Time: negative quantities not validated.

CWE-653 · Unprotected security functions · 2 ✓ · Exposed Metrics

Exposed Metrics: Prometheus endpoint without auth. Change User1's Password: password change reaches hash function without auth.

CWE-799 · Login not rate-limited · 2 ✓ · CAPTCHA Bypass

CAPTCHA Bypass: submit 10+ feedbacks in 20 seconds. Reset User3's Password: brute force despite rate limiting (bypassable).

CWE-841 · No workflow state machine · 3 ✓ · Deluxe Fraud

Deluxe Fraud: obtain membership without paying. Payback Time: order makes you rich (negative quantities). Expired Coupon: redeem expired coupon.
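CWE-841 describes something missing: server-side enforcement of step order. A Scala sketch of what such a state machine could look like; the step names are hypothetical, loosely following the checkout stages named in the CWE-841 finding (basket, address, delivery, payment).

```scala
object CheckoutFlow {
  // Hypothetical checkout steps, in the only order the server accepts.
  sealed trait Step
  case object BasketFilled extends Step
  case object AddressChosen extends Step
  case object DeliveryChosen extends Step
  case object PaymentChosen extends Step
  case object OrderPlaced extends Step

  // Legal transitions: each step requires the one before it.
  private val next: Map[Step, Step] = Map(
    BasketFilled -> AddressChosen,
    AddressChosen -> DeliveryChosen,
    DeliveryChosen -> PaymentChosen,
    PaymentChosen -> OrderPlaced
  )

  def advance(current: Step, requested: Step): Either[String, Step] =
    if (next.get(current).contains(requested)) Right(requested)
    else Left(s"cannot go from $current to $requested")

  // Replays a client-supplied sequence of steps, stopping at the first illegal one.
  def run(steps: List[Step]): Either[String, Step] =
    steps.foldLeft[Either[String, Step]](Right(BasketFilled)) { (acc, s) =>
      acc.flatMap(advance(_, s))
    }
}
```

A sequence that jumps from delivery straight to placing the order, skipping payment, is rejected because the server, not the client's sessionStorage, holds the current step.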

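The CWE-501 Admin Registration match (mass assignment setting role:admin via POST /api/Users) comes down to an absent field allowlist. A minimal Scala sketch of such an allowlist; the field names are hypothetical.

```scala
object MassAssignment {
  // Hypothetical allowlist for a registration endpoint.
  val registrationFields: Set[String] = Set("email", "password")

  // Keep only allowlisted keys; everything else in the body is dropped.
  def allowlisted(body: Map[String, String], allowed: Set[String]): Map[String, String] =
    body.filter { case (key, _) => allowed.contains(key) }
}
```

Passing the full body, as the finding describes, would keep "role" -> "admin"; the allowlisted map drops it.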
12

From reading every file to querying for answers

WITHOUT CPG: 974 files, all of which must be read
WITH CPG: 5 functions, read for targeted verification
5 / 974
functions read vs. files in the codebase (about 0.5%; a rough ratio, since functions and files are different units)

The CPG didn't eliminate code reading; it reduced it to targeted verification. 5 of 11 Pass 2 queries needed get_source, reading 10–20 lines each.
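The reading budget implied by this slide, worked out from its own numbers:

```latex
5 \times [10, 20]\ \text{lines} = [50, 100]\ \text{lines read},
\qquad \frac{5}{974} \approx 0.0051 = 0.51\%
```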

13

What this doesn't prove (yet)

  1. Single target codebase. All results come from one application. Not validated across multiple projects.
  2. Intentionally vulnerable target. The application ships with documented vulnerabilities. Real codebases are messier.
  3. Single framework. Express/Sequelize only. Django, Spring, Rails and FastAPI all remain untested.
  4. No SAST baseline. Didn't run Semgrep alongside for direct comparison of what each approach finds.
  5. CPGQL queries not independently validated. No expert review for completeness; alternative query formulations may find more.
  6. Human-in-the-loop. A human directed the session. This describes a workflow, not an automated pipeline.
  7. OWASP A06 scope only. Other categories, such as Injection (which includes XSS) and Security Misconfiguration, were not tested.
  8. Static analysis only. No runtime behavior, no dynamic taint, no exploitation. Structural findings only.
14

Conclusion: Design analysis needs a structural bridge

1

Bug finding is solved. Design analysis is not.

SAST has 3,000+ rules for injection and XSS. Zero rules for "does this workflow enforce step ordering?" or "what percentage of routes lack auth?" Design flaws are architectural.

2

LLMs can't read their way to design flaws.

The keyhole problem is structural, not a context window limitation. Even with infinite context, file-by-file reading loses the cross-file relationships that define design.

3

The CPG is the right abstraction layer.

Not source code (too granular). Not pattern matching (no semantics, no absence detection). A structural graph the LLM can query, reason over, and verify against. Two passes: learn the architecture, then interrogate the design.

Validated on vulnerable-app (443K CPG nodes, 107 known challenges). 30 findings confirmed across 8 CWEs within OWASP A06. Zero false positives. All tools open source.

15

Tools and references

Tools used

Tool · Purpose · URL
Joern · Open-source CPG engine, CPGQL query language · joern.io
GitNexus · Code knowledge graph, community detection, blast radius · github.com/BlockSecCA/GitNexus
Claude · LLM for query generation, reasoning, interpretation · anthropic.com
joern-mcp · MCP server wrapping Joern's HTTP API · github.com/BlockSecCA/joern-mcp
vulnerable-app · Target application: intentionally vulnerable Express/Sequelize app (107 challenges) · github.com/BlockSecCA/vulnerable-app

Standards

Reference · URL
OWASP Top 10: A06:2025 Insecure Design (A04 in the 2021 edition) · owasp.org/Top10/A06
CWE (Common Weakness Enumeration) · cwe.mitre.org
CPGQL Documentation · docs.joern.io/cpgql

Papers

Paper · Authors · URL
LLMxCPG · Lekssays, Mouhcine, Tran, Yu, Khalil (2025) · arxiv.org/abs/2507.16585
Code Property Graphs · Yamaguchi, Golde, Arp, Rieck (2014) · ieeexplore.ieee.org/…