OWASP A06: Insecure Design

Can an LLM find design flaws
in code it can't read?

LLMs review code one file at a time. Design flaws live between files.

Joern CPG · Express + Sequelize · OWASP A06 · LLM-Assisted

github.com/BlockSecCA/llm-cpg-exploration


The keyhole problem

LLMs see code through a keyhole: one file, one function, one context window at a time.

Local bugs are visible, since an SQL injection lives in one function
Design flaws are cross-cutting: they live in relationships between components
"No auth on this route" requires knowing all routes, all middleware, and which routes lack it
Design flaws are defined by absence and relationships
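The keyhole effect can be made concrete with a toy two-file app. This is a minimal sketch, not the project's tooling: the route paths, the `is_authorized` middleware name, and the `/api` prefix are all hypothetical. It shows that a reader of the routes file alone reaches a different conclusion than one who also knows about path-scoped auth declared elsewhere.

```python
# Hypothetical two-file view of an app: route registrations live in one file,
# path-scoped auth (in the style of app.use(prefix, auth)) lives in another.
ROUTES_FILE = [
    ("GET", "/api/orders", []),              # no inline middleware
    ("GET", "/admin/config", []),            # no inline middleware
    ("POST", "/profile", ["is_authorized"]),
]
SERVER_FILE_AUTH_PREFIXES = ["/api"]         # auth applied to everything under /api


def unprotected_local(routes):
    """What a reader of the routes file alone would conclude."""
    return [path for _, path, mw in routes if "is_authorized" not in mw]


def unprotected_global(routes, auth_prefixes):
    """Same check, with the path-scoped auth from the other file applied."""
    return [
        path
        for path in unprotected_local(routes)
        if not any(path == prefix or path.startswith(prefix + "/")
                   for prefix in auth_prefixes)
    ]


print(unprotected_local(ROUTES_FILE))    # ['/api/orders', '/admin/config']
print(unprotected_global(ROUTES_FILE, SERVER_FILE_AUTH_PREFIXES))  # ['/admin/config']
```

The single-file view reports a false alarm on `/api/orders`. A file holding only the `/api` auth declaration would hide `/admin/config` in the same way.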


Three approaches, three blind spots

Bug finding is a solved problem. Design analysis is not. Design flaws require negative queries (what's missing?) and cross-file relationship analysis (what connects to what?).

LLM reads code

Understands intent and finds local bugs, but can't hold the architecture: it sees one file at a time.

SAST (Semgrep)

Sees entire codebase, 3,000+ rules for injection & XSS. No semantic reasoning. It can't ask "why is this wrong?"

The gap

Pattern matchers can't express "find all routes that do NOT have auth middleware." Design analysis needs something else.


Three graphs, merged into one

A Code Property Graph overlays three views of the same code. Together they capture structure, control flow, and data movement: 443,000 nodes for the target app.

AST (Abstract Syntax Tree): the code’s grammatical structure. Functions contain parameters, if-statements contain branches. This is what your editor sees.

CFG (Control Flow Graph): the possible execution paths. Which branch runs after the if? Where does the loop return? Dashed blue lines trace every route the program can take.

PDG (Program Dependence Graph): where data actually flows. A parameter feeds into a function call three levels away. Green lines reveal the invisible wiring that makes taint analysis possible.
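One way to picture the overlay is a single node set with labelled edges, where each label names the layer the edge belongs to. The sketch below is a toy model under that assumption, not Joern's internal representation. The node names are hypothetical and stand for the parts of a three-line handler.

```python
from collections import defaultdict, deque

# Each edge is (source, layer, target). All three layers share the same nodes.
EDGES = [
    ("handler", "AST", "param:req"),
    ("handler", "AST", "call:findOne"),
    ("handler", "AST", "call:send"),
    ("call:findOne", "CFG", "call:send"),
    ("param:req", "PDG", "call:findOne"),
    ("call:findOne", "PDG", "call:send"),
]


def reachable(start, layer):
    """Nodes reachable from `start` following only edges of one layer."""
    out = defaultdict(list)
    for src, label, dst in EDGES:
        if label == layer:
            out[src].append(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in out[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Following data-dependence edges from the request parameter reaches both calls,
# which is the shape of question a taint query asks.
print(sorted(reachable("param:req", "PDG")))  # ['call:findOne', 'call:send']
print(sorted(reachable("param:req", "CFG")))  # []
```

The same start node gives different answers per layer, which is why the three views are kept as distinct edge types rather than merged into one undifferentiated graph.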



Two tools, two levels of abstraction

Joern

CPG Engine

Expression-level: every call, every argument, every data flow
Query language: CPGQL (Scala-based traversals)
Strengths: negative sub-traversals, taint analysis, get_source

Answered all 20 questions (11 solo)

GitNexus

Code Knowledge Graph

Symbol-level: functions, files, relationships, modules
Query language: Cypher + semantic search
Strengths: blast radius, community detection, execution flows

Contributed on 8 questions (architectural context)


The LLM doesn't read code. It queries the graph

The CPG encodes cross-file relationships in a queryable form
The LLM formulates a question → Joern traverses → short structured answer
Negative sub-traversals: find what's missing, not what's there
Pattern matchers can't express "find all routes that do NOT have auth middleware"

The graph carries the cross-partition knowledge. The LLM provides the reasoning. Neither works alone.
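The question → traversal → short answer loop, with a negative sub-traversal in the middle, can be sketched over a handful of in-memory call nodes. This is an illustration only: `where_not` is a hypothetical helper, not Joern's API, and the three routes are invented. Only the `security.` prefix and the `checkUploadSize` name come from the slides.

```python
import json

# Hypothetical call nodes as a graph engine might expose them: one route
# registration per node, with the code of each argument.
CALLS = [
    {"name": "get", "args": ["'/rest/basket/:id'", "security.isAuthorized()", "basket()"]},
    {"name": "post", "args": ["'/file-upload'", "checkUploadSize", "upload()"]},
    {"name": "get", "args": ["'/metrics'", "serveMetrics()"]},
]


def where_not(nodes, sub_traversal):
    """Keep nodes for which the sub-traversal yields nothing (absence check)."""
    return [n for n in nodes if not any(True for _ in sub_traversal(n))]


def auth_arguments(call):
    """Sub-traversal: the arguments of a call that look like auth middleware."""
    return (a for a in call["args"] if a.startswith("security."))


def answer(calls):
    """Compact, structured result: the only thing handed back to the LLM."""
    missing = where_not(calls, auth_arguments)
    return {
        "routes": len(calls),
        "without_auth": len(missing),
        "paths": [c["args"][0].strip("'") for c in missing],
    }


print(json.dumps(answer(CALLS)))
# {"routes": 3, "without_auth": 2, "paths": ["/file-upload", "/metrics"]}
```

The answer is a few dozen bytes regardless of how large the handlers are, which is what lets the reasoning step stay inside one context window.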


LLMs can query the graph, but someone still has to point

LLMxCPG (Lekssays et al., 2025) combines fine-tuned LLMs with Joern's CPG for vulnerability detection.

What it proved

CPGQL is learnable (but needs fine-tuning)
CPG slicing reduces code by 68–91%
15–40% F1 improvement over baselines

What it doesn't do

No architectural discovery
No negative queries ("what's missing?")
No whole-codebase analysis

HUMAN (CWE + Code) → LLMxCPG-Q → Joern → Code Slice → LLMxCPG-D → Vuln? (Yes / No)

↑ The human provides the CWE and the code. The bottleneck remains.


Two passes: learn the architecture, then find the flaws

Input: Stack Profile (package.json → Express, Sequelize, custom auth)

↓

Pass 1: Generic Structural Queries (9 questions: routes? middleware? validation?)

↓

LLM: Structured facts → targeted CPGQL (no source code, only query results)

↓

Pass 2: Design Analysis (8 CWEs, 11 queries → 30 findings)

LLMxCPG: Human provides CWE + snippet

OUR APPROACH: Framework name only → discover everything
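The hand-off between the two passes can be sketched as structural facts going in and targeted questions coming out. In the session described here an LLM performed that step and produced CPGQL. The rule table below is only a stand-in for it, and the key names and question wording are hypothetical. The figures are the ones reported for Pass 1.

```python
# Pass 1 facts, using the figures reported on the slides; key names are hypothetical.
FACTS = {
    "routes": 109,
    "rate_limited_routes": 4,
    "validation_library_calls": 0,
    "upload_routes": 4,
}

# Stand-in for the LLM step: each rule turns a structural fact into a Pass 2
# question tagged with a CWE. A real session would generate CPGQL instead.
RULES = [
    (lambda f: f["validation_library_calls"] == 0,
     "CWE-602", "Which handlers check input on the server at all?"),
    (lambda f: f["rate_limited_routes"] < f["routes"],
     "CWE-799", "Which sensitive routes have no rate limiter?"),
    (lambda f: f["upload_routes"] > 0,
     "CWE-434", "Do the upload validators ever reject a file?"),
]


def plan_pass_two(facts):
    """Return the (cwe, question) pairs whose precondition holds for these facts."""
    return [(cwe, question) for applies, cwe, question in RULES if applies(facts)]


for cwe, question in plan_pass_two(FACTS):
    print(cwe, "-", question)
```

Nothing in this step reads source code: the plan is derived from counts and names alone, which matches the "only query results" constraint of the workflow.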


9 questions, zero lines of code read

The LLM formulated CPGQL queries using only "this is an Express app with Sequelize."

S1 Routes

109 handlers

All in server.ts, 3 route groups


// CPGQL
cpg.call.name("get|post|put|delete")
  .where(_.file.name("server.ts"))
  .l.size
// → 109

S2 Middleware

9 reusable

security.* pattern identified


// CPGQL
cpg.call.name("get|post|...")
  .flatMap(c => c.argument
    .map(a => a.code))
  .groupBy(identity).l

S3 Auth

Custom, not passport

isAuthorized, denyAll, appendUserId


// CPGQL + GitNexus
cpg.method
  .where(_.file.name(
    "insecurity.ts"))
  .name.l
// → 19 functions

S4 DB Ops

33 files

Sequelize + MongoDB, writes in 20 route files


// CPGQL
cpg.call.name("findOne|findAll|create|update|destroy")
  .groupBy(_.file.name).l
// → 33 server-side files

S5 Validation

Zero

No library, no ad-hoc validation


// CPGQL
cpg.call.name("check|body|query|param|validationResult")
  .l.size
// → 0

S6 Errors

1 global handler

Leaks raw errors + Express version


// CPGQL
cpg.call.name("status")
  .where(_.code(
    ".*status\\((4|5)\\d\\d\\).*"))
  .l
// → 30+ files

S7 Uploads

4 routes

Validators exist but always pass


// CPGQL
cpg.call.name("single|array")
  .where(_.code(
    ".*(upload|multer).*"))
  .l
// → 4 multer configs

S8 Rate Limits

4 of 109 (3.7%)

Only reset-password + 2FA


// CPGQL
cpg.call.name("rateLimit")
  .map(c => (
    c.code, c.lineNumber))
  .l
// → 4 endpoints

S9 WebSocket

1 handler, no auth

socket.io, no JWT on connection


// CPGQL
cpg.call.name("on")
  .where(_.code(
    ".*(connection).*"))
  .l
// → 1 handler


8 CWEs, 11 queries, 30 confirmed findings

CWE	Finding	Severity

269	58 routes (53%) have no auth middleware	Critical

Negative sub-traversal on app.verb() calls in server.ts, filtering out routes with security.* arguments. Cross-referenced with path-scoped app.use() auth. Notable: GET /rest/user/change-password, admin endpoints, all file uploads left unprotected.

311	MD5 passwords, hardcoded HMAC key, plaintext TOTP	Critical

hash() uses crypto.createHash('md5') with no salt, no stretching. hmac() key 'pa4qacea4VK9t9nGv7yZtwmj' hardcoded in source. Card numbers stored as plaintext integers. TOTP secrets unencrypted in DB.

434	4 upload routes, validators exist but always pass	High

checkFileType checks extension for challenge solving but always calls next(). checkUploadSize always calls next(). No magic byte validation, no content-type check, no multer fileFilter configured.

501	22 DB ops use untrusted input directly	Critical

Mass assignment in POST /api/Users: full req.body to Sequelize with no field allowlist. Chatbot query text overwrites username. req.body.UserId used instead of middleware-injected value in 14+ files.

602	5 validation checks across 109 handlers (4.6%)	High

All 5 are password-emptiness checks. Zero type coercion, zero regex, zero schema validation. Model-layer sanitization exists but is output encoding, not input validation.

653	Unauthenticated users reach hash, JWT signing	High

changePassword calls hash(), resetPassword calls hmac(), chatbot calls authorize() and verify(). Prometheus metrics endpoint exposed without auth.

799	Login has no rate limiting, reset uses spoofable header	Medium

4/109 routes rate-limited (reset-password + 2FA only). All use 5min/100req. Reset-password keyGenerator uses X-Forwarded-For, which is trivially bypassable.

841	Purchase workflow has no state machine	High

Checkout verifies basket and delivery but NOT address or payment card. No session-level step tracking. Angular frontend stores state in sessionStorage. Coupon applied to any basket without ownership check.
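The CWE-501 row hinges on a missing field allowlist. A minimal sketch of what such an allowlist does, assuming hypothetical field names and a plain dict in place of the ORM model (this is not the target app's code):

```python
# Fields a client may set when registering; "role" is deliberately absent.
# The field names are hypothetical.
ALLOWED_USER_FIELDS = {"email", "password", "security_question"}


def pick_allowed(body, allowed):
    """Copy only allowlisted keys from an untrusted request body."""
    return {k: v for k, v in body.items() if k in allowed}


def build_user(body):
    """The server decides privileged fields; the client never does."""
    user = pick_allowed(body, ALLOWED_USER_FIELDS)
    user["role"] = "customer"
    return user


untrusted = {"email": "a@example.com", "password": "hunter2", "role": "admin"}
print(build_user(untrusted))
# {'email': 'a@example.com', 'password': 'hunter2', 'role': 'customer'}
```

Passing the whole request body through instead of `pick_allowed(...)` is the pattern the finding describes: the `role` key would survive into the stored record.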


30 findings, 30 confirmed, zero false positives

Every finding maps to at least one known exploitable challenge in vulnerable-app.

CWE	Finding	Matches	Example

269	58 unprotected routes	4 ✓	Admin Section

Admin Section: admin panel accessible without auth. Change User1's Password: password change via GET, no auth. View Basket: another user's basket viewable. Five-Star Feedback: delete feedback without auth.

311	MD5, hardcoded HMAC	3 ✓	Weird Crypto

Password Strength: admin password crackable (unsalted MD5 is fast to brute-force). Weird Crypto: "algorithm it should not use." Two Factor Auth: TOTP secret stored unsafely.

434	Fake file validators	3 ✓	Upload Type

Upload Size: upload >100kB (checkUploadSize always passes). Upload Type: upload non-PDF/ZIP (checkFileType always passes). Arbitrary File Write: overwrite legal information file.

501	22 untrusted DB ops	5 ✓	Admin Registration

Admin Registration: mass assignment sets role:admin. Forged Feedback: post as another user. Forged Review: edit any review. Manipulate Basket: put product in another's basket. NoSQL Manipulation: update multiple reviews.

602	4.6% validation	4 ✓	Zero Stars

Zero Stars: UI prevents 0 stars, server accepts it. Empty User Registration: register with empty fields. Repetitive Registration: DRY violation. Payback Time: negative quantities not validated.

653	Unprotected security functions	2 ✓	Exposed Metrics

Exposed Metrics: Prometheus endpoint without auth. Change User1's Password: password change reaches hash function without auth.

799	Login not rate-limited	2 ✓	CAPTCHA Bypass

CAPTCHA Bypass: submit 10+ feedbacks in 20 seconds. Reset User3's Password: brute force despite rate limiting (bypassable).

841	No workflow state machine	3 ✓	Deluxe Fraud

Deluxe Fraud: obtain membership without paying. Payback Time: order makes you rich (negative quantities). Expired Coupon: redeem expired coupon.
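For the CWE-841 row, server-side step tracking is the thing the finding says is absent. A minimal sketch of it follows, with hypothetical step names and an in-memory object standing in for session state:

```python
# Allowed order of checkout steps; the step names are hypothetical.
STEPS = ["basket", "address", "delivery", "payment", "confirm"]


class Checkout:
    """Tracks, on the server, which steps a session has completed."""

    def __init__(self):
        self.completed = []

    def advance(self, step):
        """Accept `step` only if it is the next one in STEPS."""
        if len(self.completed) == len(STEPS):
            raise ValueError("checkout already finished")
        expected = STEPS[len(self.completed)]
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)


order = Checkout()
order.advance("basket")
order.advance("address")
try:
    order.advance("confirm")      # skips delivery and payment
except ValueError as err:
    print(err)                    # expected 'delivery', got 'confirm'
```

With state kept only in the browser, as the finding describes, nothing plays the role of `advance`, so a client can jump straight to the final request.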


From reading every file to querying for answers

WITHOUT CPG vs. WITH CPG

5 / 974: functions read vs. files in codebase (0.5% of the code)

The CPG didn't eliminate code reading; it reduced it to targeted verification. 5 of 11 Pass 2 queries needed get_source, reading 10–20 lines each.
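The percentages quoted across the deck can be re-derived from the raw counts. The sketch below only checks the arithmetic; the helper name `pct` is an illustration and not part of the project.

```python
def pct(part, whole):
    """Percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(58, 109))  # 53.2  routes without auth middleware (reported as 53%)
print(pct(5, 109))   # 4.6   handlers with any validation check
print(pct(4, 109))   # 3.7   rate-limited routes
print(pct(5, 974))   # 0.5   functions read vs. files in the codebase
```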


What this doesn't prove (yet)

Single target codebase. All results come from one application. Not validated across multiple projects.
Intentionally vulnerable target. The application ships with documented vulnerabilities. Real codebases are messier.
Single framework. Express/Sequelize only. Django, Spring, Rails, and FastAPI all remain untested.
No SAST baseline. Semgrep was not run alongside for a direct comparison of what each approach finds.
CPGQL queries not independently validated. No expert review for completeness; alternative query formulations may find more.
Human-in-the-loop. A human directed the session. This describes a workflow, not an automated pipeline.
OWASP A06 scope only. Injection (A03:2021), XSS (A07:2017), and misconfiguration (A05:2021) were not tested.
Static analysis only. No runtime behavior, no dynamic taint, no exploitation. Structural findings only.


Conclusion: Design analysis needs a structural bridge

1

Bug finding is solved. Design analysis is not.

SAST has 3,000+ rules for injection and XSS. Zero rules for "does this workflow enforce step ordering?" or "what percentage of routes lack auth?" Design flaws are architectural.

2

LLMs can't read their way to design flaws.

The keyhole problem is structural, not a context window limitation. Even with infinite context, file-by-file reading loses the cross-file relationships that define design.

3

The CPG is the right abstraction layer.

Not source code (too granular). Not pattern matching (no semantics, no absence detection). A structural graph the LLM can query, reason over, and verify against. Two passes: learn the architecture, then interrogate the design.

Validated on vulnerable-app (443K CPG nodes, 107 known challenges). 30 findings confirmed across 8 CWEs under OWASP A06. Zero false positives. All tools open source.


Tools and references

Tools used

Tool	Purpose	URL
Joern	Open-source CPG engine, CPGQL query language	joern.io
GitNexus	Code knowledge graph, community detection, blast radius	github.com/BlockSecCA/GitNexus
Claude	LLM for query generation, reasoning, interpretation	anthropic.com
joern-mcp	MCP server wrapping Joern's HTTP API	github.com/BlockSecCA/joern-mcp
vulnerable-app	Target application: intentionally vulnerable Express/Sequelize app (107 challenges)	github.com/BlockSecCA/vulnerable-app

Standards

Reference	URL
OWASP Top 10: A06:2025 Insecure Design	owasp.org/Top10/A06
CWE (Common Weakness Enumeration)	cwe.mitre.org
CPGQL Documentation	docs.joern.io/cpgql

Papers

Paper	Authors	URL
LLMxCPG	Lekssays, Mouhcine, Tran, Yu, Khalil (2025)	arxiv.org/abs/2507.16585
Code Property Graphs	Yamaguchi, Golde, Arp, Rieck (2014)	ieeexplore.ieee.org/…
