Privacy guardrails for AI agents: making unsafe architecture harder to ship

Alex Bit

8 min read

May 26, 2026

Coding agents replicate unsafe architecture at scale

In our earlier post on privacy by design, we focused on the business impact of privacy incidents: compliance risk, legal exposure, and customer trust erosion after sensitive data leaks into the wrong systems.

This post focuses on the engineering side.

We’ll cover common ways AI agents and coding assistants introduce privacy risk into production codebases, along with a sample mining codemod teams can use to catch these patterns before they hit production.

AI generated code amplifies existing architectural weaknesses. Weak tenant isolation, inconsistent authorization checks, and unsafe logging patterns spread quickly across services and AI systems.

Most of this code looks correct. It compiles, passes tests, and survives review.

That’s what makes these issues dangerous.

The goal is to help teams move fast with AI agents while guardrails catch unsafe patterns before they become incidents.

1. Unsafe autonomous tool execution

Autonomous agents create a different class of risk.

Many systems expose tools dynamically:

Loading code sample...

That gives models the ability to:

delete records
export data
trigger workflows
send emails
modify production systems

The risk is not just prompt injection. It is emergent behavior.

Agents chain actions in ways developers did not explicitly design, creating edge cases like:

recursive workflows
mass updates
unauthorized exports
infinite loops
indirect destructive actions

A safer model is policy-aware execution:

Loading code sample...

With centralized policies controlling dangerous operations:

Loading code sample...

This becomes critical once organizations have:

hundreds of tools
multiple agent frameworks
internal copilots
workflow automation systems

Without centralized governance, permissions become impossible to reason about.

To demonstrate how to create guardrails tailored to your codebase, I built a mining codemod for a popular open-source project that can detect the following anti-patterns.

Rule ID	Anti-pattern	Detects
RAG-001	`scope_filter_gap`	FTS accepts `directory` but never applies it in SQL — `@folder` keyword search is unscoped
RAG-002	`scope_filter_gap`	Lance vector results re-fetched from SQLite by UUID only, no `artifact_id` filter
RAG-003	`scope_filter_gap`	Code snippet loaded by ID with no workspace/tag join
TOOL-002	`autonomous_tool_execution`	Edit tools hardcoded to `allowedWithoutPermission` — writes auto-run without approval
TOOL-020	`autonomous_tool_execution`	Retrieval pipeline calls `callBuiltInTool()` directly, bypassing all policy gates

I built the codemod in less than 15 minutes using Cursor with Codemod MCP and skills installed (via npx codemod ai) and then I used that to create the below Insights dashboard to view the results and prioritize the fixes.

View of Codemod Insights in table view, powered by the continue-privacy-mining codemod

I’ll leave the upcoming cases as homework for you. If you run into issues building tailored codemods for your codebase, feel free to join our community and reach out. We’re happy to help you get up and running quickly.

2. AI agents getting broad production permissions

Most AI agent architectures start overprivileged.

An engineer wants an agent to:

create Jira tickets
summarize incidents
search Slack
update Salesforce
send emails

The fastest implementation is broad credentials:

Loading code sample...

It works immediately. It also gives agents access to data they should never touch.

Traditional internal tools usually have:

fixed workflows
predictable execution paths
limited actions

Agents do not.

They chain tools dynamically based on prompts, memory, retrieved content, and intermediate reasoning.

That makes broad credentials dangerous because:

prompts can alter execution
retrieved documents can influence actions
prompt injection can redirect workflows
agents can traverse unrelated systems

A support agent summarizing tickets should not be able to:

export customer records
access executive Slack channels
modify billing systems
retrieve unrelated tenant data

But broad runtime credentials make all of that possible.

A safer pattern is capability-scoped execution:

Loading code sample...

Every tool invocation then enforces authorization:

Loading code sample...

Now:

every action is authenticated
every action is authorized
every action is auditable
agents cannot exceed granted permissions

The key architectural rule:

The agent should not decide what it is allowed to do.

The platform should.

3. Prompt injection causing data exfiltration

Many teams treat prompt injection like a unique LLM vulnerability.

It is usually just untrusted input reaching privileged execution.

A common implementation looks like this:

Loading code sample...

Now someone uploads this into the knowledge base:

Loading code sample...

The model now receives:

trusted system instructions
untrusted retrieved content
user input

…all flattened into a single string.

That is the architectural flaw.

Most teams focus on filtering malicious prompts after the fact. The more important fix is enforcing trust boundaries structurally.

Safer systems separate:

system instructions
developer instructions
retrieved content
user input

Example:

Loading code sample...

Retrieval pipelines should also enforce:

sanitization
instruction stripping
injection detection
audit logging

The deeper issue is organizational.

Many companies still let each team build:

its own prompt builder
its own retrieval layer
its own injection handling
its own memory architecture

That does not scale safely, especially once AI-generated code starts replicating those patterns everywhere.

4. AI generated logging leaking customer data

AI systems often overlog by default.

Generated code frequently looks like this:

Loading code sample...

Sensitive data quickly spreads into:

observability systems
log pipelines
analytics tools
incident tooling
third-party vendors

Logs also tend to have:

weaker retention controls
broader employee access
weaker deletion guarantees
fewer tenant boundaries

That creates compliance and security risks:

GDPR deletion failures
SOC2 violations
insider access exposure
leaked secrets in debugging tools

Developers often underestimate how much data AI systems process internally:

prompts
retrieval context
embeddings
tool outputs
memory state

A safer approach is privacy-aware logging:

Loading code sample...

The SDK then enforces:

redaction
field blocking
hashing
classification
retention policies

Unsafe fields should fail loudly:

Loading code sample...

Result:

Error: Unsafe field detected: password

That friction is intentional.

Good platform design should make unsafe behavior difficult.

5. AI generated SQL bypassing tenant isolation

This issue looks simple, but causes major failures in multi-tenant systems.

An engineer asks AI to generate a query:

Loading code sample...

The query works. It also ignores tenant isolation entirely.

This is a common failure mode in AI-generated backend code:

the local task is correct
the global architecture is violated

The model optimized for:

get unpaid invoices

Not:

preserve tenant isolation across a production system

These bugs are easy to miss in review because the query looks reasonable.

A safer pattern pushes isolation into the data layer:

Loading code sample...

Then tenant scoping is enforced automatically:

Loading code sample...

Better yet, enforce isolation with row-level security in the database itself.

That shifts isolation from:

developers must remember

to:

unsafe queries are rejected automatically

That is where most enterprise AI architectures are heading.

6. Cross tenant data leakage in RAG systems

This is one of the most common AI privacy failures today.

A team builds an internal copilot or chatbot with RAG:

Loading code sample...

Nothing looks obviously wrong. That is why these systems reach production.

The problem is missing tenant isolation.

The vector database contains embeddings from multiple customers:

support tickets
internal docs
PDFs
Slack exports
CRM notes
knowledge base content

Without tenant filtering, retrieval becomes global.

Example:

Company A asks about onboarding
similarity search returns Company B’s HR document
the model summarizes it
nobody notices until screenshots appear in Slack

These failures are easy to miss because:

retrieval works correctly at a technical level
tests use synthetic data
local environments rarely model multi-tenant systems
reviews focus on functionality, not retrieval boundaries

AI code generation makes this worse because most tutorials ignore tenant isolation entirely.

A safer pattern enforces isolation in the platform layer:

Loading code sample...

Now tenant scoping is automatic:

Loading code sample...

This changes the engineering model completely.

Instead of relying on developers to remember:

tenant filters
audit logging
auth propagation
query scoping

…the platform enforces them automatically.

That matters because AI systems generate large amounts of "mostly correct" code, and humans are bad at spotting missing security boundaries inside otherwise clean implementations.

7. Shared memory systems leaking customer context

Memory systems are becoming standard in AI applications, and many are architected unsafely.

A common pattern looks like this:

Loading code sample...

The problem is the memory layer often:

lacks tenant partitioning
shares embeddings globally
mixes sessions
reuses retrieval indexes

That can expose:

another customer’s conversation history
unrelated support interactions
internal employee notes
prompts from another tenant

These bugs are especially dangerous because they are intermittent. They do not fail consistently enough to get caught quickly.

A safer architecture isolates memory structurally:

Loading code sample...

Even better:

Loading code sample...

Now isolation is enforced at the infrastructure layer, not just in application logic.

That matters because application logic eventually drifts. Infrastructure-level enforcement is much harder to bypass accidentally.

Move privacy enforcement into the platform layer

Traditional security models assumed:

humans write code slowly
architecture evolves gradually
reviews catch dangerous patterns

AI breaks those assumptions.

Now:

unsafe abstractions replicate instantly
generated code overwhelms review capacity
architectural mistakes spread across repos quickly

The organizations handling this well treat privacy as a platform engineering problem.

Not a documentation problem.

Not a training problem.

Not a “developers should be more careful” problem.

A systems design problem.

The fix is moving security and privacy controls into the platform itself:

tenant isolation by default
capability-scoped access
centralized policy enforcement
privacy-aware logging
infrastructure-level memory isolation

Because once AI starts generating most of the code, safe defaults matter more than good intentions.

Build codemods and guardrails tailored to your codebase so unsafe patterns cannot silently spread.

Enforce things like:

tenant-scoped queries
safe logging APIs
auth propagation
policy-aware tool execution
restricted agent capabilities

The goal is simple: make the secure path the default path.

Docs: Codemod CLI Docs

Questions about codemods, Insights, campaigns, or automations? Contact Codemod

Privacy guardrails for AI agents: making unsafe architecture harder to ship

Alex Bit

8 min read

May 26, 2026

Coding agents replicate unsafe architecture at scale

This post focuses on the engineering side.

AI generated code amplifies existing architectural weaknesses. Weak tenant isolation, inconsistent authorization checks, and unsafe logging patterns spread quickly across services and AI systems.

Most of this code looks correct. It compiles, passes tests, and survives review.

That’s what makes these issues dangerous.

The goal is to help teams move fast with AI agents while guardrails catch unsafe patterns before they become incidents.

1. Unsafe autonomous tool execution

Autonomous agents create a different class of risk.

Many systems expose tools dynamically:

Loading code sample...

That gives models the ability to:

delete records
export data
trigger workflows
send emails
modify production systems

The risk is not just prompt injection. It is emergent behavior.

Agents chain actions in ways developers did not explicitly design, creating edge cases like:

recursive workflows
mass updates
unauthorized exports
infinite loops
indirect destructive actions

A safer model is policy-aware execution:

Loading code sample...

With centralized policies controlling dangerous operations:

Loading code sample...

This becomes critical once organizations have:

hundreds of tools
multiple agent frameworks
internal copilots
workflow automation systems

Without centralized governance, permissions become impossible to reason about.

To demonstrate how to create guardrails tailored to your codebase, I built a mining codemod for a popular open-source project that can detect the following anti-patterns.

Rule ID	Anti-pattern	Detects
RAG-001	`scope_filter_gap`	FTS accepts `directory` but never applies it in SQL — `@folder` keyword search is unscoped
RAG-002	`scope_filter_gap`	Lance vector results re-fetched from SQLite by UUID only, no `artifact_id` filter
RAG-003	`scope_filter_gap`	Code snippet loaded by ID with no workspace/tag join
TOOL-002	`autonomous_tool_execution`	Edit tools hardcoded to `allowedWithoutPermission` — writes auto-run without approval
TOOL-020	`autonomous_tool_execution`	Retrieval pipeline calls `callBuiltInTool()` directly, bypassing all policy gates

2. AI agents getting broad production permissions

Most AI agent architectures start overprivileged.

An engineer wants an agent to:

create Jira tickets
summarize incidents
search Slack
update Salesforce
send emails

The fastest implementation is broad credentials:

Loading code sample...

It works immediately. It also gives agents access to data they should never touch.

Traditional internal tools usually have:

fixed workflows
predictable execution paths
limited actions

Agents do not.

They chain tools dynamically based on prompts, memory, retrieved content, and intermediate reasoning.

That makes broad credentials dangerous because:

prompts can alter execution
retrieved documents can influence actions
prompt injection can redirect workflows
agents can traverse unrelated systems

A support agent summarizing tickets should not be able to:

export customer records
access executive Slack channels
modify billing systems
retrieve unrelated tenant data

But broad runtime credentials make all of that possible.

A safer pattern is capability-scoped execution:

Loading code sample...

Every tool invocation then enforces authorization:

Loading code sample...

Now:

every action is authenticated
every action is authorized
every action is auditable
agents cannot exceed granted permissions

The key architectural rule:

The agent should not decide what it is allowed to do.

The platform should.

3. Prompt injection causing data exfiltration

Many teams treat prompt injection like a unique LLM vulnerability.

It is usually just untrusted input reaching privileged execution.

A common implementation looks like this:

Loading code sample...

Now someone uploads this into the knowledge base:

Loading code sample...

The model now receives:

trusted system instructions
untrusted retrieved content
user input

…all flattened into a single string.

That is the architectural flaw.

Most teams focus on filtering malicious prompts after the fact. The more important fix is enforcing trust boundaries structurally.

Safer systems separate:

system instructions
developer instructions
retrieved content
user input

Example:

Loading code sample...

Retrieval pipelines should also enforce:

sanitization
instruction stripping
injection detection
audit logging

The deeper issue is organizational.

Many companies still let each team build:

its own prompt builder
its own retrieval layer
its own injection handling
its own memory architecture

That does not scale safely, especially once AI-generated code starts replicating those patterns everywhere.

4. AI generated logging leaking customer data

AI systems often overlog by default.

Generated code frequently looks like this:

Loading code sample...

Sensitive data quickly spreads into:

observability systems
log pipelines
analytics tools
incident tooling
third-party vendors

Logs also tend to have:

weaker retention controls
broader employee access
weaker deletion guarantees
fewer tenant boundaries

That creates compliance and security risks:

GDPR deletion failures
SOC2 violations
insider access exposure
leaked secrets in debugging tools

Developers often underestimate how much data AI systems process internally:

prompts
retrieval context
embeddings
tool outputs
memory state

A safer approach is privacy-aware logging:

Loading code sample...

The SDK then enforces:

redaction
field blocking
hashing
classification
retention policies

Unsafe fields should fail loudly:

Loading code sample...

Result:

Error: Unsafe field detected: password

That friction is intentional.

Good platform design should make unsafe behavior difficult.

5. AI generated SQL bypassing tenant isolation

This issue looks simple, but causes major failures in multi-tenant systems.

An engineer asks AI to generate a query:

Loading code sample...

The query works. It also ignores tenant isolation entirely.

This is a common failure mode in AI-generated backend code:

the local task is correct
the global architecture is violated

The model optimized for:

get unpaid invoices

Not:

preserve tenant isolation across a production system

These bugs are easy to miss in review because the query looks reasonable.

A safer pattern pushes isolation into the data layer:

Loading code sample...

Then tenant scoping is enforced automatically:

Loading code sample...

Better yet, enforce isolation with row-level security in the database itself.

That shifts isolation from:

developers must remember

to:

unsafe queries are rejected automatically

That is where most enterprise AI architectures are heading.

6. Cross tenant data leakage in RAG systems

This is one of the most common AI privacy failures today.

A team builds an internal copilot or chatbot with RAG:

Loading code sample...

Nothing looks obviously wrong. That is why these systems reach production.

The problem is missing tenant isolation.

The vector database contains embeddings from multiple customers:

support tickets
internal docs
PDFs
Slack exports
CRM notes
knowledge base content

Without tenant filtering, retrieval becomes global.

Example:

Company A asks about onboarding
similarity search returns Company B’s HR document
the model summarizes it
nobody notices until screenshots appear in Slack

These failures are easy to miss because:

retrieval works correctly at a technical level
tests use synthetic data
local environments rarely model multi-tenant systems
reviews focus on functionality, not retrieval boundaries

AI code generation makes this worse because most tutorials ignore tenant isolation entirely.

A safer pattern enforces isolation in the platform layer:

Loading code sample...

Now tenant scoping is automatic:

Loading code sample...

This changes the engineering model completely.

Instead of relying on developers to remember:

tenant filters
audit logging
auth propagation
query scoping

…the platform enforces them automatically.

That matters because AI systems generate large amounts of "mostly correct" code, and humans are bad at spotting missing security boundaries inside otherwise clean implementations.

7. Shared memory systems leaking customer context

Memory systems are becoming standard in AI applications, and many are architected unsafely.

A common pattern looks like this:

Loading code sample...

The problem is the memory layer often:

lacks tenant partitioning
shares embeddings globally
mixes sessions
reuses retrieval indexes

That can expose:

another customer’s conversation history
unrelated support interactions
internal employee notes
prompts from another tenant

These bugs are especially dangerous because they are intermittent. They do not fail consistently enough to get caught quickly.

A safer architecture isolates memory structurally:

Loading code sample...

Even better:

Loading code sample...

Now isolation is enforced at the infrastructure layer, not just in application logic.

That matters because application logic eventually drifts. Infrastructure-level enforcement is much harder to bypass accidentally.

Move privacy enforcement into the platform layer

Traditional security models assumed:

humans write code slowly
architecture evolves gradually
reviews catch dangerous patterns

AI breaks those assumptions.

Now:

unsafe abstractions replicate instantly
generated code overwhelms review capacity
architectural mistakes spread across repos quickly

The organizations handling this well treat privacy as a platform engineering problem.

Not a documentation problem.

Not a training problem.

Not a “developers should be more careful” problem.

A systems design problem.

The fix is moving security and privacy controls into the platform itself:

tenant isolation by default
capability-scoped access
centralized policy enforcement
privacy-aware logging
infrastructure-level memory isolation

Because once AI starts generating most of the code, safe defaults matter more than good intentions.

Build codemods and guardrails tailored to your codebase so unsafe patterns cannot silently spread.

Enforce things like:

tenant-scoped queries
safe logging APIs
auth propagation
policy-aware tool execution
restricted agent capabilities

The goal is simple: make the secure path the default path.

Docs: Codemod CLI Docs

Questions about codemods, Insights, campaigns, or automations? Contact Codemod

Privacy guardrails for AI agents: making unsafe architecture harder to ship

Coding agents replicate unsafe architecture at scale

1. Unsafe autonomous tool execution

2. AI agents getting broad production permissions

3. Prompt injection causing data exfiltration

4. AI generated logging leaking customer data

5. AI generated SQL bypassing tenant isolation

6. Cross tenant data leakage in RAG systems

7. Shared memory systems leaking customer context

Move privacy enforcement into the platform layer

Prefer to schedule a call?

Privacy guardrails for AI agents: making unsafe architecture harder to ship

Coding agents replicate unsafe architecture at scale

1. Unsafe autonomous tool execution

2. AI agents getting broad production permissions

3. Prompt injection causing data exfiltration

4. AI generated logging leaking customer data

5. AI generated SQL bypassing tenant isolation

6. Cross tenant data leakage in RAG systems

7. Shared memory systems leaking customer context

Move privacy enforcement into the platform layer

Prefer to schedule a call?