Strategic Framework
January 2026 · Vincent Verdet

The Enterprise Architect's Guide to AI Governance

Code Review, Security, Ownership & Audit Trails


Introduction

As organisations advance beyond Level 3 on the AI development maturity scale, governance shifts from optional to essential. Traditional governance approaches—designed for code written by humans—don't translate directly to AI-generated code. The frameworks need rethinking.

The AI Development Maturity Model identifies Level 4 (Specification-Driven Development) as the point where governance frameworks become critical. At this level, AI agents execute implementation autonomously based on specifications. Without proper governance, organisations risk security vulnerabilities, compliance failures, and uncontrolled technical debt.

Governance for AI-generated code requires a different mental model. When someone asks "show me the prompt," they're asking the wrong question. How the code was built matters far less than what was built and whether it meets your standards.

Key Takeaway

Effective AI governance focuses on outcomes, not process. Your audit trail should contain documentation, tests, and compliance artifacts—not conversation logs with an AI. The artifact matters. The generation method is secondary.

The Four Pillars of AI Development Governance

Governing AI-generated code requires a structured framework built on four interconnected pillars. Each pillar addresses a critical question that organisations must answer as they scale AI-assisted development.

1. Code Review Policies

How do we validate AI-generated code when reviewers didn't write it? Risk-tiered review intensity ensures appropriate scrutiny without creating bottlenecks.

2. Security Scanning Workflows

What automated checks must AI-generated code pass before deployment? Integrated security gates catch vulnerabilities regardless of code origin.

3. Ownership Models

Who is accountable for AI-generated code quality, maintenance, and lifecycle? Clear ownership prevents orphan applications and ensures sustained support.

4. Audit Trails

What evidence demonstrates compliance and quality? Focus on what was built and verified, not how it was generated.

These pillars work together. Weak code review undermines security. Unclear ownership creates audit gaps. Missing audit trails make compliance impossible. The following sections detail each pillar.

Pillar 1: Code Review Policies

Code review for AI-generated code presents a paradox: reviewers must validate code they didn't write, often at volumes that would overwhelm traditional review processes. The solution isn't to abandon review—it's to make review smarter through risk-based tiering.

The Risk Classification Framework

Not all code carries equal risk. A utility script for data formatting doesn't warrant the same scrutiny as authentication logic. Classify every AI-generated deliverable by risk level, then apply appropriate review intensity.

Critical Risk

Scope

Authentication systems, payment processing, PII handling, security controls, regulatory compliance logic, infrastructure configurations.

Review Intensity

Full manual review by a senior engineer. Line-by-line examination of logic, security, and edge cases. A second reviewer is mandatory for approval.

Rationale

Failures in critical systems have severe consequences—data breaches, financial loss, regulatory penalties. No automation can substitute for human judgment here.

High Risk

Scope

External API integrations, database operations, business logic with financial impact, user-facing features, data transformations affecting downstream systems.

Review Intensity

Focused manual review of integration points, error handling, and business logic. Automated scanning required. Single senior reviewer sufficient.

Rationale

These components interact with external systems or impact business operations. Focused review catches integration failures and logic errors.

Medium Risk

Scope

Internal utilities, reporting tools, UI components, test automation, documentation generation, non-production tooling.

Review Intensity

Automated scanning with spot-check review. Reviewer validates test coverage and samples key functionality. Any team member can approve.

Rationale

Impact is contained to internal processes. Automated checks catch most issues; spot checks confirm the AI didn't miss obvious problems.

Low Risk

Scope

Prototypes, proof-of-concepts, local development tools, documentation, configuration for non-production environments.

Review Intensity

Automated validation only. Self-approval permitted if all automated checks pass. No manual review required.

Rationale

These deliverables have no production impact. Automated gates prevent obvious security issues; manual review adds overhead without proportional value.
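Teams that want to automate the tiering might start from something like the following Python sketch. It encodes the four tiers as review policies and gives a change the highest tier of any file it touches. The directory patterns, the `ReviewPolicy` fields, and the rule that unrecognised paths default to high risk are hypothetical choices for illustration, not part of the framework itself. Reading "any team member can approve" for medium risk as "one approver other than the author" is also an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum
from fnmatch import fnmatchcase


class RiskTier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class ReviewPolicy:
    manual_review: str     # "full", "focused", "spot-check" or "none"
    approvals: int         # approvals needed from people other than the author
    senior_required: bool  # must at least one approver be a senior engineer?
    self_approval: bool    # may the author approve once automated checks pass?


# One policy per tier, following the Review Intensity descriptions above.
POLICIES = {
    RiskTier.CRITICAL: ReviewPolicy("full", 2, True, False),
    RiskTier.HIGH: ReviewPolicy("focused", 1, True, False),
    RiskTier.MEDIUM: ReviewPolicy("spot-check", 1, False, False),
    RiskTier.LOW: ReviewPolicy("none", 0, False, True),
}

# Hypothetical repository layout; replace with your own conventions.
PATH_RULES = [
    ("src/auth/*", RiskTier.CRITICAL),
    ("src/payments/*", RiskTier.CRITICAL),
    ("infra/*", RiskTier.CRITICAL),
    ("src/integrations/*", RiskTier.HIGH),
    ("src/db/*", RiskTier.HIGH),
    ("tools/*", RiskTier.MEDIUM),
    ("prototypes/*", RiskTier.LOW),
    ("docs/*", RiskTier.LOW),
]
UNKNOWN_PATH_TIER = RiskTier.HIGH  # unclassified code is treated conservatively


def tier_for_path(path: str) -> RiskTier:
    for pattern, tier in PATH_RULES:
        if fnmatchcase(path, pattern):
            return tier
    return UNKNOWN_PATH_TIER


def classify_change(paths: list[str]) -> RiskTier:
    """A change inherits the highest tier of any file it touches."""
    return max((tier_for_path(p) for p in paths), default=RiskTier.LOW)


def approval_satisfied(tier: RiskTier, author: str, approvers: set[str], seniors: set[str]) -> bool:
    policy = POLICIES[tier]
    if policy.approvals == 0:
        return policy.self_approval  # automated gates must still pass separately
    independent = approvers - {author}  # authors never count towards their own approvals
    if len(independent) < policy.approvals:
        return False
    return not policy.senior_required or bool(independent & seniors)
```

Because `classify_change` takes the maximum, a one-line edit to an authentication module pulls the whole change into the critical tier. That is deliberate: a mixed change is only as safe as its riskiest file.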

What Reviewers Should Focus On

When reviewing AI-generated code below the critical tier, don't try to understand every line. Focus on what matters:

Logic Correctness

Does the code actually implement the requirement? AI can produce plausible-looking code that subtly misses the point.

Security Boundaries

Are inputs validated? Are outputs sanitised? Are permissions checked? AI often takes the happy path.

Integration Points

How does this code interact with existing systems? Are contracts respected? Are error cases handled?

Test Coverage

Do tests verify the actual requirements? Are edge cases covered? AI-generated tests can be superficial.

Review Checklist for AI-Generated Code

Requirements match implementation
Input validation present and appropriate
Error handling covers failure modes
No hardcoded secrets or credentials
Tests verify business logic, not just code paths
Dependencies are appropriate and up-to-date
Code follows established patterns and standards

Pillar 2: Security Scanning Workflows

AI-generated code can contain vulnerabilities—not because AI is malicious, but because it optimises for functionality over security unless explicitly instructed otherwise. Automated security scanning provides a consistent safety net regardless of code origin.

Essential Security Scans

1. Static Application Security Testing (SAST)

Analyses source code for security vulnerabilities without executing it. Catches issues like SQL injection patterns, insecure cryptography, and unsafe data handling. Must run on every commit.

2. Dependency Vulnerability Scanning

Checks all dependencies against known vulnerability databases. AI often suggests popular but outdated libraries. Must block deployment if critical vulnerabilities are detected.

3. Secrets Detection

Scans for accidentally committed credentials, API keys, and tokens. AI can generate placeholder secrets that developers forget to replace. Must run pre-commit and in CI/CD.

4. Dynamic Application Security Testing (DAST)

Tests running applications for vulnerabilities by simulating attacks. Essential for web applications. Should run in a staging environment before production deployment.
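To make the secrets-detection requirement concrete, here is a minimal Python sketch of a pre-commit style check. It is not a substitute for a maintained scanner: the three regular expressions are illustrative assumptions that catch AWS-style access key IDs, PEM private key headers, and quoted values assigned to names such as `api_key` or `password`, which is where forgotten placeholder secrets tend to sit.

```python
import re
import sys

# Illustrative rules only; maintained scanners ship much larger, tuned rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # A quoted value of 8+ characters assigned to a secret-looking name.
    "generic_assignment": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line matching a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def main(paths: list[str]) -> int:
    """Scan the given files; a return value of 1 means the commit should be blocked."""
    blocked = False
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, name in find_secrets(fh.read()):
                print(f"{path}:{lineno}: possible secret ({name})", file=sys.stderr)
                blocked = True
    return 1 if blocked else 0
```

A hook script would end with `sys.exit(main(sys.argv[1:]))`; Git aborts a commit when the pre-commit hook exits non-zero, which is what turns the scan from advisory into blocking. Running the same function in CI covers commits made with hooks skipped.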

Pipeline Integration

Security scans must be automated, mandatory, and blocking. A scan that runs but doesn't prevent deployment is security theatre.

Pre-Commit
Scans: secrets detection, basic linting
Action: block commit if secrets detected

CI Pipeline
Scans: SAST, dependency scanning, unit tests
Action: block merge if critical issues found

Pre-Deployment
Scans: DAST, integration tests, compliance checks
Action: block deployment if gates fail
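The three stages can be expressed as data plus one evaluation function, which keeps the blocking rules reviewable in one place. The Python sketch below is a hypothetical gate evaluator, not any CI product's API: each stage lists the checks it requires and the severity at which a finding blocks, with `None` marking a check as advisory. The thresholds are assumptions loosely based on the stages above. The evaluator also fails closed: a required check that never reported counts as a failure, since a scan that silently did not run protects no better than one that runs without blocking.

```python
from dataclasses import dataclass

SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}

# stage -> {check: minimum severity that blocks, or None if advisory}.
# Assumed thresholds: any secret or failing test blocks; SAST and dependency
# findings block at "critical"; DAST findings block at "high".
STAGES = {
    "pre-commit": {"secrets": "low", "lint": None},
    "ci": {"sast": "critical", "dependency": "critical", "unit-tests": "low"},
    "pre-deployment": {"dast": "high", "integration-tests": "low", "compliance": "low"},
}


@dataclass(frozen=True)
class Finding:
    check: str
    severity: str  # a key of SEVERITY; a failing test is reported as a finding


def evaluate_gate(stage: str, checks_run: set[str], findings: list[Finding]) -> tuple[bool, list[str]]:
    """Return (passed, reasons). Missing checks fail the gate (fail closed)."""
    policy = STAGES[stage]
    reasons = [f"{check}: did not run" for check in sorted(policy) if check not in checks_run]
    for f in findings:
        threshold = policy.get(f.check)
        if threshold is not None and SEVERITY[f.severity] >= SEVERITY[threshold]:
            reasons.append(f"{f.check}: {f.severity} finding")
    return (not reasons, reasons)
```

In a pipeline, the job would exit non-zero whenever `passed` is false, so the CI system itself enforces the block rather than relying on someone reading a report.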

AI-Specific Vulnerabilities to Watch

AI-generated code exhibits certain vulnerability patterns more frequently than human-written code:

Insufficient Input Validation

AI often assumes inputs are well-formed. Explicitly request validation for all external inputs.

Verbose Error Messages

AI generates helpful error messages that may expose system internals. Sanitise error responses.

Default Configurations

AI uses default settings which may be insecure. Review all configuration for production appropriateness.

Outdated Patterns

AI training data includes deprecated security practices. Verify approaches against current standards.
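Two of these patterns, missing input validation and over-verbose errors, are easy to show side by side. In the hypothetical Python handler below (the order-ID format, the quantity limits, and the `lookup` callback are invented for illustration), every external input is checked before use, validation failures return a specific but harmless message, and unexpected exceptions are logged server-side with an incident ID while the client sees only a generic error.

```python
import logging
import re
import uuid

logger = logging.getLogger("orders")

ORDER_ID = re.compile(r"ORD-\d{8}")  # hypothetical format: "ORD-" plus 8 digits


class ValidationError(ValueError):
    """Malformed client input; the message is safe to return to the caller."""


def parse_quantity(raw: str) -> int:
    try:
        qty = int(raw)
    except (TypeError, ValueError):
        raise ValidationError("quantity must be an integer") from None
    if not 1 <= qty <= 1000:
        raise ValidationError("quantity must be between 1 and 1000")
    return qty


def handle_request(order_id, raw_quantity, lookup):
    """Return (status, body). Internal details go to logs, never to the client."""
    try:
        if not isinstance(order_id, str) or not ORDER_ID.fullmatch(order_id):
            raise ValidationError("invalid order id")
        qty = parse_quantity(raw_quantity)
        return 200, {"order": lookup(order_id), "quantity": qty}
    except ValidationError as exc:
        return 400, {"error": str(exc)}
    except Exception:
        incident = uuid.uuid4().hex[:8]
        # The stack trace (which may name hosts, queries or file paths) stays server-side.
        logger.exception("order request failed (incident %s)", incident)
        return 500, {"error": "internal error", "incident": incident}
```

The incident ID is the bridge between the two audiences: support staff can find the full trace in the logs, while an attacker probing the endpoint learns nothing about the system behind it.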

Critical Principle

Security scans are not optional, not advisory, and not something to "fix later." If a scan fails, deployment stops. No exceptions. This is the price of accelerated development—automated gates must be non-negotiable.

Pillar 3: Ownership Models

Who owns AI-generated code? This question trips up many organisations. Some treat it as a special category requiring new ownership structures. That's a mistake. AI-generated code should follow the same ownership principles as any other code.

Product Team Ownership

The generation method is irrelevant to ownership. Code belongs to the product team responsible for the capability it enables. This principle ensures:

Clear Accountability

One team is responsible for the code's quality, security, and behaviour. No ambiguity about who fixes issues or makes decisions.

Sustained Maintenance

The team that benefits from the code maintains it. No orphan applications slowly degrading without attention.

Standard Lifecycle

AI-generated code follows the same retirement, deprecation, and upgrade processes as all other code.

Domain Expertise

The team with the deepest understanding of the business domain owns the code that implements it.

Preventing Orphan Applications

AI acceleration makes it easy to spin up new applications. Without governance, this leads to proliferation—dozens of small tools created for momentary needs, then forgotten. Establish guardrails:

  1. Registration Requirement: Every deployed application must be registered in a central catalogue with identified owner, purpose, and criticality classification.
  2. Periodic Review: Quarterly review of all applications. Owners must confirm continued need and commit to maintenance, or schedule retirement.
  3. Automatic Alerts: Applications without activity (deployments, commits, or confirmed reviews) for 6 months trigger owner notification and escalation.
  4. Retirement Process: Clear process for decommissioning applications including data handling, dependency notification, and archive procedures.
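The registration and alerting guardrails reduce to a small amount of logic over the catalogue. The Python sketch below assumes a hypothetical `AppRecord` holding the registration fields named above plus last-activity dates; the 92- and 182-day windows are approximations of "quarterly" and "6 months". A scheduled job could run `needs_attention` over every record and route the results to owners and their escalation contacts.

```python
from dataclasses import dataclass
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=92)    # approximately quarterly
INACTIVITY_LIMIT = timedelta(days=182)  # approximately 6 months


@dataclass
class AppRecord:
    name: str
    owner_team: str    # empty string means nobody has claimed the application
    purpose: str
    criticality: str   # e.g. the critical/high/medium/low risk tiers
    last_deploy: date
    last_commit: date
    last_review: date  # most recent owner confirmation of continued need


def needs_attention(app: AppRecord, today: date) -> list[str]:
    """List the governance actions the registry should raise for one application."""
    actions = []
    if not app.owner_team:
        actions.append("no owner: assign one or schedule retirement")
    if today - app.last_review > REVIEW_INTERVAL:
        actions.append("quarterly review overdue")
    # Deployments, commits and confirmed reviews all count as activity.
    last_activity = max(app.last_deploy, app.last_commit, app.last_review)
    if today - last_activity > INACTIVITY_LIMIT:
        actions.append("inactive for 6 months: notify owner and escalate")
    return actions
```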

Ownership Transfer

Teams change. People leave. Applications outlive their original creators. Define a clear ownership transfer process:

Documentation Handoff

Architecture decisions, operational runbooks, and known issues must be documented before transfer.

Knowledge Transfer

Outgoing team conducts walkthrough sessions covering architecture, dependencies, and operational procedures.

Support Transition

Overlap period where both teams handle incidents, gradually shifting responsibility to new owners.

Registry Update

Central catalogue updated to reflect new ownership with clear effective date and contact information.

Pillar 4: Audit Trails—What Actually Matters

This is where most organisations get AI governance wrong. When auditors or compliance teams ask about AI-generated code, the instinct is to show them the prompts. "Look, here's exactly what we told the AI to build." This approach is misguided.

"Show me the prompt" is the wrong question. How the code was built matters far less than what was built and whether it meets your standards.

The Misconception

Prompt logs are not an audit trail. They're a record of conversation, not a record of compliance. Consider: if a developer wrote code by copying from Stack Overflow, would you audit their browser history? The generation method is irrelevant. What matters is whether the resulting code meets your requirements, passes your tests, and complies with your policies.

What Belongs in an Audit Trail

1. Requirements Documentation

What was the code supposed to do? User stories, acceptance criteria, or specifications that define the expected behaviour. This is your baseline for validating the result.

2. Architecture Decision Records

Why were specific approaches chosen? ADRs document the reasoning behind significant technical decisions, enabling future teams to understand context.

3. Test Coverage and Results

What was tested and did it pass? Unit tests verify the code does what requirements specify. Test reports prove verification actually occurred.

4. Security Scan Reports

What security checks passed? SAST, DAST, and dependency scan reports showing what was checked, when, and with what result.

5. Review Approvals

Who approved what, when? Code review records showing human validation occurred according to risk-tiered policies.

6. Deployment Records

What was deployed where, when? Complete chain from commit to production, with all gates passed and approvals granted.
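One way to keep the trail honest is to treat it as a completeness check on each change. The hypothetical Python sketch below stores a reference (URL, file path, or ticket ID) for each artifact type listed above and reports what is missing. Note what it does not store: prompts, conversation logs, or the tool used. Requiring ADRs only for critical- and high-risk changes is an assumption, since ADRs are meant for significant decisions rather than every change.

```python
from dataclasses import dataclass, field

# Artifact types from the list above (ADRs are handled separately below).
CORE_ARTIFACTS = ("requirements", "test_report", "security_scans", "review_approvals", "deployment_record")
ADR_TIERS = {"critical", "high"}  # assumption: ADRs expected only for significant changes


@dataclass
class AuditRecord:
    change_id: str
    risk_tier: str
    # artifact type -> reference such as a URL, file path or ticket ID
    artifacts: dict[str, str] = field(default_factory=dict)

    def missing(self) -> list[str]:
        """Artifact types still required before this change's trail is complete."""
        required = list(CORE_ARTIFACTS)
        if self.risk_tier in ADR_TIERS:
            required.insert(1, "adr")  # same position as in the list above
        return [name for name in required if not self.artifacts.get(name)]
```

Because each artifact is a reference rather than a copy, the record stays small and points auditors at the systems of record (tracker, CI, code review, deployment tool) that already hold the evidence.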

What Does NOT Need Auditing

Not Required in Audit Trails

Prompts used to generate code
Conversation logs with AI assistants
Iteration history during development
Which AI model or tool was used
Developer's problem-solving process

These items are interesting for process improvement but irrelevant for compliance. An auditor asking "was this code AI-generated?" is asking the wrong question. The right question is: "Does this code meet requirements, pass tests, and comply with security policies?"

Mapping to Compliance Frameworks

Standard compliance frameworks already provide what you need—apply them to AI-generated code the same way you would any other code. Don't create special AI categories; use established principles.

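ISO 27001
Relevant controls: A.14.2 (Secure Development) and A.12.1 (Operational Procedures) in the 2013 edition; ISO/IEC 27001:2022 reorganises these, chiefly as A.8.25 (Secure development life cycle), A.8.28 (Secure coding), A.8.32 (Change management), and A.5.37 (Documented operating procedures)
Application: security scanning, change management, documentation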

SOC 2
Relevant criteria: CC7.1 (System Operations), CC8.1 (Change Management)
Application: review processes, deployment controls, testing

GDPR
Relevant articles: Art. 25 (Data Protection by Design), Art. 32 (Security)
Application: privacy review, security testing, documentation

You don't need framework certification to benefit from framework principles. Adopt the practices that make sense—documented requirements, verified testing, controlled deployment—without the overhead of formal certification unless your business requires it.

Governance by Maturity Level

Governance requirements scale with AI development maturity. Applying Level 5 governance to a Level 2 organisation creates friction without value. Match your governance intensity to your actual practices.

Level 3: Iterative Collaboration

Characteristics

Developers prompt, review outputs, provide feedback, and iterate. Human judgment remains central. AI accelerates but doesn't autonomously deliver.

Governance Needs

Lightweight guidelines, standard code review with AI awareness, basic security scanning, informal ownership.

Key Focus

Ensure existing code review practices apply to AI-generated code. Introduce security scanning if not present. Document that AI is being used.

Level 4: Specification-Driven Development

Characteristics

Detailed specifications drive AI implementation. AI agents execute autonomously within defined scope. Human review validates outputs.

Governance Needs

Risk classification framework, tiered review policies, automated security gates, formal ownership model, documented audit trails.

Key Focus

Establish all four governance pillars. Automate security scanning in CI/CD. Define risk tiers and review requirements. Create audit trail templates.

Levels 5-6: Autonomous Pipeline & Operations

Characteristics

Multiple AI agents collaborate on implementation, testing, and review. Humans supervise the pipeline rather than individual outputs.

Governance Needs

Comprehensive automated validation, AI-assisted review with human oversight, continuous compliance monitoring, full audit automation.

Key Focus

Shift from reviewing code to reviewing pipelines. Automated governance that scales with output volume. Exception-based human intervention.

Implementation Framework

Phase 1: Foundation (0–3 months)

  1. Establish Risk Classification: Define what constitutes critical, high, medium, and low risk for your organisation. Document criteria and examples. Train teams on classification.
  2. Define Review Policies: Create tiered review requirements mapped to risk levels. Specify who can approve what, and what automated checks are required at each tier.
  3. Implement Security Scanning: Deploy SAST, dependency scanning, and secrets detection. Integrate into existing CI/CD or create a basic pipeline. Start by blocking critical issues only.
  4. Document Ownership Model: Clarify that product teams own all code regardless of generation method. Create application registry if none exists. Define ownership transfer process.

Phase 2: Integration (3–6 months)

  1. Automate Pipeline Gates: Security scans become mandatory and blocking. Review approvals required before merge. Risk classification automated where possible.
  2. Generate Audit Artifacts: Automate creation of security reports, test coverage summaries, and deployment records. Store in accessible, searchable repository.
  3. Train Teams: Educate developers on governance requirements. Train reviewers on AI-specific review focus areas. Align all teams on ownership expectations.
  4. Establish Metrics: Track scan pass/fail rates, review cycle times, and coverage metrics. Use data to refine policies and identify friction points.
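The Phase 2 metrics need little machinery. This Python sketch assumes gate outcomes and review timestamps have already been exported from your CI and code-review tools in the simple shapes shown (both shapes are hypothetical). It computes a pass rate and the median time to approval per risk tier; a tier whose reviews take far longer than its intended intensity suggests is a likely friction point.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median
from typing import Optional


def pass_rate(outcomes: list[bool]) -> Optional[float]:
    """Share of gate runs that passed, or None when there is no data."""
    return sum(outcomes) / len(outcomes) if outcomes else None


def median_review_hours(reviews: list[tuple[str, datetime, datetime]]) -> dict[str, float]:
    """Map each risk tier to its median hours from review request to approval."""
    hours_by_tier = defaultdict(list)
    for tier, requested, approved in reviews:
        hours_by_tier[tier].append((approved - requested).total_seconds() / 3600)
    return {tier: median(hours) for tier, hours in hours_by_tier.items()}
```

Medians are used rather than means so that one review left open over a holiday does not mask the typical experience.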

Phase 3: Maturation (6–12 months)

  1. Refine Based on Data: Adjust risk classification based on actual incidents and near-misses. Tune scan sensitivity to reduce false positives. Optimise review requirements.
  2. Expand Automation: Implement DAST for web applications. Add compliance-specific scans if required. Automate audit report generation for stakeholders.
  3. Continuous Improvement: Regular retrospectives on governance effectiveness. Feedback loops from incidents to policy updates. Evolve policies as AI capabilities grow.
  4. Framework Alignment: If pursuing certification, map existing practices to framework requirements. Address gaps systematically. Otherwise, continue pragmatic adoption of useful principles.

The Governance Mindset

Good governance enables speed rather than impeding it. When developers trust that automated gates catch problems, they move faster with confidence. When reviewers focus on what matters, reviews complete quickly. When audit trails are generated automatically, compliance evidence becomes a by-product of normal delivery rather than a separate effort. The goal isn't to slow down AI-assisted development. It's to make accelerated development sustainable, secure, and auditable. Start with what matters (outcomes), not with what's familiar (process). Build governance that answers "did we build the right thing correctly?" rather than "how did we build it?"
