Adopting AI in Your Security Practice Without Inheriting New Risk

Security
Adopting AI in Your Security Practice Without Inheriting New Risk
Yannic Scheef
Yannic Scheef 01 Jun 2026

Reading length Full · 8 min read
TL;DRShortMediumFull

The fastest way to turn an AI security tool into a liability is to treat its output as authoritative and its inputs as harmless. Both assumptions are wrong, and the gap between them is where new risk enters an otherwise careful security practice.

If you already run a change-approval process and an access log, the pattern will feel familiar: a new class of third-party tool that reads sensitive context and shapes decisions, governed by the same instincts you apply to any privileged actor. If you're still building those capabilities, treat this as a strong reason to prioritize them.

Decide what the model is allowed to see, before anyone uses it

The most consequential decision is data handling, and it has to be made centrally, not left to whoever opens a chat window. A consumer-grade assistant and a contracted enterprise deployment have very different blast radii; knowing which your team uses is the baseline.

What matters is not branding but the concrete controls behind the deployment: a Data Processing Agreement (DPA), retention and training controls you can toggle, a published sub-processor list, customer-managed logging, and ideally regionalization and customer-managed encryption keys. Absent those, you are largely blind to how your data is handled — so require them as procurement conditions rather than assume the tier you pay for delivers them.

Three tiers of data, one hard line

Draw a hard line between context the model needs and material that must never leave your boundary. The table below is the version worth pinning to a wiki page; the prose under it carries the caveats that the cells cannot.

TierExamplesCondition
Lower-risk, with caveatsSanitized log excerpts, generic error messages, public CVE identifiers, architecture descriptions stripped of hostnames and addresses"Lower-risk to disclose" is not the same as "safe to trust"
Source code, narrowlyRepository snippets, functions under reviewOnly when vendor training on your inputs is disabled, secrets and credentials are scrubbed, and no IP, export, or contractual restriction applies — otherwise treat as the next tier
Never provideLive credentials, private keys, session tokens, customer PII, full configuration files, internal network topology, anything under a regulatory regime (health, payment, personal data)Unless the deployment is explicitly contracted and architected for it

The reason this matters beyond training-data leakage is prompt retention, whose defaults vary by vendor and tier. Consumer products often retain prompts by default; enterprise contracts may log them for abuse detection unless you negotiate a shortened window. Verify and enforce the "no-training / no-retention" settings wherever offered, document which you enabled, and treat the prompt as third-party data you do not fully control even then. If a value would require rotation when leaked, it does not belong in a prompt.

One caveat undercuts any comfortable label on log data: logs frequently contain attacker-controlled content. A crafted User-Agent string or a planted filename can carry instructions aimed at the model rather than at you — prompt injection that tries to steer the output or coax out the surrounding context. The OWASP Top 10 for LLM Applications ranks this class of input manipulation at the top for good reason. Treat log content as adversarial, and scrutinize AI conclusions drawn from it more, not less.

AI is most dangerous when it sounds most certain

A language model produces fluent, plausible text regardless of whether the underlying claim is true, and fluency reads as confidence. In most domains a confident wrong answer is a nuisance; in security it can mean a fabricated CVE number that sends an engineer chasing a nonexistent patch, or an assessment that misreads the code.

AI output is a hypothesis until a human or a deterministic tool confirms it. Fluency is not evidence — and in security it is most convincing exactly when it is wrong.

The hypothesis-until-verified checklist

The rule is simple to state and non-negotiable in practice. Make it operational by attaching these checks to anything an AI tool hands you:

  • Citations: treat any cited CVE, advisory, or version number as unverified until checked against the authoritative source.
  • Remediations: re-run a suggested fix against a scanner, a test, or a compiler rather than trusting that it works.
  • Dismissals: apply the most scrutiny to outputs that dismiss or close a finding, not those that flag one.

That last item deserves weight. A model that wrongly calls a real finding a false positive quietly retires a live vulnerability; a false alarm only costs an investigation hour. Weight verification toward the answers that retire issues, because that is where a mistake is most expensive and least visible.

None of this is a reason to avoid AI for first-pass work. For bounded tasks — summarizing a known CVE, classifying alerts, drafting a remediation outline — the time saving is often real and substantial. For novel threat analysis or complex code review, verification consumes much of any gain. So measure it: compare time genuinely saved against the verification overhead, and build that verification into the workflow so whatever gain remains survives contact with reality.

Keep a human accountable for every consequential action

Using AI to understand is low-stakes; using it to act is not. Reading and proposing cost little when wrong, but acting — blocking an IP range, closing a finding, changing a firewall rule, deleting data, notifying a regulator — is often irreversible.

AI: read, rank, propose Named human approval gate Irreversible action free to automate | accountability lives here | state changes
The automation boundary: AI proposes freely; a named human gates anything that changes state.

Draw your automation boundary along that line: let AI draft, rank, and recommend freely, but require a named human to approve anything that changes state in a production or security-relevant system — unless you have a risk-accepted auto-remediation playbook for the action, one that earns its autonomy with guardrails: scoped credentials, rate limits, a dry-run pass, and a tested rollback. Accountability cannot be delegated to a model: "the AI decided" is not an answer your incident review will accept; "we accepted this risk deliberately, with these limits" is.

Agentic tools that chain actions raise a distinct hazard: a misjudgment in step one feeds step two, and an agent acting against external APIs can fire off changes with no rollback. Put the approval gate at each irreversible or high-impact action rather than one blanket "proceed" at the start; atomic, rollbackable steps can be batched under one approval. And run the agents under least privilege: time-boxed, environment-scoped tokens, rate limits, and an outbound allow-list so a compromised agent cannot reach systems it was never meant to.

Make the use auditable, or you cannot defend it

If you cannot reconstruct what an AI tool was asked, what it returned, and who acted on it, you have introduced an untraceable actor into your operations — as unacceptable as an unlogged administrator account. An auditable workflow records four things:

  1. The prompt and the context it carried.
  2. The output the model returned.
  3. The reviewer who evaluated it.
  4. The action that followed.

That record serves three purposes: post-incident reconstruction when an AI-influenced decision turns out wrong, detection of misuse or data exposure, and the evidence demanded by vendor-security reviews. Consensus Assessments Initiative Questionnaire (CAIQ)-style forms — the kind built on the Cloud Security Alliance's Cloud Controls Matrix — and reviews aligned to the NIST AI Risk Management Framework now ask, in plain terms, how AI is governed in your environment.

Engineer the guardrails

Policy that lives only in people's heads degrades the moment someone is in a hurry, which in security is most of the time. The controls above hold up only if the safe path is the easy path — an engineering problem, not a memo.

A gateway that strips secrets before they leave

Start with a gateway in front of approved tools that strips or flags secrets before a prompt leaves your boundary. A policy gate can be as blunt as a set of patterns that refuse the request when they match — enough to catch the common structured secrets:

BLOCK_PATTERNS = [
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----",  # private-key blocks
    r"(?i)aws_secret_access_key\s*[=:]\s*\S+", # AWS secrets
    r"sk-[A-Za-z0-9]{20,}",                   # API key prefixes
    r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.",   # JWT-shaped tokens
]

def gate(prompt):
    for pat in BLOCK_PATTERNS:
        if re.search(pat, prompt):
            raise PolicyViolation("secret-shaped content blocked")
    return prompt  # still not "safe" — see below

Automated stripping reduces exposure from common structured secrets — API key prefixes, private-key blocks, high-entropy token formats. It will not catch everything: short tokens fall below entropy thresholds, base64-encoded or embedded keys slip past pattern matching, and semantic material defeats it entirely — a hostname in prose, a codename, PII inside a question. It reduces the blast radius; it does not eliminate it, and the rest is policy and judgment.

Treat the audit log as a secrets store

Then make auditability a property of the system, not a habit: have the gateway log every prompt and response. But that log is now a high-value dataset in its own right — a concentrated record of your secrets and internal context — so treat it like one:

  • Redact prompts and responses before writing to the log.
  • Encrypt the log at rest.
  • Enforce strict Role-Based Access Control (RBAC) on who can read it.
  • Keep retention short with enforced deletion policies.
  • Monitor access to the log itself as you would any privileged data store.

Wire verification into the ticket workflow too: a finding cannot reach "remediated" until its CVE is confirmed against the authoritative source, and an AI-proposed fix cannot merge until it passes a scanner or test. When the verified path is the path of least resistance, you stop relying on memory under pressure.

What makes AI distinct is the combination: natural-language input that resists filtering, retention defaults you can constrain only as far as the contract allows, and output fluent precisely when it is wrong. Decide the rules once, deliberately, and wire them into the tools — rather than discovering them after something irreversible.