OpenAI’s Deployment Simulation Extends Pre-Deployment Risk Assessment to Agentic Coding Through Simulated Tool Calls

OpenAI published a new pre-deployment safety method called Deployment Simulation. The idea is direct. Before a model ships, simulate its deployment first. Replay past conversations through the new candidate model. Then study how it behaves in realistic contexts.

OpenAI already uses insights from the method during model development. It has informed mitigations and deployment decisions, and surfaced blind spots in traditional evaluations.

https://cdn.openai.com/pdf/predicting-llm-safety-before-release-by-simulating-deployment.pdf

Understanding Deployment Simulation

Deployment Simulation is a method for simulating a future deployment before it happens. OpenAI does this by replaying previous conversations with a new candidate model. The replay is privacy-preserving.

The technique is simple at its core. Take recent conversations from deployment. Remove the original assistant response from the older model. Regenerate that response with the candidate model to be released. Then evaluate the completions for new failure modes.

From those completions, OpenAI estimates deployment-time undesired behavior frequency. The same measurement can run after release on real traffic. That makes pre-deployment forecasts checkable later.

There is a floor. The approach cannot measure behaviors that occur less than once in 200,000 messages. It targets non-tail risks, not the rarest events.