Policy Options for Preserving Chain of Thought Monitorability
The most advanced AI models produce detailed reasoning steps in human language, known as a "chain of thought" (CoT), that enable crucial oversight of whether these systems behave as intended. However, competitive pressures may push developers toward more efficient architectures that lack a human-readable CoT and are therefore far harder to monitor. This report presents a framework for determining when coordination mechanisms are needed to preserve CoT monitorability.
The Hidden AI Frontier
The most advanced AI systems remain hidden inside corporate labs for months before public release, creating both America's greatest technological advantage and a serious security vulnerability. IAPS researchers identify the critical risks these internal systems pose and propose lightweight interventions to reduce them.
Managing Risks from Internal AI Systems
The most powerful AI systems are used internally for months before they are released to the public. These internal AI systems may possess capabilities significantly ahead of the public frontier, particularly in high-stakes, dual-use areas such as AI research, cybersecurity, and biotechnology. To address the escalating risks these systems pose, this report recommends a combination of technical and policy solutions.
Helping the AI Industry Secure Unreleased Models Is a National Security Priority
While attention focuses on publicly available models like ChatGPT, the real risk to U.S. national interests is the theft of unreleased “internal models.” To preserve America’s technological edge, the U.S. government must work with AI developers to secure these internal models.
Mapping Technical Safety Research at AI Companies
This report analyzes research on safe AI development published by Anthropic, Google DeepMind, and OpenAI, along with each company's incentives to pursue different research areas. The analysis reveals where corporate attention is concentrated and where potential gaps remain.