Policy Options for Preserving Chain of Thought Monitorability
The most advanced AI models produce detailed reasoning steps in human language—known as "chain of thought" (CoT)—that provide crucial oversight capabilities for ensuring these systems behave as intended. However, competitive pressures may drive developers toward more efficient but non-monitorable architectures that lack a human-readable CoT. This report presents a framework for determining when coordination mechanisms are needed to preserve CoT monitorability.
IAPS Researchers React: The US AI Action Plan
The Trump Administration unveiled its comprehensive AI Action Plan on Wednesday, July 23, 2025. Experts at the Institute for AI Policy and Strategy reviewed the plan with an eye toward its national security implications. As AI progress accelerates toward very powerful artificial general intelligence (AGI), our researchers discuss promising proposals for addressing critical AGI risks, offer key considerations for government implementation, and explore the plan's gaps and potential solutions.
Who should develop which AI evaluations?
This paper, published by the Oxford Martin AI Governance Initiative, explores how to determine which actors are best suited to develop AI model evaluations. IAPS staff Renan Araujo, Oliver Guest, and Joe O’Brien were among the co-authors.
Key questions for the International Network of AI Safety Institutes
In this commentary, we explore key questions for the International Network of AI Safety Institutes and suggest ways forward ahead of the network's San Francisco convening on November 20–21, 2024. What should the network work on? How should it be structured in terms of membership and central coordination? How should it fit into the international governance landscape?
Understanding the First Wave of AI Safety Institutes: Characteristics, Functions, and Challenges
AI Safety Institutes (AISIs) are a new institutional model for AI governance that has expanded across the globe. In this primer, we analyze the “first wave” of AISIs, the institutions established by the UK, the US, and Japan, and the fundamental characteristics and functions they share: they are governmental, they are technical, and they have a clear mandate to govern the safety of advanced AI systems.