Research Report Oscar Delaney

Policy Options for Preserving Chain of Thought Monitorability

The most advanced AI models produce detailed reasoning steps in human language—known as "chain of thought" (CoT)—that provide crucial oversight capabilities for ensuring these systems behave as intended. However, competitive pressures may drive developers toward more efficient but non-monitorable architectures that lack a human-readable CoT. This report presents a framework for determining when coordination mechanisms are needed to preserve CoT monitorability.

Blog Post Christopher Covino

IAPS Researchers React: The US AI Action Plan

The Trump Administration unveiled its comprehensive AI Action Plan on Wednesday. Experts at the Institute for AI Policy and Strategy reviewed the plan with an eye toward its national security implications. As AI capabilities accelerate toward very powerful artificial general intelligence, our researchers discuss promising proposals for addressing critical AGI risks, offer key considerations for government implementation, and explore the plan's gaps and potential solutions.

Renan Araujo

Who should develop which AI evaluations?

This paper, published by the Oxford Martin AI Governance Initiative, explores how to determine which actors are best suited to develop AI model evaluations. IAPS staff Renan Araujo, Oliver Guest, and Joe O’Brien were among the co-authors.

Commentary Sumaya Nur Adan

Key questions for the International Network of AI Safety Institutes

In this commentary, we explore key questions for the International Network of AI Safety Institutes and suggest ways forward given the upcoming San Francisco convening on November 20-21, 2024. What should the network work on? How should it be structured in terms of membership and central coordination? How should it fit into the international governance landscape?

Renan Araujo

Understanding the First Wave of AI Safety Institutes: Characteristics, Functions, and Challenges

AI Safety Institutes (AISIs) are a new institutional model for AI governance that has expanded across the globe. In this primer, we analyze the "first wave" of AISIs established by the UK, the US, and Japan: governmental, technical institutions with a clear mandate to govern the safety of advanced AI systems. We examine their shared fundamental characteristics and functions.
