Managing Risks from Internal AI Systems
AI companies use their most powerful AI systems internally for months before releasing them to the public. These internal AI systems may possess capabilities significantly ahead of the public frontier, particularly in high-stakes, dual-use areas like AI research, cybersecurity, and biotechnology. This makes them a valuable asset but also a prime target for theft, misuse, and sabotage by sophisticated threat actors, including nation-states.
We argue that the industry's current security measures are likely insufficient to defend against these advanced threats. Beyond external attacks, we also analyze the inherent safety risks of these systems. In the future, we expect that advanced AI models deployed internally could learn harmful behaviors, leading to scenarios such as an AI making rogue copies of itself on company servers ("internal rogue deployment"), leaking its own model weights ("self-exfiltration"), or even corrupting the development of future AI models ("successor sabotage").
To address these escalating risks, this report recommends a combination of technical and policy solutions. We argue that, as the risks of AI development increase, the industry should learn from the stringent security practices common in fields like nuclear and biological research. Government, academia, and industry should work together to develop AI-specific security and safety measures. We also recommend that the U.S. government increase its visibility into internal AI systems through expanded evaluations and provide intelligence support to help defend the industry.
Proactively managing these risks is essential for fostering a robust AI industry and for safeguarding U.S. national security.
This report was authored by Ashwin Acharya and Oscar Delaney.