Crucial Considerations in ASI Deterrence

This article was originally published on AGI Strategy, a Substack by IAPS Associate Researcher Oscar Delaney.

In Superintelligence Strategy, Hendrycks, Schmidt, and Wang forecast that great powers will fear for their continued security and survival if a rival unilaterally develops artificial superintelligence (ASI), i.e., a system exceeding human capabilities across all strategically important domains. Therefore, they argue, countries will forcibly prevent rivals from racing to ASI by launching cyberattacks on foreign AI data centers, or even kinetic strikes on compute and energy infrastructure needed for frontier AI training. In what follows, I reformulate their argument as three key claims leading to two conclusions. Then, I review the grounds for (dis)believing each of the three claims, citing critiques and responses where relevant.

My formulation of the MAIM (mutual assured AI malfunction) argument is:

  1. China will expect to be near-totally disempowered if the US develops ASI without any other countries having similarly advanced capabilities.

  2. Conditional on 1, China will take MAIMing actions to avoid being disempowered.

  3. Conditional on 1-2, the US will back down rather than risk WW3.

Conclusion (descriptive MAIM): The US will not develop ASI unilaterally, because of MAIMing threats and/or actions.

Conclusion (normative MAIM): The safest (likely) pathway to developing ASI is one in which the US and China slowly and safely increase their AI capabilities together, under the continued threat of MAIMing actions. So altruistic actors should try to bring this pathway about.

There has been some confusion about these two conclusions in various response pieces. The original Superintelligence Strategy paper defended both that MAIM is the default state of affairs (descriptive MAIM) and that we should strive to make a MAIM equilibrium more stable and more likely (normative MAIM).

The game-theoretic logic of MAIM applies to any pair of rival countries. For simplicity, I assume throughout this post that the US is the country leading AI development and that China is the one MAIMing. But analogous dynamics could apply if we swap the roles of China and the US, or replace China with Russia. Indeed, Russia may have more reason than China to engage in MAIMing attacks, given that, unlike China, it cannot otherwise compete in AI.

I refer throughout to the US and China as unified actors making coherent decisions. This is a somewhat accurate model of China. But in the US, many independent private companies make key decisions about AI development. However, the US government will likely take a more active role in managing geopolitical risks from AI if China takes MAIMing actions, so in some cases, it is still a reasonable simplifying assumption to model the US as a unified decision-maker.

Note that I do not endorse all the reasons given for and against the premises; I have simply tried to collate the strongest reasons from the literature,¹ and at times supplement them with considerations of my own. I think this is one of the most important topics in AI policy, and I would be keen to see more work assessing these premises and conclusions.

1. China expects to be disempowered by ASI

China will expect to be near-totally disempowered if the US develops ASI without any other countries having similarly advanced capabilities.

By ‘near-totally disempowered’ I mean either that the CCP has been ousted entirely, or that the PRC is no longer a great power on the international stage.

Note that there is no discrete threshold for ASI such that the US getting there a week before China would automatically lead to US supremacy. Rather, the gap in capabilities is what counts, and if Chinese AI capabilities always remain nearly as good as US AI capabilities, China is unlikely to be disempowered.

Reasons for:

  • An intelligence explosion could trigger a decisive strategic advantage (DSA) – a position of unrivaled dominance on the world stage, where competitors are greatly militarily outmatched. A small initial lead in AI development could expand rapidly once one group successfully automates AI R&D, enabling a rapid, self-reinforcing cycle of AI capability improvements. There are many orders of magnitude of scaling left to be had in chip production and algorithms, such that even the initial Earth-based intelligence explosion could go very far. Once one side has a sufficiently large lead in AI, they could use this to interfere with adversaries’ AI development and build up a decisive economic, industrial, and eventually military advantage. They can also develop novel ‘wonder weapons.’

  • A misaligned superintelligence may take over. If the US fails to align and/or control its ASI, then the ASI itself could disempower both the US and China, so this is no better for China than a US DSA.

  • China will closely monitor US AI development and its implications. It is in China’s strategic interests to carefully track US AI development, and we should generally model China as a competent actor with advanced espionage and intelligence analysis capabilities. Moreover, the US likely won’t have sufficient cyber and information security to prevent China from tracking technical developments inside US AI projects. Therefore, the question of whether China believes US ASI will lead to China’s disempowerment is closely related (but not identical) to whether this is really the case.

    • [Counterpoint] Compared to the US, a larger share of influential actors within China appear skeptical that the LLM paradigm will scale to superintelligence. If this skepticism persists, but scaling does work, China may be left working on novel architectures (e.g., that are more compute efficient, given China’s lower compute stockpile) and blindsided by US ASI.

    • [Counterpoint] Predicting the effects of unprecedented technology is far harder than using espionage to track current developments. So even if China forms an accurate picture of US AI development, they may fail to predict its ramifications.

  • Takeoff will be slow enough for China to notice. While an intelligence explosion would be rapid on world-historic timescales, it will likely be slow enough to be detected and responded to, especially in the early stages as AI-automated R&D scales up. Therefore, China will have time to notice trends in US AI progress before it is too late. There is some tradeoff here: if takeoff is too fast, China may be blindsided, while if it is too slow, the US may not get a DSA at all. But the middle ground of takeoff over months to years likely gives China time to notice and respond.

    • [Counterpoint] If there is an unexpected jump in capabilities (e.g., from a large algorithmic breakthrough), China may be caught unawares. But this would be surprising given the baseline trends of fairly steady progress.

Reasons against:

  • Even ASI will struggle to overcome nuclear deterrence, making a DSA less likely. Neutralizing nuclear weapons before rivals realize and pre-emptively intervene is very difficult, possibly even for an early ASI. For instance, nuclear command and control systems can be thoroughly air-gapped to avoid disabling cyberattacks. Attempts to mechanically disable all of an adversary’s nuclear arsenal with more speculative methods, e.g., with AI-powered miniaturized attack drones detonating small explosions to temporarily block missile launch mechanisms, are unlikely to succeed completely. This would leave potentially dozens of nuclear warheads still active and likely to be launched in retaliation.

My credence in premise 1 holding is ~70%, since I expect China’s situational awareness to grow greatly in the coming years as AI’s capabilities and real-world implications advance. However, futuristic speculations about DSAs may remain difficult for the Chinese establishment to countenance.

2. China will MAIM

Conditional on 1, China will take MAIMing actions to avoid being disempowered.

Assuming China does expect disempowerment, what might they actually do? MAIMing actions can be more or less extreme. I define this premise as holding if these attacks delay US AI development by at least six months. In ascending order of severity, some possible MAIMing actions are:²

  • Cyberattacks on AI training runs and data centers

  • Unattributed sabotage to data centers (e.g., covertly hiring US domestic criminals to cut fiber optic cables, damage grid connections, or even blow up data centers)

  • Frequent assassinations of leading AI researchers

  • Conventional missile strikes³ to destroy data centers and grid infrastructure

  • Tactical nuclear strikes on AI infrastructure

  • High-altitude EMP nuclear detonations to destroy a wide swath of electronics (not just AI-related) across the continental US

The likelihood of China pursuing these tactics decreases greatly from top to bottom, though similar strategic considerations apply to each. The strength of China’s convictions will likely determine how far they are willing to escalate.

Reasons for:

  • China will not trust the US to respect China’s sovereignty post-DSA. Historically, the US has sometimes pursued regime change in cases where it perceived a government as challenging the US’s core interests. If the US achieves a DSA, China would become vulnerable to US interventions.⁴ Protecting one’s continued sovereignty is the highest priority of most countries, so China will be willing to go to extraordinary lengths to preserve its sovereignty, including taking large risks.

    • [Counterpoint] Perhaps the US could credibly commit to China that it will not seek a DSA, or will not use a DSA to force regime change in China.⁵ This could help on the margin, but it is difficult to imagine the US saying or doing anything pre-DSA that will bindingly constrain its options post-DSA, given that the normal mechanisms of international pressure will no longer apply. One possibility is that the US could enshrine in its AI model specs that the AI must not help the US overthrow foreign governments, and that it must block attempts to create a new AI without this safeguard. But the US may be reluctant to tie its hands to this extent.

    • [Counterpoint] Even with a DSA, the costs to the US of disempowering China may outweigh the benefits. By definition, a DSA means the US would have a negligible chance of failing to achieve its strategic objectives against any adversary. However, this is distinct from an ‘overwhelming strategic advantage’ (OSA), which would allow the US to achieve its objectives with negligible costs to itself. For instance, a confrontation with China could divert valuable resources away from the US’s industrial-economic explosive growth and lead to missing out on gains from trade. Or, if China’s nuclear arsenal has not been fully incapacitated, there would be some risk to US cities.

Reasons against:

  • China will fear escalation. China may (reasonably) believe that MAIMing actions, especially kinetic strikes on US soil, will be met with fierce retaliation and possible escalation. This would particularly be the case if the US has publicly pre-committed to responding assertively to MAIMing strikes. To date, nuclear deterrence has held, with no major direct conflicts between nuclear-armed rivals. Moreover, AI infrastructure is closely integrated into economic and military power, so an attack may be seen as more escalatory than intended. This (perceived) likelihood of military escalation greatly raises the costs and risks to China from MAIMing.

  • China will back itself to fast-follow in AI, preventing a US DSA. If takeoff is relatively slow, and/or if the laggard is still able to steal unsecured model weights deep into the intelligence explosion, there may never be a large lead in AI capabilities, and therefore no DSA and no need to launch pre-emptive MAIMing attacks.

    • [Counterpoint] China’s compute disadvantage may prevent it from being competitive even with the same models as the US.

  • There are no clear red lines. There is no definitive point at which an AI project becomes sufficiently existentially dangerous to a country to warrant MAIMing actions (unlike with nuclear deterrence, where a nuclear strike is a discrete event). Putatively, recursive self-improvement – where automated AI R&D leads to accelerating progress – is a red line, but even this is quite fuzzy and continuous. The US may use salami-slicing tactics – slowly increasing installed compute capacity and the size of training runs – such that there is never a clear discontinuity to prompt China to intervene.

    • [Counterpoint] Strategic ambiguity can also deter. For China to deter US ASI racing, there need not be a clearly communicated red line. For instance, it is ambiguous what a Russian jet or drone would need to do for a NATO country to shoot it down – there is no bright red line. But Russia is still deterred: it would likely be more aggressive if aerial incursions were guaranteed to go unpunished. Similarly, even if the US doesn’t know exactly when China will be sufficiently concerned to start MAIMing, this could still deter the US from fully racing.

    • [Counterpoint] Gradual escalation can communicate red lines. Even if red lines are not set and agreed in advance, because MAIMing actions occur on an escalation ladder, MAIMing can itself communicate to the US where China’s red lines are. China would not jump straight to nuclear attacks on US energy infrastructure; they would likely begin with cyberattacks and small-scale sabotage, providing time for a negotiated de-escalation before an all-out war begins.

  • The status quo excludes MAIM. Kinetic strikes against an adversary’s civilian AI projects are far outside the current Overton window and rules/norms of engagement in international affairs. (Whereas cyberattacks are already normalized.) Taking drastic action would require radical, though possibly rational, rethinking of Chinese foreign and military policy, which may not occur in time.

    • [Counterpoint] The commonly accepted strategic logic can change quickly (e.g., MAD would sound absurd in the 1930s).⁶

My credence in premise 2 holding is ~60%, since the strategic logic for MAIMing seems strong, but it may be psychologically and institutionally difficult to actually initiate MAIM. (Although if hostilities have already broken out over Taiwan, further attacks on US soil to damage AI infrastructure will seem less outlandish.)

3. The US will acquiesce

Conditional on 1-2, the US will back down rather than risk WW3.

Note that the likelihood of premise 3 affects premise 2 as well, since China may be less likely to MAIM if they expect the US to escalate strongly. Additionally, premises 1-3 collectively influence whether the US will seek to race towards ASI in the first place. If the US government deems it very likely that China will initiate MAIMing attacks and the US will back down, then the optimal strategy for the US may be to not (visibly) race in the first place.
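
To make this backward-induction logic concrete, here is a minimal toy model of the interaction in Python, solved from the last move backwards. The game tree (US races → China MAIMs → US backs down or escalates) tracks the three premises above, but every payoff number is an illustrative assumption of mine, not an estimate from the literature.

```python
# Toy sequential game for the MAIM interaction, solved by backward
# induction. Payoffs are (US, China) on an arbitrary scale and are
# purely illustrative assumptions.

payoffs = {
    ("race", "maim", "back_down"): (0.3, 0.5),   # negotiated mutual slowdown
    ("race", "maim", "escalate"):  (-1.0, -1.0), # major war
    ("race", "no_maim"):           (1.0, -0.5),  # US races to ASI unopposed
    ("no_race",):                  (0.5, 0.5),   # status quo, no visible race
}

# Last move: if MAIMed, the US backs down iff that beats escalating (premise 3).
us_final = max(["back_down", "escalate"],
               key=lambda a: payoffs[("race", "maim", a)][0])

# Middle move: anticipating the US response, China MAIMs iff that pays better
# than letting the US race (premise 2).
china_options = {
    "maim": payoffs[("race", "maim", us_final)][1],
    "no_maim": payoffs[("race", "no_maim")][1],
}
china_move = max(china_options, key=china_options.get)

# First move: anticipating all of the above, the US decides whether to race.
raced = ("race", "maim", us_final) if china_move == "maim" else ("race", "no_maim")
us_initial = "race" if payoffs[raced][0] > payoffs[("no_race",)][0] else "no_race"

print(us_initial, china_move, us_final)  # -> no_race maim back_down
```

With these particular payoffs, the US would back down if MAIMed, China would therefore MAIM, and so the US prefers not to (visibly) race at all; changing any payoff can flip the equilibrium, which is why the plausibility of each premise matters.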

Reasons for:

  • The US is risk-averse and will want to avoid WW3. Achieving a DSA is not a key priority for the US public,⁷ whereas avoiding a nuclear war is (or would be if it were a more salient possibility). The US government will still be democratically accountable enough at the key moment to prefer a negotiated settlement. Outcomes where both the US and China continue to survive indefinitely as strategically significant polities (i.e., the US does not get a DSA) could still satisfy most of the US’s objectives. So there would be large gains from negotiation rather than risking everything in a total war.

    • [Counterpoint] Warmongers may prevail. Even if it were in the rational interests of the US to negotiate, ‘submitting’ to MAIM may be seen as the weak thing to do. These psychological/sociological factors could allow militarists and China hawks to win out, escalating the conflict rather than negotiating. That said, ‘acquiescing’ to MAIM may not be framed as such, e.g., if the US says it is not racing ahead because of its own safety worries.

    • [Counterpoint] If the US is better modeled by offensive realism, the risk of war may be more acceptable relative to the alluring possibility of achieving a DSA.

  • China may agree not to race. If China were simultaneously MAIMing US AI development while racing ahead with its own AI buildout, this would likely trigger US escalation. However, if China is sufficiently worried about ASI being built to take MAIMing actions, they will likely also be willing to constrain their own AI development in exchange for the US slowing down. This would make the US more amenable to a negotiated mutual slowdown.

Reasons against:

  • MAIMing attacks may fail to constrain US AI development. The US may choose to continue racing despite cyber and kinetic MAIMing attacks, if AI development can continue in hardened underground data centers or via distributed training.

    • [Counterpoint] Compute is currently very centralized and relatively easy to sabotage or destroy. And even if the US could defend against MAIMing attacks, this would take a lot of time, effort, and money, potentially giving China time to catch up. China may also be able to insert trojans and malicious code to backdoor US training runs, or claim that they have done so. This could prevent the US from trusting its own AI models for sensitive use cases, making it harder to achieve a DSA (which relies on using AI in military-strategic applications).

    • [Counterpoint] A high-altitude electromagnetic pulse from an atmospheric nuclear detonation would severely damage grid infrastructure and electronic circuits, including AI data centers. The US likely cannot intercept all nuclear weapons (if several are launched, for redundancy) before they detonate in the upper atmosphere, and a small number of detonations is thought to be sufficient to cause very widespread damage, setting AI development back by ~years.

  • The US may not perceive China’s MAIMing threats as credible. Historically, China has issued many ‘final warnings’ that have not amounted to much. The US may (mistakenly) think China is bluffing.

    • [Counterpoint] Gradually progressing through the escalation ladder could credibly signal China’s seriousness; there is no need to jump from rhetoric to nuclear strikes.

My credence in premise 3 holding is ~60%, since I expect a determined China could MAIM somewhat effectively even without going nuclear, and that the US would strongly disprefer a major war.

Descriptive MAIM

Since premises 2 and 3 are defined conditionally on the earlier premises holding, multiplying through my credences yields a ~25% chance of descriptive MAIM occurring, i.e., that the dynamics discussed above will slow AI development by at least six months and quite possibly much longer.
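
Explicitly, chaining the three credences:

P(descriptive MAIM) ≈ P(1) × P(2 | 1) × P(3 | 1, 2) ≈ 0.70 × 0.60 × 0.60 ≈ 0.25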

Even if a MAIM regime emerges, it is a further question whether it will be sufficiently stable to persist for years or decades while ASI is slowly and safely developed, or if some country will attempt a secret breakout to ASI. This depends crucially on verification: without confidence that rivals are complying, each side faces pressure to defect first to gain the upper hand. But with sound verification, a stable cooperative equilibrium is more achievable. In a MAIM equilibrium, it would be in both countries’ interests to develop robust technical and institutional verification methodologies, thereby preventing unilateral breakout to ASI.

Normative MAIM

Beyond predicting what will happen, policymakers face the distinct question of what should happen. Hendrycks et al. propose that a MAIM equilibrium is not only likely, but also the safest feasible path forward. We cannot fully assess this comparative claim without considering all the other strategic visions of how ASI development should go. But as a start, consider three illustrative scenarios:

  1. MAIM escalates. A US-China war triggered by MAIM dynamics goes nuclear. The value of this scenario is based on just how bad we think a nuclear war (and maybe winter) would be, especially for the long-term future.⁸

  2. MAIM succeeds. The US and China negotiate a peaceful solution, including a lengthy joint slowdown of AI development. The value of this scenario is given by the product of the probability of solving alignment (given the additional time bought by MAIM slowing capabilities progress) and the value of a future shared between the US and China.

  3. China does not MAIM. The US races to ASI unilaterally. The value of this scenario is given by the product of the probability of solving alignment (without MAIM) and the value of a future dominated by the US.

Of course, other less extreme scenarios are possible, such as a limited US-China war, or continued US-China racing without a war or a DSA. But these more clear-cut scenarios offer a useful starting point for analysis.

Reasonable people can disagree greatly on the key factors – the badness of nuclear war, the probability of solving alignment, and the value of sharing the future with China – and therefore on which scenario to prefer. For instance, if nuclear winter is sufficiently bad, and MAIM makes it sufficiently likely, avoiding aggression and escalation is the top priority. Conversely, if solving alignment is nearly impossible without a major pause, forcing the US to slow down is imperative, even if this involves some kinetic strikes and escalation risk.
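
To see how these disagreements cash out, here is a back-of-envelope sketch in Python comparing the scenarios’ values. Every number is an illustrative assumption to be replaced with the reader’s own credences; none comes from the original paper.

```python
# Back-of-envelope comparison of the three scenarios. All parameter
# values below are illustrative assumptions; substitute your own.

p_align_with_maim = 0.8   # P(solving alignment | MAIM buys a joint slowdown)
p_align_racing = 0.4      # P(solving alignment | unilateral US race)
v_shared_future = 0.9     # value of a future shared between the US and China
v_us_future = 1.0         # value of a US-dominated future
v_nuclear_war = -1.0      # value of MAIM escalating to nuclear war
p_escalation = 0.2        # P(escalation to war | China attempts MAIM)

# Scenario values as defined above: the probability of solving alignment
# times the value of the resulting future (misaligned takeover valued at 0).
v_maim_succeeds = p_align_with_maim * v_shared_future   # scenario 2
v_us_races = p_align_racing * v_us_future               # scenario 3

# Expected value of a world where China attempts MAIM, folding scenario 1
# (escalation) and scenario 2 (negotiated slowdown) into one lottery.
ev_maim = p_escalation * v_nuclear_war + (1 - p_escalation) * v_maim_succeeds

print(f"China MAIMs (scenarios 1-2): {ev_maim:+.2f}")    # +0.38
print(f"US races unilaterally (3):   {v_us_races:+.2f}")  # +0.40
```

Under these particular numbers the two branches are nearly tied, which illustrates just how sensitive the choice is to one’s views on alignment difficulty, escalation risk, and the badness of nuclear war.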

Based on these scenarios, we can draw various policy implications.

If one prefers MAIM over US racing, broad policy recommendations could include:⁹

  • Build data centers in remote places separate from critical military or civilian infrastructure, so they can be sabotaged with less loss of life and lower escalation risk.

  • Don’t pursue cybersecurity level 5, to allow nations more visibility into rival AI projects and make cyberattacks more effective. MAIMing actions would then be less likely to escalate to kinetic strikes.

  • Increase China’s awareness of and concern about risks from racing to ASI.

  • Create contingency war plans for various MAIM scenarios, to give leaders better options if they suddenly decide to halt a rival’s AI development.

If one prefers US racing over MAIM, broad policy recommendations could include:

  • Pre-emptively harden data centers and invest in distributed training, in case AI development needs to go underground.

  • Invest in confidence-building measures and credible commitments that make China think (hopefully rightly) that the US is not planning to crush China with a DSA.

  • Get the US to credibly commit to severe retaliation against MAIMing attacks.

    • This is also desirable if we strongly disprefer China controlling a large fraction of the future, and so want the US not to give in to MAIM threats.

Fortunately, some interventions are robust across these disagreements. Many conventional AI safety goals are desirable regardless of views on MAIM:

  • Invest in verification R&D to create optionality for enforcing an agreement to pause or constrain AI development if needed.

  • Further technical alignment and robustness research to reduce the length of time for which we would need to pause AI development.

  • Make AI risks more salient to the US government, such that they monitor and regulate domestic AI R&D activities more closely.

Endnotes

  1. In particular, I focus on the original paper, critical responses from individuals at RAND, IAPS, and MIRI (not speaking for their orgs), and the response to the critiques by Hendrycks and Khoja.

  2. Drawn both from the original Superintelligence Strategy paper and my own thinking. Note too that, as well as these MAIMing actions, countries may take actions not directly related to AI that impose pressure in other ways, such as sanctions, export controls, and proxy wars.

  3. China’s long-range missile capabilities (DF-41 and DF-5B systems) seemingly do not allow them to launch strikes that the US would know are non-nuclear. This might make China especially reluctant to use this option because a conventional strike might be misinterpreted as nuclear, raising the likelihood of escalation.

  4. Arguably if the US has a DSA, neither China nor anyone can challenge its core interests. This may be true in the short term; however, to ensure that the CCP will not come to pose a future threat after catching up in AI development, the US may need to (partly) disempower it preemptively.

  5. Patell and Guest (forthcoming).

  6. Much of the criticism of the original MAIM paper was around disanalogies with MAD, but in their updated response, Hendrycks and Khoja note that the analogy was somewhat loose and the MAIM argument stands on its own.

  7. However, public opinion may not be that determinative of foreign policy decisions, at least in the short term.

  8. See, e.g., Rodriguez (2020; 2022), Belfield (2023), and my own unpublished work.

  9. Drawn primarily from Superintelligence Strategy.


Thanks to Onni Aarne, David Abecassis, Bill Anderson-Samways, Haydn Belfield, Tom Davidson, Lukas Finnveden, Oliver Guest, Oliver Habryka, Rose Hadshar, Fynn Heide, Edward Kembery, Adam Khoja, Daniel Kokotajlo, Maria Kostylew, Jam Kraprayoon, Jenny Marron, Cullen O'Keefe, Liam Patell, and Claire Zabel for helpful discussion and comments. The views expressed are solely my own, and others disagree on many particulars.
