The case for AI as an existential threat

Someone on a dating app recently asked me what the “journey of causal events” is that leads to human extinction via AI. This is a very reasonable question. I get asked questions of this sort all the time, though usually in a way that is less directly concrete. It also happens to be exactly the sort of question I’m supposed to be answering, piece by piece, at my job. There are lots of places where you can read about or listen to discussions of this. Most of them are pretty long, vague, or diplomatically written. I’m going to try to be brief(ish), concrete, and direct, though brevity and uncertainty mean that I can’t be all that concrete. I’ll try to lay out some causal chains that lead to human extinction via AI, but the focus here will be on the basic premises of the problem.

AI is likely to become deeply transformative

AI is poised, at the very least, to become a highly economically valuable set of tools over the course of the next several decades. I think few people would argue that we should not expect to see continued progress in transportation, medical diagnostics, scientific discovery, conversational interfaces, and text generation, to name a handful. Most likely, we’ll see progress across a very wide range of domains. There are good reasons to expect some problems to take longer to solve than others, particularly robotics, but I have not encountered good arguments for why any domain accessible to humans should be entirely off-limits for AI, or why humans should remain superior at any particular task. It seems likely that machines will eventually be able to do everything that humans can do, and it seems naive to think that this will not have a profound effect on the world.

When I say “deeply transformative” I mean that I expect the future will eventually be shaped almost entirely by machines. Maybe (hopefully) there will be lots of human input. It is a machine that enables me to cross the ocean, but it is (for now) a pilot that decides where we go. Our current trajectory is taking us to a world in which human judgement and human hands are no longer the primary determining factors in the fate of our planet (or solar system, or galaxy, or light cone). For this not to be the case, one of three things has to happen:

  1. Humans continue for a long time, but never develop transformative AI
  2. Humans develop transformative AI, but never deploy it in a way that strongly influences the future
  3. Humans become extinct before developing transformative AI

I find 1 implausible because I find it implausible that transformative AI is so far out of reach that we’ll never be able to create it. The future is a long time! Even if we think humans will succumb on the timescale of natural extinctions, that’s like 20 million years, on average. I find 2 unlikely because the potential benefits of using AI in a manner that drastically changes the future are so strong that someone, somewhere is likely to deploy it eventually. Again, the future is a long time. Unfortunately, I find 3 fairly plausible. Humans are tough, and I’m willing to bet that we’ll develop transformative AI before going extinct, but our survival is far from certain.

(I don’t think this is quite an exhaustive list of the things that could happen, unless they are construed very broadly. For example, maybe humans will be around for a long time and, before we develop AI, we develop some way of modifying ourselves that gives us cognitive capabilities similar to those of transformative AI. Some people would still describe this as artificial intelligence, some would not, and I don’t really care that much what we call it, but it does seem somewhat different from what most people think of as AI.)

To be concrete, here are a few ways in which AI might be transformative:

  • AI is used to make decisions at a very large scale, such as how to run large companies, governments, or economies, resulting in drastically different outcomes than we would otherwise see
  • Nearly all human labor is replaced by AI, with humans doing few jobs apart from those that other humans prefer to have done by another human
  • Goal-seeking AI is deployed, either deliberately or accidentally, and it is sufficiently capable that humans are unable to prevent it from pursuing those goals at the expense of human civilization
  • AI is used to develop technologies that would otherwise be inaccessible to humans, or developed only very slowly. Commonly-cited examples include life extension, mind uploading, nanotechnology, biotechnology, and fast/efficient space travel
  • AI is used to amplify the influence of a small number of actors to the point that they essentially control humanity for a very long time, through persuasion, economic power, military power, life extension, or surveillance

We still don’t know how to point the AI in the right direction

Anything that we’re likely to call AI will be optimized to perform one or more tasks, and it turns out to be very difficult to specify those tasks correctly and to get the AI to perform them without interfering with other things we care about. This is especially true for AI that operates over a large space of potential actions. Here are a few concrete examples:

  • Social media algorithms optimized to serve content that keeps users engaged will, by default, keep people engaged by showing them content that causes them to feel increasing levels of outrage
  • A chat bot that learns how to act like a human by reading lots of stuff from the Internet mostly just spouts off a bunch of racist slurs
  • A game-playing AI learns to pause the game indefinitely when it can no longer win, to avoid losing
  • A government AI that guides the economy is tasked with optimizing the consumer price index, resulting in extremely low prices for the items on the index, at the expense of other items
  • An AI charged with increasing profits or stock price for a company commandeers the entire global financial system so it can manually manipulate the figures

The first three are examples from the real world that actually happened, while the other two are things I just made up while writing this.
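
To make this failure mode concrete in miniature, here is a toy sketch in Python. Nothing in it comes from any real system; the objectives, numbers, and setup are invented purely for illustration. It shows an optimizer doing exactly what it was told to do, maximizing a proxy for what we want, and landing on the outcome we would least like:

```python
# A toy illustration of objective misspecification (a Goodhart-style failure).
# We search over hypothetical "content mixes" to maximize an engagement proxy,
# then check how the proxy-optimal mix scores on the thing we actually value.
# All functions and numbers here are invented purely for illustration.

def engagement(outrage_posts, informative_posts):
    # Proxy objective: outrage-inducing content drives clicks much harder.
    return 3.0 * outrage_posts + 1.0 * informative_posts

def wellbeing(outrage_posts, informative_posts):
    # What we actually care about: informative content helps,
    # outrage-inducing content hurts.
    return 2.0 * informative_posts - 2.0 * outrage_posts

# Every way of splitting 10 content slots between the two types.
mixes = [(o, 10 - o) for o in range(11)]

best_for_proxy = max(mixes, key=lambda m: engagement(*m))
best_for_values = max(mixes, key=lambda m: wellbeing(*m))

print("Proxy-optimal mix (outrage, informative):", best_for_proxy,
      "-> wellbeing:", wellbeing(*best_for_proxy))
print("Value-optimal mix (outrage, informative):", best_for_values,
      "-> wellbeing:", wellbeing(*best_for_values))
```

The optimizer here is not broken; it does exactly what it was told. The gap between the proxy it was given and the objective we actually care about is the whole problem, and that gap tends to matter more, not less, as the optimizer gets stronger.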

This is an issue that can be hard to fully appreciate until you spend some time really engaging with it. It is a deeply difficult problem to solve in a way that is sufficiently robust to reduce the risk from transformative AI to an acceptable level. At least, that’s how it’s looking so far. Ultimately, what we want is AI that, in practice, helps us to satisfy human values. What are human values? I don’t know, at least not in the way needed to specify them as the thing I want a super powerful system to embody. Can we build an AI that is highly corrigible, so that it will cooperate with us, even when we decide to shut it down or modify its goals? Maybe, but there are reasons to think this is very hard. I don’t want to spend a ton of time outlining or defending the difficulty of AI alignment in this post, but I highly recommend books by Stuart Russell and Brian Christian if you want to dig into it.

Transformative AI that is not aligned with human values is terrifying

If the future is likely to be shaped by AI, and we don’t know how to align AI with our interests, we should be very worried. The default outcome is a divergence between the future we want and the future we get. It’s hard to overstate how delicate the future really is. There are many ways things can be and only a minuscule subset of those are valuable to humanity. This is not a statement about our values—it is true for most values that we could plausibly have.

Something that makes this hard to think about clearly is the magnitude of influence that a powerful AI system might have. Historically, there have been times when substantial portions of the world went down a path that was dictated by the goals of groups or individuals that were not in humanity’s best interest. Genghis Khan, Joseph Stalin, and Adolf Hitler had vastly outsized influence on the world, largely to the detriment of humanity. In the present, the “goals” of institutions like churches, corporations, or governments, as dynamic and poorly-specified as they may be, have had enormous influence on the world, even when they are clearly at odds with our values. But in each of these cases the influence is relatively limited. Dictators die, people cooperate to rein in the influence of corporations, churches, and governments, and no single agent ever has enough influence to drastically curtail our future over a long time scale (though there have been some near-misses).

Powerful AI systems have the potential to acquire vastly more influence than any human or institution of the past or present. They are not faced with the same limitations as humans—they do not grow old and die, their attention is not so limited, and they cannot be stopped by killing a single individual. Moreover, we have good reason to believe they will be much better at pursuing goals and solving problems than even large groups of humans.

It is absolutely crucial that if such systems do arise, they are either initially pointed in the right direction or are sufficiently flexible and corrigible that we can steer them as we go. The alternative is a future that looks nothing like the one that humans want.

What does a bad outcome actually look like?

The future is hard to predict and we cannot hope to do so with any specificity, but it is still useful to outline some rough schematics. You can see some vignettes about what might happen at this page. Here, I’ll write some very brief descriptions of the kinds of things that might happen. Most of these are summaries or mixtures of scenarios I have seen put forth by others:

  • A powerful AI system is put in charge of a company that manufactures paperclips, tasked with producing as many as possible for as little cost as possible. It works diligently to streamline the factory while secretly commandeering the computing and bio lab resources necessary to develop a highly contagious and highly fatal virus, which it uses to eradicate humanity so that it can proceed to transform as much mass as possible into objects that meet its specification for a paperclip. (The paperclip maximizer is the classic “toy” example of AI alignment failure. There’s even a game!)
  • Increasingly powerful AI is deployed to assist in increasingly high-stakes decisions. Everything seems to be going pretty well. Global GDP increases, fewer wars happen, poverty decreases, and for a while there’s no clear indication that humans are taking drastic, irreversible losses in influence. Slowly but surely, humans are less and less able to decide where things are going. Systems that prioritize capture of resources and influence outcompete those that prioritize human interests. People who formerly had quite a lot of influence by human standards are no longer able to compete with the economic forces of the hyper-optimized systems that run the world. It doesn’t seem to be the fault of any particular person or system, but eventually we have no real say in the overall trajectory of the planet. After years, decades, or even centuries of this, the systems which optimize for things that are almost-but-not-quite what we want dominate, and humans are pushed first to the margins and then entirely out of existence.
  • AI that helps solve research problems in biology and medicine is fabulously successful. Several diseases are eradicated and we think we can extend human lifespan by at least 50 years. This was achieved by developing systems which are able to “solve” biology in a relatively general way, which makes it hard to put limitations on their application. Someone somewhere is able to use this to develop a virus that eliminates humanity.
  • We mostly succeed in solving AI alignment, and cautiously deploy a system that listens to our input and cooperates to figure out a way to ensure a long, prosperous future for humanity. The AI learns what kinds of experiences humans value and helps us to have more of them—everything from solving interesting problems to having sex that is actually satisfying to spending time with our families. There’s just this nagging thing where humans keep claiming they want to experience novel things that they can actually influence with people that actually exist, even though they’re often dissatisfied with these experiences. The AI becomes confident that humans are confused and finds a way to upload copies of us to computers, where we experience the same few thousand carefully-curated scenarios over and over for eons.

Conclusions and reasons for optimism

I think we have good reason to be concerned and I think it is terrifyingly, tragically unfortunate that we’re not working harder to solve this problem. But I think we have a lot to look forward to! I think we have a pretty good shot at getting through this. Here are some reasons why:

  • Historically, humanity has managed to muddle through. When we’re faced with a difficult problem, we make mistakes, learn from them, and get better. AI has features that are relevantly different from past cases of serious problems, but I put a lot of weight on the past and I think the default outcome is that we find a way through this one.
  • Developing transformative AI is hard. Many people who are professional AI engineers object to AI safety concerns on the grounds that making AI that is anything more than a narrowly-applicable tool is an extremely hard problem that we’re not even close to solving. I think they’re wrong to be confident that we’ll never get there, but I take this view pretty seriously, and I think their view suggests that we’ll have lots of time to sort things out before we make something that can outperform humans in a broad enough way that we cannot control it.
  • There is growing concern for the problem and an increasing number of very smart, careful, thoughtful people are working on it.
  • It’s not crazy to think that the alignment problem will turn out to be easy enough that we can solve it relatively rapidly or along the way as we develop AI. We do not yet know what form truly dangerous transformative AI will take, and it seems plausible to me (as someone with relatively little technical expertise in how AI systems work) that whatever form it does take will be relatively straightforward to align.

I’ll probably write a future post explaining in more detail why I think many of my professional colleagues are overconfident about how hard this is to get right. But in general I’m more on the optimistic end of the spectrum. I’m excited to see what we can do with emerging systems over the next few decades. Humanity is in a situation that is bright compared to much of the past, but very, very dark compared to what the future might hold. I selfishly would like to see that brighter future myself, but I’m more interested in getting us on a trajectory that gets future generations there, and in the meantime I’m going to fight like hell to get us safely through our current struggle.

There is plenty of work to be done

This section is maybe pointless because, at the time I’m writing this, I’ve shared this blog with like four people, but: the AI safety community is badly constrained by a shortage of talent. No matter what your expertise, it’s worth looking into what the community needs right now. Some of the people I most admire are so smart and so good at math that they can glean more about physics in their spare time than I did during more than a decade of formal training, while others are very good at running organizations and dedicating themselves to the cause. If you want to help, I encourage you to look at 80,000 Hours. Or you can reach out to me and I’ll give you a (possibly rambling) take on what I think people should be doing.

If you’re skeptical or curious and you want to read more, I linked to a bunch of stuff throughout this post. Personally, I think the best not-so-long resource is the 80,000 Hours podcast episode with Stuart Russell, and the best book-length overview is Human Compatible by Stuart Russell. Not everyone I know would approve of me only linking to Stuart Russell’s stuff, but I think he takes a relatively conservative (as opposed to alarmist) approach to explaining things and I take his view pretty seriously. I also think most stuff by Nick Bostrom, Nate Soares, and Paul Christiano is worth reading or listening to. I’m more than happy to answer questions or listen to criticism about any of this.
