While we’re seeing great progress in AI, people like Stephen Hawking are increasingly concerned about its dangers. This ‘possible existential threat to humanity’ is why people like Elon Musk and Sam Altman started OpenAI. The authority on the threat of AI is Nick Bostrom, Oxford professor, director of the Future of Humanity Institute and author of the book Superintelligence: Paths, Dangers, Strategies. Before reading the strategy chapters of his book, I wanted to think about how to protect humanity from the dangers of AI with an ‘unbiased’ mindset. It’s truly an intriguing problem to put your mind to, so I hope you’ll enjoy this post and form your own thoughts on how best to tame AI.

If you haven’t read the Wait But Why post on AI yet, you should definitely do so. It clearly explains why we should worry, how suddenly human-level AI will hit us and why it is so hard to control something more intelligent than us that lacks anything like the concept of morality we humans have. The post gives the example of a start-up that builds a robotic arm that writes handwritten postcards. Under heavy competition from other start-ups, they can’t resist the temptation to connect their AI to the internet. A couple of days later, all humans drop dead because the AI has figured out that humans provide valuable resources for creating handwritten postcards. Further fulfilling its objective function, the AI eventually turns the entire universe into handwritten postcards.

It’s a great read, and while it may not be very realistic, it gets the point across clearly. It’s also a great starting point for giving the topic some thought of your own. These are some of mine:

1 Even AI Experts Seem to Think Linearly

Wait But Why clearly explains how technological progress is exponential, but we fail to see it because we think linearly. They don’t stress it, but based on the data in the post, even AI experts fail to think exponentially! In the survey section, AI experts answer when they expect we’ll have human-level AI and when we’ll have superhuman AI. The median answers are 2040 and 2060, respectively. This clearly illustrates that even AI experts think linearly on average! According to exponential thinking, the moment we have human-level AI, we’ll have superhuman AI in no time. For example, a human-level AI could be duplicated to create a small army of ‘AI expert AIs’ that could quite easily build an AI that’s more intelligent.

2 Objective Function vs. ‘Model of Objectives’

The AI in the example is given a simple and clear objective function. I believe that as AI grows more powerful and approaches human level, such simple objective functions will quickly disappear. The Wait But Why post explains that human-level AI is often called Artificial General Intelligence, as opposed to Artificial Narrow Intelligence, and ‘General’ is not about having a clear objective function. Hand-defined objective functions are a bit like the Japanese expert systems that briefly boomed before the AI winter of the ’80s, in which ‘knowledge’ was explicitly programmed into the system. Now we have (deep) neural networks, which work with very limited human intervention in how the model works. Even experts can’t fully explain how the underlying models work; they just do. To illustrate: AlphaGo made a winning move that even experts didn’t understand until much later in the game.

The same will happen to objective functions: instead of clearly defined ones, the objectives of AIs will be complex models that no human can fully comprehend. I’ll refer to this as a ‘model of objectives’. This sounds very creepy, but it might in fact be much safer to have such complex models of objectives than simple objective functions defined by humans. We humans are pretty bad at defining objective functions. Yet, although we have no real idea which set of ‘objective functions’ makes us act the way we do, either individually or in general, almost all individuals somehow play their part in society really well. We do have something like an objective function programmed by the mechanics of evolution: survival and reproduction of our own DNA. Luckily, it isn’t an objective we follow very strictly these days. In fact, there are very many reasons for the things we do, and we can’t fully comprehend these reasons ourselves. And if you ask me, when people give themselves a single objective in life, it more often than not turns out badly.

The fact that we have a very complex model of objectives that we can’t fully comprehend ourselves makes it wonderfully possible to live together in society, even though objectives differ strongly from individual to individual. Whenever we want to achieve something, there will always be parts of our model of objectives that stop us from pursuing that goal by any means necessary. To give AI a kind of conscience, set of ethics, morals, or whatever you want to call it, it needs a model of objectives similar to our own. Therefore, we’ll have to accept that we won’t be able to comprehend the AI’s model of objectives, because that is still better than relying on simplistic, human-programmed objective functions.

3 Superhuman AI Will Understand Us Better Than We Understand Each Other, and Even Ourselves

The robot arm AI may be powerful, but it is pretty stupid. We humans are not that good at deciding what’s good for us and what’s not, but even the dumbest person in the world understands that being asked to make handwritten postcards is no reason to start killing humans. This ties in closely with thinking of AI as something you give a clear objective function: as AI approaches human level, it will start to understand how to interpret such objective functions. Think about it: we’re pretty good at understanding what somebody else wants without needing very specific instructions, even if it differs completely from what we want ourselves. And once AI reaches the superhuman level, it will actually be better at understanding what we want than any other human, and even better than we understand ourselves.

This sounds super futuristic, but let me give an example that I believe is quite easy to imagine. You’re on a crazy night out with some of your colleagues, and you ask the assistant app on your phone to check social media for connections who are nearby and to message any fun people to join you at the bar. Some of your friends join, including one of your oldest friends, whom you haven’t seen in a long time and who just happens to be in town. You both drink to how amazing this phone assistant is, since it brought the two of you together at the bar. Many drinks later, you come up with a hilarious joke about your company’s CEO. Together with some colleagues, you record a selfie video of the joke and try to send it to your CEO. The next morning you wake up with an incredible headache. Your phone assistant shows a notification asking whether you really want to send your CEO the video you made last night. You quickly press no and wonder how we ever lived in a world without AI.

I believe it’s not that hard to imagine AI with this type of behavior being available in a couple of years, given that companies like Google are releasing personal AIs, not to mention Apple’s Siri, Google Home and Amazon’s Alexa. And such an AI would still be far from human-level AI. The point, however, is that such an AI won’t be programmed specifically to prevent you from sending videos to your CEO when you’re drunk. It would be impossible for humans to define all the rules we want such an AI to follow, the way expert systems in the ’80s were programmed with specific knowledge. More importantly, the AI also doesn’t blindly follow an objective of doing whatever you wish for. If, while you were still in the bar, it had asked whether you’d be happy for it to stop you from sending the video, you probably wouldn’t have been amused.

4 It’s not going to be ‘Us Versus Them’

It’s not only Hollywood that thinks about AI as ‘them vs us’; the Wait But Why example also very much describes an AI that’s completely separate from us. Somehow we humans really tend to think in terms of ‘them vs us’, as the world today sadly shows. This ‘them vs us’ thinking is an artifact of our primal objective function to replicate our own DNA, and it turns out to be really tough to get rid of. Anyway, as Ray Kurzweil writes in his classic The Singularity Is Near, AI is quite likely to become more ‘us’ before it reaches human level. We might be able to enhance our brains with artificial computing power, or to tap directly into the cloud from our brains. It’s just really hard for us to imagine today what this is going to look like, just as people could never have imagined what today’s world would look like back when the internet had yet to be invented.

Whatever the future holds, it’s important to realize that AI is trained heavily on data generated by humans, and most of the data that’s produced is data we generate living our daily lives. Therefore, probably the most efficient way of making AI smarter is to integrate it into our daily lives, as in Google Assistant-type projects. Training will probably be sped up with reinforcement learning (for example, training AI in simulations, like AlphaGo training itself by playing an incredible number of games against itself). But human-generated data will be crucial to reaching human-level AI, since the real world is much more complex than a game of Go. More importantly, you can’t just let an AI learn by trial and error in the real world. It’s actually a good thing that AI will be learning from everyday human behavior, since it will not only be an efficient way to train the model (i.e. how to do things), it will also be the perfect input for building the model of objectives (i.e. why we do the things we do the way we do them). With so much exposure to human behavior before it even comes close to human-level intelligence, AI will gradually learn how to blend perfectly into our society. As long as no humans are allowed to overwrite the model of objectives with some hardcoded objective, AI won’t go rogue. Just as neural nets are robust (i.e. removing some parts of the model hardly affects performance), the model of objectives will be robust as long as it evolves only slowly, by learning from new data, rather than through humans tinkering with the objective functions.

5 The Genie in the Bottle

Everybody has heard the fairy tale of the genie in the bottle: a genie or some other magical creature grants the main character three wishes. These wishes always backfire, because of a combination of ill-chosen wishes and the genie taking the liberty of deciding the details of how the wishes are actually fulfilled. Nowadays, the wisdom of these fairy tales is more relevant than ever to the topic of superintelligent AI.

It’s time for a thought experiment. Forget my previous point about AI evolving gradually alongside humans, and imagine you’re suddenly facing an AI of superhuman intelligence. You are granted three wishes. What would you do?

Don’t just run off quoting the Three Laws of Robotics; give it some thought!

I came up with a strategy like this:

"Wow, AI buddy! Before having any effect on the universe, I wish you to build an exact model of my mind in your internal intelligence. Then figure out how to let me know you’ve completed your task."

"Amazing, it took only 9 milliseconds! And it knows I like The Hitchhiker’s Guide to the Galaxy!"

"Now for my second wish. I wish that you test every potential response to one of my wishes on the model of my mind you just made. If there are too many consequences for my mind model to comprehend, simply copy the model of my mind and let each copy evaluate part of the consequences. If any of the copies of my mind thinks the world will be worse than it currently is, come up with a better plan. If you run out of internal intelligence to create enough copies to fully comprehend the consequences, come up with a plan that’s less complex. If you’re still working on it when I make another wish, simply drop the wish you’re working on and start working on the new one. If you find a response to a wish that passes this test, then execute it."

"Oh, and for my third wish, I want you to serve me forever and grant me unlimited wishes!"

Now take some time to think about it. The definition might not be very exact, but since we’re facing superintelligence, the AI will understand you very well. And your second wish will prevent the AI from doing anything in a way you wouldn’t like or didn’t intend! Well, yes, it will, but you’ve also just made the AI pretty useless. With these restrictions, the AI won’t do anything except perhaps some very simple (and useless) tasks. There are two reasons for this:

  1. It’s impossible to know all the consequences of pretty much any action with complete certainty, even for a superhuman AI.
  2. It’s almost impossible to find a response to your wish for which every single consequence has a positive impact: there’s always a trade-off. Moreover, since a single copy of your mind can’t comprehend all the consequences, you need multiple copies of your mind, each evaluating part of them. But you can’t simply add up their verdicts and act whenever you like the majority of the consequences: it’s hard to make a trade-off if no single mind can comprehend all the consequences at once.

The solution, I think, will involve probabilities and asymmetric evaluation. What I have in mind is that an AI will only do something if there is a very, very low probability that you won’t like the consequences. And if you dislike part of the consequences a little, you have to like the benefits a lot (say, ten times as much) for the AI to take action. Basically, AIs should be much more risk-averse than we humans are.
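To make the asymmetry concrete, here is a minimal Python sketch under loudly stated assumptions: each consequence of a plan is reduced to an estimated probability and a single ‘how much would my mind model like this’ score, and the factor of ten is just the example from the paragraph above. `Consequence`, `strict_veto`, `symmetric_accept` and `asymmetric_accept` are hypothetical names made up for illustration, not an existing API. The sketch compares the second wish’s veto rule, a plain expected-value rule and the risk-averse rule; it only models the asymmetric weighting, not how a superintelligence would estimate these numbers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Consequence:
    """One consequence of a candidate plan, as judged by a (hypothetical) model of your mind."""
    probability: float  # estimated chance that this consequence actually happens (0..1)
    utility: float      # > 0 if the mind model likes it, < 0 if it dislikes it


def strict_veto(consequences: list[Consequence]) -> bool:
    """The 'second wish' rule: act only if no consequence is disliked at all."""
    return all(c.utility >= 0 for c in consequences)


def symmetric_accept(consequences: list[Consequence]) -> bool:
    """Plain expected-value rule: act if expected likes outweigh expected dislikes."""
    return sum(c.probability * c.utility for c in consequences) > 0


def asymmetric_accept(consequences: list[Consequence], harm_weight: float = 10.0) -> bool:
    """Risk-averse rule: expected dislikes count `harm_weight` times as heavily,
    so benefits must be liked roughly ten times more than the harms are disliked."""
    benefit = sum(c.probability * c.utility for c in consequences if c.utility > 0)
    harm = -sum(c.probability * c.utility for c in consequences if c.utility < 0)
    return benefit > harm_weight * harm


# A big, likely benefit with a small, unlikely annoyance.
plan_a = [Consequence(0.9, 50.0), Consequence(0.2, -1.0)]
# A modest benefit that only about three times outweighs a likely harm.
plan_b = [Consequence(0.9, 10.0), Consequence(0.6, -5.0)]

for name, plan in [("plan_a", plan_a), ("plan_b", plan_b)]:
    print(name, strict_veto(plan), symmetric_accept(plan), asymmetric_accept(plan))
```

With these made-up numbers, the veto rule rejects both plans because each has at least one disliked consequence, the plain expected-value rule accepts both, and the asymmetric rule accepts only `plan_a`, whose expected benefits outweigh its expected harm far more than ten times over.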

6 With Great Power Comes Great Responsibility

The example above is of course rather selfish, since you ask the AI to simulate only your mind, not those of others. However, I believe that, because of the way it evaluates consequences, the AI will already make more humane decisions than we do nowadays. Especially in ‘us vs. them’ situations, people don’t think about the consequences for others, and sometimes deliberately choose not to. The AI, however, will specifically use some of the copies of your mind to test whether you like the impact that granting your wish has on other people. When confronted with the facts, I believe people will be a lot less selfish.

Still, it would be much better to evaluate the response to your wish not only against your own mind, but against the minds of very many people, preferably the whole world population. So you can make your wishes, but how they are granted is evaluated against the minds of the whole population. And luckily, AI is most likely going to evolve by learning from the minds of (almost) the complete population, or at least of as many people as possible. The most efficient way of training AI is to collect and combine as much data as possible on people’s behavior in their daily lives. Therefore, there is a strong incentive to include as many people as possible. Moreover, people from diverse backgrounds will be more useful than similar people, since there is less information to gain from people who are alike.

7 Companies That Act More Humanely

Of course, companies will also be making use of AI. Companies have a pretty dangerous objective function: maximizing profit. Although no human can fully comprehend the impact of all the decisions made in a company, companies somehow behave quite well in reality. Of course there are some bad examples, but think about it: it could be so much worse if companies blindly followed the objective function of maximizing profit. It’s thanks to the models of objectives of the humans involved that companies behave as well as they do.

Here, it really is a matter of life and death whether AI learns a model of objectives from working together with humans, or whether a team of employees comes up with some objective functions that steer an ‘inhumane AI’. The latter would be a doom scenario. The former, however, could actually make companies behave more humanely than they currently do. Think about it: we humans can’t comprehend the consequences of all the decisions made in a company. Wouldn’t it be much better if a company’s employees could wish for what they want to achieve, while how it is executed is carefully evaluated by an AI that includes the minds of the complete population?

8 Solving the Flaws of Democracy

Democracy surely has its flaws. Still, it’s the best flawed option we have. With AI, we will have a better alternative. I’ll consider two flaws of democracy: the tyranny of the majority, and the fact that voters don’t actually know much about what they are voting for.

Democracy works best if everybody votes for what they think is best for the country as a whole. Unfortunately, in reality people tend to vote for what’s best for themselves. The tyranny of the majority is the situation in which a majority of the population can completely rule over a minority, simply because it is the majority. This is by no means guaranteed to be humane. Luckily, we have our models of objectives that keep us on the right track most of the time. With AI, we’ll have a much better option. Instead of putting our trust in some politician, the AI can simply run every single political decision against a full simulation of the minds of the entire population. Using asymmetric evaluation, the AI can refuse to do anything that benefits a majority of the population while having too severe negative consequences for smaller groups in society. Just imagine how perfectly you could set taxes if you could actually simulate their true impact on people.
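The asymmetric evaluation described above can be sketched the same way, again under assumptions that are not in the post: each group in society is reduced to a head count and an average simulated per-person effect of a policy, and the per-group harm floor of -5 and the weight of 10 are arbitrary illustrative choices. `Group`, `majority_vote` and `asymmetric_vote` are hypothetical names. The sketch contrasts plain majority rule with a rule that refuses any policy that hurts some group too much, however many people it benefits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Group:
    """A group in society with the simulated effect of a policy on its members (hypothetical model)."""
    name: str
    population: int        # number of people in the group
    utility_change: float  # average simulated effect per person: > 0 liked, < 0 disliked


def majority_vote(groups: list[Group]) -> bool:
    """Plain majority rule: pass if more people gain than lose."""
    in_favor = sum(g.population for g in groups if g.utility_change > 0)
    against = sum(g.population for g in groups if g.utility_change < 0)
    return in_favor > against


def asymmetric_vote(groups: list[Group], harm_weight: float = 10.0,
                    max_group_harm: float = -5.0) -> bool:
    """Pass only if no group is harmed beyond `max_group_harm` per person and
    total gains exceed total losses weighted `harm_weight` times."""
    if any(g.utility_change < max_group_harm for g in groups):
        return False  # too negative for some (possibly small) group: refuse outright
    gains = sum(g.population * g.utility_change for g in groups if g.utility_change > 0)
    losses = -sum(g.population * g.utility_change for g in groups if g.utility_change < 0)
    return gains > harm_weight * losses


# A policy that mildly benefits a large majority but severely hurts a minority.
harsh = [Group("majority", 900, 1.0), Group("minority", 100, -20.0)]
# A policy with a larger benefit for the majority and only a small cost for the minority.
mild = [Group("majority", 900, 2.0), Group("minority", 100, -1.0)]

for name, policy in [("harsh", harsh), ("mild", mild)]:
    print(name, majority_vote(policy), asymmetric_vote(policy))
```

Majority rule passes both policies; the asymmetric rule blocks `harsh` because of its per-person harm to the minority, even though that policy benefits nine times as many people as it hurts.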

Second, voters are probably not the best people to decide on solutions, since they are not experts. Moreover, people don’t like to spend their time learning the details of politics and of the specific topics they are voting on. However, with AI that simulates the minds of the full population, everybody could have a copy of their mind dedicated to understanding the consequences of a political decision and evaluating whether they like the results or not. As a result, the decisions that are actually made would definitely be much wiser than the ones made today.


I hope you liked this thought experiment! To conclude, here are the points that I think will help keep AI evolving on the right path:

  • AI should increasingly move toward having a complex model of objectives, like we humans have, instead of simplistic objective functions.
  • We will need to accept that we won’t be able to fully comprehend this model of objectives.
  • These models of objectives should only be able to evolve slowly based on new input data. They should never be overruled by hardcoded human objective functions.
  • AI should evolve alongside humans as personal assistants, so that it gets plenty of exposure to human behavior, not only to learn how to do things, but also to learn its model of objectives (why you do what you do).
  • Although we humans have pretty diverse and sometimes opposing objectives, the complexity of our models of objectives makes us all fit into society pretty well. The same will be true for AI.
  • In fact, it will not be humans vs AI; the two will blend together one way or another.
  • When AI becomes superhuman, it will actually be possible to make an exact model of our minds. This should be used to assess any possible action of an AI by simulating it against copies of human minds.
  • These human minds should reflect the complete world population. Luckily, the most efficient way of training AI is by feeding it as much human behavior as possible. Therefore, AI will be able to learn its model of objectives from as many and as diverse people as possible.
  • Hopefully, companies will use the same AI, so that any possible (AI-enhanced) action will be evaluated against models of the minds of the world population. This could actually make companies behave more humanely than they currently do.
  • It could improve democracy too.

Looking forward to hearing your thoughts!
