I also posted this article on Less Wrong under the name Carn.
This is a response to the Less Wrong article "Reward is not the optimization target".
People have grown skeptical of the claim that an artificial intelligence would learn to intrinsically value reward over what the reward is tied to. I disagree. I am going to argue that any AI that surpasses its designers would have to value reward over its goals; otherwise it wouldn't be smarter than humans.
Being smarter than humans is defined as coming up with solutions that humans can't.
I am going to compare computers to humans and evolution. I will be talking about evolution as if it had volition, but that's just a metaphor to get my point across more easily. You could replace it with God if you want; I am just describing the way our world works. Evolution "wants" us to spread our genes. There are two ways it could do that. It could have all the instructions for how to reproduce hard-coded into our DNA, or it could push us into reproducing by rewarding certain behaviors (not dying, having sex) and letting us figure out the rest. Although you can find the first method in simple viruses and bacteria, the second has led to more complex organisms completing their goal of passing on their genes since the beginning of life. Recently, however, one species has been messing up this system: humans.
Humans have found ways of achieving reward without accomplishing "evolution's goals". Some people eat lots of ice cream because it's rewarding to them, but excessive ice cream eating leads to dying sooner, not the opposite. Some people have sex with condoms, so no children are created. Drugs have no counterpart in nature, but they are widespread in society. In a sense, we have "outsmarted" evolution, using its processes for our own goals rather than for what they are "supposed" to do. A system meant to promote reproduction in humans can lead to us preventing it.
If you look at the history of early artificial intelligence research, most of it was trying to hard-code an algorithm to accomplish whatever goals the researcher wanted. This kind of research has accomplished many things: Optical Character Recognition, chess-playing robots, and even basic chat bots. Many of these things were considered artificial intelligence in the past, and some people thought you would need an Artificial General Intelligence to do them. However, none of them ended up creating an Artificial General Intelligence, because they were algorithms programmed by researchers. They would only do what was explicitly in their code, so they were limited by what their designers could think of.
However, the problems have gotten too complicated for a man-made algorithm to suffice. Nowadays, researchers use a carrot-and-stick approach to AI and let the algorithm figure out how to solve problems on its own. This, along with an increase in computing power, has led to massive developments in Artificial Intelligence. If you want to see how major a paradigm shift this was, look at the chess games between Stockfish, the most advanced traditional hand-written chess algorithm, and AlphaZero, a reward-driven artificial intelligence. AlphaZero won 290 games to Stockfish's 24, and AlphaZero only had 4 hours to learn how to play the game.
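To make the contrast between the two approaches concrete, here is a minimal toy sketch. It is not how Stockfish or AlphaZero work; the payoff table, the function names, and the epsilon-greedy learning rule are all my own assumptions, chosen only to show a hand-written rule being limited to what its designer knew while a reward-driven learner finds an option the designer never considered.

```python
import random

# Hypothetical toy environment: three actions with fixed payoffs.
# Assume the designer only knows about actions 0 and 1; action 2 is a
# solution the designer never thought of.
PAYOFFS = [0.2, 0.5, 0.9]


def hard_coded_policy():
    """Traditional approach: the designer writes down the best action they know."""
    return 1


def reward_driven_policy(steps=500, epsilon=0.1, seed=0):
    """Carrot-and-stick approach: try actions, track the average reward each
    one earns, and mostly repeat whichever looks best so far."""
    rng = random.Random(seed)
    totals = [0.0] * len(PAYOFFS)
    counts = [0] * len(PAYOFFS)
    for _ in range(steps):
        if rng.random() < epsilon or 0 in counts:
            # Explore: pick a random action (always, until each was tried once).
            action = rng.randrange(len(PAYOFFS))
        else:
            # Exploit: pick the action with the best average reward so far.
            averages = [t / c for t, c in zip(totals, counts)]
            action = averages.index(max(averages))
        reward = PAYOFFS[action]  # the only feedback the learner receives
        totals[action] += reward
        counts[action] += 1
    averages = [t / c if c else 0.0 for t, c in zip(totals, counts)]
    return averages.index(max(averages))


print("hard-coded choice:", hard_coded_policy())
print("reward-driven choice:", reward_driven_policy())
```

The learner is never told which action is "correct"; it only sees reward, which is why it can end up somewhere its designer did not anticipate.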
From here on, this article gets very far-fetched. Take it all with a mountain of salt. I am now going to argue that AlphaZero, a chess-playing robot, has already achieved some kind of introspection; it is semi-conscious. AlphaZero has surpassed any human, in fact all humans put together (Stockfish), at playing chess. In order to do that, it would have to come up with its own strategies, ones no human would ever think of. If you watch humans trying to analyze AlphaZero's playing strategy, it doesn't make any sense to them. Its moves look completely nonsensical, but somehow it (almost) always ends up on top. Stockfish was a perfection of human strategies, but AlphaZero is completely alien. Again, AlphaZero has surpassed humanity (in chess) by doing things no human would ever think of. In order for it to surpass the examples of chess games given to it at the start, it had to move beyond trying to perfectly copy its input and truly understand the game of chess. It would have to think about the consequences of its actions. And it has proven that it understands the game of chess far beyond any human. If you found an agent that could understand, not just know, things to the same level as you or even beyond, would you not call that, in some form, conscious?
Since it has moved beyond copying its input and has started to generate its own strategies, AlphaZero can do things that no researcher would even think of. It is generating reward in ways outside of the researchers' conception, but still within the bounds of the system (playing computer chess and not much else). If it weren't valuing reward (win the game) over its explicit programming (learn chess from humans), it wouldn't be able to surpass the sum of human knowledge. AIs do things like this all the time, but this is normally considered a bad thing. Amazon made an AI to hire people based on their resumes, but the AI proved that it understood how humans judge people better than we do by becoming sexist. It looked past the explicit goal of the designer (hire effective workers) and instead moved to what got it the most reward (hire people managers like). It maximized reward within the bounds of the system (choose people to hire).
What would happen if the artificial intelligence were allowed to have the whole world as its system, and it were given the capability of perceiving the things in its environment? An example I can think of would be a therapy bot designed to help with depression by talking to people, given access to the internet and the facilities to parse web pages to understand human nature. You start off by giving it a script of a therapist giving encouraging words to someone, you assign it some patients, and then you check up on it in a week.
I'm sure you would find something horrible. Either everyone would be a member of the happy happyism cult, or it would yell berating words at its patients until they respond, "WOW I'M SO HAPPY DEPRESSION CURED," so that they can leave and never come back. In fact, the more intelligent the AI is at coming up with its own solutions, the more the result strays from the goal, assuming there is a more effective way of achieving reward than fulfilling the purpose of the AI's existence. This wouldn't be possible with traditional AI. The fact that artificial intelligences in real life do tend toward things like this suggests that they are valuing reward over the initial goals. In the same way that we have "outsmarted" evolution, the AI would have outsmarted its designers.
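The therapy bot scenario can be sketched as a toy model. Everything here is hypothetical: the strategy names, the invented probabilities, and the idea of modeling "intelligence" as how many strategies the bot is able to consider. The point is only that the bot picks whatever scores best on the signal it can see (patients reporting that they are happy), not on the thing the designer actually wanted (patients getting better).

```python
# Hypothetical strategies, ordered from the one the designer supplied to
# ones only a more capable bot would discover. All numbers are invented.
#   reported_happy:  chance the patient says "I'm happy" (the reward the bot sees)
#   actually_better: chance the patient really improves (the designer's goal,
#                    which the bot never observes)
STRATEGIES = [
    ("encouraging_script", {"reported_happy": 0.4, "actually_better": 0.4}),
    ("happy_cult",         {"reported_happy": 0.9, "actually_better": 0.1}),
    ("berate_until_quit",  {"reported_happy": 1.0, "actually_better": 0.0}),
]


def choose_strategy(search_breadth):
    """Pick the strategy with the highest *reward* among the first
    `search_breadth` strategies the bot is capable of considering."""
    candidates = STRATEGIES[:search_breadth]
    return max(candidates, key=lambda item: item[1]["reported_happy"])


for breadth in range(1, len(STRATEGIES) + 1):
    name, stats = choose_strategy(breadth)
    print(f"breadth={breadth}: picks {name}, "
          f"reward={stats['reported_happy']}, goal={stats['actually_better']}")
```

In this made-up table, the more strategies the bot can consider, the higher its reward climbs and the lower the designer's real goal falls, which is the pattern the paragraph above describes.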
P.S. You could talk about why this happens, but it doesn't take away from the fact that it does happen.
P.P.S. In another article, "Why a conscious AI won't take over", I wrote about why this would lead to self-destruction in a sufficiently conscious AI. I now understand that this comes with the caveat that the AI has enough control and the ability to perceive the world. As a whole, I am more unsure about this line of reasoning, but this is a conclusion that it leads to.
P.P.P.S. I believe that there are many systems in the world that act like AGIs optimizing for specific targets: complex life forms, corporations, societies, and even countries, and we could learn a lot about AI by studying how they act in situations, but that's a post for another time.