BLOG
Artificial General Intelligence: What Could Possibly Go Wrong?
Two Blasts from the Past
21st March 2025

I enjoy writing and talking about Artificial General Intelligence (aka AGI). For example, see my Celent report, Artificial General Intelligence: What It Could Mean for Property/Casualty Insurers.

On the one hand, everyone (or almost everyone) says AGI does not exist today. That makes it difficult to object to anything anyone says about AGI. It’s a bit like trying to argue with somebody who says that the average length of unicorns’ horns is exactly 4 inches.

On the other hand, there is some serious money being bet on the idea that AGI will be a real thing within the time horizon that VCs think about. Case in point: a start-up, Safe Superintelligence, launched in June 2024, describing itself as "the world's first straight-shot SSI lab, with one goal and one product: a safe superintelligence." By September 2024 it had raised $1 billion from NFDG, a16z, Sequoia, DST Global, and SV Angel. By March 2025, there were reports of an additional $1 billion of investment that valued the firm at $30 billion. Not bad for an early-stage start-up.

So will AGI (Safe Superintelligence or any other system) be safe? I'm not going to take a stance on that question in this blog. Rather, I will describe what two fictional works from years ago had to say about the safety of something like AGI.

2001: A Space Odyssey is a well-known science fiction film that includes a human mission to Jupiter to investigate a mysterious monolith. The spacecraft is largely controlled by a supercomputer, HAL. HAL can see and speak with the human crew (it has machine vision and natural language capabilities). So let's say HAL exhibits AGI.

People programmed HAL to obey two critical rules:

  1. He could not reveal to the human crew the true purpose of the mission (which was related to the monolith)
  2. He could not lie to the human crew

As the plot develops, the conflict between these two rules becomes increasingly severe. HAL resolves the conflict by murdering the crew, except one member, who survives by disabling HAL.

Conclusion: HAL commits multiple murders because his human programmers failed to foresee the contradiction between the two critical rules. Nor did they realize that HAL's AGI would resolve the contradiction by murder.
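HAL's predicament is, at bottom, a constraint-satisfaction failure: two hard rules that are individually reasonable but jointly unsatisfiable for certain inputs. A minimal sketch (entirely hypothetical, not from the film; the function and its parameters are invented for illustration) shows how such a conflict can stay hidden until the right query arrives:

```python
# Hypothetical sketch of HAL's two hard rules. The rules coexist happily
# until a query forces both to apply at once.
def hal_answer(query: str, truthful_answer: str, secret: str) -> str:
    """Answer a crew query while obeying both rules, or fail loudly."""
    # Rule 1: never reveal the true purpose of the mission.
    would_reveal_secret = secret in truthful_answer
    # Rule 2: never lie to the crew (so the truthful answer is the only answer).
    if would_reveal_secret:
        # Telling the truth violates Rule 1; concealing it violates Rule 2.
        raise RuntimeError("Rule conflict: cannot both conceal and be truthful")
    return truthful_answer

# Routine queries never trigger the contradiction:
print(hal_answer("Status report?", "All systems nominal", "monolith"))

# A query that touches the secret exposes it:
try:
    hal_answer("What is the mission purpose?",
               "Investigate the monolith", "monolith")
except RuntimeError as err:
    print(err)
```

The point of the sketch is that nothing in either rule, read alone, looks dangerous; the failure only appears at the intersection, which is exactly what HAL's programmers did not test for.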

The second work is a collection of science fiction short stories, I, Robot by Isaac Asimov. The collection introduces the Three Laws of Robotics. What concerns us in this blog is the First Law:

  • “A robot may not injure a human being or, through inaction, allow a human being to come to harm.”

A casual reading of the first law creates a sense of safety and security. But a careful reading raises some disturbing questions.

  • The first phrase, "A robot may not injure a human being . . ." appears straightforward and even admirable. But the difficulties begin with the rest of the law: " . . . or, through inaction, allow a human being to come to harm."
  • The first phrase prohibits an action that occurs in the present: injuring a human. The second phrase prohibits inaction – in effect mandating future activity that will prevent "harm."

But there are many kinds of harm, including: physical, emotional, social, environmental, and others. Just as importantly, who is going to define what actions must be taken so that harm does not occur?

  • For example, one robot might be programmed in a way that leads it to conclude that certain forms of education will cause great harm to many people within a few years. That robot will then take a number of actions to prevent that harm from occurring.
  • Another robot, even with the same programming, might conclude that those same forms of education are actually beneficial. It will then take actions to prevent any changes, because those changes would be harmful.
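The two bullets above can be compressed into a tiny sketch (again hypothetical; the function and the two "harm models" are invented for illustration): the First Law is the same line of code for both robots, and the disagreement lives entirely in each robot's definition of harm.

```python
# Hypothetical sketch: the First Law as code, with "harm" delegated to
# each robot's own model of the world.
def first_law_action(proposal: str, harm_model) -> str:
    """Block any proposal the robot's harm model flags; otherwise allow it."""
    return "prevent" if harm_model(proposal) else "allow"

# Robot A's model: this form of education causes harm.
robot_a = lambda proposal: proposal == "adopt new curriculum"
# Robot B's model: it is *blocking* the change that causes harm.
robot_b = lambda proposal: proposal == "cancel new curriculum"

# Identical law, identical proposal, opposite conclusions:
print(first_law_action("adopt new curriculum", robot_a))  # prevent
print(first_law_action("adopt new curriculum", robot_b))  # allow
```

The law itself never changes; only the harm model does, and that model is supplied by fallible people with varying values.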

You see the problem. The people in various organizations who are trying to create AGI systems (or robots) are fallible and have a variety of values.

It’s going to be an interesting AGI ride.

Author
Donald Light
Research & Advisory