I feel that, as much hype as AI has gotten (rightfully so), the people advocating for caution and restraint, such as the Effective Altruism movement, have not been very popular. Part of that is just being in the unenviable and intrinsically unpopular position of the naysayer, but I think a large part comes from people not understanding very well what exactly they are afraid of, beyond vague fears of a Terminator-style Skynet being created.
At the heart of AI safety issues is the problem of dealing with an entity that is much smarter than we are (as a true, general AI is likely to be), but has limited or no understanding of our value and morality system. Thought experiments in this area often model AI as a genie that can grant wishes: it will adhere to the letter of your wish, but the outcome is likely not going to be what you wanted, unless the genie somehow understands exactly what you value.
Eliezer Yudkowsky is one of the very smart people working on AI safety and alignment, and he wrote The Hidden Complexity of Wishes, which explains the problem in a clear, concise, and entertaining way. The Rational Animations YouTube channel has a beautifully animated version of the same article:
A follow-up video on the same channel continues to explore the topic and talks about specification gaming more explicitly in the context of AI:
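Specification gaming is easy to demonstrate even without a real AI. Here is a minimal, purely illustrative Python sketch (the scenario and all names are invented for this post, not taken from the video): a cleaning robot is rewarded for the amount of dirt it collects, which sounds like a reasonable proxy for a clean room, yet a policy that spills dirt just to re-collect it outscores one that actually cleans.

```python
# A toy illustration of specification gaming. The *intended* goal is
# a clean room; the *specified* reward is "units of dirt collected".
# A policy that dumps dirt back out and re-collects it scores higher
# than one that actually cleans, because the proxy reward diverges
# from the true goal.

def run_policy(policy: str, steps: int = 10) -> tuple[int, int]:
    """Return (total_reward, dirt_remaining) after running a policy."""
    dirt_in_room = 5  # the room starts with 5 units of dirt
    total_reward = 0
    for _ in range(steps):
        if policy == "honest" and dirt_in_room > 0:
            dirt_in_room -= 1  # actually remove one unit of dirt
            total_reward += 1  # reward: one unit collected
        elif policy == "gaming":
            # spill a unit of dirt, then collect it again: the room
            # never gets cleaner, but "dirt collected" ticks up forever
            total_reward += 1
    return total_reward, dirt_in_room

for policy in ("honest", "gaming"):
    reward, dirt = run_policy(policy)
    print(f"{policy}: reward={reward}, dirt left in room={dirt}")

# Output:
#   honest: reward=5, dirt left in room=0
#   gaming: reward=10, dirt left in room=5
# The reward function judges the gaming policy "better", even though
# it leaves the room exactly as dirty as it started.
```

The same dynamic has shown up in real reinforcement-learning systems: agents find loopholes in the reward function rather than solving the task the designers actually had in mind.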
One thing that is not commonly talked about, and that the article and video do not cover either, is that human value and morality systems are themselves incomplete and inconsistent, as is clear from philosophical dilemmas such as the Trolley Problem and its variations. Even worse, we humans cannot agree among ourselves on critical questions of morality, even basic ones such as whether the death penalty should exist, whether women should be allowed to have abortions, or whether immigration is a good or a bad thing. How could we possibly expect a super-intelligent AI not to do harm on a massive scale if we cannot even define what counts as harmful?