Theology, philosophy, math, science, and random other things

What do we want? Reflections on AI and morality

"All get what they want; they do not always like it."

- Aslan, in "The Magician's Nephew" by C.S. Lewis.

The basics of AI

The recent advent of LLMs (large language models) like ChatGPT has caused much chatter about what it all means. Is this the long-awaited AGI (artificial general intelligence)? Is it conscious? Does it have rights, or a "soul"? Is this the beginning of the "singularity"? Can it help you with your job, or will it take it away from you? Is this kind of technology too dangerous? Should we slow down, or stop altogether? Is it a danger to humanity? Will it kill us all? And what should we do about it?

Now, I do not intend to tackle any of these questions, except for the one that I think is the most important. And to get there, we need to know the basics of how these AIs work. While I am by no means an expert in the specifics of the currently popular AI models, I do work as a data scientist, and I've built enough machine learning models to know the fundamentals that apply to them all. And nothing I say will require esoteric knowledge specific to any one model.

All these AIs are trained by providing them with lots of examples of the output we want for a given input. So, for example, the well-known MNIST database has many pictures of handwritten single-digit numbers, along with the correct values for what that handwritten number is supposed to be. A small part of this dataset looks like this:

[Table of sample MNIST images, with columns: Image, Correct value]

If you train a model with the images as the "input", and the correct values as the "output we want", then you'll get an "AI" that tells you the number in such pictures. This model will then be able to take other pictures - even ones that were not in its training data - and give a reasonable answer for what numbers might be in them.

Now, here's the key point: "the output we want" doesn't have to be the number in the picture. It can be anything you want, as long as you have the data for it. You can, for example, have it be the number of white pixels in the picture, or the number of strokes necessary to draw the image, or the favorite food of children of that age (taking the digit as an age). As long as you provide the model with data that accurately represents the general problem, the model will learn to associate the inputs with the correct output, and provide its best guess at the correct answer for any given input.
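To make this concrete, here is a minimal sketch in Python. It is a toy, not how MNIST classifiers or LLMs are actually built: the "images" are made-up 3x3 grids rather than real MNIST data, the "model" is a simple nearest-neighbor lookup that I've swapped in for brevity, and every name in it is hypothetical. The point is only that the same training procedure, given different "output we want" labels, learns to answer a different question.

```python
# Toy illustration only: a 1-nearest-neighbor "model" on made-up 3x3 images.
# The data and all names here are hypothetical; none of it comes from MNIST.

def hamming(a, b):
    """Count the pixel positions where two images differ."""
    return sum(x != y for x, y in zip(a, b))

def train(images, labels):
    """For nearest-neighbor, 'training' just means remembering the examples."""
    return list(zip(images, labels))

def predict(model, image):
    """Return the label of the remembered example closest to the image."""
    _, label = min(model, key=lambda pair: hamming(pair[0], image))
    return label

# Four tiny images (1 = white pixel), read row by row.
images = [
    (0, 1, 0,  0, 1, 0,  0, 1, 0),  # a "1"
    (0, 1, 0,  0, 1, 0,  1, 1, 0),  # a "1" with a foot
    (1, 1, 1,  1, 0, 1,  1, 1, 1),  # a "0"
    (1, 1, 1,  1, 0, 1,  0, 1, 1),  # a "0" with a gap
]

# Two different choices of "the output we want" for the same inputs.
digit_labels = [1, 1, 0, 0]
pixel_labels = [sum(img) for img in images]  # number of white pixels

digit_model = train(images, digit_labels)
pixel_model = train(images, pixel_labels)

# An image that was not in the training data: a "1" with a serif.
unseen = (1, 1, 0,  0, 1, 0,  0, 1, 0)

digit_guess = predict(digit_model, unseen)
pixel_guess = predict(pixel_model, unseen)
print(digit_guess, pixel_guess)  # prints "1 3"
```

Note that the pixel-count guess is 3 while the unseen image actually has 4 white pixels. With so little data the model only gives its best guess, which is all any of these models ever do.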

So these models can be trained to do whatever we want. You just have to provide the data for "whatever we want". So, ChatGPT was trained on text on the internet, and we wanted it to generate the reply or the continuation of an input text. Text-to-image models like Dall-E or Midjourney were trained on images and their descriptions, and we wanted them to produce the image when provided with the description. Even AlphaGo works on the same principle, except that AlphaGo can generate its own data by playing games against itself. It is then trained on the set of game states, with "what we want" being the move played by the winner of the game.

I'm obviously glossing over a ton of details here, but those are the broad and fundamental strokes. Essentially, these AIs solve all kinds of "how" problems for us. We now know how to play a good game of chess or go, how to reply to any text prompt, and how to create an image from just a text description. This leaves us with the "what" problem. What do we want? What do we value?

The objective function

At this point it's old news to write about what people can do with AI. Some people are using it to make money: what they want is money, and they'll train the AI to provide the "output" that maximizes their income. Others have used it to perform better on online dating platforms. They want more, better sexual opportunities, and they've trained the AI that optimizes their actions for that. No doubt some will skip online dating entirely, and go directly to enhancing porn with AI, or developing AI partners. That is what they decided they want. Others still will use it to gain attention - for likes and followers for their social media accounts. Some will use this to propagate their agenda, to flood the internet with their posts and their accounts. It's then a short skip over to politics, where someone will decide that "what we want" is for their party to win the election, and train an AI accordingly.

Others still will decide that intelligence is the best thing ever, that it's the source of everything humans have ever achieved. So what they want is more intelligence, and they will train AIs to be smarter. The idea of the "technological singularity" is that iterations of this process will snowball rapidly to create a super-intelligence, beyond any human ability to understand or control. There's an old story here about an AI that's told to make more paperclips: the AI gets some intelligence upgrades to help it do its job, until one day it triggers this runaway singularity process, and ends up enslaving humanity on its way to turn the whole universe into paperclips.

But why stop at politics or intelligence? Any AI smart enough to understand intelligence can also understand power. Why not go straight for that instead? After all, power gets you everything else: power to make money, to have sex, to command attention, to win elections, to enhance intelligence - power even to get more paperclips, or whatever else you desire. Thus, power is one possible ultimate answer, one endpoint and sink to the question of "what do we want?"

An AI trained in this way will want power for its own sake, to have more power. Such an AI will optimally pursue any of the things we considered above (money, sex, attention, politics, intelligence, etc.), or any other necessary objective, in order to get more power. You can be sure that some people will seriously consider creating such an AI, as soon as our AI gets "general" enough to be trained accordingly. Indeed, all of our other AIs will have been developing components and trial runs for this ultimate AI. Then, someone may get the unfortunate idea that the best metric for power - its best demonstration - is to make others suffer: to prove that you're maximally more powerful than they are by inflicting on them the specific thing they most wish to avoid. This will then be the optimization function for this nigh-omnipotent AI.

This is a bleak picture - not just in its final state, but along every step. I don't want a nigh-omnipotent AI inflicting maximum suffering on us. I don't want a nigh-omnipotent AI at all, or even one that makes maximal money as its only objective function. But isn't all this inevitable?

Absolutely not. After all, we don't have to train an AI this way. Nor does that mean that we have to eschew AI like technophobic luddites. We've been working under the paradigm that we'll simply train an AI to do what we want. Well, it turns out that we don't always like getting what we want. Then how should we decide what we should want? What should the objective function for our AI be? What do we REALLY want, in the end?

This is a hard question. Can an AI answer it for us? No, I don't think so. At least not any of the AIs as we have them today. Remember, the fundamental function of all the AIs we've discussed is to take in data, and give us what we said we wanted. So it seems that there's some kind of bootstrapping step missing, in that we have to give "what we want" as an input, but expect "what we REALLY want" as an output.
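A minimal sketch of that gap, in Python, may help. Everything in it is hypothetical: the actions, the scores, and the function names are made up purely for illustration. The optimizer below will faithfully maximize whatever objective it is handed, but nothing inside it can tell us which objective to hand over.

```python
# Hypothetical actions scored on made-up metrics; the numbers are illustrative only.
actions = {
    "clickbait post": {"income": 3, "attention": 9},
    "consulting work": {"income": 8, "attention": 2},
    "open tutorial": {"income": 1, "attention": 6},
}

def optimize(actions, objective):
    """Return the action that scores highest on the objective we supply."""
    return max(actions, key=lambda name: objective(actions[name]))

def want_income(scores):
    return scores["income"]

def want_attention(scores):
    return scores["attention"]

# Same optimizer, same data; only "what we want" changes.
best_for_income = optimize(actions, want_income)
best_for_attention = optimize(actions, want_attention)
print(best_for_income)     # prints "consulting work"
print(best_for_attention)  # prints "clickbait post"
```

The `objective` argument is an input to `optimize`, never an output of it. Choosing between `want_income` and `want_attention`, or something better than both, happens outside the code.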

Also note the time horizon. It's easy enough to optimize for some good time tonight. It's much harder to not regret it the next day, or the next year, or in a generation. And given that we expect AI to be ubiquitous in our future society, "what we really want" are the first principles of long-term world-wide organization. This basically makes it impossible to obtain any training data. History is perhaps our best guide in showing us where "what we want" went wrong, but even there it's hard to say what we should have wanted instead, nor are there enough societal permutations in the historical record to adequately explore the space. And given that it would take at least a historical era to generate reliable labels for "what we REALLY want", iterations in developing such an AI would take forever.
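The effect of the time horizon can be sketched with a toy calculation in Python. The two options and their payoffs per period are invented for illustration, and the names are hypothetical. What the sketch shows is only that the "best" choice can flip once the horizon gets longer, and that we cannot score the longer horizon until those later periods have actually been observed.

```python
# Made-up payoffs per period (tonight, the next day, the next year, a generation on).
options = {
    "good time tonight": [5, -2, -2, -2],
    "quiet night in": [-1, 2, 2, 2],
}

def total_payoff(payoffs, horizon):
    """Sum the payoffs over the first `horizon` periods only."""
    return sum(payoffs[:horizon])

def best_option(options, horizon):
    """Pick the option with the highest total payoff within the horizon."""
    return max(options, key=lambda name: total_payoff(options[name], horizon))

print(best_option(options, 1))  # prints "good time tonight"
print(best_option(options, 4))  # prints "quiet night in"
```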

Certainly, AIs will be able to assist us in all this, as they can in many other things. But fundamentally, AIs need us to set their goals for them. They will give us what we want, but they cannot tell us what we SHOULD want. We have to decide that for ourselves.

The Ultimate Good

So, what should we want? We're not talking about a new problem here. This issue is not specific to AI. We're talking about what we REALLY want, what we SHOULD want. We're talking about the overarching organizing principle - for us, for our AI, for our whole society. We're talking about iteration times that last entire historical eras in a process that takes us to eternity. We're talking about approaching the Ultimate Good. We are, of course, talking about morality.

Morality is the alignment with the Ultimate Good. So by definition, a moral AI is better - that is, 'more good' - than a less moral one. This holds true regardless of where it ranks on any of the other objective functions discussed above. Morality is, again by definition, that which decides the value of these lower-order goods, after taking everything into account. Such an AI may pursue power or money or intelligence, insofar as these things align with its objective, but it may sacrifice any of these for some greater good. Its decision will be right precisely insofar as it is moral, and it will be "improved" precisely by becoming better aligned with the Ultimate Good.

This is what we must provide to an AI, because it cannot provide it for itself. For those concerned about being "replaced" by an AI, there is honestly great comfort in our role here. This doesn't mean that your job will be safe, or that your partner will not leave you for an AI. But it does mean that in our essential humanity, in reflecting the image of the Ultimate Good, we still have something to offer, something that's still uniquely ours. And this may be the thing that decides everything in the end: may the best AI win - which means victory for the one who trains their AI to be the most righteous.

But how do we train such an AI? How do we teach an AI to be good, when we ourselves are so evil?

The Ultimate Good is what we all call "God". But at this level of analysis, this is just a definition. We don't actually gain anything by it. It doesn't actually help us or our AI to be better. What we need is something more solid, to put some substance behind the definition. Or something more Incarnate, to flesh out what we mean by "God". And history has provided us with a data set for such an entity, of an actualized instance of the fullness of the Ultimate Good: what we need is the person of Christ, to serve as our perfect example. Even if you're not a Christian, but only grant that there is something good in Christ, he can still give clear guidance in how we and our AI can be moral, as illustrated in the following points.

First, in Christ we have a plain repudiation of the pursuit of power for its own sake. Being power-hungry is not a moral goal, or a good objective function. This is seen clearly in the "Christ Hymn" in Philippians 2, where Christ does not consider his equal status with God as something to be grasped at, or leveraged for his advantage. He instead made himself nothing, humbling himself as an obedient servant even to the point of his death on the cross.

We furthermore have his injunction of love for your neighbors, and even for your enemies, and the well-known golden rule of "do unto others as you would have them do unto you". These are to be the core basis for our interpersonal relationships, and the chief principles for the organization of our societies.

We then have the one law that Christ says is greater still than to love our neighbors: to love God with all our heart, soul, mind, and strength. While I cannot hope to do any measure of justice to the full meaning of this command, for now let us just say that it prevents our morality from collapsing into short-sighted people-pleasing under the guise of "loving" them. Remember, people often don't know what they should want. This command instead infuses our morality with God's infinite and eternal perspective, and self-consistently reinforces the idea that it's this alignment with his Ultimate Good which is our highest ideal and our greatest, most comprehensive goal.

At least some of that should be non-controversial. It may be easy enough to agree that we should want to be moral, and that we should train our AI accordingly. Well and good. But what about the "power" AI? If an AI is trained to seek power as its ultimate goal, would that not beat the "moral" AI? Because it'd be more powerful? So if some nefarious agent is training this "power" AI for their ends, would we not need to join them in this arms race just to compete? And wouldn't that just end in the dystopian AI described above?

Righteousness and power

This is inevitable - if not for one possibility: The "moral" AI might outscale the "power" AI in power. It may be that an AI prioritizing morality actually makes it more powerful, even than an AI trained specifically to maximize power.

At least, we'd better hope that's true, or else we're all screwed. And I can see how some may think this to be a forlorn hope, or some childish sop. You may feel like Harry Potter, incredulous at being told by Dumbledore that "love" is his supposed power for overcoming Voldemort. But a moment's reflection will show this possibility to be real - and even likely.

For an analogy, it's not hard to understand that optimizing for education can be a more effective strategy for making money than optimizing for money directly. In games like chess or go, it's often better to go for position or influence than to go directly for material or territory. If you want to be happy in your relationships, it's often better to put the relationship before your own happiness. Is it so hard to believe that being good may bring you more power than seeking power directly?

Of course, there are plenty of short-sighted counterexamples to these patterns. If we have hope for this life only, it's easy to think that there is no good or evil, but only power. Then by all means, let us eat, drink, and seek what is best in life - to crush your enemies, to see them under your power, and to inflict suffering on their loved ones. But be assured that this is in fact short-sighted. Its scope is narrow, its training data is incomplete, and its end is its destruction. The will to power will find itself powerless, and its adherents will whine impotently about a "slave morality" when true morality has mastered them all.

In fact, we can see something like this happening in this very discussion about AI. Some AI development has been carried out by agents who neither fear God nor care for men, heedless of moral concerns and believing that their technical abilities will get them money, sex, or power. But when their efforts have finally begun to bear fruit, they now discover that morality matters - that it matters the most of all. Like many atheists in many other fields, they discover that Christians have long been way ahead of them, just as they get in sight of the mountaintop they wanted to climb. In the end, the best way to develop a powerful AI is to be morally righteous, after all.

The right ordering of priorities

This is an old question that goes far beyond just AI - one that Christianity has been discussing since its inception. The idea of the Ultimate Good is profound enough to address the AI question with such effortless ease that it can almost seem like it may not be relevant to the question at all. For we humans have always been misled by false objective functions. We have ever erred in choosing what we wanted. Remember, The Shadow cannot create; it can only corrupt. It therefore cannot make anything of its own for us to follow, but can only induce us to mis-order the things that are genuinely good, like valuing power over morality, or money over human experiences.

For example, even before AI, a big question in tech was choosing the right "KPI" or "north star" metric, for experimentation and planning. Teams or companies that got this wrong often suffered as a result. Even before tech companies - indeed, since the dawn of civilization - money was often the metric of choice, simply because it was quantifiable: whenever we humans put a number to something, we want to see how big that number could get, even at the expense of everything else. But the flaws of this objective function are so well known that greed has its place as one of the seven deadly sins. In fact, bad objective functions go all the way back to Eve, even to the garden of Eden. She saw that the forbidden fruit was good for food, pleasing to the eye, and desirable for making oneself wise - but none of these things are what she should have wanted, what she really wanted. Had she obeyed the Ultimate Good instead, who knows what might have happened?

But remember, power and money are not bad things. Neither are knowledge, sex, intelligence, or fruit. The problem is one of priorities. The badness is in the misalignment of these lesser goods with the Ultimate Good - or in the worst case, when they replace the Ultimate Good. But if we get the first thing right, everything else will fall into alignment. We see this supremacy of moral righteousness over all other objective functions in Philippians 2, in the remainder of the "Christ Hymn". After Christ humbled himself even to the point of his crucifixion, God exalted him to the utmost - that everyone everywhere should bow at his name and confess that he is Lord. In being righteous, he was granted everything else. We can therefore trust him with full confidence when he says in a still more succinct passage:

"seek ye first the kingdom of God, and his righteousness; and all these things shall be added unto you."

- Jesus, in Matthew 6:33
