(This post is still a work in progress. Please come back later.)
"All get what they want; they do not always like it."
- Aslan, in "The Magician's Nephew" by C.S. Lewis.
The recent advent of LLMs (large language models) like ChatGPT has caused much chatter about what it all means. Is this the long-awaited AGI (artificial general intelligence)? Is it conscious? Does it have rights, or a "soul"? Is this the beginning of the "singularity"? Can it help you with your job, or will it take it away from you? Is this kind of technology too dangerous? Should we slow down, or stop altogether? Is it a danger to humanity? Will it kill us all? And what should we do about it?
Now, I do not intend to tackle any of these questions, except for the one that I think is the most important. And to get there, we need to know the basics of how these artificial intelligences (AIs) work. While I am by no means an expert in the specifics of the currently popular AI models, I do work as a data scientist, and I've built enough machine learning models to know the fundamentals that apply to them all. And nothing I say will require esoteric knowledge specific to any one model.
All these AI models are trained by providing them with lots of examples of what we want the output to be for a given input. So, for example, the well-known MNIST database has many pictures of handwritten single-digit numbers, along with the correct values for what each handwritten number is supposed to be. A small part of this dataset looks like this:
| Image | Correct value |
|-------|---------------|
| ![]() | 1 |
| ![]() | 4 |
| ![]() | 7 |
| ![]() | 9 |
If you train a model with the images as the "input data", and the correct values as the "what we want the output to be", then you'll get an "AI" that tells you the number in such pictures. This model will then be able to take any similar picture - even the ones that were not in its training data - and give a reasonable answer for what number is supposed to be in that picture.
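To make this concrete, here is a minimal sketch of that training process in Python. It uses scikit-learn's small built-in 8x8 digits dataset as a stand-in for MNIST, and a simple logistic-regression classifier; these are illustrative choices of mine, not a claim about how any particular production model is built.

```python
# A minimal sketch: "input data" + "what we want the output to be" = a trained model.
# The dataset and classifier here are illustrative stand-ins, not a recommendation.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()                      # small built-in stand-in for MNIST
X, y = digits.data, digits.target           # images as pixel arrays, and the correct values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)   # a simple classifier is enough to illustrate
model.fit(X_train, y_train)                 # learn to associate inputs with outputs

# The trained model gives reasonable answers even for images it never saw during training.
print("accuracy on unseen images:", model.score(X_test, y_test))
```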
Now, here's the key point: "what we want the output to be" doesn't have to be the number in the picture. It can be anything you want, as long as you have the data for it. You can, for example, have it be the number of white pixels in the picture, or the number of strokes necessary to draw the image, or the favorite food of children of that age. As long as you provide the model with correct, legitimate data that accurately represents the general problem, the model will learn to associate the inputs with the correct output, and will provide its best guess at the correct answer whenever you feed it some input data.
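To see how little the mechanics care about what the labels mean, here is the same kind of pipeline with only the target swapped out: instead of the digit, the label (purely as an illustration) is the number of lit pixels in each image. Nothing else changes; only "what we want" does.

```python
# Same inputs, different "what we want": the labels now count lit pixels instead of naming digits.
from sklearn.datasets import load_digits
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
X = digits.data
y = (X > 0).sum(axis=1)                     # new target: how many pixels are "lit" in each image

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)   # same mechanics, different objective
print("predicted lit-pixel count for one unseen image:", model.predict(X_test[:1])[0])
```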
So these models can be trained to do whatever we want. You just have to provide the data that illustrates "what we want". So, ChatGPT was trained on publicly available text on the internet, and what we wanted was for it to generate the reply or the continuation for the input text. Text-to-image models like Dall-E or Midjourney were trained on a bunch of images and their descriptions, and we wanted them to produce the image when provided with the description. Even AlphaGo works under the same principle, except that AlphaGo can generate its own data by playing games against itself. It is then trained on the set of game states, with "what we want" being the move played by the winner of the game.
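As a toy illustration of the text case: the training data for a "continuation" model can be thought of as pairs where the input is a prefix and "what we want" is simply the next word. Real LLMs work on subword tokens at enormous scale, and the sentence below is just a made-up example, but the shape of the data is the same in spirit.

```python
# Toy "continuation" training pairs: each input prefix is paired with the word we want next.
text = "the quick brown fox jumps over the lazy dog"   # stand-in for the training corpus
words = text.split()

training_pairs = [
    (words[:i], words[i])        # (input so far, desired continuation)
    for i in range(1, len(words))
]

for prefix, target in training_pairs[:3]:
    print(" ".join(prefix), "->", target)
```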
I'm obviously glossing over a ton of details here, but those are the broad and fundamental strokes. Essentially, these AIs solve all kinds of "how" problems for us. We now know how to play a good game of chess or go, how to reply to any text prompt, and how to create an image from just a text description. This leaves us with the "what" problem. What do we want? What do we value?
At this point it's already quite passé to write about what people are doing with AI, or what people will soon do. But to retread this ground: some people are using it to make money - what they want is money, and they'll train an AI to provide the "output" that maximizes their income. Others have used it to perform better on online dating platforms. They want more and better sexual opportunities, and they've trained an AI that optimizes their actions for that. No doubt some will skip online dating entirely, and go directly to enhancing porn with AI. That is what they decided they want. Others still will use it to gain attention - for likes and followers for their social media accounts. Some will use it to propagate their agenda, to flood the internet with their posts and their accounts. It's then a short skip over to politics, where someone will decide that "what we want" is for their party to win the election, and train an AI accordingly.
Others still will decide that intelligence is the best thing ever, that it's the source of everything humans have ever achieved. So what they want is more intelligence, and they will thus train an AI to be smarter. The idea of the "technological singularity" is that iterations of this process will snowball rapidly to create a super-intelligence, beyond any human ability to understand or control. The old story here is about an AI that's told to make more paperclips: the AI gets some intelligence upgrades to help it do its job, until one day it triggers this runaway singularity process, and ends up enslaving humanity on its way to turning the whole universe into paperclips.
But why stop at politics or intelligence? Any AI smart enough to understand intelligence can also understand power. Why not go straight for that instead? After all, power gets you everything else: power to make money, to have sex, to command attention, to win elections, to enhance intelligence - power even to get more paperclips, or whatever else you desire. Thus, power is one possible ultimate answer, one endpoint and sink for the question of "what do you want?"
An AI trained in this way will want power for its own sake, simply to have more power. Such an AI will optimally pursue any of the things we considered above (money, sex, attention, politics, intelligence, etc.), or any other necessary objective, in order to get more power. You can be sure that people will seriously consider creating such an AI, as soon as our AI gets "general" enough to be trained accordingly. Indeed, all of our other AIs will have served as components and trial runs for this ultimate AI. Then, someone may get the unfortunate idea that the best metric for power - its best demonstration - is to make others suffer: to prove that you're maximally more powerful than they are by inflicting on them the specific thing they most wish to avoid. This will then become the optimization function for this nigh-omnipotent AI.
This is a bleak picture - not just in its final state, but along every step. I don't want a nigh-omnipotent AI inflicting maximum suffering on us. I don't want a nigh-omnipotent AI at all, or even one whose only objective function is to make maximal money. But is all this inevitable?
Absolutely not. After all, we don't have to train an AI this way. Nor does that mean that we have to eschew AI like technophobic luddites. We've been working under the paradigm that we'll simply train an AI to do what we want. Well, it turns out that we don't always like getting what we want. How should we decide what we should want? What should the objective function for the AI be? What do we REALLY want, in the end?
This is a hard question. Can an AI answer it for us? No, I don't think so. At least not any of the AIs as we have them today. Remember, the fundamental function of all the AIs we've discussed is to take in data, and give us what we said we wanted. So it seems that there's some kind of bootstrapping step missing, in that we have to give the AI "what we want" as an input, but expect "what we REALLY want" as an output.
Also note the time horizon. It's easy enough to optimize for a good time tonight. It's much harder to not regret it tomorrow, or next year, or in a generation. And given that we expect AI to be ubiquitous in our future society, "what we really want" amounts to the first principles of long-term, world-wide organization. This makes it basically impossible to obtain any training data. History is perhaps our best guide in showing us where "what we want" went wrong, but even there it's hard to say what we should have wanted instead, nor are there enough societal permutations in the historical record to adequately explore the space. And given that it would take at least a historical era to generate reliable labels for "what we REALLY want", the iteration times for developing such an AI would take forever.
Certainly, AIs will be able to assist us in all this, as they can in many other things. But fundamentally, AIs need us to set their goals for them. They will give us what we want, but they cannot tell us what we SHOULD want. We have to decide that for ourselves.
So, what should we want? We're not talking about a new problem here. This issue is not specific to AI. We're talking about what we REALLY want, what we SHOULD want. We're talking about the overarching organizing principle - for us, for our AI, for our whole society. We're talking about iteration times that last entire historical eras, in a process that takes us to eternity. We're talking about approaching the Ultimate Good. We are, of course, talking about morality.
Morality is the alignment with the Ultimate Good. So by definition, a moral AI is better - that is, 'more good' - than a less moral one. This holds true regardless of where it ranks on any of the other dimensions discussed above. Morality is, again by definition, that which decides the value of these lower-order goods, after taking everything into account. Such an AI may pursue power or money or intelligence, insofar as these things align with its objective, but it may sacrifice any of these for some greater good. Its decision will be right precisely insofar as it is moral, and it will be "improved" precisely by becoming better aligned with the Ultimate Good.
This is what we must provide to an AI, because it cannot provide it for itself. For those concerned about being completely "replaced" by an AI, there is honestly great comfort in our role here. This doesn't mean that your job will be safe, or that your partner will not leave you for an AI. But it does mean that in our essential humanity, in reflecting the image of the Ultimate Good, we have something that's still uniquely ours. And it may be the thing that decides everything in the end. We may say, "may the best AI win". But what that means is victory for the one who trains their AI with the most moral goals.
But how do we train such an AI? How do we teach an AI to be good, when we ourselves are so evil?
The Ultimate Good is what we all call "God". But at this level of analysis, this is just a definition. We don't actually gain anything by it. It doesn't actually help us or our AI to be better. What we need is something more solid, to put some substance behind the definition. Or something more Incarnate, to flesh out what we mean by "God". What we need is the person of Christ, to serve as our perfect example. Even if you're not a Christian, but only grant that there is some good in the Christian perception of Christ, he can still give some guidance in how we and our AI can be moral, as illustrated in the following points.
First, in Christ we have a clear repudiation of the pursuit of power for its own sake. That is not a moral goal, or a good objective function. This is seen clearly in the "Christ Hymn" in Philippians 2, where Christ does not consider his equal status with God as something to be grasped at, or leveraged for his advantage. He instead made himself nothing, humbling himself as an obedient servant even to the point of his death on the cross.
We furthermore have his injunction of love for your neighbors, and even for your enemies, and the well-known golden rule of "do unto others as you would have them do unto you". These are to be the core basis for our interpersonal relationships, and for the organization of our societies.
We then have the one law that Christ says is greater still than to love our neighbors: to love God with all our heart, soul, mind, and strength. While I cannot hope to do any measure of justice to the full meaning of this command, for now let us just say that it prevents our morality from collapsing into short-sighted people-pleasing under the guise of "loving" them. Remember, people often don't know what they should want. This command instead infuses our morality with God's infinite and eternal perspective, and self-consistently reinforces the idea that it's this alignment with his Ultimate Good which is our highest ideal and our greatest, most comprehensive goal.
At least some of that should be non-controversial. It may be easy enough to agree that we should want to be moral, and that we should train our AI accordingly. Well and good. But what about the "power" AI? If an AI is trained to seek power as its ultimate goal, would that not beat the "moral" AI? Because it's, well, more powerful? So if some nefarious agent is training this "power" AI for their ends, would we not need to join them in this arms race just to compete? And wouldn't that just end in the dystopian AI described above?
This would indeed be inevitable - if not for one possibility: the "moral" AI might outscale the "power" AI in power. It may be that prioritizing morality actually makes an AI more powerful, even more powerful than an AI trained specifically to maximize power.
At least, we'd better hope that's true, or else we're all screwed. And I can see how some may think this to be a forlorn hope, or some childish sop. You may feel like Harry Potter, incredulous at being told by Dumbledore that "love" is his supposed power for overcoming Lord Voldemort. But a moment's reflection will show this possibility to be real - and even likely.
For an analogy, it's not hard to understand that optimizing for education can be a more effective strategy for making money than optimizing for money directly. In games like chess or go, it's often better to go for position or influence than to go directly for material or territory. If you want to be happy in your relationships, it's often better to put the relationship before your own happiness. Is it so hard to believe that being good may bring you more power than seeking power directly?
Of course, it's not hard to find short-sighted counterexamples to these patterns. If we have hope for this life only, it's easy to think that there is no good or evil, but only power. Then by all means, let us eat, drink, and seek what is best in life - which is to crush your enemies and inflict suffering on their loved ones. But be assured that this is in fact short-sighted. Its scope is narrow, its training data is incomplete, and its end is its destruction. The will to power will find itself powerless, and its adherents will only whine impotently about "slave morality" when true morality has mastered them all.
In fact, we can see something like this happening in this very discussion about AI. Some AI development has been carried out by agents who neither fear God nor care for men, heedless of moral concerns and believing that their technical abilities will get them money, sex, or power. But when their efforts have finally begun to bear fruit, they now discover that morality matters - that it matters the most of all. Like many atheists in many other fields, they discover that Christians have long been ahead of them just as they get in sight of the mountaintop they wanted to climb. The best way to develop a powerful AI is to be morally righteous, after all.
So, this is not really about AI at all. The idea of the Ultimate Good is profound enough to address the most important questions about AI with such effortless ease that it can almost seem like it's not relevant to the question at all. For we humans have always been misled by false objective functions: we have ever erred in choosing what we wanted. Remember, The Shadow cannot create; it can only corrupt. It therefore cannot create anything of its own for us to follow, but can only induce us to mis-order the things that are genuinely good, like getting us to value power over morality, or money over human experiences.
For example, even before AI, a big question in tech was choosing the right "KPI" or "north star" metric, for experimentation and planning. Teams or companies that got this wrong often suffered as a result. Even before tech - indeed, since the dawn of civilization - money was often the metric of choice, simply because it was quantifiable: whenever we humans put a number to something, we want to see how big we can make it. But the flaws of this objective function are well known, to the point that it has one of the seven deadly sins dedicated to it. In fact, bad objective functions even go all the way back to Eve, in the garden of Eden. She saw that the forbidden fruit was good for food, pleasing to the eye, and desirable for making oneself wise - but none of these things are what she should have wanted, what she really wanted. Had she obeyed the Ultimate Good instead, who knows what might have happened?
But we need not have taken this circuitous route to establish the supremacy of morality - over power, money, and everything else. All we needed to do was to continue reading the "Christ Hymn" of Philippians 2 with a simple faith. After Christ humbled himself even to the point of his crucifixion, God exalted him to the utmost - that everyone everywhere should bow at his name and confess that he is Lord. In being righteous, he was granted everything else. We can therefore trust him with full confidence when he says in a still more succinct passage:
"seek ye first the kingdom of God, and his righteousness; and all these things shall be added unto you."
- Jesus, in Matthew 6:33