On Internal Alignment

I’ve spent a fair amount of the last year learning about and acting on issues of internal alignment – bringing satisfaction of values to all of my parts. In my life I’ve often been quite imbalanced, satisfying the parts that want ‘play’ and ‘escapism’ over all others. The part that wants to learn ended up being drowned out by the part that wants not to be confronted with not being so brilliant I understand everything instantly. The part that wants to save the world was overrun by the part that doesn’t want to be embarrassed, and the former didn’t have anything like enough expectancy to fight back with.

In short, I was a mess… but then, if you’ve been reading the autobiographical posts, you knew that.

I still haven’t sorted everything out, and gotten all of my parts on board, but we’re getting there, bit by bit.

One piece of making this happen has been paying better attention internally, and not trying to just force the outcomes “I” want. It turns out driving yourself that way works out badly over long timescales. I trace a lot of my personal akrasia to being at war inside myself, and I really enjoy the outcome of the massive boost in productivity I’ve gotten from not being at war.

This does, however, mean that sometimes I have to do stuff that’s less productive, on a short time horizon. At the end of last week I started getting requests to take a week or two off. I considered it, and consulted with my partner, and the end result is that I’m taking some me time this week and next. I’ll be taking care of that which is necessary, and that which I find I want to do, but I’m not going to drive myself the way I have been. It’s likely that my book review this week will be delayed, and next week will have fewer than three posts, but I think you’ll forgive me this.

What about you, dear reader? What does your internal alignment look like? What have you learned about working together with yourself?

Book Review – How to Actually Change Your Mind

If I had to point to one book of the six that compose AI to Zombies as the single most important, it would be How to Actually Change Your Mind. Without the ability this book strives to teach, true rationality is not possible – you’d simply believe whatever you were told first, and never mind that ‘evidence’ stuff.

Leading off with what is possibly the single piece of writing that is my most linked, the book delves into the tools, tactics, and thought patterns that will enable you to truly entangle your beliefs with reality. Eliezer takes no prisoners in explaining what humility is really for, why lotteries are a waste of hope, and how false dilemmas sneakily try to get us to argue against (or for) limited option sets, constraining us from looking at how reality really is.

The next section digs into politics and rationality, with classics like Politics is the mind-killer and Reversed stupidity is not intelligence. The ancestral environment shaped us to have certainty all out of line with our evidence in dealing with political matters, and further, to have our emotions tightly tied to these discussions. This section closes with Human Evil and Muddled Thinking, a mighty argument that unclear thought is a key ingredient of human evil.

Following that, Against Rationalization talks about how we fool ourselves, how we turn fiction into facts and how we take only the tiniest steps forward, when shown that we must move. I think the ideas that most intrigued me in this chapter are in Motivated Stopping and Motivated Continuation. I recognized them in some of my past behavior, and having names for them has helped me over the years to recognize them happening, and stop them.

I could dig into the next few chapters, Seeing with Fresh Eyes, Death Spirals, and Letting Go, but honestly, I’d rather you go and read it yourself – it’s an excellent work that deserves its place in rationalist canon, and I can’t say enough good things about it (I do have other things to do, after all!). Go forth, acquire How to Actually Change Your Mind and become stronger!

On Resilience

I spend some of my time volunteering with ALLFED – the Alliance to Feed Earth in Disasters. I’m working on a paper on maintaining nutritional health in the face of major disaster – a 50% drop in incoming sunlight that reaches the surface is the primary scenario we’re planning around. It’s an interesting question – how do we ensure that everyone has their caloric needs met, and also doesn’t end up deficient in key nutrients, when the foods we grow currently won’t have their needs met?

We’ve found things that will grow under those conditions – potatoes, for example, already grow at that kind of sunlight level in Alaska. They’re impressively resilient. Potatoes can be found all over the planet, from the Andes to Alaska, and they are a great starting point if you want to get people fed. They aren’t, on their own, enough to keep everyone hale and healthy, though.

That said, potatoes aren’t perfectly resilient – as we’ve seen in history, there are ways in which we can lose access to them as a crop. Most crops have at least one problem along these lines – we of course want to grow the variation of a given crop that will give us the largest harvest, which means that one narrowly genetically defined breed ends up being the one that’s grown everywhere, and when something moves into that ecological niche, it explodes. While we have fairly good methods of control, farming as a whole is not resilient.

This has led me to think about resilience in and of itself; resilient systems, resilient species, and the resilience of particular people. In some ways, I’m fairly resilient. I wouldn’t have survived the hell year without it; every once in a while I tell someone that story and they tell me they don’t know how I made it through, or that they couldn’t have, something along those lines.

I’m also fairly fragile along some axes – I recently discussed with my partners the fact that I have some difficulty taking certain kinds of criticism. They’ve been investing some amount of skull sweat into not breaking me when there are things I need to be told about myself. Always, always, there are ways to become stronger.

It’s definitely something I want to work on, although at the moment things in people’s lives are a little bit too on fire for me to bring it up as an area to target. Soon, I hope, and of course I’ll keep digging into the resources I can work with on my own.

And yourself, dear reader? How resilient do you find yourself? What have you found that helped you become more resilient?

On Moral Relevance

I’ve been trying to decide for a while what moral relevance is made of. At the moment I have an answer that seems to serve in day to day life, but I still have the feeling that I haven’t fully resolved the question, a nagging feeling that I’m not yet capturing everything.

Backing up a step, what do I even mean when I say, “moral relevance”?

I consider a thing to be morally relevant if I think it’s something that I should take into account when choosing my actions. It is accounted for, so that I do not violate any preferences that it has. In short, violating that thing’s preferences is a bad thing, and to be avoided. It matters.

A meaningful amount of the problem lies here. I have an intuitive felt sense of what it means to matter, and what it means for a thing to be bad rather than good, but I don’t think I could give a technical explanation of these concepts. I am in some senses a moral realist – it seems to me that there is something more than game theory in being kind to others. I don’t think that you can find an atom of good or the charge carrier of morality, but you’d be hard-pressed to find an atom of human, either, and they certainly seem to exist.

Axioms seem to be necessary to establishing systems, from what I can tell. Investigating this question, it seems that in at least some systems, with enough progress, you can then go back and prove your axioms (see this quora) but note that what I’ve found also supports another claim I’ve made – that a system with no axioms does and can prove nothing; it has no affordances with which one can do anything.

So, my moral axioms:

  1. There is an axis on which events can be measured, this axis runs from “bad” to “good”, and any given action, fully contextualized, has a location on this axis.
  2. The degree to which an event is bad or good depends strongly (although possibly not exclusively) on the degree of suffering that descends from it.
  3. Evil is a quality that bad events can, but do not necessarily have, which requires intentionality – an evil action is one where a bad event will result, the mind that initiated the action was aware of this, and either didn’t care, or (increasing the degree of evil) actively wanted this outcome.
  4. For two events identical in all respects except for the presence or absence of evil, the event in which evil was present is worse (“more bad”).

From this, it can be seen that moral relevance comes from the capacity to suffer. Rocks are not morally relevant, because they don’t, as far as we know, experience suffering. People are morally relevant, because they do. This gets… problematic, if one considers engineered beings. Does a person who is identical to me in every respect, except that they experience suffering at twice the intensity, with their upper bound for suffering, matter twice as much as me? Does a person identical to me in every respect except that she has no capacity to suffer whatsoever, not matter?

I notice, once more, that I am confused.

On reflection, I think that a version of me who didn’t experience suffering at all might not be morally relevant. If she’s not bothered in the hypothetical case where I hit her with a stick as she walks past, I’m not sure where the bad lies.

Similarly, an action that hurts the double-suffering version of me is at least as bad as taking said action towards me. Is it twice as bad? She suffers twice as much, after all.

The incentive gradients are all screwy here, though. It suggests that the right thing to do is to make people who are capable of suffering as much as possible, and then make sure the world (or other people) don’t hurt them, and that seems obviously incorrect.

I need to think more on this topic, clearly, if I’m getting outcomes like this. What are your thoughts, reader?

Book Review – Map and Territory

Earlier this year I read Rationality: From AI to Zombies (at the bottom of the page, or online here). At 1800 pages, it’s not an adventure for those afraid of thick books, but I’ve read Worm several times. I ain’t afraid of no page count!

While I could reasonably say that I’ve read the Sequences, I did so in kind of a scattershot fashion, over the course of years, and reread by hopping around among pages when I was in Colorado and between calls at work. I wanted to be sure I had in fact read everything, to refresh my memory of it, and to be able to say that I had done so with confidence.

I set myself to the task, and spent quite a few hours working my way through that mighty tome, coming out the far side more rational and stronger (I hope). Last week I read Superforecasting, and when I was talking to a friend of mine about it, they asked whether, given the replication crisis, we shouldn’t be reviewing and updating the Sequences to make sure they weren’t based on any research that didn’t replicate.

This struck me as a pretty good idea, enough so that I would have been willing to put my time into that project. Given this, I reached out to Rob Bensinger, Head of Research Communications at MIRI, since he’d done the original editing to make the Sequences into AI to Zombies, and asked if that was something that was going to happen.

He informed me that not only was AI to Zombies being updated, the first two books had already been released. Big win, and they went straight to the top of my to-read pile.

The first book, Map and Territory, took me four hours to go through. At 354 pages, it was a much less intimidating piece to take on than the whole stack of six-in-one, and the quality’s definitely gone up a bit – I’d be hard pressed to point to any single item, but I found it flowed more smoothly than the AI to Zombies version. It does an excellent job of explaining the relevant concepts, and of keeping you (me, at least) turning pages – I could see sitting down and reading it end to end without moving.

From Scope Insensitivity to The Simple Truth, Map and Territory has excellent flow and kept me hooked. Eliezer’s writing practically sparkles in this edition, with all of the polish that’s been added. If you haven’t read the Sequences yet, definitely pick up the new editions and remedy this. You’ll be glad you did!

On Soul-Forging

As I’ve alluded to before, my childhood was not the best. In all honesty, I was a demon child, and a source of great stress and grief to my mother. I got in vast amounts of trouble in school, and that wasn’t even my trouble-causing final form. Without going into too much detail, it was enough that my mother threw up her hands, unable to deal, and I ended up in the care of the state. This is how I ended up never setting foot in a high school.

I don’t blame her, to be clear – demon child is not an exaggeration, or not much. I definitely needed that shock, and if the programs were not in fact an ideal environment, they did lead to my being the person I am now, and I’m glad to be me, although of course I’m still striving to be someone better.

Something that happened fairly early on here ended up making a huge impact on the shape of my life, and I’ve never properly thanked the person responsible. My mother and stepfather visited me, and dropped off some books, novels. On the cover of one of them was a post-it from my stepfather, with a note stating, roughly, that I should read these books with an eye to learning how good people acted.

I don’t know what he expected. What actually happened, though, was that I got the idea of incorporating, actively, facets of characters I admired. I read a lot, then and now, and I’m always on the lookout for something new to attach. One of the reasons I’ve reread Worm as many times as I have is that I want Taylor’s refusal to lose. As I’ve also noted, I think as a species we are up against some Serious Shit, and I want to bring everything I possibly can to bear on the problems facing us.

I’ve cribbed from books, from games, from television. “I will quote the truth wherever I find it,” says Richard Bach in Illusions, and the concept generalizes – I will add to my self that which is worthy wherever I find it.

The last major addition I consciously made was from Unsong. I can’t do the Comet King justice without a direct quote, and so –

“Do you know,” interrupted Jalaketu, “that whenever it’s quiet, and I listen hard, I can hear them? The screams of everybody suffering. In Hell, around the world, anywhere. I think it is a power of the angels which I inherited from my father.”

He spoke calmly, without emotion. “I think I can hear them right now.”

Ellis’ eyes opened wide. “Really?” he asked. “I’m sorry. I didn’t . . . ”

“No,” said the Comet King. “Not really.”

They looked at him, confused. “No, I do not really hear the screams of everyone suffering in Hell. But I thought to myself, ‘I suppose if I tell them now that I have the magic power to hear the screams of the suffering in Hell, then they will go quiet, and become sympathetic, and act as if that changes something.’ Even though it changes nothing. Who cares if you can hear the screams, as long as you know that they are there? So maybe what I said was not fully wrong. Maybe it is a magic power granted only to the Comet King. Not the power to hear the screams. But the power not to have to. Maybe that is what being the Comet King means.”

I read this when I lived in Colorado and it had a huge impact on me. We don’t need to hear the screams to know it’s happening, and when I look inside myself at the place that knows this, it grabs the part of me that said in a philosophy class once that if your ethical system tells you the right thing to do is something you don’t want to, you either need to run your ethical calculations again, do the thing, or admit that you’re not good, according to your system. It grabs that part, and shakes it, and demands that I make it stop Make It Stop MAKE IT STOP.

And so I work to fix the vast holes in my education, so that I can do something about the state of things. I’ve built myself a soul from parts I found lying around, and it turns out having done so makes me need to do certain things.

I’m happier about this being the case than the child I was could ever have understood.

Why I Want to Work at MIRI

Death sucks.

Death sucks a lot.

I talked about this somewhat in “On Grief”, about how painful death is for those of us who are still here. Losing people you care about forever is agonizing. To never see them again, never make them smile again, all the unhad conversations and silent moments together, this is unacceptable, but we’re left with it anyway.

There’s also, of course, the loss to the guest of honor at the funeral. All your hopes and dreams, everything you wanted to do and say, the experiences you could have had, all gone. All value gone, turned into a game over screen with no continues and nobody to see it.

We should not leave out the loss to the world, either. A person is gone, taking with them their unique viewpoint and all of their experience. Everything they knew, every skill they had, every idea, gone, just like that, because of a biological hiccup.

It’s unacceptable. It is not to be borne.

The only real resolution that I see to this problem is a superhuman artificial intelligence, aligned to human values. Nothing else offers a complete solution the way more intelligence does. With what we humans have we’ve pushed death back, raised the length of the average lifespan bit by bit as our generations passed, but we still die in the end.

Even if SENS is fully successful, and aging becomes a process that is fully managed and no longer a threat, we still have risks – I’ve been told someone ran the actuarial calculations and found that in the absence of aging we’d average about eight hundred years and die to accidents. It’s certainly far better, but we’d still have to say goodbye.
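As a rough sanity check on that figure, here is a minimal sketch of what it implies. It assumes the simplest possible model, a constant and independent yearly risk of accidental death; that is my simplification, and I don't know what the original calculation actually used. The function names are made up for illustration.

```python
def expected_years(annual_death_risk: float) -> float:
    """Mean lifespan in years if every year carries the same independent risk of death."""
    # Mean of a geometric distribution with success probability p is 1 / p.
    return 1.0 / annual_death_risk


def chance_of_surviving(years: int, annual_death_risk: float) -> float:
    """Probability of getting through the given number of years without a fatal accident."""
    return (1.0 - annual_death_risk) ** years


# An average of eight hundred years corresponds to a yearly risk of 1 in 800.
implied_risk = 1.0 / 800

print(f"implied yearly risk: {implied_risk:.5f}")
print(f"expected lifespan: {expected_years(implied_risk):.0f} years")
print(f"chance of reaching 800: {chance_of_surviving(800, implied_risk):.2f}")
```

Under that assumption, only a bit more than a third of people would actually reach the eight-hundred-year average, which is the point: better, but still a goodbye.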

Aside from that of course we’re still left with the problems of war, disease, and violence. While we’ve reduced these, they’re still factors in the ending of lives, and this doesn’t touch on x-risks – biotech, nanotech, unaligned artificial intelligence, climate change, all threats to us that SENS can’t touch, and solving any of these doesn’t resolve the others, except for AI. An optimizer that’s more intelligent than we are can actually touch these things.

So I’ve made the case now to work in alignment, but why MIRI?

MIRI’s working on a model called HRAD – Highly Reliable Agent Design. The machine learning systems that we have been building thus far are largely trained by rewarding the system for making the ‘right’ choice when given problems, which causes the system to update slightly in that direction. The problem with doing things this way is that we end up with a system that’s a black box – we don’t know in any sort of detail what makes it decide what it does, and giving it novel inputs is a crapshoot. One example is of a system intended to scan photographs for tanks. The researchers fed it a bunch of pictures that had tanks, and a bunch of pictures without, then tested it on the other photographs in the set, which it classified perfectly.

When they presented it to the military, the military tested it and found that it didn’t do better than chance. It turned out that in all of the training data, the pictures of tanks were taken on cloudy days, and the non-tanks were on sunny days. The system distinguished weather, not tanks.
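The failure in that story can be shown with a toy example. This is only a sketch with made-up data, not the system from the story: each "photo" is reduced to two yes/no features, and the "learner" is a one-rule classifier that keeps whichever feature best matches the labels in training. All names and numbers here are hypothetical.

```python
from typing import List, Tuple

# Each photo is (is_cloudy, tank_like_shape, has_tank); every value is 0 or 1.
Photo = Tuple[int, int, int]


def make_photos(has_tank: int, cloudy: List[int], shape: List[int]) -> List[Photo]:
    """Build labelled photos from parallel lists of feature values."""
    return [(c, s, has_tank) for c, s in zip(cloudy, shape)]


# Training set: every tank photo is cloudy and every non-tank photo is sunny.
# The shape cue is the "real" signal, but it is only right 80% of the time.
train = (make_photos(1, [1] * 10, [1] * 8 + [0] * 2)
         + make_photos(0, [0] * 10, [0] * 8 + [1] * 2))

# Field set: weather no longer has anything to do with tanks.
field = (make_photos(1, [1] * 5 + [0] * 5, [1, 1, 1, 1, 0, 1, 1, 1, 1, 0])
         + make_photos(0, [1] * 5 + [0] * 5, [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]))


def accuracy(photos: List[Photo], feature_index: int) -> float:
    """Accuracy of the rule 'predict tank exactly when this feature is 1'."""
    hits = sum(1 for p in photos if p[feature_index] == p[2])
    return hits / len(photos)


def fit_one_rule(photos: List[Photo]) -> int:
    """Pick the feature (0 = cloudy, 1 = shape) with the best training accuracy."""
    return max((0, 1), key=lambda i: accuracy(photos, i))


chosen = fit_one_rule(train)
train_acc = accuracy(train, chosen)
field_acc = accuracy(field, chosen)
shape_field_acc = accuracy(field, 1)

print("feature chosen:", "cloudy" if chosen == 0 else "shape")
print("training accuracy:", train_acc)
print("field accuracy:", field_acc)
print("field accuracy of the shape cue:", shape_field_acc)
```

The learner picks the weather feature because it is perfect on the training photos, then falls to coin-flip accuracy once weather and tanks come apart, while the less impressive shape cue would have held up.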

We need AGI that values what we value, and I think MIRI is doing the best work in this direction, and I want to be a part of it.

Because I want to do the highest leverage thing that I can.

To stop the AI from eating us all.

To end death.