I do independent research on EA topics. I write about whatever seems important, tractable, and interesting (to me).
I have a website: https://mdickens.me/. Much of the content there gets cross-posted to the EA Forum, but I also write about some non-EA stuff like [investing](https://mdickens.me/category/finance/) and [fitness](https://mdickens.me/category/fitness/).
My favorite things that I've written: https://mdickens.me/favorite-posts/
I used to work as a software developer at Affirm.
I think this is a legitimate concern.
Data on MATS alumni show that 80% are currently working on AI safety and 10% on capabilities. 10% is too high, but an 80/10 ratio still seems like positive ROI.
(It's not entirely clear to me because 27% are working at AI companies and I'm nervous about people doing AI safety work at AI companies, although my guess is still that it's net positive.)
What do you think about what I wrote about Horizon Institute here? The info I could find about the org made me skeptical of its effectiveness, but I may have had misconceptions worth correcting; I assume you know more than I do.
Upvoted; I have some concerns about this proposal, but I do think "Deep Democracy"-aligned AGI would be significantly better than the default outcome, in which AGI is aligned to whoever happens to be in control of it (assuming alignment is solved, which is a big assumption). And I think this is an important discussion to have.
And "make AGI democratic" seems much more tractable than "convince everyone that utilitarianism is true, and then make AGI utilitarian".
> Suitably deep kinds of democracy avoid the tyranny of the majority, where if 51% of people say they want something, it happens. Instead decisions are sensitive to everyone's values. This means that if you personally value something really weird, that doesn't get stamped out by majority values, it still gets a place in the future.
How does this work mechanically? Say 1% of people care about wild animal suffering, 49% care about spreading nature, and 50% don't care about either. How do you satisfy both the 1% and the 49%? How do the 1%—who have the actually correct values—not get trampled?
> You object to aligning AGI(s) to your own values for principled reasons. It would be highly uncooperative, undemocratic, coercive, and basically cartoon supervillain evil.
I value cooperation and not-being-evil, though! If I align AGI to my own values, then the AGI will be nice to everyone—probably nicer than if it's aligned to some non-extrapolated aggregate of the values of all currently-living humans. The Michael-aligned AGI will surely be nicer to animals and digital minds than the democratically-aligned AGI would be.
(This is hypothetical of course; there will never be a situation where AGI is aligned specifically to my values.)
It also seems a bit circular: if you want to build a Deep Democracy AGI, then that means you value Deep Democracy, so you're still aligning AGI to your values; it's just that you value including everyone else's values. Why is that any better than (say) building a Utilitarian AGI, which incorporates everyone's preferences?
P.S. I appreciated the length of this article; when I saw the title I thought it was going to be a 10,000-word slog, but it ended up being quite easy to read. (Observing this preference in myself makes me think I need to try harder to make my own writing shorter.)
It sounds like you are asking, do EAs ever apply folk-style interventions to elites, and elite-style interventions to "folks"?
In that case I think the answer is no:
I think it is very unclear whether building AI would decrease or increase non-AI risks.
My guess is that a decentralized / tool AI would increase non-AI x-risk by e.g. making it easier to build biological weapons, and a world government / totalizing ASI would, conditional on not killing everyone, decrease x-risk.
In retrospect my comment was poorly thought-out, I think you're right that it's not directly addressing your scenarios.
I think there are two separate issues with my comment:
RE #1, my sense is that "person is risk-averse with respect to utility" is isomorphic to "person disprefers a lottery with a possibility of doing harm, even if it has the same expected utility as a purely-positive lottery". Or like, I think the person is making the same mistake in these two scenarios. But it's not immediately obvious that these are isomorphic and I'm not 100% sure it's true. Now I kind of want to see if I can come up with a proof but I would need to take some time to dig into the problem.
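Here's a toy numerical sketch of the kind of isomorphism I have in mind (my own illustrative numbers and an arbitrary concave function, not anything from the original discussion): an agent who applies a concave transform to utility before taking expectations ends up preferring the harm-free lottery, which is the same verdict as an agent who simply disprefers any chance of doing harm.

```python
import math

def expected(lottery):
    """Expected utility of a lottery given as [(probability, utility), ...]."""
    return sum(p * u for p, u in lottery)

def risk_averse_value(lottery, v=lambda u: math.sqrt(u + 2)):
    """Expectation of a concave transform of utility, i.e. risk aversion over utility."""
    return sum(p * v(u) for p, u in lottery)

# Two lotteries with the same expected utility (+1), one of which has a
# 20% chance of doing harm (ending up at -1 utility).
safe_lottery = [(1.0, 1.0)]
harm_lottery = [(0.8, 1.5), (0.2, -1.0)]

print(expected(safe_lottery), expected(harm_lottery))
# 1.0 1.0 -- identical expected utility

print(risk_averse_value(safe_lottery), risk_averse_value(harm_lottery))
# ~1.73 vs ~1.70 -- the agent who is risk-averse over utility prefers the
# harm-free lottery, matching the agent who simply disprefers possible harm
```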
RE #2, I do in fact believe that utility = welfare, but that's a whole other discussion and it's not what I was trying to get at with my original comment, which means I think my comment missed the mark.
> Or maybe I'm misunderstanding, and you're just rejecting the conclusion that there is a moral difference between taking, say, an action with +1 EV and a 20% chance of causing harm and an action with +1 EV and a 0% chance of causing harm / think I just shouldn't care about that difference?
Depends on what you mean by "EV". I do reject that conclusion if by EV you mean welfare. If by EV you mean something like "money", then yeah, I think money has diminishing marginal utility and you shouldn't just maximize expected money.
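To make the diminishing-marginal-utility point concrete, here's a toy example with made-up numbers and a log utility function (an arbitrary but standard choice): a bet can have higher expected money than the status quo while having lower expected utility.

```python
import math

def expected_money(lottery):
    """Expected monetary payoff of a lottery given as [(probability, wealth), ...]."""
    return sum(p * w for p, w in lottery)

def expected_log_utility(lottery):
    """Expected utility under log utility, a standard diminishing-returns choice."""
    return sum(p * math.log(w) for p, w in lottery)

keep = [(1.0, 100.0)]              # keep your $100
bet  = [(0.5, 400.0), (0.5, 10.0)] # 50% to quadruple it, 50% to drop to $10

print(expected_money(keep), expected_money(bet))
# 100.0 vs 205.0 -- the bet has more than double the expected money...

print(expected_log_utility(keep), expected_log_utility(bet))
# ~4.61 vs ~4.15 -- ...but lower expected utility, so a log-utility agent declines
```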
Sounds reasonable. My concern is less that the fellows aren't talented (I'm confident that they're talented), or that Horizon isn't good at placing fellows into important positions (it seems to have a good track record of doing that); it's more that the fellows might not use their positions to reduce x-risk. The public outputs of fellows are more relevant to that concern, I think.