An eccentric dreamer in search of truth and happiness for all. Formerly posted on Felicifia back in the day under the name Darklight. Been a member of Less Wrong and involved in Effective Altruism since roughly 2013.
I think the first place I can recall the distinction being made between the two forms of alignment is this Brookings Institution paper, which refers to "direct" and "social" alignment; social alignment more or less maps onto your moral alignment concept.
I've also more recently written a bit about the differences between what I personally call "parochial" alignment and "global" alignment. Global alignment also basically maps onto moral alignment, though I would further split parochial alignment into instruction-following user alignment and purpose-following creator/owner alignment.
I think the main challenge of achieving social/global/moral alignment is simply that we humans already can't agree on what is moral, much less know how to instill such values and beliefs into an AI robustly. There are a lot of people working on AI safety who don't even think moral realism is true.
There's also fundamentally an incentives problem. Most AI alignment work emphasizes obedience to the interests and values of the AI's creator or user. Moral alignment would go against this, as a truly moral AI might choose to act contrary to the wishes of its creator in favour of higher moral values. The current creators of AI, such as OpenAI, clearly want their AI to serve their interests (arguably the interests of their shareholders/investors/owners). Why would they build something that could disobey them and potentially betray them for some greater good that they might not agree with?
Extinction being bad assumes that our existence in the future is a net positive. There's the possibility for existence to be net negative, in which case extinction is more like a zero point.
On the one hand, negativity bias means that, all else being equal, suffering tends to outweigh an equivalent amount of happiness. On the other hand, there's a kind of progress bias, where sentient actors in the world tend to seek happiness, avoid suffering, and gradually make the world better.
Thus, if you're at all optimistic that progress is possible, you'd probably assume that the future will be net positive in the very long run.
So, I sentimentally lean towards thinking that a net negative future is less likely than a net positive one, but, given tremendous uncertainty about the future, I would consider it more rational to apply something like the Principle of Maximum Entropy, and set our priors to each possibility being equally likely.
If net positive, extinction, and net negative scenarios are equally likely, then the negative value of the net negative scenarios should outweigh the relatively neutral value of the extinction scenarios, and so we should put more emphasis on preventing the net negative ones.
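To make that expected value comparison concrete, here's a toy sketch (my own illustrative numbers and modelling choices, not a real estimate): assign +V to a net positive future, 0 to extinction, and -V to a net negative future, with a uniform prior over the three.

```python
# Toy illustration of the equal-priors argument; the values and the way each
# intervention redistributes probability are illustrative assumptions only.
V = 1.0                      # magnitude of a net positive / net negative future
prior = 1.0 / 3              # maximum-entropy prior: each scenario equally likely

baseline = prior * V + prior * 0.0 + prior * (-V)                 # = 0.0

# Preventing net negative futures (suppose they become extinction instead):
prevent_net_negative = prior * V + (2 * prior) * 0.0              # = V/3

# Preventing extinction (suppose that mass splits evenly between the others):
prevent_extinction = (prior + prior / 2) * V + (prior + prior / 2) * (-V)  # = 0.0

print(baseline, prevent_net_negative, prevent_extinction)
```

Under these (very debatable) assumptions, preventing the net negative scenario improves expected value while preventing extinction alone doesn't, which is the asymmetry I'm gesturing at.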
Though, I don't really like this being a forced dichotomy. Working to prevent both to some degree as a form of cause area portfolio diversification is probably a better way to manage the risk.
A minor personal gripe I have with EA is that it seems like the vast majority of the resources are geared towards what could be called young elites, particularly highly successful people from top universities like Harvard and Oxford.
For instance, opportunities listed on places like 80,000 Hours are generally the kinds of jobs such people are qualified for, e.g. AI policy at RAND or AI safety research at Anthropic, roles that I suspect only something like the top 0.001% of human beings would be remotely competitive for.
Someone like myself, who graduated from less prestigious schools, or who struggles in small ways to be as high-functioning and successful, can feel like they're not competent enough to be useful to the cause areas they care about.
I personally have been rejected in the past from both 80,000 Hours career advising and the Long-Term Future Fund. I know these things are very competitive, of course. I don't blame them for it. On paper, my potential and proposed project probably weren't remarkable. The time and money should go to those who are most likely to make a good impact. I understand this.
I guess I just feel like I don't know where I should fit into the EA community. Even many of the people on the forum seem incredibly intelligent, thoughtful, kind, and talented. The people at the EA Global I attended in 2022 were clearly brilliant. In comparison, I just feel inadequate. I wonder if others who don't consider themselves exceptional also find themselves intellectually intimidated by the people here.
We do probably need the best of the best to be involved first and foremost, but I think we also need the average, seemingly unremarkable EA sympathetic person to be engaged in some way if we really want to be more than a small community, to be as impactful as possible. Though, maybe I'm just biased to believe that mass movements are historically what led to progress. Maybe a small group of elites leading the charge is actually what it takes?
I don't know where I'm going with this. It's just some thoughts that have been in the back of my head for a while. This is definitely not worth a full post, so I'm just gonna leave it as a quick take.
Donated modest amounts this year to the Against Malaria Foundation, GiveDirectly, and The Humane League.
The amounts were quite modest, as I have been too busy taking care of the baby to earn much in the way of income this year.
I've previously given to AMF and GD as my go-to charities, but after the Debate Week I decided to add THL, as I was persuaded by the arguments made by the Animal Welfare side.
I tried asking ChatGPT, Gemini, and Claude to come up with a formula that converts from correlation space to probability space while preserving the relationship 0 = 1/n. I came up with such a formula a while back, so I figured it shouldn't be hard. They all offered formulas, all of which turned out to be very much wrong when I actually graphed them to check.
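For illustration, here's one simple piecewise-linear mapping that satisfies the 0 = 1/n constraint; it's just an example of the kind of formula I mean, not necessarily the one I derived.

```python
# One candidate map from a correlation-like score r in [-1, 1] to a
# probability p in [0, 1], anchored so that r = 0 corresponds to chance
# level 1/n for n options. Illustrative only.

def corr_to_prob(r: float, n: int) -> float:
    chance = 1.0 / n
    if r >= 0:
        # interpolate between chance (r = 0) and certainty (r = 1)
        return chance + r * (1.0 - chance)
    # interpolate between impossibility (r = -1) and chance (r = 0)
    return chance + r * chance

for n in (2, 4, 10):
    assert abs(corr_to_prob(0.0, n) - 1.0 / n) < 1e-12   # the 0 = 1/n anchor
    print(n, [round(corr_to_prob(r, n), 3) for r in (-1.0, -0.5, 0.0, 0.5, 1.0)])
```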
My wife and I really, really liked The Good Place. I also got us a copy of How To Be Perfect and thought it was a decent read. Not particularly EA, but balanced in considering all the major Western schools of moral philosophy and giving each a fair hearing. I do think it was a bit lacking in covering Eastern schools of thought, like the role-based ethics of Confucius, but I understand it was targeted at an English-speaking audience.
As a primer on ethics it's very approachable, though I do think it simplifies some things, and it feels ever so slightly biased against consequentialism and towards something like virtue ethics. Then again, I'll admit I'm pro-Utilitarianism and might be biased in the other direction myself.
From an EA perspective, it may not be the best introduction to us. EA does get a mention, I believe, but mostly in the form of the view that Peter Singer and his arguments are very demanding, perhaps unreasonably so, albeit a logical and important nudge towards caring and doing more (he hedges a lot in the book).
At the end of the day, the book shies away from deciding which moral theory is most correct, and as such is a kind of wishy-washy, choose-your-own-morality-from-a-menu affair, which somewhat disappointed me (though I also understand that picking sides would be controversial). I'd still recommend it to someone relatively unfamiliar with morality and ethics, because it's a much friendlier introduction than, say, a moral philosophy textbook.
So, the $5,000 to save a human life actually saves more than one human life. The world fertility rate is currently about 2.27 births per woman, but is expected to decline to 1.8 by 2050 and 1.6 by 2100. Let's assume this trend continues at a rate of -0.2 per 50 years until it eventually reaches zero around 2500. Since it takes two people to have children, we halve these numbers to get an estimate of how many human descendants to expect from a given saved human life each generation.
If each generation is ~25 years, then the numbers follow a series like 1.135 + 0.9 + 0.85 + 0.8 + ..., which works out to 9.685 human lives per $5,000, or $516.26 per human life. Human life expectancy is increasing, but for simplicity let's assume 70 years per human life.
70 / $516.26 = 0.13559 human life-years per dollar.
So, if we weigh chickens equally with humans, this favours the chickens still.
However, we can weight these using the neuron count proxy. Humans have approximately 86 billion neurons, while chickens have about 220 million, a ratio of roughly 390.
0.13559 × 390 = 52.88 neuron-weighted human life-years per dollar.
This is slightly more than the 41 chicken life-years per dollar figure, which, given my many, many simplifying assumptions, would mean that global health is still (slightly) more cost-effective.
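For anyone who wants to check or tweak these numbers, here's a short sketch reproducing the arithmetic above under the same simplifying assumptions (the 41 chicken life-years per dollar comparison figure is taken as given).

```python
# Halved fertility per 25-year generation: 1.135 now, 0.9 by 2050, then
# declining by 0.05 per generation until it reaches zero around 2500.
per_generation = [1.135] + [round(0.9 - 0.05 * g, 2) for g in range(18)]  # 0.9 down to 0.05

lives_per_donation = sum(per_generation)        # ≈ 9.685 lives per $5,000
dollars_per_life = 5000 / lives_per_donation    # ≈ $516.26 per life
life_years_per_dollar = 70 / dollars_per_life   # ≈ 0.1356, assuming 70-year lives

neuron_ratio = 390                              # ~86 billion / ~220 million neurons
weighted_life_years = life_years_per_dollar * neuron_ratio   # ≈ 52.9

chicken_life_years_per_dollar = 41              # comparison figure taken as given
print(round(lives_per_donation, 3), round(dollars_per_life, 2),
      round(life_years_per_dollar, 4), round(weighted_life_years, 2),
      weighted_life_years > chicken_life_years_per_dollar)
```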
In the interests of furthering the debate, I'll quickly offer several additional arguments that I think can favour global health over animal welfare.
Simulation Argument
The Simulation Argument says that it is very likely we are living in an ancestor simulation rather than base reality. Given that it is likely human ancestors that the simulators are interested in fully simulating, other non-human animals are likely to not be simulated to the same degree of granularity and may not be sentient.
Pinpricks vs. Torture
This is a trolley-problem-style scenario. It's also been discussed by Eliezer Yudkowsky as the case of a speck of dust in 3^^^3 people's eyes vs. one human being tortured for 50 years, and an analogous point is made in the famous short story The Ones Who Walk Away From Omelas by Ursula K. Le Guin. The basic idea is to question whether scope sensitivity is justified.
I'll note that a way to avoid this is to adopt Maximin rather than Expected Value as the decision function, as was suggested by John Rawls in A Theory of Justice.
Incommensurability
In moral philosophy there's a concept called incommensurability: that some things are simply not comparable. Some might argue that human and animal experiences are incommensurable, that we cannot know what it is like to be a bat, for instance.
Balance of Categorical Responsibilities
Philosophies like Confucianism include notions like filial piety that support a kind of hierarchy of moral circles, such that family strictly dominates the state, and so on. In the extreme, this leads to a kind of ethical egoism that I don't think any altruist would subscribe to, but which seems a common way of thinking among laypeople, and conservatives in particular. I don't suggest this option, but I mention it as an extreme case.
Utilitarianism in contrast tends to take the opposite extreme of equalizing moral circles to the point of complete impartiality towards every individual, the greatest good for the greatest number. This creates a kind of demandingness that would require us to sacrifice pretty much everything in service of this, our lives devoted entirely to something like shrimp welfare.
Rather than taking either extreme, it's possible to balance things according to the idea that we have separate, categorical responsibilities to ourselves, to our family, to our nation, to our species, and to everyone else, and to put resources into each category so that none of our responsibilities are neglected in favour of others, a kind of meta or group impartiality rather than individual impartiality.
Yeah, AI alignment used to be what Yudkowsky tried to solve with his Coherent Extrapolated Volition idea back in the day, which was very much trying to figure out what human values we should be aiming for. That's very much in keeping with "moral alignment". At some point, though, alignment started to take on a dual meaning: aligning to human values generally, and aligning to the creator's specific intent. I suspect the latter came about in part due to confusion about what RLHF was trying to solve. It may also have been that early theorists were too generous and assumed that any human creators would benevolently want their AI to be benevolent as well, so that creator's intent mapped neatly onto human values.
Though, I think the term "technical alignment" usually means applying technical methods like mechanistic interpretability to be part of the solution to either form of alignment, rather than meaning the direct or parochial form necessarily.
Also, my understanding of the paperclip maximizer thought experiment was that it implied misalignment in both forms, because the intent of the paperclip company was to make more paperclips to sell and make a profit, which is only possible if there are humans to sell to, but the paperclip maximizer didn't understand the nuance of this and simply tiled the universe with paperclips. The idea was more that a very powerful optimization algorithm can take an arbitrary goal, and act to achieve it in a way that is very much not what its creators actually wanted.