Aligned AI is an Oxford-based startup focused on applied alignment research. Our goal is to implement scalable solutions to the alignment problem, and to distribute these solutions to actors developing powerful transformative artificial intelligence (related Alignment Forum post here).
In the tradition of AI safety startups, Aligned AI will be doing an AMA this week, from today, Tuesday the 1st of March, till Friday the 4th, inclusive. It will be mainly me, Stuart Armstrong, answering these questions, though Rebecca Gorman and Oliver Daniels-Koch may also answer some of them. GPT-3 will not be invited.
From our post introducing Aligned AI:
We think AI poses an existential risk to humanity, and that reducing the chance of this risk is one of the most impactful things we can do with our lives. Here we focus not on the premises behind that claim, but rather on why we're particularly excited about Aligned AI's approach to reducing AI existential risk.
- We believe AI safety research is bottlenecked by a core problem: how to extrapolate values from one context to another (a toy sketch after this list illustrates the kind of ambiguity involved).
- We believe solving value extrapolation is necessary and almost sufficient for alignment.
- Value extrapolation research is neglected, both in the mainstream AI community and in the AI safety community. Note that there is a lot of overlap between value extrapolation and many fields of research (e.g. out-of-distribution detection, robustness, transfer learning, multi-objective reinforcement learning, active reward learning, reward modelling...) which provide useful research resources. However, we've found that we've had to generate most of the key concepts ourselves.
- We believe value extrapolation research is tractable (and we've had success generating the key concepts).
- We believe distributing (not just creating) alignment solutions is critical for aligning powerful AIs.
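To make the problem concrete, here is a minimal toy sketch (in Python, purely illustrative: the features, labels, and the two reward hypotheses are invented for this example and are not a description of our methods). Two reward hypotheses fit the training context equally well because two features are perfectly correlated there; once that correlation breaks in a new context, the hypotheses disagree, and something has to decide how the learned values extrapolate. Detecting the disagreement is where the overlap with out-of-distribution detection comes in; resolving it is the extrapolation problem itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training context: two binary features that are perfectly correlated
# (say, "red" and "round" always co-occur), and behaviour is labelled
# good exactly when they are present.
n = 200
feature_a = rng.integers(0, 2, n)
feature_b = feature_a.copy()           # identical to feature_a in training
X_train = np.stack([feature_a, feature_b], axis=1)
y_train = feature_a                    # "good" whenever the features are on

# Two reward hypotheses, both perfectly consistent with the training data.
reward_a = lambda X: X[:, 0]           # the value is "really" about feature A
reward_b = lambda X: X[:, 1]           # the value is "really" about feature B
assert np.array_equal(reward_a(X_train), y_train)
assert np.array_equal(reward_b(X_train), y_train)

# New context: the correlation breaks, and the hypotheses come apart.
X_new = np.array([[1, 0], [0, 1], [1, 1]])
for x, ra, rb in zip(X_new, reward_a(X_new), reward_b(X_new)):
    verdict = "agree" if ra == rb else "disagree -- extrapolation needed"
    print(f"input {x}: hypothesis A -> {ra}, hypothesis B -> {rb} ({verdict})")
```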
Great, thank you for the response.
On (3): I feel AI safety as it’s pursued today is a bit disconnected from other fields such as neuroscience, embodiment, and phenomenology. That is, the terms used in AI safety don’t try to connect to the semantic webs of affective neuroscience, embodied existence, or qualia. I tend to take this as a warning sign: all disciplines ultimately refer to different aspects of the same reality, and all conversations about reality should ultimately connect. If they aren’t connecting, we should look for a synthesis such that they do.
That’s a little abstract; a concrete example would be the paper “Dissecting components of reward: ‘liking’, ‘wanting’, and learning” (Berridge et al. 2009), which describes experimental methods and results showing that ‘liking’, ‘wanting’, and ‘learning’ can be partially isolated from each other and triggered separately. That is, a set of fairly rigorous studies on mice demonstrating they can like without wanting, want without liking, etc. This and related results from affective neuroscience would seem to challenge some preference-based frames within AI alignment, but it feels like there’s no ‘place to put’ this knowledge within the field. Affective neuroscience can discover things, but there’s no mechanism by which these discoveries will update AI alignment ontologies.
It’s a little hard to find the words to describe why this is a problem; perhaps it’s that not being richly connected to other fields runs the risk of ‘ghettoizing’ results, as many social sciences have ‘ghettoized’ themselves.
One of the reasons I’ve been excited to see your trajectory is that I’ve gotten the feeling that your work would connect more easily to other fields than the median approach in AI safety.