I recently finished leading an AI Safety Camp project on Wise AI Advisors[1] (my team included Chris Cooper, Matt Hampton, and Richard Kroon). Since we want to share our work in an orderly fashion, I’m launching Wise AI Wednesdays. Each Wednesday[2], I (or one of my teammates) will share a post, initially drawn from our AI Safety Camp outputs, but later shifting to include future work and outputs, plus summaries of or commentary on related research. I’m hoping that a regular posting schedule will help cultivate Wise AI/Wise AI Advisors as a subfield of AI Safety.
This inaugural post covers how my views on AI and wisdom have changed since I won 3rd prize in the Automation of Wisdom and Philosophy Competition. They’ve evolved considerably since then, so I thought it was worth listing the main updates I’ve made:
- Broadening My Focus: I had originally labelled my research direction as Wise AI Advisors via Imitation Learning. While I still see this as a particularly promising approach, I’m now interested in Wise AI Advisors more broadly. Many different reasonable-sounding paths exist, and it would be arrogant to believe I’ve found the One True Path without much more investigation. I’m now more inclined to follow a 'let a thousand flowers bloom' approach.
- More Favourable Funding Landscape: There seems to be more appetite for funding work adjacent to this space than I had anticipated. For example: the Cosmos x FIRE Truth-seeking Round and the Fellowship on AI for Human Reasoning. I initially thought significant effort would be needed to convince funders, but it now seems quite possible to secure funding for Wise AI projects, even though you might have to choose a project that intersects with a funder's interests.
- Increased Emphasis on Field-Building: Consequently, my focus has shifted from solely direct research progress to a combination of research and field-building. While I'm still defining what this entails, activities like establishing the case for Wise AI, exploring theories of impact, and identifying concrete, high-priority projects seem important right now.
- Expanded View of Possible Contributions: My ideas about what kind of work might be useful have broadened significantly thanks to AI Safety Camp and conversations with others. I now see a much wider range of ways people can contribute (see my draft post with project proposals).
- Importance of Simultaneous Pursuit of Both Theory and Practice: I now believe it’s crucial to have people pursuing both theory and practice simultaneously and collaboratively. While it can be tempting to delay attempting to implement anything until all the theoretical building blocks are in place, if you spend all your time in ‘theoryland’, it’s extremely easy to overlook pragmatic challenges—challenges that could easily turn out to be make-or-break.
- Subjective Research as Underrated: I feel that exploratory, subjective research is often underrated, especially in the early stages of a field. Initial exploration needs to provide a 'lay of the land', which traditional scientific research can struggle with because it’s hard to measure many bits[3]. At first, bandwidth matters more than rigour, but the balance should shift as the field matures.
- Exploration vs Critique: I’ve recently been wondering whether research into a new sub-field is best pursued in explicit phases (obviously, past a certain point, such coordination might be impossible, but I’m describing the ideal rather than the pragmatics):
- Phase 1: Exploration: Focus on opening up an area of exploration, with the role of critique mainly being about what to prioritise.
- Phase 2: Critique: Once reasonable proposals exist, in-depth criticism becomes vital to identify any potentially fatal flaws.
- Phase 3: Rebuilding/Repair: Attempting to fix or patch any flaws exposed in the previous phase. Optimism is important in this phase to avoid prematurely ruling out potential fixes.
- Phase 4 and further: Continue alternating between critique and repair.
- The Value of Dot Points: I wasn’t originally planning to keep writing posts as bulleted lists, but I'm now much more willing to stand behind the value of this format in particular circumstances. Dot points work well as previews, help get information out quickly, and suit posts like this one that provide multiple quick updates rather than a single, in-depth discussion.
Thanks for reading and see you next Wednesday!
(PS: If you have a draft post that you'd like shared as part of our posting schedule, feel free to reach out.)
Previous posts:
- Summary: "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al.
- My submission to the AI Impacts Essay Competition on the Automation of Wisdom and Philosophy
- ^ Originally Wise AI Advisors via Imitation Learning. As I say further down, I pivoted towards “let a thousand flowers bloom” rather than one particular approach.
- ^ It is still Wednesday for US night owls.
- ^ That said, it might be worthwhile doing some more objective research earlier on if the acquisition cost of knowledge is low enough, or if you’re worried that once schools of thought form, people won’t update on empirical evidence.