
Democratising AI alignment has been put forward as a way of addressing a pressing question facing alignment research: alignment to whom? Whose values are the blueprint for AI safety? Opening alignment strategy up to public input offers one answer to the worry that such crucial decisions are currently left to a handful of tech oligarchs. But while the idea of democratising alignment is compelling, it is fraught with risks. Without careful design, it could lead to poorly aligned AI shaped by populist whims, misinformation, or hasty consensus-building.

Here, I attempt to sketch a framework that acknowledges these tensions and explores a pragmatic, phased approach to democratising alignment.

The Dangers of Oversimplified Democratisation

Efforts to include public input can backfire if not structured carefully:

  • Populist Drift: Alignment decisions made via broad, unfiltered public polling risk producing lowest-common-denominator AI behaviour or reinforcing harmful cultural biases.
  • Speed vs. Scrutiny: Slow, consultative decision-making could cause alignment-focused labs to fall behind competitors who skip safety checks.
  • Fragmentation Risk: Misaligned inputs across different countries or demographics could produce a patchwork of incompatible models or governance rules. This could undermine efforts to create consistent safety norms and open up opportunities for regulatory arbitrage, where bad actors exploit the least restrictive jurisdiction to deploy risky AI systems. Additionally, OpenAI’s grant programme surfaced issues in achieving true diversity across linguistic and digital divides, which risks skewing results in favour of more digitally connected and typically more optimistic participants.
  • Polarisation & Misinformation: Like any other political issue, alignment deliberations are vulnerable to ideological echo chambers and fact-free debate. Anthropic’s public constitution experiment, for example, revealed distinct value clusters that diverged strongly on contentious issues, illustrating how polarisation can shape the outcome of alignment processes. Notably, the model trained on public input exhibited lower bias than the baseline, suggesting that democratic participation can improve inclusion, but also that navigating strong disagreement is complex and sensitive to group composition and methodology.

At the same time, overcorrecting toward purely technocratic control introduces its own risks: it may overlook ethical and social dimensions, lack legitimacy, and deepen public distrust. A stakeholder-based analysis suggests that because AI affects everyone—directly or indirectly—everyone could have a valid stake in its governance.

A Structured Approach to Democratic Input

Instead of abandoning democratisation due to these risks, we could experiment with institutional and technological systems that help mitigate them:

Layered Participation and Expertise Filters

Democratic input in AI alignment could benefit from balancing openness with structured access. Participation might be grounded in basic AI literacy, potentially via prerequisite courses or orientations in alignment ethics.

Democratisation also needs to grapple with AI’s unusual access profile: models are widely used, increasingly modifiable, and deeply impactful—meaning governance mechanisms could include not only developers, but also affected publics. Multi-phase systems might allow broad citizen input to shape value priorities (e.g., "autonomy over paternalism"), followed by expert interpretation into technical updates—subject to public audit. Expert override mechanisms for safety-critical cases could remain, but perhaps require justification similar to judicial dissent.

Fast Action Under Democratic Constraints

Democratic systems might require mechanisms to support rapid action, particularly in the context of AI. Companies could retain limited emergency powers within pre-agreed value frameworks. An independent, publicly accountable oversight board could retrospectively review such decisions and respond accordingly. This fast-slow hybrid draws on precedents like the War Powers Act and is reflected in ideas from collective governance prototypes.

Misinformation and Polarisation Safeguards

Deliberative systems could be made more robust by introducing a set of safeguards. Public deliberation might be preceded by curated briefing materials vetted by interdisciplinary panels, ideally outside of corporate influence. Platforms such as Pol.is may help identify areas of consensus by clustering viewpoints, limiting amplification of polar extremes. AI tools could assist in surfacing misinformation without censoring it, using flags backed by independent fact-checking sources.
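To make the clustering idea concrete, here is a minimal sketch of a Pol.is-style "group-informed consensus" score. The vote encoding, function names, and toy data are my own illustrative assumptions, not taken from Pol.is's actual implementation: votes are +1 (agree), -1 (disagree), or 0 (pass), and participants are assumed to have already been clustered into opinion groups. A statement's score is the product across groups of each group's agreement rate, so a statement only scores highly if every cluster tends to agree, which limits amplification of any one bloc.

```python
# Sketch of a group-informed consensus score (illustrative assumptions:
# vote encoding, pre-computed clusters, and all names are hypothetical).
from collections import defaultdict

def group_informed_consensus(votes, groups):
    """votes: {(participant, statement): +1/-1/0};
    groups: {participant: group_id}.
    Returns {statement: score in [0, 1]}."""
    # statement -> group -> [agree count, non-pass count]
    tally = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for (person, stmt), v in votes.items():
        g = groups[person]
        if v != 0:  # passes don't count toward the denominator
            tally[stmt][g][1] += 1
            if v == 1:
                tally[stmt][g][0] += 1
    scores = {}
    for stmt, per_group in tally.items():
        score = 1.0
        for agrees, total in per_group.values():
            score *= agrees / total if total else 0.0
        scores[stmt] = score
    return scores

votes = {
    ("a", "s1"): 1, ("b", "s1"): 1, ("c", "s1"): 1, ("d", "s1"): 1,
    ("a", "s2"): 1, ("b", "s2"): 1, ("c", "s2"): -1, ("d", "s2"): -1,
}
groups = {"a": 0, "b": 0, "c": 1, "d": 1}
print(group_informed_consensus(votes, groups))
# s1 is agreed across both groups (score 1.0);
# s2 splits exactly along group lines (score 0.0)
```

The multiplicative score is the design point: a statement backed unanimously by one large cluster but rejected by a small one still scores near zero, which is what "limiting amplification of polar extremes" means in practice.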

Separating participatory inputs (like large-scale surveys) from structured, smaller-scale deliberations could help maintain clarity, as argued in Lucile Ter-Minassian’s framework. Given that public values shift over time, these systems might be designed to accommodate ongoing engagement. And as Anthropic’s findings show, even structured input produces distinct normative clusters, so consensus shouldn’t be forced where genuine disagreement exists.

Regional Diversity with Global Consistency

AI governance could differentiate between universal terminal values (e.g., safety, dignity) and culturally specific instrumental preferences (e.g., preferences around levels of AI autonomy in scientific research, global coordination, or information access). A Global Core Values Charter might be developed through multinational citizen panels, while more granular behaviours could be guided through co-design tools that let users set their preferences within safety constraints.

Countering Power Asymmetries

Given the imbalance between the public and powerful AI labs, oversight could be strengthened by establishing independent review boards empowered to audit how public input is handled and issue binding recommendations. Escalation mechanisms—for example, enabling a small but significant portion of participants to trigger regulatory review—might support accountability. Since legitimacy stems in part from inclusion, participation models could be designed to rebuild trust in institutions. Incentives—such as those trialled in OpenAI’s grant programme, including payment for underrepresented creative contributions—might help broaden input.
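The escalation mechanism above can be sketched as a petition-style trigger. The 5% threshold and the minimum-signer floor are illustrative assumptions, not figures proposed in this post; the floor exists so that small populations cannot be gamed by a handful of accounts.

```python
# Sketch of a petition-style regulatory escalation trigger.
# The threshold and floor values are hypothetical placeholders.
def triggers_review(signatures, eligible_participants,
                    threshold=0.05, min_signers=100):
    """Escalate to regulatory review when a small but significant
    share of distinct participants sign, subject to an absolute
    floor on the number of signers."""
    distinct = len(set(signatures))
    return (distinct >= min_signers
            and distinct / eligible_participants >= threshold)

print(triggers_review(range(600), 10_000))  # 6% of 10k -> True
print(triggers_review(range(40), 500))      # 8%, but under the floor -> False
```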

Resilience to Trolling

To protect democratic input from being undermined, systems could combine lightweight identity verification (e.g., proof-of-personhood without breaching anonymity) with “earned trust” reputation systems. Influence might grow with constructive engagement, and be filtered through cluster-mapping rather than raw vote tallies. Moderation frameworks could be transparent and supported by AI to detect and contain bad-faith behaviour, without restricting good-faith disagreement.
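One way to picture the "earned trust" idea is a capped reputation weight on each ballot. Everything here is an illustrative assumption rather than a description of any deployed system: influence starts at a baseline, grows with constructive engagement, shrinks when actions are flagged as bad faith, and is capped so no single account, human or bot-farmed, can dominate a tally.

```python
# Sketch of earned-trust vote weighting (the formula, coefficients,
# and cap are hypothetical, chosen only to illustrate the shape).
def trust_weight(constructive_actions, flagged_actions, cap=3.0):
    """Weight starts at 1.0, grows with constructive engagement,
    shrinks with flagged bad-faith actions, and is capped."""
    raw = 1.0 + 0.1 * constructive_actions - 0.5 * flagged_actions
    return max(0.0, min(cap, raw))

def weighted_tally(ballots):
    """ballots: list of (vote in {+1, -1}, constructive, flagged)."""
    return sum(v * trust_weight(c, f) for v, c, f in ballots)

# A long-standing constructive participant outweighs a fresh account,
# but only up to the cap; a heavily flagged account contributes nothing.
ballots = [(+1, 20, 0), (-1, 0, 0), (-1, 0, 4)]
print(weighted_tally(ballots))  # -> 2.0
```

The cap is the part doing the democratic work: without it, "earned trust" quietly becomes plutocracy by engagement volume, which would recreate the power asymmetries the section above is trying to counter.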

What About Dead Internet Bots?

This remains an open challenge. Preventing large-scale manipulation by non-human actors could require governance and technical innovation beyond what currently exists. It may be one of the most critical unresolved risks to meaningful democratic input.


Democratising AI alignment need not mean opening the floodgates uncritically. Rather, it could involve building layered, inclusive, and technically informed processes that allow public input to shape values—without sacrificing safety or coherence. The real risk may lie less in trying to democratise than in doing so carelessly—or not at all.

I invite feedback, criticism, and further suggestions to improve or test this proposed framework.


Comments (2)



I appreciate this! I don't feel though that the article addresses the possibility of democratising alignment (or, as Toner says, 'steerability').
