Lizka

Researcher @ Forethought
17021 karma · Joined · Working (0-5 years)

Bio

I'm a researcher at Forethought; before that, I ran the non-engineering side of the EA Forum (this platform), ran the EA Newsletter, and worked on some other content-related tasks at CEA. [More about the Forum/CEA Online job.]

Selected posts

Background

I finished my undergraduate studies with a double major in mathematics and comparative literature in 2021. I was a research fellow at Rethink Priorities in the summer of 2021 and was then hired by the Events Team at CEA. I later switched to the Online Team. In the past, I've also done some (math) research and worked at Canada/USA Mathcamp.

Sequences (10)

Celebrating Benjamin Lay (1682 - 1759)
Donation Debate Week (Giving Season 2023)
Marginal Funding Week (Giving Season 2023)
Effective giving spotlight - classic posts
Selected Forum posts (Lizka)
Classic posts (from the Forum Digest)
Forum updates and new features
Winners of the Creative Writing Contest
Winners of the First Decade Review

Comments (556)

Topic contributions (267)

I think this is a good question, and it's something I sort of wanted to look into and then didn't get to! (If you're interested, I might be able to connect you with someone/some folks who might know more, though.)

Quick general takes on what private companies might be able to do to make their tools more useful on this front (please note that I'm pretty out of my depth here, so take this with a decent amount of salt -- and also this isn't meant to be prioritized or exhaustive): 

  • Some of the vetting/authorization processes (e.g. FedRAMP) are burdensome, and sometimes companies do just give up/don't bother (see footnote 12), which narrows the options for agencies; going through this anyway could be very useful
  • Generally lowering costs for tech products can make a real difference for whether agencies will adopt them -- also maybe open source products are likelier to be used(?) (and there's probably stuff like which chips are available, which systems staff are used to, what tradeoffs on speed vs cost or similar make sense...)
  • Security is useful/important, e.g. the tool/system can be run locally, can be fine-tuned with sensitive data, etc. (Also I expect things like "we can basically prove that the training / behavior of this model satisfies [certain conditions]" will increasingly matter — with conditions like "not trained on X data" or "could not have been corrupted by any small group of people" — but my understanding/thinking here is very vague!)
  • Relatedly I think various properties of the systems' scaffolding will matter; e.g. "how well the tool fits with existing and future systems" — so modularity and interoperability[1] (in general and with common systems) are very useful — and "can this tool be set up ~once but allow for different forms of access/data" (e.g. handle differences in who can see different kinds of data, what info should be logged, etc.)

(Note also there's a pretty huge set of consultancies that focus on helping companies sell to the government, but the frame is quite different.)

And then in terms of ~market gaps, I'm again very unsure, but expect that (unsurprisingly) lower-budget agencies will be especially undersupplied — the DOD, by contrast, has a lot more funding and capacity for this kind of thing — so building things for e.g. NIST could make sense. (Although it might be hard to figure out what would be particularly useful for agencies like NIST without actually being at NIST. I haven't really thought about this!)

  1. ^

    I haven't looked into this at all, but given the prevalence of Microsoft systems (Azure etc.) in the US federal government (which afaik is greater than what we see in the UK), I wonder if Microsoft's relationship with OpenAI explains why we have ChatGPT Gov in the US, while Anthropic is collaborating with the UK government https://www.anthropic.com/news/mou-uk-government 

> the main question is how high a priority this is, and I am somewhat skeptical it is on the ITN pareto frontier. E.g. I would assume plenty of people care about government efficiency and state capacity generally, and a lot of these interventions are generally about making USG more capable rather than too targeted towards longtermist priorities.

Agree that "how high-priority should this be" is a key question, and I'm definitely not sure it's on the ITN pareto frontier! (Nice phrase, btw.) 

Quick notes on some things that raise the importance for me, though:

  1. I agree lots of people care about government efficiency/ state capacity — but I suspect few of them are seriously considering the possibility of transformative AI in the near future, and I think what you do to ~boost capacity looks pretty different in that world
  2. Also/relatedly, my worldview means I have extra reasons to care about state capacity, and given my worldview is unusual that means I should expect the world is underinvesting in state capacity (just like most people would love to see a world with fewer respiratory infections, but tracking the possibility of a bioengineered pandemic means I see stuff like far-UVC/better PPE/etc. as higher value)
    1. More generally I like the "how much more do I care about X" frame — see this piece from 2014
    2. (It could also be a kind of public good.)
  3. In particular, it seems like a *lot* of the theory of change of AI governance+ relies on competent/skillful action/functioning by the US federal government, in periods where AI is starting to radically transform the world (e.g. to mandate testing and be able to tell if that mandate isn't being followed!), and my sense is that this assumption is fragile/the govt may very well not actually be sufficiently competent — so we better be working on getting there, or investing more in plans that don't rely on this assumption

And I'm pretty worried that a decent amount of work aimed at mitigating the risks of AI could end up net-negative (for its own goals) by not tracking this issue and thus not focusing enough on the interventions that are actually worth pursuing --- further harming government AI adoption & competence / capacity in the process (e.g. I think some of the OMB/EO guidance from last year looked positive to me before I dug into this, and now looks negative). So I'd like to nudge some people who work on issues related to existential risk (and government) away from a view like: "all AI is scary/bad, anything that is 'pro-AI' increases existential risk, if this bundle of policies/barriers inhibits a bunch of different AI things then that's probably great even if I think only a tiny fraction is truly (existentially) risky", etc. 

--

> this felt like neither the sort of piece targeted to mainstream US policy folks, nor that convincing for why this should be an EA/longtermist focus area.

Totally reasonable reaction IMO. To a large extent I see this as a straightforward flaw of the piece & how I approached it (partly due to lack of time - see my reply to Michael above), although I'll flag that my main hope was to surface this to people who are in fact kind of in between -- e.g. folks at think tanks that do research on existential security and have government experience/expertise.

--

> I'm unconvinced that e.g. OP should spin up a grantmaker focused on this (not that you were necessarily recommending this).

I am in fact not recommending this! (There could be specific interventions in the area that I'd see as worth funding, though, and it's also related to other clusters where something like the above is reasonable IMO.)

--

> Also, a few reasons govts may have a better time adopting AI come to mind:
>
> • Access to large amounts of internal private data
> • Large institutions can better afford one-time upfront costs to train or finetune specialised models, compared to small businesses
>
> But I agree the opposing reasons you give are probably stronger.

The data has to be accessible, though, and this is a pretty big problem. See e.g. footnote 17. 

I agree that a major advantage could be that the federal government can in fact move a lot of money when ~it wants to, and could make some (cross-agency/...) investments into secure models or similar, although my sense is that right now that kind of thing is the exception/aspiration, not the rule/standard practice. (Another advantage is that companies do want to maintain good relationships with the government/admin, and might thus invest more in being useful. Also there are probably a lot of skilled people who are willing to help with this kind of work, for less personal gain.)

--

> If only this were how USG juggled its priorities!

🙃 (some decision-makers do, though!)

> I imagine there might be some very clever strategies to get a lot of the benefits of AI without many of the normal costs of integration.
>
> For example:
>
> 1. The federal government makes heavy use of private contractors. These contractors are faster to adopt innovations like AI.
> 2. There are clearly some subsets of the government that matter far more than others. And there are some that are much easier to improve than others.
> 3. If AI strategy/intelligence is cheap enough, most of the critical work can be paid for by donors. For example, we have a situation where there's a think tank that uses AI to figure out the best strategies/plans for much of the government, and government officials can choose to pay attention to this.

I'd be excited to see more work in this direction! 

Quick notes: I think (1) is maybe the default way I expect things to go fine (although I have some worries about worlds where almost all US federal govt AI capacity is via private contractors). (2) seems right, and I'd want someone who has (or can develop) a deeper understanding of this area than me to explore this. Stuff like (3) seems quite useful, although I'm worried about things like ensuring access to the right kind of data and decision-makers (but partnerships / a mix of (2) and (3) could help). 

(A lot of this probably falls loosely under "build capacity outside the US federal government" in my framework, but I think the lines are very blurry / a lot of the same interventions help with appropriate use/adoption of AI in the government and external capacity.)

> all very similar to previous thinking on how forecasting can be useful to the government

I hadn't thought about this — makes sense, and a useful flag, thank you! (I might dig into this a bit more.)

 Thanks for this comment! I don’t view it as “overly critical.”

Quickly responding (just my POV, not Forethought’s!) to some of what you brought up ---

(This ended up very long, sorry! TLDR: I agree with some of what you wrote, disagree with some of the other stuff / think maybe we're talking past each other. No need to respond to everything here!)

A. Motivation behind writing the piece / target audience/ vibe / etc.

Re:

> …it might help me if you explained more about the motivation [behind writing the article] [...] the article reads like you decided the conclusion and then wrote a series of justifications

 I’m personally glad I posted this piece, but not very satisfied with it for a bunch of reasons, one of which is that I don’t think I ever really figured out what the scope/target audience should be (who I was writing for/what the piece was trying to do). 

So I agree it might help to quickly write out the rough ~history of the piece:

  • I’d started looking into stuff related to “differential AI development” (DAID), and generally exploring how the timing of different [AI things] relative to each other could matter.
  • My main focus quickly became exploring ~safety-increasing AI applications/tools — Owen and I recently posted about this (see the link).
  • But I also kept coming back to a frame of “oh crap, who is using AI how much/how significantly is gonna matter an increasing amount as time goes on. I expect adoption will be quite uneven — e.g. AI companies will be leading the way — and some groups (whose actions/ability to make reasonable decisions we care about a lot) will be left behind.”
    • At the time I was thinking about this in terms of "differential AI development and diffusion."
  • IIRC I soon started thinking about governments here; I had the sense that government decision-makers were generally slow on tech use, and I was also using “which types of AI applications will not be properly incentivized by the market” as a way to think about which AI applications might be easier to speed up. (I think we mentioned this here.)
  • This ended up taking me on a mini deep dive on government adoption of AI, which in turn increasingly left me with the impression that (e.g.) the US federal government would either (1) become increasingly overtaken from within by an unusually AI-capable group (or e.g. the DOD), (2) be rendered increasingly irrelevant, leaving (US) AI companies to regulate themselves and likely worsening its ability to deal with other issues, or (3) somehow in fact adopt AI, but likely in a chaotic way that would be especially dangerous (because things would go slowly until a crisis forced a ~war-like undertaking).
  • I ended up poking around in this for a while, mostly as an aside to my main DAID work, feeling like I should probably scope this out and move on. (The ~original DAID memos I’d shared with people discussed government AI adoption.)
  • After a couple of rounds of drafts+feedback I got into a “I should really publish some version of this that I believe and seems useful and then get back to other stuff; I don’t think I’m the right person to work a lot more on this but I’m hoping other people in the space will pick up whatever is correct here and push it forward” mode - and ended up sharing this piece. 

In particular I don’t expect (and wasn’t expecting) that ~policymakers will read this, but hope it’s useful for people at relevant think tanks or similar who have more government experience/knowledge but might not be paying attention to one “side” of this issue or the other. (For instance, I think a decent fraction of people worried about existential risks from advanced AI don’t really think about how using AI might be important for navigating those risks, partly because all of AI kinda gets lumped together).

Quick responses to some other things in your comment that seem kinda related to what I'm responding to in this “motivation/vibe/…” cluster:

> I also found it odd that the report did not talk about extinction risk. In its list of potential catastrophic outcomes, the final item on the list was "Human disempowerment by advanced AI", which IMO is an overly euphemistic way of saying "AI will kill everyone".

We might have notably different worldviews here (to be clear mine is pretty fuzzy!). For one thing, in my view many of the scary “AI disempowerment” outcomes might not in fact look immediately like “AI kills everyone” (although to be clear that is in fact an outcome I’m very worried about), and unpacking what I mean by "disempowerment" in the piece (or trying to find the ideal way to say it) didn't seem productive -- IIRC I wrote something and moved on. I also want to be clear that rogue AI [disempowering] humans is not the only danger I’m worried about, i.e. it doesn’t dominate everything else for me -- the list you're quoting from wasn't an attempt to mask AI takeover, but rather a sketch of the kind of thing I'm thinking about. (Note: I do remember moving that item down the list at some point when I was working on a draft, but IIRC this was because I wanted to start with something narrower to communicate the main point, not because I wanted to de-emphasize ~AI takeover.)

 

> By my reading, this article is meant to be the sort of Very Serious Report That Serious People Take Seriously, which is why it avoids talking about x-risk.

I might be failing to notice my bias, but I basically disagree here --- although I do feel a different version of what you're maybe pointing to here (see next para). I was expecting that basically anyone who reads the piece will already have engaged at least a bit with "AI might kill all humans", and likely most of the relevant audience will have thought very deeply about this and in fact has this as a major concern. I also don't personally feel shy about saying that I think this might happen — although again I definitely don't want to imply that I think this is overwhelmingly likely to happen or the only thing that matters, because that's just not what I believe.

However I did occasionally feel like I was ~LARPing research writing when I was trying to articulate my thoughts, and suspect some of that never got resolved! (And I think I floundered a bit on where to go with the piece when getting conflicting feedback from different people - although ultimately the feedback was very useful.) In my view this mostly shows up in other ways, though. (Related - I really appreciated Joe Carlsmith's recent post on fake thinking and real thinking when trying to untangle myself here.)


 

B. Downside risks of the proposed changes

  1. Making policymakers “more reluctant to place restrictions on AI development...”
    1. I did try to discuss this a bit in the "Government adoption of AI will need to manage important risks" section (and sort of in the "3. Defusing the time bomb of rushed automation" section), and indeed it's a thing I'm worried about.
    2. I think ultimately my view is that without use of AI in government settings, stuff like AI governance will just be ineffective or fall to private actors anyway, and also that the willingness-to-regulate /undue influence dynamics will be much worse if the government has no in-house capacity or is working with only one AI company as a provider.
  2. Shortening timelines by increasing AI company revenue
    1. I think this isn't a major factor here - the govt is a big customer in some areas, but the private sector dominates (as does investment in the form of grants, IIRC)
  3. "The government does not always work in the interests of the people (in fact it frequently works against them!) so making the government more effective/powerful is not pure upside."
     
    1. I agree with this, and somewhat worry about it. IIRC I have a footnote on this somewhere -- I decided to scope this out. Ultimately my view right now is that the alternative (~no governance at least in the US, etc.) is worse. (Sort of relatedly, I find the "narrow corridor" a useful frame here -- see e.g. here.)

C. Is gov competence actually a bottleneck?

> I don't think government competence is what's holding us back from having good AI regulations, it's government willingness. I don't see how integrating AI into government workflow will improve AI safety regulations (which is ultimately the point, right?[^1]), and my guess is on balance it would make AI regulations less likely to happen because policy-makers will become more attached to their AI systems and won't want to restrict them.

My view is that you need both, we're not on track for competence, and we should be pretty uncertain about what happens on the willingness side.

D. Michael’s line item responses

1. 

> invest in AI and technical talent

> What does that mean exactly? I can't think of how you could do that without shortening timelines so I don't know what you have in mind here.

I’m realizing this can be read as “invest in AI and in technical talent” — I meant “invest in AI talent and (broader) technical talent (in govt).” I’m not sure if this just addresses the comment; my guess is that doing this might have a tiny shortening effect on timelines (but is somewhat unclear, partly because in some cases e.g. raising salaries for AI roles in govt might draw people away from frontier AI companies), but this is unlikely to be the decisive factor. (Maybe related: my view is that generally this kind of thing should be weighed instead of treated as a reason to entirely discard certain kinds of interventions.)

2. 

> Streamline procurement processes for AI products and related tech

> I also don't understand this. Procurement by whom, for what purpose? And again, how does this not shorten timelines? (Broadly speaking, more widespread use of AI shortens timelines at least a little bit by increasing demand.)

I was specifically talking about agencies’ procurement of AI products — e.g. say the DOE wants a system that makes forecasting demand easier or whatever; making it easier for them to actually get such a system faster. I think the effect on timelines will likely be fairly small here (but am not sure), and currently think it would be outweighed by the benefits.

3. 

> Gradual adoption is significantly safer than a rapid scale-up.

> This sounds plausible but I am not convinced that it's true, and the article presents no evidence, only speculation. I would like to see more rigorous arguments for and against this position instead of taking it for granted.

I’d be excited to see more analysis on this, but it’s one of the points I personally am more confident about (and I will probably not dive in right now). 

4. 

> And in a crisis — e.g. after a conspicuous failure, or a jump in the salience of AI adoption for the administration in power — agencies might cut corners and have less time for security measures, testing, in-house development, etc.

> This line seems confused. Why would a conspicuous failure make government agencies want to suddenly start using the AI system that just conspicuously failed? Seems like this line is more talking about regulating AI than adopting AI, whereas the rest of the article is talking about adopting AI.

Sorry, again my writing here was probably unclear; the scenarios I was picturing were more like: 

  • There’s a serious breach - US govt systems get hacked (again) by [foreign nation, maybe using AI] - revealing that they’re even weaker than is currently understood, or publicly embarrassing the admin. The admin pushes for fast modernization on this front.
  • A flashy project isn’t proceeding as desired (especially as things are ramping up), the admin in power is ~upset with the lack of progress, pushes
  • There’s a successful violent attack (e.g. terrorism); turns out [agency] was acting too slowly...
  • Etc.

Not sure if that answers the question/confusion?

5. 

> Frontier AI development will probably concentrate, leaving the government with less bargaining power.

> I don't think that's how that works. Government gets to make laws. Frontier AI companies don't get to make laws. This is only true if you're talking about an AI company that controls an AI so powerful that it can overthrow the government, and if that's what you're talking about then I believe that would require thinking about things in a very different way than how this article presents them.

This section is trying to argue that AI adoption will be riskier later on, so the “bargaining power” I was talking about here is the bargaining power of the US federal govt (or of federal agencies) as a customer; the companies it’s buying from will have more leverage if they’re effectively monopolies. My understanding is that there are already situations where the US govt has limited negotiation power and maybe even makes policy concessions to specific companies specifically because of its relationship to those companies — e.g. in defense (Lockheed Martin, etc., although this is also kinda complicated) and again maybe Microsoft.

> And: would adopting AI (i.e. paying frontier companies so government employees can use their products) reduce the concentration of power? Wouldn't it do the opposite?

Again, the section was specifically trying to argue that later adoption is scarier than earlier adoption (in this case because there are (still) several frontier AI companies). But I do think that building up internal AI capacity, especially talent, would reduce the leverage any specific AI company has over the US federal government. 

6. 

> It’s natural to focus on the broad question of whether we should speed up or slow down government AI adoption. But this framing is both oversimplified and impractical

> Up to this point, the article was primarily talking about how we should speed up government AI adoption. But now it's saying that's not a good framing? So why did the article use that framing? I get the sense that you didn't intend to use that framing, but it comes across as if you're using it.

Yeah, I don't think I navigated this well! (And I think I was partly talking to myself here.) But maybe my "motivation" notes above give some context?
In terms of the specific “position” I in practice leaned into: Part of why I led with the benefits of AI adoption was the sense that the ~existential risk community (which is most of my audience) generally focuses on risks of AI adoption/use/products, and that's where my view diverges more. There's also been more discussion, from an existential risk POV, of the risks of adoption than there has been of the benefits, so I didn't feel that elaborating too much on the risks would be as useful.

7. 

> Hire and retain technical talent, including by raising salaries

> I would like to see more justification for why this is a good idea. The obvious upside is that people who better understand AI can write more useful regulations. On the other hand, empirically, it seems that people with more technical expertise (like ML engineers) are on average less in favor of regulations and more in favor of accelerating AI development (shortening timelines, although they usually don't think "timelines" are a thing). So arguably we should have fewer such people in positions of government power.

The TLDR of my view here is something like "without more internal AI/technical talent (most of) the government will be slower on using AI to improve its work & stay relevant, which I think is bad, and also it will be increasingly reliant on external people/groups/capacity for technical expertise --- e.g. relying on external evals, or trusting external advice on what policy options make sense, etc. and this is bad."

8. 

> Explore legal or other ways to avoid extreme concentration in the frontier AI market

> [...]
>
> The linked article attached to this quote says "It's very unclear whether centralizing would be good or bad", but you're citing it as if it definitively finds centralization to be bad.

(The linked article is this one: https://www.forethought.org/research/should-there-be-just-one-western-agi-project )

I was linking to this to point to relevant discussion, not as a justification for a strong claim like “centralization is definitively bad” - sorry for being unclear!

9. 

> If the US government never ramps up AI adoption, it may be unable to properly respond to existential challenges.

> What does AI adoption have to do with the ability to respond to existential challenges? It seems to me that once AI is powerful enough to pose an existential threat, then it doesn't really matter whether the US government is using AI internally.

I suspect we may have fairly different underlying worldviews here, but maybe a core underlying belief on my end is that there are things that it's helpful for the government to do before we get to ~ASI, and also there will be AI tools pre ~ASI that are very helpful for doing those things. (Or an alt framing: the world will get ~fast/complicated/weird due to AI before there's nothing the US gov could in theory do to make things go better.)

10. 

> Map out scenarios in which AI safety regulation is ineffective and explore potential strategies

> I don't think any mapping is necessary. Right now AI safety regulation is ineffective in every scenario, because there are no AI safety regulations (by safety I mean notkilleveryoneism). Trivially, regulations that don't exist are ineffective. Which is one reason why IMO the emphasis of this article is somewhat missing the mark—right now the priority should be to get any sort of safety regulations at all.

I fairly strongly disagree here (with "the priority should be to get any sort of safety regulations at all") but don't have time to get into it, really sorry!

---
 

Finally, thanks a bunch for saying that you enjoyed some of my earlier writing & I changed your thinking on slow vs quick mistakes! That kind of thing is always lovely to hear.

(Posted on my phone— sorry for typos and similar!)

Quick (visual) note on something that seems like a confusion in the current conversation:


Others have noted similar things (eg, and Will's earlier take on total vs human extinction). You might disagree with the model (curious if so!), but I'm a bit worried that one way or another people are talking past each other (at least from skimming the discussion).

(Commenting via phone, sorry for typos or similar!)

What actually changes about what you’d work on if you concluded that improving the future is more important on the current margin than trying to reduce the chance of (total) extinction (or vice versa)? 

Curious for takes from anyone!

I wrote a Twitter thread that summarizes this piece and has a lot of extra images (I probably went overboard, tbh.) 

I kinda wish I'd included the following image in the piece itself, so I figured I'd share it here:

As AI capabilities rise, AI systems will be responsible for a growing fraction of relevant work

I just want to say that I really appreciated this post — it came at exactly the right time for me and I've referenced it several times since you shared it. 

Follow-up: 

Quick list of some ways benchmarks might be (accidentally) misleading[1]

  1. Poor "construct validity"[2] (& systems that are optimized for the metric)
    1. The connection between what the benchmark is measuring and what it's trying to measure (or what people think it's measuring) is broken. In particular:
    2. Missing critical steps
      1. When benchmarks are trying to evaluate progress on some broad capability (like "engineering" or "math ability" or "planning"), they're often testing specific performance on meaningfully simpler tasks. Performance on those tasks might be missing key aspects of true/real-world/relevant progress on that capability.
    3. Besides the inherent difficulty of measuring the right thing, it's important to keep in mind that systems might have been trained specifically to perform well on a given benchmark.
      1. This is probably a bigger problem for benchmarks that have gotten more coverage.
    4. And some benchmarks have been designed specifically to be challenging for existing leading models, which can make new/other AI systems appear to have made more progress on these capabilities (relative to the older models) than they actually have.
      1. We're seeing this with the "Humanity's Last Exam" benchmark.
    5. ...and sometimes some of the apparent limitations are random or kinda fake, such that a minor improvement appears to lead to radical progress
  2. Discontinuous metrics: (partial) progress on a given benchmark might be misleading.
    1. The difficulty of tasks/tests in a benchmark often varies significantly (often for good reason), and reporting of results might explain the benchmark by focusing on its most difficult tests instead of the ones that the model actually completed.
      1. I think this was an issue for Frontier Math, although I'm not sure how strongly to discount some of the results because of this.
    2. -> This (along with issues like 1b above, which can lead to saturation) is part of what makes it harder to extrapolate from e.g. plots of progress on certain benchmarks.
  3. Noise & poisoning of the metric: Even on the metric in question, data might have leaked into the training process, the measurement process itself can be easily affected by things like who's running it, comparing performance of different models that were tested slightly differently might be messy, etc. Some specific issues here (I might try to add links later):
    1. Brittleness to question phrasing
    2. Data contamination (discussed e.g. here)
    3. Differences in how the measurement/testing process was actually set up (including how much inference compute was used, how many tries a model was given, access to tools, etc.) -- there's a toy sketch of the "number of tries" effect right after this list
  4. Misc
    1. Selective reporting/measurement (AI companies want to report their successes)
    2. Tasks that appear difficult (because they're hard for humans) might be especially easy for AI systems (and vice versa) — and this might cause us to think that more (or less) progress is being made than is true
      1. E.g. protein folding is really hard for humans
      2. This looks relevant, but I haven't read it
    3. Some benchmarks seem mostly irrelevant to what I care about
    4. Systems are tested before post-training enhancements or other changes
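
To make the "how many tries" point concrete, here's a tiny toy sketch (entirely made-up numbers and task names, and it assumes each attempt succeeds independently with a fixed per-attempt probability, which real attempts don't): the same underlying model's reported score moves a lot just by switching from one attempt per task to eight, and the headline average also hides that the remaining gap sits in the hard tasks.

```python
def pass_at_k(p: float, k: int) -> float:
    """Chance of at least one success in k independent attempts,
    given per-attempt success probability p."""
    return 1 - (1 - p) ** k

# Hypothetical per-attempt success rates on a made-up 4-task mini-benchmark.
per_task_p = {"easy_1": 0.9, "easy_2": 0.8, "hard_1": 0.15, "hard_2": 0.05}

for k in (1, 8):
    scores = {task: pass_at_k(p, k) for task, p in per_task_p.items()}
    average = sum(scores.values()) / len(scores)
    detail = ", ".join(f"{task}={score:.2f}" for task, score in scores.items())
    print(f"pass@{k}: {detail} | average={average:.2f}")

# The average jumps from ~0.48 (pass@1) to ~0.77 (pass@8) with no change to the
# model itself -- and the aggregate hides that the "hard" tasks are where most
# of the gap remains.
```

(Again, this is purely an illustration of why reported numbers are hard to compare without the testing setup attached; real benchmark runs are messier than the independence assumption here.)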

Additions are welcome! (Also, I couldn't quickly find a list like this earlier, but I'd be surprised if a better version than what I have above wasn't available somewhere; I'd love recommendations.)

Open Phil's announcement of their now-closed benchmarking RFP has some useful notes on this, particularly the section on "what makes for a strong benchmark." I also appreciated METR's list of desiderata here.

  1. ^

    To be clear: I'm not trying to say anything on ways benchmarks might be useful/harmful here. And I'm really not an expert. 

  2. ^

    This paper looks relevant but I haven't read it. 

TLDR: Notes on confusions about what we should do about digital minds, even if our assessments of their moral relevance are correct[1]

I often feel quite lost when I try to think about how we can “get digital minds right.” It feels like there’s a variety of major pitfalls involved, whether or not we’re right about the moral relevance of some digital minds.

Digital-minds-related pitfalls in different situations

| Our perception ⬇️ \ Reality ➡️ | These digital minds are (non-trivially) morally relevant[2] | These digital minds are not morally relevant |
|---|---|---|
| We see these digital minds as morally relevant | (1) We're right. But we might still fail to understand how to respond, or collectively fail to implement that response. | (2) We're wrong. We confuse ourselves, waste an enormous amount of resources on this[3] (potentially sacrificing the welfare of other beings that do matter morally), and potentially make it harder to respond to the needs of other digital minds in the future (see also). |
| We don't see these digital minds as morally relevant | (3) We're wrong. The default result here seems to be moral catastrophe through ignorance/sleepwalking (although maybe we'll get lucky). | (4) We're right. All is fine (at least on this front). |

Categories (2) and (3) above are ultimately about being confused about which digital minds matter morally — having an inappropriate level of concern, one way or another. A lot of the current research on digital minds seems to be aimed at avoiding this issue. (See e.g. and e.g..) I’m really glad to see this work; the pitfalls in these categories worry me a lot. 

But even if we end up in category (1) — we realize correctly that certain digital minds are morally relevant — how can we actually figure out what we should do? Understanding how to respond probably involves answering questions like: 

  • What matters for these digital minds? What improves their experience? (How can we tell?)
  • Is their experience overall negative/positive? Should we be developing such minds? (How can we tell??) (How can we ethically design digital minds?)
  • Should we respect certain rights for these digital minds? (..?)
  • ??

Answering these questions seems really difficult.

In many cases, extrapolating from what we can tell about humans seems inappropriate.[4] (Do digital minds/AI systems find joy or meaning in some activities? Do they care about survival? What does it mean for a digital mind/AI system to be in pain? Is it ok to lie to AI systems? Is it ok to design AI systems that have no ~goals besides fulfilling requests from humans?)

And even concepts we have for thinking about what matters to humans (or animals) often seem ill-suited for helping us with digital minds.[5] (When does it make sense to talk about freedoms and rights,[6] or the sets of (relevant) capabilities a digital mind has, or even their preferences? What even is the right “individual” to consider? Self-reports seem somewhat promising, but when can we actually rely on them as signals about what’s important for digital minds?)

I’m also a bit worried that too much of the work is going towards the question of “which systems/digital minds are morally relevant” vs to the question of “what do we do if we think that a system is morally relevant (or if we’re unsure)?” (Note however that I've read a tiny fraction of this work, and haven't worked in the space myself!) E.g. this paper focuses on the former and closes by recommending that companies prepare to make thoughtful decisions about how to treat the AI systems identified as potentially morally significant by (a) hiring or appointing a DRI (directly responsible individual) for AI welfare and (b) developing certain kinds of frameworks for AI welfare oversight. These steps do seem quite good to me (at least naively), but — as the paper explicitly acknowledges — they’re definitely not sufficient. 

Some of the work that's apparently focused on the question of "which digital minds matter morally" involves working towards theories of ~consciousness, and maybe that will also help us with the latter question. But I'm not sure.

(So my quite uninformed independent impression is that it might be worth investing a bit more in trying to figure out what we should do if we do decide that some digital minds might be morally relevant, or maybe what we should do if we find that we’re making extremely little progress on figuring out whether they are.) 


These seem like hard problems/questions, but I want to avoid defeatism. 

I appreciated @rgb‘s closing remark in this post (bold mine): 

To be sure, our neuroscience tools are way less powerful than we would like, and we know far less about the brain than we would like. To be sure, our conceptual frameworks for thinking about sentience seem shaky and open to revision. Even so, trying to actually solve the problem by constructing computational theories which try to explain the full range of phenomena could pay significant dividends. My attitude towards the science of consciousness is similar to Derek Parfit’s attitude towards ethics: since we have only just begun the attempt, we can be optimistic.


Some of the other content I’ve read/skimmed feels like it’s pointing in useful directions on these fronts (and more recommendations are welcome!): 

  1. ^

    I’m continuing my quick take spree, with a big caveat that I’m not a philosopher and haven’t read nearly as much research/writing on digital minds as I want to.

    And I’m not representing Forethought here! I don’t know what Forethought folks think about what I’m writing here.

  2. ^

    The more appropriate term, I think, is “moral patienthood.” And we probably care about the degree of moral consideration that is merited in different cases.

  3. ^

    This paper on “Sharing the world with digital minds” mentions this as one of the many failure modes — note that I’ve only skimmed it. 

  4. ^

     I’m often struck by how appropriate the War with the Newts is as an analogy/illustration/prediction for a bunch of what I’m seeing today, including on issues related to digital minds. But there’s one major way in which the titular newts differ significantly from potential morally relevant digital minds; the newts are quite similar to humans and can, to some extent, communicate their preferences in ways the book’s humans could understand if they were listening. 

    (One potential exception: there’s a tragic-but-brilliant moment in which “The League for the Protection of Salamanders” calls women to action sewing modest skirts and aprons for the Newts in order to appease the Newts’ supposed sense of propriety.)

  5. ^

    This list of propositions touches on related questions/ideas.

  6. ^

     See e.g. here

    Whether instrumentally-justified or independently binding, the rights that some AI systems could be entitled to might be different from the rights that humans are entitled to. This could be because, instrumentally, a different set of rights for AI systems promotes welfare. For example, as noted by Shulman and Bostrom (2021), naively granting both “reproductive” rights and voting rights to AI systems would have foreseeably untenable results for existing democratic systems: if AI systems can copy themselves at will, and every copy gets a vote, then elections could be won via tactical copying. This set of rights would not promote welfare and uphold institutions in the same way that they do for humans. Or AI rights could differ because, independently of instrumental considerations, their different properties entitle them to different rights—analogously to how children and animals are plausibly entitled to different rights than adults.
