PhD Student / Board Member @ MIT / EA Singapore
198 karma


AI alignment researcher in the Computational Cognitive Science and Probabilistic Computing Groups at MIT. My research sits at the intersection of AI and cognitive science, asking questions like: How can we specify and perform inference over rich yet structured generative models of human decision-making and value formation, in order to accurately infer human goals and values?

Currently a board member of EA Singapore, formerly co-president of Yale EA (2015-2017).



I also avoid the "Eastern" label for similar reasons that others have raised, but if we're just speaking about the Chinese cultural and ethical context, there is a longstanding practice of giving up family ties in service of a more universal end (albeit one that is often met by opposition) -- shaving one's head and becoming a monastic!


Grew up in a Chinese Singaporean family and I always find the "against vegetarianism" thing a strange thing to learn about other Chinese families because a lot of my relatives growing up (grandmother, great-grandmother, etc.) were vegetarian by way of Buddhism! And now my immediate family is mostly vegetarian (only eats fish sometimes), partly to be healthier, and partly because they never liked red meat very much, but I think the Chinese Buddhist cultural and ethical background also helped.

Fair point! Sorry it wasn't the most helpful. My attempt at explaining a bit more below:

Convex sets are just sets where each point in the set can be expressed as a weighted average of the points on the boundary of the set, e.g.:

[Image: a convex hull mesh, from the Wolfram Language documentation for ConvexHullMesh]

(source: https://reference.wolfram.com/language/ref/ConvexHullMesh.html)
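To make the "weighted" part concrete, here is a minimal Python sketch of a convex combination, i.e. a weighted average whose weights are non-negative and sum to 1. The function name `convex_combination` and the triangle corners are made up for illustration; they are not from the Wolfram page.

```python
def convex_combination(points, weights):
    """Weighted average of 2D points, with non-negative weights that sum to 1."""
    if len(points) != len(weights):
        raise ValueError("need exactly one weight per point")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    x = sum(w * p[0] for p, w in zip(points, weights))
    y = sum(w * p[1] for p, w in zip(points, weights))
    return (x, y)


# Hypothetical example: the corners of a triangle.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Equal weights give the centroid, which lies inside the triangle.
centroid = convex_combination(corners, [1 / 3, 1 / 3, 1 / 3])
```

Every choice of valid weights lands inside (or on the edge of) the triangle, which is what makes the filled triangle a convex set.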

In 1D, convex sets are just intervals, [a, b], and convex sets of probability distributions basically correspond to intervals of probability values, e.g. [0.1, 0.5], which are often called "imprecise probabilities".

Let's generalize this idea to 2D. There are two events, A and B, which I am uncertain about. If I were really confident, I could say that I think A happens with probability 0.2, and B happens with probability 0.8. But what if I feel so ignorant that I can't assign a probability to event B? That means I think P(B) could be anywhere in the interval [0.0, 1.0], while keeping P(A) = 0.2. So my beliefs, written as the pair (P(A), P(B)), lie somewhere on the line segment from (0.2, 0.0) to (0.2, 1.0). Line segments are convex sets.
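A rough Python sketch of the same example, in case it helps: each candidate belief is a pair (P(A), P(B)) with P(A) pinned at 0.2, and mixing any two candidates gives another candidate on the same segment. The helper names (`belief`, `mix`, `on_segment`) are hypothetical.

```python
def belief(p_b):
    """One candidate belief (P(A), P(B)), with P(A) fixed at 0.2."""
    if not 0.0 <= p_b <= 1.0:
        raise ValueError("P(B) must be a probability")
    return (0.2, p_b)


def mix(u, v, t):
    """Weighted average of two beliefs: weight (1 - t) on u and t on v."""
    return tuple((1 - t) * a + t * b for a, b in zip(u, v))


def on_segment(point, tol=1e-12):
    """True if the point lies on the segment from (0.2, 0.0) to (0.2, 1.0)."""
    return abs(point[0] - 0.2) <= tol and -tol <= point[1] <= 1.0 + tol


# Mixing two candidate beliefs never leaves the segment.
mixed = mix(belief(0.1), belief(0.9), 0.25)
still_inside = on_segment(mixed)
```

Here `mixed` works out to roughly (0.2, 0.3), which is just another belief that is ignorant about B in the same way.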

You can generalize this notion to infinite dimensions -- e.g. for a bit sequence of infinite length, specifying a complete probability distribution would require saying how probable it is that each bit equals 1, conditioned on the values of all of the other bits. But we could instead only assign probabilities to the odd bits, not the even bits, and that would correspond to a convex set of probability distributions.
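A finite toy version of the bit-sequence example can be sketched in Python. The assumptions here are mine: the odd bits are independent with known probabilities, the even bits are left completely unconstrained, and `probability_bounds` is a made-up helper name. Since the even bits can behave in any way at all given the odd bits, the best we can report for an event is a lower and an upper probability.

```python
from itertools import product


def probability_bounds(odd_probs, n_even, event):
    """Lower and upper probability of `event` over every joint distribution
    whose odd bits are independent with P(bit = 1) given by `odd_probs`
    (an assumption of this sketch) and whose even bits are unconstrained.

    `event(odd_bits, even_bits)` must return True or False.
    """
    lower = 0.0
    upper = 0.0
    for odd in product((0, 1), repeat=len(odd_probs)):
        # Probability of this particular setting of the odd bits.
        weight = 1.0
        for bit, p in zip(odd, odd_probs):
            weight *= p if bit else 1.0 - p
        # The even bits may do anything, so take the worst and best case.
        outcomes = [event(odd, even) for even in product((0, 1), repeat=n_even)]
        lower += weight * min(outcomes)
        upper += weight * max(outcomes)
    return lower, upper


# Two odd bits that are fair coin flips, two unconstrained even bits.
both_first_bits_on = lambda odd, even: odd[0] == 1 and even[0] == 1
bounds_mixed = probability_bounds([0.5, 0.5], 2, both_first_bits_on)

# An event that only involves odd bits gets an ordinary, precise probability.
first_odd_bit_on = lambda odd, even: odd[0] == 1
bounds_odd_only = probability_bounds([0.5, 0.5], 2, first_odd_bit_on)
```

An event that depends on an even bit comes back as an interval, while an event about the odd bits alone collapses to a single number.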

Hopefully that explains the convex set bit. The other part is why it's better to use convex sets. Well, one reason is that sometimes we might be unwilling to specify a probability distribution, because we know the true underlying process is uncomputable. This problem arises, for example, when an agent is trying to simulate itself. I* can never perfectly simulate a copy of myself within my mind, even probabilistically, because that leads to infinite regress -- this sort of paradox is related to the halting problem and Gödel's incompleteness theorems.

In at least these cases it seems better to say "I don't know how to simulate this part of me", rather than pretending I can assign a computable distribution to how I will behave. For example, if I don't know if I'm going to finish writing this comment in 5 minutes, I can assign it the imprecise probability [0.2, 1.0]. And then if I want to act safely, I just assume the worst case outcomes for the parts of me I don't know how to simulate, and act accordingly. This applies to other parts of the world I can't simulate as well -- the physical world (which contains me), or simply other agents I have reason to believe are smarter than me.
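One way to sketch the "assume the worst case" step in Python is a maximin rule: score each action by its lowest expected utility over the probability interval, then pick the action with the best such score. This is only an illustration of the idea, not infra-Bayesianism proper, and the action names and payoffs below are invented.

```python
def worst_case_expected_utility(utilities, p_interval):
    """Lowest expected utility over all probabilities in the interval.

    `utilities` is (utility if the event happens, utility if it does not).
    Expected utility is linear in p, so the minimum is at an endpoint.
    """
    u_yes, u_no = utilities
    return min(p * u_yes + (1 - p) * u_no for p in p_interval)


def maximin_choice(actions, p_interval):
    """Pick the action whose worst-case expected utility is highest."""
    return max(
        actions,
        key=lambda name: worst_case_expected_utility(actions[name], p_interval),
    )


# Event: "I finish writing this comment in 5 minutes".
# Hypothetical payoffs, as (if I finish, if I don't finish).
actions = {
    "promise_reply_now": (10.0, -20.0),
    "say_reply_later": (2.0, 2.0),
}

cautious = maximin_choice(actions, (0.2, 1.0))   # imprecise probability
confident = maximin_choice(actions, (0.9, 0.9))  # precise probability of 0.9
```

With the imprecise probability [0.2, 1.0] the rule picks the safe action, whereas a confident point estimate of 0.9 would pick the risky one.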

(*I'm using "I" here, but I really mean some model or computer that is capable of more precise simulation and prediction than humans are capable of.)

@Vanessa Kosoy has a nice explanation of Level 4 uncertainties (a.k.a. Knightian uncertainty), in the context of her work on infra-Bayesianism. The following is from her AXRP podcast interview with @DanielFilan:

Daniel Filan: Okay. I guess this gets to a question that I have, which is, is the fact that we’re dealing with this convex sets of distributions … because that’s the main idea, and I’m wondering how that lets you deal with non-realizability, because it seems to me that if you have a convex set of probability distributions, in standard Bayesianism, you could just have a mixture distribution over all of that convex set, and you’ll do well on things that are inside your convex set, but you’ll do poorly on things that are outside your convex set. Yeah, can you give me a sense of how … Maybe this isn’t the thing that helps you deal with non-realizability, but if it is, how does it?

Vanessa Kosoy: The thing is, a convex set, you can think of it as some property that you think the world might have, right? Just let’s think of a trivial example. Suppose your world is a sequence of bits, so just an infinite sequence of bits, and one hypothesis you might have about the world is maybe all the even bits are equal to zero. This hypothesis doesn’t tell us anything about odd bits. It’s only a hypothesis about even bits, and it’s very easy to describe it as such a convex set. We just consider all probability distributions that predict that the odd bits will be zero with probability one, and without saying anything at all - the even bits, they can be anything. The behavior there can be anything.

Vanessa Kosoy: Okay, so what happens is, if instead of considering this convex set, you consider some distribution on this convex set, then you always get something which makes concrete predictions about the even bits. You can think about it in terms of computational complexity. All the probability distributions that you can actually work with have bounded computational complexity because you have bounded computational complexity. Therefore, as long as you’re assuming a probability distribution, a specific probability distribution, or it can be a prior over distributions, but that’s just the same thing. You can also average them, get one distribution. It’s like you’re assuming that the world has certain low computational complexity.

Vanessa Kosoy: One way to think of it is that Bayesian agents have a dogmatic belief that the world has low computational complexity. They believe this fact with probability one, because all their hypotheses have low computational complexity. You’re assigning probability one to this fact, and this is a wrong fact, and when you’re assigning probability one to something wrong, then it’s not surprising you run into trouble, right? Even Bayesians know this, but they can’t help it because there’s nothing you can do in Bayesianism to avoid it. With infra-Bayesianism, you can have some properties of the world, some aspects of the world can have low computational complexity, and other aspects of the world can have high complexity, or they can even be uncomputable. With this example with the bits, your hypothesis, it says that the odd bits are zero. The even bits, they can be uncomputable. They can be like the halting oracle or whatever. You’re not trying to have a prior over them because you know that you will fail, or at least you know that you might fail. That’s why you have different hypotheses in your prior.

From: https://axrp.net/episode/2021/03/10/episode-5-infra-bayesianism-vanessa-kosoy.html


As someone who works on probabilistic programming (with applications to simulation-based modeling), I wanted to say that I thought this was very good! I think more people attracted to expected utility maximization should read this to expand their view of what's possible and practical in light of our value disagreements, and our deep ignorance about the world.


Seconded! On this note, I think the assumed presence of adversaries or competitors is actually one of the under-appreciated upshots of MIRI's work on Logical Induction (https://intelligence.org/2016/09/12/new-paper-logical-induction/). By the logical induction criterion they propose, "good reasoning" is only defined with respect to a market of traders of a particular complexity class - which can be interpreted as saying that "good reasoning" is really intersubjective rather than objective! There's only pressure to find the right logical beliefs in a reasonable amount of time if there are others who would fleece you for not doing so.

Welcome! To be clear, I do think that Buddhist thought and Kantian thought are more often at odds than in alignment. It's just that Garfield's more careful analysis of the No-Self argument suggests that accepting the emptiness of "Self" doesn't mean doing away with personhood-related concepts like moral responsibility.

That said, you might be interested in Dan Arnold's Brains, Buddhas and Believing, which does try to interpret arguments from the Madhyamaka school as similar to contemporary Kantian critiques against reductionism about the mind.

I really liked this post - appreciated how detailed and constructive it was! As one of the judges for the red-teaming contest, I personally thought this should have gotten a prize, and I think it's unfortunate that it didn't. I've tried to highlight it here in a comment on the announcement of contest winners!


Personal highlights from a non-consequentialist, left-leaning panelist
(Cross-posted from Twitter.)

Another judge for the criticism contest here - figured I would share some personal highlights from the contest as well! I read far fewer submissions than the most active panelists (s/o to them for their hard work!), but given that I hold minority viewpoints in the context of EA (non-consequentialist, leftist), I thought people might find these interesting.

I was initially pretty skeptical of the contest, and its ability to attract thoughtful foundational critiques. But now that the contest is over, I've been pleasantly surprised! 

To be clear, I still think there are important classes of critique missing. I would probably have framed the contest differently to encourage them, perhaps like what Michael Nielsen suggests here:

It would be amusing to have a second judging panel, of people strongly opposed to EA, and perhaps united by some other ideology. I wouldn't be surprised if they came to different conclusions.

I also basically agree with the critiques made in Zvi's criticism of the contest. All that said, below are some of my favorite (1) philosophical (2) ideological (3) object-level critiques.

(1) Philosophical Critiques

  • Population Ethics Without Axiology: A Framework
    Lukas Gloor's critique of axiological thinking was spot-on IMO. It gets at the heart of why utilitarian EA/longtermism can lead to absurd conclusions, and how contractualist "minimal morality" addresses them. I think if people took Gloor's post seriously, it would strongly affect their views about what it means to "do good" in the first place: In order to "not be a jerk", one need not care about creating future happy people, whereas one probably should care about e.g. (global and intergenerational) justice.
  • On the Philosophical Foundations of EA
    I also liked this critique of several EA arguments for consequentialism made by Will MacAskill and, AFAIK, shared by other influential EAs like Holden Karnofsky and Nick Beckstead. Korsgaard's response to Parfit's argument (against person-affecting views) was new to me!
  • Deontology, the Paralysis Argument and altruistic longtermism
    Speaking of non-consequentialism, this one is more niche, but William D'Alessandro's refutation of Mogensen & MacAskill's "paralysis argument" that deontologists should be longtermists hit the spot IMO. The critique concludes that EAs / longtermists need to do better if they want to convince deontologists, which I very much agree with.

A few other philosophical critiques I've yet to fully read, but was still excited to see: 

(2) Ideological Critiques

I'm distinguishing these from the philosophical critiques, in that they are about EA as a lived practice and actually existing social movement. At least in my experience, the strongest disagreements with EA are generally ideological ones.

Unsurprisingly, there wasn't participation from the most vocal online critics! (Why make EA better if you think it should disappear?) But at least one piece did examine the "EA is too white, Western & male" and "EA is neocolonialist" critiques in depth: 

  • Red-teaming contest: demographics and power structures in EA
    The piece focuses on GiveWell and how it chooses "moral weights" as a case study. It then makes recommendations for democratizing ethical decision-making, power-sharing and increasing relevant geographic diversity.

    IMO this was a highly under-rated submission. It should have gotten a prize (at least $5k)! The piece doesn't say this itself, but it points toward a version of the EA movement that is majority non-white and non-Western, which I find both possible and desirable.

There was also a slew of critiques about the totalizing nature of EA as a lived practice (many of which were awarded prizes):

  • Effective altruism in the garden of ends
    I particularly liked this critique for being a first-person account from a (formerly) highly-involved EA about how such totalizing thinking can be really destructive.
  • Notes on Effective Altruism
    I also appreciated Michael Nielsen's critique, which discusses the aforementioned "EA misery trap", and also coins the term "EA judo" for how criticisms of EA are taken to merely improve EA, not discredit it.
  • Leaning into EA Disillusionment
    A related piece is about disillusionment with EA, and how to lean into it. I liked how it creates more space for sympathetic critics of EA with a lot of inside knowledge - including those of us who've never been especially "illusioned" in the first place!

That's it for the ideological critiques. This is the class of critique that felt the most lacking in my opinion. I personally would've liked more well-informed critiques from the Left, whether socialist or anarchist, on terms that EAs could appreciate. (Most such critiques I've seen are either no longer as relevant or feel too uncharitable to be constructive.)

There was one attempt to synthesize leftism and EA, but IMO not any better than this old piece by Joshua Kissel on "Effective Altruism and Anti-Capitalism". There have also been some fledgling anarchist critiques circulating online that I would love to see written up in more detail.

(And maybe stay tuned for The Political Limits of Effective Altruism, the pessimistic critique I've yet to write about the possibility of EA ever achieving what mass political movements achieve.)

(3) Object-Level Critiques

  • Biological Anchors External Review
    On AI risk, I'd be remiss not to highlight Jennifer Lin's review of the influential Biological Anchors report on AI timelines. I appreciated both the arguments against the neural network anchor, and the evolutionary anchor, and have become less convinced by the evolutionary anchor as a prediction for transformative AI by 2100.
  • A Critique of AI Takeover Scenarios
    I also appreciated James Fodor's critique of AI takeover scenarios put forth by influential EAs like Holden Karnofsky and Ajeya Cotra. I share the skepticism about the takeover stories I've seen so far, which have often seemed to me way too quick and subjective in their reasoning.
  • Are you really in a race? The Cautionary Tales of Szilárd and Ellsberg
    And of course, there's Haydn Belfield's cautionary tale about how nuclear researchers mistakenly thought they were in an arms race, and how the same could happen (has happened?) with the race to "AGI".
  • The most important climate change uncertainty
    Outside of AI risk, I was glad to see this piece on climate change get an honorable mention!  It dissects the disconnect between EA consensus and non-EAs about climate risk, and argues for more caution. (Disclosure: This was written by a friend, so I didn't vote on it.)
  • Red Teaming CEA’s Community Building Work
    Finally, I also appreciated this extensive critique of CEA's community-building work. I've yet to read it in full, but it resonates with challenges working with CEA I've witnessed while on the board of another EA organization.

There's of course tons more that I didn't get the chance to read. I wish I'd had the time! While the results of the contest won't please everyone - much less the most trenchant EA critics - I still think the world is better for it, and I'm now more optimistic about this particular contest format and incentive scheme than I was previously.


For whatever it's worth, it looks like Carrick himself has chosen to donate $2900 to the Salinas campaign, and to publicly announce it via his official Twitter account:

Today I donated the maximum amount, $2900, to #OR06's @AndreaRSalinas. I earned less than $45k last year, so my money is where my mouth is when I say that I believe she will do an excellent job representing Oregonians in DC. [1/2]

This is a tight race and we must win it not only to get Andrea into office but also to keep Congress blue. Please consider digging deep and donating to her campaign here: https://tinyurl.com/2p8m9nwh. And for those planning to help GOTV, I'm right here with you. [2/2]

