Mo Putera

Bio

CE Research Training Program graduate and research intern at ARMoR under the Global Impact Placements program, working on cost-benefit analyses to help combat AMR. Currently exploring roles involving research distillation and quantitative analysis to improve decision-making, e.g. applied prioritization research, previously supported by an FTXFF regrant. Previously spent 6 years doing data analytics, business intelligence, and knowledge + project management in various industries (airlines, e-commerce) and departments (commercial, marketing), after majoring in physics at UCLA. Also collaborating on a local charity evaluation initiative with the moonshot aim of reorienting Malaysia's giving landscape towards effectiveness.

I first learned about effective altruism circa 2014 via A Modest Proposal, a polemic on using dead children as units of currency to force readers to grapple with the opportunity costs of subpar resource allocation under triage. I have never stopped thinking about it since.

Comments


Would you count Holden's take here as a robust case for funding forecasting as an effective use of charitable funds? 

It's not controversial to say a highly general AI system, such as PASTA, would be momentous. The question is, when (if ever) will such a thing exist?

Over the last few years, a team at Open Philanthropy has investigated this question from multiple angles.

One forecasting method observes that:

  • No AI model to date has been even 1% as "big" (in terms of computations performed) as a human brain, and until recently this wouldn't have been affordable - but that will change relatively soon.
  • And by the end of this century, it will be affordable to train enormous AI models many times over; to train human-brain-sized models on enormously difficult, expensive tasks; and even perhaps to perform as many computations as have been done "by evolution" (by all animal brains in history to date).

This method's predictions are in line with the latest survey of AI researchers: something like PASTA is more likely than not this century.

A number of other angles have been examined as well.

One challenge for these forecasts: there's no "field of AI forecasting" and no expert consensus comparable to the one around climate change.

It's hard to be confident when the discussions around these topics are small and limited. But I think we should take the "most important century" hypothesis seriously based on what we know now, until and unless a "field of AI forecasting" develops.

This is my own (possibly very naive) interpretation of one motivation behind some of Open Phil's forecasting-related grants.

Actually, maybe it's also useful to just look at the biggest grants from that list: 

  • $7,993,780 over two years to the Applied Research Laboratory for Intelligence and Security at the University of Maryland, to support the development of two forecasting platforms, in a project led by Dr. Adam Russell. The forecasting platforms will be provided as a resource to help answer questions for policymakers (writeup)
  • two grants totaling $6,305,675 over three years to support the Forecasting Research Institute (FRI)’s work on projects to advance the science of forecasting as a tool to improve public policy and reduce existential risk. This includes developing a new modular forecasting platform and conducting research to test different forecasting techniques. This follows our October 2021 support ($275,000) for planning work by FRI Chief Scientist Philip Tetlock, and falls within our work on global catastrophic risks (writeup)
  • $3,000,000 to Metaculus to support work to improve its online forecasting platform, which allows forecasters to make predictions about world events. We believe that this work will help to provide more accurate and calibrated forecasts in domains relevant to Open Philanthropy’s work, such as artificial intelligence and biosecurity and pandemic preparedness, and enable organizations and individuals working in those areas to make better decisions. This follows our May 2022 support ($5,500,000) and falls within our work on global catastrophic risks (writeup)

something EA often misses from its birds-eye approach to solutions - leverage.

I'd be curious as to what you mean here, since my impression was always that EA discourse heavily emphasises leverage – e.g. in the SPC framework for cause prioritisation, in career advice by 80,000 Hours and Probably Good, in GiveWell's reasoning (for instance here is how GW's spreadsheet adjusts for leverage in evaluating AMF). 

In Tom's report it's an open question:

  • To inform the size of the effective FLOP gap
    • ...
    • What is the current $ value-add of AI? How does it change over time, or with model size?
      • Various ways of operationalising this: investment, revenues, effect on GDP.
      • Relevant for when AI will first be capable enough to readily add $trillions / year to GDP.

The closest the report gets to answering your question seems to be in the Evidence about the size of the effective FLOP gap subsection, where he says (I put footnotes in square brackets):

  • As of today the largest training run is ~3e24 FLOP. [I believe these were the requirements for PaLM.] ...
  • In my opinion, today’s AI systems are not close to being able to readily perform 20% of all cognitive tasks done by human workers. [Actually automating these tasks would add ~$10tr/year to GDP.]
  • If today’s systems could readily add $500b/year to the economy, that would correspond to automating ~1% of cognitive tasks. [World GDP is ~$100tr, about half of which is paid to human labour. If AI automates 1% of that work, that’s worth ~$500b/year.]

That last assumption bullet is what seems to have gone into the https://takeoffspeeds.com/ model referenced in Vasco's answer.
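
To make the arithmetic in those footnotes explicit, here's a minimal sketch (my own reconstruction of the stated assumptions, not code from Tom's report or from takeoffspeeds.com):

```python
# Back-of-the-envelope reconstruction of the footnotes' arithmetic.
# All inputs are the report's stated assumptions, not measured data.
world_gdp = 100e12    # ~$100tr/year world GDP
labour_share = 0.5    # ~half of GDP is paid to human labour

def value_of_automation(fraction_of_cognitive_tasks):
    """$/year added if AI readily automates this fraction of cognitive work."""
    return world_gdp * labour_share * fraction_of_cognitive_tasks

print(f"~${value_of_automation(0.01) / 1e9:.0f}b/year for 1% of tasks")     # ~$500b/year
print(f"~${value_of_automation(0.20) / 1e12:.0f}tr/year for 20% of tasks")  # ~$10tr/year
```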

You may have also seen Sam Clarke's classification of AI x-risk sources, just sharing for others :) 

Wei Dai and Daniel Kokotajlo's older longlist might be worth perusing too?

As someone predisposed to like modeling, the key takeaway I got from Justin Sandefur's Asterisk essay PEPFAR and the Costs of Cost-Benefit Analysis was this corrective reminder – emphasis mine, focusing on what changed my mind:

Second, economists were stuck in an austerity mindset, in which global health funding priorities were zero-sum: $300 for a course of HIV drugs means fewer bed nets to fight malaria. But these trade-offs rarely materialized. The total budget envelope for global public health in the 2000s was not fixed. PEPFAR raised new money. That money was probably not fungible across policy alternatives. Instead, the Bush White House was able to sell a dramatic increase in America’s foreign aid budget by demonstrating that several billion dollars could, realistically, halt an epidemic that was killing more people than any other disease in the world. 

...

A broader lesson here, perhaps, is about getting counterfactuals right. In comparative cost-effectiveness analysis, the counterfactual to AIDS treatment is the best possible alternative use of that money to save lives. In practice, the actual alternative might simply be the status quo, no PEPFAR, and a 0.1% reduction in the fiscal year 2004 federal budget. Economists are often pessimistic about the prospects of big additional spending, not out of any deep knowledge of the budgeting process, but because holding that variable fixed makes analyzing the problem more tractable. In reality, there are lots of free variables.

More detail:

Economists’ standard optimization framework is to start with a fixed budget and allocate money across competing alternatives. At a high-level, this is also how the global development community (specifically OECD donors) tends to operate: foreign aid commitments are made as a proportion of national income, entirely divorced from specific policy goals. PEPFAR started with the goal instead: Set it, persuade key players it can be done, and ask for the money to do it.

Bush didn’t think like an economist. He was apparently allergic to measuring foreign aid in terms of dollars spent. Instead, the White House would start with health targets and solve for a budget, not vice versa. ... Economists are trained to look for trade-offs. This is good intellectual discipline. Pursuing “Investment A” means forgoing “Investment B.” But in many real-world cases, it’s not at all obvious that the realistic alternative to big new spending proposals is similar levels of big new spending on some better program. The realistic counterfactual might be nothing at all.

In retrospect, it seems clear that economists were far too quick to accept the total foreign aid budget envelope as a fixed constraint. The size of that budget, as PEPFAR would demonstrate, was very much up for debate.

When Bush pitched $15 billion over five years in his State of the Union, he noted that $10 billion would be funded by money that had not yet been promised. And indeed, 2003 marked a clear breaking point in the history of American foreign aid. In real-dollar terms, aid spending had been essentially flat for half a century at around $20 billion a year. By the end of Bush’s presidency, between PEPFAR and massive contracts for Iraq reconstruction, that number hovered around $35 billion. And it has stayed there since. 

Compared to normal development spending, $15 billion may have sounded like a lot, but exactly one sentence after announcing that number in his State of the Union address, Bush pivoted to the case for invading Iraq, a war that would eventually cost America something in the region of $3 trillion — not to mention thousands of American and hundreds of thousands of Iraqi lives. Money was not a real constraint.

Tangentially, I suspect this sort of attitude (Iraq invasion notwithstanding) would naturally arise out of a definite optimism mindset (that essay by Dan Wang is incidentally a great read; his follow-up is more comprehensive and clearly argued, but I prefer the original for inspiration). It seems to me that Justin has this mindset as well, cf. his analogy to climate change in comparing economists' carbon taxes and cap-and-trade schemes vs progressive activists pushing for green tech investment to bend the cost curve. He concludes: 

You don’t have to give up on cost-effectiveness or utilitarianism altogether to recognize that these frameworks led economists astray on PEPFAR — and probably some other topics too. Economists got PEPFAR wrong analytically, not emotionally, and continue to make the same analytical mistakes in numerous domains. Contrary to the tenets of the simple, static, comparative cost-effectiveness analysis, cost curves can sometimes be bent, some interventions scale more easily than others, and real-world evidence of feasibility and efficacy can sometimes render budget constraints extremely malleable. Over 20 years later, with $100 billion dollars appropriated under both Democratic and Republican administrations, and millions of lives saved, it’s hard to argue a different foreign aid program would’ve garnered more support, scaled so effectively, and done more good. It’s not that trade-offs don’t exist. We just got the counterfactual wrong.

Aside from his climate change example above, I'd be curious to know what other domains economists are making analytical mistakes in w.r.t. cost-benefit modeling, since I'm probably predisposed to making the same kinds of mistakes. 

One of the more surprising things I learned from Karen Levy's 80K podcast interview on misaligned incentives in global development was how her experience directly contradicted a stereotype I had about for-profits vs nonprofits: 

Karen Levy: When I did Y Combinator, I expected it to be a really competitive environment: here you are in the private sector and it’s all about competition. And I was blown away by the level of collaboration that existed in that community — and frankly, in comparison to the nonprofit world, which can be competitive. People compete for funding, and so very often we’re fighting over slices of the same pie. Whereas the Y Combinator model is like, “We’re making the pie bigger. It’s getting bigger for everybody.”

My assumption had been that the opposite was true. 

Tomasik's claim (emphasis mine)

I suspect many charities differ by at most ~10 to ~100 times, and within a given field, the multipliers are probably less than a factor of ~5.

reminded me of this (again emphasis mine) from Ben Todd's 80K article How much do solutions to social problems differ in their effectiveness? A collection of all the studies we could find

Overall, my guess is that, in an at least somewhat data-rich area, using data to identify the best interventions can perhaps boost your impact in the area by 3–10 times compared to picking randomly, depending on the quality of your data.

This is still a big boost, and hugely underappreciated by the world at large. However, it’s far less than I’ve heard some people in the effective altruism community claim.

In addition, there are downsides to being data-driven in this way — by insisting on a data-driven approach, you might be ruling out many of the interventions in the tail (which are often hard to measure, and so will be missing). This is why we advocate for first aiming to take a ‘hits-based’ approach, rather than a data-driven one.

"Hits-based rather than data-driven" is a pretty thought-provoking corrective to me, as I'm maybe biased by my background having worked in data-rich environments my whole career.

I thought I had mostly internalized the heavy-tailed worldview from a life-guiding perspective, but reading Ben Kuhn's searching for outliers made me realize I hadn't. So here are some summarized reminders for posterity:  

  • Key idea: lots of important things in life are generated by multiplicative processes resulting in heavy-tailed distributions – jobs, employees / colleagues, ideas, romantic relationships, success in business / investing / philanthropy, how useful it is to try new activities (see the sketch after this list)
  • Decision relevance to living better, i.e. what Ben thinks I should do differently:
    • Getting lots of samples improves outcomes a lot, so draw as many samples as possible
    • Trust the process and push through the demotivation of super-high failure rates (instead of taking them as evidence that the process is bad)
    • But don't just trust any process; it must have 2 parts: (1) a good way to tell if a candidate is an outlier ("maybe amazing" below) and (2) a good way to draw samples
    • Optimize less, draw samples more (for a certain type of person)
    • Filter for "maybe amazing", not "probably good", as they have different traits
    • Filter for "ruling in" candidates, not "ruling out" (e.g. in dating)
    • Cultivate an abundance mindset to help reject more candidates early on (to find 99.9th percentile not just 90th)
    • Think ahead about what outliers look like, to avoid accidentally rejecting 99.9th percentile candidates out of miscalibration, by asking others based on their experience 
  • My reservations with Ben's advice, despite thinking they're mostly sound and idea-generating:
    • "Stick with the process through super-high failure rates instead of taking them as evidence that the process is bad" feels uncomfortably close to protecting a belief from falsification
    • Filtering for "maybe amazing", not "probably good" makes me uncomfortable because I'm not risk-neutral (e.g. in RP's CCM I'm probably closest to "difference-making risk-weighted expected utility = low to moderate risk aversion", which for instance assesses RP's default AI risk misalignment megaproject as resulting in, not averting, 300+ DALYs per $1k)
    • Unlike Ben, I'm a relatively young person in a middle-income country, and the abundance mindset feels privileged (i.e. not as much runway to try and fail) 
  • So maybe a precursor / enabling activity for the "sample more" approach above is "more runway-building": money, leisure time, free attention & health, proximity to opportunities(?)
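
As a gut check on the heavy-tailed framing, here's a minimal simulation sketch (my own illustration, not from Ben's post; the lognormal parameters are arbitrary) of how much the average best draw out of N candidates keeps improving as N grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_best_of_n(n, sigma=2.0, trials=2000):
    """Average best of n lognormal draws, in multiples of the median (which is 1)."""
    draws = rng.lognormal(mean=0.0, sigma=sigma, size=(trials, n))
    return draws.max(axis=1).mean()

for n in [1, 10, 100, 1000]:
    print(f"best of {n:>4} candidates ~ {avg_best_of_n(n):7.0f}x the median candidate")
```

Under a thin-tailed distribution the same exercise plateaus quickly, which is the intuition behind filtering for "maybe amazing" rather than "probably good".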

suppose that humanity is extinct (or reduced to a locked-in state) by 3000 CE (or any other period you choose); how likely is it that factor x figures in a causal chain leading to that?

Perhaps not a direct answer to your question, but this reminded me of the Metaculus Ragnarok series.
