Hide table of contents

TL;DR: I'm trying to either come up with a new promising AIS direction or decide (based on my inside view and not based on trust) that I strongly believe in one of the existing proposals. Is there some ML background that I better get? (and if possible: why do you think so?)

I am not asking how to be employable 

I know there are other resources on that, and I'm not currently trying to be employed.

Examples of seemingly useful things I learned so far (I want more of these)

  1. GPT's training involves loss only for the next token (relevant for myopia)
  2. Neural networks have a TON of parameters (relevant for why interpretability is hard)
  3. GPT has a limited amount of tokens in the prompt, plus other problems that I can imagine[1] trying to solve (relevant for solutions like "chain of thought")
  4. The vague sense that nobody has a good idea for why things work, and the experience of trying different hyperparameters and learning only in retrospect which of them did well.

(Please correct me if something here was wrong)

I'm asking because I'm trying to decide what to learn next in ML

if anything.

My background

  • Almost no ML/math background: I spent about 13 days catching up on both. What I did so far is something like "build and train GPT2 (plus some of pyTorch)", and learn about some other architectures.
  • I have a lot of non-ML software engineering experience.

Thanks!

 

  1. ^

    I don't intend to advance ML capabilities

7

0
0

Reactions

0
0
Comments3


Sorted by Click to highlight new comments since:

Reading ML papers seems like a pretty useful skill. They're quite different from blog posts, less philosophical and more focused on proving that their methods actually result in progress. Over time, they'll teach you what kinds of strategies tend to succeed in ML, and which ones are more difficult, helping you build an inside view of different alignment agendas. MLSS has a nice list of papers, and weeks 3 through 5 of AGISF have particularly good ones too. 

Some basic knowledge of (relatively) old-school probabilistic graphical models, along with basic understanding of variational inference. Not that graphical models are going to be used directly for any SOTA models any more, but the mathematical formalism is still very useful.

For example, understanding how inference on a graphical model works motivates the control-as-inference perspective on reinforcement learning. This is useful for understanding things like decision transformers, or this post on how to interpret RLHF on language models.

It would also be essential background to understand the causal incentives research agenda.

So the same tools come up in two very different places, which I think makes a case for their usefulness.

This is in some sense math-heavy, and some of the concepts are pretty dense, but without many mathematical prerequisites. You have to understand basic probability (how expected values and log likelihoods work, mental comfort going between  and  notation), basic calculus (like "set the derivative = 0 to maximize"), and be comfortable algebraically manipulating sums and products.

Curated and popular this week
 ·  · 1m read
 · 
Although some of the jokes are inevitably tasteless, and Zorrilla is used to set up punchlines, I enjoyed it and it will surely increase concerns and donations for shrimp. I'm not sure what impression the audience will have of EA in general.  Last week The Daily Show interviewed Rutger Bregman about his new book Moral Ambition (which includes a profile of Zorrilla and SWP). 
 ·  · 7m read
 · 
Article 5 of the 1948 Universal Declaration of Human Rights states: "Obviously, no one shall be subjected to torture or to cruel, inhuman or degrading treatment or punishment." OK, it doesn’t actually start with "obviously," but I like to imagine the commissioners all murmuring to themselves “obviously” when this item was brought up. I’m not sure what the causal effect of Article 5 (or the 1984 UN Convention Against Torture) has been on reducing torture globally, though the physical integrity rights index (which “captures the extent to which people are free from government torture and political killings”) has increased from 0.48 in 1948 to 0.67 in 2024 (which is good). However, the index reached 0.67 already back in 2001, so at least according to this metric, we haven’t made much progress in the past 25 years. Reducing government torture and killings seems to be low in tractability. Despite many countries having a physical integrity rights index close to 1.0 (i.e., virtually no government torture or political killings), many of their citizens still experience torture-level pain on a regular basis. I’m talking about cluster headache, the “most painful condition known to mankind” according to Dr. Caroline Ran of the Centre for Cluster Headache, a newly-founded research group at the Karolinska Institutet in Sweden. Dr. Caroline Ran speaking at the 2025 Symposium on the recent advances in Cluster Headache research and medicine Yesterday I had the opportunity to join the first-ever international research symposium on cluster headache organized at the Nobel Forum of the Karolinska Institutet. It was a 1-day gathering of roughly 100 participants interested in advancing our understanding of the origins of and potential treatments for cluster headache. I'd like to share some impressions in this post. The most compelling evidence for Dr. Ran’s quote above comes from a 2020 survey of cluster headache patients by Burish et al., which asked patients to rate cluster headach
 ·  · 2m read
 · 
A while back (as I've just been reminded by a discussion on another thread), David Thorstad wrote a bunch of posts critiquing the idea that small reductions in extinction risk have very high value, because the expected number of people who will exist in the future is very high: https://reflectivealtruism.com/category/my-papers/mistakes-in-moral-mathematics/. The arguments are quite complicated, but the basic points are that the expected number of people in the future is much lower than longtermists estimate because: -Longtermists tend to neglect the fact that even if your intervention blocks one extinction risk, there are others it might fail to block; surviving for billions  (or more) of years likely  requires driving extinction risk very low for a long period of time, and if we are not likely to survive that long, even conditional on longtermist interventions against one extinction risk succeeding, the value of preventing extinction (conditional on more happy people being valuable) is much lower.  -Longtermists tend to assume that in the future population will be roughly as large as the available resources can support. But ever since the industrial revolution, as countries get richer, their fertility rate falls and falls until it is below replacement. So we can't just assume future population sizes will be near the limits of what the available resources will support. Thorstad goes on to argue that this weakens the case for longtermism generally, not just the value of extinction risk reductions, since the case for longtermism is that future expected population  is many times the current population, or at least could be given plausible levels of longtermist extinction risk reduction effort. He also notes that if he can find multiple common mistakes in longtermist estimates of expected future population, we should expect that those estimates might be off in other ways. (At this point I would note that they could also be missing factors that bias their estimates of