This is a special post for quick takes by bruce.

Reposting from LessWrong, for people who might be less active there:[1]

TL;DR

  • FrontierMath was funded by OpenAI[2]
  • This was not publicly disclosed until December 20th, the date of OpenAI's o3 announcement; earlier versions of the arXiv paper made no mention of the funding, which was only acknowledged in the December 20th version.
  • Allegedly, there was no active communication about this funding to the mathematicians contributing to the project before December 20th (due to the NDAs Epoch signed), and also none after the 20th, once the NDAs had expired.
  • OP claims that "I have heard second-hand that OpenAI does have access to exercises and answers and that they use them for validation. I am not aware of an agreement between Epoch AI and OpenAI that prohibits using this dataset for training if they wanted to, and have slight evidence against such an agreement existing."

Tamay's response:

  • Seems to have confirmed the OpenAI funding + NDA restrictions
  • Claims OpenAI has "access to a large fraction of FrontierMath problems and solutions, with the exception of a unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities."
    • They also have "a verbal agreement that these materials will not be used in model training."


Edit (19/01): Elliot (the project lead) points out that the holdout set does not yet exist (emphasis added): 

As for where the o3 score on FM stands: yes I believe OAI has been accurate with their reporting on it, but Epoch can't vouch for it until we independently evaluate the model using the holdout set we are developing.[3]
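(To make concrete why an unseen holdout set matters: if a model scores much better on problems the lab has had access to than on problems it has never seen, that gap is evidence the access inflated the headline number. Below is a minimal sketch of that comparison, with entirely hypothetical numbers, and not Epoch's actual methodology:)

```python
# Hypothetical sketch of holdout-based verification of a reported
# benchmark score. All numbers below are made up for illustration.

def score(results: list[bool]) -> float:
    """Fraction of problems solved."""
    return sum(results) / len(results)

# Per-problem pass/fail results on the two splits (hypothetical).
seen_by_lab = [True] * 25 + [False] * 75     # problems the lab had access to
unseen_holdout = [True] * 15 + [False] * 85  # problems the lab never saw

gap = score(seen_by_lab) - score(unseen_holdout)
print(f"seen: {score(seen_by_lab):.0%}, holdout: {score(unseen_holdout):.0%}")

# A large gap would suggest contamination or targeted optimisation;
# similar scores on both splits would support the reported result.
if gap > 0.05:
    print("Warning: score on seen problems may be inflated")
```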

Edit (24/01):
Tamay tweets an apology (possibly including the timeline drafted by Elliot). It's pretty succinct so I won't summarise it here! Blog post version for people without twitter. Perhaps the most relevant point:

OpenAI commissioned Epoch AI to produce 300 advanced math problems for AI evaluation that form the core of the FrontierMath benchmark. As is typical of commissioned work, OpenAI retains ownership of these questions and has access to the problems and solutions.

Nat from OpenAI with an update from their side:

  • We did not use FrontierMath data to guide the development of o1 or o3, at all.
  • We didn't train on any FM derived data, any inspired data, or any data targeting FrontierMath in particular
  • I'm extremely confident, because we only downloaded frontiermath for our evals *long* after the training data was frozen, and only looked at o3 FrontierMath results after the final announcement checkpoint was already picked.

============

Some quick uncertainties I had:

  • What does this mean for OpenAI's 25% score on the benchmark?
  • What steps did Epoch take or consider taking to improve transparency between the time they were offered the NDA and the time of signing the NDA?
  • What is Epoch's level of confidence that OpenAI will keep to their verbal agreement to not use these materials in model training, both in some technically true sense, and in a broader interpretation of an agreement? (see e.g. bottom paragraph of Ozzie's comment).
  • In light of the confirmation that OpenAI not only has access to the problems and solutions but has ownership of them, what steps did Epoch consider before signing the relevant agreement to get something stronger than a verbal agreement that this won't be used in training, now or in the future?
  1. ^

    Epistemic status: quickly summarised + liberally copy-pasted, with ~0 additional fact-checking, given Tamay's replies in the comment section

  2. ^

    arXiv v5 (Dec 20th version): "We gratefully acknowledge OpenAI for their support in creating the benchmark."

  3. ^

    See clarification in case you interpreted Tamay's comments (e.g. that OpenAI "do not have access to a separate holdout set that serves as an additional safeguard for independent verification") to mean that the holdout set already exists.

Note that the hold-out set doesn't exist yet. https://x.com/ElliotGlazer/status/1880812021966602665

What does this mean for OpenAI's 25% score on the benchmark?

Note that only some of FrontierMath's problems are actually frontier-level, while others are relatively easier (e.g. IMO level, and DeepMind was already one point from gold on IMO-level problems) https://x.com/ElliotGlazer/status/1870235655714025817

first funding, then talent, then PR, and now this.

how much juice will OpenAI squeeze out of EA?

It's OK man because Sam has promised to donate 500 million a year to EA causes!

What did we say about making jokes on the forum Nick?

It's true we've discussed this already...

I've known Jaime for about ten years. Seems like he made an arguably wrong call when first dealing with real powaah, but overall I'm confident his heart is in the right place.

Some very quick thoughts on EY's TIME piece from the perspective of someone ~outside of AI safety work. I have no technical background and don't follow the field closely, so I'm likely to be missing some context and nuance; happy to hear pushback!

Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.

Frame nothing as a conflict between national interests, have it clear that anyone talking of arms races is a fool. That we all live or die as one, in this, is not a policy but a fact of nature. Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.

 

  • My immediate reaction when reading this was something like "wow, is this representative of AI safety folks? Are they willing to go to any lengths to stop AI development?". I've heard anecdotes of people outside of all this saying the piece reads like it came from a terrorist organisation, for example, which is stronger language than I'd use, but I think suggestions like this do unfortunately play into potential comparisons to ecofascists.
  • Since EY is publicly seen as a thought leader and widely regarded as a founder of the field, there are some risks to this kind of messaging. It's hard to evaluate how this trades off, but I definitely know communities and groups that would be pretty put off by it, and it's unclear how much value the sentences around willingness to escalate nuclear war are actually adding.
    • It's an empirical question how to trade off between risks from nuclear war and risks from AI, but the claim that "preventing AI extinction is a priority above preventing a nuclear exchange" is ~trivially true; the reverse is also true: "preventing extinction from nuclear war is a priority above preventing AI training runs". Given the difficulty of illustrating and defending to the general public the position that the risks of AI training runs are substantially higher than those of a nuclear exchange, I would have erred on the side of caution when saying things as politically charged as advocating for nuclear escalation (or at least things that can be interpreted as such).
    • I wonder which superpower EY trusts to properly identify a hypothetical "rogue datacentre" that's worthy of a military strike for the good of humanity, or whether this will just end up with parallels to other failed excursions abroad 'for the greater good' or to advance individual national interests.
  • If nuclear weapons are a reasonable comparison, we might expect limitations to end up with a few competing global powers having access to AI developments while other countries do not. It seems plausible that criticisms of such treaties being used to maintain the status quo, as in the nuclear nonproliferation / disarmament debate, may be applicable here too.
  • Unlike nuclear weapons (though nuclear power may weaken this somewhat), developments in AI have the potential to help immensely with development and economic growth.
  • Thus the conversation may eventually bump into something that looks like:
    • Richer countries / first movers that have obtained significant benefits of AI take steps to prevent other countries from catching up.[1]
    • Rich countries using the excuse of preventing AI extinction as a guise to further national interests
    • Development opportunities from AI for LMICs are similarly hindered, or only allowed in a way that is approved by the first movers in AI.
  • Given the above, and that conversations around and tangential to AI risk already receive some pushback from the Global South community for distracting from and taking resources away from existing commitments to UN Development Goals, my sense is that folks working in AI governance / policy would likely strongly benefit from scoping out how these developments are affecting Global South stakeholders, and how to get their buy-in for such measures.

    (disclaimer: one thing this gestures at is something like "global health / development efforts can be instrumentally useful towards achieving longtermist goals"[2], which is something I'm clearly interested in as someone working in global health. While it seems rather unlikely that this is the best way of achieving longtermist goals on the margin[3], that doesn't exclude some aspect of it being part of a necessary condition for important wins like an international treaty, if that's what is currently being advocated for. It is also worth mentioning because I think this is likely to be a gap / weakness in existing EA approaches.)
  1. ^

    this is applicable to a weaker extent even if an international agreement on an indefinite moratorium on new large training runs passes, if you see AI as a potential equalising force or if you think first movers might be worried about this

  2. ^
  3. ^