This is a special post for quick takes by bruce. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Reposting from LessWrong, for people who might be less active there:[1]

TL;DR

  • FrontierMath was funded by OpenAI[2]
  • This was not publicly disclosed until December 20th, the date of OpenAI's o3 announcement; earlier versions of the arXiv paper, where the funding was eventually acknowledged, did not mention it.
  • There was allegedly no active communication about this funding to the mathematicians contributing to the project before December 20th (due to the NDAs Epoch signed), but also no communication after the 20th, once the NDAs had expired.
  • OP claims that "I have heard second-hand that OpenAI does have access to exercises and answers and that they use them for validation. I am not aware of an agreement between Epoch AI and OpenAI that prohibits using this dataset for training if they wanted to, and have slight evidence against such an agreement existing."

Tamay's response:

  • Seems to have confirmed the OpenAI funding + NDA restrictions
  • Claims OpenAI has "access to a large fraction of FrontierMath problems and solutions, with the exception of a unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities."
    • They also have "a verbal agreement that these materials will not be used in model training."


Edit (19/01): Elliot (the project lead) points out that the holdout set does not yet exist (emphasis added): 

As for where the o3 score on FM stands: yes I believe OAI has been accurate with their reporting on it, but Epoch can't vouch for it until we independently evaluate the model using the holdout set we are developing.[3]

Edit (24/01):
Tamay tweets an apology (possibly including the timeline drafted by Elliot). It's pretty succinct so I won't summarise it here! Blog post version for people without twitter. Perhaps the most relevant point:

OpenAI commissioned Epoch AI to produce 300 advanced math problems for AI evaluation that form the core of the FrontierMath benchmark. As is typical of commissioned work, OpenAI retains ownership of these questions and has access to the problems and solutions.

Nat from OpenAI with an update from their side:

  • We did not use FrontierMath data to guide the development of o1 or o3, at all.
  • We didn't train on any FM derived data, any inspired data, or any data targeting FrontierMath in particular
  • I'm extremely confident, because we only downloaded frontiermath for our evals *long* after the training data was frozen, and only looked at o3 FrontierMath results after the final announcement checkpoint was already picked.

============

Some quick uncertainties I had:

  • What does this mean for OpenAI's 25% score on the benchmark?
  • What steps did Epoch take or consider taking to improve transparency between the time they were offered the NDA and the time of signing the NDA?
  • What is Epoch's level of confidence that OpenAI will keep to their verbal agreement not to use these materials in model training, both in some technically true sense and in a broader interpretation of the agreement? (see e.g. the bottom paragraph of Ozzie's comment).
  • In light of the confirmation that OpenAI not only has access to the problems and solutions but owns them, what steps did Epoch consider, before signing the relevant agreement, to get something stronger than a verbal agreement that these materials won't be used in training, now or in the future?
  1. ^

    Epistemic status: quickly summarised + liberally copy-pasted with ~0 additional fact-checking, given Tamay's replies in the comment section

  2. ^

    arXiv v5 (Dec 20th version) "We gratefully acknowledge OpenAI for their support in creating the benchmark."

  3. ^

    See clarification in case you interpreted Tamay's comments (e.g. that OpenAI "do not have access to a separate holdout set that serves as an additional safeguard for independent verification") to mean that the holdout set already exists

Note that the hold-out set doesn't exist yet. https://x.com/ElliotGlazer/status/1880812021966602665

What does this mean for OpenAI's 25% score on the benchmark?

Note that only some of FrontierMath's problems are actually at the frontier, while others are relatively easier (i.e. IMO level, and DeepMind was already one point from gold on IMO-level problems) https://x.com/ElliotGlazer/status/1870235655714025817

first funding, then talent, then PR, and now this.

how much juice will OpenAI squeeze out of EA?

It's OK man because Sam has promised to donate 500 million a year to EA causes!

What did we say about making jokes on the forum Nick?

It's true we've discussed this already...

I've known Jaime for about ten years. Seems like he made an arguably wrong call when first dealing with real powaah, but overall I'm confident his heart is in the right place.

Some very quick thoughts from EY's TIME piece from the perspective of someone ~outside of the AI safety work. I have no technical background and don't follow the field closely, so likely to be missing some context and nuance; happy to hear pushback!

Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.

Frame nothing as a conflict between national interests, have it clear that anyone talking of arms races is a fool. That we all live or die as one, in this, is not a policy but a fact of nature. Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.

 

  • My immediate reaction when reading this was something like "wow, is this representative of AI safety folks? Are they willing to go to any lengths to stop AI development?". I've heard anecdotes of people outside of all this saying the piece reads like it came from a terrorist organisation, for example, which is stronger wording than I'd use, but I think suggestions like this do unfortunately play into potential comparisons to ecofascists.
  • As someone seen publicly to be a thought leader and widely regarded as a founder of the field, there are some risks to this kind of messaging. It's hard to evaluate how this trades off, but I definitely know communities and groups that would be pretty put off by this, and it's unclear how much value the sentences around willingness to escalate nuclear war are actually adding.
    • It's an empirical Q how to trade off between risks from nuclear war and risks from AI, but the claim that "preventing AI extinction is a priority above a nuclear exchange" is ~trivially true; the reverse is also true: "preventing extinction from nuclear war is a priority above preventing AI training runs". Given the difficulty of illustrating and defending, to the general public, the position that the risks of AI training runs are substantially higher than those of a nuclear exchange, I would have erred on the side of caution when saying things as politically charged as advocating for nuclear escalation (or at least something that can be interpreted as such).
    • I wonder which superpower EY trusts to properly identify a hypothetical "rogue datacentre" that's worthy of a military strike for the good of humanity, or whether this will just end up with parallels to other failed excursions abroad 'for the greater good' or to advance individual national interests.
  • If nuclear weapons are a reasonable comparison, we might expect limitations to end up with a few competing global powers having access to AI developments, while other countries do not. It seems plausible that criticisms of such treaties being used to maintain the status quo, familiar from the nuclear nonproliferation / disarmament debate, may apply here too.
  • Unlike nuclear weapons (though nuclear power may weaken this somewhat), developments in AI have the potential to help immensely with development and economic growth.
  • Thus the conversation may eventually run into something that looks like:
    • Richer countries / first movers that have obtained significant benefits of AI take steps to prevent other countries from catching up.[1]
    • Rich countries use the prevention of AI extinction as a guise to further national interests
    • Development opportunities from AI for LMICs are similarly hindered, or only allowed in a way that is approved by the first movers in AI.
  • Given the above, and that conversations around and tangential to AI risk already receive some pushback from Global South communities for distracting and taking resources away from existing commitments to the UN Development Goals, my sense is that folks working in AI governance / policy would likely benefit strongly from scoping out how these developments are affecting Global South stakeholders, and how to get their buy-in for such measures.

    (disclaimer: one thing this gestures at is something like "global health / development efforts can be instrumentally useful towards achieving longtermist goals"[2], which is something I'm clearly interested in as someone working in global health. While it seems rather unlikely that doing so is the best way of achieving longtermist goals on the margin[3], it doesn't exclude some aspect of this being part of a necessary condition for important wins like an international treaty, if that's what is currently being advocated for. It also seems worth mentioning because I think this is likely to be a gap / weakness in existing EA approaches.)
  1. ^

    this is applicable to a weaker extent even in the event that an international agreement on an indefinite moratorium on new large training runs passes, if you see AI as a potential equalising force or if you think first movers might be worried about this

  2. ^
  3. ^