Announcing Scry: A research tool over arXiv, EA Forum, etc. w/ SQL + Vibe/Vector Algebra

exopriors

3 min read · Dec 26, 2025

Comments 12

EdoArad🔸

6mo

I'm curious about how representative the EA Forum is of the EA community, particularly with regard to worldview. One thing you could try is to take the EA community surveys done by RP and estimate how a random representative of "the forum" would answer, where the random representative might be a random user, or a random comment/post weighted by karma or amount of text.
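The weighting idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `label`, `karma`, and `words` fields and the sample rows are hypothetical, not Scry's schema or real forum data, and in practice each post's label (how it would "answer" a survey question) would have to be estimated by some other step.

```python
def weighted_share(items, label, weight_key=None):
    """Return the share of total weight carried by items with the given label.

    items: list of dicts with a "label" field plus numeric weight fields.
    weight_key: None for a uniform (random post) estimate, or the name of
    a numeric field such as "karma" or "words" to weight by.
    Negative weights are clipped to zero (an assumption of this sketch).
    """
    total = 0.0
    matching = 0.0
    for item in items:
        weight = 1.0 if weight_key is None else max(float(item[weight_key]), 0.0)
        total += weight
        if item["label"] == label:
            matching += weight
    return matching / total if total else 0.0


# Hypothetical rows, for illustration only.
posts = [
    {"label": "longtermist", "karma": 300, "words": 1000},
    {"label": "neartermist", "karma": 100, "words": 3000},
    {"label": "neartermist", "karma": 100, "words": 1000},
]

uniform = weighted_share(posts, "longtermist")            # random post
by_karma = weighted_share(posts, "longtermist", "karma")  # karma-weighted
by_words = weighted_share(posts, "longtermist", "words")  # text-weighted
```

The three estimates can differ a lot on the same data, which is why the choice of "random representative" matters when comparing against survey results.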

exopriors

6mo*

I fed your message to Scry! Seems the forum is significantly more longtermist.

Clara Torres Latorre 🔸

6mo

A 99% karma-weighted share of tagged posts being about AI seems wrong.

If you check the top 4 posts of all time, the 1st and 3rd are about FTX, the 2nd is about earning to give, and the 4th is about health, totalling > 2k karma.

You might want to check for bugs.

exopriors

6mo*

You're so right, I should have caught that. The database data is good, but Opus 4.5 made a rare error. I've since updated the response.

EdoArad🔸

6mo

I'm still in awe of current AI capabilities.

exopriors

6mo*

It's hard to fathom what's straightforwardly possible already.

david_reinstein

6mo

Trying this out for various Unjournal.org processes (like prioritizing research, finding potential evaluators, linking research to pivotal questions) and projects (assessing LLM vs human research evaluations). Some initial forays (coming from a conversation with Xyra). I still need to human-check it.

~prompt to Claude code about @Toby_Ord and How Well Does RL Scale?

``Toby Ord's writing -- what do the clusters look like? What other research/experts come closest to his post .... https://forum.effectivealtruism.org/posts/TysuCdgwDnQjH3LyY/how-well-does-rl-scale``

This interactive visual (with some extra prompts)

Key Takeaways

For “How Well Does RL Scale?”, the closest intellectual neighbors are:
- ML researchers focused on compute efficiency and scaling laws (e.g. 1a3orn, Pablo Villalobos), and
- AI safety researchers analyzing inference scaling implications (e.g. Ryan Kidd, ryan_greenblatt).
Stuart Armstrong appears to be the only researcher in this corpus writing substantively across all three of Toby Ord’s main intellectual themes:
- AI scaling
- Existential risk / longtermism
- Infinite ethics and decision theory

Most Similar Researchers to “How Well Does RL Scale?”

Top Authors (by Semantic Similarity)

Author	Distance	Key Work
1a3orn	0.143	Parameter Scaling Comes for RL, Maybe; New Scaling Laws for LLMs
Pablo Villalobos	0.150	Trading off compute in training and inference
Matrice Jacobine	0.168	Does RL Really Incentivize Reasoning Capacity in LLMs?
Lukas Finnveden	0.169	Before smart AI, there will be many mediocre or specialized AIs
ryan_greenblatt	0.179	What’s going on with AI progress and trends?
Ryan Kidd	0.188	Implications of the inference scaling paradigm for AI safety
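For readers unfamiliar with this kind of ranking, here is a minimal Python sketch of how a "Distance" column like the one above could be produced. The thread does not say which metric Scry uses; cosine distance is assumed here, and the two-dimensional vectors and author names are toy placeholders rather than real embeddings.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query, candidates):
    """Return (name, distance) pairs sorted from nearest to farthest."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy vectors; real embeddings have hundreds or thousands of dimensions.
query = [1.0, 0.0]
candidates = {
    "author_a": [1.0, 0.0],  # same direction as the query
    "author_b": [0.0, 1.0],  # orthogonal to the query
    "author_c": [1.0, 1.0],  # in between
}

ranked = rank_by_distance(query, candidates)
```

Note that a small distance only says two texts are close in embedding space. It does not establish authorship or endorsement, so attributions still need a human check.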

@1a3orn @Pablo Villalobos @Matrice Jacobine🔸🏳️‍⚧️ @Lukas Finnveden @Ryan Greenblatt @Ryan Kidd -- if you have a chance, let me know if this is accurate/relevant.

Most Similar arXiv Papers

Paper	Authors	Distance
The Art of Scaling RL Compute for LLMs	Khatri et al.	0.125
Webscale-RL: Automated Data Pipeline for Scaling RL	Cen et al.	0.142
AReaL: Large-Scale Asynchronous RL for Language Reasoning	Fu et al.	0.142
Does RLHF Scale?	Hou et al.	0.162

Toby Ord’s Writing Clusters

Based on semantic analysis, Toby Ord’s writing appears to fall into three main thematic clusters.

Cluster 1: AI Scaling & Timelines (Recent)

“How Well Does RL Scale?” (Oct 2025)
- Core claim: RL training scales poorly; most gains come from inference scaling (longer chains of thought), not training compute.
- Key implication: This likely lengthens AI timelines and affects governance and safety strategies.

Cluster 2: Existential Risk & Longtermism

The Precipice and related work
“The Precipice Revisited” (Jul 2024)
“Shaping Humanity’s Longterm Trajectory” (Jul 2023)
“Robust Longterm Comparisons” (May 2024)

Topics include:

Existential risk estimation
Discounting
Anthropic effects / anthropic shadow

Cluster 3: Infinite Ethics & Decision Theory

“Evaluating the Infinite” (Sep 2025) — hyperreal-based approach to divergent sums and integrals
“What’s wrong with the dogmatic solution to recklessness?” (Feb 2023)
“The Ethics of Copying Conscious States” (Aug 2025)

Topics include:

Pascal’s Mugging
Unbounded utility
Decision-making under infinity

Matrice Jacobine🔸🏳️‍⚧️

6mo

This was a linkpost, I didn't write that paper.

david_reinstein

6mo

Thanks, this is helpful for understanding the limitations of this tool.

exopriors

6mo*

It's important to mention that this project has received $0 in funding from EAIF, LTFF, Manifund, and other EA/EA-adjacent orgs. That tells you A LOT about the epistemic inclination and technological orientation of funders in the community.

david_reinstein

6mo*

Did a decent job for this academic paper, but I think it’s hampered by only having content from arXiv and various EA/tech forums. Still, it generated some interesting leads.

https://gistcdn.githack.com/daaronr/b9447c40a7a6b948f399073496f98c37/raw/scanner_elasticity_experts.html

Prompt:

... find the most relevant authors and work for Observational price variation in scanner data cannot reproduce experimental price elasticities https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4899765 -- we're looking for methodological experts to evaluate this for The Unjournal to inform our pivotal question "How do plant-based products substitute for animal products (welfare footprint)?"

exopriors

6mo

Update: The service has been improving a lot. Infra is more robust, query limits are higher, there are more sources, and the Claude prompt is further optimized, with much improved guidance for meaningful embedding vector manipulation. Someone could write some interesting quick takes from the research this now makes easy.
