Introduction
When a system is made safer, its users may be willing to offset at least some of the safety improvement by using it more dangerously. A seminal example is that, according to Peltzman (1975), drivers largely compensated for improvements in car safety at the time by driving more dangerously. The phenomenon in general is therefore sometimes known as the “Peltzman Effect”, though it is more often known as “risk compensation”.[1] One domain in which risk compensation has been studied relatively carefully is NASCAR (Sobel and Nesbit, 2007; Pope and Tollison, 2010), where, apparently, the evidence for a large compensation effect is especially strong.[2]
In principle, more dangerous usage can partially, fully, or more than fully offset the extent to which the system has been made safer holding usage fixed. Making a system safer thus has an ambiguous effect on the probability of an accident, after its users change their behavior.
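A toy calculation (with made-up numbers, purely for illustration) makes the three cases concrete:

```python
# Toy model with hypothetical numbers: baseline accident probability,
# cut by a safety improvement, then scaled back up by riskier usage
# ("compensation"). Not a claim about any real magnitudes.
def accident_prob(p_baseline: float, safety_gain: float, compensation: float) -> float:
    """Baseline risk, reduced by the safety gain, inflated by riskier behavior."""
    return p_baseline * (1 - safety_gain) * (1 + compensation)

base = 0.10  # hypothetical baseline accident probability

print(accident_prob(base, 0.30, 0.20))   # partial offset: net risk still falls
print(accident_prob(base, 0.30, 3 / 7))  # full offset: back to the baseline
print(accident_prob(base, 0.30, 0.60))   # more-than-full offset: net risk rises
```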
There’s no reason why risk compensation shouldn’t apply in the existential risk domain, and we arguably have examples in which it has. For example, reinforcement learning from human feedback (RLHF) makes AI more reliable, all else equal; so it may be making some AI labs comfortable releasing more capable, and so maybe more dangerous, models than they would release otherwise.[3]
Yet risk compensation per se appears to have gotten relatively little formal, public attention in the existential risk community so far. There has been informal discussion of the issue: e.g. risk compensation in the AI risk domain is discussed by Guest et al. (2023), who call it “the dangerous valley problem”. There is also a cluster of papers and works in progress by Robert Trager, Allan Dafoe, Nick Emery-Xu, Mckay Jensen, and others, including these two and some not yet public but largely summarized here, exploring the issue formally in models with multiple competing firms. In a sense what they do goes well beyond this post, but as far as I’m aware none of t
Organizing good EAGx meetups
EAGx conferences often feature meetups for subgroups with a shared interest or identity, such as "animal rights", "academia" or "women". They're very easy to set up - yet they're some of the best events. Four forms I've seen are
a) speed-friending
b) brainstorming topics & discussing them in groups
c) red-teaming projects
d) just a big pile of people talking
If you want to maximize the amount of information transferred, form a) seems optimal, purely because 50% of people are talking at any point in time, in a personalized fashion. If you want to add some choice, you can start by letting people group themselves or order themselves along some spectrum. Presenting this as "human cluster-analysis" might also make it a nerdy icebreaker. It works great with 7-minute rounds, at the end of which you're only nudged, rather than required, to switch partners.
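If anyone wants to script the rotation so that every attendee meets every other exactly once, the classic round-robin "circle method" does it. A minimal sketch (assuming an even number of attendees):

```python
# Round-robin ("circle method") schedule for speed-friending:
# fix one person, rotate everyone else by one seat each round.
# Assumes an even number of attendees.
def rounds(names):
    """Return a list of rounds; each round is a list of pairs.
    Over len(names) - 1 rounds, everyone meets everyone exactly once."""
    names = list(names)
    n = len(names)
    fixed, rest = names[0], names[1:]
    schedule = []
    for _ in range(n - 1):
        order = [fixed] + rest
        # pair the first seat with the last, second with second-to-last, etc.
        schedule.append([(order[i], order[n - 1 - i]) for i in range(n // 2)])
        rest = rest[-1:] + rest[:-1]  # rotate the circle by one seat
    return schedule

for i, rnd in enumerate(rounds(["A", "B", "C", "D", "E", "F"]), 1):
    print(f"Round {i}: {rnd}")
```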
I loved form c) for AI safety projects at EAGx Berlin. Format: a few people introduce their projects to everyone, then each grabs a table and presents theirs in more detail to smaller groups. More generally, this form lets interesting people hold small, low-effort interactive lectures while utilizing interested people as focus groups.
Form b) seems to be most common for interest-based meetups. It usually includes 1) group brainstorming of topics 2) voting on the topics 3) splitting up 4) presentations. This makes for a good low-effort event that's somewhere between a lecture and a 1-on-1 in terms of required energy. However, I see three common problems with this format: Firstly, steps 1) and 2) take a lot of time and create unnaturally clustered topics (as brainstorming creates topics "token-by-token", rather than holistically). Secondly, in ad hoc groups with >5 members, it's hard to coordinate who takes the floor, so conversations can turn into sequences of separate inputs, i.e. members build less on each other's points. Thirdly, spontaneous conversations are hard to compress into useful takeaways that can be presented on the whole group's behalf.
Therefore, a better way of facilitating form b) may be:
Step 0 - before the event, come up with a natural way to divide the topic into a few clusters.
Step 1 - introduce these clusters, perhaps letting attendees develop the sub-topics. The number of clusters should divide the group into subgroups of 3-6 people.
Step 2 - every 15 minutes, invite attendees to switch groups
Step 3 - 5 minutes before the end, prompt attendees to exchange contact info
Step 4 - the end.
(I haven't properly tried out this format yet.)
Recently, I made RatSearch for googling within EA-adjacent websites. Now, you can try the GPT bot version! (ChatGPT Plus required)
The bot is instructed to interpret what you want to know in relation to EA and search for it on the Forum. If that fails, it searches the whole web, while prioritizing the orgs listed by EA News.
Cons: ChatGPT uses Bing, which isn't entirely reliable when it comes to indexing less-visited websites.
Pros: It's fun for brainstorming EA connections/perspectives, even when you just type a raw phrase like "public transport" or "particle physics".
Neutral: I have yet to test whether it works better when you explicitly limit the search using the site: operator - try AltruSearch 2. It seems better at digging deeper within the EA ecosystem; AltruSearch 1 seems better at digging wider.
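For reference, a site:-restricted query can be assembled like this (the site list below is illustrative - not necessarily the one RatSearch actually uses):

```python
# Sketch: combine a search phrase with OR-joined site: restrictions,
# so a search engine only returns hits from the listed domains.
# The domain list is an illustrative example.
EA_SITES = ["forum.effectivealtruism.org", "lesswrong.com", "80000hours.org"]

def build_query(phrase: str, sites: list[str]) -> str:
    """Return a query string limited to the given sites via site: operators."""
    site_clause = " OR ".join(f"site:{s}" for s in sites)
    return f"{phrase} ({site_clause})"

print(build_query("public transport", EA_SITES))
# e.g. public transport (site:forum.effectivealtruism.org OR site:lesswrong.com OR site:80000hours.org)
```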
Update (12/8): The link now redirects to an updated version with very different instructions. You can still access the older version here.
Cool. I'd be interested in tentatively providing this search for free on EA News via the OpenAI API, depending on monthly costs. Do you know how to implement it?
Sorry, I don't have any experience with that.
What you were (or are?) looking for seems closest to https://platform.openai.com/docs/assistants
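A rough sketch of what that could look like - the name, model string, and instructions below are hypothetical placeholders, and actually running create_assistant() requires the `openai` package and an API key:

```python
# Hypothetical configuration for a search assistant via the OpenAI
# Assistants API. All strings here are placeholders, not RatSearch's
# actual setup.
ASSISTANT_CONFIG = {
    "name": "EA Search Assistant",  # hypothetical name
    "model": "gpt-4-turbo",         # placeholder model string
    "instructions": (
        "Interpret the user's query in relation to effective altruism "
        "and search EA-adjacent websites first."
    ),
}

def create_assistant():
    # Imported lazily so the config above is inspectable without the package.
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    return client.beta.assistants.create(**ASSISTANT_CONFIG)
```

Monthly cost would then mostly depend on the model chosen and per-query token usage.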
Hey @Daniel_Friedrich, great efforts and updates, the v2 worked great for me.
The first GPT link shows a blank page for me, can you check/update/clean it? It seems, perhaps, you've posted your private editing page link instead of the publicly usable link?
Done, thanks!
I got access to Bing Chat. It seems:
- It only searches through archived versions of websites (it doesn't retrieve today's news articles, and it accessed an older version of my Wikipedia user page)
- While archiving, it only downloads the content one can see without engaging with the website (tested with Reddit "see spoiler" buttons, which reveal new content in the page code: it could retrieve info from posts that gained less attention but weren't hidden behind a spoiler button)
I.e. it's still in a box of sorts, unless it's much more intelligent than it pretends.
Edit: A recent ACX post argues text-predicting oracles might be safer, as their ability to form goals is very limited, but it provides two models of how even they could be dangerous: by simulating an agent, or via a human who decides to take bad advice like "run the paperclip maximizer code". Scott implies it would be extreme to think such a model would spontaneously form goals, linking a post by Veedrac. The best argument there seems to be that it only has memory equivalent to about 10 human seconds. I find this convincing for the current models, but it also seems to limit these systems' intelligence, so I'm afraid that for future models, the incentives are aligned with removing this safety valve.