It seems that if we can't make the basic versions of these tools well aligned with us, we won't have much luck with future, more advanced versions.
Therefore, all AI safety people should work on alignment and safety challenges with AI tools that currently have users (image generators, GPT, etc.).
Agree? Disagree?
Some researchers work on making real-world models more aligned, either at the cutting edge (as you suggest here) or on something smaller (if their research is easier to start on a smaller model).
Some researchers work on problems like Agent Foundations (roughly: what is the correct mathematical way to model agents, utility functions, and so on), and I assume they don't experiment with actual models (yet).
Some researchers are trying to make tools that will help other researchers.
And there are other directions.
You can see many of the agendas here:
(My understanding of) What Everyone in Technical Alignment is Doing and Why
I think this is one reasonable avenue to explore alignment, but I don't want everybody doing it.
My impression is that AI researchers sit on a spectrum from doing only empirical work (of the kind you describe) to doing only theoretical work (like Agent Foundations). Most fall somewhere in the middle, doing some theory to figure out what kind of experiment to run and using empirical data to improve their theories (a lot of science looks like this!).
I think it would be unwise for all (or even a majority of) AI safety researchers to move to doing empirical work on current AI systems, for two reasons:
The first one is the big one. I can imagine this approach working (perhaps inefficiently) in a world where (1) were false and (2) were true, but I can't imagine it working in any world where (1) holds.
Agree that some could. Since you brought it up, how would you align image generators? They're still dumb tools, so do you mean align the users? Add safety features? Stable Diffusion had a few safeguards put in place, but users can easily disable them. Now it's generating typical porn, as well as more dangerous or harmful things, I suspect, but only because people are using it that way, not because it does that on its own. So, do you want the Stable Diffusion source code removed from the web? I second the motion, lol.