It’s a good callout that ‘holistic epistemic quality’ could potentially confuse models. I wanted to articulate the quality we were interested in as concisely as possible, but maybe that wasn’t the best choice. Which result would you like to see replicated (or not) with a more natural prompt?

I’d describe myself as also skeptical of model/human ability here! And I’d agree we’re measuring, to some extent, things LLMs confuse for Quality, or whatever target metric we’re interested in. But humans and models can be bad at this, and the practice can still be valuable.

My take is that even crude measures of quality are helpful, which is why the EA Forum uses ‘karma’. And most of the time crowdsourced quality scores are not available, e.g. outside the EA Forum or before publication. LLMs change this by providing cheap cognitive labor: they’re (informally) providing quality feedback to users all the time. So marginally better quality judgement might marginally improve human epistemics.
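
To make that concrete, here’s a minimal sketch of what I mean by an LLM “providing quality feedback” (this assumes the OpenAI Python client; the model name and prompt wording are placeholders, not the exact setup from the post):

```python
# Minimal sketch: ask a model for a 1-10 quality score on a post.
# Model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()

def score_quality(post_text: str) -> int:
    """Return the model's 1-10 quality score for a post."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system",
             "content": "Rate the overall quality of the following post "
                        "on a 1-10 scale. Reply with the number only."},
            {"role": "user", "content": post_text},
        ],
    )
    return int(resp.choices[0].message.content.strip())
```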

I think right now LLM quality judgement is not near the ceiling of human capability. Their quality measures will probably be worse than, for example, EA Forum karma. This is (somewhat) supported by the finding that better models produce quality scores that are more correlated with karma. Depending on how much better human judgement is than model judgement, one could potentially use things like karma as tentpoles to optimize prompts, context scaffolding, or even models towards.
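
As a sketch of what “using karma as a tentpole” could look like for prompt optimization (names here are hypothetical; `score_post` is whatever LLM-scoring function you’re using, e.g. the sketch above):

```python
# Sketch: score every post under each candidate prompt, then keep the
# prompt whose scores best track karma (Spearman rank correlation).
from scipy.stats import spearmanr

def best_prompt(prompts, posts, karma, score_post):
    def karma_corr(prompt):
        scores = [score_post(prompt, post) for post in posts]
        return spearmanr(scores, karma).correlation
    return max(prompts, key=karma_corr)
```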

One nice thing about models, maybe worth discussing here, is that they are sensitive to prompts. If you tell them to score quality they’ll try to score quality, and if you tell them to score controversy they’ll try to score controversy. Both model-graded Quality and Controversy correlate with both post karma and number of post comments, but it looks like Quality correlates more with karma and Controversy correlates more with number of comments. You can see this in the correlations tab here: https://moreorlesswrong.streamlit.app/. So careful prompts should (to an extent) help with overfitting.
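
The check itself is simple to reproduce if you have the scores; a sketch (variable names hypothetical), where you’d expect the (Quality, karma) and (Controversy, comments) cells to be the largest in each row:

```python
# Sketch: correlate each set of model scores against both karma and
# comment counts, mirroring the app's correlations tab.
from scipy.stats import spearmanr

def correlation_table(quality_scores, controversy_scores, karma, n_comments):
    rows = {"Quality": quality_scores, "Controversy": controversy_scores}
    cols = {"karma": karma, "comments": n_comments}
    return {
        (row, col): spearmanr(scores, target).correlation
        for row, scores in rows.items()
        for col, target in cols.items()
    }
```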