crossposted on LessWrong
I'm interested in questions of the form: "I have a bit of metadata/structure to the question, but I know very little about its content (or alternatively, I'm worried about biases or hacks in how I think about the problem, or in which pieces of information I pay attention to). In those situations, what prior should I start with?"
I'm not sure if there is a more technical term than "low-information prior."
Some examples of what I found useful recently:
1. Laplace's Rule of Succession, for when the underlying mechanism is unknown (see the sketch after this list).
2. Percentage of binary questions that resolve as "yes" on Metaculus. It turns out that of all binary (Yes/No) questions asked on the prediction platform Metaculus, ~29% resolved yes. This means that even if you know nothing about the content of a Metaculus question, a reasonable starting point for answering a randomly selected binary Metaculus question is 29%.
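To make item 1 concrete, here's a minimal sketch of the Rule of Succession in Python (the sunrise numbers are made up for illustration):

```python
# Laplace's Rule of Succession: after observing s successes in n trials,
# estimate the probability of success on the next trial as (s + 1) / (n + 2).

def rule_of_succession(successes: int, trials: int) -> float:
    """Posterior mean for a success probability under a uniform prior."""
    return (successes + 1) / (trials + 2)

# With no observations at all, the estimate is 1/2.
print(rule_of_succession(0, 0))    # 0.5

# After seeing the sun rise on 10 out of 10 mornings (illustrative numbers),
# the estimated probability it rises tomorrow is 11/12 ~= 0.917.
print(rule_of_succession(10, 10))
```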
In both cases there are obviously reasons to override the prior, both in practice and in theory (for example, you could arbitrarily add a "not" to every question on Metaculus so that your prior becomes 71%). However (I claim), having a decent prior is nonetheless useful in practice, even if it's theoretically unprincipled.
I'd be interested in seeing something like 5-10 examples of low-information priors as useful as the rule of succession or the Metaculus binary prior.
I found the answers to this question on stats.stackexchange useful for thinking about and getting a rough overview of "uninformative" priors, though it's a bit too technical to apply easily in practice, and it's aimed at formal Bayesian inference rather than more general forecasting.
In information theory, entropy is a measure of (lack of) information: high-entropy distributions carry little information. That's why the principle of maximum entropy, as Max suggested, can be useful.
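As a quick illustration (not from the original post), the uniform distribution has the highest entropy of any distribution over a finite set of outcomes; the candidate priors below are made up:

```python
import numpy as np
from scipy.stats import entropy  # Shannon entropy (natural log by default)

# Three candidate priors over four mutually exclusive outcomes.
uniform = np.array([0.25, 0.25, 0.25, 0.25])
mild    = np.array([0.40, 0.30, 0.20, 0.10])
sharp   = np.array([0.85, 0.05, 0.05, 0.05])

for name, p in [("uniform", uniform), ("mild", mild), ("sharp", sharp)]:
    print(f"{name}: entropy = {entropy(p):.3f}")

# The uniform distribution has the highest entropy (log 4 ~= 1.386),
# which is why it's the maximum entropy prior over a finite set
# when nothing else is known.
```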
Another meta answer is to use the Jeffreys prior. This has the property that it is invariant under a change of coordinates. This isn't the case for maximum entropy priors in general and is a source of inconsistency (see e.g. the partition problem for the principle of indifference, which is just a special case of the principle of maximum entropy). Jeffreys priors are often unwieldy, but one important exception is the interval [0,1] (e.g. for a probability), where the Jeffreys prior is the beta(1/2,1/2) distribution. See the red line in the graph at the top of the beta distribution Wikipedia page: the density is pushed towards the edges, close to 0 and 1.
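To see the shape described above, here's a small sketch using scipy (the grid of evaluation points is arbitrary):

```python
import numpy as np
from scipy.stats import beta

jeffreys = beta(0.5, 0.5)   # Jeffreys prior for a probability
uniform  = beta(1, 1)       # uniform (maximum entropy) prior on [0, 1]

for p in np.linspace(0.01, 0.99, 9):
    print(f"p = {p:.2f}  Jeffreys density = {jeffreys.pdf(p):5.2f}  "
          f"uniform density = {uniform.pdf(p):5.2f}")

# The Jeffreys density is largest near 0 and 1 and smallest at 1/2,
# while its mean is still 1/2 by symmetry.
```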
This relates to Max's comment about Laplace's Rule of Succession: taking N_v = 2, M_v = 1 corresponds to the uniform distribution on [0,1] (which is just beta(1,1)). This is the maximum entropy distribution on [0,1]. But as Max mentioned, we can vary N_v and M_v. Using the Jeffreys prior would be like setting N_v = 1 and M_v = 1/2, which doesn't have as nice an interpretation (half a success?) but has nice theoretical features. It's especially useful if you want to put the density near 0 and 1 but still have mean 1/2.
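To make the pseudocount reading concrete, here's a short sketch with made-up data; `posterior_mean` is just a helper name for the Beta-Binomial posterior mean:

```python
def posterior_mean(successes, trials, prior_successes, prior_trials):
    """Posterior mean in pseudocount form: the prior contributes
    `prior_successes` successes out of `prior_trials` virtual trials."""
    return (successes + prior_successes) / (trials + prior_trials)

s, n = 3, 10  # made-up data: 3 successes in 10 trials

# Uniform prior (Laplace's rule): N_v = 2 virtual trials, M_v = 1 virtual success.
print(posterior_mean(s, n, 1, 2))      # (3 + 1) / (10 + 2) = 1/3

# Jeffreys prior: N_v = 1 virtual trial, M_v = 1/2 virtual success.
print(posterior_mean(s, n, 0.5, 1))    # (3 + 0.5) / (10 + 1) ~= 0.318
```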
There's a bit more discussion of Laplace's Rule of Succession and the Jeffreys prior in an EA context in Toby Ord's comment in response to Will MacAskill's "Are we living at the most influential time in history?"
Finally, a bit of a cop-out, but I think worth mentioning: one of the answers to the stats.stackexchange question linked above suggests imprecise credences. Select a range of priors and see how much the resulting posteriors converge. You might find that the choice of prior doesn't matter much, and when it does matter, I expect this could be useful for identifying your largest uncertainties.
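A minimal sketch of that sensitivity check, assuming a simple Beta-Binomial setup with made-up data and an arbitrary spread of priors:

```python
from scipy.stats import beta

s, n = 7, 40  # hypothetical data: 7 "yes" resolutions out of 40 questions

# A spread of beta priors: Jeffreys, uniform, and two mildly informative ones.
priors = {
    "Jeffreys beta(1/2, 1/2)": (0.5, 0.5),
    "uniform beta(1, 1)":      (1.0, 1.0),
    "beta(2, 5)":              (2.0, 5.0),
    "beta(5, 2)":              (5.0, 2.0),
}

for name, (a, b) in priors.items():
    post = beta(a + s, b + (n - s))  # conjugate update
    print(f"{name:24s} posterior mean = {post.mean():.3f}")

# If the posterior means barely move across priors, the prior choice isn't
# driving your conclusion; if they move a lot, the data are weak relative
# to the prior and that's where your biggest uncertainty lies.
```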
Hm, but if we don't know anything about the possible colours, the natural prior seems to me to be giving all colours the same probability. It seems arbitrary to group a subset of colours under the label "other" and pretend it should be treated as a hypothesis on an equal footing with the others in your given set, which are single colours.
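A tiny illustration of how the choice of partition changes the answer (the colour lists are made up):

```python
# The probability assigned to "red" depends on how we carve up the outcome space.
coarse = ["red", "blue", "other"]                       # three coarse outcomes
fine   = ["red", "blue", "green", "yellow", "purple"]   # five specific colours

p_red_coarse = 1 / len(coarse)  # indifference over the coarse partition -> 1/3
p_red_fine   = 1 / len(fine)    # indifference over the fine partition   -> 1/5

print(p_red_coarse, p_red_fine)

# Same ignorance, two different "uniform" priors: the answer depends on
# which partition we treat as the natural one.
```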
Yeah, Jeffreys prior seems to make sense here.