Hi Gregory --
There’s so much to respond to that I’m going to split this up into three parts. Hopefully this will clarify things.
(Preface) On the innate problem of the logarithmic transform
Something that’s getting persistently, and rather confusingly, lost is the fact that the log transform has inherent and enormous problems in interpretation when it is used in a linear model alongside non-transformed variables. A regular linear parameter describes an additive effect -- a log-transformed parameter describes a multiplicative effect. This means that reporting the estimate for the transformed variable right next to the estimate of an untransformed variable, without emphasizing the difference, is much worse than comparing apples and oranges. It’s more like selling apples, and *calling* them oranges. But this is how the WHR reports its results. In the 2022 WHR, the plots of “contributions” are not marked as LOG gdp; they’re just marked as gdp, and they are placed alongside linear parameters as though they are directly comparable. They are not. I consider this, alone, a “critical failure,” and it is one of the central reasons I title my paper that way.
Another way to put this is that log gdp showing up as 40% of model variation is not because that model actually thinks gdp is hugely important — it is a *numerical illusion*, based on innately incomparable quantities. And yet, “GDP” (NOT log gdp) is put at the very head of the 2023 report, as THE variable that matters. The model simply does not support this.
Of course you can do a more responsible job of presentation, but it’s unavoidable that you’ve made your job harder for yourself. In ranking priorities, the goal is to rank things of equal nature, and the log creates a single element that is of an inherently different nature.
But all of this is preamble to the question you raise. My point is, given all the trouble it can cause, there’s got to be a VERY good reason to use the log. So — is there?
On the inappropriateness of the log transform in this context
Unfortunately, your graphs are a fundamentally incomplete, and ultimately misleading, view of the system, because satisfaction and GDP are simply not the only two variables involved. I might even say that accounting for the other variables is the point of the whole exercise. But the important part is that in this particular case, they make all the difference.
The approach you suggest is misleading because 1. it doesn’t take all the relevant variables into account, and 2. it assumes a strong form for GDP based on a critically limited view of the data. But it’s easy to solve both of these problems. This can be done by fitting a Generalized Additive Model (GAM), with the full set of conditioning variables, and a non-parametric spline on the GNI term. This will give it the freedom to be log-shaped if the data supports that -- but also any other shape, if that is a better fit. So I fit this model, I take the shape formed by the non-parametric component, and I fit two lines to it: a straight one, and a logarithmic one. They look like this:
(The degree of the spline, or the extent of the waviness, can of course be controlled manually, but this is the degree chosen algorithmically by the model. To force it to curve less will of course bias things even more in the direction of the linear model, because it would be forcing the fit to be simpler.)
Because I don’t want to make definitive mathematical judgments just by looking at things, I assess the models statistically — and the linear one just fits better. R^2 = 0.92, vs. 0.85. They’re close, sure, which I think is entirely consistent with a visual assessment — but log certainly doesn’t dominate, and in fact, is numerically worse, again, like in the first model. So the log is hard to interpret — this part is unavoidable — and empirically, *it’s just numerically worse.* To me, that’s open and shut. I see no reason, whatsoever, to use it, unless you’re actually *trying* to inflate the effect of GDP by 1100%.
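A minimal sketch of this kind of comparison, in plain Python. It assumes you already have the shape estimated by the spline as two lists; the `income` and `effect` values below are hypothetical stand-ins, not the actual fitted curve, and the function names are illustrative only. It fits a straight line and a log curve to the same shape by ordinary least squares and reports R^2 for each.

```python
import math

def ols_r2(xs, ys):
    """Fit ys = a + b*xs by ordinary least squares; return (a, b, R^2)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

def compare_linear_vs_log(income, effect):
    """Return (R^2 of the straight-line fit, R^2 of the log-curve fit)."""
    _, _, r2_linear = ols_r2(income, effect)
    # The log curve is just a straight line in ln(income)
    _, _, r2_log = ols_r2([math.log(v) for v in income], effect)
    return r2_linear, r2_log

# Hypothetical stand-in for the spline's estimated shape (NOT the real data)
income = [1000 * k for k in range(1, 61)]
effect = [0.00002 * v + 0.1 * math.sin(v / 8000) for v in income]

r2_linear, r2_log = compare_linear_vs_log(income, effect)
print(f"linear R^2 = {r2_linear:.3f}, log R^2 = {r2_log:.3f}")
```

Whichever curve has the higher R^2 is the better description of the shape the data chose for itself; the toy numbers here say nothing about which one wins on the real data.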
Now — if you still look at this and, for whatever reason, absolutely refuse to accept any model without a non-linear effect — then fine, we can still make one of those, and we can also do it without making the strong, unnecessary, and practically-incomparable assumptions about the form of the relationship.
Since you’re inherently making the claim that there are fundamental differences in the dynamics of GDP in low-GDP countries vs high-GDP countries, then let’s just look at those countries separately! Looking at the raw plot of satisfaction ~ gdppc (untransformed), we can see a clear elbow at around $5,000. If your eyeball says $10k, that’s fine too, I tried both. But that’s clearly the approximate point where the effects of gdp *seem* to change — based on exactly the kind of visual analysis you’re using. To sketch the basic idea, you might imagine breaking the mass of data down into the following two approximate linear trends:
So then we fit models to each of these subsets of data, in which we now have some visible reason to believe that the sat ~ gdp/ni relationship actually *is* linear -- and you get a contribution of 1.2% of model variation for the poor side, and 5.8% for the rich side. Which average out to 3.5%, where my pooled model estimated 2.5%. Yes, the slope is steeper on the poor side -- about two and a half times as steep -- but in neither case does GDP suddenly come anywhere near to looking like the dominant force, or the only way to salvation. Which again, just shouldn’t even be a surprise, because the *apparent* 40% contribution in the WHR model -- and I really cannot stress this enough -- *is a numerical illusion.*
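To sketch the split-sample idea in plain Python: the example below cuts a toy dataset at a threshold and fits a separate straight line on each side. Everything in it is an assumption for illustration: the data are synthetic, the helper names are made up, and it is a two-variable fit, whereas the models described above include the full set of conditioning variables.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def split_slopes(gdp, sat, threshold):
    """Fit separate lines below and at-or-above `threshold`; return both slopes."""
    low = [(g, s) for g, s in zip(gdp, sat) if g < threshold]
    high = [(g, s) for g, s in zip(gdp, sat) if g >= threshold]
    low_x, low_y = zip(*low)
    high_x, high_y = zip(*high)
    return ols_slope(low_x, low_y), ols_slope(high_x, high_y)

def toy_satisfaction(g):
    """Hypothetical elbow at $5,000: steeper below it, flatter above it."""
    if g < 5000:
        return 3.0 + 0.0002 * g
    return 4.0 + 0.00008 * (g - 5000)

gdp = list(range(500, 5000, 500)) + list(range(5000, 65000, 5000))
sat = [toy_satisfaction(g) for g in gdp]

poor_slope, rich_slope = split_slopes(gdp, sat, threshold=5000)
print(f"poor-side slope is {poor_slope / rich_slope:.1f}x the rich-side slope")

# Simple average of the two contributions quoted in the text (1.2% and 5.8%)
print(f"average contribution: {(1.2 + 5.8) / 2:.1f}%")
```

Changing `threshold` to 10000 reruns the same comparison with the other elbow.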
Want three tranches? By all means. It’s the same. The log is profoundly misleading, utterly unnecessary, and quite simply not the right choice for this data.
On your assertion that my search space is “wrong”
To clear up what appears to be a surprisingly confident misconception, I do have log gdp available for the search algorithm. It does not make the final cut, because in addition to being of an intrinsically different nature than the rest of the variables, it simply does not produce a better model fit, statistically, whether you choose to believe this or not. But the other variables chosen are not influenced by my “wrong” search space, because that is not actually the search space I use. With my apologies, I’m really not sure why you think otherwise.
To be precise, the search with log gdp as a candidate still returns cities, and two variables on education, and public financing, and unemployment, and shadow government, and power for women (though labor market instead of political), and clean water, and gay and lesbian acceptance — along with lifespan, social support, freedom, and GNI, which is still selected 100% of the time, *even though log gdp is also in the model.* Though even *before* it’s removed, log gdp still comes out as less important than social support, once you actually properly specify the model.
Though I’m also not entirely sure why this is even relevant. I find a better model — period. It fits better, it predicts better, there are no serious diagnostic issues, and it doesn’t include log gdp. If you can find a better model than that, have at it — but if you can’t, I’m not sure how bias in the search process would make a difference. It is a simple constructive proof, in which the product of the search just performs better than what’s being currently presented as the international standard.
Thank you for the effort you went to in producing those graphs — and, sincerely, in pushing me to a more thorough defense of my decisions — I hope this helps you see why your approach is not a definitive analysis.
I think I captured my intended meaning fairly well with my ending comment:
"If, as the more accurate model suggests, GDP is only making up a small piece of the total, then that suggests it's far more likely that if a country were to take the same effort that would be required to triple their gdp, and put those resources instead into the other variables, they'd get a far larger return."
If you're asking because you didn't find that convincing though, I'm happy to elaborate.
I think "some experts" is fairly misleading when the experts I'm referring to have multiple econ Nobels, led an international working group on the subject in question, for a study commissioned by the president of a G7 country, and the EU is now continuing that path to construct entirely new national indicator systems based on satisfaction. I'm comfortable saying that's a pretty substantial consensus on the need for alternatives to GDP.
I think your point that a correctly-specified model would have no effect for GDP makes a good deal of sense -- or at best, that GDP could be seen as a sort of residual category for all of the consumption not accounted for by explicit variables. This is also an approach that would unambiguously favor my model over the WHR’s, which reports an enormous effect for GDP. Do you see this as just an error term in their model?
But this idea also seems to conflict pretty directly with your assertion that the model "says nothing at all about the effect of GDP per capita on life satisfaction or happiness." If I've succeeded in driving that term to almost zero, then you seem to be suggesting I've captured all the relevant effects, and the WHR hasn't come close.
If you’re saying that GDP only matters to satisfaction via consumption, then there’s still the absolutely enormous question of: consumption of *what*? GDP is about as precise as saying “economic stuff,” so it’s barely a coherent question to even ask how “GDP” affects satisfaction. At the barest minimum, I would say this model clarifies what *parts* of all of the things that are together rolled into the GDP fruitcake are actually counting towards satisfaction. And this is critical. You shouldn’t be able to build a building, burn it down, and claim you’re helping, because construction counts towards GDP and GDP causes happiness. That’s just a semantic shell game.
But even if you are trying to formulate the model as satisfaction <= consumption <= GDP … you have to deal with the fact that the biggest effect in the model is on social support! That's just not "economic!" You can look at the chosen variables, and see how much of GDP they actually account for, and see that GDP is outright missing several of the largest measured effects, by its inherent definition. So even if it’s only in the negative, or estimating an upper bound, or pushing the question towards clarifying the relationship between water and GDP, I have a pretty hard time seeing how that says “nothing at all” about the relationship between GDP and satisfaction.
The point is not that 1.5 is a large number, in terms of single variables -- it is -- the point is that 2.7x is a ridiculous number.
But 1.5 also isn't such a huge effect within the full scale of what's measured. The maximum value in the data is just over 8. Even something "huge" like 1.5, out of a total of 8, is less than twenty percent. If, as the more accurate model suggests, GDP is only making up a small piece of the total, then that suggests it's far more likely that if a country were to take the same effort that would be required to triple their gdp, and put those resources instead into the other variables, they'd get a far larger return.
Thank you! And, good point, certainly something I should at least put in an appendix. The WHR variables are explained in their statistical appendices, which I link and quote here:
"Social support (or having someone to count on in times of trouble) is the national average of the binary responses (either 0 or 1) to the GWP question 'If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?' "
" Freedom to make life choices is the national average of responses to the GWP question 'Are you satisfied or dissatisfied with your freedom to choose what you do with your life?' "
https://happiness-report.s3.amazonaws.com/2023/WHR+23_Statistical_Appendix.pdf
Re: your girlfriend's insightful comment, research has been done that demonstrates the question used for satisfaction is widely understood, and understood to be different from what we generally think of as "happiness." In fact, trying to predict satisfaction with "positive affect" (a technical name for happiness) gets you an R^2 of only 0.27. By contrast, a model with "social support" alone gets you 0.52! So if they're not even closely correlated, they're definitely not being widely interpreted as the same thing.
I get into this more in the paper, but I argue that if social support really is so central, we really need to have a variety of variables on it, like we do for health and the economy. I use this one because it's the one that's there, and the effect is huge, but I'd really prefer to be able to draw on and analyze multiple aspects of relationships.
I think the biggest danger to that reasoning is the premise that they are caused by GDP, and only by gdp, which I quite flatly dispute. At a minimum, gdp-measurable paths are only one way to achieve these components. For example, you can spend a lot of money on cleaning your water sources -- or, you can make the choice not to destroy your clean water supplies in the first place. One looks "productive" only because it failed to account for the destruction. Of course exactly the same thing can be said of carbon into the atmosphere, leaded gasoline into the brains of children, etc, etc, but I choose water because it's in the model.
Any attempt at a defense of GDP, specifically, needs to take into account the fact that it's just a deeply flawed measure of value. That's why econ nobelists have been arguing against it for over a decade (and likely much longer, given that whole international reports were being published on it in 2012). So even if it were more predictive than the model suggests, that still wouldn't address the fact it's known to be misleading, all on its own, and not something I would spend a lot of time defending on the merits.
Separately though, if you replace "gdp" with "money" (since they're also very definitely not the same thing) it sounds sort of like you're saying that if people have money, they can just buy anything else they want, thus money is the only thing that matters -- which I could respond to by getting into all of the ways that's just not accurate, such as the fact that a single person can't pay for a 1/1,000,000th fraction of a national clean water system to get clean water for themselves --
But perhaps the most definitive argument against the unique value of gdp is in simple counterexamples. Between 2005 and 2022, Costa Rica had a higher life satisfaction than the United States, with less than a third of the GDPpc. This simply wouldn't be possible, if gdp just bought you happiness. Ergo, that simply cannot be the answer.
I think the first thing to emphasize is that, even when you do include log(gdp/ni), the measured effect still isn't all that big. It says that you'll get an increase of 1.5 points of satisfaction ... if you almost triple gdp! (I.e. multiply it by 2.7, because that's just mechanically what it means when you transform a linear predictor by the natural log.) Since that's either ludicrous or impossible for many countries, there are plenty of cases where it doesn't even make sense to consider. My largest problem has nothing to do with the non-linearities of the log -- if it fit better, great! But 1) it just, simply, numerically, doesn't, and 2) the fact that you have to interpret the log in a fundamentally different way than all the non-transformed variables makes it extraordinarily misleading. You get a bigger bump on the graph -- but it's a bigger bump that means something fundamentally different than all of the other bumps (a multiplicative effect, not an additive one). Then when you include it in charts as if it doesn't mean something different, you're floating towards very nasty territory.
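The arithmetic behind that reading can be sketched in a few lines of Python. The only number taken from the discussion is the coefficient of 1.5 on the natural log of gdp; the function name and the example ratios are illustrative assumptions. With a term beta * ln(x) in the model, multiplying x by some ratio changes the prediction by beta * ln(ratio), no matter where x started.

```python
import math

def change_from_log_term(beta, ratio):
    """Predicted change in the outcome when a ln-transformed predictor is multiplied by `ratio`."""
    return beta * math.log(ratio)

beta = 1.5  # coefficient on ln(gdp), as quoted above

# The full 1.5 points only arrives when gdp is multiplied by e (about 2.718)
for ratio in (1.1, 2.0, math.e):
    gain = change_from_log_term(beta, ratio)
    print(f"gdp x {ratio:.3f} -> {gain:+.3f} points of satisfaction")

# Multiplier needed for a gain of exactly one point
print(f"ratio for +1 point: {math.exp(1 / beta):.3f}")
```

An untransformed variable with the same coefficient would instead add 1.5 points per one-unit increase, which is why the two kinds of estimate cannot be read off the same chart as if they were the same quantity.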
It is certainly the case that many of the variables are highly collinear, but there are clearly no obvious close proxies in the list. If I removed log(gdp) but introduced log(trading volume) or something, that would be suspicious -- but you can see all 14 of the variables that are actually in the model. I would have to be approximating log(gdp) with -- water and preschool? The 1,058 variables are searched over, yes -- but then 1,044 of them are rejected, and simply don't enter.
I'm sorry though, I just don't understand your last paragraph. If the true effect needs a log, then the log should account for that effect. And if the effect is properly transformed, I don't understand how a different variable would do a better job of accounting for the variance than the true variable. Happy to discuss if you can clarify though.
Thank you! To the first part of your comment, I certainly hope so -- Nobel-winning economists Amartya Sen and Joseph Stiglitz edited an international report (supported by dozens more famous researchers) all the way back in 2012 saying that GDP was fundamentally deficient, if not broken, as a national guide. Many European countries have started collecting national satisfaction data as part of a way to fix this problem, but I don't know how much it's paid attention to, and I know the US doesn't even collect this data to begin with.
I think the larger point you're making is that there might be dependencies between the discovered variables, with which I absolutely agree. In the same way I think it's dangerous to guess what the right variables are, I think it's dangerous to guess at exactly what the dependencies are, but I do think it's critical to understand these relations better. Still, we certainly can't do that if we don't even get the variables right, so I believe this is at least a first step in an important direction.
The chart you're referring to, for those who aren't looking at the paper, is not a chart of life satisfaction -- it is a chart of carbon emissions. So pretty fundamentally, I don't think the comparison is relevant. But in addition, the outcome (emissions) is also log-transformed, which makes the relationship much more intuitive -- and since both axes are clearly labeled, and contain all relevant information, I don't see any possibility of confusion.
By contrast, the use of log GDP in models of satisfaction appears to only have a potential for confusion, especially when the WHR only mentions this transformation in the technical tables, fails to mention it in their most prominent descriptions, and only transforms this single variable in that way. Furthermore, as I describe in quite substantial detail in the paper, the argument they give to justify this transformation (improved model fit) simply doesn't hold up to statistical scrutiny when you're using a properly specified model. This isn't just a marginal technical complaint either -- the transformation exaggerates the effect by 1100%
I agree that there may be non-linearities in the effect, but this is equally possible with any of the other variables. This is worth exploring, but I think it is much more important to make sure the dramatic potential improvements from simple changes are recognized before getting into more subtle, and far more slippery, models, especially when the current model isn't showing serious issues.
But most critically, the whole goal is to not just assume that we know how life satisfaction works, but rather to let the data tell us. And when the data simply doesn't support a log transform, I'm not going to include it.
Hi Erich, sorry for the delay, and thank you for the very careful response. In order:
Wrt: “… It seems to me like your model assumes that GDP does not have any casual influence on any of these variables,…”
I don’t actually make any such assumption about GDP, and in fact am completely agnostic (for now) about causal dependencies within the graph. I only make the tentative assumption that every variable listed has some causal effect, direct or indirect, on national satisfaction (ergo, it’s not *all* GDP, which is what you accurately quote me as disputing), based on 1) a thorough search being more likely to exclude spurious causes, and 2) expert knowledge. Water, Shelter, Freedom, Friends, Being Accepted — most of these seem pretty unimpeachable. Beyond that I’m actually trying to be especially cautious about proposing particular dependencies because based on my experience with causal systems of even moderate size, the pattern of influences is likely to be spectacularly complicated, and unintuitive. This has certainly been borne out by all of my early explorations with causal discovery tools.
(As an aside, I am very interested in these questions, and continuing to work on them, but my first goal is simply to start with the right set of variables. I think progress on this itself could be a huge improvement over what I currently understand to be the globally accepted standard.)
Wrt “But if you were to find empirically that GDP causes something we do care about, …”
That feels like a reasonably fair description of the arguments with which I’m familiar, but I think there are at least two important nuances. The simplest is that GDP can have not just limited utility, but also horrific externalities -- most obvious among them, global warming. It's essentially your point (2), but with the emphasis that what's left out can actually be more powerful, and worse, than what's left in. In other words, even if GDP can cause satisfaction *in the short term,* satisfaction itself actually leaves out the very important question of the future. That’s an inherent shortcoming of the model, but an important strike against the concept. I go into this more in the paper.
The other is that I see “GDP” as practically too vague to be applicable for intervention. You might estimate a causal effect for “GDP,” but that might only be because *one* of the thousand things within the concept actually makes a difference. Then when you go to intervene on a different one of the thousand things, because you identify it as part of “GDP,” you just don’t get the same effect -- essentially, because your variables weren’t precisely defined enough. So I’m happy to talk about how economic processes might play critical roles, but I don’t feel comfortable talking about “water” and “the entire economy” as if they have equivalent structural validity. At a minimum, one of them is much more vulnerable to bad accounting practices.
Wrt “But I don't think OP or anyone else in this comment section is saying that GDP/wealth/money is the only thing that influences life satisfaction,…”
I do agree with you, and agree I misread A. de Vries's position. Though, while I don’t think anyone has said explicitly that they think GDP is the *only* cause of satisfaction, there have also been almost no explicit proposals of anything that *does* cause satisfaction, *apart* from GDP -- so I may have been reading too much between the lines there, but my trying to get some distance from the concept is really driven by a confusion that it’s the only variable we’re talking about. Still, I could have expressed that more cogently.