I recently tried to figure out where DALYs come from.
After a bit of searching, the best I could find was this report on the origin of the metric (the first Global Burden of Disease assessment). The report includes this explanation:

And:

But I'm left with many other questions:
- How many health workers were consulted?
- Were people other than health workers consulted, especially people who have themselves experienced the relevant health issues?
- Were DALY values updated in successive instances of the GBD?
- Are transcripts of any of these "formal exercises" available somewhere?
Ideally, I'd love to find a document/video that covers DALYs in the style of a factory tour video; I want to know what goes into them, who is involved, and what the creation process looks like.
Does anyone know of such a resource, and/or the answers to any of my questions?
You can check out the methodology used to calculate the most recent dataset (2019). It seems quite legitimate: internationally shared data, Bayesian modeling, compliance with the Guidelines for Accurate and Transparent Health Estimates Reporting (GATHER), etc.
I wonder if any methods/assumptions/biases were carried over from the earlier study that you shared. The main bias can be one of omission, since health can be a relatively insignificant influence on one's wellbeing. For example, I found (Categorized tab, q4) that only 1 of 30 slum residents wanted Health to change the most, but 8 of 28 wanted to live 0 additional years (q16). So, people can be healthy (have high QALY) but suffer (low WALY). The dataset can be accurate and still miss this.
This focus bias can be due to the priority perceptions of the researchers in 1996 (who may have valued health, perhaps since subjective wellbeing improvements were not as readily possible?) in combination with the experimenter bias of the context experts (e.g. due to authority dynamics in these contexts).
Yes, for the YLL estimates they combined different datasets to find accurate causes of death disaggregated by age, sex, location, and year. There should be little bias since the data is objective and 'cleaned' using relevant expert knowledge. The authors:
- Used vital registration (VR)[1] data and combined it with other sources where it was incomplete (2.2.1, p. 22 of the PDF)[2]
- Disaggregated the data by "age, sex, location, year GBD cause" (p. 32 of the PDF) and made various adjustments for mis-diagnoses and mis-classifications, noise, non-representative data, ...
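For anyone who wants to see the arithmetic behind the YLL part: as I understand it, years of life lost are deaths multiplied by a standard remaining life expectancy at the age of death, summed over the disaggregated groups, and a DALY total is YLL plus years lived with disability (YLD). Below is a minimal Python sketch of the YLL step only. Every number, the age groups, the location label, and the function name `yll_by_cause` are made up for illustration; none of them come from the GBD data or code.

```python
from collections import defaultdict

# Hypothetical standard remaining life expectancy (years) by age group at death.
# These values are invented for illustration, not taken from any GBD life table.
STANDARD_LIFE_EXPECTANCY = {"0-4": 85.0, "40-44": 45.0, "80-84": 9.0}

# Hypothetical death counts, disaggregated by age, sex, location, year, and cause.
deaths = [
    {"age": "0-4", "sex": "F", "location": "A", "year": 2019, "cause": "malaria", "count": 10},
    {"age": "40-44", "sex": "M", "location": "A", "year": 2019, "cause": "malaria", "count": 4},
    {"age": "80-84", "sex": "F", "location": "A", "year": 2019, "cause": "stroke", "count": 20},
]


def yll_by_cause(records, life_expectancy):
    """Sum deaths x remaining life expectancy at age of death, per cause."""
    totals = defaultdict(float)
    for record in records:
        years_lost_per_death = life_expectancy[record["age"]]
        totals[record["cause"]] += record["count"] * years_lost_per_death
    return dict(totals)


yll = yll_by_cause(deaths, STANDARD_LIFE_EXPECTANCY)
print(yll)
```

The sketch skips everything that makes the real estimates hard (completeness corrections, redistribution of mis-classified deaths, noise reduction), so treat it as a picture of the final multiplication rather than of the pipeline.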