How to do theoretical research, a personal perspective

Mark Xu

Comments 7

Coming from an undergrad background in social science/liberal arts, I was honestly kind of let down by this discussion of "theory." At least for me, when people are comparing between "empirical" vs. "theoretical" research, one of the key features of theoretical research is that one doesn't get to just "[falsify] their hypotheses by testing them against data" or look for simple counterexamples to their "algorithm."

For example, if I'm trying to determine something like "did the creation/usage of machine guns asymmetrically benefit insurgent forces against counterinsurgents" (my thesis) I can't just go out, grab some data, throw it at the hypothesis, then get a reliable result: there just isn't sufficient granular, reliable, relevant data. (I literally had an entire ~3-page subsection in my thesis on why even a single great case pair seemed inherently unlikely to exist in the recorded history of warfare.)

Thus, most of the "testing" in theoretical research has to be done at the theoretical level, which in my view is distinguished by its reliance on inference (rather than more-deductive arguments) and non-mechanical calculations (i.e., in your head or on paper with current methods, which I think ought to change!), including argument clash, oversimplified simulations, etc. These differences often matter because they involve a lower degree of legibility^[1], transparency, reproducibility/reliability, reputation stake^[2], etc., compared to traditional empirical methods like statistical hypothesis testing/randomized controlled trials, engineering prototypes, high-fidelity simulations, direct observation tests (e.g., spectrometry), etc.

Ultimately, I don't think this post tells people how to do that theoretical research; in my perhaps-overly-cynical evaluation, it basically just seems to say "think about the problem until you can figure out how to test it with traditional empirical methods." (A lot of this post literally seems to be about engineering and mathematics problems!)

But perhaps I'm just misunderstanding something; could you elaborate on how you distinguish theoretical vs. empirical research?

^[1] i.e., ease of understanding arguments, including what is or isn't being assumed.

^[2] For example, it might be harder to detect that a writer is strawmanning their intellectual opponents or such mischaracterizations might even be welcomed by some readers, whereas fraudulent data in STEM is egregious. (Part of the issue is that people might have more plausible deniability for strawmanning: it's harder to determine that they were acting in bad faith, and/or it might even be hard to determine that they were in fact misrepresenting others' views.)

JulianR

I understood "theoretical" to mean that your 'data' comes from thought experiments, e.g., "How would I, as a counterinsurgent general, use machine guns in creative ways against insurgents?"

I'm a bit confused by your distinction: the question "Did (as opposed to Would) machine guns asymmetrically benefit insurgents?" seems entirely empirical to me. If you can't find reliable data, that just makes it hard, not theoretical.

> it basically just seems to say "think about the problem until you can figure out how to test it with traditional empirical methods."

Well, yeah, what else would you expect? The post describes how you might use argument clashes and oversimplified simulations in thinking about the problem.

Marcel2

> I'm a bit confused by your distinction: the question "Did […]
>
> If you can't find reliable data, that just makes it hard, not theoretical.

The use of “did” vs “would” wasn’t very intentional or precise.

As to the empirical vs. theoretical nature of my hypothesis, it is indeed claiming that certain relationships empirically existed (and, with a lot of caveats, may continue into the future). However, my point was that the research methods I used were much more “theoretical”: I couldn’t do a large-N empirical analysis or controlled experiments to even establish meaningfully-controlled correlation (let alone causation) between the dependent, control, and independent variables, and instead had to rely on lines of reasoning such as:

Hypothetical scenarios (e.g., imagine comparing an ambush where both parties have machine guns vs. one where neither side has machine guns), which are impractical to test clinically/experimentally (i.e., with high fidelity to reality);
More-qualitative (and somewhat subjective) comparison of case studies, using a large amount of argumentation/theoretical reasoning to deal with the many gaps and flaws in the case comparison (given that, as I noted, there didn’t seem to be any good case comparison pair in the historical record);
Agreement with existing theoretical and/or empirical concepts in the literature, such as Biddle’s Modern System.

> > it basically just seems to say "think about the problem until you can figure out how to test it with traditional empirical methods."
>
> Well, yeah, what else would you expect? The post describes how you might use argument clashes and oversimplified simulations in thinking about the problem.

Again, perhaps I was being a bit too imprecise with my language? My point is that for some questions (arguably including my thesis), theoretical argumentation has to bear a lot of the analytical burden. This analytical burden can include things like:

Explaining why variables Q, K, and W—none of which you could experimentally control for—probably do or don’t affect the relationship;
Explaining why your very limited sample size can probably be extrapolated to some other cases;
Explaining why some metric is probably a decent proxy for what you actually are trying to measure;
Reasoning about hypothetical scenarios which will not actually empirically occur.

(Caveat: all of those activities can be supported by direct reference to supporting data in some situations, but not always.)

In contrast, it seems that many of the “theoretical” research methods described in this post are basically just “use lots of thinking to figure out how to test this empirically against data [at which point these empirical methods do almost all the legwork].”

There is perhaps some debate to be had over the meaning of “theoretical” research methods: do mathematical proofs or algorithms count as theory? While I’m not universally opposed to using the term in such a context, I think it is much less helpful to use the term “theory” when you’re trying to juxtapose it with empirical methods. This feels especially true if a major reason you support a mathematical proof or algorithm is your determination that “this empirically works every single time.” When teaching research methods, I think it’s important to emphasize the differences that I described previously (e.g., legibility/transparency, reliability/consistency, reputation stake), which, in my view, have tended to make empirical methods so much more effective when they can be used.

Michael_PJ

I think this post applies mainly to mathematical theoretical research; I'm not sure it generalizes well to other kinds of theory work.

Marcel2

If that was the intention, I think the title and content should have more clearly expressed that. I’m unclear on what the significant difference is between theory and empirics in math; I think most of the value in distinguishing between theoretical and empirical research comes from highlighting the inability to simply “use data or counter-examples to falsify/test a hypothesis”—but in this case, that doesn’t apply.

Ben_West🔸

Thanks, this was helpful. I often hear from people who don't really know what it means to do certain types of theoretical research, and I think this will go a decent way towards addressing their uncertainty.

LuisEUrtubey

Great selection of sources on the work of theorizing in the social sciences, if of interest:

https://twitter.com/LarsJohannessen/status/1559221205441323013

https://twitter.com/LarsJohannessen/status/1560169368146903042
