
Hi! I'm Kat Steiner - a few of you may have met me at EA Global London recently. I'm a librarian at the University of Oxford, so I spend a lot of time working with people on how to find literature (books, journal articles, reports) in their chosen area, how to organise it, and how to reference it correctly. After the conference, I realised that these are useful skills for the EA community to have as well, and I'm more than willing to teach them! I usually take several hours in a practical class doing this so this is my attempt to distil just the fundamentals into a relatively readable blog post.  

Disclaimer: I am not an academic, and EA covers a broad range of topics. I'm sure many of you will have favourite sources which I will inevitably fail to mention. None of this is going to be a comprehensive list! So please share any databases and websites that you couldn't do without in the comments for everyone to see, and I hope that some of the sources I do include are new to you and worth checking out.

Second disclaimer: I am based in Oxford, so I am more familiar with the databases that Oxford subscribes to. If you are affiliated with another university, it is worth seeing if your library has guides (often called LibGuides) on the databases you have access to.

 

The TL;DR:

1) What are you going to search for?

This involves breaking your question into concepts, thinking about the relative importance of each one, and coming up with synonyms, related concepts, broader and narrower terms, alternate spellings, that sort of thing. It feels like unnecessary work but it will save you time (and stop you missing important content) in the long run.

2) Where are you going to search for it?

Think about how much time you have - it's usually worth using at least 2 subject-specific databases, such as arXiv, PubMed, Web of Science or Scopus.

You can also leverage Google's search algorithm to search within domains like .gov or .gov.uk, for useful PDF reports.

3) How are you going to search for it?

How does each website work? Is there an advanced search you can use for the words you came up with in 1? Can you filter by useful things like date or language? Do articles come with keywords, tags, or a thesaurus of useful terms?

4) Managing your PDFs, citations and referencing

Reference management software can save you a lot of time if you're writing long-form academic work or for publication. It will do all the pesky formatting of your references to a particular style, keep track of what you cite in your work as you go along, and even help you manage your PDFs. But it won't read the stuff for you, and you will have to do some data cleanup.

5) How libraries and librarians can help you!

Even if you're not part of a university, a local academic library may still be of use. You can often pay a small fee to be able to go there and use their electronic subscriptions (for non-commercial use), which means better databases and access to a lot more journal articles.

Librarians can also help, either in person, or by writing LibGuides - try searching Google for some of those on your area of interest.

6) An attempt at a not-at-all comprehensive list of sources of literature

What it says on the tin.

Ok, let's get down to the details.

 

What are you going to search for?

We're going to take a concrete example. You have a vague idea what you want to know about, which is 'the effectiveness of deworming interventions in Africa'. You might go to PubMed Central to search for that as it's medicine-related, but if you just put it in the simple search you get 459 results, way too many to read! And they might not even contain all of the relevant literature - what if the article was about Kenya but didn't mention Africa?

It's important to think about your terms - what synonyms might there be? What broader or narrower terms could you use? Do you get too many results or too few? How important are your concepts - do you really want articles that mention deworming in the full text but not the abstract, or the title? Are there alternative spellings for some of the concepts? Globalisation vs. globalization, labour vs labor are two common ones. Also acronyms are important - consider spelling them out as well as using the abbreviation (QALY, DALY, RCT).

Mind-mapping is great for this sort of work. For the example above, I came up with:

I've split the question into 3 concepts and tried to think of related terms for each one. Then I've grouped them together and thought about how I would logically search for them with OR and AND (these are called Boolean operators). Some databases also allow NOT, but you have to be careful in using it because you can lose relevant papers just because they mention an irrelevant word in passing.
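If you'd rather script this step than draw it, here is a minimal sketch of the same grouping idea: synonyms within a concept are joined with OR, and the concepts are joined with AND. The function names are hypothetical, the terms are the ones used in the PubMed example later in this post, and the bracket-and-quote syntax is an assumption - check each database's help page, as not all of them accept exactly this form.

```python
def or_group(terms):
    """Join the synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts):
    """Link one OR-group per concept with AND."""
    return " AND ".join(or_group(terms) for terms in concepts.values())


# Three concepts, each with its related terms (taken from this post's example).
concepts = {
    "deworming": ["deworm*", "de-worm*", "intestinal worm*", "soil-transmitted helminth*"],
    "place": ["Africa", "African", "Africans"],
    "effectiveness": ["effective", "effectiveness", "cost-effective*", "cost effective*"],
}

query = build_query(concepts)
print(query)
```

Keeping the terms in a structure like this makes it easy to add a synonym later and regenerate the whole search string, rather than editing a long line by hand.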

 

Where are you going to search for it?

We're all pretty used to searching Google and perhaps Google Scholar - you plug in a few words and you get millions of results. You scroll down the first half a page, click a few links, and you're away. Fantastic. But what about all the stuff you're missing because it's on the 5th page, or the 50th?

Google will always try to give you as many results as it can, in what it considers the most 'relevant' order. But when you're searching for dry academic literature, this can work against you, and you run the risk of not finding important things, as well as wasting time reading poorly-researched stuff.

Google Scholar is a little different - it's trying to find scholarly articles and citations which match your search. But it will still include a lot of stuff that's completely unrelated to your topic if you're not careful, and it doesn't vet its contents, nor does it contain everything ever written. And if you want to search really specifically, its attempts to be clever can work against you.

Subject-specific databases are very different from Google in that they don't do as much of the searching work for you. They won't search for synonyms or related terms, so you need to think about those first - luckily we did that in the previous section. But they are more likely to give you actually relevant results instead of a lot of noise. They are also curated by humans (or in the case of Semantic Scholar, machine-learning techniques), which means that someone has tried to work out what an article is all about, even if it doesn't give many clues in the title. Some databases tag articles with subjects or keywords drawn from a controlled vocabulary to help you, and some like Web of Science will tell you where an article has been cited later (although never comprehensively).

See the end of this article for a list of some databases you might want to try searching in, as well as good sources of data and reports.

 

How are you going to search for it?

Now we need to think of a way to search for these concepts. It's generally best to look for an advanced search option. That will tell you what search techniques are available - sometimes you can narrow down by date, language, and so on. Often they look a lot like this:

So I might build my search up in PubMed with: 

deworm* OR de-worm* OR "intestinal worm*" OR "soil-transmitted helminth*"

I'm using * to say that I don't care what comes after the start of the word - this picks up things like deworming or worms.

I'm also using the " marks to say I want to find a whole phrase and not the separate words.

And I'm using OR in capitals to say I want to match at least one of these terms because they're synonyms (sort of).
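To make the * idea concrete, here is a small illustrative sketch that turns a truncated term into a regular expression and checks which words it would pick up. This is only a way of seeing the behaviour - it is an assumption that a given database treats truncation like this, and `truncation_to_regex` is a hypothetical helper, not part of any database.

```python
import re


def truncation_to_regex(term):
    """Turn a search term with an optional trailing * into a case-insensitive regex."""
    if term.endswith("*"):
        # Match the stem at the start of a word, followed by any word characters.
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        # No truncation: match the whole word only.
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.compile(pattern, re.IGNORECASE)


matcher = truncation_to_regex("deworm*")
words = ["deworming", "dewormed", "Deworm", "worms", "earthworm"]
matches = [w for w in words if matcher.search(w)]
print(matches)
```

Note that the stem has to come at the start of the word, so in this sketch worm* would pick up worms but not earthworm.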

I'd probably also choose to search for these in the title, because I do actually want my article to be about deworming. I do that with the drop-down menu.

Then on the next line, I add my next concept, Africa. I don't want to lose things that don't mention Africa in the title or abstract if they mention other countries, so I'll search all fields for 

Africa OR African OR Africans

I could use the * again but PubMed gives me an error - it found too many options for words starting Africa, so I pick the ones I care the most about.

I type this into the second line down so the concepts are linked by AND, because I want to find something to do with deworming AND something to do with Africa.

Then I move on to my third concept. I might search in the abstract for

effective OR effectiveness OR "cost-effective*" OR "cost effective*" OR "cost-benefi*"

Having done that search I find 25 results. Much more manageable! It's worth doing a few more searches with different combinations of title, abstract, full text, just to see if there are some I'm missing, but that's a really good start.
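If you want to be systematic about trying different field combinations, a sketch like the following lists every way of assigning a field to each concept, so you can tick them off as you go. The field labels are just plain descriptions, not any database's actual field codes, and you would still run each search by hand.

```python
from itertools import product

fields = ["title", "abstract", "all fields"]
concepts = ["deworming", "Africa", "effectiveness"]

# Every assignment of one field to each concept: 3 fields x 3 concepts = 27 plans.
search_plans = [
    dict(zip(concepts, combo)) for combo in product(fields, repeat=len(concepts))
]

for plan in search_plans[:3]:
    print(plan)
print(len(search_plans), "combinations in total")
```

In practice you would only try the handful that make sense for your question - the search described above corresponds to deworming in the title, Africa in all fields, and effectiveness in the abstract.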

Obviously, PubMed isn't going to have all the articles ever on deworming, so you might want to try a different database like Web of Science (if you have it through your university) which covers more of the social sciences as well - doing that quickly gave me 50 results so you do get different things.

Other tips and tricks

Some databases have a thesaurus and a set of keywords for each article - these can be assigned manually by experts or generated by machine learning, as Semantic Scholar does. These are great if you've missed an important synonym or bit of jargon from the field. You're unlikely to find a perfect thesaurus term for each of your concepts, but you can use a combination of your own terms and those from a thesaurus to good effect.

Web of Science does lots of fancy work on citations - you can see who has cited a paper later. Google Scholar also does this for free so it's useful to check that out if you think you've found a seminal paper. To trace citations backwards, look at the list of references as you're reading and try and find them.

Being clever when searching Google

Even Google has an advanced search option! After you've run a search it's under 'Settings'. You can use quote marks " to search for particular phrases or filter by region, language, date.

You can also use it to search a particular domain - say you want to know what the UK government has written about deworming - you can put .gov.uk in the 'site or domain' box (only one domain at a time, sadly). You can dictate that your search terms appear in the title of the page, or the url, not just somewhere in the text. You can also restrict the file type, so if you're looking for reports, PDFs would be a good bet.
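The advanced search form is really just filling in search operators for you, and you can type them straight into the search box instead. Here is a minimal sketch that assembles such a query using the widely used `site:`, `filetype:` and `intitle:` operators. The `google_query` helper is hypothetical, and the URL pattern is an assumption based on how Google search links usually look.

```python
from urllib.parse import urlencode


def google_query(terms, site=None, filetype=None, in_title=None):
    """Assemble a search string from plain terms plus optional operators."""
    parts = list(terms)
    if in_title:
        parts.append(f"intitle:{in_title}")  # word must appear in the page title
    if site:
        parts.append(f"site:{site}")  # restrict to one site or domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to one file type
    return " ".join(parts)


# UK government PDFs that mention deworming.
q = google_query(["deworming"], site=".gov.uk", filetype="pdf")
url = "https://www.google.com/search?" + urlencode({"q": q})
print(q)
print(url)
```

Because the form only takes one domain at a time, a small loop over several domains is one way round that limit: generate one query per domain and run them in turn.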

Here is a video of me demoing some of these techniques:

 

 

Reference management

Here is a teaser of what reference management software can help you achieve:

If you are thinking of writing content for publication, or really any long-form academic work, you should be thinking about how you're going to keep track of and reference what you read. It's easy to lose track of where ideas came from and you don't want to be accused of plagiarism down the line, or waste lots of time having to search for things you read months ago all over again.

Reference management software does some of this work for you. The two main free options are Mendeley and Zotero, and the two main paid ones are EndNote and RefWorks. If you use LaTeX to typeset then you may be familiar with BibTeX - you can also use the other types of software even if you are using LaTeX.

Most of the choice between them is personal preference and whether or not you want to pay (or you have institutional access). They all do broadly the same thing, so I will keep things fairly generic, but with a Mendeley flavour, as that's what I know most about.

You can use browser plugins like the Mendeley Web Importer to add details of what you're reading in your browser (webpages, online news sites, PDFs of journal articles) automatically to your library. 

You can use your software as a way to organise your PDFs, either by saving them all to one folder and getting the software to add details of anything in there into your library, or by having the software automatically rename your PDFs using a particular schema.

You can use plugins for Word, Pages, etc. to help you correctly reference - find the item you want to cite in your library and it will automatically insert the reference in your text and in the Reference List at the end. If you need to correct anything about the citation (page numbers, year, authors), you can refresh your document and it will automatically make the corrections!

Some versions of this software will allow you to share libraries and documents (with limited cloud storage capacity usually) with other people so this can be helpful when you are writing collaboratively.

Of particular relevance to EA: you may be working on something really interdisciplinary and want to submit it to several different journals. These journals will all have their own ways of referencing, and these vary widely between disciplines. For example, philosophy uses footnotes and endnotes, while social sciences mostly use in-line citations - this is where you would say something like "Librarianship is an interesting degree to study (Jenkins, 2014)," and then both have a reference list at the end. And the references will have slightly different formatting - italics, where to put the full stops, how many authors to include if there are loads. You don't want to be doing all of that by hand if each journal has different requirements - instead, your reference management software will do it all with one click just by selecting a different citation style.
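To show why this saves so much effort, here is a toy sketch of the underlying idea: the details of a reference are stored once, and each citation style is just a different way of printing them. Everything here is hypothetical - the two "styles" are heavily simplified and are not real journal styles, the record is a made-up placeholder built around the Jenkins example above, and real reference managers handle far more cases.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    author_surname: str
    author_initials: str
    year: int
    title: str
    journal: str


def author_date(ref):
    """Simplified in-line style: (Surname, Year) in the text, plus a list entry."""
    in_text = f"({ref.author_surname}, {ref.year})"
    entry = (
        f"{ref.author_surname}, {ref.author_initials} ({ref.year}). "
        f"{ref.title}. {ref.journal}."
    )
    return in_text, entry


def footnote(ref, number):
    """Simplified footnote style: a numbered marker in the text, plus a note."""
    marker = f"[{number}]"
    entry = (
        f'{number}. {ref.author_initials} {ref.author_surname}, '
        f'"{ref.title}," {ref.journal} ({ref.year}).'
    )
    return marker, entry


# Placeholder record - not a real publication.
ref = Reference("Jenkins", "A.", 2014, "A placeholder title", "Placeholder Journal")

print(author_date(ref))
print(footnote(ref, 1))
```

Switching journals then means switching which function gets called, not retyping every reference - which is what selecting a different citation style does in the software.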

Cautionary note: This software isn't going to solve all your problems. You do still have to do a fair amount of manual cleanup on your library of citations - if you put rubbish in, you'll get rubbish out in your reference list. Sometimes browser plugins can't pick stuff up from a secured PDF and you have to type the details of the journal article in yourself. But overall, I think anyone planning to write anything for publication should definitely consider getting to grips with Zotero or Mendeley - it will save you time in the long run!

 

Where can you go for more help

If you're part of a university, your librarian! I do one-to-one tutorials on these things all the time, and big classes for all our new graduate students. Your library almost certainly does too.

The internet! There are loads of resources written by librarians called LibGuides which are available free to read. Oxford's one on reference management software is here: http://libguides.bodleian.ox.ac.uk/reference-management. These are also fantastic sources of places to search - here are some on EA-related areas:

If you're not affiliated with a university, but you live in a big university town, you should check what membership options there are for independent researchers. For example, in Oxford, it's pretty cheap to get a reader card for the Bodleian Libraries, and then you can go in and access all of their online resources (except a few legal databases). More information is here: http://www.bodleian.ox.ac.uk/using/getting-a-readers-card

This is something for non-profits to consider as well: it may be worth factoring in a regular bit of money and time for someone to go and sit in a library and run some literature searches - you'll often get better results with a subject-specific database and you can download PDFs of the articles to read later using the library's paid subscriptions. (Be aware that this usually isn't allowed if you're a commercial business, but otherwise it's fine as long as you're not obviously trying to download the entirety of JSTOR in one go).

The Bodleian even offers a scanning service where if they only have a print copy of an older item, you can have a scan of a chapter or article within 24 hours for £2 if you have a reader card. So you wouldn't even need to send someone to scan it themselves.

 

An attempt at a not-at-all comprehensive list of sources of literature

[Some of these are completely free, some have some free and some paid-for content, and some are subscription-only. See above for my suggestions on what to do if you don't have a subscription. Some will provide citations but not necessarily full text - these are known as 'bibliographic databases'. The ones in bold are those that I think are the largest - if you are tight on time I would pick a couple of these from your subject area and search them.]

arXiv.org (free) - database of pre-prints (versions of articles before a publisher formatted them) from science, mathematics, computer science, economics, and engineering disciplines

SSRN (free) - the Social Science Research Network - like arXiv but for the social sciences more broadly

PubMed Central (free) - a massive repository of medical science literature

Semantic Scholar (free) - a search engine which allows you to search across various free repositories including arXiv.org and PubMed Central. It uses machine learning to classify papers, giving it some of the advantages of a subject-specific database, although without an advanced search option

RePEc (free) - Research Papers in Economics - a volunteer-run repository for economics pre-prints and papers

NBER (free) - National Bureau of Economic Research - they produce lots of US reports on economics

PhilPapers (free) - a bibliographic database of philosophy papers (not necessarily the full text)

The Existential Risk Research Assessment (free) - a new (and incomplete) bibliography of papers on existential risk, being put together by the Centre for the Study of Existential Risk and crowd-sourced by people like you!

UK Data Service (free) - access to major UK government-sponsored surveys and economic data

World Bank Data Catalog (free) - access to the World Bank's global development data

ICPSR (free) - a huge data archive of social science research data

EThOS (free) - the best resource for UK dissertations and theses. Not all will be available online.

ORA (free) - Oxford's own repository of pre-print papers - many will not be available until after an embargo period of 6 months - 2 years
Many other institutions have their own repositories - if you are looking for a particular paper by an academic, you can try looking there for a copy

JSTOR (subscription) - a huge collection of digitised journal articles covering all subjects

Scopus (subscription) - a major interdisciplinary bibliographic database

Web of Science (subscription) - a major interdisciplinary bibliographic database, including collections like the Social Sciences Citation Index.

Philosopher's Index (subscription) - one of the biggest bibliographic databases of philosophy

EconLit (subscription) - indexes over 120 years of economics literature from around the world

OECD iLibrary (subscription) - the online library of the Organisation for Economic Co-operation and Development, including data, reports, articles and books

ACM Digital Library (subscription) - journal articles and conference proceedings from the Association for Computing Machinery

MathSciNet (subscription) - a bibliographic database for the mathematical sciences

PsycINFO (subscription) - a large bibliographic database for psychology

Comments (9)


This looks great! Looking forward to doing a more detailed read when I have more time, but I already see some resources and techniques I wasn't aware of or have failed to fully implement thus far, so this will serve as added motivation and a nice reference.

I find that the archive of Data Is Plural is a great source for data on a wide variety of topics: http://bit.ly/2h3bNzQ

Thanks! I can't recommend Sci-Hub or I might have my librarianship license revoked! But that archive looks really interesting.

People reading this post: If you find work that seems relevant for evaluation by The Unjournal, please suggest it here.

Atm we are focusing on global priorities work that economists and business/social science quants could evaluate and that academics would also appreciate (aiming for rigor)… but we may pivot as the project evolves.

See this early question. (Note: the bounty prize is not active atm but if we put one back in we will try to make it retroactive.)

Thanks for writing this. I agree that reference management is really useful for paper-writing, and I have come across a bunch of these resources repeatedly. I get the impression people vary a bunch in how much they use subject-specific databases and the structured queries. I usually get by pretty well with Google Scholar. I don't encounter too much noise with the machine learning and biology work that I tend to read, although I can imagine they would be super useful if I was publishing a literature review.

The video at the start is a cool blog post structure. I wonder if anyone else will try it...

Thanks! I really didn't want it to be boring and dry, and I'm not on here a lot so I thought having a face to put to the blog would help.

How thorough you need to be absolutely depends on what you're working on - obviously if you're writing a literature review for publication you need to do a bit more due diligence than if you're just looking for the next thing to read. I would recommend Semantic Scholar as a more finely-tuned alternative to Google Scholar while still having a lot of free content.

"I would recommend Semantic Scholar as a more finely-tuned alternative to Google Scholar while still having a lot of free content" - any specific ways in which it works better?

I haven't used it in anger yet, but I think Semantic Scholar only searches databases that give you free access to the PDFs - so if you want to know you'll actually be able to click through and read the article, that's an advantage over Google Scholar, which will bring citations which are paywalled or unavailable online as results.

I believe it also only searches (fairly) respectable databases like arXiv and PubMed Central, so you are less likely to get poor-quality results.

Thank you for writing this! The images under 'What are you going to search for?' are not loading.

Thanks for flagging this up - I think I've fixed that now.
