
A post covering the topic and content of the author's current job as a data scientist at the Centers for Disease Control and Prevention (CDC).

Context: As per You should write about your job and Writing about your job is (still) great — consider doing it, forum-users have offered information on their careers, which are typically highly relevant to EA goals, alongside their thoughts on the work they've done through the career; in a similar fashion, I am providing information on my current job doing data science in epidemiology, as I think sharing my experiences in this regard may benefit the EA community. 

Readership: This post might be particularly valuable to the following people (NB: this is by no means an attempt at an exhaustive list): graduate or undergraduate students in STEM fields or people with STEM degrees (particularly Biology, Computer Science, Statistics, or Physics), other data scientists, those who are interested in epidemiology, and those who are interested in or work in biosecurity or pandemic preparedness.

Disclaimer: Some of the information I provide is deliberately vague, so that I can avoid being identified (I highly value online anonymity). Forgive me for this—if there is something more you would like to know, my DMs are open (I usually respond within 48 hours). 


Since I am discussing life as a Data Scientist, I cannot help but cite a quote I read not too long ago in Writing about my job: Data Scientist by Gavin (I would appreciate it if someone could provide additional information about the quote):

Data Scientist: Person who is worse at statistics than any statistician & worse at software engineering than any software engineer.

—Will Cukierski


Application Process

What was the application process like? How long did it take from application to start?

Historically, I have found Biosecurity and AI Safety to both be highly important cause-areas. Using 80K Hours in early-to-mid 2022, I found a listing for Data Scientist and other positions at the CDC. Given my experience with statistical modeling, the positive expected value of this position in biosecurity, and my life-needs, I opted to apply to the Data Scientist position. Following the link, I ended up on USAJOBS, where some additional searching produced another related position. The positions were junior (GS-11) and senior (GS-13) data scientist roles at a newish Center within the CDC.

In late May 2022, I sent in my application, which consisted of my CV and some other personal information (there was not a test task at this stage). There was a delay with one of my applications: the application for the senior role was received in mid-July 2022. It was not until early October that I received a notice indicating that I was referred for the junior but not the senior role, which I believe is reasonable, given my personal background (see next section). After this notice, I had to wait until January for further hiring activity: I received an email from someone in the Center who asked when I was next available for an interview. 

In January, I had the interview, which lasted roughly 45 minutes, if I recall correctly. It did not involve any concrete problem solving, but it did include some descriptive questions[1]. A week or so later, I was given a coding test task, which meant that I had passed the interview and moved into the next stage. I had one week to complete the task. It was flexible (I could choose which programming language I used) and maybe of medium-LeetCode difficulty overall (the task had sections of varying difficulty). I was notified via email that I had passed the test task and also received more information on the offer. I then accepted the offer.

Timeline:

  • [0 months post-apply] May 2022: Apply
  • [5 months post-apply] October 2022: Referred
  • [8 months post-apply] January 2023: Interview & Test Task
  • [10 months post-apply] March 2023: Start
  • I've been working here now for 6 months
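The month offsets in the timeline above can be reproduced with a few lines of Python. This is only a sketch: the post gives months, not exact dates, so the day-of-month values are placeholders, and `months_between` is a hypothetical helper that counts whole calendar months.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Only the year and month come from the post; the day (1) is a placeholder.
APPLIED = date(2022, 5, 1)
MILESTONES = {
    "Referred": date(2022, 10, 1),
    "Interview & Test Task": date(2023, 1, 1),
    "Start": date(2023, 3, 1),
}

elapsed = {name: months_between(APPLIED, when) for name, when in MILESTONES.items()}

for name, months in elapsed.items():
    print(f"[{months} months post-apply] {name}")
```

The results match the bracketed offsets in the list (5, 8, and 10 months).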

Personal Background

What did my life look like before I began this job? How do I believe my past experiences were helpful with getting and performing the job?

The following points roughly constitute my background at the time of applying [A] and interviewing [I]. [A, I] indicates that I had experience in the category when I applied and had gained more by the time I interviewed.

  • Bachelor's degree in Math & other STEM field [A]
  • ~3.5 years of statistical modeling [A, I]
  • ~4.5 years of general coding experience [A, I]
  • One research work (DL in biology) published [A]
  • ~4 years of EA activities (local EA org.) [A]
  • ~1.5 years of forecasting experience + code [A, I]
  • 3 research assistantships (intra- and extra-my-school) during college [A]
  • 3-month part-time paid job at EA-org. [I]
  • 3-month part-time unpaid volunteering at other EA-org.

From my later conversations with those who interviewed me, I believe the strongest signals of my competence, and what made me a particularly attractive candidate in their eyes, were the publication and the forecasting experience (some of this forecasting experience was in epidemiology, but not through formal channels).

The statistical modeling and coding experience I had, along with my performance on the coding test task, were lower-level filters for making sure that I would, in expectation, perform the job adequately. I do not think I would have been referred for the position I am now in had there not been a steady background of coding and statistical modeling in my life (college played a large part in this).

I have not been able to gauge my colleagues' perceptions of EA. I think my experiences with the EA-orgs. and with EA in general did not make much of a difference (partly because the work I performed in those organizations did not involve much mathematical modeling); if they did make a difference, I would expect it to have been slightly beneficial.

I am not sure where EA-optics, on average, stand at the moment, and I do not have an accurate mental model of how academia-adjacent orgs. perceive EA, so I am not confident about how the "EA" frame that encapsulated some of my past work affected my hiring prospects.

Possibly helpful lesson: In applying to EA-orgs., really make sure the work of the org. matches what you roughly expect much of your future time to be spent on. Do not just [apply, interview, work] there solely or mostly because the org. is an EA-org. Content > Topic.


Job Content

What typically occurs on the job at the daily, weekly, and monthly intervals? 

Time & Place

I have to be on-call for the same 4 hours each work day. The other 4 hours I can fill in between 6AM and 11PM. Place = Remote. 

Meetings & Presentations

Broken down by day of the week, my time in meetings and presentations over the first 6 months of work looks roughly like this:

  • Monday: [0-2 months] ~30m, [2-4 months] ~1h, [4-6 months] ~1h
  • Tuesday: [0-2 months] ~30m, [2-4 months] ~1.5h, [4-6 months] ~1.5h
  • Wednesday: [0-2 months] ~30m, [2-4 months] ~30m, [4-6 months] ~45m
  • Thursday: [0-2 months] ~45m, [2-4 months] ~45m, [4-6 months] ~1.5h
  • Friday: [0-2 months] ~1h, [2-4 months] ~2h, [4-6 months] ~2h

Estimated weekly average hours spent in meetings and presentations:

  • [0-2 months]: 3.25 hours 
  • [2-4 months]: 5.75 hours
  • [4-6 months]: 6.75 hours
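As a quick check, the weekly averages above are just the per-day estimates summed for each period. The sketch below copies the per-day minutes from the list. The 40-hour week used for the share calculation is an assumption (5 work days of 4 on-call plus 4 flexible hours), and the names `MINUTES_PER_DAY` and `weekly_hours` are illustrative.

```python
# Per-day meeting minutes (Monday to Friday), copied from the list above.
MINUTES_PER_DAY = {
    "0-2 months": [30, 30, 30, 45, 60],
    "2-4 months": [60, 90, 30, 45, 120],
    "4-6 months": [60, 90, 45, 90, 120],
}

# Assumption: 5 work days of 8 hours each (4 on-call + 4 flexible).
WORK_WEEK_HOURS = 40


def weekly_hours(minutes_per_day):
    """Total weekly meeting time in hours."""
    return sum(minutes_per_day) / 60


totals = {period: weekly_hours(m) for period, m in MINUTES_PER_DAY.items()}

for period, hours in totals.items():
    share = hours / WORK_WEEK_HOURS
    print(f"[{period}]: {hours:.2f} hours ({share:.1%} of the assumed work week)")
```

The totals come out to 3.25, 5.75, and 6.75 hours, matching the estimates listed.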

There is a set of recurring meetings each week that rarely change in nature and are fairly important to the actual content of the job. There are also presentations on different tools, standard practices, and bureaucratic procedures, as well as presentations from external groups. Both kinds of presentation occur infrequently but are usually much longer (~1.5-2h).

Coding

The majority of my time not in meetings or presentations is spent in VSCode writing Python, Julia, Stan, or R code (ordered by time spent writing) for epidemiological models (infectious disease modeling). Most of my work to date has involved building a novel mechanistic model for influenza based on a fairly common model architecture (I do not want to be too specific here). Much of the time I spend coding involves support actions, such as learning how a particular package works, debugging, writing tests, documenting the code, etc. Most of the code I am writing does not reinvent the wheel, i.e., for the most part, I am not designing and implementing custom algorithms. The bulk of the coding skills I've picked up come from being exposed to Python and Julia packages and protocols I had not previously been familiar with.

Learning 

My job has been very good at providing opportunities to spend time learning (e.g., writing scratch-code, doing recommended textbook exercises, reading papers, using some new computational tool). Maybe, on average, 10% of my time outside meetings and presentations each week is spent learning something to improve my capabilities as an epidemiological data scientist. Although this has yet to be implemented in my Center, there are likely going to be formal learning-workplans where employees can take part-time online courses. In terms of what I've learned, I'd say that I've come to better understand the major algorithms for inference (MCMC, including HMC and NUTS) from a mathematical standpoint, dynamical systems, downstream coding practices (building production-ready tools), and epidemiological processes in general.


Wage & Benefits

I receive GS-11 pay, which is around 60k USD (unadjusted) annually in 2023. The adjustment is based on which geographic region you live in; my adjusted pay is around 80k USD. My contract automatically converts my position to GS-12 after 1 year of work. There was a sign-on bonus of roughly 8k USD.

There are numerous government health and savings benefits I receive from working at the CDC. I will not list them here, but for more information see here.


Hopefully some of you found this post helpful — regardless, have a nice day. 

Notes

  1. E.g., I was asked (paraphrasing) "How familiar are you with Bayesian Inference?"; upon saying that I was familiar, I was asked to describe or define Bayesian Inference. On the (I believe) 2 occasions where I did not have experience with the concept, I answered honestly that I did not have much experience with it; I think this honesty was appreciated by the interviewers (there were 2 interviewers).

Comments (5)



Very interesting, thank you. I was surprised to read that recruiters valued non-formal forecasting experience highly. Do you have this on your CV? If so, how do you phrase it in a way that's legible to lay-people?

I should have chosen a clearer phrase than "not through formal channels". What I meant was that much of my forecasting work and experience came about through my participation on Metaculus, which is "outside" of academia; this participation did not manifest as forecasting publications or assistantships (as would be done through a Masters or PhD program), but rather as my track record (linked in CV to Metaculus profile) and my GitHub repositories. There was also a forecasting tournament I won, which I also linked on the CV.

Thanks for sharing your experience! That seems like an unusually long interview process. Was most of it a dormant period? How long did it take to hear back after the interview and the programming assessment?

The dormant period occurred between applying and getting referred for the position, and between getting referred and receiving an email for an interview. These periods were unexpectedly long and I wish there had been more communication or at least some statement regarding how long I should expect to wait. However, once I had the interview, I only had to wait a week (if I am remembering correctly) to learn if I was to be given a test task. After completing the test task, it was around another week before I learned I had performed competently enough to be hired.   

Hello,

Thank you immensely for sharing your experience with me; it's been incredibly helpful, especially since I wasn't sure what to expect from the interviews. Also, a belated congratulations on securing the job!

Could you possibly provide a bit more detail about your 45-minute interview? Any additional insights you could share would be greatly beneficial in helping me prepare for my own.
