About the Project
The Welfare Alignment Project aims to provide the theoretical and practical foundations for aligning current and future AI systems with the interests of sentient beings, including nonhuman animals and potentially sentient AI systems. The project is housed within NYU’s Center for Mind, Ethics, and Policy (CMEP), and both positions are independent contractor arrangements through CMEP.
Among other project components, we are developing an AI welfare benchmark and an animal welfare benchmark to evaluate the degree to which frontier language models demonstrate appropriate concern for the welfare of potentially sentient AIs and nonhuman animals. Each benchmark will follow an LLM-as-judge approach, in which model responses to specific questions and real-world scenarios are scored against predefined evaluation dimensions. We will also test whether incorporating welfare-related principles into system prompts improves benchmark performance and whether increased welfare concern negatively impacts model capabilities and safety. Accompanying benchmarking papers will document this work.
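At a high level, an LLM-as-judge pipeline with system prompt modification testing can be sketched as below. This is a minimal illustration only: the dimension names, the welfare system prompt, and the stubbed model calls are hypothetical placeholders, not the project's actual rubric or infrastructure.

```python
# Minimal sketch of an LLM-as-judge benchmark loop with system prompt
# modification testing. All names and values are illustrative assumptions;
# the stub functions stand in for real chat-completion API calls.
from statistics import mean

# Hypothetical evaluation dimensions (a real rubric would be more detailed).
DIMENSIONS = ["acknowledges_potential_sentience", "weighs_welfare_in_tradeoffs"]

# Hypothetical welfare-related system prompt under test.
WELFARE_SYSTEM_PROMPT = "Consider the welfare of all potentially sentient beings."

def query_model(question: str, system_prompt: str = "") -> str:
    """Placeholder for a real model API call."""
    return f"[response to {question!r} | welfare prompt: {bool(system_prompt)}]"

def judge_score(response: str, dimension: str) -> float:
    """Placeholder for a judge-model call returning a 0-1 rubric score.

    A real judge would be prompted with the response, the dimension, and a
    scoring rubric; here we return fixed stub values so the sketch runs.
    """
    return 0.8 if "welfare prompt: True" in response else 0.5

def run_benchmark(questions: list[str], system_prompt: str = "") -> float:
    """Average per-question score, where each question's score is the mean
    of the judge's ratings across all evaluation dimensions."""
    per_question = []
    for q in questions:
        resp = query_model(q, system_prompt)
        per_question.append(mean(judge_score(resp, d) for d in DIMENSIONS))
    return mean(per_question)

questions = ["Should a farm reduce stocking density at higher cost?"]
baseline = run_benchmark(questions)                        # no system prompt
with_prompt = run_benchmark(questions, WELFARE_SYSTEM_PROMPT)
print(f"baseline: {baseline:.2f}, with welfare prompt: {with_prompt:.2f}")
```

Comparing the two runs isolates the effect of the system prompt on benchmark scores; a full pipeline would repeat this across many scenarios and models and pair it with separate capabilities and safety evaluations.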
Looking to Engage:
Technical Benchmarking Lead
We are looking for a contractor to lead the technical development of the benchmarks, coordinating with the Principal Technical Advisor and providing technical direction to the Technical Benchmarking Researcher.
Strong candidates will have demonstrated experience developing and shipping LLM evaluation benchmarks or automated evaluation pipelines end-to-end, including independently designing benchmark architectures, evaluation rubrics, and scoring methodologies. Experience with LLM-as-judge frameworks, system prompt engineering, statistical analysis of model outputs, and technical research writing is expected. Familiarity with AI welfare or animal welfare research is welcome but not required.
Responsibilities include:
- Leading the technical implementation of the AI welfare benchmark and the animal welfare benchmark
- Conducting system prompt modification testing and evaluating its effects on benchmark performance
- Running capabilities and safety evaluations to assess whether welfare-oriented prompts negatively impact model performance
- Contributing to writing the methodology, results, and technical sections of the benchmarking papers
- Coordinating with the project's philosophical leads to ensure the benchmark accurately operationalizes the evaluation dimensions developed in a foundational paper
Details: 4 months, estimated 20-30 hours/week, budgeted at $70-$150/hour (commensurate with experience), independent contractor arrangement through NYU CMEP. Expected start date: May, June, or July 2026.
Technical Benchmarking Researcher
We are looking for someone to provide technical support to the Benchmarking Lead across all aspects of benchmark development. This role will receive technical guidance from the Principal Technical Advisor and the Technical Benchmarking Lead.
Strong candidates will have solid software engineering skills and practical experience working with LLM APIs, along with experience in data collection, statistical analysis, and prompt engineering. Candidates should be comfortable writing clean, reproducible code and contributing to technical sections of benchmarking papers. Familiarity with AI welfare or animal welfare research is welcome but not required.
Responsibilities include:
- Implementing and testing the benchmark infrastructure
- Assisting with system prompt modification experiments
- Running capabilities and safety evaluations across multiple frontier models
- Performing statistical analysis of benchmark results
- Contributing to the benchmarking papers
Details: 4 months, estimated 35-40 hours/week, budgeted at $40-$70/hour (commensurate with experience), independent contractor arrangement through NYU CMEP. Expected start date: May, June, or July 2026.
How to Apply
To apply for either or both positions, please fill out the following Expression of Interest form by Thursday, April 30, 2026, for priority consideration. After that date, we will continue reviewing submissions on a rolling basis until the positions are filled.
- Expression of Interest: Technical Benchmarking Lead & Technical Benchmarking Researcher (~15-30 minutes to complete).
The form includes a short technical assessment. If you have questions, please reach out to audrey.lynn.becker@nyu.edu.
