Needless to say, which project ideas one chooses to pursue makes a big difference. That's kind of the whole premise of EA. But there doesn't seem to be a well-defined quantitative framework for evaluating what makes a good project idea. This post is my attempt at building one.
Why build a quantitative evaluation framework?
I'm curating a large dataset of ideas from across the internet and need the best ideas to rise to the top. My current approach is to have LLMs score each idea against the quantitative framework below and rank ideas by the result.
Of course, it's not just about the idea being good; it also has to be a good fit for the person trying to execute it. So, using additional metadata, the rankings can be personalised, letting people discover their best-fit ideas immediately.
The evaluations won't be perfect, but they should at least be good enough to eliminate obviously bad ideas.
Proposed evaluation framework
For each idea, we ask an LLM to score it along each metric, then compute a weighted sum of the scores (a code sketch follows the metric list below).
Metrics
- Financial Potential: The maximum financial value (in USD) that the project could generate if successful, based on market size and potential revenue/acquisition value.
- Impact Breadth: How many people would be noticeably affected by the idea if executed successfully – from individual impact to global scale.
- Impact Depth: How deeply or significantly the execution would impact the average affected person – from trivial changes to life-altering transformations.
- Impact Positivity: Our confidence in the net positivity of the impact – somewhere between definitely harmful and definitely helpful.
- Impact Duration: How long-lasting the impact would be – from extremely temporary effects to permanent changes that last multiple generations.
- Uniqueness: How unique and innovative the idea is, considering both its novelty and the unique insight behind it – from common tarpit ideas to groundbreaking paradigm shifts.
- Plausibility: How logically sound the idea is, with special attention to identifying fundamentally flawed concepts – from impossible ideas to well-reasoned solutions.
- Implementability: How easy the idea is to implement – from impossible projects to immediately implementable solutions, considering technical, financial, and societal barriers.
- Replicability: How easy it would be for others to replicate the project after successful execution – from nearly impossible to trivially replicable.
- Market Timing: How well-timed the idea is with current market and technology conditions – from bad timing to perfect timing with newly unlocked opportunities.
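To make the scoring concrete, here's a minimal sketch of the weighted sum. The weights and scale maxima below are illustrative assumptions, not the framework's final values; each score is normalised by its scale's maximum so that metrics on different scales (0.0-5.0 vs. 0.0-7.0) are comparable before weighting.

```python
from typing import Dict, Tuple

# metric -> (weight, scale_max); both values are illustrative assumptions.
METRICS: Dict[str, Tuple[float, float]] = {
    "financial_potential": (1.0, 5.0),
    "impact_breadth": (1.5, 5.0),
    "impact_depth": (1.5, 5.0),
    "impact_positivity": (2.0, 5.0),
    "impact_duration": (1.0, 5.0),
    "uniqueness": (1.0, 5.0),
    "plausibility": (1.5, 5.0),
    "implementability": (1.0, 7.0),
    "replicability": (0.5, 5.0),
    "market_timing": (1.0, 5.0),
}

def overall_score(scores: Dict[str, float]) -> float:
    """Weighted sum of per-metric LLM scores, normalised to the 0-1 range."""
    total_weight = sum(weight for weight, _ in METRICS.values())
    weighted = sum(
        weight * scores[name] / scale_max
        for name, (weight, scale_max) in METRICS.items()
    )
    return weighted / total_weight
```

One open design choice here is whether the weights should themselves be tunable per user, since (for example) someone optimising for impact would weight the impact metrics far higher than financial potential.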
Example Scales
Here are detailed scales for some of our key metrics to illustrate how we quantify them:
Impact Breadth Scale (0.0-5.0):
- 0.0: Affects <10 people. Example: A personal productivity tool used by a single person.
- 1.0: Affects 10-1,000 people. Example: A specialized tool used by a few hundred professionals in a specific field.
- 2.0: Affects 1K-100K people. Example: A productivity app used by tens of thousands of remote workers worldwide.
- 3.0: Affects 100K-10M people. Example: A popular educational platform used by millions of students globally.
- 4.0: Affects 10M-100M people. Example: A widely adopted communication tool used by tens of millions of people.
- 5.0: Affects 100M+ people. Example: A fundamental technology or platform used by billions of people worldwide.
Implementability Scale (0.0-7.0):
- 0.0: Impossible. The idea violates known laws of physics or requires technology that is purely theoretical. Example: Perpetual motion machine.
- 1.0: Practically impossible. Requires massive resources and breakthroughs in multiple fields. Example: Building a space elevator.
- 2.0: Very Difficult to Implement. Faces major technical, financial, or societal barriers. Example: Developing a vaccine for a major disease.
- 3.0: Moderately Difficult to Implement. Requires substantial effort and expertise from a dedicated team over several years. Example: Developing a new drug.
- 4.0: Somewhat Difficult to Implement. Achievable by a dedicated individual or small team over 1-3 years. Example: Creating a new social media platform.
- 5.0: Implementable with Effort. Achievable by a dedicated individual or small team within 6-12 months. Example: Creating a moderately complex mobile app.
- 6.0: Easily Implementable. Can be done with common skills within 1-3 months. Example: Creating a simple website or organizing a small community event.
- 7.0: Immediately Implementable. Can be done right away with readily available resources. Example: Starting a social media campaign or writing a blog post.
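To turn a scale like this into a score, one straightforward approach (a sketch of the general pattern, not the exact prompts I use) is to embed the rubric in the LLM prompt and ask for a single number:

```python
def build_scoring_prompt(idea: str, metric: str, rubric: str) -> str:
    """Embed a metric's rubric in the prompt and request one float back."""
    return (
        f"Score the following project idea on '{metric}' using this rubric:\n"
        f"{rubric}\n\n"
        f"Idea: {idea}\n\n"
        "Reply with a single number to one decimal place, and nothing else."
    )

# Example usage with an abbreviated rubric:
rubric = "0.0 = impossible ... 7.0 = immediately implementable"
prompt = build_scoring_prompt("vegan fast food chain", "Implementability", rubric)
```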
Additional metadata
For each idea, we also collect the following metadata, which helps identify best-fit ideas:
- Project Type: One of the following categories that best describes the core nature of the project:
  - Digital Product (software, apps, websites, digital games)
  - Physical Product (tangible items that can be manufactured)
  - Service (B2B or B2C services where the primary offering is human effort)
  - Research (scientific, academic, or experimental work)
  - Content (media, educational materials, or entertainment)
  - Other (projects that don't fit the above categories)
- Categories: An array of 5-6 categories that the idea falls into, helping with keyword discovery.
- Skills: A list of 5-15 key skills needed to execute the idea, focusing on specific technical, analytical, or problem-solving skills rather than vague terms or domain names.
- Resources Needed: A list of key resources (2-5 words each) that are prerequisites for execution, focusing on items that might be difficult or expensive (>$1000) to procure for the average person. This excludes human resources or skills.
- Minimum Hours to Execute: The minimum number of human hours required across all collaborators to create a basic version that delivers some value. This helps identify quick wins and low-hanging fruit.
- Estimated Hours to Execute: The total number of human hours needed to fully implement the project in its intended scope. This helps assess the overall resource commitment required.
- Estimated Number of Collaborators: The number of people who would need to be majorly involved in executing the idea. This helps understand the team size and coordination complexity.
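As a sketch, this metadata could be captured in a schema like the one below; the field names and types are my illustrative choices, not a fixed spec.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class IdeaMetadata:
    project_type: str             # one of the six categories above
    categories: List[str]         # 5-6 keywords for discovery
    skills: List[str]             # 5-15 concrete skills needed
    resources_needed: List[str]   # expensive (>$1000) non-human prerequisites
    minimum_hours: int            # basic version that delivers some value
    estimated_hours: int          # full intended scope
    estimated_collaborators: int  # people who must be majorly involved
```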
Example implementation
Here are some ideas and their corresponding evaluations: auto-tracking code bugs, moral frameworks for future civilisations, vegan fast food, and preventing drunk texting.
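For concreteness, a stored evaluation might look like the record below. The numbers are invented purely to illustrate the shape of the output; they are not actual model outputs.

```python
# Illustrative only: these scores are made up for demonstration.
example_evaluation = {
    "idea": "An app that blocks outgoing texts when it detects drunk typing",
    "scores": {
        "financial_potential": 2.5,
        "impact_breadth": 2.0,
        "impact_depth": 1.5,
        "impact_positivity": 4.0,
        "impact_duration": 1.0,
        "uniqueness": 2.0,
        "plausibility": 3.5,
        "implementability": 5.0,
        "replicability": 4.0,
        "market_timing": 3.0,
    },
}

# Feeding these into the overall_score sketch above yields one ranking value:
# overall_score(example_evaluation["scores"])
```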
Open questions
I'd love to hear your thoughts on the framework – what's problematic here? What's missing?
In particular, a few things I'm not sure about:
- Should there be an explicit neglectedness attribute? How would that work? Or is that automatically incorporated into the impact metrics?
- Based on some conversations, specificity seems to matter. Vague ideas like "solve climate change" score high on most metrics but obviously aren't good, actionable projects. How should this be incorporated?
- Financial potential feels particularly tricky to evaluate in one shot – is there a better way to handle this metric?
If there are any other related resources on the subject, I'd love to look at those too.