WillPearson

I take decision making under deep uncertainty as my base: being comfortable making decisions across many viewpoints, while trying to avoid both a single dominant viewpoint and analysis paralysis.

WillPearson's Quick takes

WillPearson2mo1

Existential risk

I'm trying to create a website/organisation/community around exploring difficult problems and improving the decisions people make.

I've currently got an alpha website where people can interact with AI in different scenarios and record the decisions and reasoning they make, to inform others.

I'm curious how others would approach this endeavour (I don't have a broad network).

WillPearson's Quick takes

WillPearson3mo*1

I've been trying to think of ways to improve the software landscape. If we do this, it might make traditional software more aligned with human values, and it could serve as a model for building more advanced systems too.

One piece I've been looking at is software licensing.

Instead of traditional open source, have an easy-to-get license for a version of the software, based on a cryptographic identity. This could add friction to being a bad actor.

On startup, the software checks that the license matches the version of the software running (git SHA stored somewhere). If it doesn’t, the software fails to start. The license can also be used by clients and servers to identify each other, but it does not have to map one to one onto a person’s identity.

The license is acquired as easily as a Let’s Encrypt-style certificate, but the identity has to be part of the reputation system (which might require a fee).

The software might require a license from one of many reputation monitoring systems, so that no monitoring system becomes a single point of failure.
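
A minimal sketch of the startup check described above, under loud assumptions: every name here (`License`, `issue_license`, `check_license`, the issuer names) is hypothetical, and an HMAC with a per-issuer key stands in for the public-key signature a real reputation system would use. It only illustrates the three conditions: the issuer is one of several trusted reputation systems, the license is authentic, and its git SHA matches the running version.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    identity: str   # hypothetical: fingerprint of the holder's cryptographic identity
    git_sha: str    # version of the software this license covers
    issuer: str     # reputation monitoring system that issued the license
    signature: str  # hex HMAC over the fields above (stand-in for a real signature)


def _payload(identity, git_sha, issuer):
    # Canonical byte string that the issuer signs.
    return "\n".join((identity, git_sha, issuer)).encode("utf-8")


def issue_license(identity, git_sha, issuer, issuer_key):
    # Assumption: the issuer has already checked the identity's reputation.
    sig = hmac.new(issuer_key, _payload(identity, git_sha, issuer), hashlib.sha256)
    return License(identity, git_sha, issuer, sig.hexdigest())


def check_license(lic, running_sha, trusted_issuers):
    # Any one of several issuers is accepted, so none is a single point of failure.
    key = trusted_issuers.get(lic.issuer)
    if key is None:
        return False
    expected = hmac.new(
        key, _payload(lic.identity, lic.git_sha, lic.issuer), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, lic.signature):
        return False
    # The license must match the exact version that is running.
    return lic.git_sha == running_sha


def start(lic, running_sha, trusted_issuers):
    # Refuse to start when the license does not cover this version.
    if not check_license(lic, running_sha, trusted_issuers):
        raise SystemExit("license does not match running version; refusing to start")
    return "started"
```

In this sketch the identity field is just an opaque string, so it does not need to correspond one to one with a person.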

Edit: effective altruism might decide to fund awards for software ecosystem engineering work, with non-software engineers as the judges, to bring this digital infrastructure to the public's consciousness and to incentivise making it understandable as well.

WillPearson's Quick takes

WillPearson10mo3

I had an idea for a new concept in alignment that might allow nuanced and human-like goals (if it can be fully developed).

Has anyone explored using neural clusters found by mechanistic interpretability as part of a goal system?

You would look for clusters for certain things, e.g. happiness or autonomy, and have those neural clusters in the goal system. If the system learned over time, it could refine that concept.

This was inspired by how human goals seem to have concepts that change over time in them.

WillPearson's Quick takes

WillPearson11mo3

I've got an idea for a business that could help biosecurity by helping stop accidental leaks of data to people who shouldn't have it. I'm thinking about proving the idea with personally identifiable information. Looking for feedback and collaborators.

Will AI R&D Automation Cause a Software Intelligence Explosion?

WillPearson1y4

My expectation is that software without humans in the loop evaluating it will fall prey to Goodhart's law and overfit to the metrics/measures given.

WillPearson's Quick takes

WillPearson1y0

My blog might be of interest to people

Fractal Governance: A Tractable, Neglected Approach to Existential Risk Reduction

WillPearson1y1

Here is a blog post, also written with Claude's help, that I hope to use to engage home-scale experimenters.

Share AI Safety Ideas: Both Crazy and Not

WillPearson1y2

I appreciate your views on space and AI; working with ML systems in that way might be useful.

But I think that I am drawn to the base reality a lot because of threats to that from things like gamma ray bursts or aliens. These things can only be represented probabilistically in simulations because they are out of context. The branching tree explodes with possibilities.

I agree that we aren't ready for agents, but I would like to try to build time non-static intelligence augmentation as slowly as possible: starting with building systems to control and shape them, tested out with static ML systems, then testing them with people, then testing them inside simulations, etc.

Share AI Safety Ideas: Both Crazy and Not

WillPearson1y2

I find your view of things interesting. A question: how do you deal with democracy when people might be inhabiting worlds unlike the real one and have forgotten the real one exists?

I think static AI models lack corrigibility: humans can't instruct them to change how they act, so they might be a dead end in terms of day-to-day usefulness. They might be good as scientists, though, as they can be detached from human needs. So they are worth exploring.

WillPearson

Posts 12

Comments 96
