Milan Weibel

Copresident @ AIS UC Chile
108 karma · Joined · Pursuing an undergraduate degree · Santiago, Región Metropolitana, Chile
weibac.github.io

Bio

Participation
4

CS, AIS, PoliSci @ UC Chile.

Comments
17

  • I think it is very likely that the top American AI labs are receiving substantial help from the NSA et al. in implementing their "administrative, technical, and physical cybersecurity protections". No need to introduce CrowdStrike as a vulnerability.
  • The labs get fined if they don't implement such protections, not if they get compromised.

Humans could use AI propaganda tools against other humans. Autonomous AI actors may have access to better or worse AI propaganda capabilities than those used by human actors, depending on the concrete scenario.

I guess this somewhat depends on how good you expect AI-augmented persuasion/propaganda to be. Some have speculated it could be extremely effective. Others are skeptical. Totalitarian regimes provide an existence proof of the feasibility of controlling populations in the medium term using a combination of pervasive propaganda and violence.

Contra hard moral anti-realism: a rough sequence of claims

Epistemic and provenance note: This post should not be taken as an attempt at a complete refutation of moral anti-realism, but rather as a set of observations and intuitions that may or may not give one pause as to the wisdom of taking a hard moral anti-realist stance. I may clean it up to construct a more formal argument in the future. I wrote it on a whim as a Telegram message, in direct response to the claim 

> you can't find "values" in reality.


Yet, you can find valence in your own experiences (that is, you just know from direct experience whether you like the sensations you are experiencing or not), and you can assume other people are likely to have a similar enough stimulus-valence mapping. (Example: I'm willing to bet 2k USD on my part against a single dollar on yours that if I waterboard you, you'll want to stop before 3 minutes have passed.)[1]

However, since we humans are bounded imperfect rationalists, trying to explicitly optimize valence is often a dumb strategy. Evolution has made us not into fitness-maximizers, nor valence-maximizers, but adaptation-executers.

"values" originate as (thus are) reifications of heuristics that reliably increase long term valence in the real world (subject to memetic selection pressures, among them social desirability of utterances, adaptativeness of behavioral effects, etc.)

If you find yourself terminally valuing something that is not someone's experienced valence, then either one of these propositions is likely true:

  • A nonsentient process has at some point had write access to your values.
  • What you value is a means to improving somebody's experienced valence, and so are you now.

     

Crossposted from LessWrong.

  1. ^

    In retrospect, making this proposition was a bit crass on my part.

In a certain sense, an LLM's token embedding matrix is a machine ontology. Semantically similar tokens have similar embeddings in the latent space. However, different models may have learned different associations when their embedding matrices were trained. Every forward pass starts colored by ontological assumptions, and these may have alignment implications.

For instance, we would presumably not want a model to operate within an ontology that associates the concept of AI with the concept of evil, particularly if it is then prompted to instantiate a simulacrum that believes it is an AI.

Has anyone looked into this, that is, the alignment implications of different token embedding matrices? I feel like it would involve calculating a lot of cosine similarities and doing some evals.
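
As a very rough sketch of what the cosine-similarity part could look like (GPT-2 via Hugging Face transformers is used purely as an illustrative stand-in, and the probe words compared are hypothetical examples, not a validated eval):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # illustrative choice; any model with an input embedding matrix works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# The token embedding matrix: one row per vocabulary token, shape (vocab_size, hidden_dim)
embeddings = model.get_input_embeddings().weight.detach()

def token_vector(word: str) -> torch.Tensor:
    # A word may split into several tokens; average their embeddings as a crude proxy
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    return embeddings[ids].mean(dim=0)

def similarity(a: str, b: str) -> float:
    # Cosine similarity between the two words' positions in embedding space
    return torch.nn.functional.cosine_similarity(
        token_vector(a), token_vector(b), dim=0
    ).item()

# Hypothetical probe pair: compare a worrying association against a more benign baseline
print(similarity(" AI", " evil"))
print(similarity(" AI", " helpful"))
```

Comparing these numbers across models would only be a first pass; behavioral evals would still be needed to tell whether any embedding-level association actually shows up in outputs.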

Intriguing. Looking forward to the live demo.

PSA: The form accepts a maximum of 10 files, that is, at most 5 design proposals (because each proposal requires uploading both a .png and an .svg file).

Just for the sake of clarity: I think the word "schism" is inaccurate here because it carries false connotations of conflict.

Hi Jack! 

Have you considered booking a call with 80,000 Hours career advising? They can help you analyse the factors behind your plans for your future career, and put you in contact with people working in the areas that interest you. 

You could also contact CLR and CRS. If you show knowledge of and interest in their work, they may be eager to help. You can't be sure if you'll get a reply, and that may seem intimidating, but remember that the cost is minimal, EV is high, and how you feel about not getting a reply is at least partly under your control.

While this last point is not specifically focused on s-risks, a very cheap, very valuable action you can take is subscribing to the AI Safety opportunities update emails at AI Safety Training. Many hackathons advertised there are beginner-friendly.
