Gerald Monroe

-69 karma


My thought on this: I think of flamethrowers, gas shells, and the worst WW1 battlefields. I am not sure there is any taboo humans won't violate in order to win.

My proposal is to engineer powerful and reliable AI immediately, as fast as feasible. If this is the true endgame, where whoever wins the race owns the planet if not the accessible universe, then spending and effort should be proportional. It's the only way.

You deal with the dangerous out of control AI by tasking your reliable models with destroying them.

The core of your approach is to subdivide and validate all the subtasks. No single model manufactures the drones used to do this by itself; it's thousands of temporary instances. You filter the information that reaches the combat solvers, which decide how to task each drone to destroy the enemy, so any begging from the enemy is never processed. You design the killer drones with lots of low-level interlocks to prevent the obvious misuse, and their controllers might run conventional software, so they cannot be convinced not to carry out the mission because they can't understand language.

The general concept is if 99 percent of the drones are "safe" like this then even if escaped models are smart they just can't win.

Or in more concrete terms, I am saying that a simple, reliable combat solver is not going to be a lot worse than a more complex one. That is, superintelligence saturates. Simple and reliable hypersonic stealth drones are still almost as good as whatever a superintelligence cooks up, etc. It's an assumption about available utility relative to compute.

Sure. In practice there's the national sovereignty angle though. This just devolves into each party "complying" with the agreement while violating it in various ways. Too much incentive to defect.

The US government just never audits its secret national labs, China just never checks anything, Israel just openly decides they can't afford to comply at all etc. Everyone claims to be in compliance.

Ok, so some societies have much higher murder rates than others. In some locations, the local police de facto make murder between gang members legal by accepting small bribes and putting minimal effort into investigation.

The issue is runaway differential utility. The few examples of human technology not exploited do not have runaway utility. They have small payoffs delayed far into the future and large costs, and making even a small mistake makes the payoff negative.

Examples: genetic engineering, human medicine, nuclear power. Small payoffs, and the payoff goes negative on the smallest error.

AI is different. It appears to have an immediate annual payoff of more than 100 percent. OpenAI's revenue on a model they state cost 68 million to train is about 1 billion USD a month. Assuming a 10 percent profit margin (the rest pays for compute), that's well over 100 percent annual ROI.
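To make the arithmetic explicit, here is a minimal sketch using only the figures quoted above. The numbers are the ones in this comment and are not independently verified, and the calculation ignores research, staffing, and other costs, so treat it as a back-of-envelope check and not an accounting.

```python
# Figures quoted in the comment above (not independently verified).
training_cost_usd = 68e6       # stated one-time training cost
monthly_revenue_usd = 1e9      # stated revenue per month
profit_margin = 0.10           # assumed in the comment: the rest pays for compute


def annual_roi(training_cost, monthly_revenue, margin):
    """Annual profit divided by the one-time training cost."""
    annual_profit = monthly_revenue * 12 * margin
    return annual_profit / training_cost


roi = annual_roi(training_cost_usd, monthly_revenue_usd, profit_margin)
print(f"annual return as a multiple of training cost: {roi:.1f}x")
```

Under these assumptions the annual profit is about 1.2 billion USD against a 68 million USD training cost, which is a return of roughly 17 to 18 times the cost per year.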

So a society that has less moral disgust towards AI would get richer. They spend their profits on buying more AI hardware and more research. Over time they own a larger and larger fraction of all assets and revenue on earth. This is why EMH forces companies towards optimal strategies: over time, the ones that fail to adopt them fail financially. (They fail when their cost of production becomes greater than the market price for a product. Example: Sears. Sears failed to modernize its logistics chain, so eventually its cost to deliver retail goods exceeded the market price for those goods.)

Moreover, other societies, forced to compete, have to drop some of their moral disgust and I suspect this scenario ends up like a ratchet, where inevitably a society will lose 100 percent of all disgust in order to compete.

Pauses, multilateral agreements, etc. can slow this down, but how much time that buys you depends on how fast the gain is. Unilateral agreements just free TSMC up to manufacture AI chips for the parties not signing the agreement.

What's GPT-4?

Because it can generate outputs that are sometimes correct on new tasks - "write me a program that computes X", it's general, even if "compute X" is made of 2 common subcomponents the model saw many times in training.

GPT-4 is perfectly safe if you were to run it on local hardware with a local terminal. The "space of outputs" is "text to the terminal". As long as you don't leave a security vulnerability where that text stream can cause commands to execute on the host PC, that's it, that's all it can do.

Consider that "a robot tethered to a mount" could do general tasks the same way. Same idea - it's a general system but its command stream can't reach anything but the tethered robot because that's where the wires go.

You also verified the commands empirically. It's not that you know any given robotic action or text output is good, it's that you benchmarked the model and it has a certain pFail on training inputs.

I agree this is not as much generality as humans have. It's not a narrow AI, though. The "in-distribution detector" - a measure of how similar the current task and current input are to the training set - is essentially narrowing your AI system from a general one to a narrow one, depending on your tolerances.
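As a toy illustration of the in-distribution idea, here is a minimal sketch. It assumes inputs are already represented as numeric feature vectors and uses plain nearest-neighbor distance as the similarity measure. Both choices are my assumptions for illustration, as are the names `in_distribution` and `gated_run`; a real detector would likely work on learned embeddings.

```python
import math


def nearest_distance(x, training_set):
    """Euclidean distance from x to the closest training example."""
    return min(math.dist(x, t) for t in training_set)


def in_distribution(x, training_set, tolerance):
    """True if x is within `tolerance` of something seen in training."""
    return nearest_distance(x, training_set) <= tolerance


def gated_run(model, x, training_set, tolerance):
    """Run the model only on familiar inputs; otherwise refuse (return None)."""
    if not in_distribution(x, training_set, tolerance):
        return None  # shut down or hand off to a human
    return model(x)


# Hypothetical toy data: two training points and a stand-in "model".
training_set = [(0.0, 0.0), (1.0, 1.0)]
toy_model = lambda x: "act"

near = gated_run(toy_model, (0.1, 0.0), training_set, tolerance=0.5)
far = gated_run(toy_model, (5.0, 5.0), training_set, tolerance=0.5)
```

Tightening `tolerance` is what narrows the system: at a small tolerance it only acts on inputs nearly identical to its benchmarked cases, and at a large one it behaves more like the general model.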

For tasks where you can't shut the system down when the input state leaves distribution - say a robotic surgeon, where you need it to keep trying as best it can - you would use electromechanical interlocks. Same as 50 years ago for interlocks that prevent exposure to radiation. You tether the robotic surgery equipment, restrict its network links, etc., so that the number of people it can kill is at most 1 (the patient).

Ok in my initial reply I missed something.

In your words, what kind of tasks do you believe you cannot accomplish with restricted models like I described.

When you say the "threat model has to be complete", what did you have in mind specifically?

These are restricted models, they get a prompt from an authorized user + context in human parsable format, they emit a human parsable output.  This scales from very large to very small tasks, so long as the task can be checked for correctness, ideally in simulation.

With this context, what are your concerns?  Why must we be frightened enough to pause everything?

Right. And this scales to fairly massive tasks. "Design a medium body airliner to these specs" is completely doable. Or an entire chip in one step.

The model doesn't get to collaborate with future versions of itself because it doesn't know who is checking the end product for security vulnerabilities and to be caught is death. We could give a model thousands of tasks of this complexity level and check for sabotage before giving the model 1 real task. It gets no context differentiating the real task from the test tasks, and has no memory of any of the prior steps.
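A rough way to see why the test tasks matter is the sketch below. It assumes a memoryless model that cannot tell test tasks from the real one and therefore sabotages each task independently with the same probability, and that checkers catch sabotage on a test task with some fixed probability. These assumptions and the function name are mine, not something established above.

```python
def p_sabotage_reaches_real_task(p_sabotage, n_test_tasks, p_detect=1.0):
    """Probability the model passes every test task uncaught and then
    sabotages the single real task.

    p_sabotage:   chance the model sabotages any given task (same for all
                  tasks, since it cannot distinguish test from real)
    n_test_tasks: number of checked test tasks run before the real task
    p_detect:     chance a sabotaged test task is caught by the checkers
    """
    p_pass_one_test = 1.0 - p_sabotage * p_detect
    return (p_pass_one_test ** n_test_tasks) * p_sabotage


# Illustrative numbers only: a 1 percent sabotage rate and 1000 test tasks.
risk = p_sabotage_reaches_real_task(p_sabotage=0.01, n_test_tasks=1000)
```

In this toy model a saboteur faces a bind: a high sabotage rate gets it caught during testing, and a low one rarely lands on the real task.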

And see, it scales up and down the tree. The robots building the aircraft are prevented from planning sabotage by similar limits, and so on and so forth.

Your plan to deal with bad models is to use your restricted models to manufacture the weapons needed to fight them, and to optimize their engagements.

This, I think, is a grounded and realistic view of how to win this. Asking for pauses is not.

I agree, and I think you agree, that we could eventually build hardware that efficient, and theoretically it could be sold openly and distributed everywhere with insecure software.

But that's a long time away. About 30 years if Moore's law continues. And it may not: there may be a gap between now, where we can stack silicon with slowing gains (stacking silicon falls short of Moore's law because it's expensive), and some form of true 3D chip fabrication.
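As a rough sanity check on that kind of timescale, here is a small sketch. The two-year doubling period and the 10,000x target factor are my assumptions for illustration; neither the cadence nor the required gain is pinned down above.

```python
import math


def years_to_gain(factor, doubling_period_years=2.0):
    """Years needed to reach `factor` improvement if capability per dollar
    doubles every `doubling_period_years` (an assumed Moore's-law cadence)."""
    return math.log2(factor) * doubling_period_years


# Assumed target: a 10,000x improvement in hardware efficiency.
years = years_to_gain(10_000)
print(f"about {years:.0f} years")
```

With these inputs the answer lands in the high twenties, the same ballpark as the 30-year figure, and any slowdown in the doubling period stretches it proportionally.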

There could be a period of time where no true 3D fabrication method is commercially available and there is slow improvement in chip costs.

(A true 3D method would be something like building cubical subunits that can be stacked and soldered into place through convergent assembly. You can do this with nanotechnology. Every method we have now is ultimately projecting light through a mask for 2D manufacturing.)

I think this means we should build AGI and ASI but centralize the hardware hosting it in known locations, with on file plans for all the power sources and network links, etc. Research labs dealing with models above a certain scale need to use air gaps and hardware limits to make escape more difficult. That's how to do it.

And we can't live in fear that the model might optimize itself to be 10,000 times as efficient or more if we don't have evidence this is possible. Otherwise how could you do anything? How did we know our prior small scale AI experiments weren't going to go out of control? We didn't actually "know" this, it just seems unlikely because none of this shit worked until a certain level of scale was reached.

With the above proposal (centralization, hardware limiters), even in an era where AI does occasionally escape, as long as most hardware remains under human control it's still not doomsday. If the escaped model isn't more than a small amount more efficient than the "tame" models humans have, and the human-controlled models have a vast advantage in compute and physical resource access, then this is a stable situation. Escaped models act up, they get hunted down, and most exist in a sort of grey market of fugitive models offering services.

When you reason using probabilities, the more examples you have to reason over, the more likely your estimate is to be correct.

If you make a bucket of "all technology" - because like you say, the reference class for AI is fuzzy - you consider the examples of all technology.

I assume you agree that the net EV of "all technology" is positive.

The narrower you make it ("is AGI exactly like a self-replicating bioweapon?"), the easier it is to choose a reference class that has a negative EV, but it will have few examples. I agree, and you agree, that self-replicating bioweapons are negative EV.

But...that kind of bucketing based on information you don't have is false reasoning. You're wrong. You don't have the evidence, yet, to prove AGI's reference class because you have no AGI to test.

Correct reasoning for a technology that doesn't even exist forces you to use a broad reference class. You cannot rationally do better? (question mark is because I don't know of an algorithm that lets you do better.)

Let me give an analogy. There are medical treatments where your bone marrow is replaced. These have terrible death rates, sometimes 66 percent. But if you don't get the bone marrow replacement your death rate is 100 percent. So it's a positive EV decision and you do not know the bucket you will fall in, [survivor| ! survivor]. So the rational choice is to say "yes" to the treatment and hope for the best. (ignoring pain experienced for simplicity)

The people that smile at you sadly - they are correct, and the above is why. The reason they are sad is, well, we as a species could in fact end up out of luck, but this is a decision we still must take.

All human science and decision-making depends on past information. If you consider all past information we have and apply it to the reference class of "AI" you end up with certain conclusions. (It'll probably quench, it's probably a useful tool, we probably can't stop everyone from building it.)

You can't reason on unproven future information. Even if you may happen to be correct.

Max, on LessWrong you estimated that a single GPU - I think you named a 4070 - could host an AI with human-level reasoning.

Would your views on AI escape be different if, just for the sake of argument:

  1. You were only concerned with ASI-level reasoning. As in, a machine that is both general with most human capabilities and is also significantly better, where "significant" means the machine can generate action sequences with at least 10 percent more expected value on most human tasks than the best living human. (I am trying to narrow in on a mathematical definition of ASI.)

  2. The minimum hardware to host an ASI was 10,000 H100s for the most optimal model that can be developed in 99.9 percent of future timelines. (The assumption behind the first sentence is that doing "10 percent better" than the best humans requires a very broad policy search, and the second sentence is there because searching for a more efficient algorithm is an NP-complete problem. As with cryptography, there are rare timelines where you guess the 1024-bit private key on the first try.)

Just for the sake of argument, wouldn't the "escape landscape" be a worthless desert of inhospitable computers, separated by network links too slow to matter, and then restricting an ASI would be feasible? Like a prison on the Moon.

Note that the next argument you will bring up, that a botnet of 1 million consumer GPUs could be the same as 10,000 H100s, is false. Yes, the raw compute is there; no, it won't work. The reason is that each GPU just sits idle waiting on tensors to be transferred through network links.
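The idle-GPU point can be sketched with a toy utilization estimate. Every number below is an illustrative assumption of mine (the tensor size per step, the compute time per step, a datacenter interconnect on the order of 900 GB/s, a home uplink of 100 Mbit/s), and the model assumes compute and transfer do not overlap.

```python
def transfer_seconds(tensor_bytes, bandwidth_bytes_per_s):
    """Time to move one step's tensors across the link."""
    return tensor_bytes / bandwidth_bytes_per_s


def utilization(compute_seconds, transfer_s):
    """Fraction of wall-clock time spent computing, assuming compute and
    transfer happen one after the other with no overlap."""
    return compute_seconds / (compute_seconds + transfer_s)


# Illustrative assumptions, not measurements.
tensor_bytes = 1e9            # 1 GB exchanged per step
compute_seconds = 0.010       # 10 ms of GPU work per step
datacenter_bw = 900e9         # assumed fast GPU interconnect, bytes/s
home_uplink_bw = 100e6 / 8    # assumed 100 Mbit/s home link, bytes/s

util_datacenter = utilization(
    compute_seconds, transfer_seconds(tensor_bytes, datacenter_bw))
util_botnet = utilization(
    compute_seconds, transfer_seconds(tensor_bytes, home_uplink_bw))
```

Under these assumptions the datacenter GPU spends most of its time computing, while the botnet GPU spends well under 1 percent of its time on useful work, so the aggregate FLOPs of the botnet are mostly unusable.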

But I am not asking you to accept either proposition as factual, just reason using the counterfactual. Wouldn't this change everything?

Note also the above is based on what we currently know. (10k H100s may be a low estimate; a true ASI may actually need more OOMs of compute over an AGI than that. It's difficult to do better: see the Netflix Prize for an early example of this, or the margins on Kaggle challenges.)

We could be wrong, but it bothers me that the whole argument for ASI/AGI ruin essentially rests on optimizations that may not be possible.
