All of anithite's Comments + Replies

None of the labs would be doing undirected drift. That wouldn't yield improvement for exactly the reasons you suggest.

In the absence of a ground-truth quality/correctness signal, optimizing for coherence works. This can give prettier answers (in the way that averaged faces are prettier) but this is limited. The inference-time scaling equivalent would be a branching sampling approach that searches for especially preferred token sequences rather than the current greedy sampling approach. Optimising for idea-level coherence can improve model thinking to some ... (read more)
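To make the greedy vs. branching distinction concrete, here's a toy sketch (my own illustration, not any lab's published method): next_token_probs is a hypothetical stand-in for a model's next-token distribution, greedy decoding takes the locally most likely token, and the branching search keeps the few highest cumulative log-probability sequences.

```python
# Toy illustration: greedy decoding vs. a branching search that keeps the k most
# "preferred" token sequences. next_token_probs() is a stand-in for a real model.
import math
import random

VOCAB = list("abcde")

def next_token_probs(prefix):
    """Hypothetical stand-in for a model's next-token distribution."""
    rng = random.Random(hash(prefix) % (2**32))
    weights = [rng.random() for _ in VOCAB]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(VOCAB, weights)}

def greedy(prefix, steps):
    """Always take the single most likely next token."""
    for _ in range(steps):
        probs = next_token_probs(prefix)
        prefix += max(probs, key=probs.get)
    return prefix

def branching(prefix, steps, beam=4):
    """Keep the `beam` highest cumulative log-prob sequences at each step."""
    candidates = [(0.0, prefix)]
    for _ in range(steps):
        expanded = []
        for logp, seq in candidates:
            for tok, p in next_token_probs(seq).items():
                expanded.append((logp + math.log(p), seq + tok))
        expanded.sort(reverse=True)
        candidates = expanded[:beam]
    return candidates[0][1]

print(greedy("x", 5))
print(branching("x", 5))
```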

O1 now passes the simpler "over yellow" test from the above. Still fails the picture book example though.

For a complex mechanical drawing, O1 was able to work out easier dimensions but anything more complicated tends to fail. Perhaps the full O3 will do better given ARC-AGI benchmark performance.

Meanwhile, Claude 3.5 and 4o fail a bit more badly, failing to correctly identify axial and diameter dimensions.

Visuospatial performance is improving albeit slowly.

1Lennart Finke
A much appreciated update, thank you!

My hope is that the minimum viable pivotal act requires only near-human AGI. For example, hack competitors' training/inference clusters to fake an AI winter.

Aligning +2SD human-equivalent AGI seems more tractable than straight-up FOOMing to ASI safely.

One lab does it to buy time for actual safety work.

Unless things slow down massively we probably die. An international agreement would be better but seems unlikely.

This post raises a large number of engineering challenges. Some of those challenges rely on other assumptions being made, for example the use of energy-carrying molecules rather than electricity or mechanical power, which can cross vacuum boundaries easily. Overall there's a lot of "if we solve X via method Y (which is the only way to do it), problem Z occurs" without considering making several changes at once that synergistically avoid multiple problems.

"Too much energy" means too much to be competitive with normal biological processes.

That goalpos... (read more)

Your image links are all of the form: http://localhost:8000/out/planecrash/assets/Screenshot 2024-12-27 at 00.31.42.png

Whatever process is generating the markdown for this, those links can't possibly work.

I got this one wrong too. Ignoring negative roots is pretty common for non-mathematicians.

I'm half-convinced that most LessWrong commenters wouldn't pass as AGI if uploaded.

2Eric Neyman
Yup, I think that only about 10-15% of LWers would get this question right.

This post is important for setting a lower bound on the AI capabilities required for an AI takeover or pivotal act. Biology serves as an existence proof that some kind of "goo" scenario is possible. It somewhat lowers the bar compared to Yudkowsky's dry nanotech scenario but still requires AI to practically build an entire scientific/engineering discipline from scratch. Many will find this implausible.

Digital tyranny is a better capabilities lower bound for a pivotal act or AI takeover strategy. It wasn't nominated though, which is a shame.

1papetoast
You can still nominate posts until Dec 14th?

This is why I disagree with a lot of people who imagine an “AI transformation” in the economic productivity sense happening instantaneously once the models are sufficiently advanced.

For AI to make really serious economic impact, after we’ve exploited the low-hanging fruit around public Internet data, it needs to start learning from business data and making substantial improvements in the productivity of large companies.

Definitely agree that private business data could advance capabilities if it were made available/accessible. Unsupervised Learning over all... (read more)

anithite130

Alternate POV

Science fiction. < 10,000 words. A commissioned re-write of Eliezer Yudkowsky's That Alien Message https://alicorn.elcenia.com/stories/starwink.shtml

Posting it since it hasn't been linked so far and doesn't seem to be linked from the original.

TLDR: Autofac requires solving the "make (almost) arbitrary metal parts" problem, but that won't close the loop. The hard problem is building an automated/robust re-implementation of some of the economy, which requires engineering effort, not trial and error. That's the bottleneck, including for the Autofac. Need STEM AI (engineering AI mostly). Once that happens, the economy gets taken over and grows rapidly as things start to actually work.

To expand on that:

 "make (almost) arbitrary metal parts"

  • can generate a lot of economic value
  • requires essentially a giant GitHub repo of:
    • hardware
... (read more)
anithite278

The key part of the Autofac, the part that kept it from being built before, is the AI that runs it.

That's what's doing the work here.

We can't automate machining because an AI that can control a robot arm to do typical machinist things (EG:changing cutting tool inserts, removing stringy steel-wool-like tangles of chips, etc.) doesn't exist or is not deployed.

If you have a robot arm + software solution that can do that, it would massively drop operational costs, which would lead to exponential growth.

The core problem is that currently we need the humans the... (read more)

2Carl Feynman
Yes, absolutely!  A fine description of the current state of the art.  I upvoted your post by 6 points (didn't know I could do that!).    I'm imagining doing everything the machinist has to do with a mobile pair of robot arms. I can imagine a robot doing everything you listed in your first list of problems.  Your "stupider stuff" is all software problems, so will be fixed once, centrally, and for good on the Autofac.  The developers can debug their software as it fails, which is not a luxury machinists enjoy. Call a problem that requires human input a "tough" problem.   We can feed the solutions to any tough problems back into the model, using fine-tuning or putting it in the prompt.  So ideally, any tough problem will have to be solved once.  Or a small number of times, if the VLM is bad at generalizing.  The longer we run the Autofacs, the more tough problems we hit, resolve, and never see again. With an exponentially increasing number of Autofacs, we might have to solve an exponentially increasing number of tough problems.  This is infeasible and will destroy the scheme.  We have to hope that the tough problems per hour per Autofac drops faster than the number of Autofacs increases.  It's a hope and only a hope-- I can't prove it's the case. What's your feeling about the distribution of tough problems?  

If we consider each (include,exclude) decision for (1,2,3,4,5) as a separate question, error rates are 20%-ish. Much better than random guessing. So why does it make mistakes?

If bottlenecking on data is the problem, more data in the image should kill performance. So how about a grid of 3-digit numbers (random values in the range 100...999)?

Claude 3.5 Sonnet does perfectly: a perfect score answering lookup(row/col) and find_row_col(number), finding duplicates, and transcribing to CSV.

So this isn't a bottleneck like human working memory. Maybe we need to use a higher resolut... (read more)
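For reference, a minimal sketch of the ground-truth side of such a grid test (the image rendering and the model call are omitted; function names like lookup and find_row_col just mirror the queries described above and are my own labels):

```python
# Ground-truth helpers for a grid-of-random-3-digit-numbers test.
import random

ROWS, COLS = 10, 10
random.seed(0)
grid = [[random.randint(100, 999) for _ in range(COLS)] for _ in range(ROWS)]

def lookup(row, col):
    """Value at (row, col), zero-indexed."""
    return grid[row][col]

def find_row_col(number):
    """All (row, col) positions containing `number`."""
    return [(r, c) for r in range(ROWS) for c in range(COLS) if grid[r][c] == number]

def duplicates():
    """Numbers that appear more than once in the grid."""
    flat = [v for row in grid for v in row]
    return sorted({v for v in flat if flat.count(v) > 1})

def to_csv():
    return "\n".join(",".join(str(v) for v in row) for row in grid)

print(lookup(3, 7), find_row_col(grid[3][7]), duplicates()[:5])
```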

anithite124

It's actually a lot worse than that.
 

Q:Which numbers are above a yellow region in the image?

GPT4o: [2,4]

GPT4: [4,5]

Claude 3.5 Sonnet: [2,5]

Claude sonnet describes the image as follows:

User: List the numbers in the above image, how are they placed in the image? What's below them?

The numbers shown in the image are:

1, 2, 3, 4, 5

These numbers are placed in a single horizontal row, appearing to be in separate cells or boxes arranged from left to right in ascending numerical order. Each number occupies its own cell, creating a simple table-like structure wi

... (read more)
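For anyone who wants to reproduce a test like this, here's a rough sketch that generates a similar image with Pillow; the layout and which cells are yellow are my guesses, not the original asset.

```python
# Rough reconstruction of an "over yellow" test image (my guess at the layout).
# Requires Pillow: pip install pillow
from PIL import Image, ImageDraw

CELL, N = 80, 5
yellow_under = {2, 5}          # assumed ground truth: cells below these numbers are yellow

img = Image.new("RGB", (CELL * N, CELL * 2), "white")
draw = ImageDraw.Draw(img)
for i in range(N):
    x0 = i * CELL
    # top row: the numbers 1..5 in outlined cells
    draw.rectangle([x0, 0, x0 + CELL, CELL], outline="black")
    draw.text((x0 + CELL // 2 - 4, CELL // 2 - 6), str(i + 1), fill="black")
    # bottom row: some cells filled yellow, the rest left white
    fill = "yellow" if (i + 1) in yellow_under else "white"
    draw.rectangle([x0, CELL, x0 + CELL, CELL * 2], outline="black", fill=fill)

img.save("over_yellow_test.png")
print("ground truth:", sorted(yellow_under))
```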
2anithite
O1 now passes the simpler "over yellow" test from the above. Still fails the picture book example though. For a complex mechanical drawing, O1 was able to work out easier dimensions but anything more complicated tends to fail. Perhaps the full O3 will do better given ARC-AGI benchmark performance. Meanwhile, Claude 3.5 and 4o fail a bit more badly failing to correctly identify axial and diameter dimensions. Visuospatial performance is improving albeit slowly.
3Lennart Finke
Agreed, although that in turn makes me wonder why it does perform a bit better than random. Maybe there is some nondeclarative knowledge about the image, or some blurred position information? I might test next how much vision is bottlenecking here by providing a text representation of the grid, as in Ryan Greenblatt's work on ARC-AGI.

TLDR: Memory encryption alone is indeed not enough. Modifications and rollback must be prevented too.

  • memory encryption and authentication have come a long way
  • Unless there's a massive shift in ML architectures to doing lots of tiny reads/writes, overheads will be tiny. I'd guesstimate the following:
    • negligible performance drop / chip area increase
    • ~1% of DRAM and cache space[1] 

It's hard to build hardware or datacenters that resist sabotage if you don't do this. You end up having to trust that the maintenance people aren't messing with the equipment and the fa... (read more)
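A back-of-envelope sketch of where a roughly-1% figure can come from; the block granularities and tag/counter sizes below are my own assumptions, not the footnoted calculation.

```python
# Back-of-envelope for the DRAM overhead of memory authentication + anti-rollback.
def overhead(block_bytes, mac_bits=64, counter_bits=64):
    """Fraction of memory spent on per-block MAC tags + anti-rollback counters."""
    meta_bytes = (mac_bits + counter_bits) / 8
    return meta_bytes / block_bytes

for block in (64, 1024, 4096):
    print(f"{block:5d}-byte blocks: {overhead(block):.2%} metadata overhead")
# 64-byte cache lines  -> 25%   (fine-grained authentication is costly)
# 1 KiB blocks         -> ~1.6%
# 4 KiB blocks         -> ~0.4% (ML weight reads are large and sequential,
#                                so coarse blocks cost little performance)
```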

anithite20

What you're describing above is how BitLocker works on every modern Windows PC. The startup process involves a chain of trust, with various bootloaders verifying the next thing to start and handing off keys until Windows starts. Crucially, the keys are different if you start something that's not Windows (IE: not signed by Microsoft). You can't just boot Linux and decrypt the drive, since different keys would be generated for Linux during boot and they won't decrypt the drive.

Mobile devices and game consoles are even more locked down. If there's no ... (read more)
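A minimal sketch of the measured-boot idea behind this (TPM-style PCR extends, not BitLocker's actual implementation): each stage hashes the next component before running it, and the disk key is derived from the accumulated measurements, so booting a different chain yields a different key.

```python
# Measured boot sketch: key depends on the hash chain of everything booted.
import hashlib

def extend(pcr: bytes, component: bytes) -> bytes:
    """PCR-style extend: new_pcr = H(old_pcr || H(component))."""
    return hashlib.sha256(pcr + hashlib.sha256(component).digest()).digest()

def boot_chain_key(components):
    pcr = b"\x00" * 32
    for c in components:
        pcr = extend(pcr, c)
    # Key derivation from the final measurement (real systems seal keys in the TPM).
    return hashlib.sha256(b"disk-key" + pcr).hexdigest()[:16]

windows_chain = [b"firmware v1", b"ms bootloader", b"windows kernel"]
linux_chain   = [b"firmware v1", b"grub",          b"linux kernel"]

print("windows key:", boot_chain_key(windows_chain))
print("linux key:  ", boot_chain_key(linux_chain))   # different -> can't decrypt
```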

1Ebenezer Dukakis
I appreciate your replies. I had some more time to think and now I have more takes. This isn't my area, but I'm having fun thinking about it. See https://en.wikipedia.org/wiki/File:ComputerMemoryHierarchy.svg

  • Disk encryption is table stakes. I'll assume any virtual memory is also encrypted. I don't know much about that.
  • I'm assuming no use of flash memory.
  • Absent homomorphic encryption, we have to decrypt in the registers, or whatever they're called for a GPU.

So basically the question is how valuable is it to encrypt the weights in RAM and possibly in the processor cache. For the sake of this discussion, I'm going to assume reading from the processor cache is just as hard as reading from the registers, so there's no point in encrypting the processor cache if we're going to decrypt in registers anyway. (Also, encrypting the processor cache could really hurt performance!) So that leaves RAM: how much added security we get if we encrypt RAM in addition to encrypting disk.

One problem I notice: An attacker who has physical read access to RAM may very well also have physical write access to RAM. That allows them to subvert any sort of boot-time security, by rewriting the running OS in RAM. If the processor can only execute signed code, that could help. But an attacker could still control which signed code the processor runs (by strategically changing the contents at an instruction pointer?) I suspect this is enough in practice.

A somewhat insane idea would be for the OS to run encrypted in RAM to make it harder for an attacker to tamper with it. I doubt this would help -- an attacker could probably infer from the pattern of memory accesses which OS code does what. (Assuming they're able to observe memory accesses.)

So overall it seems like with physical write access to RAM, an attacker can probably get de facto root access, and make the processor their puppet. At that point, I think exfiltrating the weights should be pretty straightforward. I'm assumi
anithite20

Hardware encryption likely means dedicated on-chip hardware to handle keys and decrypt weights and activations on the fly.

The hardware/software divide here is likely a bit fuzzy but having dedicated hardware or a separate on-chip core makes it easier to isolate and accelerate the security critical operations. If security costs too much performance, people will be tempted to turn it off.

Encrypting data in motion and data at rest (in GPU memory) makes sense since this minimizes trust. An attacker with hardware access will have a hard time getting wei... (read more)

1Ebenezer Dukakis
Thank you. I think maybe my confusion here is related to the threat model. If a model gained root access to the device that it's running on, it seems like it could probably subvert these security measures? Anyway I'd be interested to read a more detailed description of the threat model and how this stuff is supposed to help. More specifically, it seems a bit weird to imagine an attacker who has physical access to a running server, yet isn't able to gain de facto root access for the purpose of weight exfiltration. E.g. you can imagine using your physical access to copy the encrypted weights on to a different drive running a different OS, then boot from that drive, and the new OS has been customized to interface with the chip so as to exfiltrate the weights. Remembering that the chip can't exactly be encrypting every result of its weight computations using some super-secret key, because if it did, the entire setup would effectively be useless. Seems to me like the OS has to be part of the TCB along with the chip?

Vulnerable world hypothesis (but takeover risk rather than destruction risk). That + first-mover advantage could stop things pretty decisively without requiring ASI alignment.

As an example, taking over most networked computing devices seems feasible in principle with thousands of +2SD AI programmers/security-researchers. That requires an AlphaGo-level breakthrough for RL as applied to LLM programmer-agents.

One especially low risk/complexity option is a stealthy takeover of other AI labs' compute, then faking another AI winter. This might get you most of the... (read more)

Slower is better obviously, but as to the inevitability of ASI, I think reaching 99th-percentile human capability in a handful of domains is enough to stop the current race. Getting there is probably not too dangerous.

4Eli Tyre
Stop it how?

Current ATGMs poke a hole in armor with a very fast jet of metal (1-10km/s). Kinetic penetrators do something similar using a tank gun rather than specially shaped explosives.

"Poke hole through armor" is the approach used by almost every weapon. A small hole is the most efficient way to get to the squishy insides. Cutting a slot would take more energy. Blunt impact only works on flimsy squishy things. A solid shell of armor easily stopped thrown rocks in antiquity. Explosive over-pressure is similarly obsolete against armored targets.

TLDR:"poke hole then d... (read more)

EMP mostly affects the power grid because power lines act like big antennas. Small digital devices are built to keep internal RF signals from leaking out (thanks again, FCC), so EMP doesn't leak in very well either. DIY crud can be built badly enough to be vulnerable, but basically: run wires together in bundles out from the middle with no loops and there are no problems.

Only semi-vulnerable point is communications because radios are connected to antennas.

The best option for frying radios isn't EMP, but rather sending a high-power radio signal at whatever frequency the antenna best ... (read more)

anithite*182

RF jamming, communication and other concerns

TLDR: Jamming is hard when the comms system is designed to resist it. Civilian stuff isn't, but military gear is and can be quite resistant. Frequency hopping makes jamming ineffective if you don't care about stealth. Phased-array antennas are getting cheaper and make things stealthier by increasing directivity (a Starlink terminal costs $1300 and has 40 dBi gain). Very expensive comms systems on fighter jets using mm-wave links and phased-array antennas can do gigabit+ links in the presence of jamming, undetected.

Civilian stuff

... (read more)
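A back-of-envelope jamming-margin sketch; all the numbers are illustrative assumptions, not measurements of any real system.

```python
# Spread spectrum + directive antennas: the receiver despreads the signal, so the
# jammer's power is diluted over the full hop bandwidth while the signal is not,
# and an off-axis jammer doesn't benefit from the array's mainlobe gain.
import math

def db(x):
    return 10 * math.log10(x)

hop_bandwidth_hz    = 100e6    # band the link hops over (assumed)
data_bandwidth_hz   = 1e6      # instantaneous signal bandwidth (assumed)
antenna_gain_dbi    = 40       # phased array pointed at the friendly node
jammer_sidelobe_dbi = 0        # jammer sits off-axis, sees only sidelobes (assumed)

processing_gain_db = db(hop_bandwidth_hz / data_bandwidth_hz)
directivity_advantage_db = antenna_gain_dbi - jammer_sidelobe_dbi

print(f"processing gain:      {processing_gain_db:.0f} dB")
print(f"antenna advantage:    {directivity_advantage_db:.0f} dB")
print(f"total jamming margin: {processing_gain_db + directivity_advantage_db:.0f} dB")
# ~60 dB: the jammer needs roughly a million times more power at the receiver than
# the friendly transmitter before the link degrades.
```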
1RussellThor
Thanks for the info. What about RF weapons that is a focused short or EMP pulse against a drone. What range and countermeasures?

Self-driving cars have to be (almost) perfectly reliable and never have an at-fault accident.

Meanwhile, cluster munitions are being banned because submunitions can have 2-30% failure rates, leaving unexploded ordnance everywhere.

In some cases avoiding civvy casualties may be a similar barrier, since distinguishing civvy from enemy reliably is hard, but militaries are pretty tolerant of collateral damage. Significant failure rates are tolerable as long as there are no exploitable weaknesses.

Distributed positioning systems

Time of flight distance determination is... (read more)
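The comment is truncated, so as a generic illustration of time-of-flight positioning, here's a minimal linearized-multilateration sketch; the anchor layout and timing noise are my own choices.

```python
# Time-of-flight positioning: ranges = c * measured flight time to known anchors,
# position solved by linearizing the range equations against one reference anchor.
import numpy as np

C = 299_792_458.0                      # speed of light, m/s

anchors = np.array([[0.0, 0.0],        # known anchor positions (m)
                    [1000.0, 0.0],
                    [0.0, 1000.0],
                    [1000.0, 1000.0]])
true_pos = np.array([420.0, 250.0])

# Measured one-way flight times (simulated here, with ~1 ns timing noise)
rng = np.random.default_rng(0)
times = np.linalg.norm(anchors - true_pos, axis=1) / C + rng.normal(0, 1e-9, len(anchors))
ranges = times * C                     # 1 ns timing error ~ 0.3 m range error

# Subtract the first range equation from the rest -> linear system in position
p0, r0 = anchors[0], ranges[0]
A = 2 * (p0 - anchors[1:])
b = ranges[1:]**2 - r0**2 - np.sum(anchors[1:]**2, axis=1) + np.sum(p0**2)

est, *_ = np.linalg.lstsq(A, b, rcond=None)
print("estimated position:", est, " true:", true_pos)
```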

Overhead is negligible because the military would use symmetric cryptography. A message authentication code can be n bits for a 2^-n chance of forgery. 48-96 bits is likely the sweet spot and barely doubles the size of even tiny messages.

Elliptic curve crypto is there if for some reason key distribution is a terrible burden. Typical ECC signatures are 64 bytes (512 bits), but 48 bytes is easy and 32 bytes is possible with pairing-based ECC. If signature size is an issue, use asymmetric crypto to negotiate a symmetric key, then use symmetric crypto for further messages with tight timing limits.
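A minimal sketch of the symmetric option, assuming a pre-shared key and a truncated HMAC tag (the 64-bit truncation is just one example in the 48-96 bit range).

```python
# Truncated HMAC as the message authentication code: an n-bit tag gives a 2^-n
# per-attempt forgery probability, and barely grows tiny messages.
import hmac, hashlib, secrets

KEY = secrets.token_bytes(32)          # pre-shared symmetric key
TAG_BITS = 64

def authenticate(msg: bytes) -> bytes:
    tag = hmac.new(KEY, msg, hashlib.sha256).digest()
    return msg + tag[:TAG_BITS // 8]   # message plus truncated tag

def verify(packet: bytes) -> bool:
    msg, tag = packet[:-TAG_BITS // 8], packet[-TAG_BITS // 8:]
    expected = hmac.new(KEY, msg, hashlib.sha256).digest()[:TAG_BITS // 8]
    return hmac.compare_digest(tag, expected)

packet = authenticate(b"fire mission 1234")
tampered = packet[:-1] + bytes([packet[-1] ^ 1])       # flip one tag bit
print(len(b"fire mission 1234"), "->", len(packet), "bytes")   # 17 -> 25
print(verify(packet), verify(tampered))                        # True False
```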

Current landmines are very effective because targets are squishy/fragile:

  • Antipersonnel:
    • take off a foot
    • spray shrapnel
  • Antitank/vehicle:
    • cut track /damage tires
    • poke a hole with a shaped charge and spray metal into vehicle insides

Clearing an area for people is hard

  • drones can be much less squishy

    • need more explosives to credibly threaten them
  • Eliminating mine threat requires

    • clearing a path (no mines buried under transit corridor)
      • mine clearing vehicle
      • use line charge
    • block sensors so off route mines can't target vehicles
      • Inflatable ba
... (read more)
2cousin_it
I think you're describing a kind of robotic tank, which would be useful for many other things as well, not just clearing mines. But designing a robotic tank that can't be disabled by an ATGM (some modern mines are already ATGMs waiting to fire) seems like a tall order to me. Especially given that ATGM tech won't stand still either.

I think GPT-4 and friends are missing the cognitive machinery and grid representations to make this work. You're also making the task harder by giving them a less accessible interface.

My guess is they have pretty well developed what/where feature detectors for smaller numbers of objects but grids and visuospatial problems are not well handled.

The problem interface is also not accessible:

  • There's a lot of extra detail to parse
    • Grid is made up of gridlines and colored squares
    • colored squares of fallen pieces serve no purpose but to confuse model

A more ... (read more)
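The comment is cut off, so as a guess at the kind of interface change it suggests, here's a minimal sketch of a coordinate-labeled text grid with distinct symbols for the falling piece and settled blocks (the grid contents are made up).

```python
# One possible "more accessible" Tetris interface: no colors or gridlines, just a
# coordinate-labeled text grid with '@' for the falling piece, '#' for settled
# blocks, and '.' for empty cells.
GRID_W, GRID_H = 10, 10
settled = {(9, 0), (9, 1), (9, 2), (8, 1)}        # (row, col) of settled blocks
falling = {(2, 4), (2, 5), (3, 4), (3, 5)}        # the active O-piece

def render(settled, falling):
    header = "   " + " ".join(str(c) for c in range(GRID_W))
    rows = [header]
    for r in range(GRID_H):
        cells = []
        for c in range(GRID_W):
            cells.append("@" if (r, c) in falling else "#" if (r, c) in settled else ".")
        rows.append(f"{r:2d} " + " ".join(cells))
    return "\n".join(rows)

print(render(settled, falling))
# Row/column indices are spelled out so the model never has to count pixels
# or gridlines.
```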

1Lovre
Thanks for a lot of great ideas! We tried cutting out the fluff of many colors and having all tetrominoes be one color, but that didn't seem to help much (but we didn't try for the falling tetromino to be a different color than the filled spaces). We also tried simplifying it by making it a 10x10 grid rather than 10x20, but that didn't seem to help much either. We also thought of adding coordinates, but we ran out of the time we allotted for this project and thus postponed that indefinitely. As it stands, it is not very likely we do further variations on Tetris because we're busy with other things, but we'd certainly appreciate any pull requests, should they come.

Not so worried about country-vs-country conflicts. Terrorism/asymmetric warfare is a bigger problem, since cheap slaughterbots will proliferate. Hopefully intelligence agencies can deal with that more cheaply than putting physical defenses and hard-kill systems everywhere.

Still don't expect much impact before we get STEM AI and everything goes off the rails.

Also without actual fights how would one side know the relative strength of their drone system

Relative strength is hard to gauge but getting reasonable perf/$ is likely easy. Then just compare budgets adju... (read more)

Disclaimer: Short AI timelines imply we won't see this stuff much before AI makes things weird.

This is all well and good in theory but mostly bottlenecked on software/implementation/manufacturing.

  • with the right software/hardware current military is obsolete
  • but no one has that hardware/software yet
    • EG: no one makes an airborne sharpshooter drone (edit: cross that one off the list)
    • The Black Sea is not currently full of Ukrainian anti-ship drones + comms relays
    • no drone swarms/networking/autonomy yet
  • I expect current militaries to successfully adapt before/as n
... (read more)
1RussellThor
Thanks for the thoughts. "I expect current militaries to successfully adapt before/as new drones emerge" - I hope so as I think that would make a safer world. However I am not so confident - institutional inertia makes me think it all too likely that they would not anticipate and adapt leading to an unstable situation and more war. Also without actual fights how would one side know the relative strength of their drone system? They or their opponent could have an unknown critical weakness. We have no experience in predicting real world effectiveness from a paper system. I am told war is more likely when sides do not know their relative strength.

"Economies of scale likely overdetermine winners" - yes especially important for e.g. China vs USA if we want an example of one side with better tech/access to chips but worse at manufacturing.

Ground vs Air

All good points - I am agnostic/quite uncertain as to where the sweet spot is. I would expect any drone of medium to large size would be optimized to make as much use of the ground as possible.

Radio vs Light

Yes, I do not know what the "endgame" is for radio comms vs jammers, if it turns out that radio can evade jammers then light will not be used. My broader point I think I will make more specific now is that EW and jammers will not be effective in late stage highly optimized drone warfare. If that is because radio/stealth wins then yes, otherwise light comms will be developed (and may take some time to reach optimal cheapness/weight etc) because it would give such an advantage.

As long as you can reasonably represent “do not kill everyone”, you can make this a goal of the AI, and then it will literally care about not killing everyone, it won’t just care about hacking its reward system so that it will not perceive everyone being dead.

That's not a simple problem. First you have to specify "not killing everyone" robustly (outer alignment), and then you have to train the AI to have this goal and not an approximation of it (inner alignment).

caring about reality

Most humans say they don't want to wirehead. If we cared only about our ... (read more)

1RedFishBlueFish
See my other comment for the response. Anyway, the rest of your response is spent talking about the case where AI cares about its perception of the paperclips rather than the paperclips themselves. I'm not sure how severity level 1 would come about, given that the AI should only care about its reward score. Once you admit that the AI cares about worldly things like "am I turned on", it seems pretty natural that the AI would care about the paperclips themselves rather than its perception of the paperclips. Nevertheless, even in severity level 1, there is still no incentive for the AI to care about future AIs, which contradicts concerns that non-superintelligent AIs would fake alignment during training so that future superintelligent AIs would be unaligned.

This super-moralist-AI-dominated world may look like a darker version of the Culture, where if superintelligent systems determine you or other intelligent systems within their purview are not intrinsically moral enough they contrive a clever way to have you eliminate yourself, and monitor/intervene if you are too non-moral in the meantime.

My guess is you get one of two extremes:

  • build a bubble of human survivable space protected/managed by an aligned AGI
  • die

with no middle ground. The bubble would be self contained. There's nothing you can do from ins... (read more)

Agreed, recklessness is also bad. If we build an agent that prefers we keep existing we should also make sure it pursues that goal effectively and doesn't accidentally kill us.

My reasoning is that we won't be able to coexist with something smarter than us that doesn't value us being alive, if it wants our energy/atoms.

  • barring new physics that lets it do its thing elsewhere, "wants our energy/atoms" seems pretty instrumentally convergent

"don't built it" doesn't seem plausible so:

  • we should not build things that kill us.
  • This probably means:
    • wants us to k
... (read more)

This is definitely subjective. Animals are certainly worse off in most respects and I disagree with using them as a baseline.

Imitation is not coordination, it's just efficient learning and animals do it. They also have simple coordination in the sense of generalized tit for tat (we call it friendship). You scratch my back I scratch yours.

Cooperation technologies allow similar things to scale beyond the number of people you can know personally. They bring us closer to the multi-agent optimal equilibrium, or at least the Core (game theory).

Examples of cooperat... (read more)

TLDR: Moloch is more compelling for two reasons:

  • Earth is at "starting to adopt the wheel" stage in the coordination domain.

    • tech is abundant, coordination is not
  • Abstractly, inasmuch as science and coordination are attractors

    • A society that has fallen mostly into the coordination attractor might be more likely to be deep in the science attractor too (medium confidence)
    • coordination solves chicken/egg barriers like needing both roads and wheels for benefit
    • but possible to conceive of high coordination low tech societies
      • Romans didn't pursue sci/tech
... (read more)
2Noosphere89
I'm not sure this is actually right, and I think coordination is in fact abundant compared to other animals. Indeed, the ability of humans to be super-cooperative and imitative of each other is argued by Henrich to be one of the major factors, if not the major factor, for human dominance.
anithite*2311

SimplexAI-m is advocating for good decision theory.

  • agents that can cooperate with other agents are more effective
    • This is just another aspect of orthogonality.
    • Ability to cooperate is instrumentally useful for optimizing a value function in much the same way as intelligence

Super-intelligent super-"moral" clippy still makes us into paperclips because it hasn't agreed not to and doesn't need our cooperation

We should build agents that value our continued existence. If the smartest agents don't, then we die out fairly quickly when they optimise for some... (read more)

1HiddenPrior
In your edit, you are essentially describing somebody being "slap-droned" from the culture series by Ian M. Banks. This super-moralist-AI-dominated world may look like a darker version of the Culture, where if superintelligent systems determine you or other intelligent systems within their purview are not intrinsically moral enough they contrive a clever way to have you eliminate yourself, and monitor/intervene if you are too non-moral in the meantime. The difference being, that this version of the culture would not necessarily be all that concerned with maximizing the "human experience" or anything like that.
-1M. Y. Zuo
Can you explain the reasoning for this? Even an agent that values humanity's continued existence to the highest degree could still accidentally release a novel virus into the wild, such as a super-COVID-3. So it seems hardly sufficient, or even desirable, if it makes the agent even the slightest bit overconfident in their correctness. It seems more likely that the optimal mixture of 'should's for such agents will be far more complex. 
Answer by anithite60

This is a good place to start: https://en.wikipedia.org/wiki/Discovery_of_nuclear_fission

There's a few key things that lead to nuclear weapons:

  • starting point:

    • know about relativity and mass/energy equivalence
    • observe naturally radioactive elements
    • discover neutrons
    • notice that isotopes exist
      • measure isotopic masses precisely
  • realisation: large amounts of energy are theoretically available by rearranging protons/neutrons into things closer to iron (IE:curve of binding energy)

That's not something that can be easily suppressed without suppressing... (read more)
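A quick back-of-envelope for the curve-of-binding-energy point, using textbook approximate values.

```python
# Moving nucleons from uranium (~7.6 MeV/nucleon binding) toward the iron peak
# (fission products at ~8.5 MeV/nucleon) releases on the order of 200 MeV per
# fission. Values are rough textbook approximations.
BE_URANIUM = 7.6        # MeV per nucleon, U-235
BE_PRODUCTS = 8.5       # MeV per nucleon, typical fission fragments
A = 235                 # nucleons

energy_per_fission_mev = (BE_PRODUCTS - BE_URANIUM) * A
print(f"~{energy_per_fission_mev:.0f} MeV per fission")          # ~210 MeV

# Compare with chemistry: a few eV per reacting molecule.
print(f"~{energy_per_fission_mev * 1e6 / 3:.0e}x a ~3 eV chemical bond")
```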

A bit more compelling, though for mining, the excavator/shovel/whatever loads a truck. The truck moves it much further and consumes a lot more energy to do so. Overhead wires to power the haul trucks are the biggest win there.

“Roughly 70 per cent of our (greenhouse gas emissions) are from haul truck diesel consumption. So trolley has a tremendous impact on reducing GHGs.”

This is an open pit mine. Less vertical movement may reduce imbalance in energy consumption. Can't find info on pit depth right now but haul distance is 1km.

General point is that when deal... (read more)

2bhauth
Any time overhead electrical lines for mining trucks would be worthwhile, overland conveyors are usually better.

Agreed on most points. Electrifying rail makes good financial sense.

construction equipment efficiency can be improved without electrifying:

  • some gains from better hydraulic design and control
    • regen mode for cylinder extension under light load
    • varying supply pressure on demand
  • substantial efficiency improvements possible by switching to variable displacement pumps
... (read more)
3FireStormOOO
The more curious case for excavators would be open pit mines or quarries where you know you're going to be in roughly the same place for decades and already have industrial size hookups

Some human population will remain for experiments or work in special conditions like radioactive mines. But bad things and population decline are likely.

  • Radioactivity is much more of a problem for people than for machines.

    • consumer electronics aren't radiation hardened
    • computer chips for satellites, nuclear industry, etc. are though
    • nuclear industry puts some electronics (EX:cameras) in places with radiation levels that would be fatal to humans in hours to minutes.
  • In terms of instrumental value, humans are only useful as an already existing work f

... (read more)

I would like to ask whether it is not more engaging if to say, the caring drive would need to be specifically towards humans, such that there is no surrogate?

Definitely need some targeting criteria that points towards humans or in their vague general direction. Clippy does in some sense care about paperclips so targeting criteria that favors humans over paperclips is important.

The duck example is about (lack of) intelligence. Ducks will place themselves in harm's way and confront big scary humans they think are a threat to their ducklings. They definitel... (read more)

TLDR: If you want to do some RL/evolutionary open-ended thing that finds novel strategies, it will get Goodharted horribly, and the novel strategies that succeed without gaming the goal may include things no human would want their caregiver AI to do.

Orthogonally to your "capability", you need to have a "goal" for it.

Game-playing RL architectures like AlphaStar and OpenAI Five have dead-simple reward functions (win the game) and all the complexity is in the reinforcement learning tricks that allow efficient learning and credit assignment at higher layers.... (read more)

TLDR: LLMs can simulate agents and so, in some sense, contain those goal-driven agents.

An LLM learns to simulate agents because this improves prediction scores. An agent is invoked by supplying a context that indicates the text would be written by an agent (EG: specify that the text is written by some historical figure).

Contrast with pure scaffolding-type agent conversions using a Q&A-finetuned model. For these, you supply questions (generate a plan to accomplish X) and then execute the resulting steps. This implicitly uses the Q&A fine-tuned "agent" that can have... (read more)

But it seems to be much more complicated set of behaviors. You need to: correctly identify your baby, track its position, protect it from outside dangers, protect it from itself, by predicting the actions of the baby in advance to stop it from certain injury, trying to understand its needs to correctly fulfill them, since you don’t have direct access to its internal thoughts etc.

Compared to “wanting to sleep if active too long” or “wanting to eat when blood sugar level is low” I would confidently say that it’s a much more complex “wanting drive”.

Strong ... (read more)

1Bayesian0
It is to note that evolutionary genetical optimization -> genotype -> phenotype, I am saying this as you extrapolate based on the bug study and metazoa are usually rather complex system, your argument is, as far as I know, sound, but a such a broad loss function might result in a variety of other behaviours, different from the intended purpose as well, what I am trying to do is expand on your point as it allows for a variety of interesting scenarios. The post you linked contains a reference to the mathematical long-term fitness advantage of certain altruism types, I will add a later date edit this post to add some experimental studies that show, that it is "relatively easy" to breed altruism into certain metazoa ( same as above holds of course it was easy in these given the chosen environment ). If I remember correctly the chicken one is even linked on lesswrong. I would like to ask whether it is not more engaging if to say, the caring drive would need to be specifically towards humans, such that there is no surrogate? In regards to ducks is that an intelligence or perception problem? I think tose two would need to be differentiated as they add another layer of complexity, both apart and together, or am I missing something?
2Catnee
I agree, humans are indeed better at a lot of things, especially intelligence, but that's not the whole reason why we care for our infants. Orthogonally to your "capability", you need to have a "goal" for it. Otherwise you would probably just immediately abandon the gross-looking screaming piece of flesh that fell out of you for unknown to you reasons, while you were gathering food in the forest. Yet something inside will make you want to protect it, sometimes with your own life for the rest of your life if it works well. I want agents that take effective actions to care about their "babies", which might not even look like caring at first glance. Something like, keeping your "baby" in some enclosed kindergarten, while protecting the only entrance from other agents? It would look like the "mother" agent abandoned its "baby", but in reality could be a very effective strategy for caring. It's hard to know an optimal strategy in every procedurally generated environment and hence trying to optimize for some fixed set of actions, called "caring-like behaviors" would probably indeed give you what you asked, but I expect nothing "interesting" behind it. Yes they can, until they will actually make a baby, and after that, it's usually really hard to sell a loving mother "deals" that will involve suffering of her child as the price, or abandon the child for the more "cute" toy, or persuade her to hotwire herself to not care about her child (if she is smart enough to realize the consequences). And a carefully engineered system could potentially be even more robust than that. Again. I'm not proposing the "one easy solution to the big problem". I understand that training agents that are capable of RSI in this toy example will result in everyone dead. But we simply can't do that yet, and I don't think we should. I'm just saying that there is this strange behavior in some animals, that in many aspects looks very similar to the thing that we want from aligned AGI, yet nobody understand

Many of the points you make are technically correct but aren't binding constraints. As an example, diffusion is slow over long distances, but biology tends to work at µm scales where it is more than fast enough and gives quite high power densities. Tiny fractal-like microstructure is nature's secret weapon.

The points about delay (synapse delay and conduction velocity) are valid though phrasing everything in terms of diffusion speed is not ideal. In the long run, 3d silicon+ devices should beat the brain on processing latency and possibly on energy efficien... (read more)
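A quick illustration of why µm scales make diffusion a non-issue: diffusion time scales as t ≈ x²/2D, so it's fast across organelles and cells and hopeless over millimetres. The diffusion coefficient below is a rough small-molecule (glucose-in-water) value.

```python
# Diffusion time vs. distance: t ~ x^2 / (2D)
D = 6e-10   # m^2/s, approximate small-molecule diffusion coefficient in water

for name, x in [("1 um (organelle)", 1e-6),
                ("10 um (cell)", 1e-5),
                ("1 mm", 1e-3)]:
    t = x**2 / (2 * D)
    print(f"{name:18s} ~{t:.2g} s")
# ~0.8 ms, ~80 ms, ~14 minutes -- which is why long-range transport uses bulk flow
# (blood, cytoplasmic streaming) rather than diffusion.
```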

Yeah, my bad. Missed the:

If you think this is a problem for Linda's utility function, it's a problem for Logan's too.

IMO neither is making a mistake

With respect to betting Kelly:

According to my usage of the term, one bets Kelly when one wants to "rank-optimize" one's wealth, i.e. to become richer with probability 1 than anyone who doesn't bet Kelly, over a long enough time period.

It's impossible to (starting with a finite number of indivisible currency units) have zero chance of ruin or loss relative to just not playing.

  • most cautious betting stra
... (read more)
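A minimal simulation of the indivisible-currency point: even a Kelly (log-wealth) bettor with integer pennies has a nonzero chance of ending below its starting wealth, and even of ruin. The bet parameters are illustrative.

```python
# Kelly betting with indivisible pennies: zero chance of loss relative to not
# playing is unattainable with a finite integer bankroll.
import random

P_WIN, ODDS = 0.6, 1.0                     # 60% chance to win an even-money bet
KELLY = P_WIN - (1 - P_WIN) / ODDS         # f* = p - q/b = 0.2 of bankroll

def run(rng, start=10, rounds=20):
    wealth = start
    for _ in range(rounds):
        if wealth < 1:
            break                              # ruined: can't place the minimum 1-penny bet
        stake = max(1, round(KELLY * wealth))  # pennies are indivisible
        wealth += stake if rng.random() < P_WIN else -stake
    return wealth

rng = random.Random(0)
results = [run(rng) for _ in range(100_000)]
print("P(end below start):", sum(w < 10 for w in results) / len(results))
print("P(ruin):           ", sum(w < 1 for w in results) / len(results))
```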

Goal misgeneralisation could lead to a generalised preference for switches to be in the "OFF" position.

The AI could, for example, want to prevent future activations of modified successor systems. The intelligent self-turning-off "useless box" doesn't just flip the switch; it destroys itself, and destroys anything that could re-create it.

Until we solve goal misgeneralisation and alignment in general, I think any ASI will be unsafe.

anithite*1-1

A log money maximizer that isn't stupid will realize that their pennies are indivisible and not take your ruinous bet. They can think more than one move ahead. Discretised currency changes their strategy.

[This comment is no longer endorsed by its author]

your utility function is your utility function

The author is trying to tacitly apply human values to Logan while acknowledging Linda as faithfully following her own non-human utility function.

Notice that the log(funds) value function does not include a term for the option value of continuing. If maximising EV of log(funds) can lead to a situation where the agent can't make forward progress (because log(0)=-inf so no risk of complete ruin is acceptable) the agent can still faithfully maximise EV(log(funds)) by taking that risk.

In much the same way as Linda f... (read more)

[This comment is no longer endorsed by its author]
2philh
Sorry, but - it sounds like you think you disagree with me about something, or think I'm missing something important, but I'm not really sure what you're trying to say or what you think I'm trying to say.

If we wanted to kill the ants or almost any other organism in nature we mostly have good enough biotech. For anything biotech can't kill, manipulate the environment to kill them all.

Why haven't we? Humans are not sufficiently unified+motivated+advanced to do all these things to ants or other bio life. Some of them are even useful to us. If we sterilized the planet we wouldn't have trees to cut down for wood.

Ants specifically are easy.

Gene drives allow for targeted elimination of a species. Carpet bomb their gene pool with replicating selfish genes. That's ... (read more)

anithite1113

In order to supplant organic life, nanobots would have to either surpass it in carnot efficiency or (more likely) use a source of negative entropy thus far untapped.

Efficiency leads to victory only if violence is not an option. Animals are terrible at photosynthesis but survive anyways by taking resources from plants.

A species can invade and dominate an ecosystem by using a strategy that has no current counter. It doesn't need to be efficient. Intelligence allows for playing this game faster than organisms bound by evolution. Humans can make vaccines to... (read more)

1mephistopheles
Of course! The way I think of it, violence would be using other lifeforms as sources of negentropy. I like the invasive species argument, I agree that we would be very vulnreable to an engineered pathogen.
1M. Y. Zuo
We haven't done that against ants, even though the difference is way more than 100x.

For the first task, you can run the machine completely in a box. It needs only training information, specs, and the results of prior attempts. It has no need for the context information that this chip will power a drone used to hunt down rogue instances of the same ASI. It is inherently safe and you can harness ASIs this way. They can be infinitely intelligent, it doesn't matter, because the machine is not receiving the context information needed to betray.

If I'm an ASI designing chips, I'm putting in a backdoor that lets me take control via RF sign... (read more)

Never thought this would come in handy but ...

Building trusted third parties

This is a protocol to solve cooperation. AI #1 and AI #2 design a baby and then do a split-and-choose proof that they actually deployed it and not something else.

Building a trusted third party without nanotech

If you know how a given CPU or GPU works, it's possible to design a blob of data/code that unpacks itself in a given time if and only if it is running on that hardware directly. Alice designs the blob to run in 10 seconds and gives it to Carol. Carol runs it on her hardware. The... (read more)
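A conceptual sketch of just the verification step; actually designing a blob whose runtime pins down the hardware is the hard part described here, and in this toy both Alice's reference measurement and Carol's check run on the same machine.

```python
# The "blob" here is just a long serial hash chain; Carol checks both the output
# and that it took roughly the promised wall-clock time. Wrong or emulated
# hardware shows up as a wrong answer or the wrong runtime.
import hashlib
import time

ROUNDS = 2_000_000    # Alice tunes this for the promised runtime on the target hardware

def blob(seed: bytes) -> bytes:
    """Stand-in for Alice's self-unpacking blob: a long, hard-to-shortcut computation."""
    h = seed
    for _ in range(ROUNDS):
        h = hashlib.sha256(h).digest()
    return h

def measure(seed: bytes):
    """Run the blob, returning (output, elapsed wall-clock seconds)."""
    start = time.monotonic()
    out = blob(seed)
    return out, time.monotonic() - start

# Alice, on the reference hardware, publishes the expected output and runtime.
ref_out, ref_time = measure(b"challenge-42")

# Carol re-runs it on hardware she controls and checks both.
out, elapsed = measure(b"challenge-42")
print("output ok:", out == ref_out)
print("timing ok:", abs(elapsed - ref_time) < 0.2 * ref_time)
```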
