Stephen Fowler


I do think the terminology of "hacks" and "lethal memetic viruses" conjures up images of extremely unnatural brain exploits, very distinct from real life, when you mean quite a natural process that we already see some humans going through. We see monks/nuns voluntarily remove themselves from the gene pool and, in sects that prioritise ritual devotion over concrete charity work, they are also minimising their impact on the world.

My prior is that this level of voluntary dedication is difficult to induce, and that much cruder and more effective brain hacks are available.

I expect we would recognise the more lethal brain hacks as improved versions of entertainment/games/pornography/drugs. These already compel some humans to minimise their time spent competing for resources in the physical world. In a direct way, what I'm describing is the opposite of enlightenment. It is prioritising sensory pleasures over everything else.

Playing Petrov was quite engaging and, at times, very stressful. I feel very lucky and grateful that I could take part. I was also located in a different timezone and operating on only a few hours' sleep, which added a lot to the experience!

"I later found out that, during this window, one of the Petrovs messaged one of the mods saying to report nukes if the number reported was over a certain threshold. From looking through the array of numbers that the code would randomly select from, this policy had a ~40% chance of causing a "Nukes Incoming" report (!). Unaware of this, Ray and I made the decision not to count that period."

I don't mind outing myself and saying that I was the Petrov who made the conditional report while the site was down. This occurred during the opening hours of the game and it was unclear to me if generals could unilaterally launch nukes without their team being aware. I'm happy to take a weighted karma penalty for it, particularly as the other Petrov did not take a similar action when faced with (presumably) the same information I had.[1] 

Once it was established that a unilateral first strike by a general would still inform their teammates of the action, and people had staked their reputation on honest reporting, the game was essentially over. From that point, my decisions to report "All Clear" were independent of the number of detected missiles.

I recorded my timestamped thoughts and decision-making process throughout the day, particularly in the hour before making the conditional report. I intend to post a summary[2] of it, but have other commitments in the next few days:

How much would people value seeing a summary of my hour by hour decisions in the next few days over seeing a more digestible summary posted later?

  1. ^

    Prior to the game I outlined what I thought my hypothetical decision making process was going to be, and this decision was also in conflict with that.

  2. ^

    Missile counts, and a few other details, would of course be hidden to preserve the experience for future Petrovs. Please feel free to specify other things you believe should be hidden. 

"But since it is is at least somewhat intelligent/predictive, it can make the move of "acausal collusion" with its own tendency to hallucinate, in generating its "chain"-of-"thought"."

I don't understand what this sentence is trying to say. I understand what an acausal trade is. Could you phrase it more directly?

I can't see why you need the step where the model reasons acausally in order for it to develop a strategy of deceptively hallucinating citations.

What concrete predictions does the "acausal collusion" model of this behaviour make?

"Cyborgism or AI-assisted research that gets up 5x speedups but applies differentially to technical alignment research"

How do you make meaningful progress on this while ensuring it does not also speed up capabilities?

It seems unlikely that a technique exists that is exclusively useful for alignment research and can't be tweaked to help OpenMind develop better optimization algorithms etc.

This is a leak, so keep it between you and me, but the big twist to this year's Petrov Day event is that Generals who are nuked will be forced to watch the film 2012 on repeat.

Edit: Issues 1, 2 and 4 have been partially or completely alleviated in the latest experimental voice model. Subjectively (in <1 hour of use) there seems to be a stronger tendency to hallucinate when pressed on complex topics.

I have been attempting to use ChatGPT's (primarily 4 and 4o) voice feature to have it act as a question-answering partner, a discussion partner, and a receptive listener (separately) for the last year. The topic is usually modern physics.

I'm not going to say that it "works well" but maybe half the time it does work.

The 4 biggest issues that cause frustration:

  1. As you allude to in your post, there doesn't seem to be a way of interrupting the model via voice once it launches into a monologue. The model will also cut you off, and sometimes it will pause mid-response before continuing. These issues seem like they could be fixed by more intelligent scaffolding.

  2. An expert human conversation partner who is excellent at productive conversation will be able to switch seamlessly between playing the role of a receptive listener, a collaborator or an interactive tutor. To have ChatGPT play one of these roles, I usually need to spend a few minutes at the beginning of the conversation specifying how long responses should be, etc. Even after doing this, there is a strong tendency for the model to revert to giving "generic AI slop answers". By this I mean the response begins with "You've touched on a fascinating observation about xyz" and then lists 3 to 5 separate ideas.

  3. The model was trained on text conversations, so it will often output LaTeX equations in a manner totally inappropriate for reading aloud. The audio output is mostly incomprehensible. To work around this I have custom instructions outlining how to verbally and precisely read equations in English (a sketch of the kind of instruction I mean follows this list). These work maybe 25% of the time by default, and about 80% of the time once I spend a few minutes of the conversation going over the rules again.

  4. When talking naturally about complicated topics I will sometimes pause mid-sentence while thinking. Doing this causes ChatGPT to think I've finished talking, so I'm forced to use a series of filler words to keep the sentence going, which impedes my ability to think.
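For anyone wanting to replicate the workaround in issue 3, here is a rough sketch of the flavour of verbalisation rules I give the model. The exact wording and the specific rules below are illustrative examples, not my actual custom instructions.

```python
# Illustrative sketch of verbalisation rules for spoken equations.
# The wording and specific rules are examples only, not my actual custom instructions.
VERBALISATION_RULES = """
When speaking equations aloud, never read out raw LaTeX. Instead:
- Say "x squared", not "x caret two"; say "the square root of", not "backslash sqrt".
- Read fractions as "a over b", and state where a long numerator or denominator ends.
- Read subscripts as "x sub i" and Greek letters by name ("psi", "omega", "h-bar").
- Spell out operators: "the partial derivative with respect to t", "the integral over all space".
Keep each spoken equation to a single sentence where possible.
"""

print(VERBALISATION_RULES)
```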

Reading your posts gives me the impression that we are both loosely pointing at the same object, but with fairly large differences in terminology and formalism. 

While computing exact counter-factuals has issues with chaos, I don't think this poses a problem for my earlier proposal. I don't think it is necessary that the AGI is able to exactly compute the counterfactual entropy production, just that it makes a reasonably accurate approximation.[1]

I think I'm in agreement with your premise that the "constitutionalist form of agency" is flawed. The absence of entropy (or indeed any internal physical resource management) from the canonical LessWrong agent foundations model is clearly a major issue. My loose thinking on this is that Bayesian networks are not a natural description of the physical world at all, although they're an appropriate tool for how certain, very special types of open systems, "agentic optimizers", model the world.

I have had similar thoughts to what has motivated your post on the "causal backbone". I believe "the heterogenous fluctuations will sometimes lead to massive shifts in how the resources are distributed" is something we would see in a programmable, unbounded optimizer[2]. But I'm not sure if attempting to model this as there being a "causal backbone" is the description that is going to cut reality at the joints, due to difficulties with the physicality of causality itself (see work by Jenann Ismael).

  1. ^

    You can construct pathological environments in which the error in the AGI's computation (with limited physical resources) of the counterfactual entropy production is arbitrarily large (and the resulting behaviour is arbitrarily bad). I don't see this as a flaw with the proposal, as the issue of being able to construct pathological environments exists for any safe AGI proposal.

  2. ^

Entropy production partially solves the Strawberry Problem:

Change in entropy production per second (against the counterfactual of not acting) is potentially an objectively measurable quantity that can be used, in conjunction with other parameters specifying a goal, to prevent unexpected behaviour.

Rob Bensinger gives Yudkowsky's "Strawberry Problem" as follows:

How would you get an AI system to do some very modest concrete action requiring extremely high levels of intelligence, such as building two strawberries that are completely identical at the cellular level, without causing anything weird or disruptive to happen?

I understand the crux of this issue to be that it is exceptionally difficult for humans to construct a finite list of caveats or safety guardrails that we can be confident would withstand the optimisation pressure of a superintelligence doing its best to solve this task "optimally". Without care, any measure chosen is Goodharted into uselessness and the most likely outcome is extinction.

Specifying that the predicted change in entropy production per second of the local region must remain within some ε of the counterfactual in which the AGI does not act at all automatically excludes almost all unexpected strategies that involve high levels of optimisation.
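To state the idea slightly more formally (the notation is mine, and the hard threshold is just one way to operationalise it):

```latex
% \dot{S}_{\mathrm{act}}(t): predicted entropy production rate of the local region if the AGI executes its plan
% \dot{S}_{\mathrm{null}}(t): counterfactual rate if the AGI does not act at all
% A plan is admissible only if, over the whole planning horizon [0, T],
\left| \dot{S}_{\mathrm{act}}(t) - \dot{S}_{\mathrm{null}}(t) \right| \leq \epsilon
\quad \text{for all } t \in [0, T].
```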

I conjecture that the entropy production "budget" needed for an agent to perform economically useful tasks is well below the amount needed to cause an existential disaster. 

Another application: directly monitoring the entropy production of an agent engaged in a generalised search upper bounds the number of iterations of that search (and hence the optimisation pressure). This bound appears to be independent of the technological implementation of the search.[1]
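The kind of bound I have in mind comes from Landauer's principle; the per-iteration erasure cost b below is an assumed property of the search implementation, not something I've derived:

```latex
% Landauer's principle: irreversibly erasing one bit produces at least k_B \ln 2 of entropy.
% If each iteration of the search irreversibly erases at least b bits, then a measured
% total entropy production \Delta S over a run bounds the number of iterations:
N_{\mathrm{iter}} \;\leq\; \frac{\Delta S}{b \, k_B \ln 2}
```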

  1. ^

    On a less optimistic note, this bound sits many orders of magnitude above what today's computers can actually achieve, as they operate far from the thermodynamic limit.

"Workers regularly trade with billionaires and earn more than $77 in wages, despite vast differences in wealth."

Yes, because the worker has something the billionaire wants (their labor) and so is able to sell it. Yudkowsky's point about trying to sell an Oreo for $77 is that a billionaire isn't automatically going to want to buy something off you if they don't care about it (and neither would an ASI).

"I'm simply arguing against the point that smart AIs will automatically turn violent and steal from agents who are less smart than they are, unless they're value aligned. This is a claim that I don't think has been established with any reasonable degree of rigor."

I completely agree, but I'm not sure anyone is arguing that smart AIs would turn violent unless it was in their strategic interest to do so.

I think I previously overvalued the model in which laziness/motivation/mood are primarily internal states that require internal solutions. For me, this model also generated a lot of guilt, because failing to be productive was a personal failure.

But is the problem a lack of "willpower" or is your brain just operating sub-optimally because you're making a series of easily fixable health blunders?

Are you eating healthy?
Are you consuming large quantities of sugar?
Are you sleeping with your phone on your bedside table?
Are you deficient in any vitamins?
Is your sleep trash because you have been consuming alcohol?
Are you waking up at a consistent time?
Are you doing at least some exercise?

I find time spent addressing these and similar deficits is usually more productive than trying to think your way out of a laziness spiral.

None of this is medical advice. My experience may not be applicable to you. Do your own research. I ate half a tub of ice cream 30 minutes ago. 
