Actually, there are ~100 rows in the dataset where Room2=4, Room6=8, and Room3=5=7.
I actually did look at that (at least some subset with that property) at some point, though I didn't think of (or get around to) re-examining it with my later understanding.
In general, I think this is a realistic thing to occur: 'other intelligent people optimizing around this data' is one of the things that makes real-world data so complicated as well.
Indeed, I am not complaining! It was a good, fair difficulty to deal with.
That being said, t...
The biggest problem with AIXI, in my view, is the reward system: it cares about the future directly, whereas to have any reasonable hope of alignment, an AI in my view needs to care about the future only via what humans would want about the future (so that any reference to the future is encapsulated in the "what do humans want?" aspect).
I.e. the question it needs to be answering is something like "all things considered (including the consequences of my current action on the future, as well as taking into account my possible future actions) what would ...
I think that it's likely to take longer than 10000 years, simply because of the logistics (not the technology development, which the AI could do fast).
The gravitational binding energy of the sun is something on the order of 20 million years' worth of its energy output. OK, half of the needed energy is already present as thermal energy, and you don't need to move every atom to infinity, but you still need a substantial fraction of that. And while you could perhaps generate many times more energy than the solar output by various means, I'd guess you'd have to deal with inefficiencies and lots of waste heat if you try to do it really fast. Maybe if you're smart enough you can make going fast work well enough to be worth it, though?
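For reference, the back-of-envelope numbers behind that estimate (standard textbook values, order of magnitude only):

$$E_{\text{bind}} \sim \frac{3}{5}\frac{G M_\odot^2}{R_\odot} \approx \frac{0.6 \times (6.7\times 10^{-11}) \times (2.0\times 10^{30})^2}{7.0\times 10^{8}} \approx 2.3\times 10^{41}\ \text{J}$$

$$t \sim \frac{E_{\text{bind}}}{L_\odot} \approx \frac{2.3\times 10^{41}\ \text{J}}{3.8\times 10^{26}\ \text{W}} \approx 6\times 10^{14}\ \text{s} \approx 2\times 10^{7}\ \text{yr}$$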
I feel like a big part of what tripped me up here was an inevitable part of the difficulty of the scenario that in retrospect should have been obvious. Specifically, if there is any variation in the difficulty of an encounter that is known to the adventurers in advance, then the score contribution of that encounter type along the paths actually taken is less than its difficulty as estimated from what best predicts which path is taken (because the adventurer takes the path when the encounter is weak, but avoids it when it's strong).
So, I wound up with an epicycle saying hags and or...
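A toy illustration of that selection effect (purely made-up numbers, just to show the direction of the bias, not the actual scenario mechanics):

```julia
using Statistics, Random

Random.seed!(0)

# Hypothetical setup: encounter difficulty varies uniformly from 1 to 10 and is
# visible to the adventurer in advance; they only take the path when it's below 6.
difficulties = rand(1:10, 100_000)
taken = difficulties .< 6

println("mean difficulty overall:        ", mean(difficulties))        # ≈ 5.5
println("mean difficulty on paths taken: ", mean(difficulties[taken])) # ≈ 3.0
```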
It's looking like I won't have figured this out before the time limit despite the extra time, so here's what I have so far:
I'm modeling this as follows, but haven't fully worked it out, and am getting complications/hard-to-explain dungeons that suggest it might not be exactly correct
I feel like this discussion could do with some disambiguation of what "VNM rationality" means.
VNM assumes consequentialism. If you define consequentialism narrowly, this has specific results in terms of instrumental convergence.
You can redefine what constitutes a consequence arbitrarily. But, along the lines of what Steven Byrnes points out in his comment, redefining this can get rid of instrumental convergence. In the extreme case you can define a utility function for literally any pattern of behaviour.
When you say you feel like you can't be dutch b...
You can also disambiguate between
a) computation that actually interacts in a comprehensible way with the real world and
b) computation that has the same internal structure at least momentarily but doesn't interact meaningfully with the real world.
I expect that (a) can usually be uniquely pinned down to a specific computation (probably in both senses (1) and (2)), while (b) can't.
But I also think it's possible that the interactions, while important for establishing the disambiguated computation that we interact with, are not actually crucial to i...
The interpreter, if it would exist, would have complexity. The useless unconnected calculation in the waterfall/rock, which could be but isn't usually interpreted, also has complexity.
Your/Aaronson's claim is that only the fully connected, sensibly interacting calculation matters. I agree that this calculation is important - it's the only type we should probably consider from a moral standpoint, for example. And the complexity of that calculation certainly seems to be located in the interpreter, not in the rock/waterfall.
But in order to claim t...
But this just depends on how broad this set is. If it contains two brains, one thinking about the roman empire and one eating a sandwich, we're stuck.
I suspect that if you do actually follow Aaronson (as linked by Davidmanheim) to extract a unique efficient calculation that interacts with the external world in a sensible way, that unique efficient externally-interacting calculation will end up corresponding to a consistent set of experiences, even if it could still correspond to simulations of different real-world phenomena.
But I also don't think that consistent set of experiences necessarily has to be a single experience! It could be multiple experiences unaware of each other, for example.
The argument presented by Aaronson is that, since it would take as much computation to convert the rock/waterfall computation into a usable computation as it would to just do the usable computation directly, the rock/waterfall isn't really doing the computation.
I find this argument unconvincing, as we are talking about a possible internal property here, and not about the external relation with the rest of the world (which we already agree is useless).
(edit: whoops missed an 'un' in "unconvincing")
Considering all the layers of convention and interpretation between the physics of a processor and the process it represents, it seems unlikely to me that the alien would be able to describe the simulacra. The alien is therefore unable to specify the experience being created by the cluster.
I don't think this follows. Perhaps the same calculation could simulate different real world phenomena, but it doesn't follow that the subjective experiences are different in each case.
...If computation is this arbitrary, we have the flexibility to interpret any physical sy
As with OP, I strongly recommend Aaronson, who explains why waterfalls aren't doing computation in ways that refute the rock example you discuss: https://www.scottaaronson.com/papers/philos.pdf
It is a fact about the balls that one ball is physically continuous with the ball previously labeled as mine, while the other is not. It is a fact about our views on the balls that we therefore label that ball, which is physically continuous, as mine and the other not.
...And then suppose that one of these two balls is randomly selected and placed in a bag, with another identical ball. Now, to the best of your knowledge there is 50% probability that your ball is in the bag. And if a random ball is selected from the bag, there is 25% chance that it's yours.
So a
Ah, I forgot. You use assumptions where you don't accumulate the winnings between the different times Sleeping Beauty agrees to the bet.
Well, in that case, if the thirder has certain beliefs about how to handle the situation, you may actually be able to money pump them. And it seems that you expect those beliefs.
My point of view, if adopting the thirder perspective[1], would be for the thirder to treat this situation using different beliefs. Specifically, consider what counterfactually might happen if Sleeping Beauty gave different answers in d...
The issue, to me, is not whether they are distinguishable.
The issues are:
and:
Imagine the statement: In world 1, "I" will wake up as copy A. In world 2, "I" will wake up as copy B. How are world 1 and world 2 actually different?
Answer: they aren't different. It's just that in world 1, I drew a box a...
Hmm, you're right. Your math is wrong for the reason in my above comment, but the general form of the conclusion would still hold with different, weaker numbers.
The actual, more important issue relates to the circumstances of the bet:
If each awakening has an equal probability of receiving the bet, then receiving it doesn't provide any evidence to Sleeping Beauty, but the thirder conclusion is actually rational in expectation, because the bet occurs more times in the high-awakening cases.
If the bet would not be provided equally to all awakenings, then a thirder would update on receiving the bet.
I've been trying to post this comment a bunch of times; no quotation from the post this time, in case that's the issue:
No, a thirder would not treat those possibilities as equiprobable. A thirder would instead treat the coin toss outcome probabilities as a prior, and weight the possibilities accordingly. Thus H1 would be weighted twice as much as any of the individual TH or TT possibilities.
This actually sounds about right. What's paradoxical here?
Not that it's necessarily inconsistent, but in my view it does seem to be pointing out an important problem with the assumptions (hence indeed a paradox if you accept those false assumptions):
(ignore this part, it is just a rehash of the path dependence paradigm. It is here to show that I am not complaining about the math, but about its relation to reality):
Imagine you are going to be split (once). It is factually the case that there are going to be two people with memories, etc. consistent with hav...
Presumably the 'Orcs on our side' refers to the Soviet Union.
I think that, if that's what he meant, he would not have referred to his son as "amongst the Urukhai" - he wouldn't have been among Soviet troops. I think it is referring back to turning men and elves into orcs - the orcs are people who have a mindset he doesn't like, presumably to do with violence.
I now care about my observations!
My observations are as follows:
At the current moment "I" am the cognitive algorithm implemented by my physical body that is typing this response.
Ten minutes from now "I" will be the cognitive algorithm of a green tentacled alien from beyond the cosmological horizon.
You will find that there is nothing contradictory about this definition of what "I" am. What "I" observe 10 minutes from now will be fully compatible with this definition. Indeed, 10 minutes from now, "I" will be the green tentacled alien. I will have no me...
"Your observations"????
By "your observations", do you mean the observations obtained by the chain of cognitive algorithms, altering over time and switching between different bodies, that the process in 4 is dealing with? Because that does not seem to me to be a particularly privileged or "rational" set of observations to care about.
Here are some things one might care about:
Musk did also express concern about DeepMind making Hassabis the effective emperor of humanity, which seems much stranger - Hassabis' values appear to be quite standard humanist ones, so you'd think having him in charge of a project with the clear lead would be a best-case scenario for anything other than being in charge yourself.
It seems the concern was that DeepMind would create a singleton, whereas their vision was for many people (potentially with different values) to have access to it. I don't think that's strange at all - it's only strange if y...
Neither of those would (immediately) lead to real world goals, because they aren't targeted at real world state (an optimizing compiler is trying to output a fast program - it isn't trying to create a world state such that the fast program exists). That being said, an optimizing compiler could open a path to potentially dangerous self-improvement, where it preserves/amplifies any agency there might actually be in its own code.
Some interesting points there. The lottery ticket hypothesis does make it more plausible that side computations could persist longer if they come to exist outside the main computation.
Regarding the homomorphic encryption thing: yes, it does seem that it might be impossible to make small adjustments to the homomorphically encrypted computation without wrecking it. Technically I don't think that would be a local minimum since I'd expect the net would start memorizing the failure cases, but I suppose that the homomorphic computation combined with memorization...
Adversarial examples exist in simple image recognizers.
My understanding is that these are explicitly and intentionally trained (wouldn't come to exist naturally under gradient descent on normal training data) and my expectation is that they wouldn't continue to exist under substantial continued training.
...We could imagine it was directly optimizing for something like token prediction. It's optimizing for tokens getting predicted. But it is willing to sacrifice a few tokens now, in order to take over the world and fill the universe with copies of itself
Gradient descent doesn't just exclude some part of the neurons; it automatically checks everything for improvements. Would you expect some part of the net to be left blank, because "a large neural net has a lot of spare neurons"?
Besides, the parts of the net that hold the capabilities and the parts that do the paperclip maximizing needn't be easily separable. The same neurons could be doing both tasks in a way that makes it hard to do one without the other.
Keep in mind that the neural net doesn't respect the lines we put on it. We can draw a line and say "...
The proposed paperclip maximizer is plugging into some latent capability such that gradient descent would more plausibly cut out the middleman. Or rather, the part of the paperclip maximizer that is doing the discrimination as to whether the answer is known or not would be selected, and the part that is doing the paperclip maximization would be cut out.
Now that does not exclude a paperclip maximizer mask from existing - if the prompt given would invoke a paperclip maximizer, and the AI is sophisticated enough to have the ability to create a pap...
Gradient descent creates things which locally improve the results when added. Any variations on this, that don't locally maximize the results, can only occur by chance.
So you have this sneaky extra thing that looks for a keyword and then triggers the extra behaviour, and all the necessary structure to support that behaviour after the keyword. To get that by gradient descent, you would need one of the following:
a) it actually improves results in training to add that extra structure starting from not having it.
or
b) this structure can plausibly come int...
I agree up to "and could be a local minimum of prediction error" (at least, that it plausibly could be).
If the paperclip maximizer has a very good understanding of the training environment maybe it can send carefully tuned variations of the optimal next token prediction so that gradient descent updates preserve the paperclip-maximization aspect. In the much more plausible situation where this is not the case, optimization for next token predictions amplifies the parts that are actually predicting next tokens at the expense of the useless extra ...
One learning experience for me here was trying out LLM-empowered programming after the initial spreadsheet-based solution finding. Claude enables quickly writing (from my perspective as a non-programmer, at least) even a relatively non-trivial program. And you can often ask it to write a program that solves a problem without specifying the algorithm, and it will actually give something useful... but if you're not asking for something conventional, it might be full of bugs - not just in the write-up but also in the algorithm chosen. I don't object, per se, t...
Thanks aphyer, this was an interesting challenge! I think I got lucky with finding the
power/speed mechanic early - the race-class matchups
really didn't, I think, in principle have enough info on their own to support a reliable conclusion, but they enabled me to make a genre-savvy guess which I could refine based on other info - in terms of scenario difficulty, though, I think it could have been deduced in a more systematic way by e.g.
looking at item and level effects for mirror matches.
abstractapplic and Lorxus's discovery of
persistent
Yes, for that reason I was never considering a sphere for my main idea with relatively close wires (though the 2-ring alternative without close wires would support a surface that would be topologically a sphere). What I was actually imagining was this:
A torus, with superconducting wires wound diagonally. The interior field goes around the ring and supports against collapse of the cross section of the ring, the exterior field is polar and supports against collapse of the ring. Like a conventional superconducting energy storage system:
I suppose this do...
You can use magnetic instead of electrostatic forces as the force holding the surface out against air pressure. One disadvantage is that you need superconducting cables fairly spread out* over the airship's surface, which imposes some cooling requirements. An advantage is square-cube law means it scales well to large size. Another disadvantage is that if the cooling fails it collapses and falls down.
*technically you just need two opposing rings, but I am not so enthusiastic about draping the exterior surface over long distances as it scales up, and it probably does need a significant scale
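For scale, if magnetic pressure alone has to support one atmosphere at the surface (my own back-of-envelope, not a worked design):

$$\frac{B^2}{2\mu_0} \gtrsim P_{\text{atm}} \approx 1.0\times 10^{5}\ \text{Pa} \quad\Rightarrow\quad B \gtrsim \sqrt{2\mu_0 P_{\text{atm}}} = \sqrt{2 \times 4\pi\times 10^{-7} \times 1.0\times 10^{5}} \approx 0.5\ \text{T}$$

0.5 T is modest for superconducting coils, though whether the geometry lets you get that field where you need it is a separate question.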
Now using Julia with Claude to look at further aspects of the data, particularly in view of other commenters' observations:
First, thanks to SarahSrinivasan for the key observation that the data is organized into tournaments and non-tournament encounters. The tournaments skew the overall data to higher winrate gladiators, so restricting to the first round is essential for debiasing this (todo: check what is up with non-tournament fights).
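The restriction itself is straightforward; a sketch of what I mean (file and column names here are hypothetical, not the actual dataset's):

```julia
using CSV, DataFrames

# Hypothetical file/column names, just to illustrate the debiasing step.
df = CSV.read("gladiators.csv", DataFrame)

# Restrict to first-round tournament fights, so that later rounds don't
# over-represent the gladiators who keep winning. Non-tournament rows
# (missing round) are set aside for now.
first_round = filter(row -> coalesce(row.tournament_round == 1, false), df)
```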
Also, thanks to abstractapplic and Lorxus for pointing out that there are some persistent high-level gladiators. It seems
You may well be right, I'll look into my hyperparameters. I looked at the code Claude had generated with my interference and that greatly lowered my confidence in them, lol (see edit to this comment).
Inspired by abstractapplic's machine learning and wanting to get some experience in Julia, I got Claude (3.5 Sonnet) to write me an XGBoost implementation in Julia. It took a long time, especially with some bugfixing (it took a while to find that a feature matrix was the wrong shape - a problem with insufficient type explicitness, I think). Still way, way faster than doing it myself! Not sure I'm learning all that much Julia, but I am learning how to get Claude to write it for me, I hope.
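For anyone curious, the eventual call is roughly of this shape (from memory and much simplified, so the exact keyword names may need adjusting; the data here is placeholder, with X as observations-in-rows and y as 0/1 win labels):

```julia
using XGBoost

# Placeholder data just to show the shape of the call.
X = rand(1000, 6)              # feature matrix, one row per fight
y = Float64.(rand(0:1, 1000))  # 1 = win, 0 = loss

bst = xgboost((X, y);
              num_round = 100,
              max_depth = 4,
              eta = 0.1,
              objective = "binary:logistic")

p = predict(bst, X)            # predicted win probabilities
```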
Anyway, I used a simple model that
only takes into account 8 * sign(speed di
Very interesting, this would certainly cast doubt on
my simplified model
But so far I haven't been noticing
any effects not accounted for by it.
After reading your comments I've been getting Claude to write up an XGBoost implementation for me, I should have made this reply comment when I started, but will post my results under my own comment chain.
I have not yet tried (but should try) to duplicate your findings (or fail to do so) - I haven't been quite testing the same thing.
I don't think this is correct:
"My best guess about why my solution works (assuming it does) is that the "going faster than your opponent" bonus hits sharply diminishing returns around +4 speed"
In my model
There is a sharp threshold at +1 speed, so returns should sharply diminish after +1 speed
in fact in the updated version of my model
There is no effect of speed beyond the threshold (speed effect depends only on sign(speed difference))
I think the discrepancy might possibly relate to this:
"Iterated all possible matchups, then all possible loadouts (modulo not
updated model for win chance:
I am currently modeling the win ratio as dependent on a single number, the effective power difference. The effective power difference is the power difference plus 8*sign(speed difference).
Power and speed are calculated as:
Power = level + gauntlet number + race power + class power
Speed = level + boots number + race speed + class speed
where the race speed and power contributions are determined by position on the spectrum:
Dwarf - Human - Elf
with each step to the right increasing speed by 3 and lowering power by 3,
and class speed and power contributions are
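A minimal sketch of the above in code (the class contributions are left as placeholder arguments since I haven't pinned them down, and the race numbers are relative to Human as an arbitrary baseline - only differences matter):

```julia
# Relative race contributions (Human as zero point; each step Dwarf -> Human -> Elf
# trades 3 power for 3 speed).
race_power = Dict("Dwarf" => 3, "Human" => 0, "Elf" => -3)
race_speed = Dict("Dwarf" => -3, "Human" => 0, "Elf" => 3)

power(level, gauntlets, race, class_power) = level + gauntlets + race_power[race] + class_power
speed(level, boots, race, class_speed)     = level + boots + race_speed[race] + class_speed

# The single number the win ratio is modeled as depending on:
effective_power_diff(p_a, s_a, p_b, s_b) = (p_a - p_b) + 8 * sign(s_a - s_b)
```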
On the bonus objective:
I didn't realize that the level 7 Elf Ninjas were all one person or that the boots +4 were always with a level 7 (as opposed to any level) Elf Ninja. It seems you are correct: there are 311 cases, of which the first 299 all have the boots of speed 4 and gauntlets 3, with only the last 12 having boots 2 and gauntlets 3 (likely post-theft). It seems to me that they appear both as red and black, though.
Thanks aphyer. My analysis so far and proposed strategy:
After initial observations that e.g. higher numbers are correlated with winning, I switched to mainly focus on race and class, ignoring the numerical aspects.
I found major class-race interactions.
It seems that for matchups within the same class, Elves are great, tending to beat Dwarves consistently across all classes and beating Humans even harder, while Humans also beat Dwarves pretty hard in same-class matchups.
Within same-race matchups there are also fairly consistent patterns: Fencers tend to beat Ra
You mentioned a density of steel of 7.85 g/cm^3 but used a value of 2.7 g/cm^3 in the calculations.
BTW this reminds me of:
https://www.energyvault.com/products/g-vault-gravity-energy-storage
I was aware of them quite a long time ago (the original form was concrete blocks lifted to form a tower by cranes) but was skeptical since it seemed obviously inferior to using water capital cost wise and any efficiency gains were likely not worth it. Reading their current site:
...The G-VAULT™ platform utilizes a mechanical process of lifting and lowering composite blocks o
I had trouble figuring out how to respond to this comment at the time because I couldn't figure out what you meant by "value alignment" despite reading your linked post. After reading your latest post, Conflating value alignment and intent alignment is causing confusion, I still don't know exactly what you mean by "value alignment" but at least can respond.
What I mean is:
If you start with an intent aligned AI following the most surface level desires/commands, you will want to make it safer and more useful by having common sense, "do what I mean", etc. As lo...
I think this post is drawing a sharp distinction on what is really a continuum; any "intent aligned" AI becomes more safe and useful as you add more "common sense" and "do what I mean" capability to it, and at the limit of this process you get what I would interpret as alignment to the long-term, implicit deep values (of the entity or entities the AI started out intent aligned to).
I realize other people might define "alignment to the long term, implicit deep values" differently, such that it would not be approached by such a process, but currently think the...
I don't think intent aligned AI has to be aligned to an individual - it can also be intent aligned to humanity collectively.
One thing I used to be concerned about is that collective intent alignment would be way harder than individual intent alignment, making someone validly have an excuse to steer an AI to their own personal intent. I no longer think this is the case. Most issues with collective intent I see as likely also affecting individual intent (e.g. literal instruction following vs extrapolation). I see two big issues that might make collecti...
It feels to me like this post is treating AIs as functions from a first state of the universe to a second state of the universe. Which, in a sense, anything is... but I think the tendency to simplification happens internally, where they operate more as functions from (digital) inputs to (digital) outputs. If you view an AI as a function from a digital input to a digital output, I don't think goals targeting specific configurations of the universe are simple at all, and I don't think decomposability over space/time/possible worlds are criteria that would lead to something simple.
That's a high level of hypothetical harm that they are ruling out (~2 IQ points?). I would take the dental harms many times over to avoid that much cognitive ability loss.
They really rule out much more than that: −0.14 is from their worst-case: