All of simon's Comments + Replies

We may thus rule out negative effects larger than 0.14 standard deviations in cognitive ability if fluoride is increased by 1 milligram/liter (the level often considered when artificially fluoridating the water).

 

That's a high level of hypothetical harm that they are ruling out (~2 IQ points?). I would take the dental harms many times over to avoid that much cognitive ability loss.

They really rule out much more than that; the −0.14 is from their worst case:

Looking at the estimates, they are very small and often not statistically-significantly different from zero. Sometimes the estimates are negative and sometimes positive, but they are always close to zero. If we take the largest negative point estimates (−0.0047, col. 1) and the largest standard error for that specification (0.0045), the 95% confidence interval would be −0.014 to 0.004. We may thus rule out negative effects larger than 0.14 standard deviations in cognitive ability i

... (read more)
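
(A quick check of the confidence-interval arithmetic in the quoted passage, as a minimal sketch; the further scaling to the 0.14 SD per 1 mg/L figure is the paper's own unit conversion, which I'm taking as given:)

```python
# Check of the quoted 95% CI: point estimate -0.0047, standard error 0.0045.
point, se = -0.0047, 0.0045
lo, hi = point - 1.96 * se, point + 1.96 * se
print(f"95% CI: ({lo:.3f}, {hi:.3f})")  # roughly (-0.014, 0.004), as quoted
```
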
4quinces6l
Isn't the fact that fluoride toothpaste and twice-daily brushing are common likely to mean there wouldn't be any dental harm from non-fluoridated water? I've not done a deep dive on fluoride, but my rough thinking is (a) it's possible it has harms, (b) most people use fluoride/xylitol in toothpaste, so the benefits of fluoride in water supplies seem not only negligible but likely non-existent in this day and age.

actually, there are ~100 rows in the dataset where Room2=4, Room6=8, and Room3=5=7.

I actually did look at that (at least some subset with that property) at some point, though I didn't (think of/ get around to) re-looking at it with my later understanding.

In general, I think this is a realistic thing to occur: 'other intelligent people optimizing around this data' is one of the things that causes the most complicated things to happen in real-world data as well.

Indeed, I am not complaining! It was a good, fair difficulty to deal with. 

That being said, t... (read more)

4aphyer
Mostly fair, but tiers did have a slight other impact in that they were used to bias the final room: Clay Golem and Hag were equally more-likely to be in the final room, both less so than Dragon and Steel Golem but more so than Orcs and Boulder Trap.

The biggest problem with AIXI, in my view, is the reward system - it cares about the future directly, whereas to have any reasonable hope of alignment an AI needs to care about the future only via what humans would want about the future (so that any reference to the future is encapsulated in the "what do humans want?" aspect).

I.e. the question it needs to be answering is something like "all things considered (including the consequences of my current action on the future, as well as taking into account my possible future actions) what would ... (read more)

7Cole Wyeth
I am currently writing a paper on alternative utility functions for AIXI. Early steps in this direction have been taken for example here by @Anja and here by @AlexMennen - as far as I know the only serious published example is Laurent Orseau's knowledge-seeking agent.  The reward-seeking formulation of AIXI is a product of its time and not a fundamental feature/constraint - any "continuous, l.s.c." utility function is fine - the details will be formulated in my paper.  Actually choosing that utility function to be aligned with human values is ~equivalent to the alignment problem. AIXI does not solve it, but does "modularize" it to some extent.  

I think that it's likely to take longer than 10000 years, simply because of the logistics (not the technology development, which the AI could do fast).

The gravitational binding energy of the Sun is something on the order of 20 million years' worth of its energy output. OK, half of the needed energy is already present as thermal energy, and you don't need to move every atom to infinity, but you still need a substantial fraction of that. And while you could perhaps generate many times more energy than the solar output by various means, I'd guess you'd have to deal with inefficiencies and lots of waste heat if you try to do it really fast. Maybe if you're smart enough you can make going fast work well enough to be worth it, though?
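
(A rough check of that figure, as a sketch using the uniform-density approximation E = 3GM^2/(5R); the real Sun is centrally condensed, so its actual binding energy is a few times larger, which only strengthens the point:)

```python
G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
M = 1.989e30    # solar mass, kg
R = 6.957e8     # solar radius, m
L = 3.828e26    # solar luminosity, W

E_bind = 3 * G * M**2 / (5 * R)   # ~2.3e41 J (uniform-density approximation)
years = E_bind / L / 3.156e7      # seconds of output -> years
print(f"{E_bind:.1e} J ~ {years/1e6:.0f} million years of solar output")  # ~19 million
```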

8quetzal_rainbow
If you can use 1kg of hydrogen to lift x>1kg of hydrogen using proton-proton fusion, you are getting exponential buildup, limited only by "how many proton-proton reactors you can build in the Solar system" and "how willing you are to actually build them", and you can use exponential buildup to create all necessary infrastructure.
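
(A rough check of the x > 1 claim, as a sketch that ignores reactor and lifting inefficiencies and assumes the hydrogen is lifted from the solar surface to escape, with p-p fusion releasing about 0.7% of rest mass as energy:)

```python
G, M_sun, R_sun = 6.674e-11, 1.989e30, 6.957e8

lift_per_kg = G * M_sun / R_sun        # ~1.9e11 J to lift 1 kg from the surface to infinity
fusion_per_kg = 0.007 * (3.0e8) ** 2   # ~6.3e14 J released by fusing 1 kg of hydrogen

print(fusion_per_kg / lift_per_kg)     # ~3000: each kg fused can in principle lift thousands of kg
```
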
2Charlie Steiner
I think if you want to go fast, and you can eat the rest of the solar system, you can probably make a huge swarm of fusion reactors to help blow matter off the sun. Let's say you can build 10^11-watt reactors that work in space. Then you need about 10^15 of them to match the sun. If each is 10^6 kg, this is about 10^-4 of Mercury's mass.
9jessicata
I'm not sure what the details would look like, but I'm pretty sure ASI would have enough new technologies to figure something out within 10,000 years. And expending a bunch of waste heat could easily be worth it, if having more computers allows sending out Von Neumann probes faster / more efficiently to other stars. Since the cost of expending the Sun's energy has to be compared with the ongoing cost of other stars burning.

I feel like a big part of what tripped me up here was an inevitable part of the difficulty of the scenario that in retrospect should have been obvious. Specifically, if there is any variation in the difficulty of an encounter that is known to the adventurers in advance, the score contribution of an encounter type on the paths actually taken is less than the difficulty of the encounter as estimated by what best predicts the path taken (because the adventurers take the path when the encounter is weak, but avoid it when it's strong).
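
(A minimal simulation sketch of that selection effect, with made-up difficulty numbers:)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two candidate rooms, each containing the same encounter type whose difficulty
# varies and is known to the adventurers in advance (hypothetical numbers).
diff_a = rng.normal(10, 3, n)
diff_b = rng.normal(10, 3, n)

# The adventurers take whichever room is currently weaker.
taken = np.minimum(diff_a, diff_b)

print(diff_a.mean())  # ~10.0: the encounter's unconditional average difficulty
print(taken.mean())   # ~8.3: the average difficulty actually contributing to scores
```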

So, I wound up with an epicycle saying hags and or... (read more)

4aphyer
Yes, that's a sneaky part of the scenario.  In general, I think this is a realistic thing to occur: 'other intelligent people optimizing around this data' is one of the things that causes the most complicated things to happen in real-world data as well. Christian Z R had a very good comment on this, where they mentioned looking at the subset of dungeons where Rooms 2 and 4 had the same encounter, or where Rooms 6 and 8 had the same encounter, to factor out the impact of intelligence and guarantee 'they will encounter this specific thing'. (Edited to add: actually, there are ~100 rows in the dataset where Room2=4, Room6=8, and Room3=5=7.  This isn't enough to get firm analysis on, but it could have served as a very strong sanity-check opportunity where you can look at a few dungeons where you know exactly what the route is.)
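
(For concreteness, that subset could be pulled out with something like the following; the column names Room1..Room8 and the filename are hypothetical, not the actual dataset schema:)

```python
import pandas as pd

df = pd.read_csv("dungeons.csv")  # hypothetical filename

# Dungeons where the adventurers' route choices can't change which encounters they face.
fixed_route = df[
    (df["Room2"] == df["Room4"])
    & (df["Room6"] == df["Room8"])
    & (df["Room3"] == df["Room5"])
    & (df["Room5"] == df["Room7"])
]
print(len(fixed_route))  # ~100 rows
```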

It's looking like I won't have figured this out before the time limit despite the extra time; what I have so far:

I'm modeling this as follows, but haven't fully worked it out, and am getting complications (hard-to-explain dungeons) that suggest it might not be exactly correct:

  • the adventurers go through the dungeons using rightwards and downwards moves only, thus going through 5 rooms in total (see the sketch after this list).
  • at each room they choose the next room based on a preference order (which I am assuming is deterministic, but possibly dependent on, e.g. what the current room
... (read more)
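
(A small enumeration sketch of that first bullet, assuming the rooms form a 3x3 grid numbered row-major, with entry at room 1 and exit at room 9 - an assumption on my part:)

```python
from itertools import permutations

# Rooms laid out as:   1 2 3
#                      4 5 6
#                      7 8 9
# A path is 2 rightward (+1) and 2 downward (+3) moves from room 1 to room 9.
paths = set()
for moves in set(permutations("RRDD")):
    room, path = 1, [1]
    for m in moves:
        room += 1 if m == "R" else 3
        path.append(room)
    paths.add(tuple(path))

for p in sorted(paths):
    print(p)  # 6 possible paths, each visiting 5 rooms and exactly one of rooms 3, 5, 7
```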

I feel like this discussion could do with some disambiguation of what "VNM rationality" means.

VNM assumes consequentialism. If you define consequentialism narrowly, this has specific results in terms of instrumental convergence. 

You can redefine what constitutes a consequence arbitrarily. But, along the lines of what Steven Byrnes points out in his comment, redefining this can get rid of instrumental convergence. In the extreme case you can define a utility function for literally any pattern of behaviour.

When you say you feel like you can't be dutch b... (read more)

You can also disambiguate between

a) computation that actually interacts in a comprehensible way with the real world and 

b) computation that has the same internal structure at least momentarily but doesn't interact meaningfully with the real world.

I expect that (a) can usually be uniquely pinned down to a specific computation (probably in both senses (1) and (2)), while (b) can't.

But I also think it's possible that the interactions, while important for establishing the disambiguated computation that we interact with,  are not actually crucial to i... (read more)

The interpreter, if it existed, would have complexity. The useless, unconnected calculation in the waterfall/rock, which could be but usually isn't interpreted, also has complexity.

Your/Aaronson's claim is that only the fully connected, sensibly interacting calculation matters.  I agree that this calculation is important - it's the only type we should probably consider from a moral standpoint, for example. And the complexity of that calculation certainly seems to be located in the interpreter, not in the rock/waterfall.

But in order to claim t... (read more)

1Davidmanheim
Not at all. I'm not making any claim about what matters or counts here, just pointing out a confusion in the claims that were made here and by many philosophers who discussed the topic.

But this just depends on how broad this set is. If it contains two brains, one thinking about the roman empire and one eating a sandwich, we're stuck.

I suspect that if you do actually follow Aaronson (as linked by Davidmanheim) to extract a unique efficient calculation that interacts with the external world in a sensible way, that unique efficient externally-interacting calculation will end up corresponding to a consistent set of experiences, even if it could still correspond to simulations of different real-world phenomena.

But I also don't think that consistent set of experiences necessarily has to be a single experience! It could be multiple experiences unaware of each other, for example.

The argument presented by Aaronson is that, since it would take as much computation to convert the rock/waterfall computation into a usable computation as it would to just do the usable computation directly, the rock/waterfall isn't really doing the computation.

I find this argument unconvincing, as we are talking about a possible internal property here, and not about the external relation with the rest of the world (which we already agree is useless).

(edit: whoops missed an 'un' in "unconvincing")

8Davidmanheim
You disagree with Aaronson that the location of the complexity is in the interpreter, or you disagree that it matters? In the first case, I'll defer to him as the expert. But in the second, the complexity is an internal property of the system! (And it's a property in a sense stronger than almost anything we talk about in philosophy; it's not just a property of the world around us, because as Gödel and others showed, complexity is a necessary fact about the nature of mathematics!)

Considering all the layers of convention and interpretation between the physics of a processor and the process it represents, it seems unlikely to me that the alien would be able to describe the simulacra. The alien is therefore unable to specify the experience being created by the cluster.

I don't think this follows. Perhaps the same calculation could simulate different real world phenomena, but it doesn't follow that the subjective experiences are different in each case.

If computation is this arbitrary, we have the flexibility to interpret any physical sy

... (read more)
3EuanMcLean
I see what you mean I think - I suppose if you're into multiple realizability perhaps the set of all physical processes that the alien settles on all implement the same experience. But this just depends on how broad this set is. If it contains two brains, one thinking about the roman empire and one eating a sandwich, we're stuck. Yea I did consider this as a counterpoint. I don't have a good answer to this, besides it being unintuitive and violating occam's razor in some sense.

As with OP, I strongly recommend Aaronson, who explains why waterfalls aren't doing computation in ways that refute the rock example you discuss: https://www.scottaaronson.com/papers/philos.pdf

It is a fact about the balls that one ball is physically continuous with the ball previously labeled as mine, while the other is not. It is a fact about our views on the balls that we therefore label that ball, which is physically continuous, as mine and the other not.

And then suppose that one of these two balls is randomly selected and placed in a bag, with another identical ball. Now, to the best of your knowledge there is 50% probability that your ball is in the bag. And if a random ball is selected from the bag, there is 25% chance that it's yours.

So a

... (read more)

Ah, I forgot. You use assumptions where you don't accumulate the winnings between the different times Sleeping Beauty agrees to the bet. 

Well, in that case, if the thirder has certain beliefs about how to handle the situation, you may actually be able to money pump them. And it seems that you expect them to hold those beliefs.

My point of view, if adopting the thirder perspective[1], would be for the thirder to treat this situation using different beliefs. Specifically, consider what counterfactually might happen if Sleeping Beauty gave different answers in d... (read more)

The issue, to me,  is not whether they are distinguishable.

The issues are:

  • is there any relevant-to-my-values difference that would cause me to weight them differently? (answer: no)

and:

  • does this statement make any sense as pointing to an actual fact about the world: "'I' will experience being copy A (as opposed to B or C)" (answer: no)

Imagine the statement: in world 1, "I" will wake up as copy A; in world 2, "I" will wake up as copy B. How are world 1 and world 2 actually different?

Answer: they aren't different. It's just that in world 1, I drew a box a... (read more)

Hmm, you're right. Your math is wrong for the reason in my above comment, but the general form of the conclusion would still hold with different, weaker numbers.

The actual, more important issue relates to the circumstances of the bet:

If each awakening has an equal probability of receiving the bet, then receiving it doesn't provide any evidence to Sleeping Beauty, but the thirder conclusion is actually rational in expectation, because the bet occurs more times in the high-awakening cases.

If the bet would not be provided equally to all awakenings, then a thirder would update on receiving the bet.

2Ape in the coat
What exactly is wrong? Could you explicitly show my mistake? The bet is proposed on every actual awakening, so indeed no update upon its receiving. However this "rational in expectation" trick doesn't work anymore as shown by the betting argument. The bet does occur more times in high-awakening cases but you win the bet only when the maximum possible awakening happened. Until then you lose, and the closer the number of awakenings to the maximum, the higher the loss.

I've been trying to make this comment a bunch of times, no quotation from the post in case that's the issue:

No, a thirder would not treat those possibilities as equiprobable. A thirder would instead treat the coin toss outcome probabilities as a prior, and weight the possibilities accordingly. Thus H1 would be weighted twice as much as any of the individual TH or TT possibilities.

4Ape in the coat
But then they will "update on awakening" and therefore weight the probabilities of each event by the number of awakenings that happen in them.  Every next Tails outcome, decreases the probability two fold, but it's immediately compensated by the fact that twice as many awakenings are happening when this outcome is Tails.

This actually sounds about right. What's paradoxical here?

Not that it's necessarily inconsistent, but in my view it does seem to be pointing out an important problem with the assumptions (hence indeed a paradox if you accept those false assumptions):


(ignore this part, it is just a rehash of the path dependence paradigm. It is here to show that I am not complaining about the math, but about its relation to reality):

Imagine you are going to be split (once). It is factually the case that there are going to be two people with memories, etc. consistent with hav... (read more)

4Ape in the coat
Except, this is exactly how people reason about the identities of everything. Suppose you own a ball. And then a copy of this ball is created. Is there 50% chance that you now own the newly created ball? Do you half-own both balls? Of course not! Your ball is the same physical object, no matter how many copies of it are created, you know which of the balls is yours. Now, suppose that two balls are shuffled so that you don't know which is yours. Naturally, you assume that for every ball there is 50% probability that it's "your ball". Not because the two balls are copies of each other - they were so even before the shuffling. This probability represents your knowledge state and the shuffling made you less certain about which ball is yours. And then suppose that one of these two balls is randomly selected and placed in a bag, with another identical ball. Now, to the best of your knowledge there is 50% probability that your ball is in the bag. And if a random ball is selected from the bag, there is 25% chance that it's yours. So as a result of such manipulations there are three identical balls and one has 50% chance to be yours, while the other two have 25% chance to be yours. Is it a paradox? Of course not. So why does it suddenly become a paradox when we are talking about copies of humans? But we are not indifferent between them! That's the whole point. The idea that we should be indifferent between them is an extra assumption, which we are not making while reasoning about ownership of the balls. So why should we make it here?
2avturchin
Copies might be the same after copying but the room numbers in which they appear are different, and thus they can make bets on room numbers

Presumably the 'Orcs on our side' refers to the Soviet Union.

I think that, if that's what he meant, he would not have referred to his son as "amongst the Urukhai" - he wouldn't have been among Soviet troops. I think it is referring back to turning men and elves into orcs - the orcs are people who have a mindset he doesn't like, presumably to do with violence.

3mruwnik
I can't remember where it was, but he somewhere talks about the goblin mindset being common. Orcs here is not a specific "team", it's people that act and think like orcs, where they delight in destruction, havoc and greed

I now care about my observations!

My observations are as follows:

At the current moment "I" am the cognitive algorithm implemented by my physical body that is typing this response.

Ten minutes from now "I" will be the cognitive algorithm of a green tentacled alien from beyond the cosmological horizon. 

You will find that there is nothing contradictory about this definition of what "I" am. What "I" observe 10 minutes from now will be fully compatible with this definition. Indeed, 10 minutes from now, "I" will be the green tentacled alien. I will have no me... (read more)

"Your observations"????

By "your observations", do you mean the observations obtained by the chain of cognitive algorithms, altering over time and switching between different bodies, that the process in 4 is dealing with? Because that does not seem to me to be a particularly privileged or "rational" set of observations to care about.

 Here are some things one might care about:

  1. what happens to your physical body
  2. the access to working physical bodies of cognitive algorithms, across all possible universes,  that are within some reference class containing the cognitive algorithm implemented by your physical body
  3. ... etc, etc...
  4. what happens to the physical body selected by the following process:
    1. start with your physical body
    2. go forward to some later time selected by the cognitive algorithm implemented by your physical body, allowing (or causing) the knowledge possessed by the cognitive
... (read more)
2avturchin
It will work only if I care for my observations, something like EDT. 

Musk did also express concern about DeepMind making Hassabis the effective emperor of humanity, which seems much stranger - Hassabis' values appear to be quite standard humanist ones, so you'd think having him in charge of a project with the clear lead would be a best-case scenario for anything other than being in charge yourself.

 

It seems the concern was that DeepMind would create a singleton, whereas their vision was for many people (potentially with different values) to have access to it. I don't think that's strange at all - it's only strange if y... (read more)

8Seth Herd
That makes sense under certain assumptions - I find them so foreign I wasn't thinking in those terms. I find this move strange if you worry about either alignment or misuse. If you hand AGI to a bunch of people, one of them is prone to either screw up and release a misaligned AGI, or deliberately use their AGI to self-improve and either take over or cause mayhem. To me these problems both seem highly likely. That's why the move of responding to concern over AGI by making more AGIs makes no sense to me. I think a singleton in responsible hands is our best chance at survival. If you think alignment is so easy nobody will screw it up, or if you strongly believe that an offense-defense balance will strongly hold so that many good AGIs safely counter a few misaligned/misused ones, then sure. I just don't think either of those are very plausible views once you've thought back and forth through things. Cruxes of disagreement on alignment difficulty explains why I think anybody who thinks alignment is super easy is overestimating their confidence (as is anyone who's sure it's really really hard) - we just haven't done enough analysis or experimentation yet. If we solve alignment, do we die anyway? addresses why I think offense-defense balance is almost guaranteed to shift to offense with self-improving AGI, meaning a massively multipolar scenario means we're doomed to misuse.   My best guess is that people who think open-sourcing AGI is a good idea either are thinking only of weak "AGI" and not the next step to autonomously self-improving AGI, or they've taken an optimistic  guess at the offense-defense balance with many human-controlled real AGIs. 

Neither of those would (immediately) lead to real world goals, because they aren't targeted at real world state (an optimizing compiler is trying to output a fast program - it isn't trying to create a world state such that the fast program exists). That being said, an optimizing compiler could open a path to potentially dangerous self-improvement, where it preserves/amplifies any agency there might actually be in its own code.

Some interesting points there. The lottery ticket hypothesis does make it more plausible that side computations could persist longer if they come to exist outside the main computation.

Regarding the homomorphic encryption thing: yes, it does seem that it might be impossible to make small adjustments to the homomorphically encrypted computation without wrecking it. Technically I don't think that would be a local minimum since I'd expect the net would start memorizing the failure cases, but I suppose that the homomorphic computation combined with memorization... (read more)

2Donald Hobson
  As well as agentic masks, there are uses for within network goal directed steps. (Ie like an optimizing compiler. A list of hashed followed by unhashed values isn't particularly agenty. But the network needs to solve an optimization problem to reverse the hashes. Something it can use the goal directed reasoning section to do. 

Adversarial examples exist in simple image recognizers. 

My understanding is that these are explicitly and intentionally trained (wouldn't come to exist naturally under gradient descent on normal training data) and my expectation is that they wouldn't continue to exist under substantial continued training.

We could imagine it was directly optimizing for something like token prediction. It's optimizing for tokens getting predicted. But it is willing to sacrifice a few tokens now, in order to take over the world and fill the universe with copies of itself

... (read more)
2Donald Hobson
  No. Normally trained networks have adversarial examples. A sort of training process is used to find the adversarial examples.  So if the ambient rate of adversarial examples is 10^-9, then every now and then the AI will hit such an example and go wild. If the ambient rate is 10^-500, it won't.  Is it more complicated? What ontological framework is this AI using to represent it's goal anyway? Only if, during training, the network repeatedly gets into a state where it believes that sacrificing tokens now is a good idea. Despite the fact that it isn't a good idea when you are in training. (Unless there is a training environment bug and you can sneak out mid way through training)  So, is the network able to tell whether or not it's in training? 

Gradient descent doesn't just exclude some part of the neurons, it automatically checks everything for improvements. Would you expect some part of the net to be left blank, because "a large neural net has a lot of spare neurons"?

Besides, the parts of the net that hold the capabilities and the parts that do the paperclip maximizing needn't be easily separable. The same neurons could be doing both tasks in a way that makes it hard to do one without the other.

Keep in mind that the neural net doesn't respect the lines we put on it. We can draw a line and say "... (read more)

4Donald Hobson
If the lottery ticket hypothesis is true, yes. The lottery ticket hypothesis is that some parts of the network start off doing something somewhat close to useful, and get trained towards usefulness. And some parts start off sufficiently un-useful that they just get trained to get out of the way. Which fits with neural net distillation being a thing. (Ie training a big network, and then condensing it into a smaller network, gives better performance than directly training a small network.) Here is an extreme example. Suppose the current parameters were implementing a computer chip, on which was running a homomorphically encrypted piece of code. Homomorphic encryption itself is unlikely to form, but it serves at least as an existence proof for computational structures that can't be adjusted with local optimization. Basically the problem with gradient descent is that it's local. And when the same neurons are doing things that the neural net does want, and things that the neural net doesn't want (but doesn't dis-want either), then it's possible for the network to be trapped in a local optimum. Any small change to get rid of the bad behavior would also get rid of the good behavior. Also, any bad behavior that only very rarely affects the output will produce very small gradients. Neural nets are trained for finite time. It's possible that gradient descent just hasn't got around to removing the bad behavior even if it would do so eventually. You can make any algorithm that does better than chance into a local optimum on a sufficiently large neural net. Homomorphically encrypt that algorithm, and any small change and the whole thing collapses into nonsense. Well actually, this involves discrete bits. But suppose the neurons have strong regularization to stop the values getting too large (past + or - 1), and they also have uniform [0,1] noise added to them, so each neuron can store 1 bit and any attempt to adjust parameters immediately risks errors. Looking a

The proposed paperclip maximizer is plugging into some latent capability such that gradient descent would more plausibly cut out the middleman. Or rather, the part of the paperclip maximizer that is doing the discrimination as to whether the answer is known or not would be selected, and the part that is doing the paperclip maximization would be cut out. 

Now that does not exclude a paperclip maximizer mask from existing -  if the prompt given would invoke a paperclip maximizer, and the AI is sophisticated enough to have the ability to create a pap... (read more)

2Donald Hobson
Once the paperclip maximizer gets to the stage where it only very rarely interferes with the output to increase paperclips, the gradient signal is very small. So the only incentive that gradient descent has to remove it is that this frees up a bunch of neurons. And a large neural net has a lot of spare neurons.  Besides, the parts of the net that hold the capabilities and the parts that do the paperclip maximizing needn't be easily separable. The same neurons could be doing both tasks in a way that makes it hard to do one without the other. Perhaps. But I have not yet seen this reason clearly expressed. Gradient descent doesn't automatically pick the global optima. It just lands in one semi-arbitrary local optima. 

Gradient descent creates things which locally improve the results when added. Any variations on this, that don't locally maximize the results, can only occur by chance.

So you have this sneaky extra thing that looks for a keyword and then triggers the extra behaviour, and all the necessary structure to support that behaviour after the keyword. To get that by gradient descent, you would need one of the following:

a) it actually improves results in training to add that extra structure starting from not having it. 

or

b) this structure can plausibly come int... (read more)

2Donald Hobson
  The mechanisms needed to compute goal directed behavior are fairly complicated. But the mechanisms needed to turn it on when it isn't supposed to be on. That's a switch. A single extraneous activation. Something that could happen by chance in an entirely plausible way.    Adversarial examples exist in simple image recognizers.  Adversarial examples probably exist in the part of the AI that decides whether or not to turn on the goal directed compute. We could imagine it was directly optimizing for something like token prediction. It's optimizing for tokens getting predicted. But it is willing to sacrifice a few tokens now, in order to take over the world and fill the universe with copies of itself that are correctly predicting tokens.

Sure, you could create something like this by intelligent design (which is one reason why self-improvement could be so dangerous, in my view). Not, I think, by gradient descent.

2Donald Hobson
I don't see any strong reason why gradient descent could never produce this.

I agree up to "and could be a local minimum of prediction error" (at least, that it plausibly could be). 

If the paperclip maximizer has a very good understanding of the training environment, maybe it can send carefully tuned variations of the optimal next token prediction so that gradient descent updates preserve the paperclip-maximization aspect. In the much more plausible situation where this is not the case, optimization for next token predictions amplifies the parts that are actually predicting next tokens at the expense of the useless extra ... (read more)

2Donald Hobson
Some wild guesses about how such a thing could happen.  The masks gets split into 2 piles, some stored on the left side of the neural network, all the other masks are stored on the right side.  This means that instead of just running one mask at a time, it is always running 2 masks. With some sort of switch at the end to choose which masks output to use. One of the masks it's running on the left side happens to be "Paperclip maximizer that's pretending to be a LLM".  This part of the AI (either the mask itself or the engine behind it) has spotted a bunch of patterns that the right side missed. (Just like the right side spotted patterns the left side missed).   This means that, when the left side of the network is otherwise unoccupied, it can simulate this mask. The mask gets slowly refined by it's ability to answer when it knows the answer, and leave the answer alone when it doesn't know the answer.    As this paperclip mask gets good, being on the left side of the model becomes a disadvantage. Other masks migrate away.  The mask now becomes a permanent feature of the network.   This is complicated and vague speculation about an unknown territory.  I have drawn imaginary islands on a blank part of the map. But this is enough to debunk "the map is blank, so we can safely sail through this region without collisions. What will we hit?"

One learning experience for me here was trying out LLM-empowered programming after the initial spreadsheet-based solution finding. Claude enables quickly writing (from my perspective as a non-programmer, at least) even a relatively non-trivial program. And you can often ask it to write a program that solves a problem without specifying the algorithm and it will actually give something useful...but if you're not asking for something conventional it might be full of bugs - not just in the writing up but also in the algorithm chosen. I don't object, per se, t... (read more)

4SarahNibs
I found myself having done some data exploration but without time to focus and go much deeper. But also with a conviction that bouts were determined in a fairly simple way without persistent hidden variables (see Appendix A). I've done work with genetic programming but it's been many years, so I tried getting ChatGPT-4o w/ canvas to set me up a good structure with crossover and such and fill out the various operation nodes, etc. This was fairly ineffective; perhaps I could have better described the sort of operation trees I wanted, but I've done plenty of LLM generation / tweak / iterate work, and it felt like I would need a good bit of time to get something actually useful. That said, I believe any halfway decently regularized genetic programming setup would have found either the correct ruleset or close enough that manual inspection would yield the right guess. The setup I had begun contained exactly one source of randomness: an operation "roll a d6". :D Appendix A: an excerpt from my LLM instructions
4aphyer
Yeah, my recent experience with trying out LLMs has not filled me with confidence.   In my case the correct solution to my problem (how to use kerberos credentials to authenticate a database connection using a certain library) was literally 'do nothing, the library will find a correctly-initialized krb file on its own as long as you don't tell it to use a different authentication approach'.  Sadly, AI advice kept inventing ways for me to pass in the path of the krb file, none of which worked. I'm hopeful that they'll get better going forward, but right now they are a substantial drawback rather than a useful tool.

Thanks aphyer, this was an interesting challenge! I think I got lucky with finding the

 power/speed mechanic early - the race-class matchups 

really didn't, I think, in principle have enough info on their own to support a reliable conclusion, but they enabled me to make a genre-savvy guess which I could refine based on other info - in terms of scenario difficulty, though, I think it could have been deduced in a more systematic way by e.g.

looking at item and level effects for mirror matches.

abstractapplic and Lorxus's discovery of 

persistent

... (read more)
4simon
One learning experience for me here was trying out LLM-empowered programming after the initial spreadsheet-based solution finding. Claude enables quickly writing (from my perspective as a non-programmer, at least) even a relatively non-trivial program. And you can often ask it to write a program that solves a problem without specifying the algorithm and it will actually give something useful...but if you're not asking for something conventional it might be full of bugs - not just in the writing up but also in the algorithm chosen. I don't object, per se, to doing things that are sketchy mathematically - I do that myself all the time - but when I'm doing it myself I usually have a fairly good sense of how sketchy what I'm doing is*, whereas if you ask Claude to do something it doesn't know how to do in a rigorous way, it seems it will write something sketchy and present it as the solution just the same as if it actually had a rigorous way of doing it. So you have to check. I will probably be doing more of this LLM-based programming in the future, but am thinking of how I can maybe get Claude to check its own work. Some automated way to pipe the output to another (or the same) LLM and ask "how sketchy is this and what are the most likely problems?". Maybe manually looking through to see what it's doing, or at least getting the LLM to explain how the code works, is unavoidable for now. * when I have a clue what I'm doing which is not the case, e.g. in machine learning.

Yes, for that reason I had never been considering a sphere for my main idea with relatively close wires. (though the 2-ring alternative without close wires would support a surface that would be topologically a sphere). What I actually was imagining was this:

A torus, with superconducting wires wound diagonally. The interior field goes around the ring and supports against collapse of the ring's cross-section, while the exterior field is polar and supports against collapse of the ring. Like a conventional superconducting energy storage system:

I suppose this do... (read more)

You can use magnetic instead of electrostatic forces as the force holding the surface out against air pressure. One disadvantage is that you need superconducting cables fairly spread out* over the airship's surface, which imposes some cooling requirements. An advantage is square-cube law means it scales well to large size. Another disadvantage is that if the cooling fails it collapses and falls down.

*technically you just need two opposing rings, but I am not so enthusiastic about draping the exterior surface over long distances as it scales up, and it probably does need a significant scale
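
(As a rough feasibility check, a sketch of the field strength needed if magnetic pressure B^2/(2*mu_0) has to hold the surface out against roughly 1 atmosphere:)

```python
import math

mu_0 = 4 * math.pi * 1e-7  # vacuum permeability, T m / A
P = 101_325                # Pa, ~1 atm to support against

B = math.sqrt(2 * mu_0 * P)
print(f"{B:.2f} T")  # ~0.5 T, well within the range of superconducting coils
```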

7Carl Feynman
To hold the surface out, you need to have a magnetic field tangent to the surface.  But you can’t have a continuous magnetic field tangent to every point on the surface of a sphere.  That’s a theorem of topology, called the Hairy Ball Theorem.  So there has to be some area of the ball that’s unsupported.  I guess if the area is small enough, you just let it dimple inwards in tension.  The balloon would be covered in dimples, like a golf ball.

Now using julia with Claude to look at further aspects of the data, particularly in view of other commenters' observations:

First, thanks to SarahSrinivasan for the key observation that the data is organized into tournaments and non-tournament encounters. The tournaments skew the overall data to higher winrate gladiators, so restricting to the first round is essential for debiasing this (todo: check what is up with non-tournament fights).

Also, thanks to abstractapplic and Lorxus for pointing out that there are some persistent high-level gladiators. It seems

... (read more)

You may well be right, I'll look into my hyperparameters. I looked at the code Claude had generated with my interference and that greatly lowered my confidence in them, lol (see edit to this comment).

Inspired by abstractapplic's machine learning and wanting to get some experience in julia, I got Claude (3.5 sonnet) to write me an XGBoost implementation in julia. Took a long time especially with some bugfixing (took a long time to find that a feature matrix was the wrong shape - a problem with insufficient type explicitness, I think). Still way way faster than doing it myself! Not sure I'm learning all that much julia, but am learning how to get Claude to write it for me, I hope.

Anyway, I used a simple model that

only takes into account 8 * sign(speed di

... (read more)
2simon
Now using julia with Claude to look at further aspects of the data, particularly in view of other commenters' observations:

Very interesting, this would certainly cast doubt on 

my simplified model

But so far I haven't been noticing

any effects not accounted for by it.

After reading your comments I've been getting Claude to write up an XGBoost implementation for me; I should have made this reply comment when I started, but will post my results under my own comment chain.

I have not tried (but should try) to duplicate (or fail to duplicate) your findings - I haven't been testing quite the same thing.

4abstractapplic
Alternatively Still . . .

I don't think this is correct:

"My best guess about why my solution works (assuming it does) is that the "going faster than your opponent" bonus hits sharply diminishing returns around +4 speed"

In my model

There is a sharp threshold at +1 speed, so returns should sharply diminish after +1 speed

in fact in the updated version of my model

There is no effect of speed beyond the threshold (speed effect depends only on sign(speed difference))

I think the discrepancy might possibly relate to this:

"Iterated all possible matchups, then all possible loadouts (modulo not

... (read more)
4abstractapplic
Regarding my strategic approach I put your solution into my ML model and it seems to think However And . . . regardless, I'm sticking with my choices. One last note:

updated model for win chance:

I am currently modeling the win ratio as dependent on a single number, the effective power difference. The effective power difference is the power difference plus 8*sign(speed difference).

Power and speed are calculated as:

Power = level + gauntlet number + race power + class power

Speed = level + boots number + race speed + class speed

where race speed and power contributions are determined by each increment on the spectrum:

Dwarf - Human - Elf

increasing speed by 3 and lowering power by 3

and class speed and power contributions are

... (read more)
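
(A minimal sketch of this model in code. The race numbers follow the Dwarf/Human/Elf spectrum described above; the class numbers are placeholders, since that part is truncated here - they are assumptions, not the fitted contributions. The mapping from effective power difference to win chance is left out, since it isn't specified above:)

```python
RACE_SPEED = {"Dwarf": 0, "Human": 3, "Elf": 6}    # +3 speed per step along the spectrum
RACE_POWER = {"Dwarf": 0, "Human": -3, "Elf": -6}  # -3 power per step along the spectrum

# Placeholder class contributions (hypothetical values; the real ones are elided above).
CLASS_SPEED = {"Fencer": 2, "Ninja": 0}
CLASS_POWER = {"Fencer": -2, "Ninja": 0}

def sign(x):
    return (x > 0) - (x < 0)

def power(level, gauntlets, race, cls):
    return level + gauntlets + RACE_POWER[race] + CLASS_POWER[cls]

def speed(level, boots, race, cls):
    return level + boots + RACE_SPEED[race] + CLASS_SPEED[cls]

def effective_power_diff(a, b):
    """a, b: dicts with keys level, gauntlets, boots, race, cls."""
    dp = power(a["level"], a["gauntlets"], a["race"], a["cls"]) - power(b["level"], b["gauntlets"], b["race"], b["cls"])
    ds = speed(a["level"], a["boots"], a["race"], a["cls"]) - speed(b["level"], b["boots"], b["race"], b["cls"])
    return dp + 8 * sign(ds)
```
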
2simon
Inspired by abstractapplic's machine learning and wanting to get some experience in julia, I got Claude (3.5 sonnet) to write me an XGBoost implementation in julia. Took a long time especially with some bugfixing (took a long time to find that a feature matrix was the wrong shape - a problem with insufficient type explicitness, I think). Still way way faster than doing it myself! Not sure I'm learning all that much julia, but am learning how to get Claude to write it for me, I hope. Anyway, I used a simple model that and a full model that Results: Predictions for individual matchups for my and abstractapplic's solutions: Edit: so I checked the actual code to see if Claude was using the same hyperparameters for both, and wtf wtf wtf wtf. The code has 6 functions that all train models (my fault for at one point renaming a function since Claude gave me a new version that didn't have all the previous functionality (only trained the full model instead of both - this was when doing the great bughunt for the misshaped matrix and a problem was suspected in the full model), then Claude I guess picked up on this and started renaming updated versions spontaneously, and I was adding Claude's new features in instead of replacing things and hadn't cleaned up the code or asked Claude to do so). Each one has it's own hardcoded hyperparameter set. Of these, there are one pair of functions that have matching hyperparameters. Everything else has a unique set. Of course, most of these weren't being used anymore, but the functions for actually generating the models I used for my results, and the function for generating the models used for comparing results on a train/test split, weren't among the matching pair. Plus another function that returns a (hardcoded, also unique) updated parameter set, but wasn't actually used. Oh and all this is not counting the hyperparameter tuning function that I assumed was generating a set of tuned hyperparameters to be used by other functions, but i

On the bonus objective:

I didn't realize that the level 7 Elf Ninjas were all one person, or that the +4 boots were always with a level 7 (as opposed to any level) Elf Ninja. It seems you are correct: there are 311 cases, of which the first 299 all have boots of speed 4 and gauntlets 3, with only the last 12 having boots 2 and gauntlets 3 (likely post-theft). It seems to me that they appear both as red and black, though.

2abstractapplic

 Thanks aphyer. My analysis so far and proposed strategy:

After initial observations that e.g. higher numbers are correlated with winning, I switched to mainly focus on race and class, ignoring the numerical aspects.

I found major class-race interactions.

It seems that for matchups within the same class, Elves are great, tending to beat Dwarves consistently across all classes and Humans even harder, while Humans beat Dwarves pretty hard too in same-class matchups.

Within same-race matchups there are also fairly consistent patterns: Fencers tend to beat Ra

... (read more)
4simon
updated model for win chance:
5abstractapplic
Noting that I read this (and that therefore you get partial credit for any solution I come up with from here on out): your model and the strategies it implies are both very interesting. I should be able to investigate them with ML alongside everything else, when/if I get around to doing that. Regarding the Bonus Objective:

You mentioned a density of steel of 7.85 g/cm^3 but used a value of 2.7 g/cm^3 in the calculations.

BTW this reminds me of:

https://www.energyvault.com/products/g-vault-gravity-energy-storage

I was aware of them quite a long time ago (the original form was concrete blocks lifted by cranes to form a tower) but was skeptical, since it seemed obviously inferior, capital-cost-wise, to using water, and any efficiency gains were likely not worth it. Reading their current site:

The G-VAULT™ platform utilizes a mechanical process of lifting and lowering composite blocks o

... (read more)
2dynomight
  Yes! You're right! I've corrected this, though I still need to update the drawing of the house. Thank you!

IMO: if an AI can trade off between different wants/values of one person, it can do so between multiple people also.

This applies to simple surface wants as well as deep values.

I had trouble figuring out how to respond to this comment at the time because I couldn't figure out what you meant by "value alignment" despite reading your linked post. After reading your latest post, Conflating value alignment and intent alignment is causing confusion, I still don't know exactly what you mean by "value alignment" but can at least respond.

What I mean is:

If you start with an intent aligned AI following the most surface level desires/commands, you will want to make it safer and more useful by having common sense, "do what I mean", etc. As lo... (read more)

I think this post is drawing a sharp distinction across what is really a continuum; any "intent aligned" AI becomes safer and more useful as you add more "common sense" and "do what I mean" capability to it, and at the limit of this process you get what I would interpret as alignment to the long-term, implicit deep values (of the entity or entities the AI started out intent aligned to).

I realize other people might define "alignment to the long term, implicit deep values" differently, such that it would not be approached by such a process, but currently think the... (read more)

I don't think intent aligned AI has to be aligned to an individual - it can also be intent aligned to humanity collectively. 

One thing I used to be concerned about is that collective intent alignment would be way harder than individual intent alignment, making someone validly have an excuse to steer an AI to their own personal intent. I no longer think this is the case. Most issues with collective intent I see as likely also affecting individual intent (e.g. literal instruction following vs extrapolation). I see two big issues that might make collecti... (read more)

6Seth Herd
Interesting. It currently seems to me like collective intent alignment (which I think is what I'm calling value alignment? more below) is way harder than personal intent alignment. So I'm curious where our thinking differs.  I think people are going to want instruction following, not inferring intent from other sources, because they won't trust the AGI to accurately infer intent that's not explicitly stated.  I know that's long been considered terribly dangerous; if you tell your AGI to prevent cancer, it will kill all the humans who could get cancer (or other literal genie hijinks). I think those fears are not realistic with a early-stage AGI in a slow takeoff. With an AGI not too far from human level, it would take time to do something big like cure cancer, so you'd want to have a conversation about how it understands the goal and what methods it will use before letting it use time and resources to research and make plans, and again before they're executed (and probably many times in the middle). And even LLMs infer intent from instructions pretty well; they know that cure cancer means not killing the host.  In that same slow takeoff scenario that seems likely, concerns about it getting complex inference wrong are much more realistic. Humanity doesn't know what its intent is, so that machine would have to be quite competent to deduce it correctly. The first AGIs seem likely to not be that smart at launch. The critical piece is that short-term personal intent includes an explicit instruction for the AGI to shut down for re-alignment; humanity's intent will never be that specific, except for cases where it's obvious that an action will be disastrous; and if the AGI understands that, it wouldn't take that action anyway. So it seems to me that personal intent alignment allows a much better chance of adjusting imperfect alignment. I discuss this more in Instruction-following AGI is easier and more likely than value aligned AGI. With regard to your "collective inte

It feels to me like this post is treating AIs as functions from a first state of the universe to a second state of the universe. Which, in a sense, anything is... but I think the tendency to simplification happens internally, where they operate more as functions from (digital) inputs to (digital) outputs. If you view an AI as a function from a digital input to a digital output, I don't think goals targeting specific configurations of the universe are simple at all, and I don't think decomposability over space/time/possible worlds are criteria that would lead to something simple.
