All of Gurkenglas's Comments + Replies

Re first, yep, I missed that :(. M does sound like a more worthy barrier than U. Do you have a working example of a (U,M) where some state machine performs well in a manner that's hard to detect?

Re second, I realized that this only allows discrete utilities, but didn't think to then try a π' that does an exhaustive search over policies ^^. (I assume you are setting "uncomputable to measure performance because that involves the Solomonoff prior" aside here.) Even so, undecidability of whether 000... and 111... get the same utility sounds like a bug. Wha...

Regarding 17.4.Open:

Consider π' which tries all state machines up to a given size and imitates the one that performs best on (U,M); this would tighten the O(n log n) bound to O(BB⁻¹(n)).

This fails because your utility functions return constructive real numbers, which don't implement comparison. I suggest that you make it possible to compare utilities.[1]

In which case we get: Within every decidable machine class where every member halts, agents are uncomputably smol.

 

  1. ^ Such as by making P(s,s') return the order of U(s) and U(s').
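To see concretely why the comparison fails (a minimal sketch of my own, not from the thread): represent a constructive real as a function returning a rational within 2⁻ⁿ of the true value; the comparison below terminates only when the inputs differ, which is what blocks the exhaustive search above.

```python
from fractions import Fraction

def compare(x, y):
    """Sketch: compare two constructive reals given as approximation functions.
    Terminates with -1 or 1 if x != y; loops forever if x == y."""
    n = 0
    while True:
        ax, ay = x(n), y(n)
        gap = Fraction(2, 2**n)  # combined worst-case approximation error
        if ax + gap < ay:
            return -1            # certainly x < y
        if ay + gap < ax:
            return 1             # certainly x > y
        n += 1                   # too close to call; refine and retry

one_third = lambda n: Fraction(round(Fraction(1, 3) * 2**n), 2**n)
print(compare(one_third, lambda n: Fraction(1, 4)))  # returns 1, terminates
# compare(one_third, one_third) would never return
```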

2Vanessa Kosoy
First, it's uncomputable to measure performance because that involves the Solomonoff prior. You can approximate it if you know some bits of Chaitin's constant, but that brings a penalty into the description complexity. Second, I think that saying that comparison is computable means that the utility is only allowed to depend on a finite number of time steps; it rules out even geometric time discount. For such utility functions, the optimal policy has finite description complexity, so g is upper bounded. I doubt that's useful.

If you didn't feel comfortable running it overnight, why did you publish the instructions for replicating it?

2niplav
I had a conversation with Claude 3.6 Sonnet about this, and together we concluded that the worry was overblown. I should've added that in, together with a justification.
4kave
Looks like the base url is supposed to be niplav.site. I'll change that now (FYI @niplav)

I'm hoping more for some stepping stones between the pre-theoretic concept of "structural" and the fully formalized 99%-clause. If we could measure structuralness more directly we should be able to get away with less complexity in the rest of the conjecture.

7Eric Neyman
Thanks, this is a good question. My suspicion is that we could replace "99%" with "all but exponentially small probability in n". I also suspect that you could replace it with 1−ε, with the stipulation that the length of π (or the running time of V) will depend on ε. But I'm not exactly sure how I expect it to depend on ε -- for instance, it might be exponential in 1/ε.

My basic intuition is that the closer you make 99% to 1, the smaller the number of circuits about which V is allowed to say "looks non-random" (i.e. that are flagged for some advice π). And so V is forced to do more thorough checks ("is it actually non-random in the sort of way that could lead to P being true?") before outputting 1.

99% is just a kind-of lazy way to sidestep all of these considerations and state a conjecture that's "spicy" (many theoretical computer scientists think our conjecture is false) without claiming too much / getting bogged down in the details of how the "all but a small fraction of circuits" thing depends on n or the length of π or the runtime of V.

Ultimately, though, we are interested in finding a verifier that accepts or rejects a circuit based on a structural explanation of that circuit; our no-coincidence conjecture is our best attempt to formalize that claim, even if it is imperfect.

Can you say more about what made you decide to go with the 99% clause? Did you consider any alternatives?

3Alibi
Reading the post, I also felt like 99% was kind of an arbitrary number. I would have expected it to be something like: for all ε > 0 there exists a V such that ... 1−ε of random circuits satisfy ...

This does go in the direction of refuting it, but they'd still need to argue that linear probes improve with scale faster than they do for other queries; a larger model means there are more possible linear probes to pick the best from.

3Matrice Jacobine
I don't see why it should improve faster. It's generally held that the increase in interpretability in larger models is due to larger models having better representations (that's why we prefer larger models in the first place); why should it be any different in scale for normative representations?

I had that vibe from the abstract, but I can try to guess at a specific hypothesis that also explains their data: Instead of a model developing preferences as it grows up, it models an Assistant character's preferences from the start, but their elicitation techniques work better on larger models; for small models they produce lots of noise.

7Matrice Jacobine
This interpretation is straightforwardly refuted (insofar as it makes any positivist sense) by the success of the parametric approach in "Internal Utility Representations" also being correlated with model size.

Ah, oops. I think I got confused by the absence of L_2 syntax in your formula for FVU_B. (I agree that FVU_A is more principled ^^.)

2StefanHex
Oops, fixed!

https://github.com/jbloomAus/SAELens/blob/main/sae_lens/evals.py#L511 sums the numerator and denominator separately; if they aren't doing that in some other place, probably just file a bug report?

2StefanHex
I think this is the sum over the vector dimension, but not over the samples. The sum (mean) over samples is taken later, in this line, which happens after the division: `metrics[f"{metric_name}"] = torch.cat(metric_values).mean().item()`

Edit: And to clarify, my impression is that people think of these as alternative definitions of FVU and you get to pick one, rather than one being right and one being a bug.

Edit2: And I'm in touch with the SAEBench authors about making a PR to change this / add both options (and by extension probably doing the same in SAELens); though I won't mind if anyone else does it!
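For reference, a minimal sketch of the two conventions as I read this thread (the names follow the FVU_A/FVU_B usage above; the tensor shapes are my assumption — x and x_hat as (n_samples, d) activations and reconstructions):

```python
import torch

def fvu_a(x, x_hat):
    # Sum numerator and denominator over all samples and dimensions, then divide once.
    resid = ((x - x_hat) ** 2).sum()
    total = ((x - x.mean(dim=0)) ** 2).sum()
    return resid / total

def fvu_b(x, x_hat):
    # Sum over the vector dimension, divide per sample, then average the ratios.
    resid = ((x - x_hat) ** 2).sum(dim=-1)
    total = ((x - x.mean(dim=0)) ** 2).sum(dim=-1)
    return (resid / total).mean()

x = torch.randn(1024, 16)
x_hat = x + 0.1 * torch.randn(1024, 16)
print(fvu_a(x, x_hat).item(), fvu_b(x, x_hat).item())  # close, but not equal
```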

Thanks, edited. If we keep this going we'll have more authors than users x)

2MondSemmel
You're making a very generous offer of your time and expertise here. However, to me your post still feels way, way more confusing than it should be. Suggestions & feedback:

* Title: "Get your math consultations here!" -> "I'm offering free math consultations for programmers!" or similar.
  * Or something else entirely. I'm particularly confused how your title (math consultations) leads into the rest of the post (debuggers and programming).
* First paragraph: As your first sentence, mention your actual, concrete offer (something like "You screenshare as you do your daily tinkering, I watch for algorithmic or theoretical squiggles that cost you compute or accuracy or maintainability." from your original post, though ideally with much less jargon). Also your target audience: math people? Programmers? AI safety people? Others?
* "click the free https://calendly.com/gurkenglas/consultation link" -> What you mean is: "click this link for my free consultations". What I read is a dark pattern à la: "this link is free, but the consultations are paid". Suggested phrasing: something like "you can book a free consultation with me at this link".
* Overall writing quality:
  * Assuming all your users would be as happy as the commenters you mentioned, it seems to me like the writing quality of these posts of yours might be several levels below your skill as a programmer and teacher. In which case it's no wonder that you don't get more uptake.
  * Suggestion 1: feed the post into an LLM and ask it for writing feedback.
  * Suggestion 2: imagine you're a LW user in your target audience, whoever that is, and you're seeing the post "Get your math consultations here!" in the LW homepage feed, written by an unknown author. Do people in your target audience understand what your post is about, enough to click on the post if they would benefit from it? Then once they click and read the first paragraph, do they understand what it's about and click on the link if they would benefit f

Account settings let you set mentions to notify you by email :)

The action space is too large for this to be infeasible, but at a 101 level, if the Sun spun fast enough it would come apart, and angular momentum is conserved so it's easy to add gradually.
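A back-of-the-envelope sketch of the "101 level" claim, using standard solar constants (my own numbers, not from the thread): the equator lifts off once centrifugal acceleration matches surface gravity, i.e. ω²R = GM/R².

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_sun = 1.989e30   # solar mass, kg
R_sun = 6.957e8    # solar radius, m

omega = math.sqrt(G * M_sun / R_sun**3)   # spin rate at which the equator lifts off
period_hours = 2 * math.pi / omega / 3600
print(f"break-up rotation period ≈ {period_hours:.1f} hours")  # ≈ 2.8 hours
```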

Can this program that you've shown to exist be explicitly constructed?

I'd like to do either side of this! Which I say in public to have an opportunity to advertise that https://www.lesswrong.com/posts/MHqwi8kzwaWD8wEQc/would-you-like-me-to-debug-your-math remains open.

Hang up a tear-off calendar?

(You can find his ten mentions of that ~hashtag via the looking glass on thezvi.substack.com. Huh, less regular than I thought.)

Zvi's AI newsletter, latest installment https://www.lesswrong.com/posts/LBzRWoTQagRnbPWG4/ai-93-happy-tuesday, has a regular segment Pick Up the Phone arguing against this.

1rosehadshar
Thanks! Fwiw I agree with Zvi on "At a minimum, let’s not fire off a starting gun to a race that we might well not win, even if all of humanity wasn’t very likely to lose it, over a ‘missile gap’ style lie that we are somehow not currently in the lead."
1Tom Davidson
I think the argument for combining separate US and Chinese projects into one global project is probably stronger than the argument for centralising US development. That's because racing between US companies can potentially be handled by USG regulation, but racing between US and China can't be similarly handled. OTOH, the 'info security' benefits of centralisation mostly wouldn't apply.
3rosehadshar
My main take here is that it seems really unlikely that the US and China would agree to work together on this.  
0Tyler Tracy
I like the global project idea more, but I think it still has issues.

* A global project would likely eliminate the racing concerns.
* A global project would have fewer infosec issues. Hopefully, most state actors who could steal the weights are bought into the project and wouldn't attack it.
* Power concentration seems worse since more actors would have varying interests. Some countries would likely have ideological differences and might try to seize power over the project. Various checks and balances might be able to remedy this.
3AnthonyC
In some ways, this would be better if you can get universal buy-in, since there wouldn't be a race for completion. There might be a race for alignment to particular subgroups? Which could be better or worse, depending. Also, securing it against bringing insights and know-how back to a clandestine single-nation competitor seems like it would be very difficult. Like, if we had this kind of project being built, do I really believe there won't be spies telling underground data centers and teams of researchers in Moscow and Washington everything it learns? And that governments will consistently put more effort into the shared project than the secret one?
1Seth Herd
That would seem to be better. As long as Putin and similar don't get root access to an AGI as a result.

https://www.google.com/search?q=spx+futures

I was specifically looking at Nov 5th 0:00-6:00, which twitched enough to show aliveness, while manifold and polymarket moved in smooth synchrony.

As the prediction markets on Trump winning went from ~50% to ~100% over 6 hours, S&P 500 futures moved less than the rest of the time. Why?

2aphyer
Were whichever markets you're looking at open at this time? Most stuff doesn't trade that much out of hours.

The public will Goodhart any metric you hand over to it. If you provide evaluation as a service, you will know how many attempts an AI lab made at your test.

If you say heads every time, half of all futures contain you; likewise with tails.

3Dana
I've updated my comment. You are correct as long as you pre-commit to a single answer beforehand, not if you are making the decision after waking up. The only reason pre-committing to heads works, though, is because it completely removes the Tuesday interview from the experiment. She will no longer be awoken on Tuesday, even if the result is tails. So, this doesn't really seem to be in the spirit of the experiment in my opinion. I suppose the same pre-commit logic holds if you say the correct response gets (1/coin-side-wake-up-count) * value per response though.
1Ape in the coat
Probability is not some vaguely defined similarity cluster like "sound". It's a mathematical function that has specific properties. Not all of them are solely about betting. We can dissolve the semantic disagreement between halfers and thirders and figure out that they are talking about two different functions p and p' with subtly different properties while producing the same betting odds.  This in itself, however, doesn't resolve the actual question: which of these functions fits the strict mathematical notion of probability for the Sleeping Beauty experiment and which doesn't. This question has an answer.
Answer by Gurkenglas185

What is going to be done with these numbers? If Sleeping Beauty is to gamble her money, she should accept the same betting odds as a thirder. If she has to decide which coinflip result kills her, she should be ambivalent like a halfer.
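A Monte Carlo sketch (my own illustration) of how the two payout schemes pull toward different numbers:

```python
import random

trials = 100_000
heads_experiments = 0
heads_awakenings = 0
total_awakenings = 0
for _ in range(trials):
    heads = random.random() < 0.5
    wakeups = 1 if heads else 2               # Monday only vs Monday and Tuesday
    total_awakenings += wakeups
    heads_experiments += heads
    heads_awakenings += 1 if heads else 0     # awakenings where the coin is heads

print("P(heads) per experiment:", heads_experiments / trials)          # ~1/2, halfer
print("P(heads) per awakening:", heads_awakenings / total_awakenings)  # ~1/3, thirder
```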

2Ape in the coat
Betting arguments are tangential here. https://www.lesswrong.com/posts/cvCQgFFmELuyord7a/beauty-and-the-bets The disagreement is about how to factorise the expected utility function into probability and utility, not about which bets to make. This disagreement is still tangible, because the way you define your functions has meaningful consequences for your mathematical reasoning.
3Dana
Halfer makes sense if you pre-commit to a single answer before the coin-flip, but not if you are making the decisions independently after each wake-up event. If you say heads, you have a 50% chance of surviving when asked on Monday, and a 0% chance of surviving when asked on Tuesday. If you say tails, you have a 50% chance of surviving Monday and a 100% chance of surviving Tuesday.
2DragonGod
I mean, I think the "gamble her money" interpretation is just a different question. It doesn't feel to me like a different notion of what probability means, but just betting on a fair coin with asymmetric payoffs. The second question feels closer to an accurate interpretation of what probability means.

Your experiment is contaminated: if a document in the training data said that AI texts are overly verbose, and then announced that the following is a piece of AI-written text, it'd be a natural guess that the document would continue with overly verbose text, and so that's what an autocomplete engine will generate.

Due to RLHF, AI is no longer cleanly modelled as an autocomplete engine, but the point stands. For science, you could try having AI assist in the writing of an article making the opposite claim :).

4Richard_Kennaway
I did that once. I have not tried asking GPT-4o to write concisely, but then, the “writers” of the articles I have in mind clearly haven’t. There are a few within the last few days. I’m sure people can guess which they are.
3dirk
In my experience, they talk like that regardless of the claim being made unless I specifically prompt for a different writing style (which has mixed success).

Among monotonic, boolean quantifiers that don't ignore their input, exists is maximal because it returns true as often as possible; forall is minimal because it returns true as rarely as possible.
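This can be checked by brute force on a small domain (a sketch of my own; predicates are truth tables, and "doesn't ignore its input" is read as "non-constant"):

```python
from itertools import product

n = 3
preds = list(product([0, 1], repeat=n))  # all predicates on a 3-element domain

def is_monotone(f):
    return all(f[a] <= f[b] for a in preds for b in preds
               if all(x <= y for x, y in zip(a, b)))

for table in product([0, 1], repeat=len(preds)):  # every quantifier's truth table
    f = dict(zip(preds, table))
    if len(set(table)) == 1 or not is_monotone(f):
        continue  # skip constant (input-ignoring) and non-monotone quantifiers
    assert all(f[a] <= any(a) for a in preds)   # exists dominates f pointwise
    assert all(all(a) <= f[a] for a in preds)   # f dominates forall pointwise
print("verified: exists is maximal, forall is minimal")
```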

For concreteness, let's say the basic income is the same in every city, same for a paraplegic or Elon Musk. Anyone who can vote gets it, it's a dividend on your share of the country.

I am surprised at section 3; I don't remember anyone who seriously argues that women should be dependent on men. By amusing coincidence, my last paragraph makes your reasoning out of scope; you can abolish women's suffrage in a separate bill.

In section 5, you are led astray by assuming a fixed demand for labor. You notice that we have yet to become obsolete. Well, of course: Fo...

1Zero Contradictions
That’s because almost nobody views humans through a biological realism worldview. For more info, see: Understanding Biological Realism, https://zerocontradictions.net/#bio-realism. In this case, Family and Society in particular is probably the best introduction of the essays in the list. https://thewaywardaxolotl.blogspot.com/2014/04/family-and-society.html I’m also not the author of that essay (otherwise it would have my name on it), but I do agree with it, aside from a few caveats. Anyway, women’s suffrage is irrelevant to what the essay was explaining. It does not propose to abolish woman suffrage, nor does the author advocate for that. As for cars replacing horses, humanity would’ve been wealthier, more prosperous, and more eco-friendly if walkable cities and high-speed rail were built instead of cars. But I understand the point that you were making. https://www.reddit.com/r/fuckcars/wiki/faq

factor out alpha

⌊x⌋ is floor(x), the greatest integer that's at most x; e.g. ⌊3.7⌋ = 3 and ⌊−1.2⌋ = −2.

2Zvi
Yeah, I didn't see the symbol properly, I've edited.

People with sufficiently good models of each other to use them in their social protocols.

I'd call those absences of drawbacks, not benefits - you would have had them without the job.

hazel110

The other side of this post is to look at what various jobs cost. Time and effort are the usual costs, but some jobs ask for things like willingness to deal with bullshit (a limited resource!), emotional energy, on-call readiness, various kinds of sensory or moral discomfort, and other things.

2Viliam
Haha, that's absolutely correct! But without the job I wouldn't get paid. So I guess the standard deal is getting paid in return for a set of things, and I dream about getting paid for a subset. I mean, in theory, the employer should care about getting the work done, being there to fix the bugs and provide support, being available in case something else happens, and maybe a few more things... but spending most of my time in an open space is just unnecessary suffering for an introverted person, and a financial expense for the employer, so... haha, nope. For some reason it is important to be surrounded by other people, even when I happen to be the only person on my project (or the only team member not from India).

I was alone in a room of computers, and I had set out to take no positive action but grading homework. I ended up sitting and pacing and occasionally moving the mouse in the direction it would need to go next. What I remember of what my mind was on was the misery of the situation.

4Declan Molony
I concur. The crux, for me, is whether or not I want to do the particular task.

If I want to do the task, say writing, but I'm not feeling motivated, then enough time being bored will eventually create for me the conditions to be more interested in writing than in staying bored.

If I do not want to do the task, say my taxes, then boredom or doing nothing may actually be preferable. In this case, boredom is not a sufficient motivator and I need to cognitively reframe how I'm thinking of the task and how to approach it. I wrote about this in a previous post, Facts vs Interpretations—An Exercise in Cognitive Reframing.

Bludgeoning myself with normative "shoulds/oughts" is, in my opinion, a subpar coping mechanism compared to reframing my thoughts to better align with the task so that I'll want to do it.

I tried that for a weekend once. I did nothing.

2Inosen Infinity
I'm curious to know more. Could you describe your environment and your actions in more detail? Were you in a place with absolutely nothing to do or was there at least something to turn your attention to?  How did you spend that day -- were you, say, staring at a blank wall or lying on a sofa or walking around the room or something else? And what was in your mind? Even if you accomplished nothing that day, did you perhaps think of some ideas on your topics of interest?

It has been pointed out to me that no, what this presumably means is the past decisions of the patients.

 

Q2: Is it ethically permissible to consider an individual’s past decisions when determining their access to medical resources?

Gurkenglas5843

You assume the conclusion:

A lot of the AI alignment success seems to me stem from the question of whether the problem is easy or not, and is not very elastic to human effort.

AI races are bad because they select for contestants that put in less alignment effort.

7niplav
I do assume that not being in a race lowers the probability of doom by 5%, and that MAGIC can lower it by more than two shannons (from 10% to 2%). Maybe it was a mistake of mine to put the elasticity front and center, since this is actually quite elastic. I guess it could be more elastic than that, but my intuition is skeptical.

Sure, he's trying to cause alarm via alleged excerpts from his life. Surely society should have some way to move to a state of alarm iff that's appropriate; do you see a better protocol than this one?

Recall that every vector space is the finitely supported functions from some set to ℝ, and every Hilbert space is the square-integrable functions from some measure space to ℝ.
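In symbols, my gloss of the two standard facts being recalled: a basis B of V, and an orthonormal basis / measure space for H, give

```latex
V \;\cong\; \{\, f : B \to \mathbb{R} \mid f(b) \neq 0 \text{ for only finitely many } b \,\},
\qquad
H \;\cong\; L^2(\Omega, \mu).
```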

I'm guessing that similarly, the physical theory that you're putting in terms of maximizing entropy lies in a large class of "Bostock" theories such that we could put each of them in terms of maximizing entropy, by warping the space with respect to which we're computing entropy. Do you have an idea of the operators and properties that define a Bostock theory?

that thing about affine transformations

If the purpose of a utility function is to provide evidence about the behavior of the group, we can preprocess the data structure into that form: Suppose Alice may update the distribution over group decisions by ε. Then the direction she pushes in is her utility function, and the constraints "add up to 100%" and "size ε" cancel out the "affine transformation" degrees of freedom. Now such directions can be added up.
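A minimal sketch of that preprocessing (my own illustration; names are mine, and a real version would also keep p from leaving the simplex):

```python
import numpy as np

def to_direction(utility, eps):
    """Strip the affine degrees of freedom from a utility vector over outcomes."""
    v = np.asarray(utility, dtype=float)
    v = v - v.mean()                 # subtract the mean: kills the additive freedom
    norm = np.linalg.norm(v)
    return eps * v / norm if norm > 0 else v  # rescale: kills the multiplicative freedom

p = np.array([0.25, 0.25, 0.25, 0.25])        # distribution over four group decisions
alice = to_direction([3.0, 1.0, 0.0, 0.0], eps=0.01)
bob = to_direction([0.0, 0.0, 1.0, 5.0], eps=0.01)
p = p + alice + bob                            # directions sum to zero, so p still sums to 1
print(p, p.sum())
# Caveat: a full version must also keep every entry of p nonnegative.
```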

Let's investigate whether functions must necessarily contain an agent in order to do sufficiently useful cognitive work. Pick some function of which an oracle would let you save the world.

Hmmmm. What if I said "an enumeration of the first-order theory of (ℚ ∪ {our number}, <)"? Then any number can claim to be equal to one of the constants.

If Earth had intelligent species with different minds, an LLM could end up identical to a member of at most one of them.

Is the idea that "they seceded because we broke their veto" is more of a casus belli than "we can't break their veto"?

Sure! Fortunately, while you can use this to prove any rational real innocent of being irrational, you can't use this to prove any irrational real guilty of being irrational, since every first-order formula can only check against finitely many constants.

2AlexMennen
Something that I think is unsatisfying about this is that the rationals aren't privileged as a countable dense subset of the reals; it just happens to be a convenient one. The completions of the dyadic rationals, the rationals, and the algebraic real numbers are all the same. But if you require that an element of the completion, if equal to an element of the countable set being completed, must eventually certify this equality, then the completions of the dyadic rationals, rationals, and algebraic reals are all constructively inequivalent.

Chaitin's constant, right. I should have taken my own advice and said "an enumeration of all properties of our number that can be written in the first-order language of (ℚ, <)".

2AlexMennen
This means that, in particular, if your real happens to be rational, you can produce the fact that it is equal to some particular rational number. Neither Cauchy reals nor Dedekind reals have this property.

Oh, I misunderstood the point of your first paragraph. What if we require an enumeration of all rationals our number is greater than?

2jessicata
With just that you could get upper bounds for the real. You could get some lower bounds by showing all rationals in the enumeration are greater than some rational, but this isn't always possible to do, so maybe your type includes things that aren't real numbers with provable lower bounds. If you require both then we're back at the situation where, if there's a constructive proof that the enumerations min/max to the same value, you can get a Cauchy real out of this, and perhaps these are equivalent.
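A sketch of my reading of this (the enumeration scheme is my own construction): pairing a lower enumeration with an upper one yields valid interval bounds, but nothing promises how fast they shrink, so no Cauchy modulus falls out for free.

```python
from fractions import Fraction

def positive_rationals():
    """Enumerate every positive rational (with repetitions)."""
    n = 1
    while True:
        for p in range(1, n + 1):
            yield Fraction(p, n + 1 - p)
        n += 1

def bounds_for_sqrt2(steps):
    lo, hi = Fraction(0), Fraction(10)
    gen = positive_rationals()
    for _ in range(steps):
        q = next(gen)
        if q * q < 2:
            lo = max(lo, q)   # q belongs to the lower enumeration
        else:
            hi = min(hi, q)   # q belongs to the upper enumeration
    return lo, hi             # valid bounds on sqrt(2), shrinking at an unknown rate

print(bounds_for_sqrt2(10_000))
```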

If you want to transfer definitions into another context (constructive, in this case), you should treat such concrete, intuitive properties as theorems, not axioms, because the abstract formulation will generalize further. (remark: "close" is about distances, not order.)

If constructivism adds a degree of freedom in the definition of convergence, I'd try to use it to rescue the theorem that the Dedekind (order) and Cauchy (distance) structures on ℚ agree about the completion. Potential rewards include survival of the theory built on top and evidence about the ide...
