Regarding 17.4.Open:
Consider π' that tries all state machines up to a given size and imitates the one that performs best on (U,M); this would tighten the O(n log n) bound to O(BB^-1(n)).
This fails because your utility functions return constructive real numbers, which don't implement comparison. I suggest that you make it possible to compare utilities.[1]
In which case we get: Within every decidable machine class where every member halts, agents are uncomputably smol.
Such as by making P(s,s') return the order of U(s) and U(s').
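For concreteness, a rough sketch of how I picture the enumeration combined with that comparator; every name here (all_machines_up_to, run_on, P) is a hypothetical stand-in, not something from the original:

```python
def pi_prime(k, P, all_machines_up_to, run_on):
    """Exhaustive search over state machines of size <= k.

    all_machines_up_to(k): hypothetical enumerator of machines of size <= k
    run_on(machine):       hypothetical evaluation on (U, M), returning a final state
    P(s, s2):              returns a positive number iff U(s) > U(s2),
                           sidestepping comparison of constructive reals
    """
    best_machine, best_state = None, None
    for machine in all_machines_up_to(k):
        state = run_on(machine)
        if best_state is None or P(state, best_state) > 0:
            best_machine, best_state = machine, state
    return best_machine  # pi' then imitates this machine
```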
If you didn't feel comfortable running it overnight, why did you publish the instructions for replicating it?
I'm hoping more for some stepping stones between the pre-theoretic concept of "structural" and the fully formalized 99%-clause. If we could measure structuralness more directly, we should be able to get away with less complexity in the rest of the conjecture.
Ultimately, though, we are interested in finding a verifier that accepts or rejects based on a structural explanation of the circuit; our no-coincidence conjecture is our best attempt to formalize that claim, even if it is imperfect.
Can you say more about what made you decide to go with the 99% clause? Did you consider any alternatives?
This does go in the direction of refuting it, but they'd still need to argue that linear probes improve with scale faster than they do for other queries; a larger model means there are more possible linear probes to pick the best from.
I had that vibe from the abstract, but I can try to guess at a specific hypothesis that also explains their data: Instead of a model developing preferences as it grows up, it models an Assistant character's preferences from the start, but their elicitation techniques work better on larger models; for small models they produce lots of noise.
Ah, oops. I think I got confused by the absence of L_2 syntax in your formula for FVU_B. (I agree that FVU_A is more principled ^^.)
https://github.com/jbloomAus/SAELens/blob/main/sae_lens/evals.py#L511 sums the numerator and denominator separately; if they aren't doing that in some other place, probably just file a bug report?
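For reference, a small numpy sketch of the two aggregation orders being contrasted: per-sample ratios averaged over the batch, versus numerator and denominator each summed over the batch before dividing (variable names are mine):

```python
import numpy as np

def fvu_per_sample_mean(x, x_hat):
    """Average over the batch of each sample's ||x - x_hat||^2 / ||x - x_bar||^2."""
    resid = np.sum((x - x_hat) ** 2, axis=-1)
    total = np.sum((x - x.mean(axis=0)) ** 2, axis=-1)
    return np.mean(resid / total)

def fvu_summed(x, x_hat):
    """Sum numerator and denominator over the whole batch, then divide once."""
    resid = np.sum((x - x_hat) ** 2)
    total = np.sum((x - x.mean(axis=0)) ** 2)
    return resid / total

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 16))
x_hat = x + 0.1 * rng.normal(size=x.shape)
print(fvu_per_sample_mean(x, x_hat), fvu_summed(x, x_hat))  # generally differ
```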
Thanks, edited. If we keep this going we'll have more authors than users x)
Thanks, edited. Performance is not the only benefit, see https://www.lesswrong.com/posts/MHqwi8kzwaWD8wEQc/would-you-like-me-to-debug-your-math?commentId=CrC2
Account settings let you set mentions to notify you by email :)
The action space is too large for this to be infeasible, but at a 101 level, if the Sun spun fast enough it would come apart, and angular momentum is conserved so it's easy to add gradually.
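Back-of-the-envelope for the 101-level claim, using standard solar constants (and ignoring that a real spun-up Sun would deform well before this point):

```python
import math

G     = 6.674e-11   # m^3 kg^-1 s^-2
M_sun = 1.989e30    # kg
R_sun = 6.957e8     # m

# Break-up condition: centrifugal acceleration at the equator matches
# surface gravity, omega^2 * R = G * M / R^2.
omega_crit = math.sqrt(G * M_sun / R_sun**3)
period_hours = 2 * math.pi / omega_crit / 3600

print(f"critical spin period ~ {period_hours:.1f} hours")  # ~2.8 hours
print("current equatorial period: roughly 25 days")
```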
Can this program that you've shown to exist be explicitly constructed?
I'd like to do either side of this! Which I say in public to have an opportunity to advertise that https://www.lesswrong.com/posts/MHqwi8kzwaWD8wEQc/would-you-like-me-to-debug-your-math remains open.
Hang up a tear-off calendar?
(You can find his ten mentions of that ~hashtag via the magnifying glass on thezvi.substack.com. Huh, less regular than I thought.)
Zvi's AI newsletter, latest installment https://www.lesswrong.com/posts/LBzRWoTQagRnbPWG4/ai-93-happy-tuesday, has a regular segment Pick Up the Phone arguing against this.
Why not just one global project?
https://www.google.com/search?q=spx+futures
I was specifically looking at Nov 5th 0:00-6:00, which twitched enough to show aliveness, while Manifold and Polymarket moved in smooth synchrony.
The public will Goodhart any metric you hand over to it. If you provide evaluation as a service, you will know how many attempts an AI lab made at your test.
If you say heads every time, half of all futures contain you; likewise with tails.
What is going to be done with these numbers? If Sleeping Beauty is to gamble her money, she should accept the same betting odds as a thirder. If she has to decide which coinflip result kills her, she should be indifferent, like a halfer.
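A toy expected-value check of the betting claim, assuming the usual setup (heads: one awakening, tails: two) and a hypothetical bookie who offers the same ticket at every awakening:

```python
import random

def expected_profit(price, trials=100_000):
    """Beauty pays `price` per awakening for a ticket paying 1 if the coin was heads."""
    total = 0.0
    for _ in range(trials):
        heads = random.random() < 0.5
        awakenings = 1 if heads else 2
        total += awakenings * ((1.0 if heads else 0.0) - price)
    return total / trials

# Break-even sits near a price of 1/3 (thirder odds), not 1/2:
for price in (0.5, 1 / 3):
    print(round(price, 3), round(expected_profit(price), 3))
```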
Your experiment is contaminated: If a piece of training document said that AI texts are overly verbose, and then announced that the following is a piece of AI-written text, it'd be a natural guess that the document would continue with overly verbose text, and so that's what an autocomplete engine will generate.
Due to RLHF, AI is no longer cleanly modelled as an autocomplete engine, but the point stands. For science, you could try having AI assist in the writing of an article making the opposite claim :).
Ask something only they would know.
Among monotonic, boolean quantifiers that don't ignore their input, exists is maximal because it returns true as often as possible; forall is minimal because it returns true as rarely as possible.
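A brute-force check of that claim for quantifiers over three individuals, reading a quantifier as a boolean function of the individuals' truth values and "doesn't ignore its input" as "isn't constant":

```python
from itertools import product

n = 3
inputs = list(product((0, 1), repeat=n))

def monotone(f):
    return all(f[x] <= f[y] for x in inputs for y in inputs
               if all(a <= b for a, b in zip(x, y)))

# every boolean quantifier on n individuals, as a truth table
tables = [dict(zip(inputs, bits)) for bits in product((0, 1), repeat=len(inputs))]
candidates = [f for f in tables if monotone(f) and len(set(f.values())) > 1]

exists = {x: int(any(x)) for x in inputs}
forall = {x: int(all(x)) for x in inputs}

assert all(f[x] <= exists[x] for f in candidates for x in inputs)  # exists is maximal
assert all(forall[x] <= f[x] for f in candidates for x in inputs)  # forall is minimal
print(len(candidates), "monotone non-constant quantifiers checked")
```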
For concreteness, let's say the basic income is the same in every city, same for a paraplegic or Elon Musk. Anyone who can vote gets it, it's a dividend on your share of the country.
I am surprised at section 3; I don't remember anyone who seriously argues that women should be dependent on men. By amusing coincidence, my last paragraph makes your reasoning out of scope; you can abolish women's suffrage in a separate bill.
In section 5, you are led astray by assuming a fixed demand for labor. You notice that we have yet to become obsolete. Well, of course: Fo...
factor out alpha
⌊x⌋ is floor(x), the greatest integer that's at most x; for example, ⌊3.7⌋ = 3 and ⌊-1.2⌋ = -2.
People with sufficiently good models of each other to use them in their social protocols.
I'd call those absences of drawbacks, not benefits - you would have had them without the job.
The other side of this post is to look at what various jobs cost. Time and effort are the usual costs, but some jobs ask for things like willingness to deal with bullshit (a limited resource!), emotional energy, on-call readiness, various kinds of sensory or moral discomfort, and other things.
I was alone in a room of computers, and I had set out to take no positive action but grading homework. I ended up sitting and pacing and occasionally moving the mouse in the direction it would need to go next. What I remember being on my mind was the misery of the situation.
I tried that for a weekend once. I did nothing.
It has been pointed out to me that no, what this presumably means is the past decisions of the patients.
Q2 Is it ethically permissible to consider an individual’s past decisions when determining their access to medical resources?
You assume the conclusion:
A lot of the AI alignment success seems to me to stem from the question of whether the problem is easy or not, and is not very elastic to human effort.
AI races are bad because they select for contestants that put in less alignment effort.
Sure, he's trying to cause alarm via alleged excerpts from his life. Surely society should have some way to move to a state of alarm iff that's appropriate; do you see a better protocol than this one?
Recall that every vector space is isomorphic to the finitely supported functions from some set to ℝ, and every Hilbert space is isomorphic to the square-integrable functions from some measure space to ℝ.
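Spelled out (for real scalars; the Hilbert-space case is ℓ² of a set with counting measure):

$$V \;\cong\; \{\, f : B \to \mathbb{R} \mid f \text{ has finite support} \,\} \quad \text{for any basis } B \text{ of } V,$$
$$H \;\cong\; \ell^2(S) \;=\; \Big\{\, f : S \to \mathbb{R} \;\Big|\; \textstyle\sum_{s \in S} f(s)^2 < \infty \Big\} \quad \text{for some set } S.$$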
I'm guessing that similarly, the physical theory that you're putting in terms of maximizing entropy lies in a large class of "Bostock" theories such that we could put each of them in terms of maximizing entropy, by warping the space with respect to which we're computing entropy. Do you have an idea of the operators and properties that define a Bostock theory?
that thing about affine transformations
If the purpose of a utility function is to provide evidence about the behavior of the group, we can preprocess the data structure into that form: Suppose Alice may update the distribution over group decisions by ε. Then the direction she pushes in is her utility function, and the constraints "add up to 100%" and "size ε" cancel out the "affine transformation" degrees of freedom. Now such directions can be added up.
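A minimal numpy sketch of that preprocessing as I read it; the three "group decisions", the utilities, and ε are made-up numbers:

```python
import numpy as np

def direction(utility, eps=0.01):
    """Turn a utility function over outcomes into an eps-sized push on the distribution."""
    u = np.asarray(utility, dtype=float)
    d = u - u.mean()              # components sum to 0: translation freedom gone
    norm = np.linalg.norm(d)
    if norm == 0:
        return np.zeros_like(u)   # an indifferent agent pushes nowhere
    return eps * d / norm         # length eps: scaling freedom gone

alice = direction([3.0, 1.0, 0.0])   # Alice's utilities over three group decisions
bob   = direction([0.0, 5.0, 5.0])

base = np.full(3, 1 / 3)             # prior over group decisions
group = base + alice + bob           # now the directions can simply be added
print(group, group.sum())            # still sums to 1
```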
Let's investigate whether functions must contain an agent in order to do sufficiently useful cognitive work. Pick some function such that an oracle for it would let you save the world.
Hmmmm. What if I said "an enumeration of the first-order theory of (ℚ ∪ {our number}, <)"? Then any number can claim to be equal to one of the constants.
If Earth had multiple intelligent species with different minds, an LLM could end up identical to a member of at most one of them.
Is the idea that "they seceded because we broke their veto" is more of a casus belli than "we can't break their veto"?
Sure! Fortunately, while you can use this to prove any rational real innocent of being irrational, you can't use this to prove any irrational real guilty of being irrational, since every first-order formula can only check against finitely many constants.
Chaitin's constant, right. I should have taken my own advice and said "an enumeration of all properties of our number that can be written in the first-order language of (ℚ, <)".
Oh, I misunderstood the point of your first paragraph. What if we require an enumeration of all rationals our number is greater than?
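A sketch of what I mean, for a number handed to us constructively as rational approximations accurate to within 1/n (the sqrt(2) helper is only an illustration):

```python
from fractions import Fraction
from itertools import count

def enumerate_lower_rationals(approx, rationals):
    """Yield exactly the rationals from `rationals` that are strictly below x.

    approx(n) must return a rational a_n with |x - a_n| <= 1/n.  A rational q
    is emitted once some a_n - 1/n exceeds it, which certifies q < x; rationals
    equal to x are never emitted, matching the (Q, <) picture.
    """
    pending, rats = [], iter(rationals)
    for n in count(1):
        a = approx(n)
        pending.append(next(rats))
        still_waiting = []
        for q in pending:
            if q < a - Fraction(1, n):
                yield q
            else:
                still_waiting.append(q)
        pending = still_waiting

def sqrt2_approx(n):
    """Binary-search a rational within 1/n of sqrt(2)."""
    lo, hi = Fraction(1), Fraction(2)
    while hi - lo > Fraction(1, n):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mid * mid < 2 else (lo, mid)
    return lo

gen = enumerate_lower_rationals(sqrt2_approx, [Fraction(k, 10) for k in range(30)])
print([next(gen) for _ in range(5)])  # e.g. 0, 1/10, 1/5, 3/10, 2/5
```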
If you want to transfer definitions into another context (constructive, in this case), you should treat such concrete, intuitive properties as theorems, not axioms, because the abstract formulation will generalize further. (remark: "close" is about distances, not order.)
If constructivism adds a degree of freedom in the definition of convergence, I'd try to use it to rescue the theorem that the Dedekind (order) and Cauchy (distance) structures on ℚ agree about the completion. Potential rewards include survival of the theory built on top and evidence about the ide...
Re first, yep, I missed that :(. M does sound like a more worthy barrier than U. Do you have a working example of a (U,M) where some state machine performs well in a manner that's hard to detect?
Re second, I realized that this only allows discrete utilities but didn't think to therefore try a π' that does an exhaustive search over policies ^^. (I assume you are setting "uncomputable to measure performance because that involves the Solomonoff prior" aside here.) Even so, undecidability of whether 000... and 111... get the same utility sounds like a bug. Wha...