All of ArthurB's Comments + Replies

ArthurB155

Interestingly, o1-pro is not available for their Team plan, which offers the guarantee that they do not train on your data. I'm pretty sure they are losing money on o1-pro and that it's available purely to gather data.

1Zedmor
That's exactly it: o1 full is not available in the API either. Why not make money on something you already have?

Popular with Silicon Valley VCs 16 years later: just maximize the rate of entropy creation🤦🏻‍♂️

We have a winner! laserfiche's entry is the best (and only, but that doesn't mean it's not good quality) submission, and they win $5K.

Code and demo will be posted soon.

1laserfiche
Thank you Arthur.  I'd like to offer my help on continuing to develop this project, and helping any of the other teams (@ccstan99, @johnathan, and others) on their projects.  We're all working towards the same thing.  PM me, and let me know if there are any other forums (Discord, Slack, etc) where people are actively working on or need programming help for AI risk mitigation.

Exactly. As for the cost issue, the code can be deployed as:

- Twitter bots (registered as such) so the deployer controls the cost


- A webpage that charges you a small payment (via crypto or credit card) to run 100 queries. Such websites can actually be generated by ChatGPT4 so it's an easy lift. Useful for people who truly want to learn or who want to get good arguments for online argumentation

- A webpage with captchas and reasonable rate limits to keep cost small 

 

In general yes, here no. My impression from reading LW is that many people suffer from a great deal of analysis paralysis and are taking too few chances, especially given that the default isn't looking great.

There is such a thing as doing a dumb thing because it feels like doing something (e.g. let's make AI Open!) but this ain't it. The consequences of this project are not going to be huge (talking to people) but you might get a nice little gradient read as to how helpful it is and iterate from there.

It should be possible to ask content owners for permission and get pretty far with that.

AFAIK what character.ai does is fine-tuning, with their own language models, which aren't at parity with ChatGPT. Using a better language model will yield better answers but, MUCH MORE IMPORTANTLY, what I'm suggesting is NOT fine-tuning.

What I'm suggesting gives you an answer that's closer to a summary of relevant bits of LW, Arbital, etc. The failure mode is much more likely to be that the answer is irrelevant or off the mark than that it is at odds with prevalent viewpoints on this platform.

Think more interpolating over an FAQ, and less reproducing someone's cognition.

ArthurB300

The US has around one traffic fatality per 100 million miles driven; if a human driver makes 100 decisions per mile

A human driver does not make 100 "life or death decisions" per mile. They make many more decisions, most of which can easily be corrected, if wrong, by another decision.

The statistic is misleading, though, in that it includes people who text, drunk drivers, and tired drivers. The performance of a well-rested human driver who is paying attention to the road is much, much higher than that. And that's really the bar that matters for a self-driving car: you don't want a car that merely does better than the average driver, who, for all you know, could be drunk.

6Douglas_Knight
Yes, the median driver is much better than the mean driver. But what matters is the mean, not the median.  If we can replace all drivers by robots, what matters is whether the robot is better than the mean human. Of course, it's not all at once. It's about the marginal change. What matters is the mean taxi driver, who probably isn't drunk. Another margin is the expansion of taxis: if robotaxis are cheaper than human taxis, the expansion of taxis may well be replacing drunk or tired drivers.
ArthurB122

Fixing hardware failures in software is literally how quantum computing is supposed to work, and it's clearly not a silly idea.

Generally speaking, there's a lot of appeal to intuition here, but I don't find it convincing. This isn't good for Tokyo property prices? Well maybe, but how good of a heuristic is that when Mechagodzilla is on its way regardless.

2lc
Quantum computing is not a silly idea in principle; it's not that it couldn't be done, it is just much harder for our first, critical try.

In addition

  1. There aren't that many actors in the lead.
  2. Simple but key insights in AI (e.g. doing backprop, using sensible weight initialisation) have been missed for decades.

If the right tail for the time to AGI by a single group can be long and there aren't that many groups, convincing one group to slow down / paying more attention to safety can have big effects.

How big of an effect? Years doesn't seem off the table. Eliezer suggests 6 months dismissively. But add a couple years here and a couple years there, and pretty soon you're talking about the pos... (read more)

8lc
It's of use at least inasmuch as it increases my life expectancy.
ArthurB9-1

There are IMO in-distribution ways of successfully destroying much of the computing overhang. It's not easy by any means, but on a scale where "the Mossad pulling off Stuxnet" is 0 and "build self-replicating nanobots" is 10, I think it is closer to a 1.5.

ArthurB90

Indeed, there is nothing irrational (in an epistemic way) about having hyperbolic time preference. However, this means that a classical decision algorithm is not conducive to achieving long term goals.

One way around this problem is to use TDT, another way is to modify your preferences to be geometric.

A geometric time preference is a bit like a moral preference... it's a para-preference. Not something you want in the first place, but something you benefit from wanting when interacting with other agents (including your future self).

ArthurB00

The second dot point is part of the problem description. You're saying it's irrelevant, but you can't just parachute in a payoff matrix where causality goes backward in time.

Find any example you like, as long as they're physically possible, you'll either have the payoff tied to your decision algorithm (Newcomb's) or to your preference set (Solomon's).

ArthurB10

I'm making a simple, logical argument. If it's wrong, it should be trivial to debunk. You're relying on an outside view to judge, which is pretty weak evidence.

As I've clearly said, I'm entirely aware that I'm making a rather controversial claim. I never bother to post on lesswrong, so I'm clearly not whoring for attention or anything like that. Look at it this way, in order to present my point despite it being so unorthodox, I have to be pretty damn sure it's solid.

ArthurB00

That's certainly possible, it's also possible that you do not understand the argument.

To make things absolutely clear, I'm relying on the following definition of EDT:

Policy that picks action a* = argmax_i Sum_j P(W_j | W, a_i) * U(W_j), where {a_i} are the possible actions, W is the current state of the world, P(W' | W, a) is the probability of moving to state of the world W' after doing a, and U is the utility function.
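That definition can be sketched directly in code; a minimal illustration, where the actions, states, probabilities, and utilities are all invented for the example:

```python
# Sketch of the policy above: pick the action maximizing
# Sum_j P(W_j | W, a_i) * U(W_j). All names below are invented examples.

def best_action(actions, outcomes, transition_prob, utility):
    def expected_utility(a):
        return sum(transition_prob(w, a) * utility(w) for w in outcomes)
    return max(actions, key=expected_utility)

# Toy world: two actions, two successor states.
outcomes = ["good", "bad"]
utilities = {"good": 1.0, "bad": 0.0}
probs = {("good", "safe"): 0.8, ("bad", "safe"): 0.2,
         ("good", "risky"): 0.5, ("bad", "risky"): 0.5}

choice = best_action(["safe", "risky"], outcomes,
                     lambda w, a: probs[(w, a)], utilities.get)
print(choice)  # safe (expected utility 0.8 vs 0.5)
```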

I believe the argument I made in the case of Solomon's problem is the clearest and strongest, would you care to rebut it?

I've challenged yo... (read more)

0wedrifid
The combination:

- Uncontroversial understanding by academic orthodoxy
- General position by those on lesswrong
- My parsing of your post
- Observation of your attempts to back up your argument when it was not found to be persuasive by myself or others

...is sufficient to give rather high confidence levels. It really is a huge claim you are making, to dismiss the understanding of basically the rest of the world regarding how CDT and EDT apply to the trivial toy problems that were designed to test them. There is altogether too much deduction of causal mechanisms involved in your "EDT" reasoning. And the deductions involved rely on a premise (the second dot point) that just isn't a part of either the problem or 'genes'.
ArthurB00

Yes, the causality is from the decision process to the reward. The decision process may or may not be known to the agent, but its preferences are (data can be read, but the code can only be read if introspection is available).

You can and should self-modify to prefer acting in such a way that you would benefit from others predicting you would act a certain way. You get one-boxing behavior in Newcomb's and this is still CDT/EDT (which are really equivalent, as shown).

Yes, you could implement this behavior in the decision algorithm itself, and yes this is very much isomorphic. Evolution's way to implement better cooperation has been to implement moral preferences though, it feels like a more natural design.

0wedrifid
I suggest that what was 'shown' was that you do not understand the difference between CDT and EDT.
ArthurB00

Typo, I do mean that EDT two boxes.

ArthurB20

According to wikipedia, the definition of EDT is

Evidential decision theory is a school of thought within decision theory according to which the best action is the one which, conditional on your having chosen it, gives you the best expectations for the outcome.

This is not the same as "being a randomly chosen member of a group of people..." and I've explained why. The information about group membership is contained in the filtration.

ArthurB-20

You're saying EDT causes you not to chew gum because cancer gives you EDT? Where does the gum appear in the equation?

ArthurB00

The claim is generally that EDT chooses not to chew gum.

1Oscar_Cunningham
Thanks, fixed.
ArthurB00

No it can't. If you use a given decision theory, your actions are entirely determined by your preferences and your sensory inputs.

2Oscar_Cunningham
wedrifid might well be making the point that your genes determine your choice, via your decision theory. i.e. Your genes give you EDT, and then EDT makes you not chew gum. I'm not sure how that affects the argument though.
ArthurB-10

But that's not how EDT works - your modification amounts to a totally different algorithm, which you've conveniently named "EDT".

EDT measures expected value after the action has been taken, but the output of EDT has no reason to be ignored by EDT if it is relevant to the calculation.

...then Omega's prediction is that EDT will two-box and oops - goodbye prize.

It loses, but it is generally claimed that EDT one-boxes.

ArthurB20

This case is handled in the previous sentence. If this is your actual decision, and your actual decision is the product of a decision algorithm, then your decision algorithm is not EDT.

To put it another way, is your decision to chew gum determined by EDT or by your genes? Pick one.

1wedrifid
It can be both. Causation is not exclusionary. I'm suggesting that you are mistaken about the aforementioned handling.
ArthurB-10

As it's been pointed out, this is not an anthropic problem; however, there still is a paradox. I may be stating the obvious, but the root of the problem is that you're doing something fishy when you say that the other people will think the same way and that your decision will determine theirs.

The proper way to make a decision is to have a probability distribution on the code of the other agents (which will include their prior on your code). From this I believe (but can't prove) that you will take the correct course of action.

Newcomb-like problems fall in the same category; the trick is that there is always a belief about someone's decision-making hidden in the problem.

ArthurB10

Hum no you haven't. The approximation depends on the scale of course.

ArthurB00

Indeed.

But I may have gotten "scale" wrong here. If we scale the error at the same time as we scale the part we're looking at, then differentiability is necessary and sufficient. If we're merely concerned with approximating the function on a smallish part, then continuity is what we're looking for.

ArthurB10

ok, but with this definition of "approximate", a piecewise linear function with finitely many pieces cannot approximate the Weierstrass function.

The original question is whether a continuous function can be approximated by a linear function at a small enough scale. The answer is yes.

If you want the error to decrease linearly with scale, then continuous is not sufficient of course.

-2SforSingularity
I think we have just established that the answer is no... for the definition of "approximate" that you gave...
ArthurB10

I defined approximate in an other comment.

Approximate around x: for every epsilon > 0, there is a neighborhood of x over which the absolute difference between the approximation and the approximated function is always lower than epsilon.

Adding a slope to a small segment doesn't help or hurt the ability to make a local approximation, so continuous is both sufficient and necessary.

1SforSingularity
ok, but with this definition of "approximate", a piecewise linear function with finitely many pieces cannot approximate the Weierstrass function. Furthermore, two nonidentical functions f and g cannot approximate each other. Just choose, for a given x, epsilon less than f(x) and g(x); then no matter how small your neighbourhood is, |f(x) - g(x)| > epsilon.
ArthurB20

that is because our eyes cannot see nowhere differentiable functions

That is because they are approximated by piecewise linear functions.

Consider that when you look at a "picture" of the Weierstrass function and pick a point on it, you would swear to yourself that the curve happens to be "going up" at that point. Think about that for a second: the function isn't differentiable - it isn't "going" anywhere at that point!

It means that at any point you can't make a linear approximation whose precision increases like the inverse of the scale; it doesn't mean you can't approximate.

0SforSingularity
taboo "approximate" and restate.
ArthurB10

No, he's right. The Weierstrass function can be approximated with a piecewise linear function. It's obvious: pick N equally spaced points and join them linearly. For N big enough, you won't see the difference. The error becomes arbitrarily small as N gets bigger.
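This can be checked numerically. A sketch, with one big assumption: the real Weierstrass function is an infinite series, and we truncate it to 6 terms so the example runs quickly; the qualitative point (the sup-norm error of the piecewise linear interpolant shrinks as N grows) survives the truncation.

```python
import math

# Truncated Weierstrass sum (6 terms is an assumption for speed),
# interpolated linearly through N equally spaced points on [0, 1];
# we measure the worst error against the function on a fine grid.
A, B, TERMS = 0.5, 3, 6

def w(x):
    return sum(A ** n * math.cos(B ** n * math.pi * x) for n in range(TERMS))

def max_interp_error(n_pieces, n_check=5000):
    ys = [w(i / n_pieces) for i in range(n_pieces + 1)]
    worst = 0.0
    for j in range(n_check + 1):
        x = j / n_check
        i = min(int(x * n_pieces), n_pieces - 1)
        t = x * n_pieces - i
        worst = max(worst, abs(w(x) - ((1 - t) * ys[i] + t * ys[i + 1])))
    return worst

print(max_interp_error(40))
print(max_interp_error(400))  # noticeably smaller
```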

-1SforSingularity
that is because our eyes cannot see nowhere differentiable functions, so a "picture" of the Weierstrass function is some piecewise linear function that is used as a human-readable symbol for it. Consider that when you look at a "picture" of the Weierstrass function and pick a point on it, you would swear to yourself that the curve happens to be "going up" at that point. Think about that for a second: the function isn't differentiable - it isn't "going" anywhere at that point!
0SforSingularity
that's because you can't "see" the Weierstrass function in the first place, because our eyes cannot see functions that are everywhere (or almost everywhere) nondifferentiable. When you look at a picture of the Weierstrass function on google image search, you are looking at a piecewise linear approximation of it. Hence, if you compare what you see on google image search with a piecewise linear approximation of it, they will look the same...
ArthurB20

Question is, what do you mean "approximately".

If you mean, for any error size, the supremum of distance between the linear approximation and the function is lower than this error for all scales smaller than a given scale, then a necessary and sufficient condition is "continuous". Differentiable is merely sufficient.

When the function is differentiable, you can make claims on how fast the error decreases asymptotically with scale.

0Johnicholas
And if you use the ArthurB definition of "approximately" (which is an excellent definition for many purposes), then a piecewise constant function would do just as well.
ArthurB20

An explanation cannot increase your knowledge. Your knowledge can only increase by observation. Increasing your knowledge is a decision theory problem (exploration/exploitation, for example).

Phlogiston explains why some categories of things burn and some don't. Phlogiston predicts that dry wood will always burn when heated to a certain temperature. Phlogiston explains why different kinds of things burn, as opposed to sometimes burning and sometimes not. It explains that if you separate a piece of wood into smaller pieces, every smaller piece will also burn.

T... (read more)

1[anonymous]
Edited my reply to correct and clarify (though I'll pass on debating the merits of phlogiston theory). After re-reading your original comment (it took me a while to parse it) I generally agree with your points. In particular I think "The bug is discarding the rest of the probability distribution" is a good way of summarizing the problem, and something I'll be mulling over.
ArthurB60

It seems to me that a narrative is generally a maximum likelihood explanation behind an event. If you observe two weird events, an explanation that links them is more likely than an explanation that doesn't. That's why causality is such a great explanation mechanism. I don't think making narratives is a bug. The bug is discarding the rest of the probability distribution... we are bad at remembering complex multimodal distributions.

Sometimes, a narrative will even add unnecessary details and it looks like a paradox (the explanation would be more likely wit... (read more)

2[anonymous]
That portion could probably stand to be clarified - at the very least I should provide a link to what I'm referring to: http://yudkowsky.net/rational/technical The point is to make your explanations have the possibility to increase your knowledge, rather than just satisfy your explanation-itch. If they can equally explain all outcomes, they aren't really explanations. To use Eliezer's favorite example, phlogiston "feels" like an explanation for why things burn - but it doesn't actually effect what you expect to see happen in the world.
ArthurB00

Yes, if there are two or more options and the score function depends only on the probability assigned to the correct outcome, then the only proper function is log. You can see that with the equation I gave

f0'(x) = (k - x f1'(x)) / (1 - x)

For f0 = 0, this means x f1'(x) = k, thus f1(x) = k ln(x) + c (a necessary condition).

Then you have to check that k ln(x) + c indeed works for some k and c; that is left as an exercise for the reader ^^

ArthurB10

No.

I am assuming the student has a distribution in mind and we want to design a scoring rule where the best strategy to maximize the expected score is to write in the distribution you have in mind.

If there are n options and the right answer is i and you give log(n p_i) / log(n) points to the student, then his incentive is to write in the exact distribution. On the other hand, if you give him, say, p_i* points, his incentive would be to write in 1 for the most likely answer and 0 otherwise.
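The incentive claim for the log rule can be checked numerically; a sketch, where the student's true belief and the distorted reports are invented examples:

```python
import math

# Numerical check of the incentive claim above, under the
# log(n * p_i) / log(n) scoring: honest reporting beats any distortion.

def expected_score(true_dist, reported):
    n = len(reported)
    return sum(p * math.log(n * q) / math.log(n)
               for p, q in zip(true_dist, reported))

truth = [0.4, 0.3, 0.2, 0.1]
honest = expected_score(truth, truth)
shaded = expected_score(truth, [0.7, 0.1, 0.1, 0.1])  # exaggerate the favorite
flat = expected_score(truth, [0.25, 0.25, 0.25, 0.25])
print(honest > shaded and honest > flat)  # True, by Gibbs' inequality
```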

Another way to score is not to give point only on p_i but ... (read more)

0Benya
Ok, so you're saying the total score the student gets is f1(q_i*) + Sum_(i /= i*) f0(q_i)? I didn't understand that from your original post, sorry. So does "(if) he score for a wrong answer was 0 (...) the only proper score function is the log" mean that if there are more than two options, log is the only proper score function that depends only on the probability assigned to the correct outcome, not on the way the rest of the probability mass is distributed among the other options? Or am I still misunderstanding?
ArthurB00

The equivalence class of the utility function should be the set of monotonic transformations of a canonical element.

However, what von Neumann-Morgenstern shows, under mild assumptions, is that for each class of utility functions there is a subset of utility functions, generated by the affine transforms of a single canonical element, for which you can make decisions by computing expected utility. Therefore, looking at the set of all affine transforms of such a utility function really is the same as looking at the whole class. Still, it doesn't make utility commensurable.

ArthurB-10

A speck in Adam's eye vs Eve being tortured is not a utility comparison but a happiness comparison. Happiness is hard to compare but can be compared, because it is a state; utility is an ordering function. There is no utility meter.

ArthurB00

You're correct. In the previous post given, it was somehow assumed that the score for a wrong answer was 0. In that case, the only proper score function is the log.

If you have a score function f1(q) for the right answer and f0(q) for each wrong answer, and there are n possible choices, the right probabilities are critical points only if

f0'(x) = (k - x f1'(x)) / (1 - x)

If we set f1(x) = 1 - (1-x)^p, we can set f0(x) = -(1-x)^p + (1-x)^(p-1) * p/(p-1).

For p = 2, we find f0(x) = -(1-x)^2 + 2(1-x) = 1 - x^2; this is the Brier score. For p = 3, we find f0(x) = -(1-x)^3 + (3/2)(1-x)^2 = x^3 - (3/2)x^2 + 1/2.

1 - (1-x)^3 and x^3 - (3/2)x^2 + 1/2 shall be known as ArthurB's score
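A numerical sanity check of both pairs, in the binary case (a sketch; the true probability 0.7 is an invented example): f1 is applied to the probability reported for the right answer, f0 to the probability reported for the wrong one, and the expected score should peak exactly at the true probability.

```python
# Properness check for the two scores above, on a binary question with
# true probability 0.7: expected score should peak at reporting q = 0.7.

def expected(p_true, q, f1, f0):
    # report q on option A, 1 - q on option B; A is true with prob p_true
    return (p_true * (f1(q) + f0(1 - q))
            + (1 - p_true) * (f1(1 - q) + f0(q)))

scores = {
    "p=2 (Brier)": (lambda x: 1 - (1 - x) ** 2,
                    lambda x: 1 - x ** 2),
    "p=3": (lambda x: 1 - (1 - x) ** 3,
            lambda x: -(1 - x) ** 3 + 1.5 * (1 - x) ** 2),
}

grid = [i / 100 for i in range(101)]
for name, (f1, f0) in scores.items():
    best = max(grid, key=lambda q: expected(0.7, q, f1, f0))
    print(name, best)  # both peak at 0.7
```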

0Benya
I'm not following your calculations exactly, so please correct me if I'm misunderstanding, but it seems that you are assuming that the student chooses an option and a confidence for that option? My understanding was that the student chooses a probability distribution over all options and is scored on that. As for how to extend the Brier score to more than two options, I'm not sure whether there's a standard way to do that, but one could always limit oneself to true/false questions... (in the log case you simply score log q_i, where q_i is the probability the student put on the correct answer, of course)
ArthurB20

A good thing about a log score rule is that if students try to maximize their expected score, they should write in their beliefs.

For the same reason, when confronted with a set of odds on the outcome of an event, betting on each outcome in proportion to your belief will maximize the expected log of your gain (regardless of what the current odds are).
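The betting claim can also be checked numerically; a sketch, assuming the whole stake is split across all outcomes (the beliefs and odds below are invented):

```python
import math

# Sketch of the claim above: splitting your stake across outcomes in
# proportion to your beliefs maximizes the expected log of your gain,
# no matter what the posted odds are.

def expected_log_gain(beliefs, bets, odds):
    return sum(p * math.log(b * o) for p, b, o in zip(beliefs, bets, odds))

beliefs = [0.5, 0.3, 0.2]
odds = [1.8, 4.0, 6.5]  # arbitrary payout multipliers

kelly = expected_log_gain(beliefs, beliefs, odds)
skewed = expected_log_gain(beliefs, [0.8, 0.1, 0.1], odds)
uniform = expected_log_gain(beliefs, [1 / 3, 1 / 3, 1 / 3], odds)
print(kelly > skewed and kelly > uniform)  # True
```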

3Benya
Unless I'm misunderstanding something, this is true for the Brier score, too: http://en.wikipedia.org/wiki/Scoring_rule#Proper_score_functions
ArthurB00

I confess I do not grasp the problem well enough to see where the problem lies in my comment. I am trying to formalize the problem, and I think the formalism I describe is sensible.

Once again, I'll reword it, but I think you'll still find it too vague: to win, one must act rationally, and the set of possible actions includes modifying one's code.

The question was

My timeless decision theory only functions in cases where the other agents' decisions can be viewed as functions of one argument, that argument being your own choice in that particular case - eith

... (read more)
0Vladimir_Nesov
Thanks, this comment makes your point clearer. See cousin_it's post Re-formalizing PD.
ArthurB30

On this page, the cumulative refers to the probability of obtaining at most k successes. You want to run it with 30 and 9, which gives you the right answer, 2.14%.

Or you could put in 30 and 20 which gives you the complement.

What is lower than 1% is the probability of getting 8 or less right answers.

1Scott Alexander
Oh, I see how they did that. Thanks. Original post edited.
ArthurB00

Well, if you want practicality, I think Omega problems can be disregarded, they're not realistic. It seems that the only feature needed for the real world is the ability to make trusted promises as we encounter the need to make them.

If we are not concerned with practicality but the theoretical problem behind these paradoxes, the key is that other agents make prediction on your behavior, which is the same as saying they have a theory of mind, which is simply a belief distribution over your own code.

To win, you should take the actions that make their belief... (read more)

0Vladimir_Nesov
Again, if you can state same with precision, it could be valuable, while on this level my reply is "So?".
ArthurB70

I think you got your math wrong

If you get 20 out of 30 questions wrong, you break even; therefore, the probability of losing points by guessing is

Sum( C(30, i), i = 21..30 ) / 2^30 ≈ 2.14% > 1%
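The sum works out as stated; a quick confirmation using Python's math.comb:

```python
from math import comb

# Probability of getting 21 or more of 30 true/false questions wrong
# when guessing at random, i.e. of losing points under the scheme above.
p_lose = sum(comb(30, i) for i in range(21, 31)) / 2 ** 30
print(round(100 * p_lose, 2))  # 2.14
```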

0Scott Alexander
You're probably right, because I haven't done a problem like this since forever, but help me figure out what I did wrong. I found a binomial distribution calculator (this is binomial distribution, right?), entered 30 trials, 21 "successes", (counting a false answer as a success, and agreeing with you that 20 is break even so you need 21 to do worse than even) and .5 probability of success, and it said the cumulative probability was .9919... against, therefore <1%.
ArthurB00

Instead of assuming that others will behave as a function of our choice, we look at the rest of the universe (including other sentient beings, including Omega) as a system where our own code is part of the data.

Given a prior on physics, there is a well defined code that maximizes our expected utility.

That code wins. It one boxes, it pays Omega when the coin falls on heads etc.

I think this solves the infinite regress problem, albeit in a very impractical way.

0Vladimir_Nesov
This doesn't sound obviously wrong, but is too vague even for an informal answer.
ArthurB30

I find that going to the original paper generally does the trick. When an idea is new, the author will spell out the details more carefully.

SilasBarta mentions difficulty with Boltzmann machines; Ackley et al.'s article is actually quite detailed, including a proof of the learning algorithm: http://tinyurl.com/q4azfl

ArthurB10

It would be interesting to try the experiment with Versed. You remove the dialectical aspect (steps 2,3,4) but you keep the wisdom of the crowd aspect.

1Johnicholas
By "Versed", are you referring to the drug Midazolam? Is there a particular reason that you picked that drug rather than, say, alcohol?
ArthurB00

There's an ambiguity here. You're talking about valuing something like world justice, I was talking about valuing acting justly. In particular, I believe that if optimal deterrence is unjust, it is also unjust to seek it.

Why does this relate to the subject again? Well, my point is we should not change our sense of justice. It's tautological.

ArthurB00

Your decision making works as a value scale; morality, not so much. There is a subset of actions you can take which are just. If you do not give a high weight to acting justly, you're a dangerous person.

2[anonymous]
The reverse is probably more true. If I give a high weight to acting justly I'll grab the nearest Claymore, get some blue face paint and scream "You can take my life but you can not take my freedom!" If I don't value justice I'll suck up to the new power and grab my piece of the new pie. That's a role someone was bound to fill. I'll be irrelevant. People who value justice highly are implicitly harder to intimidate. They're harder to shame into compliance. They are less willing to subbordinate their Just wrath to gains in social standing. Sure, they don't steal cookies, but they're dangerous.
6thomblake
Thank you.
ArthurB00

When you say we "should" change our sense of justice, you're making a normative statement because no specific goal is specified.

In this case, it seems wrong. Our sense of justice is part of our morality, therefore we should not change it.

"We should seek justice" is tautological. If justice and optimal deterrence are contradictory, then we should not seek optimal deterrence.

1[anonymous]
I have no premise "if something is part of our morality we shouldn't change it". No it isn't. See Thomblake's reply. I for one feel no particular attachment to justice over optimal deterrence. In fact, in many situations I actively give the latter precedence. You can keep your 'shoulds' while I go ahead and win my Risk games.
4thomblake
"Justice" is said in many ways. Yes, it tends to be normative; however, values can be weighed against one another. I value candy, but "I should seek candy" is far from tautological. Justice, in particular, rides rather far down my hierarchy of values.