All of SomeoneYouOnceKnew's Comments + Replies

This was an extremely enjoyable read.

Good fun.

My comment may be considered low effort, but this is a fascinating article. Thank you for posting it.

While I find the Socrates analogy vivid and effective, I propose putting critics on posts in the same bucket as lawyers. Where Socrates held a certain set of so-called principles -- choosing to die for arbitrary reasons -- most people are not half as dogmatic as he was, and so the analogy/metaphor seems to fall short.

While my post is sitting at negative two, with no comments or feedback... Modeling commenters as if they were lawyers might be better? When the rules lawyers have to follow show up, lawyers (usually) do change their behavior, ... (read more)

Part of the problem with verifying this is the number of machine learning people who got into machine learning because of lesswrong. We need more machine learning people who came to doom conclusions of their own accord, independent of hpmor etc, as a control group.

As far as I can tell, the people worried about doom overlap 1:1 with lesswrong posters/readers, and if it were such a threat, we'd expect some number of people to come to the conclusion independently, of their own accord.

Is knowing someone's being an asshole an aspect of hyperstition?

1zenbu zenbu zenbu zenbu
This is difficult to pick apart. Can you say more? I could imagine, say, thinking of someone's contributions as assholeish introducing sentiment to the conversation in a way that cascades into making the whole thing even worse. In fact, I think I've seen that happen over and over again. I can imagine a framing of that as hyperstitional. 

I once met an ex-Oracle sales guy turned medium-bigwig at other companies.

He justified it by calling it "selling ahead", and it started because the reality is that if you tell customers no, you don't get the deal. They told the customers they would have the features they requested. The devs would get notice later, when the deal was signed; no one in management ever complained, and everyone else on his "team" was doing it.

How do we measure intent? 

Unless you mean to say a person who actively and verbally attempts to shun the truth?

1M. Y. Zuo
Through their actions. 

Any preferred critiques of Graham's Hierarchy of Disagreement?

https://en.m.wikipedia.org/wiki/File:Graham's_Hierarchy_of_Disagreement-en.svg

Extra layers or reorderings?

1Yoav Ravid
Varieties Of Argumentative Experience by Scott Alexander

Does the data note whether the shift is among new machine learning researchers? Among those who have a p(Doom) > 5%, I wonder how many would come to that conclusion without having read lesswrong or the associated rationalist fiction.

6Zach Stein-Perlman
The dataset is public and includes a question "how long have you worked in" the "AI research area [you have] worked in for the longest time," so you could check something related!
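
A sketch of what that check might look like (the file and column names below are placeholders; I don't know the dataset's actual schema):

```python
# Hypothetical sketch only: "survey.csv", "years_in_longest_area", and "p_doom"
# are placeholder names, not the survey's real file or column names.
import pandas as pd

df = pd.read_csv("survey.csv")
newer = df[df["years_in_longest_area"] < 5]
veteran = df[df["years_in_longest_area"] >= 5]
print("median p(doom), newer researchers: ", newer["p_doom"].median())
print("median p(doom), veteran researchers:", veteran["p_doom"].median())
```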

I'm avoiding terms like "epistemic" and "consequential" and such in this answer, and instead attempting to give a colloquial one, to what I think is the spiritual question. 

(I'm also deliberately avoiding iterating over the harms of blind traditionalism and religious thinking, assuming that, since you're an atheist, you don't reject most of the criticisms of religion.)

(Also also, I am being brief. For more detail I would point you to the library, to go read about Christianity's role in the rise of the working and uneducated classes in the 1600s-1800s, and ... (read more)

Perhaps "term" is the wrong, ahem, term. 

Maybe you want "metrics"? There are lots of non-GDP metrics that could be used to track ai's impact on the world. 

Instead of the failure mode of saying "well, GDP didn't track typists being replaced with computers," maybe the flipside question is "what metrics would have shown typists being replaced?"

1Tricular
Looks like a pretty good alternative, thanks! But I just realized that goals actually have some properties that I care about that themes don't have - they really narrow your focus.

What material policy changes are being advocated for, here? I am having trouble imagining how this won't turn into a witch-hunt.

If you find yourself leaning into conspiracy theories, consider whether you're stuck in a particular genre and need to artificially inject more variety into your intellectual, media, and audio diets.

Confirmation bias leads to feeling like every song has the same melody, but there are many other modes of thought, and, imo, sliding into a slot that checks more boxes with <ingroup> is an indicator that our information feeds/sensors are bad, more than that we are stumbling on truth.

Not gonna lie, I lost track of the argument on this line of comments, but pushing back on word-bloat is good.

Thanks! Though, hm.

Now I'm noodling how one would measure goodharting.

What kind of information would you look out for, that would make you change your mind about alignment-by-default?

What information would cause you to invert again? What information would cause you to adjust 50% down? 25%?

I know that most probability mass is some measure of gut feel, and I don't want to introduce nickel-and-diming; I'd rather get a feel for what information you're looking for.

2Noosphere89
A major way I could be more pessimistic is if the "deceptive alignment is 1% likely" post was wrong in several aspects; if that happened, I'd probably revise my beliefs back to where they were originally, at 30-60%. Another way I could be wrong is if evidence came out that there are more ways for models to be deceptive than the standard scenario of deceptive alignment. Finally, if the natural abstractions hypothesis didn't hold, or if goodharting were empirically shown to get worse with model capabilities, then I'd update towards quite a bit more pessimism, on the order of at least 20-30% less confidence than I currently hold.

Do you believe encouraging the site maintainers to implement degamification techniques on the site would help with your criticisms?

When you can predict that beliefs won't update towards convergence, you're predicting a mutual lack of respect and a mutual lack of effort to figure out whose lack of respect is misplaced.

 

Are you saying that the interlocutors should instead attempt to resolve their lack of mutual respect?

5jimmy
Whether it's worth working to resolve any disagreement over appropriate levels of respect is going to depend on the context, but certainly below a certain threshold object-level discourse becomes predictably futile. And certainly high levels of respect are *really nice*, and allow for much more efficient communication, because people are actually taking each other seriously and engaging with each other's perspective. There are definitely important caveats, but I generally agree with the idea that mutual respect and the ability to sort out disagreements about the appropriate level of respect are worth deliberately cultivating. Certainly if I am in a disagreement that I'd like to actually resolve and I'm not being taken as seriously as I think I ought to be, I'm going to seek to understand why, and see if I can't pass their "ideological test" on the matter.

As a relatively new person to lesswrong, I agree.

Conversations which I've read that end in either party noticeably updating one way or the other have been relatively rare. The one point I'm not sure I agree with is whether being able to predict a particular disagreement is a problem.

I suppose being able to predict the exact way in which your interlocutors will disagree is the problem? If you can foresee someone disagreeing in a particular way, account for it in your argument, and they then disagree anyway, in the exact way you tried to address, that's generally just bad faith.

(though sometimes I do skim posts, by god)

1jimmy
Introducing "arguments" and "bad faith" can complicate and confuse things, and neither are necessary. As a simple model, say we're predicting whether the next ball drawn from an urn is black, and we've each seen our own set of draws. When I learn that your initial prediction is a higher probability than mine, I can infer that you've seen a higher ratio of black than I have, so in order to take that into account I should increase my own probability of black. But how much? Maybe I don't know how many draws you've witnessed. On the next iteration, maybe they say "Oh shoot, you said 30%? In that case I'm going to drop my guess from 95% to 35%". In that case, they're telling you that they expect you've seen many more draws than them. Alternatively, they could say "I guess I'll update from 95% to 94%", telling you the opposite. If you knew in advance which side of your new estimate they were likely to end up on, then you could have taken that into account last time, and updated further/less far accordingly until you can't expect to know what you will learn next time. If you *know* that they're going to stick to 95% and not update based on your guess, then you know they don't view your beliefs as saying much. If *that* doesn't change your mind and make you think "Wow, they must really know the answer then!" and update to 95%, then you don't view their beliefs as saying much either. When you can predict that beliefs won't update towards convergence, you're predicting a mutual lack of respect and a mutual lack of effort to figure out whose lack of respect is misplaced.

I try to ask myself whether the tenor of what I'm saying overshadows definitional specificity, and how I can provide a better mood or angle. If my argument is not atonal - if my points line up coherently, such that a willing ear will hear - definitionalist debates should slide on by.

As a descriptivist rather than a prescriptivist, I find it really sucks to have to fall back on Socratic methods of pre-establishing definitions, except in highly technical settings.

Thus, I prefer to avoid arguments which hinge on definitions altogether. This doesn't preclude example... (read more)

2[anonymous]
TAI seems like a partially good example for illustrating my point: I agree that it's crucial that people have the same thing in mind when debating about TAI in a discussion, but I also think it's important to recognize that the goal of the discussion is (probably!) not "how should everyone everywhere define TAI" and is instead probably something like "when will we first see 'TAI.'" In that case, you should just choose whichever definition of TAI makes for a good, productive discussion, rather than trying to forcefully hammer out "the definition" of TAI. I say partially good, however, because thankfully the term TAI has not taken such deep historical root in people's minds and in dictionaries, so I think (hope!) most people accept there is not "a (single) definition." Words like "science," "leadership," "Middle East," and "ethics," however... not the same story 😩🤖

I don't think there's any place quite like lesswrong on the entire internet. It's a lot of fun to read, but it tends to be pretty one-note, and even if there is discord in lesswrong's song, it's far more controlled; Eru Ilúvatar's hand can yet be felt, if not seen. (edit: that is to say, it's all the same song)

For the most part, people are generally tolerant of Christians. There is even a Catholic who teaches (taught?) at the Center For Applied Rationality, and there are a few other rationalist-atheists who hopped to Christianity, though I can't remember th... (read more)

I don't agree, but for a separate reason from trevor.

Highly-upvoted posts are a signal of what the community agrees with or disagrees with, and I think being able to more easily track down karma would cause reddit-style internet-points seeking. How many people are hooked on Twitter likes/view counts?

Or "ratio'd".

Making it easier to track these stats would be counterproductive, imo.

It seems pretty standard knowledge among pollsters that even the ordering of questions can change a response. It seems pretty blatantly obvious that if we know who a commenter is, we will extend them more or less charity.

Even if the people maintaining the site don't want to hide votes + author name on comments and posts, it would be nice if user name + votes were moved to the bottom. I would like to at least be presented with the option to vote after I have read a comment, not before.

7gwern
https://github.com/mherreshoff/lw-antikibitzer

Re: Papers - I'm aware of the kind of papers you're alluding to, though I haven't been that impressed.

The reason I don't want a scratch-space is that I view scratch space and context as equivalent to giving the ai a notecard it can peek at. I'm not against having extra categories or asterisks for the different kinds of ai for the small test.

Thinking aloud and giving it scratch space would mean it's likely to be a lot more tractable for interpretability and alignment research, I'll grant you that.

I appreciate the feedback, and I will think about your points more, though I'm not sure if I will agree.

Given my priors + understanding of startup/Silicon Valley culture, it sounds more like OpenAI has started to leave the startup phase and is entering the "profit-seeking" phase of running the company. After they had the rug pulled out from under them by Stable Diffusion, I would expect them to get strategic whiplash, and then decide to button things up.

The addition of a non-compete clause within their terms of service and the deal with Microsoft seem to hint towards that. They'll announce GPT-3.5 and "next-gen language models", but it doesn't match my priors that they would hold back GPT-4 if they had it.

Time may tell, however!

What I'm asking with this particular test is: can an ai play blindfold chess without using a context to recount every move in the game?

 https://en.wikipedia.org/wiki/Blindfold_chess

2faul_sname
What exactly do you mean by "without using a context"? If you mean "without the fine-tuned language model ever dumping the context into the output stream in practice in inference mode", I would be extremely surprised if that was not possible. If you mean "without the fine-tuned language model being trained to be able to dump the context into the output stream at any point", I'm less confident.

For the sake of clarity, the approach I am trying is to fine-tune an openai language model (specifically babbage, since I'm not made of money) to simulate a command-line chess program, adding one command at a time, including several commands (get-square-content, is-legal-move, etc) that will ideally never show up in inference mode. If things go as I expect, the end result will be a "language" model which would look like the following to interact with:

[player] c4
[computer] e5
[player] g3
[computer] nf6
[player] bg2
[computer] d5
[player] bh1
Illegal move: h1 is occupied
[player] bxh1
Illegal move: cannot capture a piece of your own color
[player] cxd5
[computer] Nxd5

(this could be further fine-tuned to always chop off the first three lines, so it just starts with [player] always). If control returns to the user whenever the model outputs [player], that becomes a chess chatbot.

Would a fine-tuned openai babbage that produced output like the above, and did not output illegal moves (outside of circumstances like "game too long for context window" or "prompt injection"), count as an instance of the thing you believe is not possible, or am I misunderstanding?

(note: my expectation is that such a thing is possible but would be wildly inefficient, since it would have to reconstruct the board state for each token it wants to output, and also not likely to play good chess. But I think legal chess would probably still be an attainable goal).
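
A rough sketch of how one might generate training transcripts in that format (my addition, not faul_sname's actual pipeline): it uses the python-chess library, random legal moves stand in for a real opponent, and the illegal-move messages come from python-chess rather than matching the wording above.

```python
import random
import chess

def attempt(board, tag, san):
    """Return a transcript line for a move attempt, pushing the move if legal."""
    try:
        board.push_san(san)
        return f"{tag} {san}"
    except ValueError as err:  # illegal, ambiguous, or malformed SAN
        return f"{tag} {san}\nIllegal move: {err}"

def random_game_transcript(n_plies=20, seed=0):
    rng = random.Random(seed)
    board = chess.Board()
    lines = []
    for ply in range(n_plies):
        if board.is_game_over():
            break
        tag = "[player]" if ply % 2 == 0 else "[computer]"
        san = board.san(rng.choice(list(board.legal_moves)))
        lines.append(attempt(board, tag, san))
    return "\n".join(lines)

if __name__ == "__main__":
    print(random_game_transcript())
```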

I'm confused. What I'm referring to here is https://en.wikipedia.org/wiki/Blindfold_chess

I'm not sure why we shouldn't expect an ai to be able to do well at it?

2paulfchristiano
But humans play blindfold chess much slower than they read/write moves; they take tons of cognitive actions between each move. And at least when I play blindfold chess I need to lean heavily on my visual memory, and I often need to go back over the game so far for error-correction purposes, laboriously reading and writing to a mental scratchspace. I don't know if better players do that. But an AI can do completely fine at the task by writing to an internal scratchspace.

You are defining a restriction on what kind of AI is allowed, and I'm saying that human cognition probably doesn't satisfy the analogous restrictions. I think to learn to play blindfold chess humans need to explicitly think about cognitive strategies, and the activity is much more similar to equipping an LM with the ability to write to its own context and then having it reason aloud about how to use that ability.

My proposed experiment/test is trying not to analogize to humans, but rather to scope out places where the ai can't do very well. I'd like to avoid accidentally narrowing the vision of the tests too much. It won't work with an ai network where the weights are reset every time.

An alternative, albeit massively larger-scale, experiment might be:

Will a self-driving car ever be able to navigate from one end of a city to another, using street signs and just learning the streets by exploring it?

A test of this might be like the following: 

  1. Randomly generate
... (read more)
8paulfchristiano
Just seems worth flagging that humans couldn't do the chess test, and that there's no particular reason to think that transformative AI could either.

I just wanted to break up the mood a bit. Reading everything here is like listening to a band stuck in the same key.

Unfortunately, what I am proposing is not possible with current language models, as they don't work like that.

4faul_sname
Interesting. My mental model says that by fine tuning a language model to be ready to output the contents of any square of a chess board, it should be possible to make it keep a model of that chess board even without making it output that model in inference mode. I think I will have to try it out to see if it works.

I was rereading and noticed places where I was getting a bit too close to insulting the reader. I've edited a couple places and will try to iron out the worst spots.

If it's bad enough I'll retract the article and move it back to drafts or something. Idk.

3npostavs
I don't understand what the point of all the swearing is. It's just kind of annoying to read.

So long as the "buffer" is a set of parameters/weights/neurons, that would fit my test.

1[anonymous]
It makes sense for the buffer to be searchable. So at any given time only some information is actually provided as an input parameter to the model, but as it "thinks" serially it can order a search. For example, take a task like "write a large computer program": it cannot remember all the variable names and interfaces to the other parts of the program it is not working on, but needs to locate them whenever it calls on them.
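
A toy sketch of that kind of searchable scratch buffer (my own illustration, not the commenter's design): the model's context holds only what it retrieves, and everything else lives in an external store it can query by name or keyword.

```python
class ScratchBuffer:
    def __init__(self):
        self.notes = {}  # key -> text the model has written to scratch

    def write(self, key, text):
        self.notes[key] = text

    def search(self, query):
        # Return only entries whose key or body mentions the query term,
        # so the working context stays small.
        q = query.lower()
        return {k: v for k, v in self.notes.items()
                if q in k.lower() or q in v.lower()}

buffer = ScratchBuffer()
buffer.write("parse_config", "def parse_config(path: str) -> dict: ...")
buffer.write("run_server", "def run_server(cfg: dict) -> None: ...")

# While "writing" another part of the program, the model retrieves just the
# interface it needs instead of holding the whole program in context.
print(buffer.search("config"))
```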

I wonder how long it's going to be until you can get an LLM which can do the following with 100% accuracy. 

I don't care about the ai winning or losing; in fact, I would leave that information to the side. I don't care if this test is synthetic, either. What I want is:

  1. The ai can play chess the way normal humans do - obeys rules, uses pieces normally, etc.
  2. The ai holds the entire state of the chess board within it, and doesn't need a context in order to keep track of that state. (ie, it's playing blind chess and does
... (read more)
0ReaderM
Not sure what you mean by 100 percent accuracy, and of course you probably already know this, but 3.5 Instruct Turbo plays chess at about 1800 ELO, fulfilling your constraints (and makes about 5 illegal moves (potentially fewer) in 8205): https://github.com/adamkarvonen/chess_gpt_eval
7paulfchristiano
The chess "board vision" task is extraordinarily hard for humans who are spending 1 second per token and not using an external scratchspace. It's not trivial for an untrained human even if they spend multiple seconds per token. (I can do it only by using my visual field, e.g. it helps me massively to be looking at a blank 8 x 8 chessboard because it gives a place for the visuals to live and minimizes off-by-one errors.) Humans would solve this prediction task by maintaining an external representation of the state of the board, updating that representation on each move, and then re-reading the representation each time before making a prediction. I think GPT-3.5 will also likely do this if asked to use external tools to make a prediction about the next move. (And of course when we actually play chess we just do it by observing the state of the board, as represented to us by the chess board or chess program, prior to making each move.) It seems like a mistake to analogize a forward pass of the transformer to a human using external tools, if you want to make meaningful comparisons. You might learn something from such a test, but you wouldn't learn much about how AI performance compares to human performance, or when AI might have a transformative impact.
4faul_sname
This feels to me like the sort of thing that should be possible to do using openai's fine tuning api.
2[anonymous]
The obvious requirement is for the AI to have a second buffer it can write to as 'scratch' to keep track of the information for tasks like this.  
3Mitchell_Porter
Start by doing this for tic-tac-toe first. 

don't post any version of it that says "I'm sure this will be downvoted"

For sure. The actual post I make will not demonstrate my personal insecurities.

what bet?

I will propose a broad test/bet that will shed light on my claims or give some places to examine.

I think the lesswrong community is wrong about x-risk and many of the problems about ai, and I've got a draft longform with concrete claims that I'm working on...

But I'm sure it'll be downvoted because the bet has goalpost-moving baked in, and lots of goddamn swearing, so that makes me hesitant to post it.

9the gears to ascension
If you think it's low quality, post it, and warn that you think it might be low quality, but like, maybe in less self-dismissive phrasing than "I'm sure it'll be downvoted". I sometimes post "I understand if this gets downvoted - I'm not sure how high quality it is" types of comments. I don't think those are weird or bad; just try to be honest in both directions, and don't diss yourself unnecessarily.

And anyway, this community is a lot more diverse than you think. It's the rationalist ai doomers who are rationalist ai doomers - not the entire lesswrong alignment community. Those who are paying attention to the research and making headway on the problem, eg wentworth, seem considerably more optimistic. The alarmists have done a good job being alarmists, but there's only so much being an alarmist to do before you need to come back down to being uncertain and try to figure out what's actually true, and I'm not impressed with MIRI lately at all.
4gjm
"the bet" -- what bet? A word of advice: don't post any version of it that says "I'm sure this will be downvoted". Saying that sort of thing is a reliable enough signal of low quality that if your post is actually good then it will get a worse reception than it deserves because of it.

Something possibly missing from the list is breadth of first-hand experience amidst other cultures. Getting older and meeting people and really getting to know them in such a short lifespan is really, really hard!

And I don't just mean meeting people in the places we already live. Getting out of our towns and countries and living in their worlds? Yeah, you can't really do that. Sure, you might be able to move to <Spain> or <the Philippines> for a couple years, but then you come home.

It's not just death here, but the breadth of experiences we ca... (read more)

Feel free to delete this if it feels off-topic, but on a meta note about discussion norms, I was struck by that meme about C code - basically, the premise that code quality is higher when there is swearing.

I was also reading discussions on the Linux mailing lists - the discussions there are clear, concise, and frank. And occasionally, people still use scathing terminology and feedback.

I wonder if people would be interested in setting up a few discussion posts where specific norms get called out to "participate in good faith but try to break these specif... (read more)

I read it as "People would use other forms of money for trade if the government fiat ever turns into monopoly money"

As a matter of pure category, yeah, it's more advanced than "don't make stuff up".

I usually see these kinds of guides as an implicit "The community is having problems with these norms"

If you were to ask me "what's the most painful aspect of comments on lesswrong?", it's reading comments that go on for 1k words apiece where neither commenter ever agrees. It's probably the spookiest part for me as a lurker, and it made me hesitant to participate. 

So I guess I misread the intent of the post and why it was boosted? I dunno, are these not proposals ... (read more)

Can you link to another conversation on this site where this occurs?

(Porting and translating comment here, because this post is great):

Goddamn I wish people would just tell me when the fuck they're not willing to fucking budge. It's a fucking waste of time for all parties if we just play ourselves to exhaustion. Fuck, it's okay to not update all at once, goddamn Rome wasn't built in a day.

I propose another discussion norm: committing to being willing to have a crisis of faith in certain discussions, and, if not, de-stigmatizing admitting when you are, in fact, unwilling to entertain certain ideas or concepts, with participants respecting that.

2Duncan Sabien (Deactivated)
Seems good, but seems like probably not a basic norm? Feels more advanced than "foundational."

I'm going to be frank, and apologize for taking so long to reply, but this sounds like a classic case of naivete and overconfidence.

It's routinely demonstrated that stats can be made to say whatever the person who made them wants them to conclude, and given techniques like the ones used in p-hacking etc, it should eventually become evident that economics is not exempt from similar effects.

Add in the replication crisis, and you have a recipe for disaster. As such, the barriers you need to clear: "this graph about economics- a field known for att... (read more)

It's details like the ones you point out here that make me SUPER hesitant when reading people making claims about correlating GDP/economy-based metrics with anything else.

What are the original chart's base definitions, assumptions, and error bars? What are its data sources, and what assumptions are they making? To look at someone's charts of GDP, extrapolate, and finally go "tech has made no effect" feels naive and short-sighted, at least from a rational perspective - we should know that these charts tend not to convey as much meaning as we'd like.

Being extremely skeptical by default of sweeping claims based on extrapolations from GDP metrics seems prudent.

1DragonGod
Sounds like cope to me. It feels as if the data doesn't support your position and so you're making a retreat to scepticism. I think it's more likely that the position of yours that isn't supported by the data is just wrong.