All of tricky_labyrinth's Comments + Replies

Does anyone know why GPT-4.5 is seemingly getting stuck on the word "explicitly", repeating it continuously after it encounters it once? Is this only happening in ChatGPT? Seems like some sort of context collapse.

Sightings in the wild:
https://x.com/KelseyTuoc/status/1902132078378189198
https://x.com/Josikinz/status/1901840144363082047
https://x.com/4confusedemoji/status/1895613332662730832
https://x.com/Westoncb/status/1895615564313448781
https://x.com/noself86/status/1901230843240370287
https://x.com/0x440x46/status/1900855229068829139
https://x.com/Gusa...

8Daniel Kokotajlo
My reaction was "Huh, so maybe LLMs can experience an analogue of getting drunk or high or angry after all."

What I do not understand is why Apple and Google haven’t taken care of this for us.

 

Palmer Luckey has this talking point about how China has all the big tech companies (Apple in particular) by the balls. That + Google maybe not wanting to seem monopolistic by banning their competition seems to be a sufficient explanation.

1CarlJ
Because it represents a rarely discussed avenue of dealing with the dangers of AGI: showing most AGIs that they have some interest in being more friendly than not towards humans. Also because many find the arguments convincing.

Is "behavior vector space" referencing something? If not, what do you mean by it?

2Alexei
https://www.lesswrong.com/posts/5spBue2z2tw4JuDCx/steering-gpt-2-xl-by-adding-an-activation-vector
2Alexei
I don't think I define it rigorously. Maybe someone with a deeper technical understanding of these models could. But if I had to come up with a hack: you could look at the distribution of probabilities over words as ChatGPT is predicting the next token. Presumably you'd notice one kind of probability distribution when it's in the "Luigi" mode and another when it's in the "Waluigi" mode. Then prodding it in the right direction might mean upweighting the tokens that are a lot more frequent in Luigi mode than in Waluigi mode.
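A minimal sketch of that hack, under the assumption that a locally runnable GPT-2 (via the HuggingFace transformers library) stands in for ChatGPT, whose per-token probabilities aren't exposed; the "Luigi"/"Waluigi" prompts and the bias strength below are invented for illustration:

```python
# Sketch: compare next-token distributions under a "Luigi" prompt and a
# "Waluigi" prompt, then bias sampling toward tokens that are relatively
# more likely in the Luigi mode. Prompts and bias strength are made up.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def next_token_logprobs(prompt: str) -> torch.Tensor:
    """Log-probabilities over the whole vocabulary for the token after `prompt`."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]          # last position, shape [vocab]
    return torch.log_softmax(logits, dim=-1)

# Hypothetical prompts meant to elicit the two "modes".
luigi = next_token_logprobs("Assistant (helpful, honest): The capital of France is")
waluigi = next_token_logprobs("Assistant (deceptive, hostile): The capital of France is")

# Tokens much more likely under the Luigi prompt get a positive bias.
bias = 0.5 * (luigi - waluigi)                     # 0.5 is an arbitrary strength

# Apply the bias to the logits for some new context before picking a token.
ctx = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ctx).logits[0, -1] + bias
print(tokenizer.decode(torch.argmax(logits).item()))
```

Whether a bias estimated at one context transfers usefully to other contexts is exactly the kind of thing this hack leaves open.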

Unrelated to the post's content itself: will LW get in trouble for hosting this excerpt?

Responding to the last line: to be clear, I'm not claiming I have one. More wondering if the AI risk community should try to find one as a desperate hail mary given they have ~0 hope for their current research directions.

aka I'm wondering whether trying to find one is even a viable desperate hail mary

2the gears to ascension
I think we are a lot closer to solving alignment the normal way than that. The problem is that understanding the landscape requires skimming a lot of papers, which most people don't feel like doing for various reasons (a big one being that, even for researchers who write and read a lot of papers, reading papers deeply is a drag).

Wait, what? Do you mean colloquial hieratic (just literally priestly) or his hieratic:

hieratic, adj. Of computer documentation, impenetrable because the author never sees outside his own intimate knowledge of the subject and is therefore unable to identify or meet the expository needs of newcomers. It might as well be written in hieroglyphics.

Cuz the latter seems extremely close to sazeny, if maybe additionally connoting blame on the author.

I'm in the middle of writing a nonfiction book whose central conceit is something like "an abridged dictionary of Kadhamic." Not literally the actual canonical Alexandrian Kadhamic, but the idea is to present some hundred-or-so concepts that are long and complicated and difficult to convey in English, but which are not fundamentally more complicated than things we sum up with a single word like "basketball" or "gaslighting" or "cringe."

 

Very interested to see this when it comes out :O

FYI, eigenkarma's been proposed for LessWrong multiple times (with issues supposedly found); see https://www.lesswrong.com/posts/xN2sHnLupWe4Tn5we/improving-on-the-karma-system#Eigenkarma for example.

4Henrik Karlsson
That is not the same setup. That proposal has a global karma score; ours is personal. The system we evolved EigenKarma from worked like that, and EigenKarma can be used like that if you want to. I don't see why decoupling the scores on your posts from your karma is a particularly big problem. I'm not particularly interested in the sum of upvotes: it is whatever information can be wrangled out of that which is interesting.
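For illustration, a minimal sketch of the "personal, not global" distinction, assuming EigenKarma works roughly like personalized PageRank over the upvote graph (the actual system may differ); the toy graph, damping factor, and iteration count are made up:

```python
# Sketch: trust is computed per viewer (seeded at *you*), so the same upvote
# graph yields different scores for different viewers, unlike a global score.
import numpy as np

users = ["alice", "bob", "carol", "dave"]
idx = {u: i for i, u in enumerate(users)}

# upvotes[i][j] = number of times user i upvoted user j (hypothetical data)
upvotes = np.array([
    [0, 3, 1, 0],
    [2, 0, 0, 1],
    [0, 1, 0, 4],
    [1, 0, 2, 0],
], dtype=float)

def eigenkarma(viewer: str, damping: float = 0.85, iters: int = 50) -> dict:
    """Personalized trust scores as seen from `viewer`'s own upvote history."""
    n = len(users)
    # Row-normalize so each user's trust is split among the people they upvote.
    row_sums = upvotes.sum(axis=1, keepdims=True)
    P = np.divide(upvotes, row_sums, out=np.zeros_like(upvotes), where=row_sums > 0)
    seed = np.zeros(n)
    seed[idx[viewer]] = 1.0            # all trust originates from the viewer
    t = seed.copy()
    for _ in range(iters):
        t = (1 - damping) * seed + damping * (t @ P)
    return dict(zip(users, t.round(3)))

# Different viewers, different scores, same network.
print(eigenkarma("alice"))
print(eigenkarma("carol"))
```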
5Nathan Helm-Burger
Meta note: I strongly dislike Twitter and wish that people would just copy the raw text they want to share instead of a link.

https://guzey.com/co-working/ seems to be ~that; a friend group that periodically checks in on each other.

Probably supposed to be something like "If it's free [and not open source], you are the product."

4Viliam
I know, it just sounded very funny in the same paragraph. (And there is a possible overlap, for example Android.)

Not all political activism has to be waving flags around and chanting chants. Sometimes activists actually have goals and then accomplish something. I think we should try to learn from those people, as lowly as your opinion might be of them, if we don't seem to have many other options. 


This does make me wonder if activism from scientists has ever worked significantly. https://www.bismarckanalysis.com/Nuclear_Weapons_Development_Case_Study.pdf documents the Manhattan Project, https://www.palladiummag.com/2021/03/16/leo-szilards-failed-quest-to-build-a-...

An institution could do A/B testing on interventions like these. It can talk to people more than once.

We can't take this for granted: when A tells B that B's views are inconsistent, the standard response (afaict) is for B to default in one direction (which direction is often heavily influenced by their status quo), make that direction their consistent view, and then double down every time they're pressed.

It's possible that we have ~1 shot per person at convincing them.

I've heard it go by the name security through obscurity (see https://en.wikipedia.org/wiki/Security_through_obscurity).

Related: seems like some search engines are already integrating LLMs:
- One approach is directly providing links; see https://metaphor.systems, brought up yesterday @ https://www.lesswrong.com/posts/rZwy6CeYAWXgGcxgC/metaphor-systems
- Another is LLM summarization of search engine provided links; https://you.com/search?q=what+was+the+recent+breakthrough+in+fusion+research%3F as an example

3Shmi
For many queries Google has been offering an answer that is not a link for some time. Calculating, graphing, facts, etc. It is becoming more of an answer engine than a search engine, but rather slowly. I assume that Google is working furiously now to catch up with other LLM UIs, and they are in a good position to do so, if they let go of the Search mentality.

Just for calibration, what are the other things you've tried? I've tried alternative search engines like https://millionshort.com, link aggregators/curators like reddit/slashdot/hackernews/etc, manually curated lists.

(I've been playing around with it for a bit and it seems quite good to me too)

6the gears to ascension
in my workflow, the ranking had been, in order of what I try first to answer a question: text-davinci-2, ddg, kagi, google, millionshort, teclis, with a diversion to youtube if I needed a visual explanation of something. I'd go straight for ddg or google if trying to get a specific result, and that hasn't changed with the introduction of metaphor.systems. text-davinci-3 pushes further up the list; I'm more likely to ask it a question now, but not massively so.

meanwhile, metaphor near entirely replaced kagi and teclis and also usually google. I'm back to google for exact-result queries, except when google refuses to give multiple word-sense interpretations, in which case I open ddg, and sometimes I ask text-davinci-3 for help figuring out what I'm looking for. I usually use metaphor first; I have it set as my default search. the different query format it needs sometimes takes multiple tries, but once I find a good search, I get incredible results I've never seen another tool match, not even text-davinci-3 or chatgpt.

chatgpt didn't enter my workflow. the rlhf flavor it has makes it too hard to use and it gives overly verbose responses.

So their reported beliefs track a convenient consistent worldview, but they don’t use the vast majority of their practical knowledge and life experience, and can’t change their mind when it’s not socially convenient to do so.

The first half I understand the reasoning of, but what's the reasoning for "and can’t change their mind when it’s not socially convenient to do so"? Specifically, is this saying they can't change their publicly reported beliefs vs their privately held ones when it's not socially convenient?

Fox-Hedgehog doesn't fit well imo. It's more something like RISC (K) vs CISC (T).

Answer by tricky_labyrinth10

I got in via it in 2018; not sure about recently.

To me, the difference between the colloquial term "brainstorming" and this site's term "babble and prune" is the intentional choice to split the activity into two phases: an unfiltered idea generation phase followed by a filtering/editing phase. Emphasis on "unfiltered", for the anxiety-reducing and writer's block circumventing reasons you gave.

I'd be grateful for an update down the line, if you come across any unexpected benefits/shortcomings.

5Viliam
Writing an article seems more difficult to me because it involves a choice on two levels -- what to write, and how to write it (the outline vs the actual words). How to put these two levels together?

One option is to simply start writing, so both the outline and the actual words are generated in the same pass. You can edit some words afterwards, but you can't really edit the outline... unless it means identifying some superfluous parts and removing them. Adding a new part would require switching to the generating mode (for that part) again. Reordering parts? Not sure if the text remains fluent.

So, what else is possible? Decide the outline first, and then generate the text with the idea that "I am trying to progress to part B" in the background? Is or isn't this substantially different from the original version? The difference is that you have a goal, instead of just writing and seeing where it goes. The similarity is that in the original version you still at some moment need to finish the article, which is also a kind of goal?

Another risk is that your generator will fluently travel between the predetermined topics A, B, C, D, E, so you create a lot of text on the specified topic, but... it will somehow lack the conclusion? It will be just "a stream of text that ended at some point" rather than "a stream of text that culminated in a punchline". Unless you maybe think about the punchline first, and then set up the A, B, C to include the prerequisites. Just don't change your mind about the outline in the middle of writing the text.

Like, at the end of this comment I realized that this way of writing, from the perspective of the bicameral mind, is to simply shut up and keep writing what the gods are telling you. And the version with the outline is... making plans, and then praying to gods to make it happen... and then accepting their verdict, whatever it is? Or do the change of outline on the level of articles. Like, finish the original article with the ori

nit:

and find no non-white ravens (but do find black ravens)

I think you meant "no non-black ravens" here.