Brendan Long


Comments

  1. I think alignment is easier than I used to think, since we can kind-of look into LLMs and find the concepts, which might let us figure out the knobs we need to turn even though we don't know what they are right now (i.e. weirdly enough, there might be a "lie to humans" button and we can just prevent the AI from pushing it). I still think it's unclear if we'll actually do the necessary research fast enough though. Alignment-by-default also seems more likely than I would have expected, although it does seem to be getting worse as we make LLMs larger. I'm not really sure how this has changed within the community since people who don't think AI is a problem don't really post about it.
  2. I think older posts were mostly arguments about whether things could happen (could you make an oracle that's not an agent, could you keep the AI in a box, is AI even possible, etc.) and now that the AI doomers conclusively won all of those arguments, the discussions are more concrete (discussion of actually-existing AI features).
  3. It depends on what you mean by easier, but my timelines are shorter than they used to be, and I think most people's are.
  4. I'm definitely surprised that glorified decompression engines might be sufficient for AGI. The remaining problems don't really surprise me on top of knowing how they're trained[1]. I'm guessing the evolutionary AI people are feeling very vindicated though.
  1. ^

    There's lots of coding training data and not very much training data for creating documents of a specific length. I think if we added a bunch of "Write ### words about X" training data the LLMs would suddenly be good at it.
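The footnote's idea could be sketched as a synthetic-data generator. Everything here is illustrative (the topics, lengths, and filler targets are made up); the only point is that each prompt's target really does have the requested word count:

```python
import random

# Hypothetical sketch: generate "write exactly N words" training pairs.
# A real pipeline would use counted human- or model-written text as the
# target; here we pad a stub so the word count is exactly right.
TOPICS = ["photosynthesis", "sorting algorithms", "the French Revolution"]

def make_example(topic: str, n_words: int) -> dict:
    """Build one prompt/target pair whose target is exactly n_words long."""
    prompt = f"Write {n_words} words about {topic}."
    words = f"{topic} explained:".split()
    words += ["filler"] * (n_words - len(words))
    return {"prompt": prompt, "target": " ".join(words[:n_words])}

examples = [make_example(random.choice(TOPICS), random.randint(50, 500))
            for _ in range(3)]
```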

By penalizing the reward hacks you can identify, you’re training the AI to find reward hacks you can’t detect, and to only do them when you won’t detect them.

I wonder if it would be helpful to penalize deception only if the CoT doesn't admit to it. It might be harder to generate test data for this since it's less obvious, but hopefully you'd train the model to be honest in CoT?

I'm thinking of this like the parenting strategy of not punishing children for something bad if they admit unprompted that they did it. Blameless postmortems are also sort-of similar.
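A minimal sketch of that reward-shaping rule (all names and numbers are illustrative, not from any real RLHF pipeline): a detected hack is only punished when the chain of thought hid it, and an admitted hack just has its gain zeroed, so confessing is never worse than hiding:

```python
def shaped_reward(base_reward: float,
                  hack_detected: bool,
                  cot_admits: bool,
                  penalty: float = 1.0) -> float:
    """Toy version of 'penalize deception only if the CoT doesn't admit it'."""
    if hack_detected and not cot_admits:
        return base_reward - penalty   # hacked and hid it: punished
    if hack_detected and cot_admits:
        return 0.0                     # admitted: the hack's gain is zeroed
    return base_reward                 # no hack detected
```

The design mirrors the parenting analogy: the gradient pushes against concealment rather than against admission, though this only works to the extent undetected hacks stay rare.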

Consider instead that Trump was elected with over 50% of the popular vote. Perhaps there are more fundamental cultural factors at play than the method used to count ballots.

Winning the popular vote in the current system doesn't tell you what would happen in a different system. This is the same mistake people make when they talk about who would have won if we didn't have an electoral college: If we had a different system, candidates would campaign differently and voters would vote differently.

I doubt this organization could get 501(c) status since its only purpose is to make political donations (and it only matters if the organization you donate to is 501(c); it doesn't matter if they then re-grant it to another charitable organization). I'm not an expert on this though.

The value of the startup is only loosely correlated with being positive for AI safety (capabilities are valuable, but they're not the only valuable thing). Ideally the startup would be worth billions if and only if AI safety was solved.

I'd like to learn more Spanish words but have trouble sitting down to actually do language lessons, so I recently set my Claude "personal preferences" to:

Try to teach a random Spanish word in every conversation.

(This is the whole thing)

This has worked surprisingly well, and Claude usually either drops one word in Spanish with a translation midway through a response:

For your specific situation, I recommend a calibración (calibration) approach:

 

2. Accounting for concurrency: Ensure you're capturing all hilos (threads) involved in query execution, especially for parallel queries.

(From a conversation about benchmarking)

Or it ends the conversation with a fun fact:

¡Palabra en español! "Herramienta" - which means "tool" in Spanish, quite relevant to your search for tools to automate SSH known_hosts management.

 

La palabra española para hoy es "configurar" - which means "to configure" in English, fitting perfectly with our discussion about configurable thinking limits!

I don't know if this is actually useful for learning, but it's fun and worked better than I expected.

My wife tried a similar prompt (although her preferences are much longer) and it made Claude sometimes respond entirely in Spanish, so this could probably be made more specific. If you run into that, maybe "Respond in English but try to teach a random Spanish word in every conversation" would work better?

It used to be that we had a two-tiered citizenry: one class owned and controlled the nation’s government (the nobility) and one class merely worked for said nation (the laborers). Then we decided that the laborers should also partially own and control the government. However, this practice was not extended to the workplace, which remains in that classic hierarchy to this day; with one class owning and controlling the firm, while the other class merely works for it.

This is not true? There are no legal restrictions on what class of people can own and control firms. Many worker-owned co-ops exist[1], and even among public corporations, around 40% of stock is held by workers in retirement accounts[2]. In some industries, it's very common to receive stock as compensation too. A lot of small businesses are tautologically worker-owned since they only have one employee (the owner).

Just because we don't legally mandate that every business is a co-op doesn't mean they aren't legal and don't exist.

  1. ^

    I suspect very large worker-owned co-ops are uncommon since the value of a slice of ownership goes down as the size increases, but there are no legal restrictions on the size of a co-op.

  2. ^

    This is an underestimate of stock owned by workers since it doesn't include taxable savings, but it would be hard to separate wage labor from the labor of creating and running companies in taxable accounts. Retirement accounts should be representative of 'normal workers' since there are low per-person caps and it's hard to fund them with anything except wages.

This post prompted me to look into more general-purpose solutions to this, since it seems like "SSH into an IP that's known to be owned by a public cloud" should be fully automated at this point. We know which IPs are part of AWS and we can fetch the host keys securely using the AWS CLI (or helper tools like this). We should be able to do the same over HTTPS for GitHub, Azure, Google Cloud, etc.

It's surprising to me that no one seems to have made a general-purpose CLI or SSH plugin (if that's a thing) for this. Google Cloud has a custom CLI that does this but it obviously only works for their servers.
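The first step of such a tool would be deciding whether an IP belongs to a cloud at all. AWS does publish its ranges as JSON at ip-ranges.amazonaws.com/ip-ranges.json; the sample data below is a hardcoded stand-in for that file so this sketch runs offline, and the function name is my own invention:

```python
import ipaddress
from typing import Optional

# Stand-in for the published AWS ip-ranges.json file (same structure,
# two real prefixes copied in so the example is self-contained).
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "3.5.140.0/22", "region": "ap-northeast-2", "service": "AMAZON"},
        {"ip_prefix": "52.94.76.0/22", "region": "us-west-2", "service": "AMAZON"},
    ]
}

def cloud_region(ip: str, ranges: dict = SAMPLE_RANGES) -> Optional[str]:
    """Return the AWS region that owns this IP, or None if it's not listed."""
    addr = ipaddress.ip_address(ip)
    for prefix in ranges["prefixes"]:
        if addr in ipaddress.ip_network(prefix["ip_prefix"]):
            return prefix["region"]
    return None
```

A full tool would then fetch the host keys out-of-band (e.g. via the cloud provider's API) instead of trusting them on first use.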

I think normal people sort files into folders (and understand filesystems) less than you'd expect. On second thought though, I think you're proposing something less confusing than I initially thought. I think a general-purpose memory-category-tagging system would be way too confusing for users, but "you can create conversation categories and memory will only apply to other conversations in that category" is probably reasonable.
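That category-scoped memory model is simple enough to sketch as a data structure (the class and method names here are illustrative, not any real product's API): a memory saved in one category is only visible to conversations in that same category:

```python
from collections import defaultdict

class CategoryMemory:
    """Toy sketch of category-scoped assistant memory."""

    def __init__(self):
        self._memories = defaultdict(list)

    def remember(self, category: str, fact: str) -> None:
        """Store a memory under one conversation category."""
        self._memories[category].append(fact)

    def recall(self, category: str) -> list:
        """Return only this category's memories; a 'Work' conversation
        never sees 'Personal' facts."""
        return list(self._memories[category])
```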

This sounds like the kind of thing power users would like but normal people would find confusing, like how Google+ was really cool for the nerds who were into it, but most people prefer to just have one list of friends on social networks.
