
Firstly, and perhaps most importantly, my advice on what not to do: don't try to directly convince politicians to pause or stop AGI development. A prerequisite for them to take actions drastic enough to actually matter is for them to understand how powerful AGI will truly become. And once that happens, even if they initially ban all AI development, unless they consider the arguments for doom to be extremely strong, which they won't[1], they will race and put truly enormous amounts of resources behind it, and that would be it for the species. Getting mid-sized business owners on board, on the other hand, might be a good idea because of the funding they could provide.

I don't think any of the big donors are good enough, so if you want to donate to other people's projects (or maybe become a co-founder), you could try finding interesting projects yourself on Manifund and the Nonlinear Network.

We know for a fact that alignment, at least for human-level intelligences, has a solution because people do actually care, at least in part, about each other. Therefore, it might be worth contacting Steven Byrnes and asking him whether he could usefully use more funding or what similar projects he recommends.

Outside AI, if the reason you care about existential risk isn't that you want to save the species, but that human extinction implies a lot of people will die, you could try looking into chemical brain preservation and how cheap it is. It could itself be a source of revenue, and you probably won't have any competitors (established cryonics orgs don't offer cheap brain preservation; I have asked, and Tomorrow Biostasis isn't interested either).

I also personally have not completely terrible ideas for alignment research and weak (half an SD?) intelligence augmentation. If you're interested, we can discuss them via DMs.

Finally, if you do fund intelligence augmentation research, please consider whether to keep it secret, if feasible.

  1. ^

Does OBP plan to eventually expand its services outside the USA? And how much would it cost if you didn't subsidize it? Cost is a common complaint about cryonics, so I could see you becoming much bigger than the cryonics orgs, but judging by the website you look quite small. Do you know why that is?


Does anyone have advice on how I could work full-time on an alignment research agenda I have? It looks like trying to get an LTFF grant is the best option for this kind of thing, but if, after working on it alone for longer, it still looks like it could succeed, it would likely become too big for me alone; I would need help from other people, and that looks hard to get. So, any advice from anyone who's been in a similar situation? Also, how does this compare with getting a job at an alignment org? Is there any org where I would have a comparable amount of freedom if my ideas are good enough?

Edit: It took way longer than I thought it would, but I've finally sent my first LTFF grant application! Now let's just hope they understand it and think it is good.


It depends on what you know about the model and the reason you have to be concerned in the first place (if it's just "somehow", that's not very convincing).

You might be worried that training it leads to the emergence of inner optimizers, whether they are somehow "trying" to be good at prediction in a way that might generalize to taking real-life actions, approximating the searchy part of the humans they are trying to predict, or just being RL agents. If you are just using basically standard architectures with a lot more compute, these all seem unlikely. But if I were you, I might try to test its ability to perform well in a domain it has never seen, where humans start by performing poorly but very quickly learn what to do (think of video games with new mechanics). If it does well, you have a qualitatively new thing on your hands; don't deploy it, study it instead. If for some reason you think a priori that this could happen, and only a small subset of the data is necessary to achieve it, do a smaller training run with just that data first.
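To make that check concrete, here is a minimal sketch of the kind of harness I have in mind, assuming you can wrap the model as an agent acting in some never-before-seen game environment. Every name here (NovelEnv, Agent, the thresholds) is made up for illustration, not a real library: run the model for a handful of episodes, record its scores, and flag it if its late-episode performance approaches a fast-learning human baseline.

```python
# Hypothetical sketch: flag a model that learns a truly novel domain about as fast as humans do.
from typing import Callable, List, Protocol


class NovelEnv(Protocol):
    """A game-like task whose mechanics are absent from the training data."""
    def reset(self) -> object: ...
    def step(self, action: object) -> tuple: ...  # returns (observation, reward, done)


class Agent(Protocol):
    def act(self, observation: object) -> object: ...


def learning_curve(agent: Agent, make_env: Callable[[], NovelEnv],
                   episodes: int = 10, max_steps: int = 500) -> List[float]:
    """Total reward per episode; a steep upward curve means fast in-context learning."""
    scores = []
    for _ in range(episodes):
        env = make_env()
        obs, total, done, steps = env.reset(), 0.0, False, 0
        while not done and steps < max_steps:
            obs, reward, done = env.step(agent.act(obs))
            total += reward
            steps += 1
        scores.append(total)
    return scores


def looks_qualitatively_new(model_scores: List[float],
                            human_scores: List[float],
                            tolerance: float = 0.8) -> bool:
    """True if the model's late-episode performance reaches a large fraction of the
    human baseline's late-episode performance on this unseen task."""
    human_late = sum(human_scores[-3:]) / 3
    model_late = sum(model_scores[-3:]) / 3
    return model_late >= tolerance * human_late
```

If `looks_qualitatively_new` ever returns True on a genuinely novel domain, that's the "don't deploy, study it instead" case.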

Or you might be worried about mostly external consequentialist cognition (think explicit textual if-then-elses). In that case, existing systems can already do it to some extent, and you should worry about how good its reasoning actually is, so perform capability evaluations. If it looks like there is some way of getting it to do novel research by any known method, or that it's getting close, don't deploy, because otherwise someone might figure out how to use it to do AI research, and then you get a singularity.

And in any case, you should worry about the effects your system will have on the AI race. Your AI might not be dangerous in itself, but if it is a good enough lawyer or programmer that it starts putting many people out of their jobs, investment in AI research will increase a lot, and someone will figure out how to create an actual AGI sooner than they would have otherwise.

Edit: And obviously you should also test how useful it could be for people trying to do mundane harm (e.g. with existing pathogens). Separately, there might not be a hard threshold at which a model becomes good enough at research to be dangerous, so models might get there little by little, and you would be contributing to that.

Edit in response to the second clarification: Downscale the relevant factors, like the amount of training data, the number of parameters, and the training time, or use a known-to-be-inferior architecture, until the worrying capabilities go away. Otherwise, you need to solve the alignment problem.
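As a hedged sketch of that loop, assuming you have some training pipeline and some capability-eval suite to plug in (both are placeholders below, as are the specific factors and the 0.5 decay):

```python
# Hypothetical sketch: shrink the run until the capability evals come back clean.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    n_params: int
    n_tokens: int
    train_steps: int


def train(config: RunConfig):
    """Placeholder for whatever training pipeline you actually use."""
    raise NotImplementedError


def shows_worrying_capabilities(model) -> bool:
    """Placeholder for your eval suite (novel-domain learning, autonomous research,
    misuse evals, ...). Returns True if anything crosses a red line."""
    raise NotImplementedError


def largest_safe_run(config: RunConfig, decay: float = 0.5, max_tries: int = 5):
    """Repeatedly shrink data, parameters and training time together until the evals
    pass, or give up and treat it as an unsolved alignment problem."""
    for _ in range(max_tries):
        model = train(config)
        if not shows_worrying_capabilities(model):
            return model, config
        config = replace(
            config,
            n_params=int(config.n_params * decay),
            n_tokens=int(config.n_tokens * decay),
            train_steps=int(config.train_steps * decay),
        )
    return None, config  # no safe scale found within the budget
```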

Edit in response to Beth Barnes's comment: You should probably have people reviewing outputs to check that the model behaves well, but if you actually think you need measures like "1000 workers with technical undergrad degrees, paid $50/hr" because you are worried it somehow kills you, then you simply shouldn't deploy it. It's absurd to need to check whether a commercial product is an existential threat, or anything close to that.


Done! There aren't enough mysterious old wizards.


You know of a technology that has at least a 10% chance of having a very big novel impact on the world (think the internet or ending malaria) that isn't included in this list, very similar to something in it, or downstream from some element of it: AI, mind uploads, cryonics, human space travel, geo-engineering, gene drives, human intelligence augmentation, anti-aging, cancer cures, regenerative medicine, human genetic engineering, artificial pandemics, nuclear weapons, proper nanotech, very good lie detectors, prediction markets, other mind-altering drugs, cryptocurrency, better batteries, BCIs, nuclear fusion, better nuclear fission, better robots, AR, VR, room-temperature superconductors, quantum computers, polynomial-time SAT solvers, cultured meat, solutions to antibiotic resistance, vaccines for some disease, optical computers, artificial wombs, de-extinction and graphene.

Bad options included just in case someone thinks they are good.
 


Public mechanistic interpretability research is net positive in expectation.


Cultural values are something like preferences over pairs of social environments and things we actually care about. So it makes sense to talk about jointly optimizing them.


If we had access to a brain upload (and maybe a world simulator too), we could in principle extract something like a utility function, and the theory behind it relates more to agents in general than to humans in particular.


Research into getting a mechanistic understanding of the brain (for understanding how values/empathy work in people, for brain uploading, or for improving cryonics/plastination) is net positive and currently greatly underfunded.
