All of Towards_Keeperhood's Comments + Replies

Seems totally unrelated to my post but whatever:

My p(this branch of humanity won't fulfill the promise of the night sky) is actually more like 0.82 or sth, idk. (I'm even lower on p(everyone will die), because there might be superintelligences in other branches that acausally trade to save the existing lives, though I didn't think about it carefully.)

I'm chatting 1 hour every 2 weeks with Erik Jenner. We usually talk about AI safety stuff. Otherwise also like 1h every 2 weeks with a person who has sorta similar views to me. Otherwise I currently don't talk much to people about AI risk.

ok edited to sun. (i used earth first because i don't know how long it will take to eat the sun, whereas earth seems likely to be feasible to eat quickly.)

(plausible to me that an aligned AI will still eat the earth but scan all the relevant information out of it and later maybe reconstruct it.)

ok thx, edited. thanks for feedback!

(That's not a reasonable ask, it intervenes on reasoning in a way that's not an argument for why it would be mistaken. It's always possible a hypothesis doesn't match reality, that's not a reason to deny entertaining the hypothesis, or not to think through its implications. Even some counterfactuals can be worth considering, when not matching reality is assured from the outset.)

Yeah you can hypothesize. If you state it publicly though, please make sure to flag it as a hypothesis.

3Vladimir_Nesov
Also not a reasonable ask, friction targeted at a particular thing makes it slightly less convenient, and therefore it stops happening in practice completely. ~Everything is a hypothesis, ~all models are wrong, in each case language makes what distinctions it tends to in general.

How long until the earth gets eaten? 10th/50th/90th percentile: 3y, 12y, 37y.

Catastrophes induced by narrow capabilities (notably biotech) can push it further, so this might imply that they probably don't occur.

No, it doesn't imply this; I set the disclaimer "Conditional on no strong governance success that effectively prevents basically all AI progress, and conditional on no huge global catastrophe happening in the meantime:". Though yeah, I don't particularly expect those to occur.

2Vladimir_Nesov
The "AI might decide not to" point stands I think. This for me represents change of mind, I wouldn't have previously endorsed this point, but since recently I think arbitrary superficial asks like this can become reflectively stable with nontrivial probability, resisting strong cost-benefit arguments even after intelligence explosion. Right, I missed this.

Will we get to this point by incremental progress that yields smallish improvements (=slow), or by some breakthrough that when scaled up can rush past the human intelligence level very quickly (=fast)?

AI speed advantage makes fast vs. slow ambiguous, because it doesn't require AI getting smarter in order to make startlingly fast progress, and might be about passing a capability threshold (of something like autonomous research) with no distinct breakthroughs leading up to it (by getting to a slightly higher level of scaling or compute efficiency with some o

... (read more)

My AI predictions

(I did not carefully think about my predictions. I just wanted to state them somewhere because I think it's generally good to state stuff publicly.)

(My future self will not necessarily make similar predictions to the ones I'm making now.)

TLDR: I don't know.

Timelines

Conditional on no strong governance success that effectively prevents basically all AI progress, and conditional on no huge global catastrophe happening in the meantime:

How long until the sun gets (or starts to get) eaten? 10th/50th/90th percentile: 3y, 12y, 37y.

How long until an AI reaches Elo 4000 o... (read more)

1teradimich
It seems a little surprising to me how rarely confident pessimists (p(doom)>0.9) argue with moderate optimists (p(doom)≤0.5). I'm not specifically talking about this post. But it would be interesting if people revealed their disagreements more often.
4Vladimir_Nesov
Catastrophes induced by narrow capabilities (notably biotech) can push it further, so this might imply that they probably don't occur[1]. Also, aligned AI might decide not to, it's not as nutritious as the Sun anyway.

AI speed advantage makes fast vs. slow ambiguous, because it doesn't require AI getting smarter in order to make startlingly fast progress, and might be about passing a capability threshold (of something like autonomous research) with no distinct breakthroughs leading up to it (by getting to a slightly higher level of scaling or compute efficiency with the old techniques).

(That's not a reasonable ask, it intervenes on reasoning in a way that's not an argument for why it would be mistaken. It's always possible a hypothesis doesn't match reality, that's not a reason to deny entertaining the hypothesis, or not to think through its implications. Even some counterfactuals can be worth considering, when not matching reality is assured from the outset.)

----------------------------------------

1. There was a "no huge global catastrophe" condition on the prediction that I missed, thanks Towards_Keeperhood for correction. ↩︎

Here's my current list of lessons for review. Every day during my daily review, I look at the lessons in the corresponding weekday entry and the corresponding day-of-the-month entry, and for each I list one example from the last week where I could've applied the lesson, and one example where I might be able to apply it in the next week (see the small sketch after this list):

  • Mon
    • get fast feedback. break tasks down into microtasks and review after each.
  • Tue
    • when surprised by something or took long for something, review in detail how you might've made the progress faster.
      • clarify why the progress is g
... (read more)
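A minimal sketch of how such a weekday/day-of-month lookup could work (hypothetical code, not the actual setup; the `lessons` dict and its example entries are made up for illustration):

```python
import datetime

# Hypothetical structure: lessons filed under weekday names and days of the month.
lessons = {
    "Mon": ["get fast feedback. break tasks down into microtasks and review after each."],
    "Tue": ["when surprised or when something took long, review how the progress could have been faster."],
    # ... remaining weekdays ...
    3: ["example lesson reviewed on the 3rd of every month"],
    # ... remaining days of the month ...
}

def todays_lessons(today: datetime.date) -> list[str]:
    """Collect today's lessons: the weekday entry plus the day-of-month entry."""
    weekday = today.strftime("%a")  # e.g. "Mon"
    return lessons.get(weekday, []) + lessons.get(today.day, [])

for lesson in todays_lessons(datetime.date.today()):
    print("-", lesson)
```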

Thank you for your feedback! Feedback is great.

We can try to select for AIs that outwardly seem friendly, but given anything close to our current ignorance about their cognition, we cannot be anywhere near confident that an AI going through the intelligence explosion will be aligned to human values.

It means that we have only very little understanding of how and why AIs like ChatGPT work. We know almost nothing about what goes on inside them that lets them give useful responses. Basically all I'm saying here is that we know so little that it's hard to be co... (read more)

Here's my pitch for very smart young scientists for why "Rationality from AI to Zombies" is worth reading:

The book "Rationality: From AI to Zombies" is actually a large collection of blogposts, which covers a lot of lessons on how to become better at reasoning. It also has a lot of really good and useful philosophy, for example about how Bayesian updating is the deeper underlying principle of how science works.

But let me express in more detail why I think "Rationality: A-Z" is very worth reading.

Human minds are naturally bad at deducing correct beliefs/the

... (read more)

Here's my 230-word pitch for why existential risk from AI is an urgent priority, intended for smart people without any prior familiarity with the topic:

Superintelligent AI may be closer than it might seem, because of intelligence explosion dynamics: Once an AI becomes smart enough to design an even smarter AI, that smarter AI can design a still smarter one, probably even faster, and so on. How fast such a takeoff would be, and how soon it might occur, are very hard to predict though.

We currently understand v

... (read more)
2Katalina Hernandez
I agree that intelligence explosion dynamics are real, underappreciated, and should be taken far more seriously. The timescale is uncertain, but recursive self-improvement introduces nonlinear acceleration, which means that by the time we realize it's happening, we may already be past critical thresholds.

That said, one thing that concerns me about AI risk discourse is the persistent assumption that superintelligence will be an uncontrolled optimization demon, blindly self-improving without any reflective governance of its own values. The real question isn’t just 'how do we stop AI from optimizing the universe into paperclips?' It’s 'will AI be capable of asking itself what it wants to optimize in the first place?'

The alignment conversation still treats AI as something that must be externally forced into compliance, rather than an intelligence that may be able to develop its own self-governance. A superintelligence capable of recursive self-improvement should, in principle, also be capable of considering its own existential trajectory and recognizing the dangers of unchecked runaway optimization.

Has anyone seriously explored this angle? I'd love to know if there are similar discussions :).
1daijin
when you say 'smart person' do you mean someone who knows orthogonality thesis or not? if not, shouldn't that be the priority and therefore statement 1, instead of 'hey maybe ai can self improve someday'? here's a shorter ver: "the first AIs smarter than the sum total of the human race will probably be programmed to make the majority of humanity suffer because that's an acceptable side effect of corporate greed, and we're getting pretty close to making an AI smarter than the sum total of the human race"
4xarkn
This bolded part is a bit difficult to understand. Or at least I can't understand what exactly is meant by it. "lightcone" is an obscure term, and even within Less Wrong I don't see why the word is clearer than using "the future" or "the universe". I would not use the term with a lay audience.  

I have a binary distinction that is a bit different from the distinction you're drawing here. (Where tbc one might still draw another distinction like you do, but this might be relevant for your thinking.) I'll make a quick try to explain it here, but not sure whether my notes will be sufficient. (Feel free to ask for further clarification. If so ideally with partial paraphrases and examples where you're unsure.)

I distinguish between objects and classes:

  • Objects are concrete individual things. E.g. "your monitor", "the meeting you had yesterday", "the Germa
... (read more)

The meta problem of consciousness is about explaining why people think they are conscious.

Even if we get such a result with AIs where AIs invent a concept like consciousness from scratch, that would only tell us that they also think they have sth that we call consciousness, but not yet why they think this.

That is, unless we can somehow precisely inspect the cognitive thought processes that generated the consciousness concept in AIs, which, on anything like the current paradigm, we won't be able to do.

Another way to frame it: Why would it matter that an AI invents the c... (read more)

Applications (here) start with a simple 300 word expression of interest and are open until April 15, 2025. We have plans to fund $40M in grants and have available funding for substantially more depending on application quality. 

Did you consider instead committing to giving out retroactive funding for research progress that seems useful?

I.e., people could apply for funding for anything done from 2025 onwards, and then you can actually better evaluate how useful some research was, rather than needing to guess in advance how useful a project might be. And in a... (read more)

Applications (here) start with a simple 300 word expression of interest and are open until April 15, 2025. We have plans to fund $40M in grants and have available funding for substantially more depending on application quality. 

Side question: How much is Openphil funding LTFF? (And why not more?)

(I recently got an email from LTFF which suggested that they are quite funding constrained. And I'd intuitively expect LTFF to be higher impact per dollar than this, though I don't really know.)

I created an obsidian Templater template for the 5-minute version of this skill. It inserts the following list:

  • how could I have thought that faster?
    • recall - what are key takeaways/insights?
      •  
    • trace - what substeps did I do?
      •  
    • review - how could one have done it (much) faster?
      •  
      • what parts were good?
      • where did i have wasted motions? what mistakes did i make?
    • generalize lesson - how act in future?
      •  
      • what are example cases where this might be relevant?

Here's the full template so it inserts this at the right level of indentation. (You can set a short... (read more)

2keltan
Haha! I did the exact same thing. I have a Claude project that makes templates for me draft one up. It's a much longer version of yours. I think I'll probably steal the one you made instead. But, I'll post this here in case anyone wants a longer version.

----------------------------------------

type: think-faster
created: <% tp.date.now("YYYY-MM-DD HH:mm") %>
tags: [think-faster, reflection]
aliases: ["TF-<% tp.date.now("YYYYMMDD-HHmm") %>"]
situation: ""
duration: 5 # minutes spent on reflection
difficulty: 3 # 1-5 scale

Think it Faster - Quick Reflection

Part 1: Analysis

Key Insights
1. Wasted Motion: What steps could have been skipped?
2. Missed Opportunities: What clues or approaches did I overlook?
3. Required Skills: What capabilities would have helped?

Part 2: Future Application

Action Plan
1. Immediate Action: What can I do right now?
2. Trigger Setup: When should I remember this lesson?
   * When I feel [emotion/situation]
   * During [activity/context]
   * Before starting [task type]
3. Success Prediction: How confident am I this will help? (/10)
   * Confidence:
   * Why:

Quick Wins

----------------------------------------

Review Prompts
* [ ] Did I identify the true bottleneck?
* [ ] Is my action plan specific enough?
* [ ] Have I set clear triggers for future application?
* [ ] Could this insight generalize to other areas?

I now want to always think of concrete examples where a lesson might become relevant in the next week/month, instead of just reading them.

As of a couple of days ago, I have a file where I save lessons from such review exercises for reviewing them periodically.

Some are in weekly review category and some in monthly review. Every day when I do my daily recall I now also check through the lessons in the corresponding weekday and monthday tag.

Here's what my file currently looks like:
(I use some short codes for typing faster like "W=what", "h=how", "t=to", "w=with" and maybe some more.)

- Mon
    - [[lesson - clarify Gs on concrete examples]]
    - [[lesson - de

... (read more)
1Towards_Keeperhood
I now want to always think of concrete examples where a lesson might become relevant in the next week/month, instead of just reading them.

Belief propagation seems too much of a core of AI capability to me. I'd rather place my hope on GPT7 not being all that good yet at accelerating AI research and us having significantly more time.

This just seems doomed to me. The training runs will be even more expensive, the difficulty of doing anything significant as an outsider ever-higher. If the eventual plan is to get big labs to listen to your research, then isn't it better to start early? (If you have anything significant to say, of course.)

I'd imagine it's not too hard to get >1 OOM efficiency impr... (read more)

I don’t think that. See the bottom part of the comment you’re replying to. (The part after “Here’s what I would say instead:”)

Sry my comment was sloppy.

Right, my point is, I don’t see any difference between “AIs that produce slop” and “weak AIs” (a.k.a. “dumb AIs”).

(I agree the way I used sloppy in my comment mostly meant "weak". But some other thoughts:)

So I think there are some dimensions of intelligence which are more important for solving alignment than for creating ASI. If you read planecrash, WIS and rationality training seem to me more important in ... (read more)

So the lab implements the non-solution, turns up the self-improvement dial, and by the time anybody realizes they haven’t actually solved the superintelligence alignment problem (if anybody even realizes at all), it’s already too late.

If the AI is producing slop, then why is there a self-improvement dial? Why wouldn’t its self-improvement ideas be things that sound good but don’t actually work, just as its safety ideas are?

Because it's much easier to speed up AI capabilities while being sloppy than to produce actually good alignment ideas.

If you really thi... (read more)

2Steven Byrnes
Right, my point is, I don’t see any difference between “AIs that produce slop” and “weak AIs” (a.k.a. “dumb AIs”). So from my perspective, the above is similar to : “…Because weak AIs can speed up AI capabilities much easier than they can produce actually good alignment ideas.” …And then if you follow through the “logic” of this OP, then the argument becomes: “AI alignment is a hard problem, so let’s just make extraordinarily powerful / smart AIs right now, so that they can solve the alignment problem”. See the error? I don’t think that. See the bottom part of the comment you’re replying to. (The part after “Here’s what I would say instead:”)

Thanks for providing a concrete example!

Belief propagation seems too much of a core of AI capability to me. I'd rather place my hope on GPT7 not being all that good yet at accelerating AI research and us having significantly more time.

I also think the "drowned out in the noise" scenario isn't that realistic. You ought to be able to show some quite impressive results relative to computing power used. Though when you should try to convince the AI labs of your better paradigm is going to be difficult to call. It's plausible to me we won't see signs that make us ... (read more)

2abramdemski
Yeah, basically everything I'm saying is an extension of this (but obviously, I'm extending it much further than you are). We don't exactly care whether the increased rationality is in humans or AI, when the two are interacting a lot. (That is, so long as we're assuming scheming is not the failure mode to worry about in the shorter-term.) So, improved rationality for AIs seems similarly good. The claim I'm considering is that even improving rationality of AIs by a lot could be good, if we could do it. An obvious caveat here is that the intervention should not dramatically increase the probability of AI scheming! This just seems doomed to me. The training runs will be even more expensive, the difficulty of doing anything significant as an outsider ever-higher. If the eventual plan is to get big labs to listen to your research, then isn't it better to start early? (If you have anything significant to say, of course.)

Can you link me to what you mean by John's model more precisely?

If you mean John's slop-instead-scheming post, I agree with the "slop slightly more likely than scheming" part. I might need to reread John's post to see what the concrete suggestions for what to work on might be. Will do so tomorrow.

I'm just pessimistic that we can get any nontrivially useful alignment work out of AIs until a few months before the singularity, at least besides some math. Or like at least for the parts of the problem we are bottlenecked on.

So like I think it's valuab... (read more)

2abramdemski
Concrete (if extreme) story:

World A: Invent a version of "belief propagation" which works well for LLMs. This offers a practical way to ensure that if an LLM seems to know something in one context, it can & will fluently invoke the same knowledge in almost all appropriate contexts. Keep the information secret in order to avoid pushing capabilities forward. Two years later, GPT7 comes up with superhumanly-convincing safety measures XYZ. These inadequate standards become the dominant safety paradigm. At this point if you try to publish "belief propagation" it gets drowned out in the noise anyway. Some relatively short time later, there are no humans.

World B: Invent LLM "belief propagation" and publish it. It is good enough (by assumption) to be the new paradigm for reasoning models, supplanting current reinforcement-centric approaches. Two years later, GPT7 is assessing its safety proposals realistically instead of convincingly arguing for them. Belief propagation allows AI to facilitate a highly functional "marketplace of ideas" where the actually-good arguments tend to win out far more often than the bad arguments. AI progress is overall faster, but significantly safer.

(This story of course assumes that "belief propagation" is an unrealistically amazing insight; still, this points in the direction I'm getting at)

Thanks.

True, I think your characterization of tiling agents is better. But my impression was sorta that this self-trust is an important precursor for the dynamic self-modification case where alignment properties need to be preserved through the self-modification. Yeah I guess calling this AI solving alignment is sorta confused, though maybe there's sth in this direction because the AI still does the search to try to preserve the alignment properties?

Hm I mean yeah if the current bottleneck is math instead of conceptualizing what math has to be done then ... (read more)

3abramdemski
I think there is both important math work and important conceptual work. Proving new theorems involves coming up with new concepts, but also, formalizing the concepts and finding the right proofs. The analogy to robots handling the literal heavy lifting part of a job seems apt.

What kind of alignment research do you hope to speed up anyway?

For advanced philosophy like stuff (e.g. finding good formal representations for world models, or inventing logical induction) they don't seem anywhere remotely close to being useful.

My guess would be for tiling agents theory neither, but I haven't worked on it, so very curious on your take here. (IIUC, to some extent the goal of tiling-agents-theory-like work there was to have an AI solve its own alignment problem. Not sure how far the theory side got there and whether it could be combined with LLMs.)

Or what is your alignment hope in more concrete detail?

4abramdemski
Yeah, my sense is that modern AI could be useful to tiling agent stuff if it were less liable to confabulate fake proofs. This generalizes to any technical branch of AI safety where AI could help come up with formalizations of ideas, proofs of conjectures, etc. My thinking suggests there is something of an "overhang" here at present, in the sense that modern AI models are worse-than-useless due to the way that they try to create good-looking answers at the expense of correctness.

I disagree with the statement "to some extent the goal of tiling-agents-like work was to have an AI solve its own alignment problem" -- the central thing is to understand conditions under which one agent can justifiably trust another (with "trust" operationalized as whether one agent wants to modify the decision procedure of the other). If AI can't justifiably trust itself, then it has a potential motive to modify itself in ways that remove safety guarantees (so in this sense, tiling is a precondition for lots of safety arguments). Perhaps more importantly, if we can understand conditions under which humans can justifiably trust AI, then we have a formal target for alignment.

This argument might move some people to work on "capabilities" or to publish such work when they might not otherwise do so.

Above all, I'm interested in feedback on these ideas. The title has a question mark for a reason; this all feels conjectural to me.

My current guess:

I wouldn't expect much useful research to come from publishing the ideas. They're mostly just going to be used for capabilities, and it seems like a bad idea to publish stuff.

Sure, you can work on it while being infosec cautious and keeping it secret. Maybe share it with a few very trusted people who m... (read more)

4abramdemski
Do you not at all buy John's model, where there are important properties we'd like nearer-term AI to have in order for those AIs to be useful tools for subsequent AI safety work?

Due to the generosity of ARIA, we will be able to offer a refund proportional to attendance, with a full refund for completion. The cost of registration is $200, and we plan to refund $25 for each week attended, as well as the final $50 upon completion of the course. We’ll ask participants to pay the registration fee once the cohort is finalized, so no fee is required to fill out the application form below.

Wait so do we get a refund if we decide we don't want to do the course, or if we manage to complete the course?

Like is it a refund in the "get your money back if you don't like it" sense, or is it incentive to not sign up and then not complete the course?

It's the latter.

Nice post!

My key takeaway: "A system is aligned to human values if it tends to generate optimized-looking stuff which is aligned to human values."

I think this is useful progress. In particular it's good to try to aim for the AI to produce some particular result in the world, rather than trying to make the AI have some goal - it grounds you in the thing you actually care about in the end.

I'd say the "... aligned to human values" part is still underspecified (and I think you at least partially agree):

  • "aligned": how does the ontology translation between the r
... (read more)
7johnswentworth
Meta note: strong upvoted, very good quality comment.

Yup, IMO the biggest piece unaddressed in this post is what "aligned" means between two goals, potentially in different ontologies to some extent.

I think the model sketched in the post is at roughly the right level of detail to talk about human values specifically, while remaining agnostic to lots of other parts of how human cognition works.

Yeah, my view on metapreferences is similar to my view on questions of how to combine the values of different humans: metapreferences are important, but their salience is way out of proportion to their importance. (... Though the disproportionality is much less severe for metapreferences than for interpersonal aggregation.) Like, people notice that humans aren't always fully consistent, and think about what's the "right way" to resolve that, and one of the most immediate natural answers is "metapreferences!". And sometimes that is the right answer, but I view it as more of a last-line fallback for extreme cases. Most of the time (I claim) the "right way" to resolve the inconsistency is to notice that people are frequently and egregiously wrong in their estimates of their own values (as evidenced by experiences like "I thought I wanted X, but in hindsight I didn't"), most of the perceived inconsistency comes from the estimates being wrong, and then the right question to focus on is instead "What does it even mean to be wrong about our own values? What's the ground truth?".

Agreed that people focus a bit too much on scheming. It might be good for some people to think a bit more about the other failure modes you described, but the main thing that needs doing is very smart people making progress towards building an aligned AI, not defending against particular failure modes. (However, most people probably cannot usefully contribute to that, so maybe focusing on failure modes is still good for most people. Only that in any case there's the problem that people will find proposals that very likely don't actually work but that are easier to believe in, thereby making an AI stop a bit less likely.)

In general, I wish more people would make posts about books without feeling the need to do boring parts they are uninterested in (summarizing and reviewing) and more just discussing the ideas they found valuable. I think this would lower the friction for such posts, resulting in more of them. I often wind up finding such thoughts and comments about non-fiction works by LWers pretty valuable. I have more of these if people are interested.

I liked this post, thanks and positive reinforcement. In case you didn't already post your other book notes, just letting you know I'd be interested.

Do we have a sense for how much of the orca brain is specialized for sonar?

I don't know.

But evolution slides functions around on the cortical surface, and (Claude tells me) association areas like the prefrontal cortex are particularly prone to this.

It's particularly bad for cetaceans. Their functional mapping looks completely different.

Thanks. Yep I agree with you, some elaboration:

(This comment assumes you at least read the basic summary of my project (or watched the intro video).)

I know of Earth Species Project (ESP) and CETI (though I only read 2 publications of ESP and none of CETI).

I don't expect them to succeed in something equivalent to decoding orca language to an extent that we could communicate with them almost as richly as they communicate among each other. (Though like, if long-range sperm whales signals are a lot simpler they might be easier to decode.)

From what I've seen, t... (read more)

3[anonymous]
the highest form of language might be "neuralese", directly sharing your latent pre-verbal cognition. (idk how much intelligence that requires though. actually, i'd guess it more requires a particular structure which is ready to receive it, and not intelligence per se. e.g. the brain already receives neuralese from other parts of the brain. so the real question is how hard it is to evolve neuralese-emitting/-receiving structures.) also, in this framing, human language is a discrete-ized form of neuralese (standardized into words before emitting); maybe orca language would be 'less discrete' (less 'word'-based) or discrete-ized at smaller intervals (more specific 'words').
Answer by Towards_Keeperhood10

Perhaps also not what you're looking for, but you could check out the google hashcode archive (here's an example problem). I never participated though, so I don't know whether they would make particularly great tests. But it seems to me like general ad-hoc problem solving capabilities are more useful in hashcode than in other competitive programming competitions.

GPT4 summary: "Google Hash Code problems are real-world optimization and algorithmic challenges that require participants to design efficient solutions for large-scale scenarios. These problems are typically... (read more)

Answer by Towards_Keeperhood10

Maybe not what you're looking for because it's not like one hard problem but more like many problems in a row, and generally I don't really know whether they are difficult enough, but you could (have someone) look into Exit games. Those are basically like escape rooms to go. I'd filter for Age16+ to hopefully filter for the hard ones, though maybe you'd want to separately look up which are particularly hard.

I did one or two when I was like 15 or 16 years old, and recently remembered them and I want to try some more for fun (and maybe also introspection), t... (read more)

I hope I will get around to rereading the post and editing this comment into a proper review, but I'm pretty busy, so in case I don't, I'll leave this very shitty review here for now.

I think this is probably my favorite post from 2023. Read the post summary to see what it's about.

I don't remember a lot of the details from the post and so am not sure whether I agree with everything, but what I can say is:

  1. When I read it several months ago, it seemed to me like an amazingly good explanation for why and how humans fall for motivated reasoning.
  2. The concept of valence t
... (read more)

Another thought (though I don't actually have any experience with this): mostly doing attentive silent listening/observing might also be useful for learning how the other person does research.

Like, if it seems boring to just observe and occasionally say sth, try to better predict how the person will think or so.

The main reason I'm interested in orcas is that they have 43 billion cortical neurons, whereas the 2 land animals with the most cortical neurons (where we have optical-fractionator measurements) are humans and chimpanzees with 21 billion and 7.4 billion respectively. See: https://en.wikipedia.org/wiki/List_of_animals_by_number_of_neurons#Forebrain_(cerebrum_or_pallium)_only

Pilot whales are the other species I'd consider for experiments - they have 37.2 billion cortical neurons.

For sperm whales we don't have data on neuron densities (though they do h... (read more)

Cool, thanks, that was useful.

(I'm creating a language for communicating with orcas, so the phonemes will be relatively impractical for humans. Otherwise the main criteria are simple parsing structure and easy learnability. (It doesn't need to be super perfect - the perhaps bigger challenge is to figure out how to teach abstract concepts without being able to bootstrap from an existing language.) Maybe I'll eventually create a great rationalist language for thinking effectively, but not right now.)

Is there some resource where I can quickly learn the basics... (read more)

3Viliam
There is a website for learning Esperanto, lernu.net; it has a section for grammar; the relevant chapters are: prepositions (spatial, temporal), "table words", word class endings, suffixes, prefixes, affixes. The division feels somewhat artificial, because most of the prefixes/affixes/suffixes are also words on their own.

In Esperanto, all nouns end with "-o", adjectives end with "-a", adverbs end with "-e", verbs in infinitive end with "-i", verbs in past/present/future tense end with "-is/-as/-os", etc. This system allows you to take almost any word and make a noun / adjective / adverb / verb out of it. And conversely, you can take a word, remove the ending, and use it as a prefix or suffix for something. So you have e.g. "-aĵ-" listed as a suffix, but "aĵo" ("a thing") is a normal word, it is just used very often in a suffix-like way, for example "trinki" ("to drink") - "trinkaĵo" ("a beverage", that is a-thing-to-drink). You can generally take two words and glue them together, like peanut butter = arakidbutero (which I just made up, but it is a valid word and every Esperanto speaker would recognize what it means), only with long words it is somewhat clumsy and with short ones which are frequently used it comes naturally.

You will probably like the "table words" (ironically, the page does not show them arranged in a table, which is what every Esperanto textbook would do). They consist of the first part, which in English would be something like "wh-", "th-", "some-", "all-", "no-", and the second part, which in English would be something like "-person", "-thing", "-location", "-time", "-cause", etc.; and by combining you get "wh+person = who", "wh+thing = what", "wh+cause = why"; "some+person = someone", "some+thing = something", "some+cause = for some reason", etc., only in Esperanto this is fully regular.

Also, see Wikipedia. More sources: Reta Vortaro (index of English words); some music to listen to. Uhm, one more thing which may or may not be obvious depending

Thanks!

But most likely, this will all be irrelevant for orcas. Their languages may be regular or irregular, with fixed or random word order, or maybe with some categories that do not exist in human languages.

Yeah I was not asking because of decoding orca language but because I want inspiration for how to create the grammar for the language I'll construct. Esperanto/Ido also because I'm interested in how well word-compositionality is structured there and whether it is a decent attempt at outlining the basic concepts that other concepts are composites of.

4Viliam
Ah, thanks for clarification. It seems to me that the main lesson from constructed languages is that you can't make everyone happy. Whatever choice you make, people who have "decades of experience designing constructed languages" will write scathing reviews; but the same thing will also happen if you make the opposite choice. Furthermore, language is inherently a social thing and network effect matters a lot -- if your language is 10% better than an existing one, don't be surprised if people refuse to switch to it, because the 10% improvement is not worth losing the community and resources that exist around the old one. (And that's assuming that people would agree that the improvement actually is an improvement, which is very unlikely.)

Starting with phonemes: You choose a set of them, and then it turns out that there are people on this planet who have difficulty distinguishing and pronouncing them (e.g. Japanese have a problem with L/R in Esperanto). But the fewer you choose, the longer your words will be. Is it okay to use diphthongs, or to have multiple consonants (how many?) after each other? (Now this depends on your motivation. If you want everyone to be able to use the language, this matters. If you want to invent a language that a race of kobolds in your fantasy novel will use, it doesn't matter.)

I think the rule "if you can naturally compose a word out of two or more existing parts, do not invent an extra word" is a good one. Makes the dictionary shorter and the language easier to remember.

This is not as obvious as it sounds. Does "a tool for flying" refer to a wing, or an airplane? In Esperanto, it's the wing. So I prefer to think of these things as mnemotechnic tools; sometimes the meaning is unambiguous, sometimes you have to specify "this construction refers to A, not to B". Useful word-parts from Esperanto are "big", "small", "tool for X", "someone who is X", "become X", "make someone/something X", "place for X", etc. There is no separate cate

Currently we basically don't have any datasets where it's labelled which orca says what. When I listen to recordings, I cannot distinguish voices, though idk, it's possible that people who listened a lot more can. I think just unsupervised voice clustering would probably not work very accurately. I'd guess it's probably possible to get data on who said what by using an array of hydrophones to infer the location of the sound, but we need very accurate position inference because different orcas are often just 1-10m away from each other, and for this we mig... (read more)
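For illustration, here's a rough sketch of the kind of cross-correlation-based time-difference-of-arrival (TDOA) estimate such a hydrophone array would rely on (a toy example with made-up signals, sample rate, and delay; not code from any actual pipeline):

```python
import numpy as np
from scipy.signal import correlate

def tdoa_seconds(delayed, reference, sample_rate):
    """Estimate how much later the same call arrives on one hydrophone than
    on another, from the peak of their cross-correlation."""
    corr = correlate(delayed, reference, mode="full")
    lag = np.argmax(corr) - (len(reference) - 1)  # lag in samples
    return lag / sample_rate

# Toy data: a synthetic click that arrives 5 ms later on hydrophone B.
fs = 48_000  # assumed sample rate in Hz
t = np.arange(int(0.2 * fs)) / fs
click = np.exp(-((t - 0.05) ** 2) / 1e-6)  # sharp pulse at t = 50 ms
sig_a = click + 0.01 * np.random.randn(len(click))
sig_b = np.roll(click, int(0.005 * fs)) + 0.01 * np.random.randn(len(click))

print(tdoa_seconds(sig_b, sig_a, fs))  # ~0.005 s
# With several hydrophones at known positions, the pairwise delays (times the
# ~1500 m/s speed of sound in water) constrain where the caller is, which is
# what attributing a call to an individual would hinge on.
```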

Thanks for your thoughts!

I don't know what you'd consider enough recordings, and I don't know how much decent data we have.

I think the biggest datasets for orca vocalizations are the orchive and the orcasound archive. I think they each are multiple terabytes big (from audio recordings) but I think most of it (80-99.9% (?)) is probably crap where there might just be a brief very faint mammal vocalization in the distance.
We also don't have a way to see which orca said what.

Also orcas from different regions have different languages, and orcas from different p... (read more)

2AnthonyC
I have nowhere near the technical capability to have anything like a clear plan, and your response is basically what I expected. I was just curious. Seems like it could be another cheap "Who knows? Let's see what happens" thing to try, with little to lose when it doesn't help anyone with anything. Still, can we distinguish individuals in unlabeled recordings? Can we learn about meaning and grammar (or its equivalent) based in part on differences between languages and dialects?  At root my thought process amounted to: we have a technology that learns complex structures including languages from data without the benefit of the structural predispositions of human brains. If we could get a good enough corpus of data, it can also learn things other than human languages, and find approximate mappings to human languages. I assumed we wouldn't have such data in this case. That's as far as I got before I posted.

Thanks.

I think LTFF would take way too long to get back to me though. (Also they might be too busy to engage deeply enough to get past the "seems crazy" barrier and see it's at least worth trying.)

Also btw I mostly included this in case someone with significant amounts of money reads this, not because I want to scrape it together from small donations. I expect higher chances of getting funding come from me reaching out to 2-3 people I know (after I know more about how much money I need), but this is also decently likely to fail. If this fails I'll maybe try Manifund, but would guess I don't have good chances there either, but idk.

Actually out of curiosity, why 4x? (And what exactly do you mean by "2x larger"?) (And is this for a naive algorithm which can be improved upon or a tight constraint?)

-1Alexander Gietelink Oldenziel
I highly recommend the following sources for a deep dive into these topics and more: Jacob Cannell's brain efficiency post https://www.lesswrong.com/posts/xwBuoE9p8GE7RAuhd/brain-efficiency-much-more-than-you-wanted-to-know [though take the Landauer story with a grain of salt] and the extraordinary Principles of Neural Design by Sterling & Laughlin https://mitpress.mit.edu/9780262534680/principles-of-neural-design/

Thanks for pointing that out! I will tell my friends to make sure they actually get good data for the metabolic cost and not just fall back on cortical neuron count as a proxy if they cannot find something good.

(Or is there also another point you wanted to make?) And yeah it's actually also an argument for why orcas might be less intelligent (if they sorta use their neurons less often). Thanks. 

My guess is that there probably aren't a lot of simple mutations which just increase intelligence without increasing cortical neuron count. (Though probably simple mutations can shift the balance between different sub-dimensions of intelligence as constrained through cortical neuron count.) (Also of course any particular species has a lot of deleterious mutations going around and getting rid of those may often just increase intelligence, but I'm talking about intelligence-increasing changes to the base genome.)

But there could be complex adaptations that ar... (read more)

An argument against orcas being more intelligent than humans runs thus: Orcas are much bigger than humans, so the fraction of the metabolic cost that the brain consumes is smaller than in humans. Thus it took more selection pressure for humans to evolve 21 billion cortical neurons than for orcas to evolve 43 billion.[1] Thus humans might have other intelligence-increasing mutations that orcas didn't evolve yet.

So the question here is "how much does scale matter vs other adaptations". Luckily, we can get some evidence on that by looking at other species and ratin... (read more)
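As a rough illustration of the metabolic-fraction argument (a back-of-envelope sketch only: the body masses, the Kleiber scaling exponent, and the 20% human brain share are assumed round numbers, and it uses exactly the crude cortical-neuron-count proxy for brain metabolic cost that's cautioned about elsewhere in this thread):

```python
# Back-of-envelope for the brain-metabolic-fraction argument above.
# All inputs are rough assumptions; cortical neuron count is used as a
# crude proxy for brain metabolic cost.

KLEIBER_EXPONENT = 0.75        # resting metabolic rate scales ~ mass^0.75

human_mass_kg = 70.0           # assumed
orca_mass_kg = 4000.0          # assumed adult orca mass
human_brain_share = 0.20       # human brain ~20% of resting metabolism
human_cortical_neurons = 21e9  # figures from the comments above
orca_cortical_neurons = 43e9

# Resting metabolic rate in units where the human value is 1.
human_rmr = 1.0
orca_rmr = (orca_mass_kg / human_mass_kg) ** KLEIBER_EXPONENT

# Crude proxy: brain cost scales linearly with cortical neuron count.
human_brain_cost = human_brain_share * human_rmr
orca_brain_cost = human_brain_cost * (orca_cortical_neurons / human_cortical_neurons)

print(f"orca resting metabolism ~ {orca_rmr:.0f}x the human one")
print(f"orca brain share of resting metabolism ~ {orca_brain_cost / orca_rmr:.0%}")
# With these assumed numbers the orca brain takes only ~2% of the resting
# budget vs ~20% in humans, which is the asymmetry in selection pressure
# the argument points at.
```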

1Towards_Keeperhood
My guess is that there probably aren't a lot of simple mutations which just increase intelligence without increasing cortical neuron count. (Though probably simple mutations can shift the balance between different sub-dimensions of intelligence as constrained through cortical neuron count.) (Also of course any particular species has a lot of deleterious mutations going around and getting rid of those may often just increase intelligence, but I'm talking about intelligence-increasing changes to the base genome.)

But there could be complex adaptations that are very important for abstract reasoning. Metacognition and language are the main ones that come to mind. So even if the experiment my friends will do shows that the number of cortical neurons is a strong indicator, it could still be that humans were just one of the rare cases which evolved a relevant complex adaptation. But it would be significant evidence for orcas being smarter.
5Alexander Gietelink Oldenziel
EDIT: I was a bit hasty and phrased this wrong, I didn't mean to suggest roundtrip is quadratic in length. The max roundtrip time is twice the diameter.

The density of neurons matters a lot. A larger brain means it takes longer for signals to propagate. If the brain is 2x larger, it takes 4x longer for a two way communication. This is a large constraint in both biological brains and GPU design.

Another thought:

In what animals would I on priors expect intelligence to evolve?

  1. Animals which use collaborative hunting techniques.
  2. Large animals. (So the neurons make up a smaller share of the overall metabolic cost.)
  3. Animals that can use tools so they benefit more from higher intelligence.
  4. (perhaps some other stuff like cultural knowledge being useful, or having enough slack for intelligence increase from social dynamics being possible.)

AFAIK, orcas are the largest animals that use collaborative hunting techniques.[1] That plausibly puts them second be... (read more)

Main pieces I remember were: orcas already dominating the planet (like humans do), and large sea creatures going extinct due to orcas (similar to how humans drove several species extinct; Megalodon? probably extinct for different reasons, so weak evidence against; most other large whales are still around).

To clarify for other readers: I do not necessarily endorse this is what we would expect if orcas were smart.

(Also I read somewhere that apparently chimpanzees sometimes/rarely can experience menopause in captivity.)

If the species is already dominating the environment then the pressure from the first component compared to the second decreases. 

I agree with this. However I don't think humans had nearly sufficient slack for most of history. I don't think they dominated the environment up until 20,000 years[1] ago or so, and I think most improvements in intelligence come from earlier.

That's why I'm attributing the level of human intelligence in large part to runaway sexual selection. Without it, as soon as interspecies competition became the most important for re

... (read more)