All of Kenoubi's Comments + Replies

Kenoubi20

I am saying you do not literally have to be a cog in the machine. You have other options. The other options may sometimes be very unappealing; I don't mean to sugarcoat them.

Organizations have choices of how they relate to line employees. They can try to explain why things are done a certain way, or not. They can punish line employees for "violating policy" irrespective of why they acted that way or the consequences for the org, or not.

Organizations can change these choices (at the margin), and organizations can rise and fall because of these choices. This... (read more)

1TristanTrim
I really like this. Agreed. Slack is good, and ideally we would have plenty for everyone, but Moloch is not a fan. I feel like your pov includes a tacit assumption that if there are problems, somewhere there is somebody who, if they had better competence or moral character, could have prevented things from being so bad. I am a fan of Tsuyoku naritai, and I think it applies to ethics as well... I want to be stronger, more skilled and more kind. I want others to want this too. But I also want to acknowledge that, when honestly looking for blame, sometimes it may rest fully in someone's character, but sometimes (and I suspect many or most times) the fault exists in systems and our failures of forethought, failures to understand the complexities of large multi-state systems, and the difficult ambiguity in communication. It is also reasonable to assume both can be at fault. Something that may be driving me to care about this issue... it seems much of the world today is out for blood: suffering and looking to identify and kill the hated outgroup. Maybe we have too much population and our productivity can't keep up. Maybe some people need to die. But that is awful, and I would rather we sought our sacrifices with sorrow and compassion than with the undeserving bitter hatred that I see. I believe we very well could be in a world where every single human is good, but bad things still happen anyway.
Kenoubi40

I have trouble understanding what's going on in people's heads when they choose to follow policy when that's visibly going to lead to horrific consequences that no one wants. Who would punish them for failing to comply with the policy in such cases? Or do people think of "violating policy" as somehow bad in itself, irrespective of consequences?

Of course, those are only a small minority of relevant cases. Often distrust of individual discretion is explicitly on the mind of those setting policies. So, rather than just publishing a policy, they may choose to ... (read more)

5Dweomite
On my model, there are a few different reasons:
* Some people aren't paying enough attention to grok that horrific consequences will ensue, because Humans Who Are Not Concentrating Are Not General Intelligences. Perhaps they vaguely assume that someone else is handling the issue, or just never thought about it at all.
* Some people don't care about the consequences, and so follow the path of least resistance.
* Some people revel in the power to cause problems for others. I have a pet theory that one of the strategies that evolution preprogrammed into humans is "be an asshole until someone stops you, to demonstrate you're strong enough to get away with being an asshole up to that point, and thereby improve your position in the pecking order". (I also suspect this is why the Internet is full of assholes--much harder to punish it than in the ancestral environment, and your evolutionary programming misinterprets this as you being too elite to punish.)
* Some people may genuinely fear that they'll be punished for averting the horrific consequences (possibly because their boss falls into the previous category).
* Some people over-apply the heuristic that rules are optimized for the good of all, and therefore breaking a rule just because it's locally good is selfish cheating.
You might also be interested in Scott Aaronson's essay on blankfaces.
2TristanTrim
I think there's an emperor's-new-clothes effect in chains of command. In every layer, the truth is altered slightly to make things appear a justifiable amount better than they really are, but because there can be so many layers of indirection in the operation of and adherence to policy, the culture can look really different depending on where you find yourself in the class hierarchy. This is especially true with thinking things through and questioning orders. I think people in roles to make policy are often far removed from the mentality that must be adopted to operate in the frantic, understaffed efficiency of front line workers carrying out policy. "There is nothing that can force you to do something you know is wrong" seems like a very affluent pov. More working-class families might suggest advice more like "lower your expectations to lower your stress". I don't know your background though. Do let me know if I'm misunderstanding you.
Kenoubi34

I hadn't noticed that there'd be any reason for people to claim Claude 3.7 Sonnet was "misaligned", even though I use it frequently and have seen some versions of the behavior in question. It seems to me like... it's often trying to find the "easy way" to do whatever it's trying to do. When it decides something is "hard", it backs off from that line of attack. It backs off when it decides a line of attack is wrong, too. Actually, I think "hard" might be a kind of wrong in its ontology of reasoning steps.

This is a reasoning strategy that needs to be applied... (read more)

Kenoubi10

That's possible, but what does the population distribution of [how much of their time people spend reading books] look like? I bet it hasn't changed nearly as much as overall reading minutes per capita has (even decline in book-reading seems possible, though of course greater leisure and wealth, larger quantity of cheaply and conveniently available books, etc. cut strongly the other way), and I bet the huge pile of written language over here has large effects on the much smaller (but older) pile of written language over there.

(How hard to understand was th... (read more)

2eggsyntax
My focus on books is mainly from seeing statistics about the decline in book-reading over the years, at least in the US. Pulling up some statistics (without much double-checking) I see: (from here.) For 2023 the number of Americans who didn't read a book within the past year seems to be up to 46%, although the source is different and the numbers may not be directly comparable: (chart based on data from here.) That suggests to me that selection effects on who reads have gotten much stronger over the years. I do think it would have been better split into multiple sentences. That could be; I haven't seen statistics on reading in other media. My intuition is that many people find reading aversive and avoid it to the extent they can, and I think it's gotten much more avoidable over the past decade.
Kenoubi30

I agree that the average reader is probably smarter in a general sense, but they also have FAR more things competing for their attention. Thus the amount of intelligence available for reading and understanding any given sentence, specifically, may be lower in the modern environment.

2eggsyntax
Interesting point. I'm not sure increased reader intelligence and greater competition for attention are fully countervailing forces -- it seems true in some contexts (scrolling social media), but in others (in particular books) I expect that readers are still devoting substantial chunks of attention to reading.
Kenoubi20

Question marks and exclamation points are dots with an extra bit. Ellipses may be multiple dots, but also indicate an uncertain end to the sentence. (Formal usage distinguishes "..." for ellipses in arbitrary position and "...." for ellipses coming after a full stop, but the latter is rarely seen in any but academic writing, and I would guess even many academics don't notice the difference these days.)

Kenoubi10

I read a bunch of its "thinking" and it gets SO close to solving it after the second message, but it miscounts the number of [] in the text provided for 19. Repeatedly. While quoting it verbatim. (I assume it foolishly "trusts" the first time it counted.) And based on its miscount, it thinks that should be the representation for 23 instead. And thus rules out (a theory that was starting to point towards) the correct answer.

I think this may at least be evidence that having anything unhelpful in context, even (maybe especially!) if self-generated, can be really harmful to model capabilities. I still think it's pretty interesting.

Kenoubi10

I have very mixed feelings about this comment. It was a good story (just read it, and wouldn't have done so without this comment) but I really don't see what it has to do with this LW post.

3romeostevensit
I saw memetic disenfranchisement as a central theme of both.
Kenoubi10

Possible edge case / future work - what if you optimize for faithfulness and legibility of the chain of thought? The paper tests optimizing for innocent-looking CoT, but if the model is going to hack the test either way, I'd want it to say so! And if we have both an "is actually a hack" detector and a "CoT looks like planning a hack" detector, this seems doable.

Is this an instance of the Most Forbidden Technique? I'm not sure. I definitely wouldn't trust it to align a currently unaligned superintelligence. But it seems like maybe it would let you make an a... (read more)

Kenoubi10

Is it really plausible that human driver inattention just doesn't matter here? Sleepiness, drug use, personal issues, eyes were on something interesting rather than the road, etc. I'd guess something like that is involved in a majority of collisions, and that Just Shouldn't Happen to AI drivers.

Of course AI drivers do plausibly have new failure modes, like maybe the sensors fail sometimes (maybe more often than human eyes just suddenly stop working). But there should be plenty of data about that sort of thing from just testing them a lot.

The only realistic... (read more)

1Mis-Understandings
"have been declared street-legal and are functioning in a roadway and regulatory system that humans (chose to) set up" does not eliminate the regulator screwing up the standards and not placing them high enough, or a builder flubbing the implementation. For instance, pure imitation learning has a likelihood of starting to drive like a bad driver if trained on bad drivers, and if it does one thing that a bad driver would do (we have seen this failure mode of bugs causing bugs in LLMs). Similarly, the best way to push down crash rates for self-driving cars is by reconstructing every accident and training the vehicle to avoid them. But if you mess up your training pattern, or if there are weird regulations, you don't get this, and you can end up with a system which consistently crashes in particular situations because of how it generalizes from its training data. A good example of this kind of misimplementation is the timeouts after collision warnings on Waymo vehicles, and how this has caused multiple crashes without getting fixed. If something like that just slides, you don't end up safer. If the only defense against this is activity from the regulator, then arguing that the regulator should get out of the way to make things safer does not work. The complexity of getting declared street-legal means exactly that the safety is an open question.
Kenoubi10

“ goal” in “football| goal|keeping”

 

Looks like an anti-football (*American* football, that is) thing, to me.  American football doesn't have goals, and soccer (which is known as "football" in most of the world) does.  And you mentioned earlier that the baseball neuron is also anti-football.

Kenoubi10

Since it was kind of a pain to run, sharing these probably minimally interesting results. I tried encoding this paragraph from my comment:

I wonder how much information there is in those 1024-dimensional embedding vectors. I know you can jam an unlimited amount of data into infinite-precision floating point numbers, but I bet if you add Gaussian noise to them they still decode fine, and the magnitude of noise you can add before performance degrades would allow you to compute how many effective bits there are. (Actually, do people use this technique on la

... (read more)
1NickyP
Yeah it was annoying to get working. I now have added a Google Colab in case anyone else wants to try anything. It does seem interesting that the semantic arithmetic is hit or miss (mostly miss).
Kenoubi50

You appear to have two full copies of the entire post here, one above the other. I wouldn't care (it's pretty easy to recognize this and skip the second copy) except that it totally breaks the way LW does comments on and reactions to specific parts of the text; one has to select a unique text fragment to use those, and with two copies of the entire post, there aren't any unique fragments.

1NickyP
Ok thanks, not sure why that happened but it should be fixed now.
Kenoubi80

Wow, the SONAR encode-decode performance is shockingly good, and I read the paper and they explicitly stated that their goal was translation, and that the autoencoder objective alone was extremely easy! (But it hurt translation performance, presumably by using a lot of the latent space to encode non-semantic linguistic details, so they heavily downweighted autoencoder loss relative to other objectives when training the final model.)

I wonder how much information there is in those 1024-dimensional embedding vectors. I know you can jam an unlimited amount of ... (read more)

1Kenoubi
Since it was kind of a pain to run, sharing these probably minimally interesting results. I tried encoding this paragraph from my comment: with SONAR, breaking it up like this:

    sentences = [
        'I wonder how much information there is in those 1024-dimensional embedding vectors.',
        'I know you can jam an unlimited amount of data into infinite-precision floating point numbers, but I bet if you add Gaussian noise to them they still decode fine, and the magnitude of noise you can add before performance degrades would allow you to compute how many effective bits there are.',
        '(Actually, do people use this technique on latents in general? I\'m sure either they do or they have something even better; I\'m not a supergenius and this is a hobby for me, not a profession.)',
        'Then you could compare to existing estimates of text entropy, and depending on exactly how the embedding vectors are computed (they say 512 tokens of context but I haven\'t looked at the details enough to know if there\'s a natural way to encode more tokens than that;',
        'I remember some references to mean pooling, which would seem to extend to longer text just fine?), compare these across different texts.']

and after decode, I got this:

    ['I wonder how much information there is in those 1024-dimensional embedding vectors.',
     'I know you can encode an infinite amount of data into infinitely precise floating-point numbers, but I bet if you add Gaussian noise to them they still decode accurately, and the amount of noise you can add before the performance declines would allow you to calculate how many effective bits there are.',
     "(Really, do people use this technique on latent in general? I'm sure they do or they have something even better; I'm not a supergenius and this is a hobby for me, not a profession.)",
     "And then you could compare to existing estimates of text entropy, and depending on exactly how the embedding vectors are calculated (they say 512 tokens of context but I haven't lo
2NickyP
Thanks for reading, and yeah I was also surprised by how well it does. It does seem like there is degradation in auto-encoding from the translation objective, but I would guess that it probably does also make the embedding space have some nicer properties. I did try some small tests to see how sensitive the Sonar model is to noise, and it seems OK. I tried adding Gaussian noise and it started breaking at around >0.5x the original vector size, or at around cosine similarity <0.9, but I haven't tested too deeply, and it seemed to depend a lot on the text. In Meta's newer "Large Concept Model" paper they do seem to manage to train a model solely on Sonar vectors, though I think they also fine-tune the Sonar model to get better results (here is a draft distillation I did. EDIT: decided to post it). It seems to have some benefits (processing long contexts becomes much easier), though they don't test on many normal benchmarks, and it doesn't seem much better than LLMs on those. The SemFormers paper linked I think also tries to do some kind of "explicit planning" with a text auto-encoder, but I haven't read it too deeply yet. I briefly gleaned that it seemed to get better at graph traversal or something. There are probably other things people will try, hopefully some that help make models more interpretable. Yeah, I would like for there to be a good way of doing this in the general case. So far I haven't come up with any amazing ideas that are not variations on "train a classifier probe". I guess if you have a sufficiently good classifier probe setup you might be fine, but it doesn't feel to me like something that works in the general case. I think there is a lot of room for people to try things though. I don't think there is any explicit reason to limit to 512 tokens, but I guess it depends how much "detail" needs to be stored. In the Large Concept Models paper, the experiments on text segmentation did seem to degrade after around ~250 characters in length, but they on
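A back-of-envelope sketch of the "effective bits" estimate discussed in this thread (my own illustration, not code from either commenter; it assumes the rough ~0.5x noise breakdown point mentioned above and treats each of the 1024 dimensions as an independent Gaussian channel, which is a big simplification):

    # Sketch: estimate effective bits of a d-dimensional embedding from its noise
    # tolerance, using the Gaussian channel capacity 0.5 * log2(1 + SNR) per dim.
    import math

    d = 1024            # SONAR embedding dimensionality
    noise_ratio = 0.5   # rough breakdown point reported above (noise std ~0.5x vector scale)

    snr = 1.0 / noise_ratio ** 2             # signal variance / noise variance
    bits_per_dim = 0.5 * math.log2(1 + snr)  # ~1.16 bits per dimension
    total_bits = d * bits_per_dim            # ~1190 bits per embedding
    print(f"{bits_per_dim:.2f} bits/dim, ~{total_bits:.0f} bits total")

For comparison, common estimates of English text entropy are on the order of one bit per character, which is the kind of comparison the earlier comment suggests making.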
Kenoubi10

Sorry, I think it's entirely possible that this is just me not knowing or understanding some of the background material, but where exactly does this diverge from justifying the AI pursuing a goal of maximizing the inclusive genetic fitness of its creators? Which clearly either isn't what humans actually want (there are things humans can do to make themselves have more descendants that no humans, including the specific ones who could take those actions, want to take, because of godshatter) or is just circular (who knows what will maximize inclusive genetic... (read more)

Answer by Kenoubi10

As the person who requested of MIRI to release the Sequences as paper books in the first place, I have asked MIRI to release the rest of them, and credibly promised to donate thousands of dollars if they did so. Given the current situation vis-a-vis AI, I'm not that surprised that it still does not appear to be a priority to them, although I am disappointed.

MIRI, if you see this, yet another vote for finishing the series! And my offer still stands!

1Anna Eplin
Add my vote too!!
Kenoubi10

Thank you for writing this. It has a lot of stuff I haven't seen before (I'm only really interested in neurology insofar as it's the substrate for literally everything I care about, but that's still plenty for "I'd rather have a clue than treat the whole area as spooky stuff that goes bump in the night").

As I understand it, you and many scientists are treating energy consumption by anatomical part of the brain (as proxied by blood flow) as the main way to see "what the brain is doing". It seems possible to me that there are other ways that specific though... (read more)

2 years and 2 days later, in your opinion, has what you predicted in your conclusion happened?

(I'm just a curious bystander; I have no idea if there are any camps regarding this issue, but if so, I'm not a member of any of them.)

The most recent thing I've seen on the topic is this post from yesterday on debate, which found that debate does basically nothing. In fairness there have also been some nominally-positive studies (which the linked post also mentions), though IMO their setup is more artificial and their effect sizes are not very compelling anyway.

My qualitative impression is that HCH/debate/etc have dropped somewhat in relative excitement as alignment strategies over the past year or so, more so than I expected. People have noticed the unimpressive results to some extent, ... (read more)

might put lawyers out of business

This might be even worse than she thought. Many, many contracts include the exact opposite of this clause, i.e., that the section titles are without any effect whatsoever on the actual interpretation of the contract.  I never noticed until just now that this is an instance of self-dealing on the part of the attorneys (typically) drafting the contracts!  They're literally saying that if they make a drafting error, in a way that makes the contract harder to understand and use and is in no conceivable way an improvem... (read more)

I was just reading about this, and apparently subvocalizing refers to small but physically detectable movement of the vocal cords. I don't know whether / how often I do this (I am not at all aware of it). But it is literally impossible for me to read (or write) without hearing the words in my inner ear, and I'm not dyslexic (my spelling is quite good and almost none of what's described in OP sounds familiar, so I doubt it's that I'm just undiagnosed). I thought this was more common than not, so I'm kind of shocked that the reacts on this comment's grandpar... (read more)

Kenoubi1-1

Leaving an unaligned force (humans, here) in control of 0.001% of resources seems risky. There is a chance that you've underestimated how large the share of resources controlled by the unaligned force is, and probably more importantly, there is a chance that the unaligned force could use its tiny share of resources in some super-effective way that captures a much higher fraction of resources in the future. The actual effect on the economy of the unaligned force, other than the possibility of its being larger than thought or being used as a springboard to g... (read more)

2Davidmanheim
Completely as an aside, coordination problems among ASI don't go away, so this is a highly non-trivial claim.

Ah, okay, some of those seem to me like they'd change things quite a lot. In particular, a week's notice is usually possible for major plans (going out of town, a birthday or anniversary, concert that night only, etc.) and being able to skip books that don't interest one also removes a major class of reason not to go. The ones I can still see are (1) competing in-town plans, (2) illness or other personal emergency, and (3) just don't feel like going out tonight. (1) is what you're trying to avoid, of course. On (3) I can see your opinion going either way. ... (read more)

1omark
This is a tough call. How do you determine what is a "legitimately bad enough" case to miss the event? The examples you mention are clearly bad enough, but there are other situations where it's much more personal. If I'm feeling low on energy, is that a choice I am making or an unavoidable fact about my metabolism? You would have to set up some kind of tribunal or voting for deciding on these cases. That's a lot of effort and would only create bad vibes. So no, if you don't come you pay, no matter the reason. However, enforcement is lax. Mostly it's up to the people themselves to say "Yeah, today is my turn since two weeks ago I couldn't make it". If someone considers their case to be special they can easily get away with not paying, and in all likelihood nobody would even notice, let alone question it.

Reads like a ha ha only serious to me anyway.

I started a book club in February 2023 and since the beginning I pushed for the rule that if you don't come, you pay for everyone's drinks next time.

I'm very surprised that in that particular form that worked, because the extremely obvious way to postpone (or, in the end, avoid) the penalty is to not go next time either (or, in the end, ever again). I guess if there's agreement that pretty close to 100% attendance is the norm, as in if you can only show up 60% of the time don't bother showing up at all, then it could work. That would make sense for some... (read more)

1omark
This is definitely based on two assumptions that I mention in the article: if people don't really want to attend, or the costs of the "blinds" are huge, then things are different. That being said, you raise a good point. I can elaborate a little about the book club:
* You can cancel "for free" if you do it sufficiently in advance (in theory 7 days, in practice 5 seems ok). This allows postponing if too many people cancel.
* You can completely skip any book you don't find interesting (books are chosen via voting so only books that are generally popular make the cut).
* There are now 10 attendees so paying for drinks is getting expensive. We are discussing how to keep it simple (e.g. collecting money to later spend it seems annoying) but also reduce the costs.
* In practice everyone ends up paying occasionally so it evens out.
* Some attendees feel ambivalent about the rule because it's constraining as you wrote. As I mentioned, it's important to be careful (and communicate well) about such things.

I think this is a very important distinction. I prefer to use "maximizer" for "timelessly" finding the highest value of an objective function, and reserve "optimizer" for the kind of stepwise improvement discussed in this post. As I use the terms, to maximize something is to find the state with the highest value, but to optimize it is to take an initial state and find a new state with a higher value. I recognize that "optimize" and "optimizer" are sometimes used the way you're saying, as basically synonymous with "maximize" / "maximizer", and I could retre... (read more)

Good post; this has way more value per minute spent reading and understanding it than the first 6 chapters of Jaynes, IMO.

There were 20 destroyed walls and 37 intact walls, leading to 10 − 3×20 − 1×37 = 13db

This appears to have an error; 10 − 3×20 − 1×37 = 10 - 60 - 37 = -87, not 13. I think you meant for the 37 to be positive, in which case 10 - 60 + 37 = -13, and the sign is reversed because of how you phrased which hypothesis the evidence favors (although you could also just reverse all the signs if you want the arithmetic to come out perfectly).

Al... (read more)

1dentalperson
Thanks! These are great points.  I applied the correction you noted about the signs and changed the wording about the direction of evidence.  I agree that the clarification about the 3 dB rule is useful; linked to your comment. Edit: The 10 was also missing a sign.  It should be -10 + 60 - 37.  I also flipped the 1:20 to 20:1 posterior odds that the orcs did it.
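For anyone who wants to re-check the corrected bookkeeping, a minimal sketch (using only the figures from this exchange: a 10 dB prior against the orcs, +3 dB per destroyed wall, -1 dB per intact wall, with positive numbers favoring the orc hypothesis):

    # Decibels of evidence: 10 * log10(odds ratio). Reproduces the corrected
    # arithmetic from the exchange above.
    prior_db = -10                              # prior odds of 1:10 against the orcs, in dB
    destroyed, intact = 20, 37
    evidence_db = 3 * destroyed - 1 * intact    # +60 - 37 = +23 dB of evidence
    posterior_db = prior_db + evidence_db       # -10 + 60 - 37 = 13 dB
    posterior_odds = 10 ** (posterior_db / 10)  # ~20:1 that the orcs did it
    print(posterior_db, round(posterior_odds))  # 13, 20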

I re-read this, and wanted to strong-upvote it, and was disappointed that I already had. This is REALLY good. Way better than the thing it parodies (which was already quite good). I wish it were 10x as long.

The way that LLM tokenization represents numbers is all kinds of stupid. It's honestly kind of amazing to me they don't make even more arithmetic errors. Of course, an LLM can use a calculator just fine, and this is an extremely obvious way to enhance its general intelligence. I believe "give the LLM a calculator" is in fact being used, in some cases, but either the LLM or some shell around it has to decide when to use the calculator and how to use the calculator's result. That apparently didn't happen or didn't work properly in this case.
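A small illustration of the tokenization point (a sketch assuming the tiktoken library and its cl100k_base encoding; other tokenizers carve numbers up differently):

    # Sketch: how a BPE tokenizer splits numbers into multi-digit chunks,
    # which is part of why digit-level arithmetic is awkward for LLMs.
    import tiktoken  # assumes `pip install tiktoken`

    enc = tiktoken.get_encoding("cl100k_base")
    for s in ["7", "123456789", "3.14159"]:
        pieces = [enc.decode([t]) for t in enc.encode(s)]
        print(repr(s), "->", pieces)
    # e.g. "123456789" typically comes out as chunks like ['123', '456', '789'],
    # not as individual digits and not as a single number token.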

Thanks for your reply. "70% confidence that... we have a shot" is slightly ambiguous - I'd say that most shots one has are missed, but I'm guessing that isn't what you meant, and that you instead meant 70% chance of success.

70% feels way too high to me, but I do find it quite plausible that calling it a rounding error is wrong. However, with a 20 year timeline, a lot of people I care about will almost definitely still die, who could have not died if death were Solved, which group with very much not negligible probability includes myself. And as you note do... (read more)

2AlphaAndOmega
T1DM is a nasty disease, and much like you, I'm more than glad to live in the present day when we have tools to tackle it, even if other diseases still persist. There's no other time I'd rather be alive; even if I die soon, it's going to be interesting, and we'll either solve ~all our problems or die trying. I understand. My mother has chronic liver disease, and my grandpa is 95 years old, even if he's healthy for his age (a low bar!). In the former case, I think she has a decent chance of making it to 2043 in the absence of a Singularity, even if it's not as high as I would like. As for my grandfather, at that age just living to see the next birthday quickly becomes something you can't take for granted. I certainly cherish all the time I can spend with him, and hope it all goes favorably for us all. As for me, I went from envying the very young, because I thought they were shoo-ins for making it to biological immortality, to pitying them more these days, because they haven't had at least the quarter decade of life I've had in the event AGI turns out malign. Hey, at least I'm glad we're not in the Worst Possible Timeline, given that awareness of AI x-risk has gone mainstream. That has to count for something.

P.S. Having this set of values and beliefs is very hard on one's epistemics. I think it's a writ-large version of what Eliezer has stated as "thinking about AI timelines is bad for one's epistemics". Here are some examples:

(1) Although I've never been at all tempted by e/acc techno-optimism (on this topic specifically) / alignment isn't a problem at all / alignment by default, boy, it sure would be nice to hear about a strategy for alignment that didn't sound almost definitely doomed for one reason or another. Even though Eliezer can (accurately, IMO) sh... (read more)

Kenoubi175

I agree with the Statement. As strongly as I can agree with anything. I think the hope of current humans achieving... if not immortality, then very substantially increased longevity... without AI doing the work for us, is at most a rounding error. And ASI that was even close to aligned, that found it worth reserving even a billionth part of the value of the universe for humans, would treat this as the obvious most urgent problem and solve death pretty much if there's any physically possible way of doing so. And when I look inside, I find that I simply don... (read more)

6Lyrialtus
Thank you for writing this. I usually struggle to find resonating thoughts, but this indeed resonates. Not all of it, but many key points have a reflection that I'm going to share:
* Biological immortality (radical life extension) without ASI (and reasonably soon) looks hardly achievable. It's a difficult topic, but for me even Michael Levin's talks are not inspiring enough. (I would rather prefer to become a substrate-independent mind, but, again, imagine all the R&D without substantial super-human help.)
* I'm a rational egoist (more or less), so I want to see the future and have pretty much nothing to say about the world without me. Enjoying not being alone on the planet is just a personal preference. (I mean, the origin system is good, nice planets and stuff, but what if I want to GTFO?) Also, I don't trust imaginary agents (gods, evolution, future generations, AGIs), however creating some of them may be rational.
* Let's say that early Yudkowsky has influenced my transhumanist views. To be honest, I feel somewhat betrayed. Here my position is close to what Max More says. Basically, I value the opportunities, even if I don't like all the risks.
* I agree that AI progress is really hard to stop. The scaling leaves possible algorithmic breakthroughs underexplored. There is so much to be done, I believe. The tech world will still be working on it even with mediocre hardware. So we are going to ASI anyway.
* And all the alignment plans... Well, yes, they tend to be questionable. For me, creating human-like agency in AI (to negotiate with) is more about capabilities, but that's a different story.
1Kenoubi
P.S. Having this set of values and beliefs is very hard on one's epistemics. I think it's a writ-large version of what Eliezer has stated as "thinking about AI timelines is bad for one's epistemics". Here are some examples: (1) Although I've never been at all tempted by e/acc techno-optimism (on this topic specifically) / alignment isn't a problem at all / alignment by default, boy, it sure would be nice to hear about a strategy for alignment that didn't sound almost definitely doomed for one reason or another. Even though Eliezer can (accurately, IMO) shoot down a couple of new alignment strategies before getting out of bed in the morning. So far I've never found myself actually doing it, but it's impossible not to notice that if I just weren't as good at finding problems or as willing to acknowledge problems found by others, then some alignment strategies I've seen might have looked non-doomed, at least at first... (2) I don't expect any kind of deliberate slowdown of making AGI to be all that effective even on its own terms, with the single exception of indiscriminate "tear it all down", which I think is unlikely to get within the Overton window, at least in a robust way that would stop development even in countries that don't agree (forcing someone to sabotage / invade / bomb them). Although such actions might buy us a few years, it seems overdetermined to me that they still leave us doomed, and in fact they appear to cut away some of the actually-helpful options that might otherwise be available (the current crop of companies attempting to develop AGI definitely aren't the least concerned with existential risk of all actors who'd develop AGI if they could, for one thing). Compute thresholds of any kind, in particular, I expect to lead to much greater focus on doing more with the same compute resources rather than doing more by using more compute resources, and I expect there's a lot of low-hanging fruit there since that isn't where people have been focusing,
4AlphaAndOmega
I respectfully disagree on the first point. I am a doctor myself, and given the observable increase in investment in life extension (largely in well-funded stealth startups or Google Calico), I have ~70% confidence that in the absence of superhuman AGI or other x-risks in the near term, we have a shot at getting to longevity escape velocity in 20 years. While my p(doom) for AGI is about 30% now, down from a peak of 70% maybe 2 years ago after the demonstration that it didn't take complex or abstruse techniques to reasonably align our best AI (LLMs), I can't fully endorse acceleration on that front because I expect the tradeoff in life expectancy to be net negative. YMMV; it's not like I'm overly confident myself at 70% for life expectancy being uncapped, and it's not like we're probably going to find out either. It just doesn't look like a fundamentally intractable problem in isolation.

Yes he should disclose somewhere that he's doing this, but deepfakes with the happy participation of the person whose voice is being faked seems like the best possible scenario.

Yes and no. The main mode of harm we generally imagine is to the person deepfaked. However, nothing prevents the main harm in a particular incident of harmful deepfaking from being to the people who see the deep fake and believe the person depicted actually said and did the things depicted.

That appears to be the implicit allegation here - that recipients might be deceived into th... (read more)

I've seen a lot of attempts to provide "translations" from one domain-specific computer language to another, and they almost always have at least one of these properties:

  1. They aren't invertible, nor "almost invertible" via normalization
  2. They rely on an extension mechanism intentionally allowing the embedding of arbitrary data into the target language
  3. They use hacks (structured comments, or even uglier encodings if there aren't any comments) to embed arbitrary data
  4. They require the source of the translation to be normalized before (and sometimes also after
... (read more)

Malbolge? Or something even nastier in a similar vein, since it seems like people actually figured out (with great effort) how to write programs in Malbolge. Maybe encrypt all the memory after every instruction, and use a real encryption algorithm, not a lookup table.

Some points which I think support the plausibility of this scenario:

(1) EY's ideas about a "simple core of intelligence", how chimp brains don't seem to have major architectural differences from human brains, etc.

(2) RWKV vs Transformers. Why haven't Transformers been straight up replaced by RWKV at this point? Looks to me like potentially huge efficiency gains being basically ignored because lab researchers can get away with it. Granted, this affects efficiency of inference but not training AFAIK, and maybe it wouldn't work at the 100B+ scale, but it certainly... (read more)

I certainly don't think labs will only try to improve algorithms if they can't scale compute! Rather, I think that the algorithmic improvements that will be found by researchers trying to figure out how to improve performance given twice as much compute as the last run won't be the same ones found by researchers trying to improve performance given no increase in compute.

One would actually expect the low hanging fruit in the compute-no-longer-growing regime to be specifically the techniques that don't scale, since after all, scaling well is an existing cons... (read more)

5Zach Stein-Perlman
Thanks. idk. I'm interested in evidence. I'd be surprised by the conjunction (1) you're more likely to get techniques that scale better by looking for "fundamentally more efficient techniques that turn out to scale better too" and (2) labs aren't currently trying that.

Slowing compute growth could lead to a greater focus on efficiency. Easy to find gains in efficiency will be found anyway, but harder to find gains in efficiency currently don't seem to me to be getting that much effort, relative to ways to derive some benefit from rapidly increasing amounts of compute.

If models on the capabilities frontier are currently not very efficient, because their creators are focused on getting any benefit at all from the most compute that is practically available to them now, restricting compute could trigger an existing "efficie... (read more)

2Zach Stein-Perlman
I briefly discuss my skepticism in footnote 12. I struggle to tell a story about how labs would only pursue algorithmic improvements if they couldn't scale training compute. But I'm pretty unconfident and contrary opinions from people at major labs would change my mind.

I can actually sort of write the elevator pitch myself. (If not, I probably wouldn't be interested.) If anything I say here is wrong, someone please correct me!

Non-realizability is the problem that none of the options a real-world Bayesian reasoner is considering is a perfect model of the world. (It actually information-theoretically can't be, if the reasoner is itself part of the world, since it would need a perfect self-model as part of its perfect world-model, which would mean it could take its own output as an input into its decision process, but th... (read more)

1Lorxus
This seems approximately correct as the motivation, which IMO is expressible/ cashable-out in several isomorphic ways. (In that, in Demiurgery, in distributions over game-tree-branches, in expected utility maximinning...)

Let's say that I can understand neither the original IB sequence, nor your distillation. I don't have the prerequisites. (I mean, I know some linear algebra - that's hard to avoid - but I find topology loses me past "here's what an open set is" and I know nothing about measure theory.)

I think I understand what non-realizability is and why something like IB would solve it. Is all the heavy math actually necessary to understand how IB does so? I'm very tempted to think of IB as "instead of a single probability distribution over outcomes, you just keep a (c... (read more)

4cubefox
I think what's really needed would be a short single page introduction. Sort of an elevator pitch. Alternatively a longer non-technical explanation for dummies, similar to Yudkowsky's posts in the sequences. This would get people interested. It's unlikely to be motivated to dive into a 12k words math heavy paper without any prior knowledge of what the theory promises to accomplish.

I was wondering if anyone would mention that story in the comments. I definitely agree that it has very strong similarities in its core idea, and wondered if that was deliberate. I don't agree with any implications (which you may or may not have intended) that it's so derivative as to make not mentioning Omelas dishonest, though, and independent invention seems completely plausible to me.

Edited to add: although the similar title does incline rather strongly to Omelas being an acknowledged source.

4Richard_Ngo
So actually the main reason I didn't mention it being a rewrite of Omelas is because I did a typical-mind fallacy and assumed it would be obvious. Will edit to mention in intro.

It seems like there might be a problem with this argument if the true values are not just unknown, but adversarially chosen. For example, suppose the true values are the actual locations of a bunch of landmines, drawn from a full set of possible landmine positions. We are trying to get a vehicle from A to B, and all possible paths go over some of the possible positions. We may know that the opponent placing the landmines only has a limited number of landmines to place. Furthermore, suppose each landmine only goes off with some probability even if the vehicle drives over it. If we can mechanist... (read more)

Answer by Kenoubi30

I like this frame, and I don't recall seeing it already addressed.

What I have seen written about deceptiveness generally seems to assume that the AGI would be sufficiently capable of obfuscating its thoughts from direct queries and from any interpretability tools we have available that it could effectively make its plans for world domination in secret, unobserved by humans. That does seem like an even more effective strategy for optimizing its actual utility function than not bothering to think through such plans at all, if it's able to do it. But it's ha... (read more)

Hmm. My intuition says that your A and B are "pretty much the same size". Sure, there are infinitely many times that they switch places, but they do so about as regularly as possible and they're always close.

If A is "numbers with an odd number of digits" and B is "numbers with an even number of digits" that intuition starts to break down, though. Not only do they switch places infinitely often, but the extent to which one exceeds the other is unbounded. Calling A and B "pretty much the same size" starts to seem untenable; it feels more like "the concept of... (read more)

I think this comment demonstrates that the list of reacts should wrap, not extend arbitrarily far to the right.

The obvious way to quickly and intuitively illustrate whether reactions are positive or negative would seem to be color; another option would be grouping them horizontally or vertically with some kind of separator. The obvious way to quickly and intuitively make it visible which reactions were had by more readers would seem to be showing a copy of the same icon for each person who reacted a certain way, not a number next to the icon.

I make no claim that either of these changes would be improvements overall. Clearly the second would require a way to handl... (read more)

In the current UI, the list of reactions from which to choose is scrollable, but that's basically impossible to actually see. While reading the comments I was wondering what the heck people were talking about with "Strawman" and so forth. (Like... did that already get removed?) Then I discovered the scrolling by accident after seeing a "Shrug" reaction to one of the comments.

I've had similar thoughts. Two counterpoints:

  • This is basically misuse risk, which is not a weird problem that people need to be convinced even needs solving. To the extent AI appears likely to be powerful, society at large is already working on this. Of course, its efforts may be ineffective or even counterproductive.

  • They say power corrupts, but I'd say power opens up space to do what you were already inclined to do without constraints. Some billionaires, e.g. Bill Gates, seem to be sincerely trying to use their resources to help people. It isn't har

... (read more)

On SBF, I think a large part of the issue is that he was working in an industry called cryptocurrency that is basically has fraud as the bedrock of it all. There was nothing real about crypto, so the collapse of FTX was basically inevitable.

I don't deny that the cryptocurrency "industry" has been a huge magnet for fraud, nor that there are structural reasons for that, but "there was nothing real about crypto" is plainly false. The desire to have currencies that can't easily be controlled, manipulated, or implicitly taxed (seigniorage, inflation) by gove... (read more)

2Noosphere89
More specifically, the issue with crypto is that the benefits are much less than promised, and there are a whole lot of bullshit claims about crypto, like it being secure or not manipulable. As one example of why cryptocurrencies fail as a currency: the combination of a fixed supply and no central entity means the value of that currency swings wildly, which is a dealbreaker for any currency. Note, this is just one of the many, fractal problems here with crypto. Crypto isn't all fraud. There's reality, but it's built on unsound foundations, and trying to sell a fake castle to others.
Kenoubi1-1

Thank you for writing these! They've been practically my only source of "news" for most of the time you've been writing them, and before that I mostly just ignored "news" entirely because I found it too toxic and it was too difficult+distasteful to attempt to decode it into something useful. COVID the disease hasn't directly had a huge effect on my life, and COVID the social phenomenon has been on a significant decline for some time now, but your writing about it (and the inclusion of especially notable non-COVID topics) have easily kept me interested enou... (read more)

2Adam Zerner
I disagree with this part. It might be somewhat valuable, but I think Zvi's talents would be significantly better applied elsewhere.

I found it to be a pretty obvious reference to the title. SPAM is a meatcube. A meatcube is something that has been processed into uniformity. Any detectable character it had, whether faults, individuality, or flashes of brilliance, has been ground, blended, and seasoned away.
