Best of LessWrong 2021

In this short story, an AI wakes up in a strange environment and must piece together what's going on from limited inputs and outputs. Can it figure out its true nature and purpose?

avturchin
Lifehack: If you're attacked by a group of stray dogs, pretend to throw a stone at them. Each dog will think you're throwing the stone at it and will run away. This has worked for me twice.
habryka
After many years of pain, LessWrong now has fixed kerning and a consistent sans-serif font on all operating systems. You have probably seen terrible kerning like this over the last few years on LW:  It really really looks like there is no space between the first comma and "Ash". This is because Apple has been shipping an extremely outdated version of Gill Sans with terribly broken kerning, often basically stripping spaces completely. We have gotten many complaints about this over the years. But it is now finally fixed. However, changing fonts likely has many downstream effects on various layout things being broken in small ways. If you see any buttons or text misaligned, let us know, and we'll fix it. We already cleaned up a lot, but I am expecting a long tail of small fixes.
AFAICT, approximately every "how to be good at conversation" guide says the same thing: conversations are basically a game where 2+ people take turns free-associating off whatever was said recently. (That's a somewhat lossy compression, but not that lossy.) And approximately every guide is like "if you get good at this free association game, then it will be fun and easy!". And that's probably true for some subset of people. But speaking for myself personally... the problem is that the free-association game just isn't very interesting. I can see where people would like it. Lots of people want to talk to other people more on the margin, and want to do difficult thinky things less on the margin, and the free-association game is great if that's what you want. But, like... that is not my utility function. The free association game is a fine ice-breaker, it's sometimes fun for ten minutes if I'm in the mood, but most of the time it's just really boring.
"weak benevolence isn't fake": https://roamresearch.com/#/app/srcpublic/page/ic5Xitb70 * there's a class of statements that go like: * "fair-weather friends" who are only nice to you when it's easy for them, are not true friends at all * if you don't have the courage/determination to do the right thing when it's difficult, you never cared about doing the right thing at all * if you sometimes engage in motivated cognition or are sometimes intellectually lazy/sloppy, then you don't really care about truth at all * if you "mean well" but don't put in the work to ensure that you're actually making a positive difference, then your supposed "well-meaning" intentions were fake all along * I can see why people have these views. * if you actually need help when you're in trouble, then "fair-weather friends" are no use to you * if you're relying on someone to accomplish something, it's not enough for them to "mean well", they have to deliver effectively, and they have to do so consistently. otherwise you can't count on them. * if you are in an environment where people constantly declare good intentions or "well-meaning" attitudes, but most of these people are not people you can count on, you will find yourself caring a lot about how to filter out the "posers" and "virtue signalers" and find out who's true-blue, high-integrity, and reliable. * but I think it's literally false and sometimes harmful to treat "weak"/unreliable good intentions as absolutely worthless. * not all failures are failures to care enough/try hard enough/be brave enough/etc. * sometimes people legitimately lack needed skills, knowledge, or resources! * "either I can count on you to successfully achieve the desired outcome, or you never really cared at all" is a long way from true. * even the more reasonable, "either you take what I consider to be due/appropriate measures to make sure you deliver, or you never really cared at all" isn't always true either!
I'm running a quant trading bootcamp at Lighthaven (in Berkeley) Nov 6-10. This is my first time trying the extended weekend model; it starts Wednesday night so if you're local you only have to take off Thursday and Friday. You can register here, or check out the LessWrong event here. The course covers the fundamentals of quant trading (markets, order books, auctions, risk and sizing, adverse selection, arbitrage, how quant trading firms make money). In terms of vibes, it's a cross between the Jane Street trading internship, Manifest, and summer-camp-style color war.  Also, if you check "LessWrong" for "How did you hear about this bootcamp?" you get a $150 discount.

Popular Comments

Recent Discussion

Epistemic status: model-building based on observation, with a few successful unusual predictions. Anecdotal evidence has so far been consistent with the model. This puts it at risk of seeming more compelling than the evidence justifies just yet. Caveat emptor.


Imagine you're a very young child. Around, say, three years old.

You've just done something that really upsets your mother. Maybe you were playing and knocked her glasses off the table and they broke.

Of course you find her reaction uncomfortable. Maybe scary. You're too young to have detailed metacognitive thoughts, but if you could reflect on why you're scared, you wouldn't be confused: you're scared of how she'll react.

She tells you to say you're sorry.

You utter the magic words, hoping that will placate her.

And she narrows her eyes in suspicion.

"You...

Tao Lin
I'm often surprised how little people notice, adapt to, or even punish self-deception. It's not very hard to detect when someone's deceiving themselves; people should notice more and disincentivise that.
Ratios

This reads to me as, "We need to increase the oppression even more."

Keenan Pepper
  AKA integrating the ego-dystonic into the homunculus
Valentine
Yep. I'm not sure why you think this is a "very different" conclusion. I'd say the same thing about myself.

The key question is how to handle the cases where becoming conscious of a "bad PR" motivation means it might get exposed. And you answer that! In part at least. You divide people into three categories based on (a) whether you need occlumency with them at all and (b) whether you need to use occlumency on the fact that you're using occlumency.

I don't think of it in terms this explicit, but it's pretty close to what I do now. People get to see me to the extent that I trust them with what I show them. And that's conscious.

Am I misunderstanding you somehow?

I both agree and partly disagree. I tagged your comment with where.

Totally, yes, having a real and meaningful shared problem means we want a truth-seeking community. Strong agreement.

But I think how we "strive" to be truth-seeking might be extremely important. If it's a virtue instead of an engineering consideration, and if people are shamed or punished for having non-truth-seeking behaviors, then the collective "striving" being talked about will encourage individual self-deception and collective untalkaboutability. It's an example of inducing adaptive entropy.

Relatedly: mathematicians don't have truth-seeking collaboration because they're trying hard to be truth-seeking. They're trying to solve problems, and they can verify whether their proposed solutions actually solve the problems they're working on. That means truth-seeking is more useful for what they're doing than any alternatives are. There's no need for focusing on the Virtue of Seeking Truth as a culture. Likewise, there's no Virtue of Using a Hammer in carpentry.

What puts someone in category 2 or 3 for me isn't something I can strive for. It's more like, I can be open to the possibility and be willing to look for how they and I interact. Then I discover how my trust of them shifts. If I try to trust people more than I do, I end up in

This is an entry in the 'Dungeons & Data Science' series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset.

Estimated Complexity: 4/5  (this is a guess, I will update based on feedback/seeing how the scenario goes)

STORY

The Demon King rises in his distant Demon Castle.  Across the free lands of the world, his legions spread, leaving chaos and death in their wake.  The only one who can challenge him is the Summoned Hero, brought by the Goddess Herself from a distant world to aid this one in its time of need.  The Summoned Hero must call together all the free peoples of the world under their banner, to triumph united where they would surely fall...

Ah, thanks a lot. I was actually working through the earlier scenarios; I just missed that a new one had popped up. Subscribed now, so I will hopefully notice the next one.
 

Also, my approach didn't work this time; I ended up trying a way too complicated model. I really like how the actual answer to this one worked.

We’re coming out firmly against it.

Our attitude:


The customer is always right. Yes, you should go ahead and fix your own damn pipes if you know how to do that, and ignore anyone who tries to tell you different. And if you don’t know how to do it, well, it’s at your own risk.

With notably rare exceptions, it should be the same for everything else.

I’ve been collecting these for a while. It’s time.

Campaign Talk

Harris-Walz platform includes a little occupational licensing reform, as a treat.

Universal Effects and Recognition

Ohio’s ‘universal licensing’ law has a big time innovation, which is that work experience outside the state actually exists and can be used to get a license (WSJ).

Occupational licensing decreases the number of Black men in licensed professions by up to 19%,...


Recent discussions about artificial intelligence safety have focused heavily on ensuring AI systems remain under human control. While this goal seems laudable on its surface, we should carefully examine whether some proposed safety measures could paradoxically enable rather than prevent dangerous concentrations of power.

The Control Paradox

The fundamental tension lies in how we define "safety." Many current approaches to AI safety focus on making AI systems more controllable and aligned with human values. But this raises a critical question: controllable by whom, and aligned with whose values?

When we develop mechanisms to control AI systems, we are essentially creating tools that could be used by any sufficiently powerful entity - whether that's a government, corporation, or other organization. The very features that make an AI system "safe" in terms...

Yeah, this is my main risk scenario. But I think it makes more sense to talk about imbalance of power, not concentration of power. Maybe there will be one AI dictator, or one human+AI dictator, or many AIs, or many human+AI companies; but anyway most humans will end up at the bottom of a huge power differential. If history teaches us anything, this is a very dangerous prospect.

It seems the only good path is aligning AI to the interests of most people, not just its creators. But there's no commercial or military incentive to do that, so it probably won't happen by default.

Vladimir_Nesov
A likely answer is "an AI".
Noosphere89
This honestly depends on the level of control achieved over AI in practice. I do agree with the claim that there are pretty strong incentives to have AI peacefully take over everything, but this is a long-term incentive, and more importantly, if control gets good enough, at least some people would wield control of AI, because of AIs wanting to be controlled by humans combined with AI control strategies being good enough that you might avoid takeover, at least in the early regime. To be clear, in the long run I expect an AI will likely (as in 70-85% likely) wield the fruits of control, but I think that humans will at least at first wield the control for a number of years, maybe followed by uploads of humans, like virtual dictators and leaders next in line for control.
Vladimir_Nesov
The point is that the "controller" of a "controllable AI" is a role that can be filled by an AI and not only by a human or a human institution. AI is going to quickly grow the pie to the extent that makes current industry and economy (controlled by humans) a rounding error, so it seems unlikely that among the entities vying for control over controllable AIs, humans and human institutions are going to be worth mentioning. It's not even about a takeover, Google didn't take over Gambia.

To update our credence on whether or not LLMs are conscious, we can ask how many of the Butlin/Long indicator properties for phenomenal consciousness are satisfied by LLMs. To start this program, I zoomed in on an indicator property that is required for consciousness under higher-order theory, nicknamed “HOT-2”: Metacognitive monitoring distinguishing reliable perceptual representations from noise. Do today’s LLMs have this property, or at least have the prerequisites such that future LLMs may develop it?

In this post, I’ll describe my first-pass attempt at answering this question. I surveyed the literature on LLM capabilities relevant to HOT-2, namely LLM metacognition, confidence calibration and introspection. There is evidence that LLMs have at least rudimentary versions of each capability. Still, the extent to which the exact results of the experiments translate...

author on Binder et al. 2024 here. Thanks for reading our paper and suggesting the experiment!

To summarize the suggested experiment:

  • Train a model to be calibrated on whether it gets an answer correct.
  • Modify the model (e.g. activation steering). This changes the model's performance on whether it gets an answer correct.
  • Check if the modified model is still well calibrated.

This could work and I'm excited about it. 

One failure mode is that the modification makes the model very dumb in all instances. Then it's easy to be well calibrated on all these instanc... (read more)
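For concreteness, here is a minimal sketch (my own, not code from Binder et al. 2024) of the final step: measuring calibration before and after a hypothetical modification using expected calibration error. The confidence and correctness arrays below are synthetic stand-ins; in the real experiment they would come from the model's self-reported confidences and graded answers.

```python
# Minimal sketch (not from Binder et al. 2024): check whether stated confidence
# remains calibrated after a model modification such as activation steering.
# All data below is synthetic; in practice `conf` would be the model's
# self-reported probability of being correct and `correct` the graded result.
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Bin predictions by confidence and take the weighted mean of |accuracy - confidence|."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf >= lo) & (conf <= hi) if hi == 1.0 else (conf >= lo) & (conf < hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)           # model's stated confidence per question
correct_before = rng.random(1000) < conf          # baseline: accuracy tracks confidence
correct_after = rng.random(1000) < conf * 0.7     # steered: accuracy drops, confidence unchanged

print("ECE before modification:", round(expected_calibration_error(conf, correct_before), 3))
print("ECE after modification: ", round(expected_calibration_error(conf, correct_after), 3))
# The large post-modification ECE here is what a calibration failure would look like;
# the interesting outcome is the modified model adjusting its confidence and keeping ECE low.
```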

Nathan Helm-Burger
Interesting stuff! I've been dabbling in some similar things, talking with AE Studio folks about LLM consciousness.
Garrett Baker
Sorry to give only a surface-level point of feedback, but I think this post would be much, much better if you shortened it significantly. As far as I can tell, pretty much every paragraph is 3x longer than it could be, which makes it a slog to read through.

Developments around relationships and dating have a relatively small speed premium, so I figured I would wait until I had a full post worth of them.

Indeed I now present such a post, in which I offer several theories as to why so many of you might still be single.

While I am my usual opinionated self, I am not going to be offering a section of my list of related Good Advice. That would be its own project, which may or may not happen at some time in the future. There is still much in the way of practical implications or implied advice throughout.

You’re Single Because You’re Not Even Trying

A 2022 sample of singles is out, and charts are available, so that seems like a good place to...

Same. It would take incredible effort to find one person I reasonably connect with each year. 

So much of this is just location. I've met 100s of people over the last few years. Nearly all either over 40 with kids, or those kids. I've connected with many, maybe 10%, on a pretty good level. That doesn't help with dating at all.

I just really, really don't want it to be the case that the only answer is: move to NY, SF, or Seattle, because I really like it here.

We begin with three stories about three people.

First, Zhu Di, emperor of China from 1402 to 1424. In that period, it was traditional for foreign envoys to present gifts to the emperor and make a show of submission, reinforcing the emperor’s authority and China’s image as the center of civilization. Yet the emperor would send the envoys off with gifts in return, often worth more than the gifts the envoy had given - suggesting that the emperor’s authority and dominance did not actually translate into much bargaining power.

Second, Kevin Systrom, one of the founders of Instagram. When Instagram was bought by Facebook for $1B in 2012, it had only 13 employees. Systrom presumably found himself with a great deal of money, the most direct form of bargaining...

However, though dominance is hard-coded, it seems like something of a simple evolved hack to avoid costly fights among relatively low-cognitive-capability agents; it does not seem like the sort of thing which more capable agents (like e.g. future AI, or even future more-intelligent humans) would rely on very heavily.

This seems exactly reversed to me. It seems to me that since dominance underlies defense, law, taxes and public expenditure, it will stay crucial even with more intelligent agents. Conversely, as intelligence becomes "too cheap to meter", "getting what you want" will become less bottlenecked on relevant insights, as those insights are always available.

In an attempt to get myself to write more, here is my own shortform feed. Ideally I would write something daily, but we will see how it goes.

I use Google Chrome on Ubuntu Budgie and it does look to me like both the font and the font size changed.

Said Achmiz
Well, let's see. Calibri is a humanist sans; Gill Sans is technically also humanist, but more geometric in design. Geometric sans fonts tend to be less readable when used for body text.

Gill Sans has a lower x-height than Calibri. That (obviously) is the cause of all the "the new font looks smaller" comments. (A side-by-side comparison of the fonts, for anyone curious, although note that this is Gill Sans MT Pro, not Gill Sans Nova, so the weight [i.e., stroke thickness] will be a bit different than the version that LW now uses.)

Now, as far as font rendering goes… I just looked at the site on my Windows box (adjusting the font stack CSS value to see Gill Sans Nova again, since I see you guys tweaked it to give Calibri priority)… yikes. Yeah, that's not rendering well at all. Definitely more blurry than Calibri. Maybe something to do with the hinting, I don't know. (Not really surprising, since Calibri was designed from the beginning to look good on Windows.) And I've got a hi-DPI monitor on my Windows machine…

Interestingly, the older version of Gill Sans (seen in the demo on my wiki, linked above) doesn't have this problem; it renders crisply on Windows. (Note that this is not the flawed, broken-kerning version of the font that comes with Macs!)

I also notice that the comment font size is set to… 15.08px. Seems weird? Bumping it up to 16px improves things a bit, although it's still not amazing.

If you can switch to the older (but not broken) version of Gill Sans, that'd be my recommendation. If you can't… then one option might be to check out one of the many similar fonts to see if perhaps one of them renders better on Windows while still having matching metrics.
Said Achmiz
Yeah, I agree with that, but that’s because of a post body font that wasn’t chosen for suitability for comments also. If you pick, to begin with, a font that works for both, then it’ll work for both. … of course, if you don’t think that any of the GW themes’ fonts work for both, then never mind, I guess. (But, uh, frankly I find that to be a strange view. But no accounting for taste, etc., so I certainly can’t say it’s wrong, exactly.)
habryka
Sure, I was just responding to this literal quote: