Comments

gwernΩ213821

I'm not sure I see any difference here between regular dangerously convergent instrumental drives and this added risk of 'intrinsic' drives. They just seem like the same thing to me. Like the two predictions you give seem already true and fulfilled:

Relative to other goals, agentic systems are easy to steer to seek power.

Agentic systems seek power outside of the “training distribution”, but in ways which don’t seem to be part of larger power-seeking plans.

Both of these seem like what I would expect from a flexible, intelligent agent which is capable of handling many complicated changing domains, like an LLM: they are easy to steer to seek power (see: all the work on RLHF and the superficiality of alignment and ease of steering and low-dimensional embeddings), and they can execute useful heuristics even if those cannot be easily explained as part of a larger plan. (Arguably, that's most of what they do currently.) In the hypotheticals you give, the actions seem just like a convergent instrumental drive of the sort that an agent will rationally develop in order to handle all the possible tasks which might be thrown at it in a bewildering variety of scenarios by billions of crazy humans and also other AIs. Trying to have 'savings' or 'buying a bit of compute to be safe', even if the agent cannot say exactly what it would use those for in the current scenario, seems like convergent, and desirable, behavior. Like buying insurance or adding validation checks to some new code, usually it won't help, but sometimes the prudence will pay off. As humans say, "shit happens". Agents which won't do that and just helplessly succumb to hardware they know is flaky or give up the moment something is a little more expensive than average or write code that explodes the instant you look at it funny because you didn't say "make sure to check for X Y & Z" - those agents are not good agents for any purpose.

If there are 'subshards' which achieve this desirable behavior because they, from their own perspective, 'intrinsically' desire power (whatever sense that sort of distinction makes when you've broken things down that far), and it is these subshards which implement the instrumental drive... so what? After all, there has to be some level of analysis at which an agent stops thinking about whether or not it should do some thing and just starts doing the thing. Your muscles "intrinsically desire" to fire when told to fire, but the motor actions are still ultimately instrumental, to accomplish something other than individual muscles twitching. You can't have 'instrumental desire' homunculuses all the way down to the individual transistor or ReLU neuron.

gwern82

Possibly it will still be counterintuitive to many folks, as Said quoted in a sibling comment.

No, this is a little different. Your approach here sounds like ours and the intuitive one (just at the cost of additional complexity).

The 'auto dark mode' we abandoned is where you just use step #2 there and you skip #1 (and thus, any widget or toggle which enables a reader to do anything with localStorage), and 'auto is the only state'. The logic there is, the reader already has access to a widget or toggle to set their dark mode preference: it's just their OS/browser, which will have some config page somewhere with various settings like 'turn on dark mode at night' or 'always use dark mode' or 'always use light mode'. Just trust the OS/browser and use whatever setting it sends to the web page. Don't waste the effort and screen real estate to add in a redundant widget/toggle. It's handled already. Easier for everyone - it Just Works™!

Unfortunately, the connection between 'a year ago when I updated my Android phone and it asked me if I wanted to use the cool new dark mode, I said yes' and 'this webpage I am reading now is in dark mode for some reason, and I can't change it back to normal???', apparently winds up eluding some readers. (This is what Said's sibling comment is about.) It winds up being "too much magic".

The current toggle+localStorage+auto approach, on the other hand, while adding to the clutter, does not seem to confuse readers: "the page is in dark-mode, for some reason. But I want light-mode and I am unhappy. I see a little light-mode button. I push that button. Now the page is in light-mode. I am happy." (And then it is light-mode ever after.) At least, I have seen many fewer (or no) complaints about the dark mode being on when it shouldn't be after we moved to the toggle. So as far as we can tell, it's working.
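To make the difference between 'auto is the only state' and toggle+localStorage+auto concrete, here is a minimal sketch of the decision logic. It is not the actual Gwernnet code: the function names and the mode strings are hypothetical, and it is written in Python as a model of the logic only. In a browser, `stored_choice` would be read from localStorage and `os_prefers_dark` would come from the `prefers-color-scheme` media query.

```python
from typing import Optional

# The three states the reader can pick with the toggle (illustrative names).
MODES = ("auto", "light", "dark")


def resolve_theme(stored_choice: Optional[str], os_prefers_dark: bool) -> str:
    """Return the theme to render: 'light' or 'dark'.

    stored_choice models the value saved by the toggle (localStorage in a
    browser); None means the reader has never touched the toggle.
    os_prefers_dark models the OS/browser setting (prefers-color-scheme).
    """
    if stored_choice in ("light", "dark"):
        # An explicit choice always wins, and persists across visits.
        return stored_choice
    # No saved choice, 'auto', or an unrecognized value: defer to the OS/browser.
    return "dark" if os_prefers_dark else "light"


def next_mode(current: str) -> str:
    """Cycle the three-state toggle: auto -> light -> dark -> auto."""
    return MODES[(MODES.index(current) + 1) % len(MODES)]
```

The abandoned 'auto dark mode' corresponds to always calling `resolve_theme(None, os_prefers_dark)`: the reader gets whatever the OS says and has no way to override it from the page. With the toggle, one click stores "light" and the first branch takes over from then on.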

gwern40

You might be interested in a small "hybrid LLM" proposal I wrote for using diffusion on embeddings and then decoding/sampling from them.

gwern6712

At first look, I like your design a lot. Even though I am required to (because "imitation is the sincerest flattery"), it has its own fairly unique style which doesn't immediately remind me of anything else. I like the playfulness and use of some color. I am also impressed by your design writeup: you have covered far more than I would have expected and definitely thought it through. I may wind up stealing some ideas here.

More miscellaneous observations:

  • dark mode selector: you use a 2-state light vs dark selector. This is the obvious thing, but we think that it is ultimately wrong and you need a three-state selector to accommodate "auto". I think this is especially important given how many people now read websites like yours or mine on their smartphones, often at night or in bed, and just assume now that everything will use dark-mode as appropriate. (I'm sure you've seen many screenshots of Gwernnet on Twitter at this point, and noticed that they are almost always from smartphones, and much of the time, in dark-mode. I am pretty sure that in most of those cases, it is not because the reader specifically opted-into dark-mode forever, but simply because auto fired, and readers take it for granted. I don't expect auto to become any less common, so the 2-state selector will just get more inappropriate and default to the wrong thing more often. We have also noted that increasingly, websites have been choosing 3-state rather than 2-state over the past 2 years, often with nearly-identical semantics & icons, and take this as confirmation of our earlier choice.)

  • Smallcaps acronyms: I did the same thing originally but ultimately removed them. They wound up adding a lot to the page, and while they initially (ahem) looked cool and fancy, they alienated readers and over time I just kept noticing them and feeling more and more alienated by them. Smallcaps may be "proper" typographically, but I think that ship has sailed: we read so little material with acronyms typeset in true smallcaps, that it now achieves the opposite of the intended effect - it's the 'NASA' which is smallcapsed which is bizarre and alien looking, not the regular old 'NASA'. Is it worth spending "weirdness points" on? I ultimately felt not.

  • Color: You mention that link-icons can be chaotic if colored. I agree, but in your case, I think you have a lot of scope to be playful with color.

    For example, you went to a lot of trouble to separate the dropcaps and enable the fun colored dropcaps.... but then don't use the colored ones anywhere (right?). So why not make the dropcaps colored... on hover?

    In fact, why not make 'fun on hover' a core design principle? "If not friend, why friend-shaped?" Make everything on the site a little friend you can play with. (This would be a good time to try to write down a few catchphrases or design principles to sum up your goals here. Why dropcaps or the animated pond logo? etc) When I look at your pond, I feel like it would be wonderful if the pond were animated on hover - still by default, and animated only when I hovered.

    Right now, it feels a bit awkward. It's animated just enough to bother me in the corner of my eye, but not enough to consciously notice it. It is also too small, IMO. The detail is illegible at this size, beautiful as the fullsize version is. (What looks good at large size almost never looks as good at small size, like line-height or less, and needs to be redone. This is part of why link-icons are hard.*) Also, in the long run, I think you are better off looking into generative pixel art for adding more images/video in that style. You may think you are willing to pay $270 each time - and I'm sure you could afford lots of them for something as close to your heart as your personal website - but you're not. The cost and time will gradually deter you and inherently create a scarcity mindset, sabotaging your creativity and playfulness and willingness to go "wouldn't it be fun if...?". Beware more than trivial inconveniences! This is a website design which would benefit from fun little pixel art motifs all over the place, and you want to be able to flip over to your generative tool as soon as an idea for a trout element hits you and start creating it. You don't have to go all Yamauchi No.10 Family Office on the reader, but for this sort of cozy playful design, I think the more the better, so there's a feeling of always something cute around the corner.

    You have a nice fleuron footer. But wouldn't it be so much niftier if that fish were cheerfully animated once I hover over it, and it does little trout flips around my cursor? And if the fleurons became brighter blue or richer texture and more water-like?

    And wouldn't it be nice if all of the trout link-icons also turned blue on hover? (I think the trout link-icon spacing is a bit off, incidentally. The YouTube link-icon is also definitely bad in the "YouTube's logo is definitely red" example - way too close to the 's'.) We have recently implemented link-icon colors on Gwernnet (some background), and while I'm still not sure how appropriate it is for Gwernnet or if it needs to be rethought, I feel it's very appropriate for your design.

    Lots of things you could do with it. For example, you could have a gentle "breathing" cycle of all of the colors, similar to some of Apple's light icons - the page could use JS to very slowly cycle through the default color-less version to the hover versions and back. (Perhaps just for the first minute, or perhaps instead after a few minutes, whatever feels more esthetic.) And Pope suggests that for the AI risk articles, like empowerment, you could have the eyes turn red at random times.

    Or you could define the hover colors to be a 'theme' and have different parts of the site have different themes. Theming is a classic thing to do with websites (see eg GreaterWrong). For example, the same way that Gwernnet has different dropcaps for different subjects - the dropcats for the cat essays, the yinit for technical articles, the Goudy for biology, cheshire for literature etc - you could have, I don't know, personal stuff be yellow, technical AI be blue, humanities stuff be green, and so on.

  • Might note "callouts" are also called "admonitions".

  • Visual regression testing: you can also check snapshots of the raw HTML too. Since you are trying to bake a lot into the HTML, this should work well for you and complement the image approach. This can be as simple as just downloading some URLs and running diff against a directory of older downloads. I implemented this a few months ago and it was easy to implement and has given me more confidence when I review the lorem unit-test pages to check that any changes in the final HTML make sense.

  • I notice way down in the footer a backlinks section, but it doesn't seem to be covered in the design page yet? Also, possible bug: the backlinks section of the design page includes... the design page?

  • "Text transformers" seems like a risky terminology choice, especially given your profession & site content. I know I did a double-take when skimming - "he's using text transformers? ooh how interesting - oh wait." Maybe just call them "compilers" or something.

  • Collapses: I prefer collapses to not require clicks because it reduces friction. I think this is especially true of the Table of Contents - if you don't display that by default (which seems like a bad choice on long pages like the design page), at least make them as easy as possible to access!

  • List indentation: your lists do not indent the contents / outdent the list marker. Is that deliberate? (Actually, is this even consistent? It felt like I saw it happening somewhere but not other places...)

  • Overall clutter: on reflection, I agree with the other comments that right now the pages have some degree of clutter. Just doing too much.

    An example here would be the underlining in the superscripted counter of the dates like "Published on October 31st, 2024" - it really jumps out at you: when you look at the date line, the 'st' is the first thing you read. This is bad because this is neither in line with the semantics of the rest of the page, where underlining always denotes a hyperlink, nor is it decorative in a way which improves the rest of the page appearance or is consistent with the blue-pixel-art-book esthetic. The 'st' shouldn't be underlined, it should if anything be even smaller or faded out.

    Another example would be the slashed-zeros: the slash is somewhat distracting and overloaded on its own and questionable at best (this isn't source code or raw data where confusing 'O'/'0' can be catastrophic) but combined with the zeros also being funny little squashed zeros, you have this overloaded effect where the zeros all over the page keep popping out at you from the corner of your eye or while scanning.

    Then you have all of the other flourishes like the swashes for the capital 'Q'... It's just too much. You can have lots of semantics, like the link-icons, or you can have lots of decoration, but you can't have both, not if they are going to often be on the screen together. (Like just at the top of the design page, you're being hit with logos, toggles, faded out text, underlined superscripts, doubly-variant common letters (8 instances of '0' alone), very fancy capital swashes, dropcaps, collapses with icon+chevrons+backgrounds, screenshots inline without a clear border (and everything inside the screenshots tugging at the eye), 2 link-icons, monospace+italics+bold+roman...)

  • Collapses: the '>' for the disclosure toggles seems oddly offset, and just above the midline enough to look like a bug. Either commit to it being superscript or make it exactly middle-looking/inline.
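On the raw-HTML snapshot idea from the visual regression testing item above, a minimal sketch of "download some URLs and diff against a directory of older downloads" might look like the following. This is an illustration under assumptions, not the script I actually run: the function names, the filename scheme, and the injectable `fetch` parameter are all hypothetical, and it uses only the Python standard library (`difflib`, `pathlib`, `urllib.request`).

```python
import difflib
from pathlib import Path
from urllib.request import urlopen


def snapshot_name(url: str) -> str:
    """Turn a URL into a flat filename (hypothetical naming scheme)."""
    return url.replace("://", "_").replace("/", "_") + ".html"


def diff_html(old: str, new: str, label: str) -> str:
    """Return a unified diff between two HTML snapshots ('' if identical)."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"old/{label}",
        tofile=f"new/{label}",
    ))


def check_snapshots(urls: list[str], snapshot_dir: Path, fetch=None) -> dict[str, str]:
    """Fetch each URL, diff it against the saved copy, and return the non-empty diffs."""
    # Default fetcher downloads the page; tests can pass in a stub instead.
    fetch = fetch or (lambda u: urlopen(u).read().decode("utf-8"))
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    changes = {}
    for url in urls:
        path = snapshot_dir / snapshot_name(url)
        new_html = fetch(url)
        if path.exists():
            delta = diff_html(path.read_text(encoding="utf-8"), new_html, path.name)
            if delta:
                changes[url] = delta
        else:
            # First run: nothing to compare against, so just record the baseline.
            path.write_text(new_html, encoding="utf-8")
    return changes
```

In this sketch the saved snapshot is deliberately never overwritten once it exists, so a diff keeps showing up until you have reviewed it and replaced the old file yourself; whether to auto-accept changes is a design choice.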

Overall, best new personal website I've seen in a while: ★★★★☆.

I look forward to it being tidied up some more, and seeing what clever new touches you put on it as you keep evolving it and presumably can experiment with things like LLM rewrites or integration or add more pixel art, so I can add that last star. :)

* There is a semi-famous game development anecdote about this effect, about how John Romero's Daikatana wound up shipping so late due to poor management: an artist proudly showed off the multi-thousand-pixel art of the fancy sword they had been slaving away on for a while. Someone pointed out to the artist that the sword in question was going to be rendered at like 64x64 pixels, and every detail was going to be invisible when resized, and it was going to look like s---t and so they had just wasted all that work, were going to have to throw it away and start from scratch, and they had fallen that much further behind schedule. A small image is not a large image with fewer pixels, and pixel art is not a drawing with blockier points.

This affects a lot of things - like part of why our new link-icon color feature is so difficult to implement well is that a color which looks fine as a big logo will look totally different as a thin line of a few pixels. It's really quite surprising to me how different things can look when you scale them way down. Something that is clearly purple when I clone it from Paul Graham's website will turn into a 'white' line when I use it as the link color for pg links, or some blue that is medium-colored as a page background will become jet black. So even after the considerable manual labor of getting all of the right colors defined, you still have to do esoteric colorspace transforms to ensure they look right, and I think we're going to have to adjust a bunch of them on top of that as well, once we have time and I can catch up post-Dwarkesh-Patel interview etc. /sigh Good web design is only easy if you don't really care about good results.

gwern40

Coffee culture in America doesn't have much to do with the Revolutionary War. The rise of coffee is much later than the American Revolution. The brief boycott didn't last (after all, Americans - infamous smugglers in general - were smuggling plenty of tea because of the taxes, so sourcing tea was not a problem) and there was enormous consumption of tea consistently throughout: https://en.wikipedia.org/wiki/American_tea_culture#Colonial_and_Revolutionary_eras In fact, I was surprised to learn recently that American tea was overwhelmingly green tea in the 1800s, and one of the biggest export markets for green tea worldwide.

(This was really surprising to me, because if you look around the 1900s, even as late as the 1990s, black tea is the standard American tea; all iced tea is of course black tea, and your local grocery store would be full of mostly just black teas with a few token green teas, and exactly one oolong tea if you were lucky - as I found out the hard way when I became interested in non-black teas.)

gwern120

It can be both, of course. Start with process supervision but combine it with... something else. It's hard to learn how to reason from scratch, but it's also clearly not doing pure strict imitation learning, because the transcripts & summaries are just way too weird to be any kind of straightforward imitation learning of expert transcripts (or even ones collected from users or the wild).

gwern345

Also worth noting Dustin Moskovitz was a prominent enough donor this election cycle, for Harris, to get highlighted in news coverage of her donors: https://www.washingtonexaminer.com/news/campaigns/presidential/3179215/kamala-harris-influential-megadonors/ https://www.nytimes.com/2024/10/09/us/politics/harris-billion-dollar-fundraising.html

gwern1414

There's no way I can meaningfully pick from like 100 covers. Pick 5 or 10, max, if you expect meaningful votes from people.

gwern32

The extensive effort they make to integrate into legacy systems & languages shows how important that is.

gwern52

codyz is doubling down on the UFO claims, but as far as I can see, the case has fallen apart so completely no one even wants to discuss it and even Tyler Cowen & Robin Hanson have stopped nudge-nudge-wink-winking it for now.

So I hereby double my unilateral bet to $2,000.
