In no particular order, because interestingness is multi-dimensional and they are probably all to some degree on my personal interestingness Pareto frontier:
Random thought: maybe (at least pre-reasoning-models) LLMs are RLHF'd to be "competent" in a way that makes them less curious & excitable, which greatly reduces their chance of coming up with (and recognizing) any real breakthroughs. I would expect though that for reasoning models such limitations will necessarily disappear and they'll be much more likely to produce novel insights. Still, scaffolding and lack of context and agency can be a serious bottleneck.
Interestingly, the text to speech conversion of the "Text does not equal text" section is another very concrete example of this:
Downvoted for 3 reasons:
Or as a possible more concrete prompt if preferred: "Create a cost-benefit analysis for EU directive 2019/904, which requires that the caps of all plastic bottles remain attached to the bottles, with the intention of reducing littering and protecting sea life.
Output:
key costs and benefits table
economic cost for the beverage industry to make the transition
expected change in littering, total over first 5 years
QALYs lost or gained for consumers throughout the first 5 years"
In the EU there's some recent regulation about bottle caps being attached to bottles, to prevent littering. (this-is-fine.jpg)
Can you let the app come up with a good way to estimate the cost-benefit ratio of this piece of regulation? E.g. (environmental?) benefit vs (economic? QALY?) cost/drawbacks, or something like that. I think coming up with good metrics to quantify here is almost as interesting as the estimate itself.
I have the vague impression that this is true for me as well, and I remember having made that same claim (that spontaneous conversations at conferences seem maybe most valuable) to a friend when traveling home from an EAGx. My personal best guess: planned conversations are usually 30 minutes long, and while there is some interest-based filtering going on, there's usually no guarantee you'll vibe well with the person. Spontaneous encounters however have pretty variable length, so the ones where you're not vibing will just be over naturally quickly, whereas th...
I made a somewhat similar point in a post earlier this year, though in a much more superficial and less technical way. So it was nice to read your deeper exploration of the topic.
Almost two years after writing this post, this is still a concept I encounter relatively often. Maybe less so in myself, as, I like to think, I have sufficiently internalized the idea to rarely fall into the "fake alternative trap" anymore. But occasionally this comes up in conversations with others, when they're making plans, or we're organizing something together.
With some distance, and also based on some of the comments, I think there is room for improvement:
For people who like guided meditations: there's a small YouTube channel providing a bunch of secular AI-generated guided meditations of various lengths and topics. More are to come, and the creator (whom I know) is happy about suggestions. Three examples:
They are also available in podcast form here.
I wouldn't say these meditations are necessarily better or worse than any others, but they're free and provide some variety. Personally, I avoid apps like Waking Up and Headspace due to both their imho outrageou...
I'm a bit torn regarding the "predicting how others react to what you say or do, and adjust accordingly" part. On the one hand this is very normal and human and makes sense. It's kind of predictive empathy in a way. On the other hand, thinking so very explicitly about it and trying to steer your behavior so as to get the desired reaction out of another person also feels a bit manipulative and inauthentic. If I knew another person was thinking that way and planning exactly how to interact with me, I would find that quite off-putting. But maybe the solution is just "don't overdo it", and/or "only use it in ways the other person would likely consent to" (such as avoiding accidentally saying something hurtful).
My take on this is that patching the more "obvious" types of jailbreaking and obfuscation already makes a difference and is probably worth it (as long as it comes at no notable cost to the general usefulness of the system). Sure, some people will put in the effort to find other ways, but the harder it is, and the fewer little moments of success you have when first trying it, the fewer people will get into it. Of course one could argue that the worst outcomes come from the most highly motivated bad actors, and they surely won't be deterred by such measures....
Some other comments already discussed the issue that often neither A nor B is necessarily correct. I'd like to add that there are many cases where the truth, if it exists in any meaningful way, depends on many hidden variables, and hence A may be true in some circumstances and B in others, and it's a mistake to look for "the one static answer". Of course the questions "when is A or B correct?" / "what does it depend on?" are similarly hard. But it's possible that this different framing can already help, as inquiring why the two sides believe what they believe can sometimes uncover these hidden variables, and it becomes apparent that the two sides' "why"s are not always opposite sides of a single axis.
An argument against may be that for some people there's probably a risk of getting addicted / losing control. I'm not sure to what degree it's possible to predict such tendencies in advance, but for some people that risk may well outweigh any benefits of arbitrage opportunities or improvements to their calibration.
Note from the future: I asked a bunch of LLMs for Terry Pratchett quotes on the human stomach, and while there's no guarantee any of them are actual non-hallucinated quotes (in different conversations I got many different ones while no single one came up twice), I think they're all pretty good:
"All he knew was that his stomach had just started investigating some of the more revolutionary options available to it."
"The stomach is smarter than the mind, which is why it likes to make all the important decisions."
"His stomach was making the kind of noises that ...
There’s a nice board + online game called Codenames. The basic idea is: you have two teams, each team split into two roles, the spymaster and the operatives. All players see an array of 25 cards with a single word on each of them. Everybody sees the words, but only the spymasters see the color of these cards. They can be blue or red, for the two teams, or white for neutral. The teams then take turns. Each time, the spymaster tries to come up with a single freely chosen word that would then allow their operati...
Without having thought much about it, I would think that it's a) pretty addictive and b) "scales well". Many forms of consumption have some ~natural limit, e.g. you can only eat so much food, going to the movies or concerts or whatever takes some energy and you probably wouldn't want to do this every day. Even addictive activities like smoking tend to have at least somewhat of a cap on how much you spend on it. Whereas gambling (which sports betting probably basically is to many people) potentially can just eat up all your savings if you let it.
So it would...
While much of this can surely happen to varying degrees, I think an important aspect in music is also recognition (listening to the same great song you know and like many times with some anticipation), as well as sharing your appreciation of certain songs with others. E.g. when hosting parties, I usually try to create a playlist where for each guest there are a few songs in there that they will recognize and be happy to hear, because it has some connection to both of us. Similarly, couples often have this meme of "this is our song!", which throws them back...
I once had kind of the opposite experience: I was at a friend's place, and we watched the recording of a System of a Down concert from a festival that we both had considered attending but didn't. I thought it was terrific and was quite disappointed not to have attended in person. He however came to the conclusion that the whole thing was so full of flaws that he was glad he hadn't wasted money on a ticket.
Just like you, I was baffled, and to be honest kind of assumed he was just trying to signal his high standards or something but surely didn't a...
I appreciate your perspective, and I would agree there's something to it. I would at first vaguely claim that it depends a lot on the individual situation whether it's wise to be wary of people's insecurities and go out of one's way to not do any harm, or to challenge (or just ignore) these insecurities instead. One thing I've mentioned in the post is the situation of a community builder interacting with new people, e.g. during EA or lesswrong meetups. For such scenarios I would still defend the view that it's a good choice to be very careful not to throw ...
Thanks for sharing your thoughts and experience, and that first link indeed goes exactly in the direction I was thinking.
I think in hindsight I would adjust the tone of my post a bit away from "we're generally bad at thinking in 3D" and more towards "this is a particular skill that many people probably don't have as you can get through the vast majority of life without it", or something like that. I mostly find this distinction between "pseudo 3D" (as in us interacting mostly with surfaces that happen to be placed in a 3D environment, but very rarely, if ever, with actual volumes) and "real 3D" interesting, as it's probably rather easy to overlook.
I find your first point particularly interesting - I always thought that weights are quite hard to estimate and intuit. I mean of course it's quite doable to roughly assess whether one would be able to, say, carry an object or not. But when somebody shows me a random object and I'm supposed to guess the weight, I'm easily off by a factor of 2+, which is much different from e.g. distances (and rather in line with areas and volumes).
Indeed! I think I remember having read that a while ago. A different phrasing I like to use is "Do you have a favorite movie?", because many people actually do and then are happy to share it, and if they don't, they naturally fall back on something like "No, but I recently watched X and it was great" or so.
I would add 3) at the start of an event, everyone is asked to state their hopes and expectations about the event. While it's certainly useful to reflect on these things, I (embarrassingly?) often in such situations don't even have any concrete hopes or expectations and am rather in "let's see what happens" mode. I still think it's fair to ask this question, as it can provide very beneficial feedback for the organizer, but they should at least be aware that a) this can be quite stressful for some participants, and b) many of the responses may be "made up" on...
I think it's a fair point. To maybe clarify a bit though, while potentially strawmanning your point a bit, my intention with the post was not so much to claim "the solution to all social problems is that sufficiently-assertive people should understand the weaknesses of insufficiently-assertive people and make sure to behave in ways that don't cause them any discomfort", but rather I wanted to try to shed some light on situations that for a long time I found confusing and frustrating, without being fully aware of what caused that perceived friction. So I ce...
I would expect that they fare much better with a text representation. I'm not too familiar with how multimodality works exactly, but kind of assume that "vision" works very differently from our intuitive understanding of it. When we are asked such a question, we look at the image and start scanning it with the problem in mind. Whereas transformers seem like they just have some rather vague "conceptual summary" of the image available, with many details, but maybe not all for any possible question, and then have to work with that very limited representation....
Maybe I accidentally overpromised here :D this code is just an expression, namely 1.0000000001 ** 175000000000, which, as WolframAlpha agrees, yields 3.98e7.
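For anyone who'd rather double-check it in Python than on WolframAlpha, here's a minimal sketch of my own (just re-evaluating the expression above and comparing it with the exp(17.5) approximation):

```python
import math

# 1.0000000001 ** 175_000_000_000 is effectively exp(175e9 * ln(1 + 1e-10)),
# and since ln(1 + 1e-10) is roughly 1e-10, the whole thing is roughly exp(17.5).
exact = 1.0000000001 ** 175_000_000_000
approx = math.exp(17.5)

print(f"{exact:.3e}")   # roughly 3.98e+07
print(f"{approx:.3e}")  # roughly 3.98e+07
```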
One crucial question in understanding and predicting the learning process, and ultimately the behavior, of modern neural networks, is that of the shape of their loss landscapes. What does this extremely high dimensional landscape look like? Does training generally tend to find minima? Do minima even exist? Is it predictable what type of minima (or regions of lower loss) are found during training? What role does initial randomization play? Are there specific types of basins in the landscape that are qualitatively different from others, that we might care ab...
I'd like to point out that for neural networks, isolated critical points (whether minima, maxima, or saddle points) basically do not exist. Instead, it's valleys and ridges all the way down. So the word "basin" (which suggests the geometry is parabolic) is misleading.
Because critical points are non-isolated, there are more important kinds of "flatness" than having small second derivatives. Neural networks have degenerate loss landscapes: their Hessians have zero-valued eigenvalues, which means there are directions you can walk along that don't change...
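A toy example of this kind of degeneracy (just my own two-parameter sketch, obviously nothing like a real network): for the loss L(a, b) = (a*b - 1)^2, the minima form a whole curve a*b = 1 rather than an isolated point, and the Hessian at any minimum has a zero eigenvalue whose eigenvector points along that valley:

```python
import numpy as np

# A loss whose minima are non-isolated: every point with a * b == 1 has zero loss.
def loss(a, b):
    return (a * b - 1.0) ** 2

print(loss(1.0, 1.0), loss(2.0, 0.5), loss(4.0, 0.25))  # 0.0 0.0 0.0 -> a curve of minima

# Analytic Hessian of (a*b - 1)^2 is [[2*b**2, 2*(2*a*b - 1)], [2*(2*a*b - 1), 2*a**2]];
# at the minimum (a, b) = (1, 1) this becomes:
H = np.array([[2.0, 2.0],
              [2.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(H)
print(eigvals)        # approximately [0., 4.] -> one zero-curvature ("flat") direction
print(eigvecs[:, 0])  # proportional to [1, -1]: the direction tangent to the valley a*b = 1
```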
That seems like a rather uncharitable take. Even if you're mad at the company, would you (at least (~falsely) assuming this all may indeed be standard practice and not as scandalous as it turned out to be) really be willing to pay millions of dollars for the right to e.g. say more critical things on Twitter, which in most cases extremely few people will even care about? I'm not sure if greed is the best framing here.
(Of course the situation is a bit different for AI safety researchers in particular, but even then, there's not that much actual AI (safety) related intel that even Daniel was able to share that the world really needs to know about; most of the criticism OpenAI is dealing with now is on this meta NDA/equity level)
I would assume ChatGPT gets much better at answering such questions if you add to the initial prompt (or system prompt) to e.g. think carefully before answering. Which makes me wonder whether "ChatGPT is (not) intelligent" is even a meaningful statement at all, given how vastly different personalities (and intelligences) it can emulate based on context/prompting alone. Probably a somewhat more meaningful question would be what the "maximum intelligence" is that ChatGPT can emulate, which can be very different from its standard form.
Thanks for the post, I find this unique style really refreshing.
I would add to it that there's even an "alignment problem" on the individual level. A single human in different circumstances and at different times can have quite different, sometimes incompatible values, preferences and priorities. And even at any given moment their values may be internally inconsistent and contradictory. So this problem exists on many different levels. We haven't "solved ethics", humanity disagrees about everything, even individual humans disagree with themselves, and now we're suddenly racing towards a point where we need to give AI a definite idea of what is good & acceptable.
Aren't LLMs already capable of two very different kinds of search? Firstly, their whole deal is predicting the next token - which is a kind of search. They're evaluating all the tokens at every step, and in the end choose the most probable-seeming one. Secondly, across-token search when prompted accordingly. A prompt like "Please come up with 10 options for X, then rate them all according to Y, and select the best option" is something that current LLMs can handle very reliably - whether or not "within-token search" exists as well. But then again, one might of cours...
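As a concrete sketch of that second, prompted kind of search (my own illustration; call_llm is just a hypothetical stand-in for whatever chat-completion client one happens to use, not a real API):

```python
def call_llm(prompt: str) -> str:
    """Hypothetical helper wrapping some chat-completion API of your choice."""
    raise NotImplementedError

def propose_and_select(task: str, criterion: str, n: int = 10) -> str:
    # Explicit generate-then-evaluate search, carried out across tokens:
    # the model proposes candidates, rates them, and picks one.
    prompt = (
        f"Please come up with {n} options for {task}. "
        f"Then rate each option according to {criterion}, "
        f"and finally state which single option is best and why."
    )
    return call_llm(prompt)

# e.g. propose_and_select("a title for this post", "clarity and catchiness")
```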
Great post! Two thoughts that came to mind while reading it:
Fair point. Maybe if I knew you personally I would take you to be the kind of person that doesn't need such careful communication, and hence I would not act in that way. But even besides that, one could make the point that your wondering about my communication style is still a better outcome than somebody else being put into an uncomfortable situation against their will.
I should also note I generally have less confidence in my proposed mitigation strategies than in the phenomena themselves.
Thanks for the example! It reminds me of how I once was a very active Duolingo user, but then they published some update that changed the color scheme. Suddenly the Duolingo interface was brighter and lower contrast, which just gave me a headache. At that point I basically instantly stopped using the app, as I found no setting to change it back to higher contrast. It's not quite the same of course, but probably also something that would be surprising to some product designers -- "if people want to learn a language, surely something so banal as brightening up the font color a bit would not make them stop using our app".
Another operationalization for the mental model behind this post: let's assume we have two people, Zero-Zoe and Nonzero-Nadia. They are employed by two big sports clubs and are responsible for the living and training conditions of the athletes. Zero-Zoe strictly follows study results that had significant results (and no failed replications) in her decisions. Nonzero-Nadia lets herself be informed by studies in a similar manner, but also takes priors into account for decisions that have little scientific backing, following a "causality is everywhere and eff...
You're right of course - in the quoted part I link to the wikipedia article for "almost surely" (as the analogous opposite case of "almost 0"), so yes indeed it can happen that the effect is actually 0, but this is so extremely rare on a continuum of numbers that it doesn't make much sense to highlight that particular hypothesis.
For many such questions it's indeed impossible to say. But I think there are also many, particularly the types of questions we often tend to ask as humans, where you have reasons to assume that the causal connections collectively point in one direction, even if you can't measure it.
Let's take the question whether improving air quality at someone's home improves their recovery time after exercise. I'd say that this is very likely. But I'd also be a bit surprised if studies were able to show such an effect, because it's probably small, and it's probably hard...
A basic operationalization of "causality is everywhere" is "if we ran an RCT on some effect with sufficiently many subjects, we'd always reach statistical significance" - which is an empirical claim that I think is true in "almost" all cases. Even for "if I clap today, will it change the temperature in Tokyo tomorrow?". I think I get what you mean by "if causality is everywhere, it is nowhere" (similar to "a theory that can explain everything has no predictive power"), but my "causality is everywhere" claim is an at least in theory verifiable/falsifiable f...
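To spell out the "sufficiently many subjects" part, here's a rough power-calculation sketch (the standard two-sample approximation, my own illustration): the required sample size grows like 1/(effect size)^2, so any nonzero effect becomes significant eventually, it just might take absurdly many subjects.

```python
from scipy.stats import norm

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> float:
    # Two-sample approximation: n ~= 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    # where d is the standardized effect size (mean difference / standard deviation).
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2

print(round(n_per_group(0.5)))     # ~63 per group for a medium-sized effect
print(f"{n_per_group(1e-6):.1e}")  # ~1.6e+13 per group for a vanishingly small effect
```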
Indeed, I fully agree with this. Yet when deciding that something is so small that it's not relevant, it's (in my view anyway) important to be mindful of that, and to be transparent about your "relevance threshold", as other people may disagree about it.
Personally I think it's perfectly fine for people to consciously say "the effect size of this is likely so close to 0 we can ignore it" rather than "there is no effect", because the former may well be completely true, while the latter hints at a level of ignorance that leaves the door for conceptual mistakes wide open.
This makes me wonder, how could an AI figure out whether it had conscious experience? I always used to assume that from a first-person perspective it's clear when you're conscious. But this is kind of circular reasoning, as it assumes you have a "perspective" and are able to ponder the question. Now what does a, say, reasoning model do? If there is consciousness, how will it ever know? Does it have to solve the "easy" problem of consciousness first and apply the answer to itself?