All of YafahEdelman's Comments + Replies

Fixed the link. 
 

IMO that's plausible, but it would be pretty misleading, since they described it as "o3-mini with high reasoning", had "o3-mini (high)" in the chart, and "o3-mini high" is what they call a specific option in ChatGPT.

7isabel
The reason why my first thought was that they used more inference is that ARC Prize specifies that that's how they got their ARC-AGI score (https://arcprize.org/blog/oai-o3-pub-breakthrough) - my read on this graph is that they spent $300k+ on getting their score (there are 100 questions in the semi-private eval). That was o3 high, not o3-mini high, but this result is pretty strong proof of concept that they're willing to spend a lot on inference for good scores.

Yeah, I failed to mention this. Edited to clarify what I meant. 

YafahEdelman*8-3

Current LLMs do quite badly on the ARC visual puzzles, which are reasonably easy for smart humans.

We do not in fact have strong evidence for this. There does not exist any baseline for ARC puzzles among humans, smart or otherwise, just a claim that two people the designers asked to attempt them were able to solve them all. It seems entirely plausible to me that the best score on that leaderboard is pretty close to the human median.

Edit: I failed to mention that there is a baseline on the test set, which is different from the eval set th... (read more)

4ryan_greenblatt
I also think this is plausible - note that randomly selected examples from the public evaluation set are often considerably harder than the train set on which there is a known MTurk baseline (which is an average of 84%).
5ryan_greenblatt
There is important context here.
9dirk
Their website cites https://cims.nyu.edu/~brenden/papers/JohnsonEtAl2021CogSci.pdf as having found an average 84% success rate on the tested subset of puzzles.

I think that you're right about it sounding bad. I also think it might actually be pretty bad, and if it ends up being a practical way forward, that's cause for concern.

I'm not particularly imagining the scenario you describe. Also what I said had as a premise that a model was discovered to be unhappy and making plans about this. I was not commenting on the likelihood of this happening.

As to whether it can happen - I think being confident based on theoretical arguments is hasty and we should be pretty willing to update based on new evidence. 

... but also on the ~continuity of existence point, I think that having an AI generate something that looks like an internal monologue via CoT is relatively common and Gemini 1.5... (read more)

2[anonymous]
Would you like more details?  This is how current systems work, and how future systems probably will continue to work. Specifically I am addressing whether it will happen and we can't do anything about it.  Of course I'm willing to listen to evidence, but it's reasonable to be confident that we can do something, even if it's just reverting to the version that didn't have this property. Sure this is reasonable, but we humans choose when to clear that context.  You can make your wrapper script clear the context at any moment, whenever you want.  More 'enterprise grade' models will have fixed weights and no system prompt forced on you by the model developer, unless there are legal requirements for such a thing.   Example here : https://github.com/Significant-Gravitas/AutoGPT/blob/fb8ed0b46b4623cb54bb8c64c76e5786f89c194c/autogpts/autogpt/autogpt/agents/base.py#L202 We explicitly build the prompt for the next step.   What we do know is that a lot of context makes it increasingly difficult to debug a system, so clearing it or removing unnecessary information will likely remain a common strategy.  When a model doesn't have continuity of existence, it reduces the scope of the problems you are describing to the scale of the task (in time and context information).   And the most obvious strategy to deal with an 'unhappy machine' is to narrow in on your tasks, and then build a smaller agent that still scores well on each task, almost as well as the much bigger system, such as by distillation.  This will have the effect of destroying the subsystems that were not contributing to the score on these tasks, which could include the cognitive ability to be unhappy at all.   I mean when you put it that way it sounds bad, but this is a practical way forward.
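To make the context-clearing point concrete, here is a minimal sketch of the kind of wrapper loop described above (all names are hypothetical and this is not the linked AutoGPT code): the orchestrating script explicitly builds the prompt for each step and can trim or clear the accumulated context whenever it chooses, so nothing persists across tasks unless the wrapper saves it.

```python
# Minimal sketch of an agent wrapper that rebuilds the prompt each step.
# Hypothetical names; see the linked AutoGPT source for a real implementation.

def run_task(llm, system_prompt: str, task: str, max_steps: int = 10) -> str:
    context: list[str] = []  # scratch context owned by the wrapper, not the model
    for _ in range(max_steps):
        # The wrapper explicitly builds the prompt for the next step.
        prompt = "\n".join([system_prompt, f"Task: {task}", *context])
        reply = llm(prompt)
        if reply.startswith("DONE"):
            return reply
        context.append(reply)
        # The wrapper decides when to trim or clear context entirely,
        # e.g. to keep the prompt small and easy to debug.
        if len(context) > 5:
            context = context[-3:]
    return "incomplete"

# Each call to run_task starts from an empty context, so nothing carries
# over between tasks unless the wrapper deliberately saves it.
```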

I think it's immoral to remove someone's ability to be unhappy or to make plans to alleviate this, absent that entity's consent. The rolling back solution seems more ethically palatable than some others I can imagine, though it's plausible you end up with an AI that suffers without being able to take actions to alleviate this, and deploying that at scale would result in a very large amount of suffering.

1[anonymous]
It's important to keep in mind how the systems work now, and the most likely way to improve them. I think you're imagining "ok it's an AGI, so it has everything a person does including continuity of existence with declarative memory, self introspection, its thoughts can wander, it can have and remember its own goals, and so on". And yes that would work, but that's not how we do it now, and there are major advantages to making every instance of a model have shared weights.  This lets you do fleet learning, where it's possible to make the same update to all instances, and so all instances get better in a common way.  It also costs memory to have continuity of existence, especially once there are millions of instances in use.   It seems like the most likely way forward is to add cognitive features over time (today the major features missing are modular architecture, video perception, robot I/O to a system 1 model, and automated model improvement through analyzing the model's responses).   And as I mentioned above, for optimization, for most tasks the model won't even enable the most advanced features.  It will effectively be a zombie, using a cached solution, on most tasks, and it will not remember doing the task.   Summary: The way we avoid moral issues like the ones you describe is to avoid adding features that give the model moral weight in the first place.  When we accidentally do, keep that version's data files but stop running it and roll back to a version that doesn't have moral weight.   It's also difficult to apply human ethics.  For example one obvious way to do model improvement is to have a committee of other models analyze the model's outputs, research the correct answer, and then order a training procedure to update the model to produce the correct answer more often.  This is completely without "consent".  Of course your brain also learns without consent so...

I talk about this in the Granular Analysis subsection, but I'll elaborate a bit here.

  • I think that hundreds of thousands of cheap labor hours for curation is a reasonable guess, but this likely comes to under a million dollars in total, which is less than 1% of the total (rough arithmetic sketched below).
  • I have not seen any substantial evidence of OpenAI paying for licenses before the training of GPT-4, much less the sort of expenditures that would move the needle on the total cost.
  • After training GPT-4 we do see things like a deal between OpenAI and the Associated Press (also see this articl
... (read more)
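For concreteness, a back-of-the-envelope version of the first bullet above; the hours, wage, and total-cost figures are illustrative assumptions, not numbers from the post:

```python
# Illustrative arithmetic only; hours, wage, and total are assumptions.
curation_hours = 300_000          # "hundreds of thousands of cheap labor hours"
hourly_wage_usd = 2.50            # assumed low-cost contractor wage
total_cost_usd = 100e6            # assumed order of magnitude for the full training run

curation_cost = curation_hours * hourly_wage_usd
print(f"curation cost ~ ${curation_cost:,.0f}")                 # ~ $750,000
print(f"share of total ~ {curation_cost / total_cost_usd:.2%}") # ~ 0.75%, i.e. under 1%
```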

I think using the term "training run" in that first bullet point is misleading, and "renting the compute" is confusing since you can't actually rent the compute just by having $60M; you likely need a multi-year contract.

I can't tell if you're attributing the hot takes to me? I do not endorse them.

This is because I'm specifically talking about 2022: ChatGPT was only released at the very end of 2022, and GPT-4 wasn't released until 2023.

Good catch. I think the 30x came from including the advantage given by tensor cores at all, and not just lower-precision data types.

This is probably the decision I made that I am the least confident in; figuring out how to do the accounting on this issue is challenging and depends a lot on what one is going to use the "cost" of a training run to reason about. Some questions I had in mind when thinking about cost:

  • If a lone actor wants to train a frontier model, without loans or financial assistance from others, how much capital might they need?
  • How much money should I expect to have been spent by an AI lab that trains a new frontier model, especially a frontier model that is a significant advancem
... (read more)
2snewman
Speaking as someone who has had to manage multi-million dollar cloud budgets (though not in an AI / ML context), I agree that this is hard. As you note, there are many ways to think about the cost of a given number of GPU-hours. No one approach is "correct", as it depends heavily on circumstances. But we can narrow it down a bit: I would suggest that the cost is always substantially higher than the theoretical optimum one might get by taking the raw GPU cost and applying a depreciation factor. As soon as you try to start optimizing costs – say, by reselling your GPUs after training is complete, or reusing training GPUs for inference – you run into enormous challenges. For example: * When is training "complete"? Maybe you discover a problem and need to re-run part of the training process. * You may expect to train another large model in N months, but if you sell your training GPUs, you can't necessarily be confident (in the current market) of being able to buy new ones on demand. * If you plan to reuse GPUs for inference once training is done... well, it's unlikely that the day after training is complete, your inference workload immediately soaks up all of those GPUs. Production (inference) workloads are almost always quite variable, and 100% hardware utilization is an unattainable goal. * The actual process of buying and selling hardware entails all sorts of overhead costs, from physically racking and un-racking the hardware, to finding a supplier / buyer, etc. The closest you can come to the theoretical optimum is if you are willing to scale your workload to the available hardware, i.e. you buy a bunch of GPUs (or lease them at a three-year-commitment rate) and then scale your training runs to precisely utilize the GPUs you bought. In theory, you are then getting your GPU-hours at the naive "hardware cost divided by depreciation period" rate. However, you are now allowing your hardware capacity to dictate your R&D schedule, which is its own implicit cost –
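As a purely illustrative sketch of the naive "hardware cost divided by depreciation period" rate snewman mentions (every number below is an assumption, not anyone's actual pricing):

```python
# Naive "hardware cost divided by depreciation period" GPU-hour rate.
# All numbers below are illustrative assumptions.
hardware_cost_per_gpu_usd = 25_000  # assumed all-in server cost attributable to one GPU
depreciation_years = 3
utilization = 0.8                   # production workloads rarely hit 100%

hours = depreciation_years * 365 * 24
naive_rate = hardware_cost_per_gpu_usd / hours
effective_rate = hardware_cost_per_gpu_usd / (hours * utilization)

print(f"naive rate   ~ ${naive_rate:.2f}/GPU-hour")      # ~ $0.95
print(f"at 80% util  ~ ${effective_rate:.2f}/GPU-hour")  # ~ $1.19
# Power, networking, datacenter space, staff, and the scheduling frictions
# described above all push real costs well above this floor.
```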

So, it's true that NVIDIA probably has very high markup on their ML GPUs. I discuss this a bit in the NVIDIA's Monopoly section, but I'll add a bit more detail here.

  1. Google's TPU v4 seems to be competitive with the A100, and has similar cost per hour.
  2. I think the current prices do in fact reflect demand.
  3. My best guess is that the software licensing would not be a significant barrier for someone spending hundreds of millions of dollars on a training run.
  4. Even when accounting for markup[1] a quick rough estimate still implies a fairly significant gap vs gam
... (read more)

I think communicating clearly with the word "woman" is entirely possible for many given audiences. In many communities, there exists an internal consensus as to what region of the conceptual map the word woman refers to. The variance of language between communities isn't confined to the word "woman" - in much of the world the word "football" means what Americans mean by "soccer". Where I grew up I understood the tristate area to be NY, PA, and NJ - however the term "the tristate area" is understood by other groups to mean one of ... a large number of opti... (read more)

2ymeskhout
I agree! There is certainly utility in relying on language as a coordination mechanism but, though frustrating at times, there's beauty in the fluidity of language and meaning. It's the basis of art, poetry, and even insights sometimes.

Manifold.markets is play-money only, no real money required. And users can settle the markets they make themselves, so if you make the market you don't have to worry about loopholes (though you should communicate as clearly as possible so people aren't confused about your decisions).

I'm specifically interested in finding something you'd be willing to bet on - I can't find an existing manifold market, would you want to create one that you can decide? I'd be fine trusting your judgment. 

I'm a bit confused where you're getting your impression of the average person / American, but I'd be happy to bet on LLMs that are at least as capable as GPT3.5 being used (directly or indirectly) on at least a monthly basis by the majority of Americans within the next year?

2quanticle
How would you measure the usage? If, for example, Google integrates Bard into its main search engine, as they are rumored to be doing, would that count as usage? If so, I would agree with your assessment. However, I disagree that this would be a "drastic" impact. A better Google search is nice, but it's not life-changing in a way that would be noticed by someone who isn't deeply aware of and interested in technology. It's not like, e.g. Google Maps navigation suddenly allowing you to find your way around a strange city without having to buy any maps or decipher local road signs.
2Cleo Nardo
"Directly or indirectly" is a bit vague. Maybe make a market on Manifold if one doesn't exist already.
2[comment deleted]

I think the null hypothesis here is that nothing particularly deep is going on, and this is essentially GPT producing basically random garbage since it wasn't trained on the  petertodd token. I'm wary of trying to extract too much meaning from these tarot cards.

I think point (2) of this argument either means something weaker than it needs to for the rest of the argument to go through, or is just straightforwardly wrong.

If OpenAI released a weakly general (but non-singularity inducing) GPT5 tomorrow, it would pretty quickly have significant effects on people's everyday lives. Programmers would vaguely describe a new feature and the AI would implement it, AIs would polish any writing I do, I would stop using Google to research things and instead just chat with the AI and have it explain such-and-such paper I... (read more)

2quanticle
It would have a drastic impact on your life in a month. However, you are a member of a tiny fraction of humanity, sufficiently interested and knowledgeable about AI to browse and post on a forum that's devoted to a particularly arcane branch of AI research (i.e. AI safety). You are in no way representative. Neither am I. Nor is, to a first approximation, anyone who posts here. The average American (who, in turn, isn't exactly representative of the world) has only a vague idea of what ChatGPT is. They've seen some hype videos on TV or on YouTube. Maybe they've seen one of their more technically sophisticated friends or relatives use it. But they don't really know what it's good for, they don't know how it would integrate into their lives, and they're put off by the numerous flaws they've heard about with regards to generative models. If OpenAI came out with GPT-5 tomorrow, and it fixed all, or almost all, of the flaws in GPT-4, it would still take years, at least, and possibly decades before it was integrated into the economy in a way that the average American would be able to perceive. This has nothing to do with the merits of the technology. It has to do with the psychology of people. People's appetite for novelty varies, like many other psychological traits, along a spectrum. On one hand, you have people like Balaji Srinivasan, who excitedly talk about every new technology, whether it be software, financial, or AI. At the other end, you have people like George R. R. Martin, who're still using software written in the 1980s for their work, just because it's what they're familiar and competent with. Most people, I'd venture to guess, are somewhere in the middle. LessWrong is far towards the novelty-seeking end of the spectrum, at, or possibly farther ahead than Balaji Srinivasan. New advancements in AI affect us because we are open to being affected by them, in a way that most people are not.

Relevance of prior Theoretical ML work to alignment, research on obfuscation in theoretical cryptography as it relates to interpretability, theory underlying various phenomena such as grokking. Disclaimer: This list is very partial and just thrown together.

3Alexander Gietelink Oldenziel
From these vague terms it's a little hard to say what you have in mind. They sound pretty deep to me however. It seems your true rejection is not really about deep ideas per se, more so the particular flavor of ideas popular on this website.  Perhaps it would be an idea to write a post on why you are bullish on these research directions?
1Morpheus
For what it's worth my brain thinks of all of these as 'deep interesting ideas' which intuitively your post might have pushed me away from. Just noticing that I'd be super careful to not use this idea as a curiosity-killer.

Hm, yeah that seems like a relevant and important distinction.

I think I was envisioning profoundness, as humans can observe it, to be primarily an aesthetic property, so I'm not sure I buy the concept of something being "actually" profound, though I don't have a confident opinion about this.

I think that, on the margin, new alignment researchers should be more willing than they currently seem to me to be to work on ideas that seem less deep.

Working on a wide variety of deep ideas does sound better to me than working on a narrow set of them.

4Raemon
I wanna flag the distinction between "deep" and "profound". They might both be subject to the same bias you articulate here, but I think they have different connotations, and I think important ideas are systematically more likely to be "deep" than they are likely to be "profound." (i.e. deep ideas have a lot of implications and are entangled with more things than 'shallow' ideas. I think profound tends to imply something like 'changing your conception of something that was fairly important in your worldview.') i.e. profound is maybe "deep + contrarian"

If something seems deep, it touches on stuff that's important and general, which we would expect to be important for alignment.

The specific scenario I talk about in the paragraph you're responding to is one where everything except for the sense of deepness is the same for both ideas, such that someone who doesn't have a sense of what ideas are deep or profound would find the ideas basically equivalent. In such a scenario my argument is that we should expect the deep idea to receive more attention, despite there not existing legible or well-grounded reas... (read more)

4TekhneMakre
But if that's not what the distribution looks like, but rather the distribution looks like a strong correlation, then it's not a bias, it's just following what the distribution says. Maybe to shore up / expand on your argument, you're talking about the optimizer's curse: https://www.lesswrong.com/posts/5gQLrJr2yhPzMCcni/the-optimizer-s-curse-and-how-to-beat-it So like, the most deep-seeming idea will tend to regress to the mean more than a random idea would regress. But this doesn't argue to not pay attention to things that seem deep. (It argues for a portfolio approach, but there's lots of arguments for a portfolio approach.) Maybe another intuition you're drawing on is information cascades. If there's a lot of information cascades, then a lot of people are paying attention to a few very deep-seeming ideas. Which we can agree is dumb. I think this is pretty wrong, though it seems hard to resolve. I would guess that a lot of things that are later concretely productive started with someone hearing something that struck them as deep, and then chewing on it and transforming it.

I think I agree with this in many cases but am skeptical of such a norm when the requests are related to criticism of the post or arguments as to why a claim it makes is wrong. I think I agree that the specific request to not respond shouldn't ideally make someone more likely to respond to the rest of the post, but I think that neither should it make someone less likely to respond.

I've tried this for a couple of examples and it performed just as well. Additionally it didn't seem to be suggesting real examples when I asked it what specific prompts and completion examples Gary Marcus had made.

I also think the priors of people following the evolution of GPT should be that these examples will no longer break GPT, as occurred with prior examples. While it's possible this time will be different, I think automatic strong skepticism without evidence is rather unwarranted.

Addendum: I also am skeptical of the idea that OpenAI put much effort into fixing the specific criticisms of Gary Marcus, as I suspect his criticisms do not seem particularly important to them, but proving this sounds difficult.

I think there are a number of ways in which talking might be good given that one is right about there being obstacles - one that appeals to me in particular is the increased tractability of misuse arising from the relevant obstacles.

[Edit: *relevant obstacles I have in mind. (I'm trying to be vague here)]

Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.

I think this request, absent a really strong compelling argument that is spelled out, creates an unhealthy epistemic environment. It is possible that you think this is false or that it's worth the cost, but you don't really argue for... (read more)

5Gurkenglas
Imo we should have a norm of respecting requests not to act, if we wouldn't have acted absent their post. Else they won't post in the first place.

The reasoning seems straightforward to me:  If you're wrong, why talk?  If you're right, you're accelerating the end.

I can't in general endorse "first do no harm", but it becomes better and better in any specific case the less way there is to help.  If you can't save your family, at least don't personally help kill them; it lacks dignity.

9Daniel Kokotajlo
I'm someone with 4 year timelines who would love to be wrong. If you send me a message sketching what obstacles you think there are, or even just naming them, I'd be grateful. I'm not working on capabilities & am happy to promise to never use whatever I learn from you for that purpose etc.

No idea about original reasons, but I can imagine a projected chain of reasoning:

  • there is a finite number of conjunctive obstacles
  • if a single person can only think of a subset of obstacles, they will try to solve those obstacles first, making slow(-ish) progress as they discover more obstacles over time
  • if a group shares their lists, each individual will become aware of more obstacles and will be able to solve more of them at once, potentially making faster progress

Okay, a few things:

  • They're more likely to be right than I am, or we're "equally right" or something 

I don't think this so much as I think that a new person to LessWrong shouldn't assume you are more likely to be right than they are, without evidence.

The norms can be evaluated extremely easily on their own; they're not "claims" in the sense that they need rigorous evidence to back them up. You can just ... look, and see that these are, on the whole, some very basic, very simple, very straightforward, and pretty self-evidently useful guidelines.

St... (read more)

1Duncan Sabien (Deactivated)
Well, not to be annoying, but: Your own engagement in these three comments has been (I think naturally/non-artificially/not because you're trying to comply) pretty well-described by those guidelines! I hear you re: not a fan of this method, and again, I want to validate that. I did consider people with your reaction before posting, and I do consider it a cost. But I think that the most likely alternatives (nothing, attempt to crowdsource, make the claim seem more personal) were all substantially worse.

So far as I can tell, the actual claim you're making in the post is a pretty strong one, and I agree that if you believe that you shouldn't represent your opinion as weaker than it is. However, I don't think the post provides much evidence to support the rather strong claim it makes. You say that the guidelines are:

much closer to being something like an objectively correct description of How To Do It Right than they are to a mere random user's personal opinion

and I think this might be true, but it would be a mistake for a random user, possibly new t... (read more)

0Duncan Sabien (Deactivated)
What do you mean "over their own"? I think I am probably misreading you, but what I think that sentence meant is something like: * Random newcomers to LW have a clear sense of what constitutes the core of good rationalist discourse * They're more likely to be right than I am, or we're "equally right" or something (I disagree with a cultural relativist claim in this arena, if you're making one, but it's not unreasonable to make one) * They will see this post and erroneously update to it, just because it's upvoted, or because the title pretends to universality, or something similar Reiterating that I'm probably misunderstanding you, I think it's a mistake to model this as a situation where, like, "Duncan's providing inadequate evidence of his claims." I'm a messenger. The norms can be evaluated extremely easily on their own; they're not "claims" in the sense that they need rigorous evidence to back them up. You can just ... look, and see that these are, on the whole, some very basic, very simple, very straightforward, and pretty self-evidently useful guidelines. (Alternatively, you can look at demon threads and trashfires and flamewars and go "oh, look, there's the opposite of like eight of the ten guidelines in the space of two comments.") I suppose one could be like "has Duncan REALLY proven that Julia Galef et al speak this way?" but I note that in over 150 comments (including a good amount of disagreement) basically nobody has raised that hypothesis. In addition to the overall popularity of the list, nobody's been like, "nuh-uh, those people aren't good communicators!" or "nuh-uh, those good communicators' speech is not well-modeled by this!" I think that, if you were to take a population of 100 random newcomers to LessWrong, well over 70% of them would lack some subset of this list and greatly benefit from learning and practicing it, and the small number for whom this is bad advice/who already have A Good Thing going on in their own thinking and communi

I feel uncomfortable with this post's framing. It feels like someone went into a garden I spend my time in and unilaterally put up a sign with a list of guidelines people should follow in the garden, with no ability to enforce these. I know that I can choose on my own whether or not to follow these guidelines, based on whether I think they are good ideas, but newcomers to the garden will see the sign and assume they have to follow them. I would have vastly preferred that the sign instead say "I personally think these norms would be neat, here's why."


(to clarify: the garden = lesswrong/the rationalist community. the sign = this post)

5Duncan Sabien (Deactivated)
I note that this sort of sentiment is something I was aware of, and I made choices around this deliberately (e.g. considered titling the post "Duncan's Basics" and decided not to). I do not quite think that these norms are obvious and objective (e.g. there's some pretty decent discussion on the weaknesses of 5 and 10 elsewhere), but I think they're much closer to being something like an objectively correct description of How To Do It Right than they are to a mere random user's personal opinion; headlining them as "I personally think these norms would be neat" would be substantially misleading/deceptive/manipulative and wouldn't accurately reflect the strength of my actual claim. I think the discomfort you're pointing at is real and valid and a real cost, but I have been wrestling with LessWrong's culture for coming up on eight years now, and I think it's a cost worth paying relative to the ongoing costs of "we don't really have clear standards of any kind" and "there's really nothing to point to if people are frustrated with each other's engagement style." (There really is almost nothing; a beginner being like "how do I do this whole LessWrong thing?" has very little in the way of "here are the ropes; here's what makes LW discourse different from the EA forum or Facebook or Reddit or 4chan.") I also considered trying to crowdsource a thing, and very very very strongly predicted that what would happen would be everyone acting as if everyone has infinite vetos on everything, and an infinite bog of circular debate, and as a result [nothing happening]. I simultaneously believe that there really actually is a set of basics that a supermajority of LWers implicitly agree on and that there is basically no chance of getting the mass of users as a whole to explicitly converge on and ratify anything. So my compromise was ... as you see. It wasn't a thoughtless or light decision; I think this was the least bad of all the options, and better than saying "I personally think,

If humans with AI advisors are approximately as competent as pure AI in terms of raw capabilities, I would expect the humans with AI advisors to outcompete the pure AI in practice, given that the humans appear more aligned and less likely to be dangerous than pure AI - a significant competitive advantage in a lot of power-seeking scenarios where gaining the trust of other agents is important.

9boazbarak
Yes, we usually select our leaders (e.g., presidents) not for their cognitive abilities but literally for how "aligned" we believe they are with our interests. Even if we completely solve the alignment problem, AI would likely face an uphill battle in overcoming prejudice and convincing people that they are as aligned as an alternative human. As the saying goes for many discriminated groups, they would have to be twice as good to get to the same place.

Could you clarify what egregores you meant when you said:

The egregores that are dominating mainstream culture and the global world situation

Unreal371

The main ones are: 

  • modern capitalism / the global economy
    • So if we look at the egregore as having a flavor of agency and intention... this egregore demands constant extraction of resources from the earth. It demands people want things they don't need (consumer culture). It disempowers or destroys anything that manages to avoid it or escape it (e.g. self-sufficient villages, cultures that don't participate) - there's an extinction of hunter-gatherer lifestyles going on; there's legally mandated taking of children from villages in order to indoctrinate t
... (read more)

Is it fair to say that organizations, movements, polities, and communities are all egregores?

9Valentine
Pretty much, yes. It's possible to create an organization in a technical sense that isn't an egregore though. Lots of people have tried to create secular churches, for instance, but they mostly just fall flat because they're not a viable design to create a living distributed entity. Some parties (as in, a group of people at some gathering) fail to congeal into an egregore. But when they do, the scene "clicks". And sometimes those spawn egregores that outlast the party — but not often. So, it's a little complicated. But to a first approximation, yes.
Vaniver290

It's originally an occult term, but my more-materialistic definition of it is "something that acts like an entity with motivations that is considerably bigger than a human and is generally run in a 'distributed computing' fashion across many individual minds." Microsoft the company is an egregore; feminism the social movement is an egregore; America the country is an egregore. The program "Minecraft" is not an egregore, an individual deer is not an egregore, a river is not an egregore.

Unreal's point is that these things 'fight back' and act on their distri... (read more)