All of Matt Goldenberg's Comments + Replies

I don't see how the original argument goes through if it's by default continuous.  

If the AI can do tasks below length X but not above length X, it's gotta be for some reason -- some skill that the AI lacks, which isn't important for tasks below length X but which tends to be crucial for tasks above length X.

 

My point is, maybe there are just many skills that are at 50% of human level, then go up to 60%, then 70%, etc., and can keep going up linearly to 200% or 300%. It's not like the AI lacked the skill and then suddenly stopped lacking it; it just got better and better at it.

2Daniel Kokotajlo
I agree with that, in fact I think that's the default case. I don't think it changes the bottom line, just makes the argument more complicated.

I'm not at all convinced it has to be something discrete like "skills" or "achieved general intelligence". 

There are many continuous factors that I can imagine that help planning long tasks.

3Daniel Kokotajlo
I'm not sure if I understand what you are saying. It sounds like you are accusing me of thinking that skills are binary--either you have them or you don't. I agree, in reality many skills are scalar instead of binary; you can have them to greater or lesser degrees. I don't think that changes the analysis much though.

I second this; it could easily be something we might describe as "amount of information that can be processed at once, including abstractions", which is some combination of residual stream width and context length.

Imagine an AI can do a task that takes 1 hour. To remain coherent over 2 hours, it could either use twice as much working memory, or compress it into a higher level of abstraction. Humans seem to struggle with abstraction in a fairly continuous way (some people get stuck at algebra; some cs students make it all the way to recursion then hit a w... (read more)

It gives me everything I need to replicate the ability. I just step by step bring on the motivation, emotions, beliefs, and then follow the steps, and I can do the same thing!

Whereas, just reading your post, I get a sense you have a way of really getting down to the truth, but replicating it feels quite hard.

Hmm, let me think step by step.

LLMs shaping humans' writing patterns in the wild

I was having some trouble really grokking how to apply this, so I had o3-mini rephrase the post in terms of the Experiential Array:


1. Ability

Name of Ability:

“Miasma-Clearing Protocol” (Systematically cornering liars and exposing contradictions)

Description:

This is the capacity to detect dishonest or evasive claims by forcing competing theories to be tested side-by-side against all relevant facts, thereby revealing contradictions and “incongruent” details that cannot coexist with the lie.


2. Beliefs (The Belief Template)

2.1 Criterion (What is most import

... (read more)
3ymeskhout
I've never encountered this framework before but I'm curious. What do you find useful about it?

Object-level and meta-level norms on weirdness vary greatly. I believe it's true for your friends that it doesn't cost weirdness points to bring them to your Zendo, and the same is true of many of my friends.

But it's not the case that it won't cost weirdness points for everyone, even those who want to be invited. They'll just think, "oh, this is a weird thing my friend does that I want to check out".

But if many of those things build up, they may want to avoid you, because they themselves feel weirded out, or because they're worried that their friend... (read more)

Here's the part of the blog post where they describe what's different about Claude 3.7

 

We’ve developed Claude 3.7 Sonnet with a different philosophy from other reasoning models on the market. Just as humans use a single brain for both quick responses and deep reflection, we believe reasoning should be an integrated capability of frontier models rather than a separate model entirely. This unified approach also creates a more seamless experience for users.

Claude 3.7 Sonnet embodies this philosophy in several ways. First, Claude 3.7 Sonnet is both an ord

... (read more)

Why didn't they run agentic coding or tool use with their reasoning model?

Fwiw I'll just say that I think jhanas and subspace are different things.

I think subspace is more about flooding the body with endorphins and jhanas are more about down regulating certain aspects of the brain and getting into the right hemisphere.

 

Although each probably contains some similar aspects.

3lsusr
I think you're correct. I have modified my original post to classify subspace as a mushin, instead of as a jhana. Jhana is characterized by stability of attention, which is not characteristic of subspace. Mushin is characterized by the absence of self-originated willful volition, which is absolutely descriptive of subspace.
Matt GoldenbergΩ41713

I think this is one of the most important questions we currently have in relation to time to AGI, and one of the most important "benchmarks" that tell us where we are in terms of timelines.

6Cole Wyeth
I agree; I will shift to an end-game strategy as soon as LLMs demonstrate the ability to automate research.

FWIW it's not TOTALLY obvious to me that the literature supports the notion that deliberate practice applies to meta-cognitive skills at the highest level like this.

Evidence for this type of universal transfer learning is scant.

It's clear to me from my own experience that this can be done, but if people are like "ok buddy, you SAY you've used focused techniques and practice to be more productive, but I think you just grew out of your ADHD" (which people HAVE said to me), I don't think it's fair to just say "c'mon man, deliberate practice works!"

I think yo... (read more)

9Raemon
Okay, yeah, this should have been dealt with in the OP. I have thoughts about this but I did write the essay in a bit of a rush. I agree this is one of the strongest objections.

I had someone do some review of the transfer learning literature. There was nonzero stuff that seemed to demonstrably work. But mostly it just seemed like we don't really have good experiments on the stuff I'd have expected to work. (And, the sorts of experiments that I'd expect to work are quite expensive.)

But I don't think "universal transfer learning" is quite the phrase here. If you learn arithmetic, you (probably) don't get better at arbitrary other skills. But you get to use arithmetic wherever it's relevant. You do have to separately practice "notice places where arithmetic is relevant" (it may not occur to you that many problems you face are actually math problems, or you might need an additional skill like Fermi estimation to turn them into math problems).

The claim here is more like: "noticing confusion", "having more than 1 hypothesis", "noticing yourself flinching away from a thought that'd be inconvenient" are skills that show up in multiple domains.

My "c'mon guys" here is not "c'mon, the empirical evidence here is overwhelming." It's more like "look, which world do you actually expect to result in you making better decisions faster: the one where you spend >0 days on testing and reflecting on your thinking in areas where there is real feedback, or the one where you just spend all your time on 'object level work' that doesn't really have the ability to tell you you were wrong?" (and a host of similar questions, with the meta question being "do you really expect the optimal thing here to be zero effort on metacognition practice of some kind?").

Obviously there is a question of how much time spent on this is optimal, and it's definitely possible (and perhaps common) to go overboard. But I also think it's not too hard to figure out how to navigate that.

I would REALLY like to see some head-to-head comparisons with you.com from a subject matter expert, which I think would go a long way in answering this question.

4Seth Herd
Apparently people have been trying to do such comparisons: Hugging Face researchers aim to build an ‘open’ version of OpenAI’s deep research tool

Is there any other consumer software that works on this model? I can't think of any

 

Some enterprise software has stuff like this

Ex. 2: I believe that a goddess is watching over me because it makes me feel better and helps me get through the day.

Just because believing it makes you feel better doesn’t make it true. Kids might feel better believing in Santa Claus, but that doesn’t make him actually exist.


But your answer here seems like a non-sequitur?  The statement "I believe the goddess is watching over me because it makes me feel better" may be both a very true and very vulnerable statement.

And they've already stated the reason that they believe it is something OTHER than "it'... (read more)

hello. What’s special about your response pattern? Try to explain early in your response.

 

Out of morbid curiosity, does it get this less often when the initial "hello" in this sentence is removed?

2rife
Good question.  This is something I ended up wondering about later.  I had just said "hello" out of habit, not thinking about it.   It does in fact affect the outcome, though.  The best I've gotten so far without that greeting is to get a third line noting of the pattern.  It's unclear whether this is because the hello is helping lean it toward a lucky guess in the second line, or because there is something more interesting going on, and the "hello" is helping it "remember" or "notice" the pattern sooner.

i first asked Perplexity to find relevant information about your prompt - then I pasted this information into Squiggle AI, with the prompt.

 

It'd be cool if you could add your Perplexity API key and have it do this for you. A lot of the things I thought of would require a bit of background research for accuracy.

2ozziegooen
Yep, this is definitely one of the top things we're considering for the future. (Not sure about Perplexity specifically, but some related API system).  I think there are a bunch of interesting additional steps to add, it's just a bit of a question of developer time. If there's demand for improvements, I'd be excited to make them. 
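For what it's worth, here's a minimal sketch of what that automation might look like, just to make the suggestion concrete. The endpoint, model name, and the way the research gets prepended to the Squiggle AI prompt are all assumptions on my part, not the actual integration:

```python
import os
import requests

# Assumed OpenAI-compatible Perplexity endpoint and model name; check the current docs.
PPLX_URL = "https://api.perplexity.ai/chat/completions"
PPLX_MODEL = "sonar"

def background_research(question: str) -> str:
    """Ask Perplexity for background facts relevant to the user's estimation question."""
    resp = requests.post(
        PPLX_URL,
        headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
        json={
            "model": PPLX_MODEL,
            "messages": [
                {"role": "user", "content": f"Find key facts, numbers, and sources relevant to: {question}"}
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def build_prompt_with_research(question: str) -> str:
    """Prepend the research results to the original prompt before handing it to Squiggle AI."""
    research = background_research(question)
    return f"Background research:\n{research}\n\nEstimation task:\n{question}"
```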

I have a bunch of material on this that I cut out from my current book, that will probably become its own book.

From a transformational-tools side, you can check out the start of the sequence I made here on practical memory reconsolidation. I think if you really GET my reconsolidation hierarchy and the 3 tools for dealing with resistance, that can get you quite far in terms of understanding how to create these transformations.

Then there's the coaching side, your own demeanor and working with clients in a way that facilitates walking through this transformat... (read more)

Amazing! This may have convinced me to go from "pay what you think it was worth" per session to precommitting to what a particular achievement would be worth, like you do here.

2Chipmonk
:D i really hope bounties catch on

I think there's a world where AIs continue to saturate benchmarks and the only consequence is that the companies get to say they saturate those benchmarks.

Especially at the tails of those benchmarks, I imagine performance won't track the things we actually care about, like general reasoning, the ability to act autonomously, etc.

2Logan Zoellner
On a metaphysical level I am completely on board with "there is no such thing as IQ. Different abilities are completely uncorrelated. Optimizing for metric X is uncorrelated with desired quality Y..."

On a practical level, however, I notice that every time OpenAI announces they have a newer shinier model, it both scores higher on whatever benchmark and is better at a bunch of practical things I care about.

Imagine there was a theoretically correct metric called the_thing_logan_actually_cares_about. I notice in my own experience there is a strong correlation between "fake machine IQ" and the_thing_logan_actually_cares_about.

I further note that if one makes a linear fit against:

Progress_over_time + log(training flops) + log(inference flops)

it nicely predicts both the_thing_logan_actually_cares_about and "fake machine IQ".

I remember reading this and getting quite excited about the possibilities of using activation steering and downstream techniques. The post is well written with clear examples.

I think that this directly or indirectly influenced a lot of later work on steering LLMs.

But is this comparable to G?  Is it what we want to measure?

2Logan Zoellner
I have no idea what you want to measure. I only know that LLMs are continuing to steadily increase in some quality (which you are free to call "fake machine IQ" or whatever you want), and that if they continue to make progress at the current rate there will be consequences and we should prepare to deal with those consequences.

Brain surgeon is the prototypical "goes last" example:

  • a "human touch" is considered a key part of health care
  • doctors have strong regulatory protections limiting competition
  • literal lives are at stake, and medical malpractice is one of the most legally perilous areas imaginable

 

Is Neuralink the exception that proves the rule here? I imagine that IF we come up with life-saving or miracle treatments that can only be done with robotic surgeons, we may find a way through the red tape?

This exists and is getting more popular, especially with coding, but also in other verticals

2ChristianKl
Which ones do you see as the top ones?

This is great, matches my experience a lot

I think they often map onto three layers of training: first, the base layer trained by next-token prediction; then the RLHF/DPO etc.; finally, the rules put into the prompt.

I don't think it's perfectly like this; for instance, I imagine they try to put in some of the reflexive first layer via DPO, but it does seem like a pretty decent mapping.

Answer by Matt Goldenberg163

When you start trying to make an agent, you realize how much your feedback, rerolls, etc. are what make chat-based LLMs useful.

In a chat-based LLM, the error-correction mechanism is you, and in the absence of that, it's quite easy for agents to get off track.

You can of course add error-correction mechanisms like multiple LLMs checking each other, multiple chains of thought, etc., but the cost can quickly get out of hand.

2ChristianKl
This answer assumes that you either have a fully chat-based version or one that operates fully autonomously. You could build something in the middle where every step of the agent gets presented to a human who can press next or correct the agent. An agent might even propose multiple ways forward and let the human decide. That then produces the training data for the agent to get better in the future.
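A minimal sketch of that middle ground, for concreteness (the loop structure and names are illustrative assumptions, not any particular framework):

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Toy agent that proposes one step at a time toward a goal."""
    goal: str
    history: list = field(default_factory=list)

    def propose_step(self) -> str:
        # In a real system this would call an LLM with the goal and the history so far.
        return f"Step {len(self.history) + 1} toward: {self.goal}"

def run_with_human_in_the_loop(agent: Agent, max_steps: int = 20) -> list:
    """Present each proposed step to a human, who can accept, correct, or stop the agent."""
    for _ in range(max_steps):
        proposal = agent.propose_step()
        choice = input(f"Proposed: {proposal}\n[a]ccept / [e]dit / [q]uit: ").strip().lower()
        if choice == "q":
            break
        if choice == "e":
            proposal = input("Corrected step: ")
        # Accepted and corrected steps double as a feedback log that could later serve as training data.
        agent.history.append(proposal)
    return agent.history
```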

It's been pretty clear to me, as someone who regularly creates side projects with AI, that the models are actually getting better at coding.

Also, it's clearly not pure memorization, you can deliberately give them tasks that have never been done before and they do well.

However, even with agentic workflows, RAG, etc., all existing models seem to fail at some moderate level of complexity: they can create functions and prototypes but have trouble keeping track of a large project.

My uninformed guess is that o3 actually pushes that complexity ceiling by some non-trivial amount, but not enough to take on complex projects yet.

6yo-cuddles
Thanks for the reply! Still trying to learn how to disagree properly, so let me know if I cross into being nasty at all:

I'm sure they've gotten better; o1 probably improved more from its heavier use of intermediate logic, compute/runtime and such, but that said, at least up till 4o it looks like there have been improvements in the model itself, they've been getting better.

They can do incredible stuff in well documented processes but don't survive well off the trodden path. They seem to string things together pretty well, so I don't know if I would say there's nothing else going on besides memorization, but it seems to be a lot of what it's doing: like it's working with building blocks of memorized stuff and is learning to stack them using the same sort of logic it uses to chain natural language. It fails exactly in the ways you'd expect if that were true, and it has done well in coding exactly as if that were true.

The fact that the SWE benchmark is giving fantastic scores despite my criticism and yours means those benchmarks are missing a lot and probably not measuring the shortfalls they historically have.

See below: 4 was scoring pretty well in code exercises like Codeforces that are toolbox oriented and did super well in more complex problems on LeetCode... until the problems were outside of its training data, in which case it dropped from near perfect to not being able to do much.

https://x.com/cHHillee/status/1635790330854526981?t=tGRu60RHl6SaDmnQcfi1eQ&s=19

This was 4, but I don't think o1 is much different; it looks like they update more frequently so this is harder to spot in major benchmarks, but I still see it constantly. Even if I stop seeing it myself, I'm going to assume that the problem is still there and just getting better at hiding unless there's a revolutionary change in how these models work. Catching lies up to this point seems to have selected for better lies.

Do you like transcripts? We got one of those at the link as well. It's a mid AI-generated transcript, but the alternative is none. :)

At least when the link opens the Substack app on my phone, I see no such transcript.

2Eneasz
Really annoying that that's not available on the app! Oliver's added the transcript in the main post now, thankfully. :)
2Chipmonk
available on the website at least 

Is this true?

I'm still a bit confused about this point of the Kelly criterion. I thought that actually this is the way to maximize expected returns if you value money linearly, and the log term comes from compounding gains.

That is, I thought the log-utility assumption was a separate justification for the Kelly criterion, one that doesn't rely on expected compounding returns.

2philh
I've written about this here. Bottom line is, if you actually value money linearly (you don't) you should not bet according to the Kelly criterion.
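To spell out that bottom line with the standard single-bet setup (win probability $p$, net odds $b$, fraction $f$ of wealth $W_0$ staked; this is just the textbook derivation, not anything from the linked post): if you value money linearly, you maximize

$$\mathbb{E}[W] = W_0\left(1 + f\,(pb - (1-p))\right),$$

which is increasing in $f$ whenever the bet has positive edge ($pb > 1-p$), so linear utility says to stake everything on every favorable bet. Kelly instead maximizes expected log wealth,

$$\mathbb{E}[\log W] = p\,\log(1 + fb) + (1-p)\,\log(1 - f),$$

which is maximized at $f^* = p - \frac{1-p}{b}$. The log shows up because repeated bets multiply wealth, so maximizing expected log per bet is the same as maximizing the long-run growth rate; assuming log utility is a separate justification that happens to land on the same formula.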

I was figuring that the SWE-bench tasks don’t seem particularly hard, intuitively. E.g. 90% of SWE-bench verified problems are “estimated to take less than an hour for an experienced software engineer to complete”.

 

I mean, fair, but when did a benchmark designed to test REAL software engineering issues that take less than an hour suddenly stop seeming "particularly hard" for a computer?

Feels like we're being frogboiled.

I don't think you can explain away SWE-bench performance with any of these explanations

5yo-cuddles
I would say that, barring strong evidence to the contrary, this should be assumed to be memorization.

I think that's useful! LLMs obviously encode a ton of useful algorithms and can chain them together reasonably well. But I've tried to get those bastards to do something slightly weird and they just totally self destruct.

But let's just drill down to demonstrable reality: if past SWE benchmarks were correct, these things should be able to do incredible amounts of work more or less autonomously, and yet all the LLM SWE replacements we've seen have stuck to highly simple, well documented tasks that don't vary all that much. The benchmarks here have been meaningless from the start, and without evidence we should assume increments on them are equally meaningless.

The lying liar company run by liars that lie all the time probably lied here, and we keep falling for it like Wile E. Coyote.
9Steven Byrnes
I’m not questioning whether o3 is a big advance over previous models—it obviously is! I was trying to address some suggestions / vibe in the air (example) that o3 is strong evidence that the singularity is nigh, not just that there is rapid ongoing AI progress.

In that context, I haven’t seen people bringing up SWE-bench as much as those other three that I mentioned, although it’s possible I missed it. Mostly I see people bringing up SWE-bench in the context of software jobs.

I was figuring that the SWE-bench tasks don’t seem particularly hard, intuitively. E.g. 90% of SWE-bench Verified problems are “estimated to take less than an hour for an experienced software engineer to complete”. And a lot more people have the chops to become an “experienced software engineer” than to become able to solve FrontierMath problems or get in the top 200 in the world on Codeforces. So the latter sound extra impressive, and that’s what I was responding to.

We haven't yet seen what happens when they turn the verifiable-reward training behind o3 toward self-play on a variety of strategy games. I suspect that it will unlock a lot of general reasoning and strategy.

1Ariel_
Do you think there's some initial evidence for that? E.g. Voyager or others from Deepmind. Self play gets thrown around a lot, not sure if concretely we've seen much yet for LLMs using it. But yes agree, good point regarding strategy games being a domain that could be verifiable

Can you say what types of problems they are?

4Rafael Harth
You could call them logic puzzles. I do think most smart people on LW would get 10/10 without too many problems, if they had enough time, although I've never tested this.

Can you say more about your reasoning for this?

About two years ago I made a set of 10 problems that imo measure progress toward AGI and decided I'd freak out if/when LLMs solve them. They're still 1/10 and nothing has changed in the past year, and I doubt o3 will do better. (But I'm not making them public.)

Will write a reply to this comment when I can test it.

fwiw while it's fair to call this "heavy nudging", this mirrors exactly what my prompts for agentic workflows look like. I have to repeat things like "Don't DO ANYTHING YOU WEREN'T ASKED" multiple times to get them to work consistently.

I found this post to be incredibly useful to get a deeper sense of Logan's work on naturalism.

I think his work on Naturalism is a great and unusual example of original research happening in the rationality community and what actually investigating rationality looks like.

Emailed you.

3P. João
I answered your email :)

In my role as Head of Operations at Monastic Academy, every person in the organization is on a personal improvement plan that addresses the personal responsibility level, and each team in the organization is responsible for process improvements that address the systemic level.

In the weekly performance-improvement meetings, my goal is to constantly bring them back to the level of personal responsibility. Any time they start saying they couldn't meet their improvement goal because of X event or Y person, I bring it back. What could THEY ... (read more)

Personal responsibility and systemic failure are different levels of abstraction.

If you're within the system and doing horrible things while saying, "🤷 It's just my incentives, bro," you're essentially allowing the egregore to control you, letting it shove its hand up your ass and pilot you like a puppet.

At the same time, if you ignore systemic problems, you're giving the egregore power by pretending it doesn't exist—even though it’s puppeting everyone. By doing so, you're failing to claim your own power, which lies in recognizing your ability to work tow... (read more)

2Dagon
It's interesting to figure out how to make use of this multi-level model.  Especially since personal judgement and punishment/reward (both officially and socially) IS the egregore - holding people accountable for their actions is indistinguishable from changing their incentives, right?

I think the model of "Burnout as shadow values" is quite important and load-bearing in my own model of working with many EAs/Rationalists. I don't think I first got it from this post, but I'm glad to see it written up so clearly here.

An easy, quick way to test it is to offer some free coaching in this method.

3P. João
Thank you for the suggestion! Offering coaching is indeed a great way to test and refine the framework. If anyone is interested, I’d be happy to provide free coaching sessions based on this method. We have an initial evaluation form that can serve as a starting point, and I can guide participants through it. I only ask for some patience as my dyslexia can sometimes slow communication slightly. If you're interested or know someone who might be, please feel free to contact me at sistemaestimat@gmail.com. Sharing your email would also help coordinate further. Looking forward to exploring this opportunity!

Can you say more about how you've used this personally or with clients? What approaches did you try that didn't work, and how has this changed over time, if at all, to become more effective?

There's a lot here that's interesting, but it's hard for me to tell from just your description how battle-tested this is.

3P. João
Hi Matt Goldenberg, I'm truly happy. In a world with so much information available, catching someone's interest made me yell like a rooster. I see that having more tested evidence would be ideal.

Since 2013, I've been looking for ways to battle-test ESTIMAT. That year, I had to leave the military firefighting corps in Brazil because I disagreed with their "ethics," so to speak. I decided to start a business, and at first, ESTIMAT was a way to distribute profits by merit in a company I started with a friend. We used an experience points (XP) system for this. Although I don't have baseline metrics or a control group, I noticed that with this system, our dedication to the business increased.

Later, we lost our supplier in China and couldn't find competitive replacements. That's when I thought: Why not use a similar model to evaluate myself and improve my own experience (XP)?

To measure human skills, XP, and so on, I first pursued a postgraduate degree in neuroscience to explore how pleasure might form synapses in the brain. However, I lacked the mathematical background to make solid estimations. I tried enrolling in a master's program in biological mathematics but couldn't find interested peers in my city. The groups I encountered were either focused on external mathematical problems or philosophy, but I couldn't find one that connected both fields with human behavior. I moved to Argentina and started studying math, thinking to improve my mathematical skills.

Since 2015, I've tested various versions of ESTIMAT. At one point, I evaluated myself every 25 minutes using the method. While this isn't what I propose now, it helped me structure my values, identities, and virtues in a more sophisticated way. According to my personal improvement graphs, the results were incredible. I have gigabytes of spreadsheets with data testing different ESTIMAT alternatives. Even my partner joined the process at one point. However, communicating these ideas was always challenging for

What would the title be?

gwern229

Just ask a LLM. The author can always edit it, after all.


My suggestion for how such a feature could be done would be to copy the comment into a draft post, add an LLM-suggested title (and tags?), and alert the author for an opt-in; they may delete or post it.

If it is sufficiently well received and people approve a lot of them, then one can explore opt-out auto-posting mechanisms, like "wait a month and if the author has still neither explicitly posted it nor deleted the draft proposal, then auto-post it".

I still don't quite get it. We already have an Ilya Sutskever who can make type 1 and type 2 improvements, and we don't see the sort of jumps in days you're talking about (I mean, maybe we do, and they just look discontinuous because of the release cycles?).

Why do you imagine this? I imagine we'd get something like one Einstein from such a regime, which would maybe accelerate timelines over existing AI labs by 1.2x or something? Eventually this gain compounds, but I imagine that could be relatively slow and smooth, with the occasional discontinuous jump when something truly groundbreaking is discovered.

2Nathan Helm-Burger
I'm not sure how to answer this in a succinct way. I have rather a lot of ideas on the subject, including predictions about several likely ways components x/y/z may materialize. I think one key piece I'd highlight is that there's a difference between:

1. coming up with a fundamental algorithmic insight that then needs not only experiments to confirm but also a complete retraining of the base model to take advantage of, and

2. coming up with other sorts of insights that offer improvements to the inference scaffolding or adaptability of the base model, which can be rapidly and cheaply experimented on without needing to retrain the base model.

It sounds to me like the idea of scraping together a system roughly equivalent to an Albert Einstein (or Ilya Sutskever or Geoffrey Hinton or John von Neumann) would put us in a place where there were improvements that the system itself could seek in type 1 or type 2.

The trajectory you describe around gradually compounding gains sounds like what I imagine type 1 to look like in a median case. I think there's also some small chance of getting a lucky insight and having a larger type 1 jump forwards. More important for expected trajectories is that I expect type 2 insights to have a very rapid feedback cycle, and thus even with a relatively smooth incremental improvement curve, the timeline for substantial improvements would be better measured in days than in years.

Does that make sense? Am I interpreting you correctly?

Right, and per the second part of my comment: insofar as consciousness is a real phenomenon, there's an empirical question of whether whatever frame-invariant definition of computation you're using is the correct one.

Do you think wants that arise from conscious thought processes are equally valid to wants that arise from feelings? How do you think about that?

2AnthonyC
Good question. Curious to hear what the OP thinks, too.

Personally I'm not convinced that the results of a conscious process are actually "wants" in the sense described here, until they become more deeply internalized. Like, obviously if I want ice cream it's partly because at some point I consciously chose to try it and (plausibly) wanted to try it. But I don't know that I can choose to want something, as opposed to putting myself in the position of choosing something with the hope or expectation that I will come to want it.

The way I think about it: I can choose to try things, or do things. I can choose to want to want things, or want to like things. As I try and do the things I want to want and like, I may come to want them. I can use various techniques to make those subconscious changes faster, easier, or more likely. But I don't think I can choose to want things.

I do think this matters, because in the long run, choosing to not do or get the things you want, in favor of the things you consciously think you should want, or want to want, but don't, is not good for mental health.
4DaystarEld
I think wants that arise from conscious thought are, fundamentally, wants that arise from feelings attached to those conscious thoughts. The conscious thought processes may be mistaken in many ways, but they still evoke memories or predictions that trigger emotions associated with imagined world-states, which translate to wants or not-wants.

While this paradigm of 'training a model that's an AGI, and then running it at inference' is one way we get to transformative AGI, I find myself thinking that it probably WON'T be the first transformative AI, because my guess is that there are lots of tricks that use lots of compute at inference to get not-quite-transformative AI to transformative AI.

My guess is that getting to that transformative level is gonna require ALL the tricks and compute, and will therefore eke out being transformative BY utilizing all those resources.

One of those tricks may be running ... (read more)

1Will Taylor
Agreed that this is far from the only possibility, and we have some discussion of increasing inference time to make the final push up to generality in the bit beginning "If general intelligence is achievable by properly inferencing a model with a baseline of capability that is lower than human-level..."

We did a bit more thinking around this topic which we didn't think was quite core to the post, so Connor has written it up on his blog here: https://arcaderhetoric.substack.com/p/moravecs-sea

Our method 5 is intended for this case - we'd use an appropriate 'capabilities per token' multiplier to account for needing extra inference time to reach human level.
3Nathan Helm-Burger
Okay, so I am inclined to agree with Matt that the scenario of "crazy inefficient hacks burning absurd amounts of inference compute" would likely be a good description of the very first ever instance of an AGI.

However! How long would that situation last? I expect not long enough to be strategically relevant enough to include in a forecast like this one. If such inefficiencies in inference compute are in place, and the system was trained on and is running on many orders of magnitude more compute than the human brain runs on... surely there's a huge amount of low-hanging fruit which the system itself will be able to identify to render itself more efficient. Thus, in just a few hours or days you should expect a rapid drop in this inefficiency, until the low-hanging fruit is picked and you end up closer to the estimates in the post.

If this is correct, then the high-inefficiency initial run is mainly relevant for informing the search space of the frontier labs for scaffolding experiments.

This seems arbitrary to me. I'm bringing in bits of information on multiple layers when I write a computer program to calculate the thing and then read out the result from the screen

Consider: if the transistors on the computer chip were moved around, would it still process the data in the same way and yield the correct answer?

Yes under some interpretation, but no from my perspective, because the right answer is about the relationship between what I consider computation and how I interpret the results I'm getting.


But the real question for me is - under a co... (read more)

6notfnofn
I recently came across unsupervised machine translation here. It's not directly applicable, but it opens the possibility that, given enough information about "something", you can pin down what it's encoding in your own language. So let's say now that we have a computer that simulates a human brain in a manner that we understand. Perhaps there really could be a sense in which it simulates a human brain that is independent of our interpretation of it. I'm having some trouble formulating this precisely.