All of Aprillion's Comments + Replies

thanks for concrete examples, can you help me understand how these translate from individual productivity to externally-observable productivity?

3 days to make a medium sized project

I agree Docker setup can be fiddly, but what happened with the 50+% savings - did you lower the price for the customer to stay competitive, do you do 2x as many paid projects now, did you postpone hiring another developer who is not needed now, or do you just have more free time? No change in support & maintenance costs compared to similar projects before LLMs?

processing is

... (read more)
3mruwnik
I can do more projects in parallel than I could have before. Which means that I have even more work now... The support and maintenance costs of the code itself are the same, as long as you maintain constant vigilance to make sure nothing bad gets merged. So the costs are moved from development to review. It's a lot easier to produce thousands of lines of slop which then have to be reviewed and loads of suggestions made. It's easy for bad taste to be amplified, which is a real cost that might not be noticed that much.

There are some evals which work on large codebases (e.g. "fix this bug in django"), but those are the minority, granted. They can help with the scaffolding, though - those tend to be large projects in which a Claude can help find things. But yeah, large files are ok if you just want to find something, but somewhere under 500 loc seems to be the limit of what will work well. Though you can get round it somewhat by copying the parts to be changed to a different file then copying them back, or other hacks like that...

toxic slime, which releases a cloud of poison gas if anything touches it

this reminds me of Oxygen Not Included (though I just learned the original reference is D&D), where Slime (which also releases toxic stuff) can be harvested to produce useful stuff in the Algae Distiller

the metaphor runs differently, one of the useful things from Slime is Polluted Water, which is also produced by the human replicants in the Lavatory ... and there is a Water Sieve that will process Polluted Water into Water (and some plants want to be watered with the Polluted variant)

makes me won... (read more)

Talking out loud is even better. There is something about forcing your thoughts into language...

Those are 2 very different things for some people ;)

I, for one, can think MUCH faster without speaking out loud, even if I subvocalize real words (for the purpose of revealing gaps) and don't go all the way to manipulating concepts-that-don't-have-words-yet-but-have-been-pointed-to-already or concepts-that-have-a-word-but-the-word-stands-for-5-concepts-and-we-already-narrowed-it-down-without-explicit-label ...

the set of problems the solutions to which are present in their training data

a.k.a. the set of problems already solved by open source libraries without the need to re-invent similar code?

that's not how productivity ought to be measured - it should measure some output per (say) a workday

1 vs 5 FTE is a difference in input, not output, so you can say "adding 5 people to this project will decrease productivity by 70% next month and we hope it will increase productivity by 2x in the long term" ... not a synonym of "5x productivity" at all
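To make the arithmetic concrete, a toy sketch in Python (numbers made up to match the quote above):

# productivity is output per unit of input, so adding people changes the denominator too
features_before, team_before = 10, 1   # e.g. 10 features/month by 1 FTE
features_after, team_after = 15, 5     # 15 features/month by 5 FTE
productivity_before = features_before / team_before   # 10.0 per FTE
productivity_after = features_after / team_after      # 3.0 per FTE
print(productivity_after / productivity_before)       # 0.3 -> the "70% decrease", nothing like "5x"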

it's the measure by which you can quantify diminishing results, not obfuscate them!

...but the usage of "5-10x productivity" seems to point to a different concept than a ratio of useful output per input 🤷 AFAICT it's a synonym with "I feel 5-10x better when I write code which I wouldn't enjoy writing otherwise"

A thing I see around me, my mind.
Many a peak, a vast mountain range,
standing at a foothill,
most of it unseen.

Two paths in front of me,
a lighthouse high above.

Which one will it be,
a shortcut through the forest,
or a scenic route?

Climbing up for better views,
retreating from overlooks,
away from the wolves.

To think with all my lighthouses.

all the scaffold tools, system prompt, and what not add context for the LLM ... but what if I want to know what the context is, too?

we can put higher utility on the shutdown

sounds instrumental to expand your moral circle to include other instances of yourself to keep creating copies of yourself that will shut down ... then expand your moral circle to include humans and shut them down too 🤔

exercise for readers: what patterns need to hold in the environment in order for "do what I mean" to make sense at all?

Notes to self (let me know if anyone wants to hear more, but hopefully no unexplored avenues can be found in my list of "obvious" if somewhat overlapping points):

  • sparsity - partial observations narrow down unobserved dimensions
  • ambiguity / edge of chaos - the environment is "interesting" to both agents (neither fully predictable nor fully random)
  • repetition / approximation / learnability - induction works
  • computational boundedness / embeddedness / diversity
  • power balance / care / empathy / trading opportunity / similarity / scale

Parity in computing is whether the count of 1s in a binary string is even or odd, e.g. '101' has two 1s => even parity (to output 0 for even parity, XOR all the bits together, like 1^0^1; to output 1 for even parity instead, XOR that result with 1).
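A tiny Python sketch of that XOR-all-bits recipe:

def parity(bits: str) -> int:
    result = 0
    for b in bits:
        result ^= int(b)       # XOR each bit into the running result
    return result              # 0 for even parity, 1 for odd parity

assert parity("101") == 0      # two 1s => even parity
assert parity("101") ^ 1 == 1  # XOR with 1 flips the output convention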

The parity problem (if I understand it correctly) sounds like trying to find out the minimum number of data samples per input length a learning algorithm ought to need to figure out that a mapping between a binary input and a single-bit output is equal to computing XOR parity and not something else (e.g. whether an integer is even/odd,... (read more)

Aprillion3-1

The failure mode of the current policy sounds to me like "pay for your own lesson to feel less motivated to do it again" while the failure mode of this proposal would be "one of the casinos might maybe help you cheat the system which will feel even more exciting" - almost as if the people who made the current policy knew what they were doing to set aligned incentives 🤔

Focus On Image Generators

 

How about audio? Is the speech-to-text domain "close to the metal" enough to deserve focus too, or did people hit roadblocks that made image generators more attractive? If the latter, where can I read about the lessons learned, please?

What if you tried to figure out a way to understand the "canonical cliffness" and design a new line of equipment that could be tailored to fit any "slope"... Which cliff would you test first? 🤔

IMO

in my opinion, the acronym for the International Math Olympiad deserves to be spelled out here

1Stag
Good point imo, expanded and added a hyperlink!

Evolution isn't just a biological process; it's a universal optimization algorithm that applies to any type of entity

Since you don't talk about the other 3 forces of biological evolution, or about the "time evolution" concept in physics...

And since the examples seem to focus on directional selection (and not on other types of selection), and also only on short-term effect illustrations, while in fact natural selection explains most aspects of biological evolution - it's the strongest long-term force, not the weakest one (anti-cancer mechanisms and why viruses d... (read more)

If anyone here might enjoy a dystopian fiction about a world where the formal proofs will work pretty well, I wrote Unnatural abstractions

Thank you for the engagement, but "to and fro" is a real expression, not a typo (and I'm keeping it) ... it's used slightly unorthodoxly here, but it sounded right to my ear, so it survived editing ¯\_(ツ)_/¯

I tried to use the technobabble in a way that's usefully wrong, so please also let me know if someone gets inspired by this short story.

I am not making predictions about the future, only commenting on the present - if you notice any factual error from that point of view, feel free to speak up, but as far as the doominess spectrum goes, it's supposed to be both too dystopian and too optimistic at the same time.

And if someone wants to fix a typo or a grammo, I'd welcome a pull request (but no commas shall be harmed in the process). 🙏

2Martin Vlach
*from?
Aprillion*50

Let me practice the volatile kindness here ... as a European, do I understand it correctly that this advice is targeted at a US audience? Or am I the only person to whom it sounds a bit fake?

1X4vier
You might be right that the concept only applies to specific subcultures (in my case, educated relatively well-off Australians). Maybe another test could be - can you think of someone you've met in the past who a critic might describe as "rude/loud/obnoxious" but despite this, they seem to draw in lots of friends and you have a lot of fun whenever you hang out with them?

How I personally understand what it could mean to "understand an action:"

Having observed action A1 and having a bunch of (finite state machine-ish) models, each with a list of states that could lead to action A1, more accurate candidate model => more understanding. (and meta-level uncertainty about which model is right => less understanding)

Model 1            Model 2
S11 -> 50% A1      S21 -> 99% A1
    -> 50% A2          ->  1% A2

S12 -> 10% A1      S22 ->  1% A1
    -> 90% A3          -> 99% A2
                   
                   S23 -> 100% A3
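A minimal toy sketch of the diagram above in Python (probabilities copied from it; the "best-fitting state" heuristic is my simplification):

# each candidate model maps its states to action probabilities; the model whose
# best-fitting state makes the observed action most likely "understands" it better
model_1 = {"S11": {"A1": 0.50, "A2": 0.50},
           "S12": {"A1": 0.10, "A3": 0.90}}
model_2 = {"S21": {"A1": 0.99, "A2": 0.01},
           "S22": {"A1": 0.01, "A2": 0.99},
           "S23": {"A3": 1.00}}

def best_explanation(model, action):
    # probability of the observed action under the model's best-fitting state
    return max(probs.get(action, 0.0) for probs in model.values())

print(best_explanation(model_1, "A1"))  # 0.5
print(best_explanation(model_2, "A1"))  # 0.99 -> more understanding of A1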

Thanks for the clarification. I don't share the intuition that this will prove harder than other hard software engineering challenges in non-AI areas - ones that weren't solved in months but were solved in years rather than decades - but other than "broad baseline is more significant than narrow evidence for me" I don't have anything more concrete to share.

A note until fixed: Chollet also discusses 'unhobbling' -> Aschenbrenner also discusses 'unhobbling'

5eggsyntax
I think the shift of my intuition over the past year has looked something like:

a) (a year ago) LLMs seem really smart and general (especially given all the stuff they unexpectedly learned like translation), but they lack goals and long-term memory, I bet if we give them that they'll be really impressive.
b) Oh, huh, if we add goals and long-term memory they don't actually do that well.
c) Oh, huh, they fail at stuff that seems pretty basic relative to how smart and general they seem.
d) OK, probably we should question our initial impression of how smart and general they are.

I realize that's not really a coherent argument; just trying to give a sense of the overall shape of why I've personally gotten more skeptical.

I agree with "Why does this matter" and with the "if ... then ..." structure of the argument.

But I don't see where you get such a high probability (>5%) of scaffolding not working... I mean, whatever ends up working can be retroactively called "scaffolding", even if it will be in the "one more major breakthrough" category - and I expect those were already accounted for in the unhobbling predictions.

a year ago many expected scaffolds like AutoGPT and BabyAGI to result in effective LLM-based agents

Do we know the base rate of how many years... (read more)

2eggsyntax
I've gone back and added my thoughts on unhobbling in a footnote: "Aschenbrenner also discusses 'unhobbling', which he describes as 'fixing obvious ways in which models are hobbled by default, unlocking latent capabilities and giving them tools, leading to step-changes in usefulness'. He breaks that down into categories here. Scaffolding and tooling I discuss here; RLHF seems unlikely to help with fundamental reasoning issues. Increased context length serves roughly as a kind of scaffolding for purposes of this discussion. 'Posttraining improvements' is too vague to really evaluate. But note that his core claim (the graph here) 'shows only the scaleup in base models; "unhobblings" are not pictured'."

Frankly I'd be hesitant to put > 95% on almost any claims on this topic. My strongest reason for suspecting that scaffolding might not work to get LLMs to AGI is updating on the fact that it doesn't seem to have become a useful approach yet despite many people's efforts (and despite the lack of obvious blockers). I certainly expect scaffolding to improve over where it is now, but I haven't seen much reason to believe that it'll enable planning and general reasoning capabilities that are enormously greater than LLMs' base capabilities.

What I mean by scaffolding here is specifically wrapping the model in a broader system consisting of some combination of goal-direction, memory, and additional tools that the system can use (not ones that the model calls; I'd put those in the 'tooling' category), with a central outer loop that makes calls to the model. Breakthroughs resulting in better models wouldn't count on my definition.

Aschenbrenner argues that we should expect current systems to reach human-level given further scaling

In https://situational-awareness.ai/from-gpt-4-to-agi/#Unhobbling, "scaffolding" is explicitly named as a thing being worked on, so I take it that progress in scaffolding is already included in the estimate. Nothing about that estimate is "just scaling".

And AFAICT neither Chollet nor Knoop made any claims in the sense that "scaffolding outside of LLMs won't be done in the next 2 years" => what am I missing that is the source of hope for longer timelines, please?

5eggsyntax
Thanks, yes, I should have mentioned 'unhobbling' in that sentence, have added. I debated including a flowchart on that (given below); in the end I didn't, but maybe I should have. But tl;dr, from the 'Why does this matter' section:  

It’s a failure of ease of verification: because I don’t know what to pay attention to, I can’t easily notice the ways in which the product is bad.

Is there an opposite of the "failure of ease of verification" that would add up to 100% if you categorized the whole of reality into 1 of these 2 categories? Say, in a simulation, if you attributed every piece of computation to one of the following 2 categories, how much of the world could be "explained by" each category?

  • make sure stuff "works at all and is easy to verify whether it works at all"
  • stuff that works must be
... (read more)
0[comment deleted]

This leaves humming in search of a use case.

we can still hum to music, hum in (dis)agreement, hum in puzzlement, and hum the "that's interesting" sound ... without a single regard to NO or viruses, just for fun!

I agree with the premises (except "this is somewhat obvious to most" 🤷).

On the other hand, stopping AI safety research sounds like a proposal to go from option 1 to option 2:

  1. many people develop capabilities, some of them care about safety
  2. many people develop capabilities, none of them care about safety
Aprillion3-3

half of the human genome consists of dead transposons

The "dead" part is a value judgement, right? Parts of DNA are not objectively more or less alive.

It can be a claim that some parts of DNA are "not good for you, the mind" ... well, I rather enjoy my color vision and RNA regulation, and I'm sure bacteria enjoy their antibiotic resistance.

Or maybe it's a claim that we already know everything there is to know about the phenomena called "dead transposons", there is nothing more to find out by studying the topic, so we shouldn't finance that area of research.... (read more)

[This comment is no longer endorsed by its author]
4johnswentworth
No, "dead transposons" meaning that they've mutated in some way which makes them no longer functional transposons, i.e. they can no longer copy themselves back into the genome (often due to e.g. another transposon copying into the middle of the first transposon sequence).
Aprillion10

Know some fancier formulas like left/mid/right, concatenate, hyperlink

Wait, I thought basic fancier formulas are like =index(.., match(.., .., 0)) 
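For anyone who doesn't live in spreadsheets, a hypothetical Python rendering of what that pattern does (the data and names are made up):

# =INDEX(return_range, MATCH(key, lookup_range, 0)): find the position of `key`
# in one column, return the value at the same position in another column
def index_match(return_range, lookup_range, key):
    position = lookup_range.index(key)   # MATCH(key, lookup_range, 0) - exact match
    return return_range[position]        # INDEX(return_range, position)

assert index_match(["Alice", "Bob", "Carol"], [101, 102, 103], 102) == "Bob"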

I guess https://dev.to/aprillion/self-join-in-sheets-sql-python-and-javascript-2km4 might be a nice toy example if someone wants to practice the lessons from the companion piece 😹

Aprillion10

It's duct tapes all the way down!

Aprillion10

Bad: "Screw #8463 needs to be reinforced."

The best: "Book a service appointment, ask them to replace screw #8463, do a general check-up, and report all findings to the central database for all those statistical analyses that inform recalls and design improvements."

Aprillion20

Oh, I should probably mention that my weakness is that I cannot remember the stuff well while reading out loud (especially when I focus on pronunciation for the benefit of listeners)... My workaround is to make pauses - it seems the stuff is in working memory and my subconscious can process it if I give it a short moment, and then I can think about it consciously too, but if I read a whole page out loud, I have trouble even trying to summarize the content.

Similarly, a common trick for remembering names is to repeat the name out loud ... that doesn... (read more)

Aprillion20

Yeah, I myself subvocalize absolutely everything and I am still horrified when I sometimes try any "fast" reading techniques - those drain all of the enjoyment out of reading for me, as if instead of characters in a story I would imagine them as p-zombies.

For non-fiction, visual-only reading cuts connections to my previous knowledge (as if the text was a wave function entangled to the rest of the universe and by observing every sentence in isolation, I would collapse it to just "one sentence" without further meaning).

I never move my lips or tongue though, ... (read more)

3Lorxus
I speed-read fiction, too. When I do, though, I'll stop for a bit whenever something or someone new is being described, to give myself a moment to picture it in a way that my mind can bring up again as set dressing.
2Shoshannah Tekofsky
That sounds great! I have to admit that I still get a far richer experience from reading out loud than subvocalizing, and my subvocalizing can't go faster than my speech. So it sounds like you have an upgraded form with more speed and richness, which is great!
Aprillion10

ah, but booby traps in coding puzzles can be deliberate... one might even say that it can feel "rewarding" when we train ourselves on these "adversarial" examples

the phenomenon of programmers introducing similar bugs in similar situations might be fascinating, but I wouldn't expect a clear answer to the question "Is this true?" without slightly more precise definitions of:

  • "same" bug
  • same "bug"
  • "hastily" cobbled-together programs
  • hastily "cobbled-together" programs ...
Aprillion10

To me as a programmer and not a mathematician, the distinction doesn't make practical intuitive sense.

If we can create 3 functions f, g, h so that they "do the same thing" like f(a, b, c) == g(a)(b)(c) == average(h(a), h(b), h(c)), it seems to me that cross-entropy can "do the same thing" as some particular objective function that would explicitly mention multiple future tokens.
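A throwaway Python sketch of that equality (the function bodies are my arbitrary choice - averaging three numbers - just to make it concrete):

def average(*xs):
    return sum(xs) / len(xs)

def f(a, b, c):
    return (a + b + c) / 3

def g(a):                       # curried version of f
    return lambda b: lambda c: (a + b + c) / 3

def h(x):                       # per-argument transform; identity here
    return x

assert f(1, 2, 3) == g(1)(2)(3) == average(h(1), h(2), h(3)) == 2.0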

My intuition is that cross-entropy-powered "local accuracy" can approximate "global accuracy" well enough in practice that I should expect better global reasoning from larger model ... (read more)

Aprillion94

transformer is only trained explicitly on next token prediction!

I find myself understanding language/multimodal transformer capabilities better when I think about the whole document (up to context length) as a mini-batch for calculating the gradient in transformer (pre-)training, so I imagine it is minimizing the document-global prediction error - it wasn't trained to optimize for just single next-token accuracy...
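A minimal sketch of that framing (assuming a hypothetical PyTorch-style model callable - not anyone's actual training code): one document contributes one cross-entropy term per position, and a single gradient step averages over all of them at once.

import torch
import torch.nn.functional as F

def document_loss(model, tokens):              # tokens: (seq_len,) int tensor
    inputs, targets = tokens[:-1], tokens[1:]  # predict token t+1 from tokens <= t
    logits = model(inputs.unsqueeze(0))        # (1, seq_len-1, vocab_size)
    # one next-token loss term per position in the document, averaged together,
    # so a single backward pass "sees" the whole context window
    return F.cross_entropy(logits.squeeze(0), targets)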

There is evidence that transformers are not in fact even implicitly, internally, optimized for reducing global prediction error (except insofar as comp-mech says they must in order to do well on the task they are optimized for).

Do transformers "think ahead" during inference at a given position? It is known transformers prepare information in the hidden states of the forward pass at t that is then used in future forward passes t+τ. We posit two explanations for this phenomenon: pre-caching, in which off-diagonal gradient terms present in training result i

... (read more)
2Adam Shai
That's an interesting framing. From my perspective that is still just local next-token accuracy (cross-entropy more precisely), but averaged over all subsets of the data up to the context length. That is distinct from e.g. an objective function that explicitly mentioned not just next-token prediction, but multiple future tokens in what was needed to minimize loss. Does that distinction make sense?

One conceptual point I'd like to get across is that even though the equation for the predictive cross-entropy loss only has the next token at a given context window position in it, the states internal to the transformer have the information for predictions into the infinite future. This is a slightly different issue than how one averages over training data, I think.
Aprillion20

Can you help me understand a minor labeling convention that puzzles me? I can see how we can label one of the states from the Z1R process as eta_11 in the MSP because we observe 11 to get there, but why is the state reached after observing either 100 or 00 labeled the way it is, please?

2Adam Shai
Good catch! That should be eta_00, thanks! I'll change it tomorrow.

Pushing writing ideas to external memory for my less burned out future self:

  • agent foundations need path-dependent notion of rationality

    • economic world of average expected values / amortized big O if f(x) can be negative or you start very high
    • vs min-maxing / worst case / risk-averse scenarios if there is a bottom (death)
    • pareto recipes
  • alignment is a capability

    • they might sound different in the limit, but the difference disappears in practice (even close to the limit? 🤔)
  • in a universe with infinite Everett branches, I was born in the subset tha

... (read more)

Now, suppose Carol knows the plan and is watching all this unfold. She wants to make predictions about Bob’s picture, and doesn’t want to remember irrelevant details about Alice’s picture. Then it seems intuitively “natural” for Carol to just remember where all the green lines are (i.e. the message M), since that’s “all and only” the information relevant to Bob’s picture.


(Writing before I read the rest of the article): I believe Carol would "naturally" expect that Alice and Bob share more mutual information than she does with Bob herself (even if they... (read more)

yeah, I got a similar impression that this line of reasoning doesn't add up...

we interpret other humans as feeling something when we see their reactions

we interpret other eucaryotes as feeling something when we see their reactions 🤷

(there are a couple of circuit diagrams of the whole brain on the web, but this is the best.  From this site.)

could you update the 404 image, please? (link to the site still works for now, just the image is gone)

1Joseph Bloom
The first frame, apologies. This is a detail of how we number trajectories that I've tried to avoid dealing with in this post. We left-pad in a context window of 10 timesteps, so the first observation frame is S5. I've updated the text not to refer to S5.
2Jay Bailey
The agent's context includes the reward-to-go, state (i.e., an observation of the agent's view of the world) and action taken for nine timesteps. So, R1, S1, A1, .... R9, S9, A9. (Figure 2 explains this a bit more) If the agent hasn't made nine steps yet, some of the S's are blank. So S5 is the state at the fifth timestep.

Why is this important? If the agent has made four steps so far, S5 is the initial state, which lets it see the instruction. Four is the number of steps it takes to reach the corridor where the agent has to make the decision to go left or right. This is the key decision for the agent to make, and the agent only sees the instruction at S5, so S5 is important for this reason. Figure 1 visually shows this process - the static images in this figure show possible S5's, whereas S9 is animation_frame=4 in the GIF - it's fast, so it's hard to see, but it's the step before the agent turns.

I agree with what you say. My only peeve is that the concept of IGF is presented as a fact from the science of biology, while it's used as a confused mess of 2 very different concepts.

Both talk about evolution, but inclusive fitness is a model of how we used to think about evolution before we knew about genes. If we model biological evolution on the genetic level, we don't have any need for additional parameters on the individual organism level - natural selection and the other 3 forces in evolution explain the observed phenomena without a need to talk about... (read more)

Aprillion*110

humans don't actually try to maximize their own IGF


Aah, but humans don't have IGF. Humans have https://en.wikipedia.org/wiki/Inclusive_fitness, while genes have allele frequency https://en.wikipedia.org/wiki/Gene-centered_view_of_evolution ..

Inclusive genetic fitness is a non-standard name for the latter view of biology as communicated by Yudkowsky - as a property of genes, not a property of humans.

The fact that bio-robots created by human genes don't internally want to maximize the genes' IGF should be a non-controversial point of view. The human genes su... (read more)

2jacob_cannell
When I use IGF in the dialogue I'm doing so mostly because Nate's sharp left turn post which I quoted used 'IGF', but I understood it to mean inclusive fitness - i.e. something like "fitness of an individual's shared genotype".

If this is his "obvious example", then it's just as obviously wrong. There is immense optimization pressure to align the organism's behavior with IGF, and indeed the theory of IGF was developed in part to explain various observed complex altruistic-ish behaviors.

As I argue in the dialogue, humanity is an excellent example of inner alignment success. There is a singular most correct mathematical measure of "alignment success" (fitness score of geneset - which is the species homo sapiens in this case), and homo sapiens undeniably are enormously successful according to that metric.

Some successful 19th century experiments used 0.2°C/minute and 0.002°C/second.

Have you found the actual 19th century paper?

The oldest quote about it that I found is from https://www.abc.net.au/science/articles/2010/12/07/3085614.htm

Or perhaps the story began with E.M. Scripture in 1897, who wrote the book, The New Psychology. He cited earlier German research: "…a live frog can actually be boiled without a movement if the water is heated slowly enough; in one experiment the temperature was raised at the rate of 0.002°C per second, and the frog was found
... (read more)
6philh
So the linked article is exactly the type of thing I'm complaining about.

  • If you dump a frog in literally boiling water, will it jump out? Sure, no. But like I say, I don't consider that the interesting part of the claim.
  • If you dump a frog in water that's hot enough to kill it slowly, will it jump out? Everyone seems to agree yes.
  • If you dump a frog in cold water, then slowly increase the temperature to where it's hot enough to kill the frog, will it jump out?
  • According to wikipedia: 19th century researchers say no, if you do it about 0.1°C/minute; yes, if you do it about 3.8°C/minute.
  • According to both wikipedia and the linked article: 20th century researchers say yes, if you heat it about 2°F/minute.
  • These are obviously not in contradiction! The obvious simple conclusion is "not if the speed is below some critical threshold somewhere between about 0.1°C/minute and 1°C/minute".

It's probably not actually that simple - there are lots of different frog species, and even more individual frogs, and maybe it makes a difference how pure the water is or how still it is or the air temperature or when the frog last ate or or or... but the evidence presented should obviously not be enough to make us think the effect is fake.

Like, we might not be convinced that the effect is real - maybe we think the 19th century researchers made shit up or something. But we definitely shouldn't be dismissing it based on the modern experiments that we've been told about. (Even if we don't know about the 19th century experiments, just knowing the modern results shouldn't make us dismiss the idea. Perhaps Victor Hutchison has some reason to think that an effect not seen at 2°F/minute won't be seen at all. If he does, the article doesn't tell us about it. If not, it's a leap to go from "we haven't seen this yet" to "this doesn't exist".)

Now admittedly the article does acknowledge and try to refute the 19th century researchers. But most of this refutation is obv

I'm not sure what to call this sort of thing. Is there a preexisting name?

sounds like https://en.wikipedia.org/wiki/Emergence to me 🤔 (not 100% overlap and also not the most useful concept, but a very similar shaky pointer in concept space between what is described here and what has been observed as a phenomenon called Emergence)

Thanks to Gaurav Sett for reminding me of the boiling frog.

I would like to see some mention that this is a pop culture reference / urban myth, not  something actual frogs might do.

To quote https://en.wikipedia.org/wiki/Boiling_frog, "the premise is false".

[This comment is no longer endorsed by its author]
2philh
(Discussed in this comment.)

PSA: This is the old page pointing to the 2022 meetup month events, chances are you got here in year 2023 (at the time of writing this comment) while there was a bug on the homepage of lesswrong.com with a map and popup link pointing here...

https://www.lesswrong.com/posts/ynpC7oXhXxGPNuCgH/acx-meetups-everywhere-2023-times-and-places seems to be the right one 🤞

sampled uniformly and independently

 

🤔 I don't believe this definition fits the "apple" example - uniform samples from a concept space of "apple or not apple" would NEVER™ contain any positive example (almost everything is "not apple")... or what assumption am I missing that would make the relative target volume more than ~zero (for high n)?
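A toy version of that worry in Python (the 1e-12 "relative volume of apple" is a made-up number):

# with i.i.d. uniform sampling, the chance of ever seeing a positive example
# is roughly n_samples * positive_fraction, i.e. ~zero
positive_fraction = 1e-12
n_samples = 10_000
p_any_positive = 1 - (1 - positive_fraction) ** n_samples
print(p_any_positive)   # ~1e-8, so the sample contains "apple" essentially never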

Bob will observe a highly optimized set of Y, carefully selected by Alice, so the corresponding inputs will be Vastly correlated and interdependent at least for the positive examples (centroid first, dynamically selected for error-correction later 🤷‍♀️), not at all selected by Nature, right?

A hundred-dollar note is only worth anything if everyone believes in its worth. If people lose that faith, the value of a currency goes down and inflation goes up.

Ah, the condition for the reality of money is much weaker though - you only have to believe that you will be able to find "someone" who believes they can find someone for whom money will be worth something, no need to involve "everyone" in one's reasoning.

Inflation is much more complicated of course, but in essence, you only have to believe that other people believe that money is losing value and... (read more)

1Karl von Wendt
Yes, thanks for the clarification! I was indeed oversimplifying a bit.

yes, it takes millions to advance, but companies are pouring BILLIONS into this and number 3 can earn its own money and create its own companies/DAOs/some new networks of cooperation if it wanted without humans realizing ... have you seen any GDP per year charts whatsoever, why would you think we are anywhere close to saturation of money? have you seen any emergent capabilities from LLMs in the last year, why do you think we are anywhere close to saturation of capabilities per million of dollars? Alpaca-like improvements are somehow a one-off miracle and thin... (read more)

2[anonymous]
Because we are saturated right now and I gave evidence and you can read the gpt-4 paper for more evidence. See: "getting more money saturates, there is a finite number of training accelerators manufactured per quarter and it takes time to ramp to higher volume" "Billions" cannot buy more accelerators than exist, and the robot/compute/capabilities limits also limit the ROI that can be provided, which makes the billions not infinite as eventually investors get impatient.

What this means is that it may take 20 years or more of steady exponential growth (but only 10-50 percent annually) to reach ASI and self replicating factories and so on. On a cosmic timescale or even a human lifespan this is extremely fast. I am noting this is more likely than "overnight" scenarios where someone tweaks a config file, an AI reaches high superintelligence and fills the earth with grey goo in days.

There was not enough data in existence for the AI to reach high superintelligence, a "high" superintelligence would require thousands or millions of times as much training compute as GPT-4 (because it's a power law), even once it's trained it doesn't have sufficient robotics to bootstrap to nanoforges without years or decades of steady ramping to be ready to do that.

(a high superintelligence is a machine that is not just a reasonable amount better than humans at all tasks but is essentially a deity outputting perfect moves on every task that take into account all of the machines plans and cross task and cross session knowledge. So it might communicate with a lobbyist and 1e6 people at once and use information from all conversations in all conversations, essentially manipulating the world like a game of pool. Something genuinely uncontainable.)