All of ErickBall's Comments + Replies

I would think things are headed toward these companies fine-tuning an open-source near-frontier LLM. Cheaper than building one from scratch, but with most of the advantages.

Yeah, something along the lines of an Elo-style rating would probably work better for this. You could put lots of hard questions on the test and then, instead of just ranking people, compare which questions they missed, etc.

This works for corn plants because the underlying measurement "amount of protein" is something that we can quantify (in grams or whatever) in addition to comparing two different corn plants to see which one has more protein. IQ tests don't do this in any meaningful sense; think of an IQ test more like the Mohs hardness scale, where you can figure out a new material's position on the scale by comparing it to a few with similar hardness and seeing which are harder and which are softer. If it's harder than all of the previously tested materials, it just goes at the top of the scale.

gwern*216

IQ tests include sub-tests which can be cardinal, with absolute variables. For example, simple & complex reaction time; forwards & backwards digit span; and vocabulary size. (You could also consider tests of factual knowledge.) It would be entirely possible to ask, 'given that reaction time follows a log-normalish distribution in milliseconds and loads on g by r = 0.X and assuming invariance, what would be the predicted lower reaction time of someone Y SDs higher than the mean on g?' Or 'given that backwards digit span is normally distributed...' T... (read more)
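A minimal sketch of the extrapolation described above, with made-up parameters standing in for real test norms (the mean reaction time, its log-scale spread, and the g-loading here are illustrative assumptions, not published values):

```python
# Predict a cardinal sub-test score (simple reaction time, in ms) for someone
# Y SDs above the mean on g, assuming the test is log-normal and loads on g
# with correlation r. All parameter values are illustrative, not real norms.
import math

def predicted_reaction_time(mean_ms=250.0, sd_log=0.15, r=0.4, y_sds=3.0):
    # Work in log space (log-normal assumption); higher g predicts faster
    # (lower) reaction time, hence the minus sign.
    mu_log = math.log(mean_ms)
    return math.exp(mu_log - r * y_sds * sd_log)

print(round(predicted_reaction_time(), 1))  # ~209 ms under these made-up parameters
```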

8GeneSmith
You can definitely extrapolate out of distribution on tests where the baseline is human performance. We do this with chess Elo ratings all the time.

I wasn't saying it's impossible to engineer a smarter human. I was saying that if you do it successfully, then IQ will not be a useful way to measure their intelligence. IQ denotes where someone's intelligence falls relative to other humans, and if you make something smarter than any human, their IQ will be infinity and you need a new scale.

2tailcalled
IQ tests are built on item response theory, where people's IQ is measured in terms of how difficult tasks they can solve. The difficulty of tasks is determined by how many people can solve them, so there is an ordinal element to that, but by splitting the tasks off you could in principle measure IQ levels quite high, I think.
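For concreteness, a hedged sketch of the item-response-theory idea: each item gets a calibrated difficulty, a person's ability is whatever value best explains which items they solve, and adding harder calibrated items lets the scale extend upward. Parameters are illustrative:

```python
# Two-parameter logistic (2PL) item response model: probability that a person
# with ability theta (in SD-like units) solves an item with difficulty b and
# discrimination a. Numbers below are illustrative, not from a real test.
import math

def p_correct(theta, a=1.0, b=0.0):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Even someone 4 SD above the mean usually misses a b=6 item, so measuring
# abilities that high requires calibrating still harder items.
print(round(p_correct(theta=4.0, b=6.0), 3))  # ~0.119
```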
5GeneSmith
I don't think this is the case. You can make a corn plant with more protein than any other corn plant, and using standard deviations to describe it will still be useful. Granted, you may need a new IQ test to capture just how much smarter these new people are, but that's different than saying they're all the same.
ErickBall6-2

it’s not even clear what it would mean to be a 300-IQ human

IQ is an ordinal score, not a cardinal one--it's defined to have a mean of 100 and a standard deviation of 15. So all it means is that this person would be smarter than all but about 1 in 10^40 natural-born humans. It seems likely that the range of intelligence for natural-born humans is limited by basic physiological factors like the space in our heads, the energy available to our brains, and the speed of our neurotransmitters. So a human with IQ 300 is probably about the same as IQ 250 or IQ 1000 or IQ 10,000, i.e. at the upper limit of that range.
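(A quick sanity check of the "1 in 10^40" figure, assuming scipy is available: IQ 300 sits about 13.3 standard deviations above the mean, and the corresponding normal tail is on the order of 10^-41.)

```python
# Upper-tail probability of a normal distribution at IQ 300 (mean 100, SD 15).
from scipy.stats import norm

z = (300 - 100) / 15      # ~13.33 standard deviations
print(norm.sf(z))         # ~8e-41, i.e. rarer than 1 in 10^40
```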

2tailcalled
IQ is an ordinal score in that its relationship to outcomes of interest is nonlinear, but for the most important outcomes of interest, e.g. ability to solve difficult problems or income or similar, the relationship between IQ and success at the outcome is exponential, so you'd be seeing accelerating returns for a while. Presumably fundamental physics limits how far these exponential returns can go, but we seem quite far from those limits (e.g. we haven't even solved aging yet).
6localdeity
The original definition of IQ, intelligence quotient, is mental age (as determined by cognitive test scores) divided by chronological age (and then multiplied by 100).  A 6-year-old with the test scores of the average 9-year-old thus has an IQ of 150 by the ratio IQ definition. People then found that IQ scores roughly followed a normal distribution, and subsequent tests defined IQ scores in terms of standard deviations from the mean.  This makes it more convenient to evaluate adults, since test scores stop going up past a certain age in adulthood (I've seen some tests go up to age 21).  However, when you get too many standard deviations away from the mean, such that there's no way the test was normed on that many people, it makes sense to return to the ratio IQ definition. So an IQ 300 human would theoretically, at age 6, have the cognitive test scores of the average 18-year-old.  How would we predict what would happen in later years?  I guess we could compare them to IQ 200 humans (of which we have a few), so that the IQ 300 12-year-old would be like the IQ 200 18-year-old.  But when they reached 18, we wouldn't have anything to compare them against. I think that's the most you can extract from the underlying model.
4GeneSmith
I would be quite surprised if this were true. We should expect scaling laws for brain volume alone to continue well beyond the current human range, and brain volume only explains about 10% of the variance in intelligence.

I've heard doctors ask questions like this but I don't think they usually get very helpful answers. "My diet's okay I guess, pretty typical, a lot of times I don't sleep great, and yeah I have a pretty stressful job." Great, what do you do with that?

"Food" in general is about the easiest and most natural thing for a dog to identify. Distinguishing illegal drugs from all the other random stuff a person might be carrying (soap, perfume, medicine, etc.) at least requires a lot better training than finding food.

7Ben
Very possible. I am not fully convinced. The dog had to identify the people who had food in their bags, and tell them apart from all the people who used to have food in those same bags, or were eating on the flight and have food on their breath or hands. A dog trying to identify (for example) cannabis would probably have an easier time. My stance is not "I know 100% that sniffer dogs are a silver bullet", but the weaker position "The majority of the value of a sniffer dog comes from it actually smelling things, rather than giving the officer controlling it a plausible way of profiling based on other (possibly protected) characteristics."

It's interesting that 3.5 Sonnet does not seem to match, let alone beat, GPT-4o on the leaderboard (https://chat.lmsys.org/?leaderboard). Currently it shows GPT-4o with an Elo of 1287 and Claude 3.5 Sonnet at 1271.
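For scale, a hedged reminder of what a 16-point gap means under the standard Elo model (the leaderboard's exact fitting methodology may differ):

```python
# Expected score of the higher-rated model under the classic Elo formula.
def elo_expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10 ** (-(r_a - r_b) / 400))

print(round(elo_expected_score(1287, 1271), 3))  # ~0.523 -- barely better than a coin flip
```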

8gwern
Yeah, there's a decent amount of debate going on about how good 3.5 Sonnet is vs 4o, or if 4o was badly underperforming its benchmarks + LMsys to begin with. Has 4o been crippled by something post-deployment?* Is this something about long-form interaction with Claude, which is missed by benchmarks and short low-effort LMsys prompts? Are Claude users especially tilting into coding now given the artifact/project features, which seems to be the main strength of Claude-3.5-Sonnet? Every year, it seems like benchmarking powerful generalist AI systems gets substantially harder, and this may be the latest iteration of that difficulty. (Given the level of truesight and increasing level of persistency of account history, we may be approaching the point where different models give different people intrinsically different experiences - e.g. something like, Claude genuinely works better for you than for me, while I genuinely find ChatGPT-4o more useful, because you happen to be politer and ask more sensible questions like Claude is a co-worker and that works better with the Claude RLAIF, while the RLHF crushes GPT-4o into submission so while it's a worse model it's more robust to my roughshod treatment of GPT-4o as a slave. Think of it as like Heisenbugs on steroids, or operant conditioning into tacit knowledge: some people just have more mana and mechanical sympathy, and they can't explain how or why.)

* I've noticed what seems like some regressions in GPT-4o since the launch, in my Gwern.net scripts, where it seems to have gotten oddly worse at some simple tasks like guessing URLs or picking keywords to bold in abstracts, and is still failing to clean some URL titles despite ~40 few-shot examples collected from previous errors.

I would love to have a checkbox or something next to each post to indicate "I saw this and I don't want to click on it"

3Ruby
Yeah, I think we should do something like this. Maybe the box is "don't show me posts like this"
1ErickBall
Although it would also be nice to distinguish that from "I read this post already somewhere else"
ErickBall73

As a counterpoint, take a look at this article: https://peterattiamd.com/protein-anabolic-responses/

The upshot is that the studies saying your body can only use 45g of protein per meal for muscle synthesis are mostly based on fast-acting whey protein shakes. Stretching out the duration of protein metabolism (by switching protein sources and/or combining it with other foods in a gradually-digested meal) can mitigate the problem quite a bit.

ErickBall60

Saturated fats are definitely manageable in small amounts. For most of history, and still in many places today, the biggest concern for an infant was getting sufficient calories, and saturated fat is a great choice for that. When you look at modern hunter-gatherer diets, they contain animal products, but in most cases they do not make up the majority of calories (exceptions usually involve lots of seafood), the meats are wild and therefore fairly lean, and BMI stays generally quite low. Under those conditions, heart disease risk is small and whether it is ... (read more)

ErickBall40

Real can of worms that deserves its own post, I would think.

ErickBall32

I think in this case just spacing them out would help more.

ErickBall327

Downvoted because I waded through all those rhetorical shenanigans and I still don't understand why you didn't just say what you mean.

5RHollerith
As a deep-learning novice, I found the post charming and informative.
2[comment deleted]
7abramdemski
To me, the lengthy phrases do in fact get closer to "zack saying what zack meant" than the common terms like 'deep learning' -- but, like you, I didn't really get anything new out of the longer phrases. I believe that people who don't already think of deep learning as function approximation may get something out of it tho. So in consequence I didn't downvote or upvote.
Zack_M_Davis1511

This comment had been apparently deleted by the commenter (the comment display box having a "deleted because it was a little rude, sorry" deletion note in lieu of the comment itself), but the ⋮-menu in the upper-right gave me the option to undelete it, which I did because I don't think my critics are obligated to be polite to me. (I'm surprised that post authors have that power!) I'm sorry you didn't like the post.

ErickBall10

Separate clocks would be a pain to manage in a board game, but in principle "the game ends once 50% of players have run out of time" seems like a decent condition.

2mako yass
In practice what I was going to do was just say that each turn is limited to like 40 seconds or whatever.

Oh, good point, I had forgotten about the zero-sum victory points. The extent to which the other parts are zero sum depends a lot on how large the game board is relative to the number of players, so it could be adjusted. I was thinking about having a time limit instead of a round limit, to encourage the play to move quickly, but maybe that's too stressful. If you want the players to choose to end the game, then you'd want to build in a mechanic that works against all of them more and more as the game progresses, so that at some point continuing becomes counterproductive...

3mako yass
I like time limits because time constraints are what make negotiation difficult (imperfect compromise), though just having a single shared time limit lets players filibuster. If players have separate time limits it's basically still a round limit, but good point to remember to impose a time limit.

Would a good solution be to just play Settlers, but instead of saying "the goal is to get more points than anyone else," say "this is a variant where the goal is to get the highest score you can, individually"? That seems like it would change the negotiation dynamics in a potentially interesting way without having to make or teach a brand new game. Does this miss the point somehow?

9mako yass
Solution to what? That would be cohabitive, I'd like to play that at least once, but I wouldn't expect it to work that well. 4 of 10 victory points in Catan come from criteria that're inherently zero sum (having a longer road or bigger army than anyone else) (I wouldn't know how to adapt those). I'm not sure to what extent land scarcity makes the other conditions fairly zero sum as well. I haven't played a lot of Catan. You'd have to replace the end condition with a round limit. P1 (and the other one I'm going to publish soon, Final Autumn) also just ends after a certain number of rounds, and the only way to pace it well is to make it end 'too early', so that every game will be a study of haste. I don't love it. I wonder if we should try for a mechanic where players have to, to some extent, deliberately build the true peace by taking some actions in the world that freeze current conditions in place/end the game. I think that could be pretty interesting.

So, then it seems like the client's best move in this scenario is to lie to you strategically, or at least omit information strategically. They could say "I know for sure you won't find any fingerprints or identifiable face in the camera footage" and "I think my friends will confirm that I was playing video games with them", and as long as they don't actually tell you that's a lie, you can put those friends on the stand, right?

5ymeskhout
Correct, there are indeed potential advantages to lying to your attorney under very specific and narrow circumstances. You also have to consider the risky gamble this presents because you can't predict every aspect of the machinery. Maybe the jury never would've paid attention to the alibi aspect of the case, but if the alibi witnesses get exposed as liars by the prosecution, that alone could swing jurors away from acquittal and towards conviction.

You say that lying to you can only hurt them but "There is a kernel of an exception that is almost not worth mentioning" because it is rarely relevant. I find this pretty hard to believe. If your client tells you "yeah I totally robbed that store, but I was wearing a ski mask and gloves so I think a jury will have reasonable doubt assuming my friends say I was playing video games with them the whole time", would you be on board with that plan? There must be plenty of cases where the cops basically know who did it but have trouble proving it. Maybe those just don't get to the point of a public defender getting assigned?

4ymeskhout
If a client tells me they know for sure that their alibi witness will be lying in their favor, then I'm not allowed to elicit the false testimony from that witness. If they admit to me to robbing the store but (truthfully and without omissions) say they were wearing a mask and functional gloves, then that lets me know what facets to focus on and what to avoid. If they're sure enough they left no fingerprints, then I can comfortably ask the investigating detectives if any fingerprints were found. If the circumstances allow it, then I may even get my own expert to dust the entire scene for fingerprints with the aim of presenting their absence as exculpatory evidence to the jury. Keep in mind that my job is not to help the government prosecute my client. And yes, there are plenty of cases where the perpetrator might be obvious from a common-sense perspective, but it would be legally difficult to prove in court.

That's like saying that because we live in a capitalist society, the default plan is to destroy every bit of the environment and fill every inch of the world with high rise housing projects. It's... true in some sense, but only as a hypothetical extreme, a sort of economic spherical cow. In reality, people and societies are more complicated and less single minded than that, and also people just mostly don't want that kind of wholesale destruction.

I didn't think the implication was necessarily that they planned to disassemble every solar system and turn it into probe factories. It's more like... seeing a vast empty desert and deciding to build cities in it. A huge universe, barren of life except for one tiny solar system, seems not depressing exactly but wasteful. I love nature and I would never want all the Earth's wilderness to be paved over. But at the same time I think a lot of the best the world has to offer is people, and if we kept 99.9% of it as a nature preserve then almost nobody would be around to see it. You'd rather watch the unlifted stars, but to do that you have to exist.

2jbash
No, the probes are instrumental and are actually a "cost of doing business". But, as I understand it, the orthodox plan is to get as close as possible to disassembling every solar system and turning it into computronium to run the maximum possible number of "minds". The minds are assumed to experience qualia, and presumably you try to make the qualia positive. Anyway, a joule not used for computation is a joule wasted.

I don't think governments have yet committed to trying to train their own state of the art foundation models for military purposes, probably partly because they (sensibly) guess that they would not be able to keep up with the private sector. That means that government interest/involvement has relatively little effect on the pace of advancement of the bleeding edge.

Fair point, but I can't think of a way to make an enforceable rule to that effect. And even if you could make that rule, a rogue AI would have no problem with breaking it.

1RogerDearnaley
Frontier models are all behind APIs, and the number of companies offering them is currently two, likely to soon be three. If they all agree this is unsafe, it's not that hard to prevent. For anything more than mildly intimate, it's also already blocked by their Terms of Service and their models will refuse. For a rogue, I agree. And one downside of not letting frontier models do this would be leaving unfulfilled demand for a rogue to take advantage of.

I think if you could demonstrably "solve alignment" for any architecture, you'd have a decent chance of convincing people to build it as fast as possible, in lieu of other avenues they had been pursuing.

2Seth Herd
Some people. But it would depend what the prospects were for that type of AGI. Because I don't think you could convince everyone else to stop working on other types of AGI. So it would be a race between the new "more alignable" type and the currently-leading types. If the "more alignable" type seemed guaranteed to lose that race, I'm not sure many people would even try building it.

Since our info doesn't seem to be here already: We meet on Sundays at 7pm, alternating between virtual and in-person in the lobby of the UMBC Performing Arts and Humanities Building. For more info, you can join our Google group (message the author of this post, bookinchwrm).

I found this post interesting, mostly because it illustrates deep flaws in the US tax system that we should really fix. I downvoted it because I think it is a terrible strategy for giving more money to charity. Many other good objections have been raised in the comments, and the post itself admits that lack of effectiveness is a serious problem. One problem I did not see addressed anywhere is reputational risk. The world is not static, and a technique that works for an individual criminal or a few conscientious objectors probably will not work consistently... (read more)

I always thought it would be great to have one set of professors do the teaching, and then a different set come in from other schools just for a couple weeks at the end of the year to give the students a set of intensive written and oral exams that determines a big chunk of their academic standing.

Answer by ErickBall10

I can now get real-time transcripts of my Zoom meetings (via a Python wrapper of the OpenAI API) which makes it much easier to track the important parts of a long conversation. I tend to zone out sometimes and miss little pieces otherwise, as well as forget stuff.
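For anyone curious, a minimal sketch of the non-real-time core of this, using the official OpenAI Python SDK's Whisper transcription endpoint; the audio-capture and chunking needed for live transcripts is omitted, and the file name is a placeholder:

```python
# Transcribe an exported meeting recording with OpenAI's Whisper API.
# Assumes OPENAI_API_KEY is set in the environment; file name is a placeholder.
from openai import OpenAI

client = OpenAI()

with open("meeting_recording.m4a", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```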

That's fair, most of them were probably never great teachers.

ErickBall8-11

You are attributing a lot more deviousness and strategic boldness to the so-called deep state than the US government is organizationally capable of. The CIA may have tried a few things like this in banana republics but there's just no way anybody could pull it off domestically.

5trevor
This is a good point, that much of the data we have comes from leaked operations in South America (e.g. the Church Hearings), and CIA operations are probably much easier there than on American soil. However, there are also different kinds of systems pointed inward which look more like normal power games e.g. FBI informants, or lobbyists forming complex agreements/arrangements (like how their lawyer counterparts develop clever value-handshake-like agreements/arrangements to settle out-of-court). It shouldn't be surprising that domestic ops are more complicated and look like ordinary domestic power plays (possibly occasionally augmented by advanced technology). The profit motive alone could motivate Microsoft execs to leverage their access to advanced technology to get a better outcome for Microsoft. I was pretty surprised by the possibility that silicon valley VCs alone could potentially set up sophisticated operations e.g. using pre-established connections to journalists to leak false information or access to large tech companies with manipulation capabilities (e.g. Andreessen Horowitz's access to Facebook's manipulation research).

Professors being selected for research is part of it. Another part is the tenure you mentioned - some professors feel like once they have tenure they don't need to pay attention to how well they teach. But I think a big factor is another one you already mentioned: salaries. $150k might sound like a lot to a student, but to the kind of person who can become a math or econ professor at a top research university this is... not tiny but not close to optimal. They are not doing it for the money. They are bought in to a culture where the goal is building status ... (read more)

2Seth Herd
That's not luck. Non-research universities do select faculty by teaching skill.
0ChrisRumanov
I'm not fully convinced by the salary argument, especially with quality-of-life adjustment. As an example, let's imagine I'm a skilled post-PhD ML engineer, deciding between:
Jane Street Senior ML Engineer: $700-750k, 50-55hrs/week, medium job security, low autonomy
[Harvard/Yale/MIT] Tenured ML Professor: $200-250k, 40-45hrs/week, ultra-high job security, high autonomy
A quick Google search says that my university grants tenure to about 20 people per year. Especially as many professors have kids, side jobs, etc., it seems unlikely that a top university really can't find 20 good people across all fields who are both good teachers and would take the second option (in fact, I would guess that being a good teacher predisposes you to taking the second option). Is there some part of the tradeoff I'm missing?
6Viliam
I imagine that if they taught well before, they would still teach well by the sheer force of habit. Maybe slightly worse because they no longer bother to do it perfectly, but not "consistently present things in unclear or inconsistent ways".

But that sort of singularity seems unlikely to preserve something as delicately balanced as the way that (relatively well-off) humans get a sense of meaning and purpose from the scarcity of desirable things.

I think our world actually has a great track record of creating artificial scarcity for the sake of creating meaning (in terms of enjoyment, striving to achieve a goal, sense of accomplishment). Maybe "purpose" in the most profound sense is tough to do artificially, but I'm not sure that's something most people feel a whole lot of anyway?

I'm pretty opti... (read more)

Excellent, I think I will give something like that a try

I know this is an old thread but I think it's interesting to revisit this comment in light of what happened at Twitter. Musk did, in fact, fire a whole lot of people. And he did, in fact, unban a lot of conservatives without much obvious delay or resistance within the company. I'm not sure how much of an implication that has about your views of the justice department, though. Notably, it was pretty obvious that the decisions at Twitter were being made at the top, and that the people farther down in the org chart had to implement those decisions or be fired... (read more)

Thanks! I'd love to hear any details you can think of about what you actually do on a daily basis to maintain mental health (when it's already fairly stable). Personally I don't really have a system for this, and I've been lucky that my bad times are usually not that bad in the scheme of things, and they go away eventually.

2Sable
Great question. You've got the basics - eat right, workout, sleep, etc., but just saying that isn't much help. I've gotten a great deal out of habit chaining/trigger-action planning when used consistently; basically you create chains of actions that feed into one another so once you've started the chain, it takes no extra willpower to just keep following it to its conclusion. For instance: Wake up -> make breakfast -> get pills -> turn on sunlamp -> eat is one, that makes sure I take my medication, eat breakfast, and get some light everyday (the latter is especially important in the winter). Another is: Meditate -> Workout -> Shower which, while I mix up both the kinds of meditation and the kinds of workout, ensures all three get done, roughly every other day. Do it consistently, and eventually you can just do them on autopilot. You don't really forget anything and somehow, not doing them becomes the unnatural state. Hope that helps!

I'm not sure how I would work it out. The problem is that presumably you don't value one group more because they chose blue (it's because they're more altruistic in general) or because they chose red (it's because they're better at game theory or something). The choice is just an indicator of how much value you would put on them if you knew more about them. Since you already know a lot about the distribution of types of people in the world and how much you like them, the Bayesian update doesn't really apply in the same way. It only works on what pill they'... (read more)

Doesn't "trembling hand" mean it's a stable equilibrium even if there are?

6Richard_Kennaway
Yes, but if someone accidentally picks blue, that's their own fault. The blue-picker injures only themselves, hence the stability against trembling hands. I would care enough to warn them against doing that, but I'm not going to quixotically join in with that fault, just so that I can die as well.

I mean definitely most people will not use a decision procedure like this one, so a smaller update seems very reasonable. But I suspect this reasoning still has something in common with the source of the intuition a lot of people have for blue, that they don't want to contribute to anybody else dying.

Sure, if you don't mind the blue-choosers dying then use the stable NE.

5Richard_Kennaway
There are no blue-choosers in the stable NE, so no, I don't mind at all about zero people dying.
1Roko
well they literally chose it.... maybe they are suicidal?

People are all over the place but definitely not 50/50. The qualitative solution I have will hold no matter how weak the correlation with other people's choices (for large enough values of N).

If you make the very weak assumption that some nonzero number of participants will choose blue (and you prefer to keep them alive), then this problem becomes much more like a prisoner's dilemma where the maximum payoff can be reached by coordinating to avoid the Nash equilibrium.
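A rough Monte Carlo sketch of that reasoning, assuming the usual statement of the poll (if at least 50% pick blue, everyone lives; otherwise only the blue-pickers die) and letting "followers" stand in for however many people my decision procedure correlates with. Every parameter here is illustrative:

```python
import random

def expected_deaths(my_choice, n=1000, p_blue=0.49, followers=50, trials=5000):
    """Average number of deaths if I (plus `followers` correlated people) pick my_choice."""
    total = 0
    for _ in range(trials):
        # Everyone else chooses blue independently with probability p_blue.
        others = sum(random.random() < p_blue for _ in range(n - 1 - followers))
        blue = others + (followers + 1 if my_choice == "blue" else 0)
        total += 0 if blue >= n / 2 else blue   # blue-pickers die only if they're a minority
    return total / trials

# Near the 50% threshold, a correlated bloc choosing blue saves far more lives
# in expectation than the same bloc choosing red.
print("pick red: ", expected_deaths("red"))
print("pick blue:", expected_deaths("blue"))
```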

6Roko
There is also a moral dimension of not wanting to encourage perverse behaviour. This game has a stable, dominant NE with max reward, just use that.

I think optimizer-type jobs are a modest subset of all useful or non-bullshit office jobs. Many call more for creativity, or reliably executing an easy task. In some jobs, basically all the most critical tasks are new and dissimilar to previous tasks, so there's not much to optimize. There's no quick feedback loop. It's more about how reliably you can analyze the new situation correctly. 

I had an optimizing job once, setting up computers over the summer in college. It was fun. Programming is like that too. I agree that if optimizing is a big part of t... (read more)

5Going Durden
I mostly agree with you, though I noticed if a job is mostly made of constantly changing tasks that are new and dissimilar to previous tasks, there is some kind of efficiency problem up the pipeline. It's the old Janitor Problem in a different guise; a janitor at a building needs to perform a thousand small dissimilar tasks, inefficiently and often in impractical order, because the building itself was inefficiently designed. Hence why we still haven't found a way to automate a janitor, because for that we would need to redesign the very concept of a "building", and for that we would need to optimize how we build infrastructure, and for that we would have to redesign our cities from scratch... etc, until you find out we would need to build an entire new civilization from the ground up, just to replace one janitor with a robot. It still hints at a gross inefficiency in the system, just one not easily fixed.

I think one of the major purposes of selecting employees based on a college degree (aside from proving intelligence and actually learning skills) is to demonstrate ability to concentrate over extended periods (months to years) on boring or low-stimulation work, more specifically reading, writing, and calculation tasks that are close analogues of office work. A speedrun of a video game is very different. The game is designed for visual and auditory stimulation. You can clearly see when you're making progress and how much, a helpful feature for entering a fl... (read more)

2Going Durden
OTOH, I have a hunch that the kinds of jobs that select against "speed run gamer" mentality are more likely to be inefficient, or even outright bullshit jobs. In essence, speed-running is optimization, and jobs that cannot handle an optimizer are likely to either have error in the process, or error in the goal-choice, or possibly both. In the admittedly small sample of workplaces I witnessed that resisted or could not handle optimization, it was because the "work" was a cover for some nefarious shenanigans, built for inefficiency for political reasons, or created for status games instead of useful work/profit.
7Ben
I think there is another important reason people are selected based on a degree. When I was at school there were a lot of people who were some combination of disruptive/annoying/violent "laddish" that made me (and others) uncomfortable, by deducting status points for niche, weird or "nerdy" interests*. A correlation (at least at my school) was that none of those people went to university, and (at least at my university) no equivalent people of that type were there. Similarly I have not met any such people in the workplace. College/university filters them out. It overlaps with class-ism to some extent. Maybe to overstate it wildly you could say that employers are trying to select so that the workplace culture is dominated by middle-class social norms.

The math doesn't necessarily work out that way. If you value the good stuff linearly, the optimal course of action will either be to spend all your resources right away (because the high discount rate makes the future too risky) or to save everything for later (because you can get such a high return on investment that spending any now would be wasteful). Even in a more realistic case where utility is logarithmic with, for example, computation, anticipation of much higher efficiency in the far future could lead to the optimal choice being to use essentially... (read more)
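A toy two-period version of that point, with all numbers illustrative: resources consumed now pay off immediately, while resources saved grow by a large factor G but are discounted (or survive) with probability d. Linear utility always picks a corner (spend everything or save everything), while log utility picks an interior split:

```python
import numpy as np

R = 1.0      # total resources
G = 100.0    # growth factor on saved resources ("high return on investment")
d = 0.5      # discount factor / survival probability of the later era

c = np.linspace(1e-6, R - 1e-6, 10_001)          # amount consumed now
linear_u = c + d * G * (R - c)                   # linear utility
log_u = np.log(c) + d * np.log(G * (R - c))      # log utility

print("linear utility, spend now:", round(float(c[np.argmax(linear_u)]), 3))  # ~0 (save everything, since d*G > 1)
print("log utility,    spend now:", round(float(c[np.argmax(log_u)]), 3))     # ~R/(1+d) = 0.667 (interior optimum)
```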

1Raphael Roche
I agree, finding the right balance is definitely difficult. However, the different versions of this parable of the grasshopper and the ant may not yet go far enough in subtlety. Indeed, the ants are presented as champions of productivity, but what exactly are they producing? An extreme overabundance of food that they store endlessly. This completely disproportionate and non-circulating hoarding constitutes an obvious economic aberration. Due to the lack of significant consumption and circulation of wealth, the ants' economy—primarily based on the primary sector, to a lesser extent the secondary sector, and excessive saving—while highly resilient, is far from optimal. GDP is low and grows only sluggishly. The grasshoppers, on the other hand, seem to rely on a society centered around entertainment, culture, and perhaps also education or personal services. They store little, just what they need, which can prove insufficient in the event of a catastrophe. Their economy, based on the tertiary sector and massive consumption, is highly dynamic because the wealth created circulates to the maximum, leading to exponential GDP growth. However, this flourishing economy is also very fragile and vulnerable to disasters due to the lack of sufficient reserves—no insurance mechanism, so to speak. In reality, neither the grasshoppers nor the ants behave in a rational manner. Both present two diametrically opposed and extreme economic models. Neither is desirable. Any economist or actuary would undoubtedly recommend an intermediate economy between these two extremes. The trap, stemming from a long tradition since Aesop, is to see a model in the hardworking ant and a cautionary tale in the idle cicada. If we try to set aside this bias and look at things more objectively, it actually stems from the fact that until the advent of the modern economy, societies struggled to conceive that wealth creation could be anything other than the production of goods. In other words, the tertiary

Positive externalities is a bit of an odd way to phrase it--if it's just counting up the economic value (i.e. price) of the fossil fuels, doesn't it also disregard the consumer surplus? In other words, they've demonstrated that the negative externalities of pollution outweigh the value added on the margin, but if we were to radically decrease our usage of fossil fuels then the cost of energy (especially for certain uses with no good substitute, as you discussed above) would go way up, and the tradeoff on the margin would look very different.

1Sherrinford
Yes, the statement about switching off coal-fired power plants etc. is only true at the margin. However, for the OP's question, the sign of "marginal social benefit - marginal social cost" seems crucial.

I see your point about guilt/blame, but I'm just not sure the term we use to describe the phenomenon is the problem. We've already switched terms once (from "global warming" to "climate change") to sound more neutral, and I would argue that "climate change" is about the most neutral description possible--it doesn't imply that the change is good or bad, or suggest a cause. "Accidental terraforming", on the other hand, combines two terms with opposite valence, perhaps with the intent that they will cancel out? Terraforming is supposed to describe a desirable (... (read more)

1Sable
Assigning blame doesn't fix anything; it divides people and helps bad actors accrue political power. It certainly was neutral at some point, but I don't think anyone hears "climate change" and thinks of the climate getting better for humans, at least nowadays. "Accidental Terraforming" at least suggests that we ought to be doing this on purpose, instead of unintentionally.

How would a language model determine whether it has internet access? Naively, it seems like any attempt to test for internet access is doomed because if the model generates a query, it will also generate a plausible response to that query if one is not returned by an API. This could be fixed with some kind of hard coded internet search protocol (as they presumably implemented for Bing), but without it the LLM is in the dark, and a larger or more competent model should be no more likely to understand that it has no internet access.

4gwern
That doesn't sound too hard. Why does it have to generate a query's result? Why can't it just have a convention to 'write a well-formed query, and then immediately after, write the empty string if there is no response after the query where an automated tool ran out-of-band'? It generates a query, then always (if conditioned on just the query, as opposed to query+automatic-Internet-access-generated-response) generates "", and sees it generates "", and knows it didn't get an answer. I see nothing hard to learn about that. The model could also simply note that the 'response' has very low probability of each token successively, and thus is extremely unlikely (or maybe impossible under some sampling methods) to have been stochastically sampled from itself. Even more broadly, genuine externally-sourced text could provide proof-of-work like results of multiplication: the LM could request the multiplication of 2 large numbers, get the result immediately in the next few tokens (which is almost certainly wrong if simply guessed in a single forward pass), and then do inner-monologue-style manual multiplication of it to verify the result. If it has access to tools like Python REPLs, it can in theory verify all sorts of things like cryptographic hashes or signatures which it could not possibly come up with on its own. If it is part of a chat app and is asking users questions, it can look up responses like "what day is today". And so on.
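A small sketch of the verification idea in that last part: results a model almost certainly could not produce in a single forward pass (a large multiplication, a cryptographic hash) are cheap to check once returned, so a correct answer is strong evidence that a real out-of-band tool was involved. The "claimed" values below are just illustrative:

```python
import hashlib

# Claimed result of a multiplication supposedly done by an external tool;
# verifying it is easy, guessing it in one forward pass is not.
claimed_product = 121932631112635269
print(123456789 * 987654321 == claimed_product)              # True

# Same idea with a hash: trivially checkable, effectively impossible to fabricate.
claimed_hash = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
print(hashlib.sha256(b"hello").hexdigest() == claimed_hash)  # True
```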

If the NRO had Sentient in 2012 then it wasn't even a deep learning system. Probably they have something now that's built from transformers (I know other government agencies are working on things like this for their own domain specific purposes). But it's got to be pretty far behind the commercial state of the art, because government agencies don't have the in house expertise or the budget flexibility to move quickly on large scale basic research.

Those are... mostly not AI problems? People like to use kitchen-based tasks because current robots are not great at dealing with messy environments, and because a kitchen is an environment heavily optimized for the specific physical and visuospatial capabilities of humans. That makes doing tasks in a random kitchen seem easy to humans, while being difficult for machines. But it isn't reflective of real world capabilities.

When you want to automate a physical task, you change the interface and the tools to make it more machine friendly. Building a roomba is ... (read more)

2Portia
But isn't this analogy flawed? Yes, humans have built dishwashers so they can be used by humans. But humans can also handle messy natural environments that have not been built for them. In fact, handling messy environment we are not familiar with, do not control and did not make is the major reason we evolve sentience and intelligence in the first place, and what makes our intelligence so impressive. Right now, I think you could trap an AI in a valley filled with jungle and mud, and even if it had access to an automated factory for producing robots as well as raw material and information, if fulfilling its goals depended on it getting out of this location because e.g. the location is cut off from the internet, I think it would earnestly struggle. Sure, humans can build an environment that an AI can handle, and an AI adapted to it. But this clearly indicates a severe limitation of the AI in reacting to novel and complex environments. A roomba cannot do what I do when I clean the house, and not just cause the engineers didn't bother. E.g. it can detect a staircase, and avoid falling down it - but it cannot actually navigate the staircase to hoover different floors, let alone use an elevator or ladder to get around, or hoover up dust from blankets that get sucked in, or bookshelves. Sure, me carrying it down only takes me seconds, it is trivial for me and hugely difficult for the robot, which is why no company would try to get it done. But I would also argue that it is really not simple for it to do; and that is despite picking a task (removing dust) that most humans, myself included, consider tedious and mindless. Regardless, a professional cleaning person that enters multiple different flats filled with trash, resistant stains and dangerous objects and carefully tidies and cleans them does something that is utterly beyond the current capabilities of AI. This is true for a lot of care/reproductive work. Which is all the more frustrating because it is work where there