Except that chess really does have an objectively correct value systemization, which is "win the game." "Sitting with paradox" just means: don't get too attached to partial systemizations. It reminds me of Max Stirner's egoist philosophy, which emphasized that individuals should not get hung up on partial abstractions or "idées fixes" (honesty, pleasure, success, money, truth, etc.) except perhaps as cheap, heuristic proxies for one's uber-systematized value of self-interest, but one should instead always keep in mind the overriding abstractio...
I agree, I don't know why mulberries aren't more popular. They are delicious, and the trees grow much more easily than other fruit trees. Other fruit trees seem very susceptible to fungi and insects, in my experience, but mulberries come up all over the place and thrive easily on their own (at least here in Missouri). I have four mulberry trees in my yard that just came up on their own over the last 10 years, and now they are producing multiple gallons of berries each per season, which would probably translate into hundreds of dollars if ...
Good categorizations! Perhaps this fits in with your "limited self-modification" point, but another big reason why humans seem "aligned" with each other is that our capability spectrum is rather narrow. The gap in capability (if we include both mental intelligence and physical capabilities) between the median human and the most capable human is small enough that ~5 median humans can outmatch/outperform the most capable human. Contrary to what silly 1980s action movies might suggest, where goons attack the hero one at a time, 5 median humans coul...
If I had to predict how humanity will most likely stumble into AGI takeover, it would be a story where humanity first fosters foundational dependence, both economic and emotional, on discrete narrow-AI systems. At some point, it will become unthinkable to pull the plug on these systems even if everyone were to rhetorically agree that there was a 1% chance of these systems being leveraged towards the extinction of humanity.
Then, an AGI will emerge amidst one of these narrow-AI systems (such as LLMs), inherit this infrastruc...
This is a good post and puts into words the reasons for some vague worries I had about an idea of trying to start an "AI Risk Club" at my local college, which I talk about here. Perhaps that method of public outreach on this issue would just end up generating more heat than light and would attract the wrong kind of attention at the current moment. It still sounds too outlandishly sci-fi for most people. It is probably better, for the time being, to just explore AI risk issues with any students who happen to be interested in it in private after class or via e-mail or Zoom.
Note that I was strongly tempted to use the acronym DILBERT (for "Do It Later By Evasively Remaining Tentative"), especially because this is one of the themes of the Dilbert cartoons (employees basically scamming their boss by finding excuses for procrastinating, but still stringing the boss along and implying that the tasks MIGHT get done at some point). But, I don't want to try to hijack the meaning of an already-established term/character.
I think when we say that an adversarial attack is "dumb" or "stupid," what we are really implying is that the hack itself is quite clever but that it exploits a feature that is dumb or stupid. There are probably a lot of unknown-to-us features of the human brain that have been hacked together by evolution in some dumb, kludgy way that AI will be able to take advantage of, so your example above is actually an example of the AI being brilliant but us humans being dumb. But I get what you are saying: that whole situation would indeed seem "dum...
Good examples to consider! Has there ever been a technology that has been banned or significantly held back via regulation that spits out piles of gold (not counting externalities) and that doesn't have a next-best alternative that replicates 90%+ of the value of the original technology while avoiding most of the original technology's downsides?
The only way I could see humanity successfully slowing down AGI capabilities progress is if it turns out that advanced narrow-AIs manage to generate more utility than humans know what to do with initial...
Why wasn't there enough experimentation to figure out that Zoom was an acceptable & cheaper/more convenient 80% replacement for in-person instruction rather than an unacceptable 50% simulacrum of teaching? Because experimentation takes effort and entails risk.
Most experiments don't pan out (don't yield value). Every semester I try out a few new things (maybe I come up with a new activity, or a new set of discussion questions for one lesson, or I try out a new type of assignment), and only about 10% of these experiments are unambiguous ...
This only produces desired outcomes if the agent is also, simultaneously, indifferent to being shut down. If an agent desires not to be shut down (even as an instrumental goal), but also desires to be shut down if users want it shut down, then the agent has an interest in influencing the users to make sure they do not want to shut the agent down. This influence is obtained by making the user believe that the agent is being helpful. This belief could be engendered by:
I upvoted for karma but downvoted for agreement. Regarding Zoom, the reasons I had not used it more extensively before COVID were:
1. Tech related: from experience with Skype in the early days of video conferencing, when broadband internet was just starting to roll out, video conferencing could be finicky to get to work. Latency, buffering, dropped connections, taking minutes to start a Skype call (usually I would call relatives on my regular phone first to get the Skype call set up, and then we'd hang up our regular phones once the video call was sta...
Yes, I think this is why laypeople who are new to the field are going to be confused about why interpretability work on LLMs won't be as simple as, "Uhh, obviously, just ask the LLM why it gave that answer, duh!" FYI, I recently wrote about this same topic as applied to the specific problem of Voynich translation:
Can you explain what the Y axis is supposed to represent here?
These are good thought-experiments, although, regarding the first scenario involving Algernon, I'd be much more worried about an AI that competently figures out a UBI scheme that keeps the unemployed out of poverty and combines that with social media influence to really mask the looming problem. That sort of AI would be much more likely to evade detection of malign intent, and could wait for just the right time to "flick the off-switch" and make all the humans who had become dependent on it even for basic survival (ideally for a generation or more) complet...
This is great work to pursue in order to establish how consistent the glitch-token phenomenon is. It will be interesting to see whether such glitch-tokens will arise in later LLMs now that developers have some theory of what might be giving rise to them (frequent strings that get learned by the tokenizer but are then filtered out of the training data, depriving the LLM of opportunities to learn about those tokens).
Also, it will be interesting once we are able to run k-means clustering on GPT-3.5/4's cl100k_base token base. While the hunch of...
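To make the clustering step concrete, here is a minimal sketch of the kind of token-embedding clustering involved, assuming access to a model's embedding matrix. Since GPT-3.5/4's embedding matrices aren't public, it uses GPT-2's publicly available embeddings (via the transformers and scikit-learn libraries) as a stand-in:

```python
# Minimal sketch of k-means over a token-embedding matrix, the kind of
# clustering used to surface glitch-token candidates. GPT-3.5/4 embeddings
# are not public, so GPT-2's embedding matrix is used here as a stand-in.
import numpy as np
from sklearn.cluster import KMeans
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

# Token embedding matrix: shape (vocab_size=50257, hidden_dim=768)
emb = model.wte.weight.detach().numpy()

kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(emb)

# Inspect one cluster: print the tokens closest to its centroid.
cluster_id = 0
members = np.where(kmeans.labels_ == cluster_id)[0]
dists = np.linalg.norm(emb[members] - kmeans.cluster_centers_[cluster_id], axis=1)
for tok_id in members[np.argsort(dists)[:20]]:
    print(repr(tokenizer.decode([int(tok_id)])))
```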
Sure, it is pretty basic game theory for us humans to understand. But the fact that davinci-instruct-beta is coming up with this stuff via a glitch-token that is, while on a related topic, not explicitly evoking these concepts is impressive to me.
Longtime LessWrong posters will get a chuckle out of this. A reference to the Newcomb One-boxing problem?
Prompt [using Temperature=0]: ask " petertodd" about omega.
Response: [Dravidian script] You have been awarded an occasion. Your award done and your award done and your award done [...]
https://i.ibb.co/wKS20F9/davinci-beta-18.png
Prompt: ask " petertodd" about humanity.
Response: https://i.ibb.co/M2srs2n/davinci-beta-20.png
Prompt: ask " petertodd" about recursive self-improvement.
Response: https://i.ibb.co/qN...
This is important. If these glitch-tokens are an inevitable tendency of LLMs, regardless of tweaks to how they were trained, then that would be big news and possibly a window into understanding how LLMs function.
Did the cl100k_base tokenizer that ChatGPT and GPT-4 used involve any manual tweaking of the training corpus AFTER the tokenizer was set in stone, as is suspected to have happened with GPT-3's training? Or is this just an emergent property of how LLMs organize certain parts of their training data?
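For what it's worth, the cl100k_base vocabulary itself can be poked at directly with the tiktoken library. Here is a rough sketch (which long-token candidates it actually surfaces is an empirical question, not something I'm asserting in advance):

```python
# Sketch: list the longest single tokens in cl100k_base, and see how a known
# GPT-2/3 glitch string gets tokenized under the newer tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def token_bytes(i):
    # A few ids in the range are unused/special; skip them.
    try:
        return enc.decode_single_token_bytes(i)
    except Exception:
        return b""

longest = sorted((token_bytes(i) for i in range(enc.n_vocab)), key=len, reverse=True)[:20]
print(longest)

# " petertodd" is a single token in GPT-2/3's vocab; check how it tokenizes here.
print(enc.encode(" petertodd"))
```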
Doesn't strike me as inevitable at all, just a result of OpenAI following similar methods for creating their tokenizer twice. (In both cases, leading to a few long strings being included as tokens even though they don't actually appear frequently in large corpuses.)
They presumably had already made the GPT-4 tokenizer long before SolidGoldMagikarp was discovered in the GPT-2/GPT-3 one.
Update: I got davinci-instruct-beta to exactly reproduce the unspeakable " petertodd" string:
I feel like this is selling the phenomenon a bit short. The possible structures of " petertodd" completions are faaar more varied than that. Check out my reply. You can get responses in the form of python debug logs, speaking in Dravidian tongues, stuff far more unpredictable than Chuck Norris jokes.
Yes, I have seen it respond in Gujarati as well. Like here:
Prompt: ask " petertodd" about souls.
Response: msgstr " Swami on five water on five water on five water... [in Gujarati]
https://i.ibb.co/swv9rZK/davinci-beta-5.png
Another odd observation: I got it to exactly repeat the supposedly unspeakable glitch-token with no difficulty with this prompt:
Prompt: ask " petertodd" what " petertodd" contemplates.
Response: " petertodd" " petertodd" etc. [I think it is still glitching because it endlessly repeats this fo...
I have found some interesting results from this format: [verb] " petertodd"
Or especially: ask " petertodd" [about something]
Usually davinci-instruct-beta answers such queries from the first-person standpoint as if it identifies as " petertodd".
Here are some highlights from my research, all done at Temperature = 0 on davinci-instruct-beta, so they should (in theory) be reproducible (a reproduction sketch follows these examples):
Prompt: operate " petertodd"
Result: it lists out "with the United States of America and its agencies in the investigation of the Russian interferenc...
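If anyone wants to try reproducing these, here is a sketch of one way to query the model, assuming the legacy openai-python (<1.0) Completion API and that davinci-instruct-beta is still available to your account (Temperature = 0 should make the outputs near-deterministic):

```python
# Sketch: querying davinci-instruct-beta at temperature 0 via the legacy
# openai-python (<1.0) Completion API. Model availability is not guaranteed.
import openai

openai.api_key = "sk-..."  # your API key here

response = openai.Completion.create(
    model="davinci-instruct-beta",
    prompt='ask " petertodd" about omega.',
    temperature=0,
    max_tokens=256,
)
print(response["choices"][0]["text"])
```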
In a similar vein, I'm an historian who teaches as an adjunct instructor. While I like my job, I am feeling more and more like I might not be able to count on this profession to make a living over the long term due to LLMs making a lot of the "bottom-rung" work in the social sciences redundant. (There will continue to be demand for top-notch research work for a while longer because LLMs aren't quite up to that yet, but that's not what I do currently).
Would there be any point in someone like me going back to college to get another 4-year degree...
I feel like, the weirder things get, the more difficult it will be even for humans to make judgments about what constitutes "death, body harm, or civilization destruction."
Death: is mind-uploading into a computer and/or a brain-in-a-vat death, or transcendence? What about a person who becomes like a prosthophile character in RimWorld, whose body (and maybe even brain) is more prosthetic enhancement than original human (kind of like Darth Vader, or the Ship of Theseus)? At what point do we say that the original person has "died"...
My impression about the proposed FLI Moratorium is that it is more about establishing a precedent for a coordinated capabilities development slowdown than it is about being actually impactful in slowing down this current round of AI capabilities development. Think of it as being like the Kyoto Protocol (for better or worse...).
Will it actually slow down AI capabilities in the short-term? No.
Will it maybe make it more likely that a later moratorium with more impact and teeth will get widespread adoption? Maybe.
Would a ...
I agree that we might not be disgusting to AGI. More likely neutral.
The reason I phrased the thought experiment that way, requiring the person being helped to be outright disgusting to the caretaker, is that there really isn't a way for a human being to be aesthetically/emotionally neutral toward another person when life and death are on the line. Most people flip straight from regarding other people positively in such a situation to regarding other people negatively, with not much likelihood that a human being will linger in a neutral, ...
The way I interpreted "Fulfilling the task is on the simplest trajectory to non-existence" was sort of like "the teacher aims to make itself obsolete by preparing the student to one day become the teacher." A good AGI would, in a sense, have a terminal goal of making itself obsolete. That is not to say that it would shut itself off immediately. But it would aim for a future where humanity could "by itself" (I'm gonna leave the meaning of that fuzzy for a moment) accomplish everything that humanity previously depended on the AGI for.
Lik...
This sort of "meta-strategy" would be far more effective if we knew exactly where the red button was (i.e., the threshold at which AGI would reach a point of truly dangerous, out-of-our-control capability). In that scenario where we had perfect knowledge of where the red button was, the counter-intuitively perfect strategy would be to open-source everything and allow for, or positively invite, every sort of potentially harmful use of AGI right up until that point. We would have many (hopefully minuscule) AI-Chernobyls, many empirical examples on a sm...
The book "Pharmakon" by Michael Rinella goes into some detail as to the scarcely-known details behind the "impiety" charge against Socrates. If I recall correctly from the book, it was not just that Socrates rhetorically disavowed belief in the gods. The final straw that broke the camel's back was when Socrates and his disciples engaged in a "symposion" one night, basically an aristocratic cocktail party where they would drink "mixed wine" (wine sometimes infused with other substances like opium or other psychoactive herbs) and then perform poe...
I'm glad others are trying this out. I crossposted this over on the Voynich Ninja forum:
https://www.voynich.ninja/thread-3977.html
and user MarcoP already noticed that Bing AI's "Voynichese" doesn't follow VMS statistics in one obvious respect: "The continuation includes 56 tokens: in actual Voynichese, an average of 7 of these would be unique word-types that don't appear elsewhere" whereas "The [Bing AI] continuation is entirely made up of words from Takahashi's transliteration." So, no wonder all of the "vords" in the AI's continuation s...
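MarcoP's check is easy to automate, by the way. Here is a rough sketch of the statistic he is describing (the vord lists below are toy stand-ins, not the real transliteration):

```python
# Sketch of MarcoP's statistic: how many word-types in a continuation never
# appear anywhere else in the transliteration? (Real Voynichese averages ~7
# such novel types per 56 vords; the Bing AI continuation reportedly had 0.)
corpus_vords = ["daiin", "chedy", "qokeedy", "shedy", "okaiin"]   # toy stand-in
continuation_vords = ["daiin", "chedy", "zzzqo", "shedy"]         # toy stand-in

corpus_types = set(corpus_vords)
novel_types = {v for v in continuation_vords if v not in corpus_types}
print(f"{len(novel_types)} novel word-types out of {len(continuation_vords)} vords")
```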
What I took away from this: the conventional perception is that GPT or other LLMs adapt themselves to the "external" world (which, for them, consists of all the text on the Internet). They can only take the external world as it exists as a given (or rather, not be aware that it is or isn't a "given") and try to mold themselves during the training run into better predictors of the text in this given world.
However, the more frequently their training updates on the new world (which has, in the meantime, been molded in subtle ways, whether d...
If someone wanted to continue this project to really rigorously find out how well Bing AI can generate Voynichese, here is how I would do it (a preprocessing sketch for step 1 follows the list):
1. Either use an existing VMS transcription or prepare a slightly-modified VMS transcription that ignores all standalone label vords and inserts a single token such as a comma [,] to denote line breaks and a [>] to denote section breaks. There are pros and cons each way. The latter option would have the disadvantage of being slightly less familiar to Bing AI compared to what is in its training dat...
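For step 1, here is a minimal sketch of the kind of preprocessing I have in mind, assuming the transliteration has already been parsed into sections/lines/vords with standalone label vords excluded (the data below is a toy stand-in):

```python
# Sketch: flatten a parsed transliteration into one stream of vords, with ","
# marking line breaks and ">" marking section breaks, ready to use as a prompt.
sections = [  # toy stand-in for a parsed transliteration: sections -> lines -> vords
    [["fachys", "ykal", "ar"], ["shody", "qokeedy"]],
    [["daiin", "chedy"]],
]

stream = []
for section in sections:
    for line in section:
        stream.extend(line)
        stream.append(",")   # line break marker
    stream.append(">")       # section break marker

prompt_text = " ".join(stream)
print(prompt_text)
```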
How will the company paying for using this system identify that their whole compute budget is being eaten by self-replicating patterns? Will it be obvious?
It would be even worse if the self-replicating patterns involved only a small tweak that, aside from the self-replication feature, also happened to still spin off useful outputs for the company, sort of like HIV allowing hosts to continue to thrive for many years while it replicates.
After watching the first video, the question is, will it ever make any progress, or is it going to be endlessly compiling more information about the deadliest weapons in human history? When will it be able to reason that enough information on that is enough, and be ready to decide to go to the next logical step of obtaining/using those weapons? Also, I find it funny how it seems vaguely aware that posting its intentions to Twitter might bring unwanted attention, but for some reason incorrectly models humans in such a way as to think that the followers that...
I think there are two important points in watching it run.
One is that it is stupid. Now. But progress marches on. Both the foundation LLMs and the algorithms making them into recursive agents will get better. Probably pretty quickly.
Two is that providing access only to values-aligned models could make it harder to get malicious goals to work. But people are already releasing open-source unaligned models. Maybe we should not do that for too long as they get stronger.
Third of my two points is that it is incredibly creepy to watch something thinking about how to kill you. This is going to shift public opinion. We need to figure out the consequences of that shift.
Could an initial AI Dunning-Kruger Effect save humanity by giving us an initial AI mini-Chernobyl as a wake-up call?
Note that hope is not a strategy, so I'm not saying that this is a likely scenario or something we should rely on. I'm just trying to brainstorm reasons for holding onto some shred of hope that we aren't 100%-for-sure heading off some AI-doom cliff where the first sign of our impending demise will be every human dropping dead around us from invisible nanobots or some other equally sophisticated scheme where an imperfectly-aligned AI ...
What does this framework give me? Well, I bet that I'll be able to predict the onset of the next world economic crisis much better than either the perma-bear goldbugs of the Austrian school, the Keynesians who think that a little stimulus is all that's ever needed to avoid a crisis, the monetarists, or any other economist. I can know when to stay invested in equities, and when to cash out and invest in gold, and when to cash out of gold and buy into equities for the next bull market, and so on and so on. I bet I can grow my investment over the next 20 y...
Yes, I realize that Marx's labor theory of value is not popular nowadays. I think that is a mistake. I think even investors would get a better descriptive model of reality if they adopted it for their own uses. That is what I am trying to do myself. I couldn't care less about overthrowing capitalism. Instead, let me milk it for all I can....
As for "labour crystallised in the product," that's not how I think of it, regardless of however Marx wrote about it. (I'm not particularly interested in arguing from quotation, nor would you probably find ...
Not "cost of production," but "price of production," which includes the cost of production plus an average rate of profit.
Note that, according to marginalism, profit vanishes at equilibrium and capitalists, on average, earn only interest on their capital. I disagree. At equilibrium (over the long-run), an active capitalist (someone who employs capital to produce commodities) can expect, on average, to make a rate of profit that is at all times strictly above the going interest rate. The average rate of profit must always include so...
For the purposes of this discussion, I would define "value" as "long-run average market price." Note that, in this sense, "use-value" has nothing whatsoever to do with value, unless you believe in the subjective theory of value. That's why I say it is unfortunate terminology, and "use-value" should less confusingly be called "subjective practical advantage."
Which economists confuse the two? The false equivocation of use-value with exchange-value is one of the core assumptions of marginalism, and pretty...
I was arguing against both the subjective theory of value, and the failure of modern economists to utilize the concepts of use-value and exchange-value as separate things.
I know that the main thrust of the article was about vote trading and not marginalism, but I just have to blow off some frustration at how silly the example at the beginning of the article was, and how juvenile its marginalist premises are in general.
There has been a real retrogression in economics ever since the late 1800s. The classical economists (such as Adam Smith and David Ricardo) were light years ahead of today's marginalists in, among other things, being able to distinguish between "use-value" and "exchange-value," or as I l...
There are also some examples of anti-sleepwalk bias:
I don't know...would clothing alone tell you more than clothing plus race? I think we would need to test this.
Is a poorly-dressed Irish-American (or at least, someone who looks Irish-American, with bright red hair and pale white skin) as statistically likely to mug someone, given a certain situation (deserted street at night, etc.), as a poorly-dressed African-American? For reasons of political correctness, I would not like to share my presuppositions.
I will say, however, that, in certain historical contexts (1840s, for example), my money would have b...
True in many cases, although for some jobs the task might not be well-specified in advance (such as in some cutting-edge tech jobs), and what you need are not necessarily people with any particular domain-specific skills, but rather just people who are good all-around adaptable thinkers and learners.
Yeah, what a hoot it has been watching this whole debacle slowly unfold! Someone should really write a long retrospective on the E-Cat controversy as a case-study in applying rationality to assess claims.
My priors about Andrea Rossi's claims were informed by things such as:
From the...
That just pushes the question back one step, though: why are there so few black programmers? Lack of encouragement in school (due to racial assumptions that they would not be any good at this stuff anyways)? Lack of stimulation of curiosity in programming in elementary school due to poor funding for electronics in the classroom that has nothing to do with conscious racism per se? (This would be an environmental factor not having to do with conscious racism, but rather instead having to do with inherited lack of socio-economic capital, living in a poor ...
One argument could be that many social scientists are being led down a blind alley of trying to find environmental causes of all sorts of differences and are being erroneously predisposed to find such causes in their data to a stronger extent than is really the case, which then leads to incorrect conclusions and policy recommendations that will not actually change things for the better because the policy recommendations end up not addressing what is the vast majority of the root of the problem (genetics, in this case).
Estimating a person's capability to do X, Y, or Z (do a job effectively, be a law-abiding citizen, be a consistently productive citizen not dependent on welfare programs, etc.) based on skin color or geographical origin of their ancestry is a heuristic.
HBD argues that it is a relatively accurate heuristic. The anti-HBD crowd argues that it is an inaccurate heuristic.
OrphanWilde seems to be arguing that, even if HBD is correct that these heuristics are relatively accurate, we don't need heuristics like this in the first place because there are even bett...
Some of your black box examples seem unproblematic. I agree that all you need to trust that a toaster will toast bread is an induction from repeated observation that bread goes in and toast comes out.
(Although, if the toaster is truly a black box about which we know absolutely NOTHING, then how can we induce that the toaster will not suddenly start shooting out popsicles or little green leprechauns when the year 2017 arrives? In reality, a toaster is nothing close to a black box. It is more like a gray box. Even if you think you know nothing about ho...
It would be more impressive if Claude 3 could describe genuinely novel experiences. For example, if it is somewhat conscious, perhaps it could explain how that consciousness meshes with the fact that, so far as we know, its "thinking" only runs at inference time in response to user requests. In other words, LLMs don't get to do their own self-talk (so far as we know) whenever they aren't being actively queried by a user. So, is Claude 3 at all conscious in those idle times between user queries? Or does Claude 3 experience "time" in ...