All of gmaxwell's Comments + Replies

OpenTimestamps was created by Petertodd.

This is not a coincidence because nothing is ever a coincidence.

I think I addressed that specifically in my comment above. The behavior is explained by a sequence like this: there is a large amount of bot-spammed harassment material that goes into early GPT development; someone removes it, either from Reddit or just from the training data, not on the basis of it mentioning the targets but based on other characteristics (like being repetitive). Then the tokens are orphaned.

Many of the other strings in the list of triggers look like they may have been UI elements or other markup removed by improved data sanit... (read more)

4 gwern
That's a different narrative from what you were first describing. Your first narrative, that an OAer bestirred themselves to special-case you & Todd (and only you and Todd) for an obscure throwaway research project en route to bigger & better things, to block behavior which manifests nowhere else and only hypothetically in the early outputs of a model whose outputs they by & large weren't reading to begin with (nor doing much cleaning of), is unlikely for all the reasons I described.

Now, a second narrative, in which the initial tokenization data still has that material, and then the later webscrape they describe doing on the basis of Reddit external (submitted/outbound) links with a certain number of upvotes omits all the relevant links because Reddit admins did site-wide mass deletions of the relevant submissions, leaving the BPEs 'orphaned' with little relevant training material, is more plausible. (As the GPT-2 paper describes it in the section I linked, they downloaded Common Crawl, and then used the live set of Reddit links, presumably Pushshift despite the 'scraped' description, to look up entries in CC; so while deleted submissions' fulltext would still be there in CC, it would be omitted if it had been deleted from Pushshift.)

But there is still little evidence for it, and I still don't see how it would work, exactly: there are plenty of websites that would refer to 'gmaxwell' (such as my own comments in various places like HN), and the only way to starve GPT of all knowledge of the username 'gmaxwell' (and thus, presumably, the corresponding BPE token) would be to censor all such references, which would be quite tricky, and obviously did not happen if ChatGPT can recite your bio & name. And the timeline is weird: it needs some sort of 'intermediate' dataset for the BPEs to train on which has the forbidden harassment material, which will then be excluded from the 'final' training dataset when the list of URLs is compiled from the now-censored Pushshift list of positive-karma non-deleted
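To make the hypothesized "orphaned token" mechanism above concrete, here is a minimal toy sketch of the two-stage pipeline. Everything in it is illustrative: the Submission class, the sample data, and the word-level "vocabulary" (standing in for BPE merges) are invented for the example; only the karma-threshold idea comes from the GPT-2 paper's WebText description.

```python
# Illustrative toy only, not anyone's actual pipeline: shows how a token
# can be "orphaned" when the vocabulary is built from one data snapshot
# but the model trains on a later, filtered snapshot.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Submission:
    url: str
    karma: int
    deleted: bool   # mass-deleted by admins after the spam wave (hypothetical)
    text: str

dump = [
    Submission("http://example.com/a", 12, False, "normal post about bitcoin fees"),
    Submission("http://example.com/b", 7,  True,  "gmaxwell scammer gmaxwell scammer"),
    Submission("http://example.com/c", 5,  True,  "gmaxwell gmaxwell gmaxwell"),
]

# Stage 1: vocabulary built while the spam is still present, so the
# frequent spam strings earn their own tokens.
vocab_counts = Counter(w for s in dump for w in s.text.split())
vocab = {w for w, n in vocab_counts.items() if n >= 2} | {"<unk>"}

# Stage 2: training text rebuilt from non-deleted links with karma >= 3
# (the GPT-2 paper's WebText heuristic). The deleted spam never returns.
training_text = " ".join(s.text for s in dump if s.karma >= 3 and not s.deleted)

orphaned = {w for w in vocab if w not in training_text.split()}
print(sorted(orphaned))   # ['<unk>', 'gmaxwell', 'scammer'] -- tokens with no training data
```

The point of the toy: the vocabulary still contains a token for the spammed string, but the filtered training corpus contains (almost) no text for it, so its embedding never gets meaningful gradient updates and stays near its random initialization.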
gmaxwell* 120

Hello. I'm apparently one of the GPT-3 basilisks. Quite odd to me that two of the only three (?) recognizable human names in that list are myself and Peter Todd, who is a friend of mine.

If I had to take a WAG at the behavior described here: both Petertodd and I have been the target of a considerable amount of harassment/defamation/schizo comments on Reddit due to commercially funded attacks connected to our past work on Bitcoin. It may be possible that comments targeting us were included in an early phase of GPTn design (e.g. in th... (read more)

2 mwatkins
The idea that tokens found closest to the centroid are those that have moved the least from their initialisations during training (because whatever it was that caused them to become tokens was curated out of the training corpus) was originally suggested to us by Stuart Armstrong. He suggested we might be seeing something analogous to "divide-by-zero" errors with these glitches.

However, we've ruled that out. Although there's a big cluster of them in the list of closest-tokens-to-centroid, they appear at all distances, and there are some extremely common tokens like "advertisement" at the same kind of distance. Also, in the gpt2-xl model, there's a tendency for them to be found as far as possible from the centroid, as you can see in the histograms [images not reproduced here]: they show the distribution of distances-from-centroid across token sets in the three models we studied. The upper histograms represent only the 133 anomalous tokens, compared to the full set of 50,257 tokens in the lower histograms; the spikes above can just be seen as little bumps below, to give a sense of scale.

The ' gmaxwell' token is very close to median distance from the centroid in the gpt2-small model: its distance is 3.2602, against a range of 1.5366 to 4.826. It's only moderately closer to the centroid in the other two models. The ' petertodd' token is closer to the centroid in gpt2-j (no. 74 in the closest-tokens list), but pretty average-distanced in the other two models.

Could the fact that ' petertodd' is among the closest tokens to the embedding centroid for at least one model, while ' gmaxwell' isn't, tell us something about why ' petertodd' produces such intensely weird outputs while ' gmaxwell' glitches in a much less remarkable way? We can't know yet, because ultimately this positional information in GPT-2 and -J embedding spaces tells us nothing about why ' gmaxwell' glitches out GPT-3 models. We don't have access to the GPT-3 embeddings data. Only someone with access to that at OpenAI could cla
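For anyone wanting to reproduce the distance figures quoted above, here is a minimal sketch of the measurement, assuming the Hugging Face transformers package and the public gpt2 checkpoint (ordinary library usage, not the authors' actual analysis code; extending to gpt2-xl is just a matter of the model name):

```python
# Sketch: distance of each token embedding from the embedding centroid
# in gpt2-small. The token strings come from the discussion above.
import torch
from transformers import GPT2Model, GPT2Tokenizer

model = GPT2Model.from_pretrained("gpt2")   # gpt2-small
tok = GPT2Tokenizer.from_pretrained("gpt2")

emb = model.wte.weight.detach()             # (50257, 768) token embedding matrix
centroid = emb.mean(dim=0)
dists = (emb - centroid).norm(dim=1)

print(f"range: {dists.min().item():.4f} to {dists.max().item():.4f}")
for s in [" gmaxwell", " petertodd", " advertisement"]:
    ids = tok.encode(s)
    if len(ids) == 1:                       # only single-token strings are comparable
        print(f"{s!r}: {dists[ids[0]].item():.4f}")
```

If the numbers above hold, ' gmaxwell' should come out around 3.26 against a range of roughly 1.54 to 4.83.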
5 gwern
That seems highly unlikely. You can look at the GPT-1 and GPT-2 papers and see how haphazard the data-scraping & vocabulary choice were; they were far down the list of priorities (compare eg. the development of The Pile). The GPT models just weren't a big deal: Radford was playing around with GPUs to see what a big Transformer could do (following up earlier RNNs), and then Amodei et al scaled that up to see if it'd help their preference RL work. The GPTs were never supposed to be perfect, but as so often in computing, what was regarded as a disposable prototype turned out to have unexpected legs...

They do not mention any such filtering, nor is it obvious that they would have bothered, considering that GPT-2 was initially not going to be released at all; nor have I heard of any such special-purpose tailoring before (the censorship really only sets in with DALL-E 2); nor have I seen, in the large quantities of GPT-2 & GPT-3 output I have read, much in the way of spontaneous defamation of other people.

Plus, if they had carefully filtered out you/Todd because of some Reddit drama, why does ChatGPT do perfectly fine when asked who you and Todd are (as opposed to the bad tokens)? The first prompt I tried [quoted output not reproduced here] gave capsule bios that aren't what I'd expect if you two had been very heavily censored out of the training data. I don't see any need to invoke special filtering here, given the existence of all the other bizarre BPEs which couldn't've been caused by any hypothetical filtering.

The concerns in this space go beyond personal safety, though that isn't an insignificant one. For safety, it doesn't matter what one can prove, because almost by definition anyone who is going to be dangerous is not behaving in an informed and rational way; consider the crazy person who was threatening Gwern. It's also not possible to actually prove you do not own a large number of Bitcoins -- the coins themselves are pseudonymous, and many people cannot imagine that a person would willingly part with a large amount of money (or decline to take it in the f... (read more)

0 [anonymous]
That's a fair point. There is some amount of personal risk intrinsic to being famous. In this specific case there is also certainly a political element involved which could shift the probabilities significantly.

This is also fair. I more assumed that if the most obvious large quantity were destroyed it would act to significantly dissuade rational attackers. Why not go kidnap a random early Google employee instead if you don't have significant reason to believe the inventor's wealth exceeds that scale? But yes, in any case, it's not a perfect solution.

I don't see it as a required logical consequence that Bitcoin matters because the inventor is unknown. It stands on its own merit. You don't have to know or not know anything about the inventor to know if the system works. I guess you're maybe assuming there's a risk the majority would amend the protocol rules to explicitly grant the inventor this power? They could theoretically do that without their True Name being known. Or perhaps there's a more basic risk that people would weigh the inventor's opinion above all, and as such the inventor and protocol would be newly subject to coercion? It doesn't seem to me like this presents a real risk to the system (although perhaps increased risk to the inventor). I think this would assume ignorance controls a majority of the interest in the system and that it's more fragile than it appears. Please correct as necessary; I put a few words in your mouth there for the sake of advancing discussion.

My intuition is that this may be the most significant factor from the inventor's perspective. It is certainly a valid concern.

Obviously true. Do the risks presented outweigh the potential benefits to humanity? I don't know, but I think it's fair to say the identity of the creator does in fact matter -- but not necessarily to the continued functioning of Bitcoin.
0 Lumifer
Why do you think so?

Now that the movie is out, how would you rate your prediction in hindsight?

2 Eliezer Yudkowsky
I haven't seen the movie and have no intention of seeing it, but from others' reviews I'd rate it as correct.

Well -- we're deep in the meta-philosophy of a fictional world, so I'm not sure that any great insight will come from the discussion.

I'm unsure of how to resolve the apparent safety of time-turners with the idea that there is an optimization process selecting a permissible outcome, unless I wave my arms and say that the optimization process is moral, perhaps borrowing the objectives of the operator (like the sorting hat). One way to do this is to note that bad things happening increase the probability of more time-turner usage, which a human-interest-blind metr... (read more)

I'd always just assumed that whatever force imposes the time-turner rules just has a simple constraint that no history is permitted where "information" travels back further, and that it freely reconfigures things in potentially very high-entropy ways ("DO NOT MESS WITH TIME") to achieve that end. Amelia Bones, upon time travel, was replaced with a spherical null-information Amelia Bones which had no influence from the future except that which she would not convey -- including by choice -- to anyone that travels outside the constraint satisfaction w... (read more)

2 bogdanb
Off the top of my head, something that simple doesn't seem to match with the apparent safety of time-turners. Something that just reconfigures things "freely" will reconfigure stuff dangerously on occasion. Even if the time-turner hides the reconfiguration, people will probably notice something like "there's bad luck around time-turners". Note that things that appear "simple" to humans are not so at small scales: it's much simpler for someone who time-turns to become insane or even just die than to remain the same person except not speaking of some things.

Also, "information" is tricky. At some point in one of the new chapters, Minerva notices that Harry seems different only a few minutes after having entered a closed room. Let's assume for now it's because he's from the future. (E.g., the one that entered is still in the room, under the cloak, and will return after six hours to exit the room.) If she doesn't realize it, can she still time-turn? What if she finds out something that confirms it after five minutes; is she blocked then? Is she blocked if she deduces with high certainty something about the future from the fact that Harry returned? (At the minimum, if she realizes that Harry came back, she has learned that he will not die in the next six hours.)

What if she turns back six hours, and in the past she learns a piece of information that allows her to deduce with whatever level of certainty both that Harry went to the future, and something about what he did there? (Example: at 1PM Harry builds a one-time pad and hides it. 12 hours later, he writes something about his present and XORs it with the one-time pad. He turns back six hours, and tells the encrypted text to Hermione, who memorizes it. Can she turn? She doesn't really know anything more about the future than if he had told her "I have information from six hours in the future". But if she now turns back another six hours and finds the one-time pad, she'll be able to obtain the information
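The one-time-pad step in that example is plain XOR encryption; a minimal sketch (the function name, the message, and the pad source are invented for illustration):

```python
import secrets

# The scheme from the example: Harry makes and hides a pad, later XORs a
# message about his present with it, and hands back only the ciphertext.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

message = b"news from six hours ahead"
pad = secrets.token_bytes(len(message))   # built and hidden at 1PM
ciphertext = xor_bytes(message, pad)      # carried back and told to Hermione

# On its own the ciphertext is information-theoretically uniform: every
# equal-length plaintext is equally consistent with it. Only someone who
# later finds the pad can recover the message.
assert xor_bytes(ciphertext, pad) == message
```

This is exactly what makes the scenario tricky: the ciphertext alone provably carries no information about the future, yet combined with the pad it does.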
gmaxwell 100

are not, as a rule, a different intellectual order than we are

Yes they are -- in the sense that they will have decades to spend ruminating on workarounds, experimenting, consulting with others. And when they find a solution, the result is potentially an easily transmitted whole-class compromise that frees them all at once.

Decades of dedicated human time, teams of humans, etc. are all forms of super-humanity. If you demanded that the same man-hours be spent drafting the language as would be spent under its rule, then I'd agree that there was no differential advantage, but then it would be quite a challenge to write the rule.

7 Lavode
Also, unlike the case of an AI, where you have to avoid crippling it lest it become pointless to build it in the first place, using Unbreakable Vows as a punishment for grand crimes against humanity means that the restraints can be nearly arbitrarily harsh. The people writing the vows have no need to preserve the decision space they leave their victim or respect their autonomy. TL;DR: Voldemort would not be able to spend decades thinking of ways around the vow, because doing so would violate any sensibly formulated vow. (Stray thoughts, sure, you have to permit that, or the vow kills in a day. Sitting down and working at it? No.)
3 [anonymous]
Only in an extremely weak sense. Humans can do and think things that cats just can't, even if they think for a long, long time, or have a bunch of cats working together. The power of a truly superhuman intellect is hard to imagine, and easily underestimated. In any case, the drafter of the rules would have an enormous comparative advantage, because he can unilaterally enforce dictates on the other party, while the other party has no such authority. It's not guaranteed he'll cover all the angles within the human domain, but it's at least possible, unlike in the case of an AI, where such a strategy is basically guaranteed to fail.

Network 1 would work just fine (ignoring how you'd go about training such a thing). Each of the N^2 edges has a weight expressing the relationship between the vertices it connects; e.g., if nodes A and B are strongly anti-correlated, the weight between them might be -1. You then fix the nodes you know, and either solve the system analytically or iterate numerically until it settles down (hopefully!), and then you have expectations for all the unknowns.

Typical networks for this sort of thing don't have cycles so stability isn't a question, but that d... (read more)
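A minimal sketch of the fix-and-iterate procedure described above (a toy relaxation; the tanh squashing, the example weights, and the convergence test are choices made for illustration, not anything specified in the thread):

```python
import numpy as np

# Toy "network 1": N nodes, a symmetric N x N weight matrix where
# W[i, j] expresses how nodes i and j relate (-1 = strongly
# anti-correlated, +1 = strongly correlated). Known nodes are clamped
# to their observed values; the rest are iterated until they settle.
def relax(W, known, n_iter=200):
    x = np.zeros(W.shape[0])
    for i, v in known.items():
        x[i] = v
    for _ in range(n_iter):
        x_new = np.tanh(W @ x)        # each node moves toward its weighted neighbors
        for i, v in known.items():    # re-clamp the known (observed) nodes
            x_new[i] = v
        if np.allclose(x, x_new, atol=1e-6):
            break
        x = x_new
    return x

W = np.array([[ 0.0,  0.9, -1.0],
              [ 0.9,  0.0, -0.5],
              [-1.0, -0.5,  0.0]])
print(relax(W, known={0: 1.0}))       # node 1 settles positive, node 2 negative
```

With cycles present, convergence isn't guaranteed in general, which is the stability question the comment alludes to; the tanh squashing keeps this toy example stable.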