My intuition is that you got downvoted for the lack of clarity about whether you're responding to me [my raising the potential gap in assessing outcomes for self-driving], or the article I referenced.
For my part, I also think that coning-as-protest is hilarious.
I'm going to give you the benefit of the doubt and assume that was your intention (and not contribute to downvotes myself.) Cheers.
To expand on what dkirmani said:
- Holz was allowed to drive discussion...
- This standard set of responses meant that Holz knew ...
- Another pattern was Holz asserting
- 24:00 Discussion of Kasparov vs. the World. Holz says
Or, to quote dkirmani:
4 occurrences of "Holz"
To be clear, are you arguing that assuming a general AI system to be able to reason in a similar way is anthropomorphizing (invalidly)?
No, instead I'm trying to point out the contradiction inherent in your position...
On the one hand, you say things like this, which would be read as "changing an instrumental goal in order to better achieve a terminal goal"
You and I can both reason about whether or not we would be happier if we chose to pursue different goals than the ones we are now
And on the other you say
...I dislike the way that "terminal" goals are
I don't know that there is a single counterargument, but I would generalize across two groupings:
The first group involves religious people who are capable of applying rationality to their belief systems when pressed. For those, if they espouse a "god will save us" position (in the physical world), then I'd suggest the best way to approach them is to call out the contradiction within their stated beliefs: e.g., ask first "do you believe that god gave man free will?" and, if so, "wouldn't saving us from our bad choices obviate free will?"
That's...
One question that comes to mind is, how would you define this difference in terms of properties of utility functions? How does the utility function itself "know" whether a goal is terminal or instrumental?
I would observe that partial observability makes answering this question extraordinarily difficult. We lack interpretability tools that would give us the ability to know, with any degree of certainty, whether a set of behaviors is an expression of an instrumental or a terminal goal.
Likewise, I would observe that the Orthogonality Thesis proposes the pos...
A fair point. I should have originally said "Humans do not generally think..."
Thank you for raising that exceptions are possible and that there are philosophies that encourage people to release the pursuit of happiness, focus solely internally, and/or transcend happiness.
(Although, I think it is still reasonable to argue that these are alternate pursuits of "happiness", these examples drift too far into philosophical waters for me to want to debate the nuance. I would prefer instead to concede simply that there is more nuance than I originally stated.)
First, thank you for the reply.
So "being happy" or "being a utility-maximizer" will probably end up being a terminal goal, because those are unlikely to conflict with any other goals.
My understanding of the difference between a "terminal" and "instrumental" goal is that a terminal goal is something we want, because we just want it. Like wanting to be happy.
Whereas an instrumental goal is instrumental to achieving a terminal goal. For instance, I want to get a job and earn a decent wage, because the things that I want to do that make me happy cost money...
Whoever downvoted... would you do me the courtesy of expressing what you disagree with?
Did I miss some reference to public protests in the original article? (If so, can you please point me towards what I missed?)
Do you think public protests will have zero effect on self-driving outcomes? (If so, why?)
An AI can and will modify its own goals (as do we / any intelligent agent) under certain circumstances, e.g., that its current goals are impossible.
This sounds like you are conflating a shift in terminal goals with the introduction of new instrumental (temporary) goals.
Humans don't think "I'm not happy today, and I can't see a way to be happy, so I'll give up the goal of wanting to be happy."
Humans do think "I'm not happy today, so I'm going to quit my job, even though I have no idea how being unemployed is going to make me happier. At least I won't be made un...
These tokens already exist. It's not really creating a token like " petertodd". Leilan is a name but " Leilan" isn't a name, and the token isn't associated with the name.
If you fine tune on an existing token that has a meaning, then I maintain you're not really creating glitch tokens.
Good find. What I find fascinating is the fairly consistent responses using certain tokens, and the lack of consistent responses using other tokens. I observe that in a Bayesian network, a lack of consistent response would suggest that the network was uncertain, while consistency would indicate certainty. It makes me very curious how such ideas apply to glitch tokens and what causes the variability in response consistency.
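To make that analogy concrete, here is a minimal sketch (my own illustration; the repeated completions below are hypothetical) of how one might score agreement among repeated samples for the same prompt and treat low agreement as a proxy for uncertainty:

```python
from collections import Counter
from math import log2

def consistency_score(responses: list[str]) -> float:
    """1.0 when every sampled response agrees; approaches 0.0 as the
    responses spread uniformly over distinct answers (1 - normalized entropy)."""
    counts = Counter(responses)
    n = len(responses)
    if len(counts) <= 1:
        return 1.0
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return 1.0 - entropy / log2(len(counts))

# Hypothetical repeated completions for the same prompt at non-zero temperature.
consistent_replies = ["a guardian figure"] * 10
scattered_replies = ["distribute", "newcom", "?", "disperse", "", "d", "n", "x", "??", "advert"]

print(consistency_score(consistent_replies))  # 1.0 -> reads as "certain"
print(consistency_score(scattered_replies))   # 0.0 -> reads as "uncertain"
```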
... I utilized jungian archetypes of the mother, ouroboros, shadow and hero as thematic concepts for GPT 3.5 to create the 510 stories.
These are tokens that would already exist in the GPT. If you fine tune new writing to these concepts, then your fine tuning will influence the GPT responses when those tokens are used. That's to be expected.
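As a quick sanity check on that point (a sketch, assuming the tiktoken library and the r50k_base GPT-2/GPT-3 vocabulary in which the reported glitch tokens live), one can test whether a string already maps to a single token before any fine-tuning:

```python
import tiktoken  # OpenAI's BPE tokenizer library

# r50k_base is the GPT-2/GPT-3-era vocabulary where the widely discussed
# glitch tokens (" petertodd", " Leilan", ...) were reported.
enc = tiktoken.get_encoding("r50k_base")

for text in [" Leilan", " petertodd", "ouroboros", "shadow"]:
    ids = enc.encode(text)
    status = "already a single token" if len(ids) == 1 else f"split into {len(ids)} pieces"
    print(f"{text!r}: {ids} -> {status}")
```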
Hmmm let me try and add two new tokens to try, based on your premise.
If you want to review, ping me direct. Offer stands if you need to compare your plan against my proposal. (I didn't think that was necessary, b...
@Mazianni what do you think?
First, the URLs you provided don't support your assertion that you created tokens, and second:
Like since its possible to create the tokens, is it possible that some researcher in OpenAI has a very peculiar reason to enable this complexity create such logic and inject these mechanics.
Occam's Razor.
I think it's best not to infer intention on OpenAI's part when accident will do.
I would lean on the idea that GPT3 found these patterns and figured it would be interesting to embedd these themes into these tokens
I don't think you di...
My life is similar to @GuySrinivasan's description of his. I'm on the autism spectrum, and I found that faking it (masking) negatively impacted my relationships.
Interestingly, I found that taking steps to prevent overimitation (by which I mean, presenting myself not as an expert, but as someone who is always looking for corrections whenever I make a mistake) makes people much more willing to truly learn from me, and simultaneously, much more willing to challenge me for understanding when what I say doesn't make a lot of sense to them... this serves the d...
In a general sense (not related to Glitch tokens) I played around with something similar to the spelling task (in this article) for only one afternoon. I asked ChatGPT to show me the number of syllables per word as a parenthetical after each word.
For (1) example (3) this (1) is (1) what (1) I (1) convinced (2) ChatGPT (4) to (1) do (1) one (1) time (1).
I was working on parody song lyrics as a laugh and wanted to get the meter the same, thinking I could teach ChatGPT how to write lyrics that kept the same syllable count per 'line' of lyrics.
I stopped when C...
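For what it's worth, the annotation format itself is easy to produce deterministically; here is a rough sketch (my own illustration, a crude vowel-group heuristic rather than a dictionary-accurate syllable counter) of the kind of output I was asking the model for:

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, with a rough adjustment for a
    silent trailing 'e'/'ed'/'es'. Not dictionary-accurate."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    groups = len(re.findall(r"[aeiouy]+", w))
    if groups > 1 and re.search(r"[^aeiouy]e[sd]?$", w) and not w.endswith("le"):
        groups -= 1
    return max(groups, 1)

def annotate(line: str) -> str:
    """Append a parenthetical syllable count after each word."""
    return " ".join(f"{w} ({count_syllables(w)})" for w in line.split())

print(annotate("I was working on parody song lyrics as a laugh"))
# I (1) was (1) working (2) on (1) parody (3) song (1) lyrics (2) as (1) a (1) laugh (1)
```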
a properly distributed training data can be easily tuned with a smaller more robust dataset
I think this aligns with human instinct. While it's not always true, I think that humans are compelled to constantly work to condense what we know. (An instinctual byproduct of knowledge portability and knowledge retention.)
I'm reading a great book right now that talks about this and other things in neuroscience. It has some interesting insights for my work life, not just my interest in artificial intelligence.
As a for instance: I was surprised to learn that someo...
I expect you likely don't need any help with the specific steps, but I'd be happy (and interested) to talk over the steps with you.
(It seems, at a minimum, you would add tokens to the tokenizer that are not present in the training data that you're training on... and then do before-and-after comparisons of how the GPT responds to the intentionally created glitch token. Before, the term will be broken into its parts and the GPT will likely respond that what you said was essentially nonsense... but once a token exists for the term, without any specific training on the term... it seems like that's where 'the magic' might happen.)
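A minimal sketch of that before-and-after setup (assuming the Hugging Face transformers stack; the model name and the made-up string are illustrative, and the fine-tuning pass itself is omitted):

```python
# Register a brand-new token, leave it untrained, and compare tokenization.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM with a resizable embedding works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

candidate = "xyzzblorth"  # hypothetical string with no existing single token

print("before:", tokenizer.encode(candidate))  # falls apart into sub-word pieces

tokenizer.add_tokens([candidate])              # now one dedicated token id
model.resize_token_embeddings(len(tokenizer))  # new embedding row, randomly initialized

print("after:", tokenizer.encode(candidate))   # a single, never-trained id
# Fine-tuning on text that deliberately never contains `candidate` would leave
# that embedding row effectively untrained; the before-vs-after comparison of
# prompts containing it is where any glitch-like behavior should show up.
```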
related but tangential: Coning self driving vehicles as a form of urban protest
I think public concerns and protests may have an impact on the self-driving outcomes you're predicting. And since I could not find any indication in your article that you are considering such resistance, I felt it should be at least mentioned in passing.
Gentle feedback is intended
This is incorrect, and you're a world class expert in this domain.
The proximity of the subparts of this sentence reads, to me, on first pass, like you are saying that "being incorrect is the domain in which you are a world class expert."
After reading your responses to O O, I deduce that this is not your intended message, but I thought it might be helpful to explain how your choice of wording might be seen as antagonistic. (And also explain my reaction mark to your comment.)
For others who have not seen the reph...
You make some good points.
For instance, I did not associate "model collapse" with artificial training data, largely because of my scope of thinking about what 'well crafted training data' must look like (in order to qualify for the description 'well crafted.')
Yet, some might recognize the problem of model collapse and the relationship between artificial training data and my speculation and express a negative selection bias, ruling out my speculation as infeasible due to complexity and scalability concerns. (And they might be correct. Certainly the scope of...
Similarly, I would propose (to the article author) a hypothesis that 'glitch tokens' are tokens that were added to the vocabulary prior to pre-training but whose training data may have been omitted after tokenization. For example, after tokenizing the training data, the engineer realized upon reviewing the tokens to be learned that the content was plausibly non-useful (e.g., the counting forum from reddit). Then, instead of continuing with training on that batch, they skipped to the next one.
In essence, human error. (The batch wasn't reviewed before tokenization to omit...
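One rough way to probe this hypothesis (a sketch, not something from the article; GPT-2 is used only because its weights are public, and the cutoff of 20 tokens is arbitrary) is to look for vocabulary entries whose embeddings sit unusually close to the mean embedding, which is roughly what you would expect for a token that was allocated but then rarely or never updated during training:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

emb = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_dim)
centroid = emb.mean(dim=0)
dist = (emb - centroid).norm(dim=1)

# Tokens closest to the centroid are candidates for "in the vocabulary,
# but barely moved by training."
for idx in dist.argsort()[:20].tolist():
    print(f"{dist[idx].item():.3f}  {tokenizer.decode([idx])!r}")
```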
I'm curious to know what people are downvoting.
For my part, I see some potential benefits from some of the core ideas expressed here.
Aligning with the reporter
There’s a superficial way in which Sydney clearly wasn’t well-aligned with the reporter: presumably the reporter in fact wants to stay with his wife.
I'd argue that the AI was completely aligned with the reporter, but that the reporter was self-unaligned.
My argument goes like this:
I read Reward is not the optimisation target as a result of your article. (It was a link in the 3rd bullet point, under the Assumptions section.) I downvoted that article and upvoted several people who were critical of it.
Near the top of the responses was this quote.
...... If this agent is smart/reflective enough to model/predict the future effects of its RL updates, then you already are assuming a model-based agent which will then predict higher future reward by going for the blueberry. You seem to be assuming the bizarre combination of model-based predict
I don't consent to the assumption that the judge is aligned earlier, and that we can skip over the "earlier" phase to get to the later phase where a human does the assessment.
I also don't consent to the other assumptions you've made, but the assumption about the Judge alignment training seems pivotal.
Take your pick: fallacy of ad nauseam, or fallacy of circular reasoning.
If N (judge 2 is aligned), then P (judge 1 is aligned), and if P then Q (agent is aligned)
ad infinitum
or
If T (alignment of the judge) implies V (alignment of the agent), a
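Sketching just the regress branch a little more explicitly (the subscripts are mine, standing in for the chain of judges):

$$A(J_1) \Rightarrow A(\text{agent}), \qquad A(J_{n+1}) \Rightarrow A(J_n) \quad \text{for } n = 1, 2, 3, \ldots$$

so establishing $A(\text{agent})$ along this route either requires an unbounded chain of judges whose alignment is itself established, or bottoms out in a judge whose alignment is simply assumed.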
... I apply Occam's Razor to the analysis of your post, whereby I see the problem inherent in the post as simply "if you can align the Judge correctly, then the more complex game theory framework might be unnecessary bloat."
Formally, I read your post as this:
If P [the judge is aligned], then Q [the agent is aligned].
Therefore, it would seem simpler to just establish P, from which Q follows, to solve the problem.
But you don't really talk about aligning the judge agent. It's not listed in your assumptions. The assumption that the judge is aligned has been smuggled in. (A definis...
I've been working fully remotely and have meaningfully contributed to global organizations without physical presence for over a decade. I see parallels with anti-remote and anti-safety arguments.
I've observed the robust debate regarding 'return to work' vs 'remote work,' with many traditional outlets proposing 'return to work' based on a series of common criteria. I've seen 'return to work' arguments assert remote employees are lazy, unreliable or unproductive when outside the controlled work environment. I would generalize...
For my part, this is the most troubling part of the proposed project (the one that the article assesses; the link to the project is in the article, above).
... convincing nearly 8 billion humans to adopt animist beliefs and mores is unrealistic. However, instead of seeing this state of affairs as an insurmountable dead-end, we see it as a design challenge: can we build (or rather grow) prosthetic brains that would interact with us on Nature’s behalf?
Emphasis by original author (Gaia architecture draft v2).
It reads like a strange mix of forced religious indoctrinati...
I've ruminated about this for several days. As an outsider to the field of artificial intelligence (coming from an IT technical space, with an emphasis on telecom and large call centers, which are complex systems where interpretability has long held significant value for the business org) I have my own perspective on this particular (for the sake of brevity) "problem."
For my part, I wrote a similarly sized article not for the purposes of posting, but to organize my thoughts. And then I let that sit. (I will not b...
I understand where you're going, but doctors, parents, and firefighters do not possess 'typical godlike attributes' such as omniscience, omnipotence, and a declared intent not to use such powers in a way that would obviate free will.
Nothing about humans saving other humans using fallible human means is remotely the same as a god changing the laws of physics to effect a miracle. And one human taking actions does not obviate the free will of another human. But when God can, through omnipotence, set up scenarios so that you have no choice at all......