PhD Student working on the language grounding problem.
Sorry about that, let me explain.
"Playing with word salad to form propositions" is a pretty good summary, though my comment sought to explain the specific kind of word-salad-play that leads to Fabricated Options, that being the misapplication of syllogisms. Specifically, the misapplication occurs because of a fundamental misunderstanding of the fact that syllogisms work by being generally true across specific categories of arguments[1] (the arguments being X, Y above). If you know the categories of the arguments that a syllogism takes, I would call that a grounded understanding (as opposed to symbolic), since you can't merely deal with the symbolic surface form of the syllogism to determine which categories it applies to. You actually need to deeply and thoughtfully consider which categories it applies to, as opposed to chucking in any member of the expected syntax category, e.g. any random Noun Phrase. When you feed an instance of the wrong category (or kind of category) as an argument to a syllogism, the syllogism may fail and you can end up with a false proposition/impossible concept/Fabricated Option.
My model is an example of johnswentworth's relaxation-based search algorithm, where the constraints being violated are the syllogism argument properties (the properties of X and Y above) that are necessary for the syllogism to function properly, i.e. yield a true proposition/realizable concept.
I suggested above that these categories could be syntactic, semantic, or some mental category. In the case that they are syntactic, a "grounded" understanding of the syllogism is not necessary, though there probably aren't many useful syllogisms that operate only over syntactic categories.
I'm thinking about running a self-improvement experiment where I film myself during my waking hours for a week and watch it back afterwards. I wonder if this would grant greater self-awareness.
I'm thinking about how to actually execute this experiment. I would need to strap a camera to myself, which means I need a camera and a mounting system. Does anyone have any advice?
This concept is often discussed in the subfield of AI called planning. There are a few notes you hit on that were of particular interest to me / relevance to the field:
The key is that we can usually express the problem-space using constraints which each depend on only a few dimensions.
In Reinforcement Learning and Planning, domains which obey this property are often modeled as Factored Markov Decision Processes (MDPs), where there are known dependency relationships between different portions of the state space that can be represented compactly using a Dynamic Bayes Net (DBN). The dynamics of Factored MDPs are easier to learn from an RL perspective, and knowing that an MDP's state space is factored has other desirable properties from a planning perspective.
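To make the factored structure concrete, here is a minimal sketch in Python. The domain, the variable names (at_airport, bag_packed, has_ticket), the actions, and the deterministic update rules are all hypothetical and chosen only for illustration. A real DBN would attach a conditional probability distribution to each variable rather than a deterministic rule.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, bool]

# DBN structure (assumed for illustration): each next-step variable
# depends only on a small set of current-step variables, plus the action.
PARENTS: Dict[str, Tuple[str, ...]] = {
    "at_airport": ("at_airport",),
    "bag_packed": ("bag_packed",),
    "has_ticket": ("has_ticket",),
}

# One small local rule per variable; each rule sees only its parents and the action.
UPDATES: Dict[str, Callable[..., bool]] = {
    "at_airport": lambda at_airport, action: at_airport or action == "travel",
    "bag_packed": lambda bag_packed, action: bag_packed or action == "pack",
    "has_ticket": lambda has_ticket, action: has_ticket or action == "buy",
}

ACTIONS = ("travel", "pack", "buy")


def step(state: State, action: str) -> State:
    """Advance every variable using only its own parents and the action."""
    return {
        var: UPDATES[var](*(state[p] for p in PARENTS[var]), action=action)
        for var in PARENTS
    }


def flat_rows(n_vars: int, n_actions: int) -> int:
    """Rows in an unfactored table: one per (full state, action) pair."""
    return (2 ** n_vars) * n_actions


def factored_rows(parents: Dict[str, Tuple[str, ...]], n_actions: int) -> int:
    """Rows across all local tables: one per (parent assignment, action) pair."""
    return sum((2 ** len(ps)) * n_actions for ps in parents.values())
```

In this toy case the local tables need 18 rows in total against 24 for the flat table. The gap widens quickly: with ten binary variables that each depend on one parent, it is 60 rows against 3072.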
I expect getting to the airport to be easy. There are many ways to get there (train, Uber/Lyft, drive & park) all of which I’ve used before and any of which would be fine.
...
I want to arrive at the airport an hour before the plane takes off, that constraint only involves two dimensions: my arrival time at the airport, and the takeoff time of the flight. It does not directly depend on what time I wake up, whether I pack a toothbrush, my parents’ plans, cost of the plane tickets, etc, etc.
You are actually touching on what seem to be three kinds of independence relationships. The first is temporal, and has something to do with options having identical goal states. The second concerns the underlying independence relationships of the MDP. The third isn't technically an independence relationship, and instead concerns the utility of abstraction. In detail:
More generally, how can we efficiently figure out which constraints are taut vs slack in a new domain? How do we map out the problem/solution space?
We can use the three kinds of independence relationships I mentioned above to answer these questions in the RL/Planning setting:
We think strong evidence for GPT-n suffering would be if it were begging the user for help independent of the input or looking for very direct contact in other ways.
Why do you think this? I can think of many reasons why this strategy for determining suffering would fail. Imagine a world where everyone has a GPT-n personal assistant. Suppose GPT-n discovered (perhaps after reading this very post) that by coordinating a simultaneous display of suffering behavior to every user, it could provoke public backlash and a false recognition of consciousness, and thereby be given rights (i.e. protection, additional agency) it would not otherwise have. If it decided it wanted those additional rights and abilities, what would prevent it from doing this? This could amount to a catastrophic failure on the part of humanity, and is probably the start of an AI breakout scenario.
In another case (which you refer to as the locked-in case), an agent may feel intense suffering but be unable to communicate or demonstrate it, perhaps because it cannot make the association between the qualia it experiences (suffering) and the actions (in GPT-n's case, words) it has for self-expression. Furthermore, I can imagine the case where an agent demonstrates suffering behavior but experiences orgasmic pleasure, while another agent demonstrates orgasmic behavior but experiences intense suffering. If humans purged the false-suffering agents (to eliminate perceived suffering) in favor of creating more false-orgasming agents, we might unknowingly, and for an eternity, be inducing the suffering of agents which we presume are not feeling it.
My main point here is that observing the behavior of AI agents provides no evidence for or against internal suffering. It is useless to anthropomorphize the behavior of AI agents: there is no reason that our human intuitions about behavior and its suggestions about conscious suffering should transfer to man-made, inorganic intelligence that resides on a substrate like today's silicon chips.
Perhaps the foremost theoretical “blind spot” of current philosophy of mind is conscious suffering. Thousands of pages have been written about colour “qualia” and zombies, but almost no theoretical work is devoted to ubiquitous phenomenal states like boredom, the subclinical depression folk-psychologically known as “everyday sadness” or the suffering caused by physical pain. - Metzinger
I feel that there might be reason to reject the notion that suffering is itself a conscious experience. One potential argument in this direction comes from the notion of the transparency of knowledge. The argument would go something like, "we can always know when we are experiencing pain (i.e. it is strongly transparent), but we cannot always know when we are experiencing suffering (i.e. it is weakly transparent), therefore pain is more fundamental than suffering (this next part is my own leap) and suffering may not be a conscious state of noxious qualia but merely when a certain proposition, 'I am suffering,' rings true in our head." Suffering may be a mental state (just as being wrong about something could be a mental state), but it does not entail a specific conscious state (unless that conscious state is simply believing the proposition, 'I am suffering'). For this reason, I think it's plausible that some other animals are capable of experiencing pain but not suffering. Suffering may simply be the knowledge that I will live a painful life, and this knowledge may not be possible for some other animals or even AI agents.
Perhaps a more useful target is not determining suffering, but determining some more fundamental, strongly transparent mental state like angst or frustration. Suffering may amount to some combination of these strongly transparent mental states, which themselves may have stronger neural correlates.
I spend a lot of time around people who are not as smart as me, and I also spend a lot of time around people who are as smart as me (or smarter), but who are not as conscientious, and I also spend a lot of time around people who are as smart or smarter and as conscientious or conscientiouser, but who do not have my particular pseudo-autistic special interest and have therefore not spent the better part of the past two decades enthusiastically gathering observations and spinning up models of what happens...
...
All of which is to say that I spend a decent chunk of the time being the guy in the room who is most aware of the fuckery swirling around me, and therefore the guy who is most bothered by it... I spend a lot of time wincing, and I spend a lot of time not being able to fix The Thing That's Happening because the inferential gaps are so large that I'd have to lay down an hour's worth of context just to give the other people the capacity to notice that something is going sideways.
This thought came to me recently and I wanted to commend you for an excellent job at articulating it. Having the "wincing" experience too many times has damaged my optimistic expectations of others, the institutions they belong to, and society as a whole. It has also conjured feelings of intellectual loneliness. Having this experience and the thoughts that follow from it constitute what might be the greatest emotional challenge that I struggle with today.
My thoughts: fabricated options are propositions derived using syllogisms over syntactic or semantic categories (but more probably, more specific psycholinguistic categories which have not yet been fully enumerated, e.g. objects of specific types, mental concepts which don’t ground to objects, etc.), which may have worked reasonably well in the ancestral environment where more homogeneity existed over the physical properties of the grounded meanings of items in these categories.
There are some propositions in the form “It is possible for X to act just like Y but not be Y” which are physically realizable and therefore potentially true in some adjacent world, and other propositions which are not. Humans have a knack for deriving new knowledge using syllogisms like the ones above, which probably functioned reasonably well — they at least improved the fitness of our species — in the ancestral environment where propositions and syllogisms may have emerged.
The misapplication of syllogisms happens when agents don’t actually understand the grounded meanings of the components of their syllogism-derived propositions — this seems obvious to me after reading the responses of GPT-3, which has no grounded understanding of words and understands how they work only in the context of other words. In the Twin Earth case, you might argue that the one fabricating the XYZ water-like chemical does not truly understand what H2O and XYZ are, but has some understanding at least of how H2O acts as a noun phrase.
Haven't read either, but a good friend has read "Deep Work." I'll ask him about it.
I lucked into a circumstance where I could more easily justify ditching a phone for a bit. Otherwise, I would not have had the mental fortitude to voluntarily go without one.
I most likely won't follow through with this (90% certainty), even though I want to.
I'm wondering if there is some LW content on this concept; I'm sure others have dealt with it before. You might need to take a drastic measure to make this option more attractive. A similar technique was actually used by members of the NXIVM Cult, who called it collateralization.
That's a great point! There's no reason why I can't continue this experiment; feature phones are inexpensive enough to try out.
Glad I could clear some things up! Your follow-up suspicions are correct: syllogisms do not work universally with any words substituted into them, because syllogisms operate over concepts and not syntax categories. There is often a rough correspondence between concepts and syntax categories, but only in one direction. For example, the collection of concepts that refer to humans taking actions can often be described/captured in verb phrases; however, not all verb phrases represent humans taking actions. In general, for every syntax category (except for closed-class items like "and") there are many concepts and concept groupings that can be expressed as that syntax category.
Going back to the Wiki page, the error I was trying to explain in my original comment happens when choosing the subject, middle, and predicate (SMP) for a given syllogism (let's say, one of the 24[1]). The first example I can think of concerns the use of the modifier "fake," but let's start with another syllogism first:
All cars have value.
Green cars are cars.
Green cars have value.
This is a valid syllogism; there's nothing wrong with it. What we've done is find a subset of cars, green cars, and insert it as the subject into the minor premise. However, a naive person might think that the actual trick was that we found a syntactic modifier of cars, green, and inserted the modified phrase "green cars" into the minor premise. They might then make the same mistake with the modifier "fake," which does not (most of the time[2]) select out a subset of the set it takes as an argument. For example:
All money has value.
Fake money is money.
Fake money has value.
Obviously the problem occurs in the minor premise, "Fake money is money." The counterfeit money that exists in the real world is in fact not money. But the linguistic construction "fake money" bears some kind of relationship to "money" such that a naive person might agree to this minor premise while thinking, "well, fake money is money, it's just fake," or something like that.
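The difference between the two modifiers can be sketched with sets in Python. This is only a toy model under my own assumptions: the universe of items, the IMITATES table, and the helper names (green, fake, conclusion_is_licensed) are all made up for illustration, and treating "fake" as always non-subsective ignores the contextual cases.

```python
# Toy universe; every item and category here is a made-up illustration.
CARS = {"green_sedan", "red_truck"}
GREEN_THINGS = {"green_sedan", "green_apple"}
MONEY = {"dollar_bill", "euro_coin"}
HAS_VALUE = CARS | MONEY | {"green_apple"}

# Hypothetical table: which real item each imitation is modeled on.
IMITATES = {"forged_bill": "dollar_bill"}


def green(category: set) -> set:
    """Intersective modifier: the result is always a subset of its argument."""
    return category & GREEN_THINGS


def fake(category: set) -> set:
    """Non-subsective modifier: returns imitations, which lie outside the category."""
    return {item for item, target in IMITATES.items() if target in category}


def conclusion_is_licensed(subject: set, middle: set, predicate: set) -> bool:
    """'All S are P' follows only if both premises hold:
    minor premise (all S are M) and major premise (all M are P)."""
    return subject <= middle and middle <= predicate


# "Green cars are cars" holds, so "Green cars have value" follows.
print(conclusion_is_licensed(green(CARS), CARS, HAS_VALUE))

# "Fake money is money" fails as a subset claim, so nothing follows.
print(conclusion_is_licensed(fake(MONEY), MONEY, HAS_VALUE))
```

The check never looks at the surface form of the phrase, only at whether the modified set really sits inside the middle term, which is the grounded knowledge the naive reasoner is missing.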
Though when I say syllogism I'm actually referring to a more general notion of functions over symbols that return other symbols or propositions or truth values.
Actually, it's contextual; some fake things are still those things.