All of derek shiller's Comments + Replies

In the poetry case study, we had set out to show that the model didn't plan ahead, and found instead that it did.

This is a very interesting change (?) from earlier models. I wonder if this is a poetry-specific mechanism given the amount of poetry in the training set, or the application of a more general capability. Do you have any thoughts either way?

7Adam Jermyn
I would guess that models plan in this style much more generally. It's just useful in so many contexts. For instance, if you're trying to choose what article goes in front of a word, and that word is fixed by other constraints, you need a plan of what that word is ("an astronomer" not "a astronomer"). Or you might be writing code and have to know the type of the return value of a function before you've written the body of the function, since Python type annotations come at the start of the function in the signature. Etc. This sort of thing just comes up all over the place.
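To make the Python example concrete, here is a minimal sketch (the function, its name, and its behaviour are invented purely for illustration): the return type annotation sits in the signature, so it is committed to before any of the body is written.

```python
# The return type is declared up front, in the signature...
def parse_scores(raw: str) -> dict[str, float]:
    # ...so the body, written afterwards, has to conform to that earlier "plan".
    return {name: float(value) for name, value in
            (line.split(",") for line in raw.splitlines())}

# parse_scores("alice,0.9\nbob,0.7") -> {'alice': 0.9, 'bob': 0.7}
```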

Thanks for your comments!

My sense is that you are writing this as someone without much experience in writing and publishing scientific articles (correct me if I am wrong).

You're correct in that I haven't published any scientific articles -- my publication experience is entirely in academic philosophy and my suggestions are based on my frustrations there. This may be a much more reasonable proposal for academic philosophy than other disciplines, since philosophy deals more with conceptually nebulous issues and has fewer objective standards.

linearly present

...
1ashen
One recent advancement in science writing (stemming from psychology and spreading from there) has been pre-registration and the pre-registered format. Pre-registration often takes the form of a form - which is effectively a dialogue - where you have to answer a set of questions about your design. This forces a kind of thinking that otherwise might not happen before you run a study, which improves the clarity and openness of the thought processes that go into designing a study. One thing it can highlight is that we are often very unclear about how we might actually properly test a theory. In the standard paper format one can get away with this more - such as through HARKing, or a review process where it is not found out. This is relevant to philosophy, but in psychology/science the format of running and reporting on an experiment is very standard.

I was thinking of a test of a good methods and results section: it would be of sufficient clarity and detail that an LLM could take your data and description and run your analysis. Of course, one should also provide your code anyway, but it is a good test even so. So for the methods and results, an avatar does not seem particularly helpful, unless it is effectively a more advanced version of a form.

For the introduction and discussion, a different type of thinking occurs. The trend over time has been toward shorter introduction and discussion sections, even though page limits have ceased to be a limiting factor. There are a few reasons for this, but I don't see this trend getting reversed.

Now, it is interesting that you say you can use an avatar to get feedback on your work and so on. You don't explicitly raise the fact that scientists are already using LLMs to help them write papers. So instead of framing it as an avatar helping clarify the author's thinking, what will inevitably happen in many cases is that LLMs will fill in thinking, and create novel thinking - in other words, a paper will have LLMs a co-

I don’t follow. How is it easier (or more special as an opportunity) to decide how to relate to an AI system than to a chicken or a distant human?

I think that our treatment of animals is a historical problem. If there were no animals, if everyone was accustomed to eating vegetarian meals, and then you introduced chickens into the world, I believe people wouldn't be inclined to stuff them into factory farms and eat their flesh. People do care about animals where they are not complicit in harming them (whaling, dog fighting), but it is hard for most peop...

3Dagon
Ah, I think we have a fundamental disagreement about how the majority of humans think about animals and each other. If the world were vegetarian, and someone created chickens, I think it would NOT lead to many chickens leading happy chicken lives. It would either be an amusing one-time lab experiment (followed by death and extinction) or the discovery that they're darned tasty and a very concentrated and portable source of nutrition, which leads to creating them for the primary purpose of food.

I'm not sure wireheading an AI (so it's happy no matter what) is any more (or less) acceptable than doing so to chickens (by evolving smaller brains and larger breasts).

Turing test is sentient

I'm not sure why we should think that the Turing test provides any evidence regarding consciousness. Dogs can't pass the test, but that is little reason to think that they're not conscious. Large language models might be able to pass the test before long, but it looks like they're doing something very different inside, and so the fact that they are able to hold conversations is little reason to think they're anything like us. There is a danger with being too conservative. Sure, assuming sentience may avoid causing unnecessary harm...

I'm not particularly worried that we may harm AIs that do not have valenced states, at least in the near term. The issue is more over precedent and expectations going forward. I would worry about a future in which we create and destroy conscious systems willy-nilly because of how it might affect our understanding of our relationship to them, and ultimately to how we act toward AIs that do have morally relevant states. These worries are nebulous, and I very well might be wrong to be so concerned, but it feels risky to rush into things.

We've been struggling with natural consciousnesses, both human and animal, for a long long time, and it's not obvious to me that artificial consciousness can avoid any of that pain.

You're right, but there are a couple of important differences:

  • There is widespread agreement on the status of many animals. People believe most tetrapods are conscious. The terrible stuff we do to them is done in spite of this.
  • We have a special opportunity at the start of our interactions with AI systems to decide how we're going to relate to them. It is better to get thing...
2Dagon
Only to the extent that “conscious” doesn’t carry any weight or expectation of good treatment. There is very little agreement on what an animal’s level of consciousness means in terms of life or happiness priority compared to any human.

I don’t follow. How is it easier (or more special as an opportunity) to decide how to relate to an AI system than to a chicken or a distant human?

Really? Given the amount of change we’ve caused in natural creatures, the amount of effort we spend in controlling/guiding fellow humans, and the difficulty in defining and measuring this aspect of ANY creature, I can’t agree. (I can’t strongly disagree either, though, because I don’t really understand what this means.)

Thanks for writing this up! It's great to see all of the major categories after having thought about it for a while. Given the convergence, does this change your outlook on the problem?

If you try to give feedback during training, there is a risk you'll just reward it for being deceptive. One advantage to selecting post hoc is that you can avoid incentivizing deception.

1Thomas
I agree with you (both). It's a framing difference iff you can translate back and forth. My thinking was that the problem might be set up so that it's "easy" to recognize but difficult to implement - if you can define a strategy which sets it up to be easy to recognize, that is. Another way I thought about it is that you can use your 'meta' knowledge about human imitators versus direct translators to give you a probability over all reporters: approaching the problem not with certainty of a solution but with recognized uncertainty (I refrain from using 'known uncertainty' here, because knowing how much you don't know something is hard).

I obviously don't have a good strategy, else my name would be up above, haha. But to give it my attempt: one way to get this knowledge is to study, for example, the way NNs generalize, and how likely it is that you create a direct translator versus an imitator. The proposal I sent in used this knowledge and subjected a lot of reporters to random input. In the complex input space (outside the simple training set), translators should behave differently from imitators. Given that direct translators are created less frequently than imitators (which is what the report suggested), the smaller group, or the outliers, are more likely to be translators than imitators. Using this approach you can build evidence. That's where I stopped, because time was up. And that is why I was interested whether there were any others using this approach who might be more successful than me (if there were any, they probably would be).

EDIT to add: I agree it is difficult. But it's also difficult to have overparameterized models converge on a guaranteed single point on the ridge of all good solutions, so not having to do that would be great. But a conclusion of this contest could be that we have no other option, because we definitively have no way of recognizing good from bad.
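To sketch what that evidence-building step might look like (none of this code is from the actual proposal; the names and the simple median-distance score are assumptions for illustration): query every candidate reporter on random out-of-distribution inputs and treat the ones whose answers deviate most from the majority as more likely to be direct translators.

```python
import numpy as np

def outlier_scores(reporters, random_inputs):
    """Score each reporter by how far its answers sit from the ensemble majority.

    reporters: list of callables, each mapping a batch of inputs to a 1-D array
        of answers (e.g. probabilities that the diamond is still in the vault).
    random_inputs: a batch of random, out-of-distribution inputs.
    Higher score = further from the crowd = (on the assumption that imitators
    are the more common outcome of training) more likely a direct translator.
    """
    answers = np.stack([r(random_inputs) for r in reporters])  # (n_reporters, n_inputs)
    consensus = np.median(answers, axis=0)                     # majority behaviour
    return np.abs(answers - consensus).mean(axis=1)            # one score per reporter

# Usage sketch: rank reporters and keep the most outlying ones as candidates.
# scores = outlier_scores(reporters, random_inputs)
# candidates = np.argsort(scores)[::-1][:5]
```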

Interesting proposal!

Is there a reason you have it switch to the human net just once in the middle?

I would worry that the predictor might switch ontologies as time goes on. Perhaps, to make the best use of the human compute time, it reasons in a human ontology up until n/2. Once the threat of translation is past, it might switch to its own ontology from n/2 to n. If so, the encoder that works up to n/2 might be useless thereafter. A natural alternative would be to have it switch back and forth some random number of times at random intervals.
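As a minimal sketch of that random-switching alternative (the function and parameter names here are made up), one could sample several switch points over the horizon n instead of a single switch at n/2:

```python
import random

def sample_switch_points(n, max_switches=4, seed=None):
    """Pick a random number of time steps in 1..n-1 at which to switch the
    predictor between the human net and its own net, rather than switching
    exactly once at n/2."""
    rng = random.Random(seed)
    k = rng.randint(1, max_switches)
    return sorted(rng.sample(range(1, n), k))

# e.g. sample_switch_points(100) might give [17, 58, 83]
```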

2TurnTrout
Later in the post, I proposed a similar modification: