Joseph Miller

Comments

The ARENA curriculum is very good.

It does seem pretty suspicious.

I'm like 98% confident this was not foul play, partly because I doubt whatever evidence he had would be that important to the court case, and obviously his death is going to draw far more attention to his view.

However, 98% is still quite worrying and I wish I could be >99% confident. I will be interested to see if there is further evidence. Given OpenAI's very shady behavior with the secret non-disparagement agreements that came out a few months ago, it doesn't seem completely impossible they might do this (but still very very unlikely imo).

The probability of a 26 year old dying of suicide in any given month (within the month of being named the key witness in the OpenAI copyright case, right before deposition) is roughly 1 in 100,000

This prior is a useful starting point, but you've definitely got to account for the stress of leaving OpenAI and going through a lawsuit.

(I downvoted this post for combative tone.)

One of the striking parts is that it sounds like all the pretraining people are optimistic

What's the source for this?

I started working on PhD applications about 12 days ago. I expect to have fairly polished applications for the first deadline on December 1, despite not working on this full time. So I think it's quite possible to do applications for the December 15 deadlines. You would need to contact your referees (and potential supervisors for UK universities) in the next couple of days.

There are two types of people in this world.

There are people who treat the lock on a public bathroom as a tool for communicating occupancy and a safeguard against accidental attempts to enter when the room is unavailable. For these people the standard protocol is to discern the likely state of engagement of the inner room and then tentatively proceed inside if they detect no signs of human activity.

And there are people who view the lock on a public bathroom as a physical barricade with which to temporarily defend possessed territory. They start by giving the door a hearty push to test the tensile strength of the barrier. On meeting resistance they engage with full force, wringing the handle up and down and slamming into the door with their full body weight. Only once their attempts are thwarted do they reluctantly retreat to find another stall.

Tarbell Fellowship at PPF

I think you've massively underrated this. My impression is that Tarbell has had a significant effect on the general AI discourse, by allowing a number of articles to be written in mainstream outlets.

karma should also transfer automatically

Unconferences are a thing for this reason

This is fantastic technical writing. It would have taken me hours to understand these papers this deeply, but you convey the core insights quickly in an entertaining and understandable way.

If there are ‘subshards’ which achieve this desirable behavior because they, from their own perspective, ‘intrinsically’ desire power (whatever that sort of distinction makes when you’ve broken things down that far), and it is these subshards which implement the instrumental drive... so what? After all, there has to be some level of analysis at which an agent stops thinking about whether or not it should do some thing and just starts doing the thing. Your muscles “intrinsically desire” to fire when told to fire, but the motor actions are still ultimately instrumental, to accomplish something other than individual muscles twitching. You can’t have ‘instrumental desire’ homunculuses all the way down to the individual transistor or ReLU neuron.

I sent this paragraph to TurnTrout as I was curious to get his reaction. Paraphrasing his response below:

No, that's not the point. That's actually the opposite of what I'm trying to say. The subshards implement the algorithmic pieces and the broader agent has an "intrinsic desire" for power. The subshards themselves are not agentic, and that's why (in context) I substitute them in for "circuits".

It's explained in this post that I linked to. Though I guess in context I do say "prioritize" in a way that might be confusing. Shard Theory argues against homunculist accounts of cognition by considering the mechanistic effects of reinforcement processes. Also, the subshards are not implementing an instrumental drive in the sense of "implementing the power-seeking behavior demanded by some broader consequentialist plan"; they're just seeking power, just 'cuz.

From my early post: Inner and Outer Alignment Decompose One Hard Problem Into Two Extremely Hard Problems

I literally do not understand what the internal cognition is supposed to look like for an inner-aligned agent. Most of what I’ve read has been vague, on the level of “an inner-aligned agent cares about optimizing the outer objective.”

Charles Foster comments:

  • "We are attempting to mechanistically explain how an agent makes decisions. One proposed reduction is that inside the agent, there is an even smaller inner agent that interacts with a non-agential evaluative submodule to make decisions for the outer agent. But that raises the immediate questions of “How does the inner agent make its decisions about how to interact with the evaluative submodule?” and then “At some point, there’s gotta be some non-agential causal structure that is responsible for actually implementing decision-making, right?” and then “Can we just explain the original agent’s behavior in those terms? What is positing an externalized evaluative submodule buying us?"

Perhaps my emphasis on mechanistic reasoning and my unusual level of precision in my speculation about AI internals, perhaps these make people realize how complicated realistic cognition is in the shard picture. Perhaps people realize how much might have to go right, how many algorithmic details may need to be etched into a network so that it does what we want and generalizes well.

But perhaps people don’t realize that a network which is inner-aligned on an objective will also require a precise and conforming internal structure, and they don’t realize this because no one has written detailed plausible stabs at inner-aligned cognition.
