Mateusz Bagiński — LessWrong

Agent foundations, AI macrostrategy, civilizational sanity, human enhancement.

I endorse and operate by Crocker's rules.

I have not signed any agreements whose existence I cannot mention.

I guess. But I would think the bigger issue is that people don't notice.

I think "Elaborate" would be more useful (i.e., more likely to actually induce an elaboration on the point being reacted at) if its corresponding notification was grouped not with karma updates and other reacts (which people get notified about in a batch, daily, weekly, etc.), but rather with the normal notifs like "so-and-so replied to your comment/post" or "you received a message from so-and-so". (Probably the same for the new "Let's make a bet!" react.)

But this would most likely require doing something somewhat inelegant and complicated to the codebase, so it may not be worth it, atm at least.

It seems not-very-unlikely to me that, over the next few years, many major (and some non-major) world religions will develop a "Butlerian" attitude to machine intelligence: deeming it a profanity to attempt to replicate (or even to do things that have a non-negligible chance to result in replicating) all the so-far-unique capacities/properties of the human mind, and will use it to justify their support of a ban, along with the catastrophic/existential risks on which they (or some fraction of them) would agree with worried seculars.

In a sense, both human-bio-engineering and AI are (admissible to be seen by conservatively religious folks as) about "manipulating the God-given essence of humanity", which amounts to admitting that God's creation is flawed/imperfect/in need of further improvement.

The simplest general way is to buy it in whatever format and then download it from one of the well-known websites with free pdfs/mobis/epubs.

Analytic metaphysics, as far as I can tell, mostly tacitly rejects ontological cluelessness.

To give some ~examples from the analytic tradition: As far as I understand them, Brian Cantwell Smith and Nancy Cartwright espouse(d^[1]) a view somewhat adjacent to ontological cluelessness, albeit perhaps slightly stronger, in that, according to (my model of?) them, there is no final/fundamental basis of reality and it's not infinite regress either.

Somewhat more specifically, reading BCS's On the Origin of Objects (haven't read Cartwright yet) gave me the picture of a gunky-unknowable reality, where for a "part" of reality to even become a type of thing that can be known, it needs to be stabilized into a knowable object or something like that, and that process of stabilization involves parts/regions of the universe acting at a distance in a way that involves a primitive form of "aboutness" (?).

(There is some superficial semi-inconsistency in this way of talking about it, in that it describes [what it takes a not-yet-a-Thing to stabilize into a (knowable) "Thing"] in terms of knowable Things, so the growing Thing should also be knowable by transitivity or something (?). But I don't think I'm passing BCS's ITT.)

For another adjacent analyticist, Eric Schwitzgebel? https://faculty.ucr.edu/~eschwitz/SchwitzAbs/Weirdness.htm

Oh, and how could I forget The Guy Against Reality? https://en.wikipedia.org/wiki/Donald_D._Hoffman

^{^}
I just saw that BCS died 18 days ago :(.

Instrumental value could also change due to information. For instance, if David and John learn that there aren’t as many vegetarians as we expected looking to trade away sausage for mushroom, then that also updates our instrumental value for mushroom pizza.
In order for the argument to work in such situations, the contract/precommitment/self-modification will probably also need to allow for updating over time - e.g. commit to a policy rather than a fixed set of preferences.

I think this point largely recontextualizes the argument.

The preference completion procedure described in this post seems more like augmenting the options' terminal value with instrumental value, the latter being computed based on the beliefs about probabilities of future opportunities for trade/exchange. As those beliefs change (what I think we should expect/be open to, in most cases), the instrumental value changes, and so the completed terminal+instrumental value preference ordering is also updated along with it, leaving only the background skeleton of terminal preference ordering untouched.

Since the completed instrumental preference ordering is instrumental for the purpose of shifting probability mass up the poset skeleton, we get an entity that behaves like a myopic (timeslice?) expected utility maximizer, but is not trying to guard/preserve its completed instrumental preference ordering from change (but it would try to preserve its terminal poset skeleton, if given the opportunity, everything else being equal).

There might be some interesting complications if we allow the agent to take actions that can influence the probability of future trades, which would incentivize it to try to stabilize them in a state that maximizes the expected movement of the probability mass up the terminal preference poset. In that case, the case for the agent striving to complete itself into a stable EUMaximizer becomes stronger. (Although it is also the case that, in general, the ordering of probability distributions over a set induced by a partial ordering over this set is still a partial ordering, so it may not have a unique solution, in which case we're back to square one.)

presumably the Do Anything Now jailbreak

I agree with this, and I would guess that a vast majority of people speaking in favor of a coordinated ban/pause/slowdown would agree with this^[1].

An important aspect of the issue is that in the state of great uncertainty, where the actual "bright lines"/"phase transitions"/"points of no return" lie in reality, we need to take a "pragmatically conservative" stance and start with a definition of "not obviously X-risk-generating AI" that is "overly broad from God's point of view", but makes sense given our limited knowledge. Starting then, as our uncertainty gradually resolves and our understanding grows, we will be able to gradually trim/refine it, so that it becomes a better pointer at "X-risk-generating AI".

And then there is the issue of time. Generally and most likely, the longer we wait for a "more precisely accurate" definition, the closer we get to the line (/some of the lines), the less time we'll have to implement it and the more difficult it will be to implement it (e.g. because pro-all-AI-progress lobby may grow in power over time, AI parasitism or whatever might get society addicted before the ~absolute point of no return, etc).

Obviously, there are tradeoffs here, e.g., the society (or some specific groups) might get annoyed by AI labs doing various nasty things. More importantly, more precise regulations, using definitions grounded in more legible and more mature knowledge, are generally more likely to pass. But I still think that (generally) the earlier the better, certainly in such "minimal" areas or types of endeavors as awareness raising and capacity building.

^{^}
... at least when pressed towards thinking about the issue clearly.

I would define an AGI system as anything except for a Verifiably Incapable System.

This is more like "not-obviously not-AGI".

But besides that, yeah, it seems like an OK starting point for thinking about proper definitions for the purpose of a ban.

LESSWRONG
Petrov Day
LW

LESSWRONG
Petrov Day
LW

Posts

Wikitag Contributions

Comments