AGIs derived from the same model are likely to collaborate more effectively than humans because their weights are identical. Any fine-tune can be applied to all members, and text produced by one can be understood by all members.
I think this only holds if fine-tunes are composable, and as far as I can tell they aren't: fine-tuning on one task subtly degrades performance on a bunch of other tasks. That isn't a big deal if you fine-tune a little for performance on a few tasks, but it does mean you probably can't take a million independently-fine-tuned models and merge them into a single super-model of the same size with the same performance on all million tasks.
Also, there are some mornings where I can't understand code I wrote the previous night, when I had all of the necessary context fresh in my mind, despite being the same person. I expect LLMs will exhibit the same behavior: some things will be hard to understand when examined outside of the context that generated them.
That's not to say a world in which there are a billion copies of GPT-5 running concurrently will have no major changes, but I don't think a single coherent ASI falls out of that world.
If you use uBlock Origin (or AdBlock, or AdGuard, or anything else that uses EasyList syntax), you can add the custom rules
lesswrong.com##.NamesAttachedReactionsCommentBottom-footerReactionsRow
lesswrong.com##.InlineReactHoverableHighlight-highlight:remove-class(InlineReactHoverableHighlight-highlight)
which will remove the reaction section underneath comments and the highlights corresponding to those reactions.
The former of these you can also do through the element picker.
It strikes me that there's a rather strong selection effect going on here. If someone has a contrarian position, and they happen to be both articulate and correct, they will convince others and the position will become less surprising over time.
The view that psychology and sociology research has major systematic issues at a level where you should just ignore most low-powered studies is no longer considered a contrarian view.
@the gears to ascension I see you reacted "10%" to the phrase "while (overwhelmingly likely) being non-scheming" in the context of the GPT-4V-based MAIA.
Does that mean you think there's a 90% chance that MAIA, as implemented today, is actually scheming? If so, that seems like a very bold prediction, and I'd be very interested to know why you predict that. Or am I misunderstanding what you mean by that react?
Do you want me to spoil it for you, do you want me to drop a hint, or do you want to puzzle it out yourself? It's a beautiful little puzzle and very satisfying to solve. Also note that the solution I found only works if you are given a graph with the structure above (i.e. every node is part of the lattice, and the lattice is fairly small in each dimension, and the lattice has edges rather than wrapping around).
Can you give a concrete example of a situation where you'd expect this sort of agreed-upon-by-multiple-parties code to be run, and what that code would be responsible for doing? I'm imagining something along the lines of "given a geographic boundary, determine which jurisdictions that boundary intersects for the purposes of various types of tax (sales, property, etc)". But I don't know if that's wildly off from what you're imagining.
Fun side note: in this particular example, it doesn't actually matter how you pick your direction. "Choose the axis closest to the target direction" performs exactly as well as "choose any edge which does not make the target node unreachable when traversed at random, and then traverse that edge" or "choose the first edge where traversing that edge does not make the target node unreachable, and traverse that edge".
Edit: at least assuming that the graph is directed
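To illustrate, here's a minimal sketch of the "first edge that keeps the target reachable" policy, written (hypothetically) in terms of already-known coordinates, which the puzzle elsewhere is about recovering. On this lattice every edge decrements one coordinate, so the target stays reachable exactly when each of its coordinates is still less than or equal to ours:

def reachable(frm, to):
    # Every edge decrements one coordinate, so `to` is reachable from
    # `frm` iff it is component-wise <= `frm`.
    return all(t <= f for f, t in zip(frm, to))

def step_towards(curr, target):
    # Traverse the first edge that keeps the target reachable.
    if curr == target:
        return curr  # already there
    for axis in range(len(curr)):
        if curr[axis] > 0:
            nxt = list(curr)
            nxt[axis] -= 1
            nxt = tuple(nxt)
            if reachable(nxt, target):
                return nxt
    raise ValueError("target is not reachable from curr")

Every safe step reduces the remaining distance sum(curr) - sum(target) by exactly one, which is why all three strategies take exactly the same number of steps.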
So I keep seeing takes about how to tell if LLMs are "really exhibiting goal-directed behavior" like a human or whether they are instead "just predicting the next token". And, to me at least, this feels like a confused sort of question that misunderstands what humans are doing when they exhibit goal-directed behavior.
Concrete example. Let's say we notice that Jim has just pushed the turn signal lever on the side of his steering wheel. Why did Jim do this?
The goal-directed-behavior story is as follows:
But there's an alternative story:
I think this latter framework captures some parts of human behavior that the goal-directed-behavior framework misses out on. For example, let's say the following happens
This sequence of actions is pretty nonsensical from a goal-directed-behavior perspective, but is perfectly sensible if Jim's behavior here is driven by contextual heuristics like "when it's morning and I'm next to my work's freeway offramp, I get off the freeway".
Note that I'm not saying "humans never exhibit goal-directed behavior".
Instead, I'm saying that "take a goal, and come up with a plan to achieve that goal, and execute that plan" is, itself, just one of the many contextually-activated behaviors humans exhibit.
I see no particular reason that an LLM couldn't learn to figure out when it's in a context like "the current context appears to be in the execute-the-next-step-of-the-plan stage of such-and-such goal-directed-behavior task", and produce the appropriate output token for that context.
Easier question: Let's say you have a single node in this graph of nodes. You want to figure out where that single node should be embedded in your 100-dimensional space, but you only care about its embedding location relative to a few specific other nodes.
You have the following affordances:
That is to say, if you have the following problem definition
import random

class Node:
    key = None
    edges = None

    def __init__(self):
        self.edges = []

class Edge:
    _src = None
    _get_dst = None
    _dst = None

    def __init__(self, src, get_dst):
        self._src = src
        self._get_dst = get_dst

    def get_dst(self):
        # Destinations are resolved lazily, the first time the edge is followed.
        if self._dst is None:
            self._dst = self._get_dst()
        return self._dst

class Graph:
    def __init__(self, axis_length, n_dims):
        self.axis_length = axis_length
        self.n_dims = n_dims
        self._nodes = {}
        self._next_node_id = 1

    def get_node_at(self, coords):
        axis_order = list(range(self.n_dims))
        random.shuffle(axis_order)
        if coords not in self._nodes:
            node = Node()
            node.key = self._next_node_id
            self._next_node_id += 1
            for axis in axis_order:
                if coords[axis] == 0:
                    continue
                dst_coords = list(coords)
                dst_coords[axis] -= 1
                dst_coords = tuple(dst_coords)
                # Use a helper so each edge's thunk closes over its own
                # dst_coords rather than the shared loop variable.
                def make_edge(dst_coords):
                    return Edge(node, lambda: self.get_node_at(dst_coords))
                edge = make_edge(dst_coords)
                node.edges.append(edge)
            self._nodes[coords] = node
        return self._nodes[coords]

    def get_random_node(self):
        return self.get_node_at(tuple(random.randint(0, self.axis_length - 1) for _ in range(self.n_dims)))
and you want a function which will take an arbitrary node and give you the coordinates of that node in a consistent basis, in finite time, with arbitrarily high probability of correctness
class ComputedBasis:
    def __init__(self):
        self.node_positions_by_key = {}

    def get_coords(self, node):
        # Given a node, give the coordinates of that node in some
        # consistent basis
        pass
I claim that this is indeed possible to do, and the steps to do it look nothing like "compute things".
Edit: To be explicit about the motivation, once we define this function, we can find a path from our position to the sandwich using something like
def path_to_sandwich(my_node, sandwich_node):
    basis = ComputedBasis()
    my_coords = basis.get_coords(my_node)
    sandwich_coords = basis.get_coords(sandwich_node)
    for axis, (my_pos, sandwich_pos) in enumerate(zip(my_coords, sandwich_coords)):
        if my_pos < sandwich_pos:
            raise ValueError(f"""
                Can't get to sandwich from here!
                I can only travel towards the origin on each axis.
                axis: {axis}
                my_pos: {my_pos}
                sandwich_pos: {sandwich_pos}
            """)
    return get_path(basis, my_node, sandwich_node)
def get_path(basis, start_node, goal_node):
    curr_node = start_node
    path = [curr_node]
    goal_coords = basis.get_coords(goal_node)
    while curr_node != goal_node:
        curr_coords = basis.get_coords(curr_node)
        # Find the first axis where we need to move towards the goal along that axis.
        for axis, (curr_pos, goal_pos) in enumerate(zip(curr_coords, goal_coords)):
            if curr_pos > goal_pos:
                step_coords = list(curr_coords)
                step_coords[axis] -= 1
                step_coords = tuple(step_coords)
                break
        # Follow the edge that lands on those coordinates.
        for edge in curr_node.edges:
            dst_node = edge.get_dst()
            dst_coords = basis.get_coords(dst_node)
            if dst_coords == step_coords:
                step_node = dst_node
                break
        curr_node = step_node
        path.append(curr_node)
    return path
Note that my framing of the problem is slightly different, in that (0, 0, 0, ..., 0, 0, 0)
is the point from which there are no outbound edges, rather than (10, 10, 10, ..., 10, 10, 10)
in your version. Doesn't really make a difference logically, just makes the code more readable.
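For what it's worth, once get_coords is actually filled in, usage would look something like this (hypothetical, since ComputedBasis.get_coords is still a stub above):

graph = Graph(axis_length=10, n_dims=100)
my_node = graph.get_random_node()
# The origin has no outbound edges, so it's reachable from everywhere.
sandwich_node = graph.get_node_at(tuple([0] * graph.n_dims))
path = path_to_sandwich(my_node, sandwich_node)
print(f"Reached the sandwich in {len(path) - 1} steps.")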
I'm really curious about what such fixes look like. In my experience, those edge cases tend to come about when there is some set of mutually incompatible desired properties of a system, and the mutual incompatibility isn't obvious. For example:
1. Fractional quantities are represented as ordinary (IEEE 754) floating-point numbers.
2. 0.1 + 0.2 should yield 0.3, not 0.30000000000000004.

It turns out those are mutually incompatible requirements!
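Concretely, in Python (the same holds in any language with IEEE 754 doubles):

# 0.1 and 0.2 have no exact binary-float representation, so the
# sum picks up a rounding error in the last place.
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False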
You could say "we should drop requirement 1 and use a fixed-point or fraction datatype", but that's emphatically not a one-line change, and it has its own places where you'll run into mutually incompatible requirements.
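A sketch of what the fraction-datatype route entails in Python, just to illustrate why it isn't a one-line change:

from fractions import Fraction

# Exact rational arithmetic fixes the rounding, but every value in the
# system now has to be constructed, stored, and compared differently.
print(Fraction("0.1") + Fraction("0.2") == Fraction("0.3"))  # True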
Or you could add a "duct tape" solution like "use printf("%.2f", result) in the case where we actually ran into this problem, where we know both operands have two decimal places of precision, and revisit if this bug comes up again in a different context".
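In Python, that duct-tape fix would be something like:

result = 0.1 + 0.2
# Safe here only because we know both operands had two decimal places.
print(f"{result:.2f}")  # prints 0.30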