
I thought this was an interesting take on the boundaries problem in agent foundations from the perspective of IIT. It is on the amazing Michael Levin's YouTube channel: https://www.youtube.com/watch?app=desktop&v=5cXtdZ4blKM

One of the main things that makes it interesting to me is that around 25-30 minutes in, it computationally goes through the main reason why I don't think we will have agentic behaviour from AI for at least a couple of years: GPTs just don't have a high IIT Phi value. How will such a system find its own boundaries? How will it find the underlying causal structures that it is part of? Maybe this can be done through external memory, but will that be enough, or do we need it in the core stack of the scaling-based training loop?
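
To make the Phi intuition a bit more concrete, here is a toy Python sketch of an integration-style measure: the predictive information of a whole system minus that of its best bipartition. This is my own crude proxy, not the actual IIT 3.0 Phi (which is defined over cause-effect repertoires and minimum information partitions), and the two transition matrices at the end are made-up examples.

```python
import itertools
import numpy as np

def mutual_info(joint):
    """Mutual information (in bits) from a joint distribution P(X, Y)."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def toy_phi(transition, n_nodes):
    """Crude integration proxy: predictive information of the whole system
    minus the summed predictive information of the parts, minimised over
    bipartitions. `transition[s, t]` = P(next = t | current = s) over the
    2**n_nodes joint binary states; the current state is assumed uniform."""
    n_states = 2 ** n_nodes
    joint = transition / n_states               # P(state_t, state_t+1)
    whole = mutual_info(joint)

    def marginalise(nodes):
        # Collapse the joint distribution onto the given subset of nodes.
        idx = lambda s: sum(((s >> n) & 1) << i for i, n in enumerate(nodes))
        m = np.zeros((2 ** len(nodes),) * 2)
        for s in range(n_states):
            for t in range(n_states):
                m[idx(s), idx(t)] += joint[s, t]
        return m

    best = np.inf
    for k in range(1, n_nodes // 2 + 1):
        for part in itertools.combinations(range(n_nodes), k):
            rest = tuple(i for i in range(n_nodes) if i not in part)
            parts = mutual_info(marginalise(part)) + mutual_info(marginalise(rest))
            best = min(best, whole - parts)
    return max(best, 0.0)

# Two-node examples: nodes that copy *each other* are integrated,
# nodes that only copy *themselves* are not.
swap, copy = np.zeros((4, 4)), np.zeros((4, 4))
for s in range(4):
    a, b = s & 1, (s >> 1) & 1
    swap[s, (a << 1) | b] = 1.0   # node0 <- node1, node1 <- node0
    copy[s, s] = 1.0              # each node keeps its own bit
print(toy_phi(swap, 2), toy_phi(copy, 2))   # ~2.0 bits vs 0.0 bits
```

The point of the toy example is just that "high Phi" means the parts carry information about each other's futures that no part carries on its own; a system whose parts only predict themselves scores zero no matter how capable it is.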

A side note: one of the main things I didn't understand about IIT before was how it really is about meta-substrates, or "signals" as Douglas Hofstadter would call them, optimally re-organising themselves to be as predictable to themselves in the future as possible. Yet it is, and it integrates really well with ActInf (at least to the extent that I currently understand it).

Okay, so I don't have much time to write this, so bear with the quality, but I thought I would say one or two things about the Yudkowsky and Wolfram discussion as someone who has spent at least 10 deep-work hours trying to understand Wolfram's perspective on the world.

With some of the older floating megaminds like Wolfram and Friston, who are also physicists, you have the problem that they get very caught up in their own ontology.

From the perspective of a physicist, morality could be seen as an emergent property of physical laws.

Wolfram likes to think of things in terms of computational reducibility. One way this can be described in the agent foundations frame is that an agent modelling the environment will only be able to predict the world to an extent that depends on its own speed. It's like some sort of agent-environment relativity, where the information-processing capacity determines the space of possible ontologies. An example: if we had an intelligence operating much closer to the speed of light, the visual field might not be a useful vector of experience to model.
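
Here is a toy sketch of that speed-relativity point, using Wolfram's favourite playground (elementary cellular automata). The "slow" observer simply predicts that the world won't change, because it can't run the update rule in time; the specific rule numbers and the persistence heuristic are my own assumptions for illustration, not anything from the actual discussion.

```python
import numpy as np

def step(state, rule):
    """One step of an elementary cellular automaton with periodic boundaries."""
    l, r = np.roll(state, 1), np.roll(state, -1)
    idx = (l << 2) | (state << 1) | r
    return np.right_shift(rule, idx) & 1

rng = np.random.default_rng(0)
init = rng.integers(0, 2, 201)

# A "slow" observer that can't compute the update in time just predicts
# persistence: tomorrow looks like today. That works for a computationally
# reducible rule (250 quickly settles into a fixed pattern) but is near
# chance for an irreducible one (30 stays unpredictable at this budget).
for rule in (250, 30):
    state, hits, total = init.copy(), 0, 0
    for _ in range(200):
        nxt = step(state, rule)
        hits += int((state == nxt).sum())   # persistence prediction
        total += state.size
        state = nxt
    print(f"rule {rule}: persistence-predictor accuracy = {hits / total:.2f}")
```

Whether a rule counts as "predictable" here is entirely relative to the observer's compute budget, which is the agent-environment relativity point in miniature.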

Another way to say it is that there is only the modelling and the modelled. An intuition from this frame is that there are only models that are better or worse at understanding specific things, and so the concept of general intelligence becomes weird here.

IMO this is basically the problem with the first two hours of the conversation: to some extent, Wolfram doesn't engage much with the human perspective, nor with any ought questions. He has a very physics-floating-megamind perspective.

Now, I personally believe there's something interesting to be said for an alternative hypothesis to individual superintelligence that comes from theories of collective intelligence. If a superorganism is better at modelling something than an individual organism is, then it should outcompete the others in that system. I'm personally bullish on the idea that there are certain configurations of humans and general trust-verifying networks that can outcompete individual AGI, as the outer alignment functions would constrain the inner functions enough.

I was going through my old stuff and found this from a year and a half ago, so I thought I would post it here real quickly, as I found the last idea funny and the first idea pretty interesting:

In normal business there exist consulting firms that are specialised in certain topics, ensuring that organisations can take in an outside view from experts on the topic.

This seems like quite an efficient way of doing things and something that, if built up properly within alignment, could lead to faster progress down the line. This is also something the Future Fund seemed to be interested in, as they gave prizes both for the idea of creating an org focused on creating datasets and for one focused on taking in human feedback. These are not the only possible ideas, however, and below I mention some more possible orgs that are likely to be net positive.

Examples of possible organisations:

Alignment consulting firm

Newly minted alignment researchers will probably have a while to go before they can become fully integrated into a team. One can therefore imagine an organisation that takes in inexperienced alignment researchers and helps them write papers. It then promotes these alignment researchers as being able to help with certain things, so established orgs can easily take them in for contracting on specific problems. This should help involve market forces in the alignment area and should, in general, improve the efficiency of the space. There are reasons why consulting firms exist in real life, and creating the equivalent of McKinsey in alignment is probably a good idea. Yet I might be wrong about this, and if you can argue why it would make the space less efficient, I would love to hear it.

"Marketing firms"

We don't want the wrong information to spread. Something between a normal marketing firm and the Chinese "marketing" agency: if it's an info-hazard, then shut the fuck up!