Jonas Hallgren

AI Safety person currently working on multi-agent coordination problems.

Any reason for the timing window being 4 hours before bed instead of 30 minutes to 1 hour? Most of what I've heard recommends around half an hour to an hour before bed. I'm currently doing this with roughly 0.3 mg of melatonin (I divide a 1 mg tablet into thirds).

If you look at the Active Inference community, there's a lot of work going into PPL-based (probabilistic programming) languages to do more efficient world modelling, but that shit ain't easy and, as you say, it is a lot more compute heavy.

I think there'll be a scaling break due to this, but once it is algorithmically figured out we will be back, and back with a vengeance, as I think most safety challenges have a self-vs-environment model as a necessary condition for being properly engaged with (which currently isn't engaged by LLMs' world modelling).

Do you have any thoughts on what this actionably means? To me it seems like being able to influence such conversations is potentially a bit intractable, but maybe one could host forums and events for this if one has the right network?

I think it's a good point and I'm wondering how it looks in practice. I can see it working for someone with the right contacts, so is the message for people who don't have that network to go and create it, or what are your thoughts there?

Okay, so I don't have much time to write this, so bear with the quality, but I thought I would say one or two things about the Yudkowsky and Wolfram discussion as someone who has at least spent 10 deep-work hours trying to understand Wolfram's perspective on the world.

With some of the older floating megaminds like Wolfram and Friston, who are also physicists, you have the problem that they get very caught up in their own ontology.

From the perspective of a physicist, morality could be seen as an emergent property of physical laws.

Wolfram likes to think of things in terms of computational reducibility. A way this can be described in the agent foundations frame is that an agent modelling the environment will only be able to predict the world to a degree dependent on its own speed. It's like some sort of agent-environment relativity, where information-processing capacity determines the space of possible ontologies. An example: if we have an intelligence that operates a lot closer to the speed of light, the visual field might not be a useful vector of experience to model.
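
To make that a bit more concrete, here's a toy numpy sketch (my own illustration, not anything Wolfram said): two environments that differ only in a fast component look nearly identical to an agent that has to integrate its observations over a long window, so which distinctions are even available for building an ontology depends on the agent's processing speed.

```python
import numpy as np

# Toy illustration: env_a and env_b differ only by a fast 50 Hz component.
# A "fast" agent sees them as clearly different; a "slow" agent that
# averages over long windows can barely tell them apart, so for it the
# distinction isn't part of any useful ontology.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
env_a = np.sin(2 * np.pi * 3 * t)
env_b = env_a + 0.3 * np.sin(2 * np.pi * 50 * t)

for window, label in [(1, "fast agent"), (125, "slow agent")]:
    obs_a = env_a.reshape(-1, window).mean(axis=1)   # observations = window averages
    obs_b = env_b.reshape(-1, window).mean(axis=1)
    diff = np.abs(obs_a - obs_b).mean()
    print(f"{label}: mean observable difference = {diff:.3f}")
```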

Another way to say it is that there's only the modelling and the modelled. An intuition from this frame is that there are only differently good models for understanding specific things, and so the concept of general intelligence becomes weird here.

IMO this is the problem with the first 2 hours of the conversation: to some extent Wolfram doesn't engage much with the human perspective, nor with any ought questions. He has a very physics-flavoured, floating-megamind perspective.

Now, I personally believe there's something interesting to be said about an alternative hypothesis to the individual superintelligence, one that comes from theories of collective intelligence. If a superorganism is better at modelling something than an individual organism is, then it should outcompete the others in that system. I'm personally bullish on the idea that there are certain configurations of humans and general trust-verifying networks that can outcompete an individual AGI, as the outer alignment functions would enforce the inner functions enough.

But, to help me understand what people mean by the NAH, could you tell me what would (in your view) constitute strong evidence against it? (If the fact that we can point to systems which haven't converged on using the same abstractions doesn't count.)

Yes sir! 

So for me it is about looking at a specific type of system, or a specific type of system dynamics, that encodes the axioms required for the NAH to be true.

So, it is more the claim that "there is a specific set of mathematical axioms that can be used in order to get convergence towards similar ontologies, and these are applicable to AI systems."

For example, if one takes the Active Inference lens on concepts in the world, we generally define the boundaries between concepts as Markov blankets. Surprisingly or not, Markov blankets are pretty great for describing not only biological systems but also AI and some economic systems. The key underlying invariant is that these are all optimisation systems.

That is, the claim is something like p(NAH | optimisation system), rather than the NAH holding unconditionally.
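
For concreteness, here's a minimal sketch of the graph picture I have in mind (my own toy code, not anything specific from the natural latents work): in a directed graphical model, a node's Markov blanket is its parents, its children, and its children's other parents, and conditioning on the blanket screens the node off from everything else, which is the sense in which it acts as a boundary.

```python
# Minimal sketch: the Markov blanket of a node in a directed graphical
# model is its parents, its children, and its children's other parents.
def markov_blanket(node, parents):
    """parents: dict mapping each node to the set of its parent nodes."""
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = set().union(*(parents[c] for c in children)) if children else set()
    return (parents.get(node, set()) | children | co_parents) - {node}

# Hypothetical time-unrolled agent-environment DAG.
dag = {
    "world": set(),
    "sensors": {"world"},
    "internal": {"sensors"},
    "actions": {"internal"},
    "next_world": {"world", "actions"},
}
print(markov_blanket("internal", dag))  # {'sensors', 'actions'}
```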

So if, for example, from the perspective of Markov blankets or "natural latents" (which are functionals that work like Markov blankets), we don't see convergence in how different AI systems represent reality, then I would say that the NAH has been disproven, or at least that this is evidence against it.
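
As a crude, concrete version of "check for convergence in how different AI systems represent reality": run two separately trained models on the same inputs and compare their internal representations with a similarity measure. Linear CKA is my choice of yardstick below, not something the NAH or natural latents work prescribes; it's just one standard way to quantify whether two sets of activations carve up the same inputs in similar ways.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X: (n_samples, d1) activations of model A on a shared set of inputs.
    Y: (n_samples, d2) activations of model B on the same inputs.
    Values near 1 mean the two models represent these inputs in
    (linearly) interchangeable ways.
    """
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    numerator = np.linalg.norm(Y.T @ X, "fro") ** 2
    denominator = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return numerator / denominator

# Sanity checks: a representation versus an orthogonally rotated copy of
# itself scores 1.0, while two unrelated random representations score low.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
rotation, _ = np.linalg.qr(rng.normal(size=(64, 64)))
print(linear_cka(X, X @ rotation))                 # ~1.0
print(linear_cka(X, rng.normal(size=(500, 64))))   # low (~0.1 for these shapes)
```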

I do, however, think that this exists on a spectrum and that it isn't fully true or false; it is true under a restricted set of assumptions, and the question is how restrictive that set is.

I see it more as a useful frame for viewing agent cognition than something I'm willing to bet my life on. I do think it is pointing towards a core problem, similar to what ARC Theory is working on but approached in a different way: understanding the cognition of AI systems.

Yeah, that was what I was looking for, very nice.

It does seem to confirm what I was thinking: that you can't really run the same betting strategy as VCs. I also really appreciate the thoughts in there; they seem like things one should follow. I gotta make sure to do the last due diligence part of talking to people who have worked with someone in the past; it has always felt like a lot, but you're right that one should do it.

Also, I'm wondering why there isn't some sort of bet-pooling network for startup founders, where you have like 20 people band together and agree that they will all try ambitious projects and support whoever fails. It's like startup insurance, but run by the people doing the startups themselves. Of course you have to trust the others there and so on, but I think this should work?

Okay, what I'm picking up here is that you feel the natural abstractions hypothesis is quite trivial, and that it seems to be naively trying to say something about how cognition works in the same way physics works. Yet this is obviously not true, since development in humans and other animals clearly happens in different ways, so why would their mental representations converge? (Do correct me if I misunderstood.)

Firstly, there's something called the good regulator theorem in cybernetics, and our boy that you're talking about, Mr Wentworth, has a post on making it better that might be useful for understanding some of the foundations of what he's thinking about.

Okay, why is this a useful preamble? Well, if there's convergence in the useful ways of describing a system, then there's likely some degree of internal convergence in the mind of the agent observing the problem. Essentially this is what the regulator theorem is about (imo).
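
To gesture at that intuition with a deliberately tiny toy (my own construction, not the actual Conant-Ashby proof): if a regulator has to hold a system's output fixed against disturbances, the only policy that succeeds is the one whose internal mapping reproduces the disturbance, i.e. the regulator ends up carrying a model of what it regulates.

```python
# Toy good-regulator illustration (my construction, not the real proof).
# The system outputs z = d XOR a, where d is a disturbance and a is the
# regulator's action; we want z held at 0. Enumerating all policies
# d -> a shows the only perfect regulator is the one that copies d,
# i.e. whose internal mapping mirrors the disturbance it regulates.
from itertools import product

disturbances = [0, 1]

for policy in product([0, 1], repeat=len(disturbances)):
    outputs = [d ^ policy[d] for d in disturbances]
    if all(z == 0 for z in outputs):
        print("perfect regulator:", {d: policy[d] for d in disturbances})
        # prints {0: 0, 1: 1}
```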

So when it comes to the theory, the heavy lifting here is actually not done by the Natural Abstractions Hypothesis part, i.e. the convergence claim, but rather by the Redundant Information Hypothesis.

It is proving things about the distribution of environments, as well as about power laws in reality, that forms the foundation of the theory, as opposed to just stating that "minds will converge".

This is at least my understanding of the NAH; does that make sense, or what do you think about it?

Hmm, I find that I'm not fully following here. I think "vibes" might be the thing that is messing it up.

Let's look at a specific example: I'm talking to a new person at an EA-adjacent event and we're just chatting about how the last year has been. Part of the "vibing" here might be to home in on the difficulties experienced in the last year due to a feeling of "moral responsibility"; in my view, vibing doesn't have to be done with only positive emotions?

I think you're bringing up a good point that commitments or struggles might bring people closer than positive feelings do, because you're more vulnerable and open, as well as broadcasting your values more. Is this what you mean by shared commitments, or are you pointing at something else?

Generally fair, and I used to agree; I've been looking at it from a bit of a different viewpoint recently.

If we think of the "vibe" of a conversation as a certain shared prior that you're currently inhabiting with the other person, then the free-association game can instead be seen as a way of finding places where your world models overlap a lot.

My absolute favourite conversations are when I can go 5 layers deep with someone because of shared inference. I think vibe-checking for shared priors is a skill that can be developed, and the basis lies in being curious af.

There are apparently a lot of related concepts in psychology about holding emotional space and other things that I think just come down to "find the shared prior and vibe there".

No sorry, I meant from the perspective of the person with less legible skills.
