All of Paul W's Comments + Replies

Paul W10

The von Neumann–Morgenstern paradigm allows for binary utility functions, i.e. functions that are equal to 1 on some event (a measurable set of outcomes) and to 0 on its complement. Said event could be, for instance, "no global catastrophe for humanity in time period X".
Of course, you can implement some form of deontology by multiplying such a binary utility function by something like exp(−(number of bad actions you take)).
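Concretely, as a sketch (the names $E$, $B$, and $\lambda$ are just mine):

$$U(\omega) = \mathbf{1}_E(\omega), \qquad \tilde U(\omega) = \mathbf{1}_E(\omega)\, e^{-\lambda B(\omega)},$$

where $E$ is the event "no global catastrophe in period X" and $B(\omega)$ counts the bad actions taken along outcome $\omega$. Note that $\tilde U$ is still an ordinary utility function, so the agent remains a VNM expected-utility maximizer while paying a multiplicative penalty per bad action.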

Any thoughts on this observation?
 

Paul W10

When you say "maybe we should be assembling like minded and smart people [...]", do you mean "maybe"? Or do you mean "Yes, we should definitely do that ASAP"?

Paul W20

Have you noticed that you keep encountering the same ideas over and over? You read another post, and someone helpfully points out it's just old Paul's idea again. Or Eliezer's idea. Not much progress here, move along.

Or perhaps you've been on the other side: excitedly telling a friend about some fascinating new insight, only to hear back, "Ah, that's just another version of X." And something feels not quite right about that response, but you can't quite put your finger on it.

 

Some questions regarding these contexts:

-Is it true that you can deduce that... (read more)

Paul W10

Strong upvote. Slightly worried by the fact that this wasn't written, in some form, earlier (maybe I missed a similar older post?).

I think we[1] can, and should, go even further:

 

-Find a systematic/methodical way of identifying which people are really good at strategic thinking, and help them use their skills in relevant work; maybe try to hire from outside the usual recruitment pools. 

-If deemed feasible (in a short enough amount of time): train some people mainly on strategy, so as to get a supply of better strategists.

-Encourage people to s... (read more)

Paul W21

Hi! 

Have you heard of the ModelCollab and CatColab projects? It seems that there is an interesting overlap with what you want to do!

More generally, people at the Topos Institute work on related questions of collaborative modelling and collective intelligence:

 

https://topos.institute/work/collective-intelligence/

https://topos.institute/work/collaborative-modelling/

https://www.localcharts.org/t/positive-impact-of-algebraicjulia/6643

There's a website for sharing world-modelling ideas, run by Owen Lynch (who works at Topos UK):

http... (read more)

Paul W10

Are you saying that holistic/higher-level approaches can be useful because they are very likely to be more computationally efficient, to actually fit inside human brains, or to not require as much data?
Is that the main point, or did I miss something?

1Quentin FEUILLADE--MONTIXI
Holistic means studying at every level. I think that mech interp is very useful for some things, and stuff like what I am pursuing (GenAI Ethology) is very useful for other things. If there is emergence, it means that we can't study the model at only one level of abstraction; we need to do it at multiple levels and combine insights to be able to even remotely understand and control what's happening. Additionally, I think that there are still other levels (and in-between levels) of abstraction that should be pursued (like the footnote on glitch interp, and things like developmental interp but at the behavior level).
Paul W30

Hello!
These ideas seem interesting, but there's something that puzzles me: in the coin flip example, how is 3 fundamentally different from 1000? The way I see it, the only mathematical difference is that your "bounds" (whatever that means exactly) are simply much worse in the case with 3 coins. Of course, I think I understand why humans/agents would want to say "the case with 3 flips is different from that with 1000", but the mathematics seem similar to me.
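For instance (my own back-of-the-envelope, not your actual bound): the empirical frequency $\hat p$ of $n$ flips pins down the coin's bias only to about

$$\mathrm{SE}(\hat p) = \sqrt{\frac{p(1-p)}{n}} \;\approx\; 0.29 \ (n=3) \quad \text{vs.} \quad 0.016 \ (n=1000), \quad \text{at } p = \tfrac12,$$

so nothing changes qualitatively between 3 and 1000; the estimate of the underlying bias just gets roughly eighteen times tighter.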
Am I missing something?

3johnswentworth
That's correct: the difference is just much worse bounds, so for 3 a natural latent only exists to within a much worse approximation.
Paul W*12

"Is the field advanced enough that it would be feasible to have a guaranteed no-zero-day evaluation and deployment codebase that is competitive with a regular codebase?"

As far as I know (I'm not an expert), such absolute guarantees are too hard right now, especially if the AI you're trying to verify is arbitrarily complex. However, the training process ought to yield an AI with specific properties. I'm not entirely sure I got what you meant by "a guaranteed no-zero-day evaluation and deployment codebase". Would you mind explaining more?
 

"Or is the clai

... (read more)
2Fabien Roger
I was thinking that the formal guarantees would be about state evaluations (i.e. state -> badness bounds), which would require something like "showing there is no zero-day" (since "a code-base with a zero-day" might be catastrophically bad if no constraints are put on actions). Thanks for pointing out they can be about actions (i.e. (state, action) -> badness bounds), which seem intuitively easier to get good bounds for: you don't need to show there are no zero-days, just that the currently considered action is extremely unlikely to exploit a potential zero-day. I'd be curious to know what kind of formal process could prove that (codebase, codebase-interaction) pairs are provably not-bad (with high probability, and with a false positive rate low enough if you trained an AI to minimize it). My guess is that there is nothing like that on the horizon (that could become competitive at all), but I could be wrong.
("Let you see what program is running" was an example of a very minimal safety guarantee I would like to have, not a representative safety guarantee. My point is that I'd be surprised if people got even such a simple and easy safety guarantee anytime soon, using formal methods to check AI actions that actually do useful stuff.)
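To make the two shapes concrete, here is how I would type them (a purely hypothetical sketch; the names are mine, and no real tool implements either, which is my point):

```python
from dataclasses import dataclass

@dataclass
class State:
    codebase: str  # e.g. the currently deployed code

@dataclass
class Action:
    diff: str  # e.g. a proposed change or interaction

def state_badness_bound(state: State) -> float:
    """Hypothetical: upper-bound how bad things can get from this state.
    Establishing this is like proving the whole codebase has no zero-day."""
    raise NotImplementedError  # no known competitive method

def action_badness_bound(state: State, action: Action) -> float:
    """Hypothetical: upper-bound the badness of one action in this state,
    e.g. that this particular diff is very unlikely to exploit a potential
    zero-day. Plausibly easier than bounding the whole state."""
    raise NotImplementedError  # no known competitive method
```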
Paul W32

I believe that the current trends for formal verification, say, of traditional programs or small neural networks, are more about conservative overapproximations (so-called abstract interpretation). You might want to have a look at this survey: https://caterinaurban.github.io/pdf/survey.pdf
To be more precise, it appears that "incomplete formal methods" (section 3.1.1.2 in the survey I linked) are more computationally efficient, even though they can produce false negatives (failing to certify programs that are in fact safe).
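To give a toy flavor of one such incomplete method, here is a sketch (my own, not from the survey) of interval bound propagation, arguably the simplest abstract interpretation for neural networks; the weights are invented for illustration:

```python
import numpy as np

def affine_bounds(lo, hi, W, b):
    # Propagate elementwise input intervals [lo, hi] through x -> W @ x + b.
    # Splitting W into positive/negative parts picks, per output coordinate,
    # whichever input endpoint extremizes it. Sound but conservative: the
    # output box can contain values no real input achieves.
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def relu_bounds(lo, hi):
    # ReLU is monotone, so it maps interval endpoints to interval endpoints.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Toy 2-layer network with invented weights.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

# Certify every input within +/- 0.1 of a nominal point in one pass.
x = np.array([0.5, -0.2, 0.1])
lo, hi = x - 0.1, x + 0.1
lo, hi = relu_bounds(*affine_bounds(lo, hi, W1, b1))
lo, hi = affine_bounds(lo, hi, W2, b2)
print("certified output box:", list(zip(lo, hi)))
```

If the certified output box lands inside the safe region, the property is proven for all those inputs at once; if it doesn't, the method simply fails to certify (the false-negative case above), even when the property actually holds.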
Does that answer your question?

2Fabien Roger
Not entirely. This makes me slightly more hopeful that we can have formal guarantees of computer systems, but is the field advanced enough that it would be feasible to have a guaranteed no-zero-day evaluation and deployment codebase that is competitive with a regular codebase? (Given a budget of 1 LHC for both the codebase inefficiency tax and the time to build the formal guarantees for the codebase.)
(And computer systems are easy mode; I don't even know how you would start to build guarantees like "if you say X, then it's proven that it doesn't persuade humans of things in ways they would not have approved of beforehand.")