All of Jan Kulveit's Comments + Replies

I think silence is a clearly sensible strategy for obvious reasons.

+1. I was really upset that safe.ai decided to use an established acronym for something very different

Answer by Jan Kulveit

Yes, check e.g. https://www.lesswrong.com/posts/H5iGhDhQBtoDpCBZ2/announcing-the-alignment-of-complex-systems-research-group or https://ai.objectives.institute/ or, partially, https://www.pibbss.ai/

You won't find much of this on LessWrong, due to LW being an unfavorable environment for this line of thinking.

It's probably worth noting that you seem to be empirically wrong: I'm pretty confident I'd be able to do more than half of human jobs, with maybe ~3 weeks of training, if I were able to understand all human languages (obviously not doing them in parallel!). Many others here would be able to do the same.

The criterion is not as hard to meet as it seems, because there are many jobs, like cashier, administrative worker, or assembly-line worker, which are not that hard to learn.

Davidmanheim
Depends on how you define the measure over jobs. If you mean "the jobs of half of all people," probably true. If you mean "half of the distinct jobs as they are classified by NAICS or similar," I think I disagree. 

It's probably worth noting that I take the opposite update from the COVID crisis: it was much easier than expected to get governments to listen to us and do marginally more sensible things. With better preparation and larger resources, it would have been possible to cause an order of magnitude more sensible things to happen. It's also worth noting that some governments were highly sensible and agentic about COVID.

Jan Kulveit

Similarly to johnswentworth: my current impression is that the core alignment problems are the same and manifest at all levels - often the sub-human version just looks like a toy version of the scaled-up problem, and the main difference is that in the sub-human version you can often solve it for practical purposes by plugging in a human at some strategic spot. (While I don't think there are deep differences in the alignment problem space, I do think there are differences in the "alignment solutions" space, where you can use non-scalable solutions, or in risk space, w... (read more)

Buck
This is what I was referring to with "The superintelligence can answer any operationalizable question about human values", but as you say, it's not clear how to elicit the right operationalization.
Answer by Jan Kulveit

Getting oriented fast in complex/messy real-world situations in fields in which you are not an expert

  • For example, right now one topic to get oriented in would be COVID; I think for a good thinker it should be achievable, after a few days of research, to have a big-picture understanding of the situation comparable to that of a median epidemiologist
      • Where the point isn't to get an accurate forecast of some global variable asked about on Metaculus, but a gears-level model of what's going on / what the current 'critical points' are which will have outsized
... (read more)

I like the metaphor!

Just wanted to note: in my view the original LW Sequences are not functional as a stand-alone upgrade for almost any human mind, and you can observe this empirically: you can think of any LW meetup group around the world as an experiment, and I think to a first approximation it's fair to say that aspiring rationalists running just on the Sequences do not win, and that the good stuff coming out of the rationalist community was critically dependent on the presence of minds like Eliezer's and others'. (This is not to say the Sequences are not useful in many ways.)

Matt Goldenberg
I agree with your conclusion here, but think that this is an exceptionally harsh experiment. I conjecture that basically any meetup group, no matter what source they're using, won't empirically lead to most people who attend it "winning". Either it would drive most people away because it's too intense, or it would not be focused and intense enough to actually make a difference.
Answer by Jan Kulveit

I basically agree with Vanessa:

the correct rule is almost always: first think about the problem yourself, then go read everything about it that other people did, and then do a synthesis of everything you learned inside your mind.

Thinking about the problem myself first often helps me understand existing work, as it becomes easier to see the motivations, and solving already-solved problems is good training.

I would argue this is the case even in physics and math. (My background is in theoretical physics and during my high-school years I took some pride in not remember... (read more)

Epistemic status: wild guesses based on reading Del Giudice's Evolutionary Psychopathology and two papers trying to explain autism in terms of predictive processing. Still maybe better than the "tower hypothesis".

0. Let's think in terms of a two-parameter model, where one parameter tunes something like the capacity of the brain, which can be damaged by mutations, disease, etc., and the other parameter is explained below.

1. Some of the genes that increase the risk of autism tune some parameter of how sensory prediction is handled, specifical... (read more)

Based on

(For example, if subagents are assigned credit based on who's active when actual reward is received, that's going to be incredibly myopic -- subagents who have long-term plans for achieving better reward through delayed gratification can be undercut by greedily shortsighted agents, because the credit assignment doesn't reward you for things that happen later; much like political terms of office making long-term policy difficult.)

it seems to me you have in mind a different model than I do (sorry if my description was confusing). In my v... (read more)

Gordon Seidoh Worley
For what it's worth, I actually do expect that something like predictive processing is also going on with other systems built out of stuff that is not neurons, such as control systems that use steroids (which include hormones in animals) or RNA or other things for signaling and yet other things for determining set points and error distances. As I have mentioned, I think of living things as being in the same category as steam engine governors and thermostats, all united by the operation of control systems that locally decrease entropy and produce information. Obviously there are distinctions that are interesting and important in various ways, but there are also important ways in which these distinctions are distractions from the common mechanism powering everything we care about. We can't literally call this predictive coding since that theory is about neurons and brains, so a better name with appropriate historical precedent might be something like a "cybernetic" theory of life, although unfortunately cybernetics has been cheapened over the years in ways that make that ring of hokum, so maybe there is some other way to name this idea that avoids that issue.

Why is there this stereotype that the more you can make rocket ships, the more likely you are to break down crying if the social rules about when and how you are allowed to make rocket ships are ambiguous?

This is likely sufficiently explained by the principal component of human mindspace stretching from mechanistic cognition to mentalistic cognition, and does not need further explanation (https://slatestarcodex.com/2018/12/11/diametrical-model-of-autism-and-schizophrenia/)

Also, I think there are multiple stereotypes of very smart people: e.g. Feynman or Einstein.


It's not necessarily Gordon's view/answer within his model, but my answers are:

  • yes, evolution inserts these 'false predictions'; (Friston calls them fixed priors, which I think is a somewhat unfortunate terminology choice)
  • if you put on Dennett's stance lens #3 (looking at systems as agents), these 'priors' are likely described as 'agents' extracting some features from the p.p. world-modelling apparatus and inserting errors accordingly; you correctly point out that in some architectures such parts would just get ig
... (read more)
abramdemski
How does credit assignment work to determine these subagents' voting power (if at all)? I'm negative about viewing it as 'prediction with warped parts ("fixed priors")', but setting that aside, one way or the other there's the concrete question of what's actually going on at the learning-algorithm level. How do you set something up which is not incredibly myopic? (For example, if subagents are assigned credit based on who's active when actual reward is received, that's going to be incredibly myopic -- subagents who have long-term plans for achieving better reward through delayed gratification can be undercut by greedily shortsighted agents, because the credit assignment doesn't reward you for things that happen later; much like political terms of office making long-term policy difficult.)

I wasn't talking about parsimony because I expect the brain to be simple, but rather because a hypothesis which has a lot of extra complexity is less likely to be right. I expect human values to be complex, but I still think a desire for parsimony, such as sometimes motivates PP, is good in itself -- a parsimonious theory which matched observations well would be convincing in a way a complicated one would not be, even though I expect things to be complicated, because the complicated theory has many chances to be wrong.
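To make the myopia point concrete, here is a minimal toy sketch (hypothetical code, not from either comment; all names are made up) contrasting a rule that credits only the subagents active when reward arrives with one that credits the discounted future return from the steps at which a subagent was active:

```python
# Toy illustration of why "credit goes to whoever is active when reward
# arrives" is myopic. Everything here is hypothetical and simplified.

def myopic_credit(active_log, rewards):
    """Each step's reward goes only to the subagents active at that step."""
    credit = {}
    for active, r in zip(active_log, rewards):
        for agent in active:
            credit[agent] = credit.get(agent, 0.0) + r
    return credit

def discounted_credit(active_log, rewards, gamma=0.95):
    """Each subagent is credited with the discounted sum of future rewards
    from the steps at which it was active, so delayed payoffs still count."""
    returns, future = [], 0.0
    for r in reversed(rewards):
        future = r + gamma * future
        returns.append(future)
    returns.reverse()
    credit = {}
    for active, g in zip(active_log, returns):
        for agent in active:
            credit[agent] = credit.get(agent, 0.0) + g
    return credit

# "patient" sets up a delayed payoff at t=0; "greedy" happens to be active
# when the reward finally lands at t=2.
active_log = [["patient"], ["greedy"], ["greedy"]]
rewards = [0.0, 0.0, 10.0]

print(myopic_credit(active_log, rewards))      # {'patient': 0.0, 'greedy': 10.0}
print(discounted_credit(active_log, rewards))  # patient ~9.0, greedy ~19.5
```

Under the first rule the patient subagent gets nothing, which is the terms-of-office problem described above; under the second it is credited for most of the reward it helped bring about.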

I'm somewhat confused about whether you are claiming something other than Friston's notion that everything the brain is doing can be described as minimizing free energy/prediction error, that this is important for understanding what human values are, and that it needs to be understood for AI alignment purposes.

If this is so, it sounds close to a restatement of my 'best guess of how minds work', with some, in my opinion, unhelpful simplification - ignoring the signal inserted into predictive processing via interoception of bodily states, which is actually importa... (read more)
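For reference, the quantity being minimized in Friston's framework is standardly the variational free energy (textbook background, not part of the original comment); for observations o, hidden states s, generative model p(o, s), and approximate posterior q(s):

```latex
F = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
  = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)
```

Since the KL term is non-negative, F upper-bounds the surprise -ln p(o), so minimizing F both improves the approximate posterior and, on average, reduces prediction error.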

habryka
(My sense was that Abram engaged pretty extensively with the post, though I can't fully judge since I've historically bounced off of a lot of the predictive processing stuff, including your post)
Matt Goldenberg
FWIW I'm not actually sure this is possible without you writing a sequence explaining the model. There are too many sentences loaded with inferential distance that I couldn't cross, and I didn't know the relevant places to start to begin to cross them.
Gordon Seidoh Worley
It looks like I read your post but forgot about it. I'll have to look at it again. I am building this theory in a way that I think is highly compatible with Friston, although I also don't have a gears-level understanding of Friston, so I find it easier to think in terms of control systems which appear to offer an equivalent model to me.
David_Kristoffersson
I expect the event to have no particular downside risks, and to give interesting input and spark ideas in experts and novices alike. Mileage will vary, of course. Unconferences foster dynamic discussion and a living agenda. If it's risky to host this event, then I'd expect AI strategy and forecasting meetups and discussions at EAG to be risky too, and they should also not be hosted. I and other attendees of AIXSU pay careful attention to potential downside risks. I also think it's important we don't strangle open intellectual advancement. We need to figure out what we should talk about; not conclude that we shouldn't talk.

AISC: To clarify, AI safety camp is different and puts greater trust in the judgement of novices, since teams are generally run entirely by novices. The person who proposed running a strategy AISC found the reactions from experts to be mixed. He also reckoned the event would overlap with the existing AI safety camps, since they already include strategy teams. Potential negative side effects of strategy work are a very important topic. Hope to discuss them with attendees at the unconference!

[purely personal view]

It seems quite easy to imagine similarly compelling socio-political and subconscious reasons why people working on AI could be biased against short AGI timelines. For example:

  • short-timeline estimates make the broader public agitated, which may lead to state regulation or similar interference [historical examples: industries trying to suppress information about risks]
  • researchers mostly want to work on technical problems instead of thinking about the nebulous future impacts of their work; putting more weight on short timelines would force some peo
... (read more)

I don't see why a portion of a system turning into an agent would be "very unlikely". From a different perspective, if the system lives in something like an evolutionary landscape, there can be various basins of attraction which lead to sub-agent emergence, not just mesa-optimisation.

Depends on what you mean by public. While I don't think you can have good public research processes which would not run into infohazards, you can have a nonpublic process which produces good public outcomes. I don't think the examples count as something public - e.g. do you see any public discussion leading to CAIS?

FWIW

  • In my experience there are infohazard/attention-hazard concerns. Public strategy research likely has negative expected value: if it is good, it will run into infohazards; if it is bad, it will create confusion.
  • I would expect prudent funders not to want to create a parallel public strategy discussion.
Siebe

I am not sure why you believe good strategy research always has infohazards. That's a very strong claim. Strategy research is broader than 'how should we deal with other agents'. Do you think Drexler's Reframing Superintelligence: Comprehensive AI Services or The Unilateralist's Curse had negative expected value? Because I would classify them as public, good strategy research with positive expected value.

Are there any specific types of infohazards you're thinking of? (E.g. informing unaligned actors, getting media attention and negative public opinion)

I've had similar discussions, but I'm worried this is not a good way to think about the situation. IMO the best part of both 'rationality' and 'effective altruism' is often the overlap - people who to a large extent belong to both communities and do not see the labels as something really important for their identity.

Systematic reasons for that may be...

Rationality asks the question "How to think clearly". For many people who start to think more clearly, this leads to an update of their goals toward the question "How we
... (read more)