All of Ben Goldhaber's Comments + Replies

What are the bottlenecks preventing 10x-100x scaling of Control Evaluations?

  • I'm not confident in the estimates of the safety margin we get from internal-only evaluations - eliciting strong subversion performance from models seems very hard, which makes it difficult to get satisfactory estimates of their subversion capability against control protocols.
  • I'd feel more confident if we had thousands of people trying to create red-team models, while thousands of blue teams proposed different monitoring methods and control protocols.
  • The type of experiments describe
... (read more)

I think more leaders of orgs should be trying to shape their organizations' incentives and cultures around the challenges of "crunch time". Examples of this include:

  • What does pay look like in a world where cognitive labor is automated in the next 5 to 15 years? Are there incentive structures (impact equity, actual equity, bespoke deals for specific scenarios) that can help team members survive, thrive, and stay on target?
  • What cultural norms should the team have around AI-assisted work? On the one hand it seems necessary to accelerate safety progress, on the oth
... (read more)
4Guive
Some kind of payment for training data from applications like MSFT rewind does seem fair. I wonder if there will be a lot of growth in jobs where your main task is providing or annotating training data. 

This post was one of my first introductions to davidad's agenda, and it convinced me that while yes, it was crazy, it was maybe not impossible; it led me to work on initiatives like the multi-author manifesto you mentioned.

Thank you for writing it!

I would be very excited to see experiments with agent-based models (ABMs) where the agents model fleets of research agents and tools. I expect in the near future we can build pipelines where the current fleet configuration - which should be defined in something like the Terraform configuration language - automatically generates an ABM that is used for evaluation, control, and coordination experiments.
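Below is a minimal sketch of that generation step in Python - a hypothetical illustration, not a real pipeline: the fleet config is inlined as JSON rather than parsed from actual Terraform/HCL, and the agent dynamics are toy placeholders.

```python
import json
import random

# Hypothetical fleet configuration. In a real pipeline this would be
# parsed from the Terraform/HCL that actually provisions the fleet.
FLEET_CONFIG = json.loads("""
{
  "agents": [
    {"role": "researcher", "count": 3, "error_rate": 0.1},
    {"role": "monitor",    "count": 1, "catch_rate": 0.8}
  ],
  "steps": 100
}
""")

class SimAgent:
    """One simulated member of the fleet."""
    def __init__(self, role, params):
        self.role = role
        self.params = params

def build_abm(config):
    """Expand the declarative fleet config into a population of sim agents."""
    return [
        SimAgent(spec["role"], spec)
        for spec in config["agents"]
        for _ in range(spec["count"])
    ]

def run(config, seed=0):
    """Toy dynamics: researchers occasionally err; monitors catch some errors."""
    rng = random.Random(seed)
    agents = build_abm(config)
    uncaught = 0
    for _ in range(config["steps"]):
        # Each researcher independently produces an error this step.
        errors = sum(
            1 for a in agents
            if a.role == "researcher" and rng.random() < a.params["error_rate"]
        )
        # Each monitor independently catches each outstanding error.
        for a in agents:
            if a.role == "monitor":
                errors = sum(
                    1 for _ in range(errors)
                    if rng.random() > a.params["catch_rate"]
                )
        uncaught += errors
    return uncaught

print("uncaught errors over run:", run(FLEET_CONFIG))
```

The point is the shape of the pipeline - declarative config in, runnable ABM out - so the same source of truth drives both deployment and evaluation.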

  • Cumulative Y2K readiness spending was approximately $100 billion, or about $365 per U.S. resident.
  • Y2K spending started as early as 1995, and appears to have peaked in 1998 and 1999 at about $30 billion per year.

https://www.commerce.gov/sites/default/files/migrated/reports/y2k_1.pdf

Ah gotcha, yes let's do my $1k against your $10k.

3Zac Hatfield-Dodds
Locked in! Whichever way this goes, I expect to feel pretty good about both the process and the outcome :-)

Given your rationale I'm on board with requiring that 3 or more consistent physical instances of the lock have been manufactured.

Let's 'lock' it in.

2Zac Hatfield-Dodds
Nice! I look forward to seeing how this resolves. Ah, by 'size' I meant the stakes, not the number of locks - did you want to bet the maximum $1k against my $10k, or some smaller proportional amount?

@Raemon works for me, and I agree with the other conditions.

2Zac Hatfield-Dodds
I think we're agreed then, if you want to confirm the size? Then we wait for 2027!

This seems mostly good to me, thank you for the proposals (and sorry for my delayed response, this slipped my mind).

OR less than three consistent physical instances have been manufactured. (e.g. a total of three including prototypes or other designs doesn't count) 

Why this condition? It doesn't seem relevant to the core contention, and if someone prototyped a single lock using a GS AI approach but didn't figure out how to manufacture it at scale, I'd still consider it to have been an important experiment.

Besides that, I'd agree to the above conditions!

6Zac Hatfield-Dodds
I don't think that a thing you can only manufacture once is a practically usable lock; having multiple is also practically useful to facilitate picking attempts and in case of damage - imagine that a few hours into an open pick-this-lock challenge, someone bent a part such that the key no longer opens the lock. I'd suggest resolving neutral in this case as we only saw a partial attempt. Other conditions:
  • I think it's important that the design could have at least a thousand distinct keys which are non-pickable. It's fine if the theoretical keyspace is larger so long as the verified-secure keyspace is large enough to be useful, and distinct keys/locks need not be manufactured so long as they're clearly possible.
  • I expect the design to be available in advance to people attempting to pick the lock, just as the design principles and detailed schematics of current mechanical locks are widely known - security through obscurity would not demonstrate that the design is better, only that as-yet-secret designs are harder to exploit.
I nominate @raemon as our arbiter, if both he and you are willing, and the majority vote or nominee of the Lightcone team if Raemon is unavailable for some reason (and @habryka approves that).
  • (8) won't be attempted, or will fail at some combination of design, manufacture, or just-being-pickable.

This is a great proposal and a beautifully compact crux for the overall approach.

I agree with you that this feels like a 'compact crux' for many parts of the agenda. I'd like to take your bet; let me reflect on whether there are any additional operationalizations or conditions.

However, I believe that the path there is to extend and complement current techniques, including empirical and experimental approaches alongside formal verification - whatever

... (read more)
6Zac Hatfield-Dodds
quick proposals:
  • I win at the end of 2026, if there has not been a formally-verified design for a mechanical lock, OR the design does not verify it cannot be mechanically picked, OR less than three consistent physical instances have been manufactured. (e.g. a total of three including prototypes or other designs doesn't count)
  • You win if at the end of 2027, there have been credible and failed expert attempts to pick such a lock (e.g. an open challenge at Defcon). I win if there is a successful attempt.
  • Bet resolves neutral, and we each donate half our stakes to a mutually-agreed charity, if it's unclear whether production actually happened, or there were no credible attempts to pick a verified lock.
  • Any disputes resolved by the best judgement of an agreed-in-advance arbiter; I'd be happy with the LessWrong team if you and they also agree.

I agree with this, I'd like to see AI Safety scale with new projects. A few ideas I've been mulling:

- A 'festival week' bringing entrepreneur types and AI safety types together to cowork from the same place, along with a few talks and a lot of mixers.
- Running an incubator/accelerator program at the tail end of a funding round, with fiscal sponsorship and some amount of operational support.
- More targeted recruitment for specific projects to advance important parts of a research agenda.

 

It's often unclear to me whether new projects should actually... (read more)

First off, thank you for writing this - great explanation.

  • Do you anticipate acceleration risks from developing the formal models through an open, multilateral process? Presumably others could use the models to train and advance the capabilities of their own RL agents. Or is the expectation that regulation would accompany this such that only the consortium could use the world model?
  • Would the simulations be exclusively for 'hard science' domains - ex. chemistry, biology - or would simulations of human behavior, economics, and politics also be needed? My
... (read more)
6Gabin
  • The formal models don't need to be open and public, and probably shouldn't be. Of course this adds a layer of difficulty, since it is harder to coordinate on an international scale and invite a lot of researchers to help on your project when you also want some protection against your model being stolen or published on the internet. It is perhaps okay if it is open source in the case where it is very expensive to train a model in this simulation and no other group can afford it.
  • Good question. I don't know, and I don't think that I have a good model of what the simulation would look like. Here is what my (very simplified, probably wrong) model of Davidad would say:
    • We only want to be really sure that the agent is locally nice. In particular, we want the agent to not hurt people (or perhaps only if we can be sure that there are good reasons, for example if they were going to hurt someone). The agent should not hurt them with weapons, or by removing the oxygen, or by increasing radiation. For that, we need to find a mathematical model of human boundaries, and then we need to formally verify that these boundaries will be respected. Since the agent is trained using time-bounded RL, after a short period of time it will not have any effect anymore on the world (if time-bounded RL indeed works), and the shareholders will be able to determine if the policy had a good impact on the world or not, and if not, train another agent and/or change the desiderata and/or improve the formal model. That's why it is more important to have a fine model of chemistry and physics, and we can do with a coarser model of economics and politics. In particular, we should not simulate millions of people.
    • Is it reasonable? I don't know, and until I see this mathematical model of human boundaries, or a very convincing prototype, I'll be a bit skeptical.

This seems like an important crux to me, because I don't think greatly slowing AI in the US would require new federal laws. I think many of the actions I listed could be taken by government agencies that over-interpret their existing mandates given the right political and social climate. For instance, the eviction moratorium during COVID obviously should have required congressional action, but was done by fiat through an over-interpretation of authority by an executive branch agency.

What they do or do not do seems mostly dictated by that socio-political climate, and by the courts, which means fewer veto points for industry.

I agree that competition with China is a plausible reason regulation won't happen; that will certainly be one of the arguments advanced by industry and NatSec as to why it should not be throttled. However, I'm not sure it will be stronger than the protectionist impulses, and currently I don't think it will be. Possibly it will exacerbate the "centralization" of AI dynamic that I listed in the 'licensing' bullet point, where large existing players receive money and de-facto license to operate in certain areas and then avoid others (as memeticimagery points out). So fo... (read more)

2James_Miller
Greatly slowing AI in the US would require new federal laws, meaning you need the support of the Senate, House, presidency, courts (to not rule them unconstitutional), and bureaucracy (to actually enforce them). If big tech can get at least one of these five power centers on its side, it can block meaningful change.

Hah, yes - seeing that great post from johnswentworth inspired me to review my own thinking on RadVac. Ultimately I placed a lower estimate on RadVac being effective - or at least effective enough to get me to change my quarantine behavior - such that the price wasn't worth it, but I think I get a rationality demerit for not investing more in the collaborative model building (and collaborative purchasing) part of the process.

I'm sorry I didn't see this response until now - thank you for the detailed answer!

I'm guessing your concern feels similar to ones you've articulated in the past around... "heart"/"grounded" rationality, or a concern about "disabling pieces of the epistemic immune system". 

I'm curious whether, 8 months later, you feel you can better speak to what you see as the crucial misunderstanding?

Out of curiosity what's one of your more substantive disagreements with Thiel?

Forecast - 25 mins

  • I thought it was more likely that in the short run there could be a preference cascade among top AGI researchers, and, as others have mentioned, given the operationalization of 'top AGI researchers' this might be true already.
  • If this doesn't become a majority concern by 2050, I expect it will be because of another AI winter, and I tried to have my distribution reflect that (a little ham-fistedly).

Thanks for posting this. I recently reread the Fountainhead, which I similarly enjoyed and got more out of than did my teenage self - it was like a narrative, emotional portrayal of the ideals in Marc Andreessen's It's Time to Build essay.

I interpreted your section on The Conflict as the choice between voice and exit.

The larger scientific question was related to Factored Cognition, and getting a sense of the difficulty of solving problems through this type of "collaborative crowdsourcing". The hope was running this experiment would lead to insights that could then inform the direction of future experiments, in the way that you might fingertip feel your way around an unknown space to get a handle on where to go next. For example if it turned out to be easy for groups to execute this type of problem solving, we might push ahead with competitions between teams t... (read more)

2DirectedEvolution
Thanks for that thorough answer!

All projects are forms of learning. I find that much of my learning time is consumed by two related tasks:
  1. Familiarizing myself with the reference materials. Examples: reading the textbook, taking notes on a lecture, asking questions during a lecture.
  2. Creating a personalized meta-reference to distill and organize the reference materials so that it'll be faster and easier to re-teach myself in the future. Examples: highlighting textbook material that I expect I won't remember and crossing out explanations I no longer need, re-formatting concepts learned in a math class into a unified presentation format, deciding which concepts need to be made into flash cards.

Those steps seem related to the challenges and strategies you encountered in this project. We know that students forget much of what they learn, despite their best efforts. I think it's wiser not to try hard to remember everything, but instead to "plan to forget" and create personalized references so that it's easy to re-teach yourself later when the need arises. I wish that skill were more emphasized in the school system. I think we put too much emphasis on trying to make students work harder and memorize better and "de-stress," and too little on helping students create a carefully thought-out system of notes and references and practice material that will be useful to them later on. The process of creating really good notes will also serve as a useful form of practice and a motivating tool. I find myself much more inclined to study if I've done this work, and I do in fact retain concepts much better if I've put in this work.

Your project sounds like an interesting approach to tackle a related challenge. I'd be especially interested to hear about any efforts you make to tease out the differences between work that's divided between different people, and work that's divided between different "versions of you" at different times.

Thanks, rewrote and tried to clarify. In essence the researchers were testing transmission of "strategies" for using a tool, where an individual was limited in what they could transmit to the next user, akin to this relay experiment.

In fact they found that trying to convey causal theories could undermine the next person's performance; they speculate that it reduced experimentation prematurely.

3ESRogs
Better now, thanks!

Thanks for posting this. Why did you invest in those three startups in particular? Was it the market, the founders, personal connections? And was it a systematic search for startups to invest in, or more of an "opportunity-arose" situation?

3ESRogs
These were all personal connections / opportunity-arose situations. The closest I've done to a systematic search was once asking someone who'd done a bunch of angel investments if there were any he'd invested in who were looking for more money and whom he was considering investing more in. That was actually my first angel investment (Pantelligent) and it ended up not working out. (But of course that's the median expected outcome.) (The other two that I invested in that are not still going concerns were AgoraFund and AlphaSheets. Both of those were through personal connections as well.)

I know Ozzie has been thinking about this, because we were chatting about how to use an Alfred workflow to post to it. Which I think would be great!

I've spent a fair bit of time in the forecasting space playing w/ different tools, and I never found one that I could reliably use for personal prediction tracking.

Ultimately for me it comes down to:

1.) Friction: the predictions I'm most interested in tracking are "5-second-level" predictions - "do I think this person is right", "is the fact that I have a cough and am tired a sign that I'm getting sick", etc. - and I need to be able to jot them down quickly.

2.) "Routine": There are certain sites that a... (read more)

5ozziegooen
For those reading, the main thing I'm optimizing Foretold for right now is forecasting experiments and projects with 2-100 forecasters. The spirit of making "quick and dirty" questions for personal use conflicts a bit with that of making "well thought out and clear" questions for group use. The latter are messy to change, because it would confuse everyone involved. Note that Foretold does support full probability distributions with the Guesstimate-like syntax, which PredictionBook doesn't. But it's less focused on the quick individual use case in general. If there are recommendations for simple ways to make it better for individuals, maybe other workflows, I'd be up for adding some support or integrations.
3Raemon
Is there an option for foretold to become Very Low Friction somehow? I agree with the "5 second level predictions" thing being a key issue.

The commerce clause gives the federal government broad powers to regulate interstate commerce, and in particular the U.S. Secretary of Health and Human Services can exercise it to institute quarantine. https://cdc.gov/quarantine/aboutlawsregulationsquarantineisolation.html

Depression as a concept doesn't make sense to me. Why on earth would it be fitness-enhancing to have a state of withdrawal, retreat, and collapse where a lack of energy prevents you from trying new things? I've brainstormed a number of explanations:

    • depression as chemical imbalance: a hardware-level failure has occurred, maybe randomly, maybe because of an "overload" of sensation
    • depression as signaling: withdrawal and retreat from the world indicates a credible signal that I need help
    • depression as retreat: the environment has become dangerous
... (read more)
4Taran
I think you're asking too much of evolutionary theory here. Human bodies do lots of things that aren't long-term adaptive -- for example, if you stab them hard enough, all the blood falls out and they die. One could interpret the subsequent shock, anemia, etc. as having some fitness-enhancing purpose, but really the whole thing is a hard-to-fix bug in body design: if there were mutant humans whose blood more reliably stayed inside them, their mutation would quickly reach fixation in the early ancestral environment. We understand blood and wound healing well enough to know that no such mutation can exist: there aren't any small, incrementally-beneficial changes which can produce that result.

In the same way, it shouldn't be confusing that depression is maladaptive; you should only be confused if it's both maladaptive and easy to improve on. Intuitively it feels like it should be -- just pick different policies -- but that intuition isn't rooted in fine-grained understanding of the brain and you shouldn't let it affect your beliefs.
2Matt Goldenberg
On a group selection level it might make lots more sense to have certain people get into states where they're very unlikely to procreate.

I rarely share ideas online (I'm working on that); when I do, the ideas tend to be "small" observations or models, the type I can write out quickly and send, ~10 mins to 1 day after I have them.

I've heard that Talking Heads song dozens of times and have never watched the video. I was missing out!

Neat - hadn't seen that, thanks!

I expect understanding something more explicitly - such as your own and another person's boundaries - without some type of underlying acceptance of that boundary can increase exploitability. I recently wrote a shortform post on the topic of legibility that describes some patterns I've noticed here.

I don't think on average Circling makes one more exploitable, but I expect it increases variance, making some people significantly more exploitable than they were before because previously invisible boundaries are now visible, and can thus be attacke... (read more)

  • Yes And is an improv technique where you keep the energy in a scene alive by going w/ the other person's suggestion and adding more to it. "A: Wow is that your pet monkey? B: Yes and he's also my doctor!"
  • Yes And is generative (creates a lot of output), as opposed to Hmm No which is critical (distills output)
  • A lot of the Sequences is Hmm No
  • It's not that Hmm No is wrong, it's that it cuts off future paths down the Yes And thought-stream.
  • If there's a critical error at the beginning of a thought that will undermine everything else
... (read more)

IMO the term "amplification" fits if the scheme results in 1.) a clear efficiency gain and 2.) scalability. This looks like (delivering equivalent results but at a lower cost OR providing better results for an equivalent cost (cost == $$ & time)), AND (~O(n) scaling costs).

For example if there was a group of people who could emulate [Researcher's] fact checking of 100 claims but do it at 10x speed, then that's an efficiency gain as we're doing the same work in less time. If we pump the number to 1000 claims and the fac... (read more)
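A minimal sketch of this criterion as code (all names and numbers here are hypothetical, just to make the predicate concrete):

```python
def is_amplification(base_quality, base_cost, scheme_quality, scheme_cost,
                     unit_cost_small, unit_cost_large):
    """The two conditions above. 'cost' bundles $$ and time into one number.

    1. Efficiency gain: equivalent results at lower cost, OR better results
       at equivalent cost.
    2. Scalable: per-unit cost stays roughly flat as volume grows (~O(n) total).
    """
    efficiency_gain = (
        (scheme_quality >= base_quality and scheme_cost < base_cost)
        or (scheme_quality > base_quality and scheme_cost <= base_cost)
    )
    scalable = unit_cost_large <= 1.5 * unit_cost_small  # tolerance is arbitrary
    return efficiency_gain and scalable

# The fact-checking example: a group matches a researcher's accuracy on
# 100 claims at 10x speed (costs in hours, purely illustrative).
print(is_amplification(base_quality=1.0, base_cost=100.0,
                       scheme_quality=1.0, scheme_cost=10.0,
                       unit_cost_small=0.10,   # hours/claim at 100 claims
                       unit_cost_large=0.12))  # hours/claim at 1000 claims
```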

Is there not a distillation phase in forecasting? One model of the forecasting process is: person A builds up their model, distills a complicated question into a high-information/highly-compressed datum, which can then be used by others. In my mind it's:

Model -> Distill -> "amplify" (not sure if that's actually the right word)

I prefer the term scalable instead of proliferation for "can this group do it cost-effectively" as it's a similar concept to that in CS.

5ozziegooen
Distillation vs. Instillation

My main point here is that distillation is doing 2 things: transitioning knowledge (from training data to a learned representation), and then compressing that knowledge.[1] The fact that it's compressed in some ways arguably isn't always particularly important; the fact that it's transferred is the main element. If a team of forecasters basically learned a signal, but did so in a very uncompressed way (like, they wrote a bunch of books about said signal), but still were somewhat cost-effective, I think that would be fine.

Around "Proliferation" vs. "Scaling"; I'd be curious if there are better words out there. I definitely considered scaling, but it sounds less concrete and less specific. To "proliferate" means "to generate more of", but to "scale" could mean "to make look bigger, even if nothing is really being done."

I think my cynical guess is that "instillation/proliferation" won't catch on because they are too uncommon, but also that "distillation" won't catch on because it feels like a stretch from the ML use case. Could use more feedback here.

[1] Interestingly, there seem to be two distinct stages in Deep Learning that map to these two different things, according to Naftali Tishby's claims.

Thanks for including that link - seems right, and reminded me of Scott's old post Epistemic Learned Helplessness

The only difference between their presentation and mine is that I’m saying that for 99% of people, 99% of the time, taking ideas seriously is the wrong strategy

I kinda think this is true, and it's not clear to me from the outset whether you should "go down the path" of getting access to level 3 magic given the negatives.

Probably good heuristics are proceeding with caution when encountering new/out-there ideas, remember... (read more)

  • Why do I not always have conscious access to my inner parts? Why, when speaking with authority figures, might I have a sudden sense of blankness?
  • Recently I've been thinking about this reaction in the frame of 'legibility', ala Seeing Like a State. States would impose organizational structures on societies that were easy to see and control - they made the society more legible to the actors who ran the state - but these organizational structures were bad for the people in the society.
    • For example, census data, standardized weights and m
... (read more)
4Viliam
Related: Reason as memetic immune disorder

I like the idea that having some parts of you protected from yourself makes them indirectly protected from people or memes who have power over you (and want to optimize you for their benefit, not yours). Being irrational is better than being transparently rational when someone is holding a gun to your head. If you could do something, you would be forced to do it (against your interests), so it's better for you if you can't. But, what now? It seems like rationality and introspection is a bit like defusing a bomb -- great if you can do it perfectly, but it kills you when you do it halfway.

It reminds me of a fantasy book which had a system of magic where wizards could achieve 4 levels of power. Being known as a 3rd-level wizard was a very bad thing, because all 4th-level wizards were trying to magically enslave you -- to get rid of a potential competitor, and to get a powerful slave (I suppose the magical cost of enslaving someone didn't grow proportionally to the victim's level). To use an analogy, being biologically incapable of reaching the 3rd level of magic might be an evolutionary advantage. But at the same time, it would prevent you from reaching the 4th level, ever.

I'd also encourage you to link your predictions to Foretold/Metaculus/other prediction aggregator questions, though only if you write your prediction in the thread as well to prevent link rot.

As a Schelling point, you can use this Foretold community which I made specifically for this thread.

I watched all of the Grandmaster-level games. When playing against grandmasters, the average win rate of AlphaStar across all three races was 55.25%:

  • Protoss Win Rate: 78.57%
  • Terran Win Rate: 33.33%
  • Zerg Win Rate: 53.85%
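As a quick check, the overall figure is exactly the unweighted mean of the three per-race rates:

$$\frac{78.57\% + 33.33\% + 53.85\%}{3} = \frac{165.75\%}{3} \approx 55.25\%$$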

Detailed match by match scoring

While I don't think that it is truly "superhuman", it is definitely competitive against top players.


https://twitter.com/esyudkowsky/status/910941417928777728

I remember seeing other claims/analysis of this but don't remember where

2ChristianKl
When EY says "our community" he means more than just LW but the whole rationalist diaspora as well, towards which Robin Hanson can be counted.

Is the clearest "win" of a LW meme the rise of the term "virtue signaling"? On the one hand I'm impressed w/ how dominant it has become in the discourse, on the other... maybe our comparative advantage is creating really sharp symmetric weapons...

3Viliam
Do I understand it correctly that you believe the words "virtue signaling", or at least their frequent use, originate on LW? What is your evidence for this? (Do you have a link to what appears to be the first use?) In my opinion, Robin Hanson is a more likely suspect, because he talks about signaling all the time. But I would not be surprised to hear that someone else used that idiom first, maybe decades ago. In other words, is there anything more than "I heard about 'virtue signaling' first on LW"?

I have a cold, which reminded me that I want fashionable face masks to catch on so that I can wear them all the time in cold-and-flu season without accruing weirdness points.

Looks like the Monkey's Paw curled a finger here ...

I'm interested, and I'd suggest using https://foretold.io for this

I'd like to see someone in this community write an extension / refinement of it to further {need-good-color-name}pill people into the LW meme that the "higher mind" is not fundamentally better than the "animal mind"

I'd agree w/ the point that giving subordinates plans and the freedom to execute them as best they can tends to work out better, but that seems to be strongly dependent on other context, in particular the field they're working in (ex. software engineering vs. civil engineering vs. military engineering), cultural norms (ex. is this a place where agile engineering norms have taken hold?), and reward distributions (ex. does experimenting by individuals hold the potential for big rewards, or are all rewards likely to be distributed in a normal fas... (read more)

3Daniel Kokotajlo
I agree. I don't think agents will outcompete tools in every domain; indeed in most domains perhaps specialized tools will eventually win (already, we see humans being replaced by expensive specialized machinery, or expensive human specialists, lots of places). But I still think that there will be strong competitive pressure to create agent AGI, because there are many important domains where agency is an advantage.

From a 2-min brainstorm of "info products" I'd expect to be action-guiding:

  • Metrics and dashboards reflecting the current state of the organization.
  • Vision statements ("what do we as an organization do and thus what things should we consider as part of our strategy")
  • Trusted advisors
  • Market forces (e.g. prices of goods)

One concrete example is from when I worked in a business intelligence role. What executives wanted was extremely trustworthy, reliable data sources to track business performance over time. In a software environment ... (read more)

It seems true that there are a lot of ways to utilize forecasts. In general forecasting tends to have an implicit and unstated connection to the decision-making process - I think that has to do w/ the nature of operationalization ("a forecast needs to be on a very specific thing") and with the fact that much of the popular literature on forecasting has come from business literature (e.g. How to Measure Anything).

That being said, I think action-guidingness is still the correct bar to meet for evaluating the effect it has on the EA community. I would bite ... (read more)

2Bird Concept
I wonder whether you have any examples, or concrete case studies, of things that were successfully action-guiding for people/organisations? (Beyond forecasts and blog posts, though those are fine too.)