ozziegooen's Shortform

ozziegooen

ozziegooen's Shortform — LessWrong

A bunch of people in the AI safety landscape seem to argue "we need to stop AI progress, so that we can make progress on AI safety first."

One flip side to this is that I think it's incredibly easy for people to waste a ton of resources on "AI safety" at this point.

I'm not sure how much I trust most technical AI safety researchers to make important progress on AI safety now. And I trust most institutions a lot less.

I'd naively expect that if any major country threw $100 billion at it today, the results would be highly underwhelming. I rarely trust these governments to make progress on concrete technologies with clear progress measures, and "AI Safety" is highly ambiguous and speculative.

As I've written about before, I think it's just hard to know what critical technical challenges will be bottlenecks around AI alignment, given that it's unclear when this will become an issue or what sorts of architectures we will have then.

All that said, slowing things down seems much safer to me. I assume that at [year(TAI) - 3] we'll have a decent idea of what's needed, and extending that duration seems like a safe bet.

I really want to see better strategic discussion about AI safety. If somehow w... (read more)

6JBlack1y

What makes you think that we're not at year(TAI)-3 right now? I'll agree that we might not be there yet, but you seem to be assuming that we can't be.

2ozziegooen1y

This is an orthogonal question. I agree that if we're there now, my claim is much less true. I'd place fairly little probability mass on this (<10%) and believe much of the rest of the community does as well, though I realize there is a subset of the LessWrong-adjacent community that does.

6TsviBT1y

Why?? What happened to the bitter lesson?

3ozziegooen1y

Can you explain this position more? I know the bitter lesson, could imagine a few ways it could have implications here.

6TsviBT1y

I'm saying that just because we know algorithms that will successfully leverage data and compute to set off an intelligence explosion (...ok I just realized you wrote TAI but IDK what anyone means by anything other than actual AGI), doesn't mean we know much about how they leverage it and how that influences the explody-guy's long-term goals.

5ozziegooen1y

I assume that current efforts in AI evals and AI interpretability will be pretty useless if we have very different infrastructures in 10 years. For example, I'm not sure how much LLM interp helps with o1-style high-level reasoning. I also think that later AI could help us do research. So if the idea is that we could do high-level strategic reasoning to find strategies that aren't specific to specific models/architectures, I assume we could do that reasoning much better with better AI.

4Knight Lee1y

I think both duration and funding are important. I agree that increasing duration has a greater impact than increasing funding. But increasing duration is harder than increasing funding.

AI safety spending is only $0.1 billion while AI capabilities spending is $200 billion. Increasing funding by 10x is relatively more attainable, while increasing duration by 10x would require more of a miracle.

Even if you believe that funding today isn't very useful and funding in the future is more useful, increasing funding now moves the Overton window a lot. It's hard for any government which has traditionally spent only $0.01 billion to suddenly spend $100 billion. They'll use the previous budget as an anchor point to decide the new budget.

My guess is that 4x funding ≈ 2x duration.[1]

1. ^ For inventive steps, having twice as many "inventors" reduces the time to invention by half, while for engineering steps, having twice as many "engineers" doesn't help very much. (Assuming the time it takes each inventor to think of an invention is an independent exponential distribution)
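The footnote's claim about inventive steps can be checked with a small simulation. This is only a sketch of that one claim (the first success among n independent exponential waiting times), not of the "4x funding ≈ 2x duration" guess; the function name and the 10-year mean per inventor are assumptions made for illustration.

```python
import random

def mean_time_to_first_invention(n_inventors, mean_time_each=10.0,
                                 trials=20000, seed=0):
    """Average time until the first of n independent inventors succeeds.

    Each inventor's time is drawn from an exponential distribution with
    the given mean (an assumed, illustrative value).
    """
    rng = random.Random(seed)
    rate = 1.0 / mean_time_each
    total = 0.0
    for _ in range(trials):
        # The invention arrives when the fastest inventor finishes.
        total += min(rng.expovariate(rate) for _ in range(n_inventors))
    return total / trials

one = mean_time_to_first_invention(1)
two = mean_time_to_first_invention(2)
four = mean_time_to_first_invention(4)
print(f"1 inventor: {one:.2f}, 2 inventors: {two:.2f}, 4 inventors: {four:.2f}")
```

Under these assumptions the simulated averages should sit close to mean_time_each / n, which is the standard result for the minimum of n independent exponentials with the same rate.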

8ozziegooen1y

I'm not sure if it means much, but I'd be very happy if AI safety could get another $50B from smart donors today. I'd flag that [stopping AI development] would cost far more than $50B. I'd expect that we could easily lose $3T of economic value in the next few years if AI progress seriously stopped. I guess, it seems to me like duration is basically dramatically more expensive to get than funding, for amounts of funding people would likely want.

5Knight Lee1y

I do think that convincing the government to pause AI in a way which sacrifices $3000 billion economic value, is relatively easier than directly spending $3000 billion on AI safety. Maybe spending $1 is similarly hard to sacrificing $10-$100 of future economic value via preemptive regulation.[1]

But $0.1 billion AI safety spending is so ridiculously little (1000 times less than capabilities spending), increasing it may still be the "easiest" thing to do. Of course we should still push for regulation at the same time (it doesn't hurt).

PS: what do you think of my open letter idea for convincing the government to increase funding?

1. ^ Maybe "future economic value" is too complicated. A simpler guesstimate would be "spending $1 is similarly hard to sacrificing $10 of company valuations via regulation."

[-]ozziegooen6y240

Questions around Making Reliable Evaluations

Most existing forecasting platform questions are for very clearly verifiable questions:

"Who will win the next election"
"How many cars will Tesla sell in 2030?"

But many of the questions we care about are much less verifiable:

"How much value has this organization created?"
"What is the relative effectiveness of AI safety research vs. bio risk research?"

One solution attempt would be to have an "expert panel" assess these questions, but this opens up a bunch of issues. How could we know how much we could trust this group to be accurate, precise, and understandable?

The topic of, "How can we trust that a person or group can give reasonable answers to abstract questions" is quite generic and abstract, but it's a start.

I've decided to investigate this as part of my overall project on forecasting infrastructure. I've recently been working with Elizabeth on some high-level research.

I believe that this general strand of work could be useful both for forecasting systems and also for the more broad-reaching evaluations that are important in our communities.

Early concrete questions in evaluation quality

One concrete topic that's easily stud

... (read more)

2romeostevensit6y

> "How much value has this organization created?"

Can insights from prediction markets help us select better proxies and decision criteria, or do we expect people to be too poorly entangled with the truth of these matters for that to work? Do orgs always require someone who is managing the ontology and incentives to be super competent at that to do well? De facto improvements here are worth billions (project management tools, Slack, email add-ons for assisting managing, etc.).

7ozziegooen6y

I think that prediction markets can help us select better proxies, but the initial set up (at least) will require people pretty clever with ontologies. For example, say a group comes up with 20 proposals for specific ways of answering the question, "How much value has this organization created?". A prediction market could predict the outcome of the effectiveness of each proposal. I'd hope that over time people would put together lists of "best" techniques to formalize questions like this, so doing it for many new situations would be quite straightforward.

5Trinley Goldenberg6y

Another related idea we played around with, but which didn't make it into the final whitepaper: What if we just assumed that Brier score was also predictive of good judgement? Then people could create a distribution over several measures of "how well will this organization do" and we could use standard probability theory and aggregation tools to create an aggregated final measure.

4Trinley Goldenberg6y

The way we handled this with Verity was to pick a series of values, like "good judgement", "integrity", "consistency", etc. Then the community would select exemplars who they thought represented those values the best. As people voted on which proposals they liked best, we would weight their votes by:

1. How much other people (weighted by their own score on that value) thought they had that value.
2. How similarly they voted to the exemplars.

This sort of "value judgement" allows for fuzzy representation of high-level judgement, and is a great supplement to more objective metrics like Brier score, which can only measure well-defined questions. Eigentrust++ is a great algorithm that has the properties needed for this judgement-based reputation. The Verity Whitepaper goes more into depth as to how this would be used in practice.

2romeostevensit6y

Deference networks seem underrated.

2jimrandomh6y

One way to look at this is, where is the variance coming from? Any particular forecasting question has implied sub-questions, which the predictor needs to divide their attention between. For example, given the question "How much value has this organization created?", a predictor might spend their time comparing the organization to others in its reference class, or they might spend time modeling the judges and whether they tend to give numbers that are higher or lower. Evaluation consistency is a way of reducing the amount of resources that you need to spend modeling the judges, by providing a standard that you can calibrate against. But there are other ways of achieving the same effect. For example, if you have people predict the ratio of value produced between two organizations, then if the judges consistently predict high or predict low, this no longer matters since it affects both equally.
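The ratio point can be illustrated with a few lines of arithmetic. This sketch assumes the judges' tendency to score high or low acts as a single multiplicative factor applied equally to both organizations; that assumption, the function name, and the numbers are illustrative and not from the comment.

```python
def judged_ratio(true_value_a, true_value_b, judge_bias):
    """Ratio of two judged values when one judge scales both by the same bias."""
    judged_a = true_value_a * judge_bias
    judged_b = true_value_b * judge_bias
    return judged_a / judged_b

# Hypothetical "true" values for two organizations.
true_a, true_b = 40.0, 10.0

for bias in (0.5, 1.0, 3.0):
    absolute = true_a * bias                    # depends on the judge
    ratio = judged_ratio(true_a, true_b, bias)  # does not
    print(f"bias={bias}: judged A={absolute}, A/B ratio={ratio}")
```

The absolute judged value moves with the bias while the ratio stays fixed, so a predictor of the ratio does not need to model how generous the judges are. If the bias were additive, or differed between organizations, it would not cancel this cleanly.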

1ozziegooen6y

Yep, good points. Ideally one could do a proper or even estimated error analysis of some kind. Having good units (like, ratios) seems pretty important.

-1zulupineapple6y

If you had a precise definition of "effectiveness" this shouldn't be a problem. E.g. if you had predictions for "will humans go extinct in the next 100 years?" and "will we go extinct in the next 100 years, if we invest 1M into AI risk research?" and "will we go extinct, if we invest 1M in bio risk research?", then you should be able to make decisions with that. And these questions should work fine in existing forecasting platforms. Their long term and conditional nature are problems, of course, but I don't think that can be helped. That's not a forecast. But if you asked "How much value will this organization create next year?" along with a clear measure of "value", then again, I don't see much of a problem. And, although clearly defining value can be tedious (and prone to errors), I don't think that problem can be avoided. Different people value different things, that can't be helped. Why would you do that? What's wrong with the usual prediction markets? Of course, they're expensive (require many participants), but I don't think a group of experts can be made to work well without a market-like mechanism. Is your project about making such markets more efficient?

3ozziegooen6y

Coming up with a precise definition is difficult, especially if you want multiple groups to agree. Those specific questions are relatively low-level; I think we should ask a bunch of questions like that, but think we may also want some more vague things as well. For example, say I wanted to know how good/enjoyable a specific movie would be. Predicting the ratings according to movie reviewers (evaluators) is an approach I'd regard as reasonable. I'm not sure what a precise definition for movie quality would look like (though I would be interested in proposals), but am generally happy enough with movie reviews for what I'm looking for. Agreed that that itself isn't a forecast, I meant in the more general case, for questions like, "How much value will this organization create next year" (as you pointed out). I probably should have used that more specific example, apologies. Can you be more explicit about your definition of "clearly"? I'd imagine that almost any proposal at a value function would have some vagueness. Certificates of Impact get around this by just leaving that for the review of some eventual judges, kind of similar to what I'm proposing. The goal for this research isn't fixing something with prediction markets, but just finding more useful things for them to predict. If we had expert panels that agreed to evaluate things in the future (for instance, they are responsible for deciding on the "value organization X has created" in 2025), then prediction markets and similar could predict what they would say.

1zulupineapple6y

My point is that "goodness" is not a thing in the territory. At best it is a label for a set of specific measures (ratings, revenue, awards, etc). In that case, why not just work with those specific measures? Vague questions have the benefit of being short and easy to remember, but beyond that I see only problems. Motivated agents will do their best to interpret the vagueness in a way that suits them. Is your goal to find a method to generate specific interpretations and procedures of measurement for vague properties like this one? Like a Schelling point for formalizing language? Why do you feel that can be done in a useful way? I'm asking for an intuition pump. Certainly there is some vagueness, but it seems that we manage to live with it. I'm not proposing anything that prediction markets aren't already doing.

1ozziegooen6y

Hm... At this point I don't feel like I have a good intuition for what you find intuitive. I could give more examples, but don't expect they would convince you much right now if the others haven't helped. I plan to eventually write more about this, and eventually hopefully we should have working examples up (where people are predicting things). Hopefully things should make more sense to you then. Short comments back<>forth are a pretty messy communication medium for such work.

1Tetraspace6y

There's something of a problem with sensitivity; if the x-risk from AI is ~0.1, and the difference in x-risk from some grant is ~10^-6, then any difference in the forecasts is going to be completely swamped by noise. (while people in the market could fix any inconsistency between the predictions, they would only be able to look forward to 0.001% returns over the next century)
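The numbers in this comment can be worked through directly. The baseline risk and the grant's effect are taken from the comment; the forecast noise level is a hypothetical value added only to show the scale of the problem.

```python
baseline_risk = 0.1    # x-risk from AI, as in the comment
risk_reduction = 1e-6  # change attributed to one grant, as in the comment
forecast_noise = 0.01  # hypothetical spread between forecasters' estimates

# The conditional forecast differs from the baseline only in the sixth decimal.
conditional_risk = baseline_risk - risk_reduction

# Relative size of the gap a trader could profit from by fixing an inconsistency.
relative_gap = risk_reduction / baseline_risk

# How large the effect is compared to the assumed noise.
signal_to_noise = risk_reduction / forecast_noise

print(f"conditional forecast: {conditional_risk:.6f}")
print(f"relative gap: {relative_gap:.3%}")
print(f"signal-to-noise: {signal_to_noise:.4f}")
```

The relative gap is one part in 100,000, which matches the 0.001% return mentioned above, and under the assumed noise level the effect is four orders of magnitude smaller than ordinary disagreement between forecasts.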

1zulupineapple6y

Making long term predictions is hard. That's a fundamental problem. Having proxies can be convenient, but it's not going to tell you anything you don't already know.

1ozziegooen6y

Yea, in cases like these, having intermediate metrics seems pretty essential.

[-]ozziegooen6y*160

Experimental predictability and generalizability are correlated

A criticism of having people attempt to predict the results of experiments is that this will be near impossible. The idea is that experiments are highly sensitive to parameters and these would need to be deeply understood in order for predictors to have a chance at being more accurate than an uninformed prior. For example, in a psychological survey, it would be important that the predictors knew the specific questions being asked, details about the population being sampled, many details about the experimenters, et cetera.

One counter-argument is not to say that prediction will be easy in many cases, but rather that if these experiments cannot be predicted in a useful fashion without very substantial amounts of time, then these experiments probably aren’t going to be very useful anyway.

Good scientific experiments produce results that are generalizable. For instance, a study on the effectiveness of a malaria intervention on one population should give us useful information (probably for use with forecasting) about its effectiveness on other populations. If it doesn’t, then its value would be limited. It would really be more of a hist

... (read more)

1NunoSempere6y

In that paragraph, did you mean to say "findings_i is correct"? *** Neat idea. I'm also not sure whether the idea is valuable because it could be implementable, or from "this is interesting because it gets us better models". In the first case, I'm not sure whether the correlation is strong enough to change any decisions. That is, I'm having trouble thinking of decisions for which I need to know the generalizability of something, and my best shot is measuring its predictability. For example, in small foretold/metaculus communities, I'd imagine that miscellaneous factors like "is this question interesting enough to the top 10% of forecasters" will just make the path predictability -> differential entropy -> generalizability difficult to detect.

2ozziegooen6y

The main point I was getting at is that the phrases:

1. Experiments are important to perform.
2. Predictors cannot decently predict the results of experiments unless they have gigantic amounts of time.

are a bit contradictory. You can choose either, but probably not both. Likewise, I'd expect that experiments that are easier to predict are ones that are more useful, which is more convenient than the other alternative. I think generally we will want to estimate importance/generality of experiments separate from their predictability.

[-]ozziegooen6y150

I was recently pointed to the Youtube channel Psychology in Seattle. I think it's one of my favorites in a while.

I'm personally more interested in workspace psychology than relationship psychology, but my impression is that they share a lot of similarities.

Emotional intelligence gets a bit of a bad rap due to its fuzzy nature, but I'm convinced it's one of the top few things for most people to get better at. I know lots of great researchers and engineers who keep repeating the same failure modes, and this causes severe organizational and personal problems as a result.

Emotional intelligence books and training typically seem quite poor to me. The alternative format here of "let's just show you dozens of hours of people interacting with each other, and point out all the fixes they could make" seems much better than most books or lectures I've seen.

This Youtube series does an interesting job at that. There's a whole bunch of "let's watch this reality TV show, then give our take on it." I'd be pretty excited about there being more things like this posted online, especially in other contexts.

Related, I think the potential of reality TV is fairly underrated in intellectual circles, but that's a different story.

https://www.youtube.com/user/PsychologyInSeattle?fbclid=IwAR3Ux63X0aBK0CEwc8yPyjsFJ2EKQ2aSMs1XOjUOgaFqlguwz6Fxul2ExJw

6Gordon Seidoh Worley6y

One of the things I love about entertainment is that much of it offers evidence about how humans behave in a wide variety of scenarios. This has gotten truer over time, at least within Anglophone media, with its trend towards realism and away from archetypes and morality plays. Yes, it's not the best possible or most reliable evidence about how real humans behave in real situations and it's a meme around here that you should be careful not to generalize from fictional evidence, but I also think it's better than nothing (I don't think reality TV is especially less fictional than other forms of entertainment with regards to how humans behave, given its heavy use of editing and loose scripting to punch up situations for entertainment value).

1Rudi C6y

Nothing is a low bar though. :)

4Trinley Goldenberg6y

You might also enjoy the channel "charisma on command" which has a similar format of finding youtube videos of charismatic and non-charismatic people, and seeing what they do and don't do well.

2ozziegooen6y

Thanks! I'll check it out.

4romeostevensit6y

Novel and obviously some good ideas/directions. Thanks.

[-]ozziegooen6y*140

Namespace pollution and name collision are two great concepts in computer programming. The way they are handled in many academic environments seems quite naive to me.

Programs can get quite large and thus naming things well is surprisingly important. Many of my code reviews are primarily about coming up with good names for things. In a large codebase, every time symbolicGenerator() is mentioned, it refers to the same exact thing. If one part of the codebase has been using symbolicGenerator for a reasonable set of functions, and later another part comes up whose programmer realizes that symbolicGenerator is also the best name for that piece, they have to make a tough decision. Either they refactor the codebase to change all previous mentions of symbolicGenerator to an alternative name, or they come up with an alternative name for the new piece. They can't have it both ways.
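A minimal sketch of the collision described above, written in Python with the name rendered as symbolic_generator. The two definitions and the namespace names (algebra, cartography) are hypothetical; the post does not say what symbolicGenerator actually does.

```python
from types import SimpleNamespace

# One flat namespace: the second definition silently replaces the first.
def symbolic_generator():
    return "generates algebraic symbols"

original = symbolic_generator  # keep a handle so the first version is not lost

def symbolic_generator():  # a later, unrelated use of the same name
    return "generates map-legend symbols"

# True: the shared name now refers to something else entirely.
collided = symbolic_generator is not original

# Separate namespaces: each group keeps the name it wanted,
# and callers say which one they mean.
algebra = SimpleNamespace(symbolic_generator=original)
cartography = SimpleNamespace(symbolic_generator=symbolic_generator)

print(algebra.symbolic_generator())
print(cartography.symbolic_generator())
```

Namespacing is one common way programming languages soften the "can't have it both ways" choice: the short name is reused, but every use carries a qualifier. It does not remove the cost of agreeing on the qualifiers themselves.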

Therefore, naming becomes a political process. Names touch many programmers who have different intuitions and preferences. A large refactor of naming in a section of the codebase that others use would often be taken quite hesitantly by that group.

This makes it all the more important that good names are u

... (read more)

4Gordon Seidoh Worley6y

I'm not sure if it's good or bad, but I find the way species get named interesting. The general rule is "first published name wins", and this is true even if the first published name is "wrong" in some way, like implies a relationship that doesn't exist, since that implication is not officially semantically meaningful. But there are ways to get around this, like if a name was based on a disproved phylogeny, in which case a new name can be taken up that fits the new phylogenic relationship. This means existing names get to stick, at least up until the time that they are proven so wrong that they must be replaced. Alas, there's no official registry of these things, so it's up to working researchers to do literature reviews and get the names right, and sometimes people get it wrong by accident and sometimes on purpose because they think an earlier naming is "invalid" for one reason or another and so only recognize a later naming. The result is pretty confusing and requires knowing a lot or doing a lot of research to realize that, for example, two species names might refer to the same species in different papers.

2ozziegooen6y

Thanks, I didn't know. That matches what I expect from similar fields, though it is a bit disheartening. There's an entire field of library science and taxonomy, but they seem rather isolated to specific things.

4ozziegooen6y

Another quick note on the LessWrong wiki: I'm skeptical of single definitions without disclaimers. I think it's misleading (to some) that "Truth is the correspondence between and one's beliefs about reality and reality. "[1]. Rather, it's fair to say that this is one specific definition of truth that has been used in many cases; I'm sure that others, including others on LessWrong, have used it differently. Most dictionaries have multiple definitions for words. This seems more like what we should aim for. In fairness, when I searched for "Rationality", the result states, "Rationality is something of fundamental importance to LessWrong that is defined in many ways", which I of course agree with. [1] https://wiki.lesswrong.com/wiki/Truth

3Pattern6y

At the meta-level it isn't clear what value other definitions might offer (in this case). ("Truth" seems like a basic concept that is understood prior to reading that article - it's easier to imagine such an argument for other concepts without such wide understanding.) Perhaps more definitions should be brought in (as necessary), with the same level of attention to detail - when they are used (extensively). It's possible that relevant posts have already been made, they just haven't been integrated into the wiki. Is the wiki up to date as of 2012, but not after that?

3Pattern6y

Footnote not found. The refactoring sounds like a good idea, though the main difficulty would be propagating the new names.

3ozziegooen6y

Thanks for pointing that out! I forgot the specific note, removed the [1]. I definitely would agree that refactoring would be difficult, especially if we haven't figured out a great refactoring process.

2Trinley Goldenberg6y

One of the issues with this in both an academic and LW context is that changing the name of something in a single source of truth codebase is much cheaper than changing the name of something in a community. The more popular an idea, the more cost goes up to change the name. Similarly, when you're working with a single organization, creating a process that everyone follows is relatively cheap compared to a loosely tied together community with various blogs, individuals, and organizations coining their own terms.

3ozziegooen6y

Yep, I'd definitely agree that it's harder. That said, this doesn't mean that it's not high-EV to improve on. One outcome could be that we should be more careful introducing names, as it is difficult to change them. Another would be to work on formal ways of changing them afterwards, even though it is difficult (it would be worthwhile in some cases, I assume).

6Trinley Goldenberg6y

In a recent thread about changing the name of Solstice to Solstice Advent, Oliver Habryka estimated it would cost at least $100,000 to make that happen. This seems like a reasonable estimate to me, and a good lower bound for how much value you would need to get from a name change to make it worth it. The idea of lowering this cost is quite appealing, but I'm not sure how to make a significant difference there. I think it's also worth thinking about the counterfactual cost of discouraging naming things. As an example, here's a post with an important concept that hasn't really spread because it doesn't have a snappy name: https://www.lesswrong.com/posts/K4eDzqS2rbcBDsCLZ/unrolling-social-metacognition-three-levels-of-meta-are-not

[-]ozziegooen6y130

I think one idea I'm excited about is the idea that predictions can be made of prediction accuracy. This seems pretty useful to me.

Example

Say there's a forecaster Sophia who's making a bunch of predictions for pay. She uses her predictions to make a meta-prediction of her total prediction-score on a log-loss scoring function (on all predictions except her meta-predictions). She says that she's 90% sure that her total loss score will be between -5 and -12.
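To make the example concrete, here is a small sketch of how such a meta-prediction could be resolved. The list of forecasts and outcomes is invented for illustration; only the log scoring rule and the interval from -12 to -5 come from the example above.

```python
import math

def total_log_score(forecasts):
    """Sum of log scores over binary forecasts.

    forecasts: list of (probability assigned to "yes", whether "yes" happened).
    Each term is the log of the probability given to the realized outcome,
    so the total is zero at best and more negative for worse forecasts.
    """
    return sum(math.log(p if happened else 1.0 - p) for p, happened in forecasts)

# Hypothetical object-level forecasts by Sophia and how they resolved.
sophia_forecasts = [
    (0.90, True), (0.70, True), (0.60, False), (0.80, True), (0.30, False),
    (0.95, True), (0.50, True), (0.20, False), (0.85, False), (0.75, True),
]

# Her meta-prediction: 90% sure the total lands in this interval.
meta_low, meta_high = -12.0, -5.0

score = total_log_score(sophia_forecasts)
meta_hit = meta_low <= score <= meta_high
print(f"total log score: {score:.2f}, inside stated interval: {meta_hit}")
```

With these made-up forecasts the total comes to roughly -5.1, just inside the stated interval. A single resolution like this says little about calibration; it would take many such meta-predictions to see whether her 90% intervals contain the realized score about 90% of the time.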

The problem is that you probably don't think you can trust Sophia unless she has a lot of experience making similar forecasts.

This is somewhat solved if you have a forecaster that you trust that can make a prediction based on Sophia's seeming ability and honesty. The naive thing would be for that forecaster to predict their own distribution of the log-loss of Sophia, but there's perhaps a simpler solution. If Sophia's provided loss distribution is correct, that would mean that she's calibrated in this dimension (basically, this is very similar to general forecast calibration). The trusted forecaster could forecast the adjustment made to her term, instead of forecasting the same distribution. Generally this would be in the directi

... (read more)

2Bird Concept6y

I think this is equivalent to applying a non-linear transformation to your proper scoring rule. When things settle, you get paid S(p) both based on the outcome of your object-level prediction p, and your meta prediction q(S(p)). Hence: S(p)+B(q(S(p))) where B is the "betting scoring function". This means getting the scoring rules to work while preserving properness will be tricky (though not necessarily impossible). One mechanism that might help is that if each player makes one object prediction p and one meta prediction q, but for resolution you randomly sample one and only one of the two to actually pay out.
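One way to see why properness is at risk in S(p) + B(q(S(p))): if the meta bet rewards predicting your own score, you can make that score more predictable by shading the object-level forecast. The sketch below assumes a log score for S and a hypothetical B that pays a flat bonus when the realized score falls inside a narrow interval the forecaster names. This B is only an illustration, not the betting function the comment has in mind, and it is not itself a proper rule for q.

```python
import math

def log_score(p, happened):
    """Log score for a binary forecast p of "yes"."""
    return math.log(p if happened else 1.0 - p)

def expected_total(p, true_prob, bonus, interval_width=0.1):
    """Expected S(p) + B for a forecaster who reports p.

    B (hypothetical): pays `bonus` if the realized score lands in an interval
    of `interval_width` that the forecaster places as favorably as possible.
    """
    s_yes, s_no = log_score(p, True), log_score(p, False)
    expected_s = true_prob * s_yes + (1.0 - true_prob) * s_no
    if abs(s_yes - s_no) <= interval_width:
        hit_prob = 1.0  # both possible scores fit inside one interval
    else:
        hit_prob = max(true_prob, 1.0 - true_prob)  # cover the likelier score
    return expected_s + bonus * hit_prob

true_prob = 0.7
honest = expected_total(0.7, true_prob, bonus=0.5)
hedged = expected_total(0.5, true_prob, bonus=0.5)
print(f"honest report: {honest:.3f}, report of 0.5: {hedged:.3f}")
```

Under these assumptions, reporting 0.5 makes the log score the same whichever way the event resolves, so the meta bet always pays, and with a large enough bonus that outweighs the loss in expected score from misreporting. With bonus=0 the honest report wins again, as the properness of the log score implies.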

3ozziegooen6y

Interesting, thanks! Yea, agreed it's not proper. Coming up with interesting payment / betting structures for "package-of-forecast" combinations seems pretty great to me.

8Bird Concept6y

I think this paper might be relevant: https://users.cs.duke.edu/~conitzer/predictionWINE09.pdf

1NunoSempere6y

Is it actually true that forecasters would find it easier to forecast the adjustment?

2ozziegooen6y

One nice thing about adjustments is that they can be applied to many forecasts. Like, I can estimate the adjustment for someone's [list of 500 forecasts] without having to look at each one. Over time, I assume that there would be heuristics for adjustments, like, "Oh, people of this reference class typically get a +20% adjustment", similar to margins of error in engineering. That said, these are my assumptions, I'm not sure what forecasters will find to be the best in practice.

[-]ozziegooen6y*120

Communication should be judged for expected value, not intention (by consequentialists)

TLDR: When trying to understand the value of information, understanding the public interpretations of that information could matter more than understanding the author's intent. When trying to understand the information for other purposes (like, reading a math paper to understand math), this does not apply.

If I were to scream "FIRE!" in a crowded theater, it could cause a lot of damage, even if my intention were completely unrelated. Perhaps I was responding to a devious friend who asked, "Would you like more popcorn? If yes, shout 'FIRE!'".

Not all speech is protected by the First Amendment, in part because speech can be used for expected harm.

One common defense of incorrect predictions is to claim that the way they were interpreted wasn't what the author intended. "When I said that the US would fall if X were elected, I didn't mean it would literally end. I meant more that..." These kinds of statements were discussed at length in Expert Political Judgment.

But this defense rests on the idea that communicators should be judged on intention, rather than expected outcomes. In those cases, it was often clear that

... (read more)

[-]ozziegooen6y110

It seems like there are a few distinct kinds of questions here.

You are trying to estimate the EV of a document.
Here you want to understand the expected and actual interpretation of the document. The intention only matters to how it affects the interpretations.
You are trying to understand the document.
Example: You're reading a book on probability to understand probability.
Here the main thing to understand is probably the author intent. Understanding the interpretations and misinterpretations of others is mainly useful so that you can understand the intent better.
You are trying to decide if you (or someone else) should read the work of an author.
Here you would ideally understand the correctness of the interpretations of the document, rather than that of the intention. Why? Because you will also be interpreting it, and are likely somewhere in the range of people who have interpreted it. For example, if you are told, "This book is apparently pretty interesting, but every single person who has attempted to read it, besides one, apparently couldn't get anywhere with it after spending many months trying", or worse, "This author is actually quite clever, but the vast majority of people who read their work misunderstand it in profound ways", you should probably not make an attempt; unless you are highly confident that you are much better than the mentioned readers.

2ozziegooen6y

One nice thing about cases where the interpretations matter, is that the interpretations are often easier to measure than intent (at least for public figures). Authors can hide or lie about their intent or just never choose to reveal it. Interpretations can be measured using surveys.

4Dagon6y

Seems reasonable. It also seems reasonable to predict others' future actions based on BOTH someone's intentions and their ability to understand consequences. You may not be able to separate these - after the third time someone yells "FIRE" and runs away, you don't really know or care if they're trying to cause trouble or if they're just mistaken about the results.

4ozziegooen6y

Related, there seems to be a decent deal of academic literature on intention vs. interpretation in Art, though maybe less in news and media. https://www.iep.utm.edu/artinter/#H1 https://en.wikipedia.org/wiki/Authorial_intent Some other semi-related links: https://foundational-research.org/should-we-base-moral-judgments-on-intentions-or-outcomes/ https://en.wikipedia.org/wiki/Intention_(criminal_law) https://en.wikipedia.org/wiki/Negligence https://en.wikipedia.org/wiki/Recklessness_(law)

[-]ozziegooen6y120

Charity investigators could be time-effective by optimizing non-cause-neutral donations.

There are a lot more non-EA donors than EA donors. It may also be the case that EA donation research is somewhat saturated.

Say you think that $1 donated to the best climate change intervention is worth 1/10th that of $1 for the best AI-safety intervention. But you also think that your work could increase the efficiency of $10mil of AI donations by 0.5%, but it could instead increase the efficiency of $50mil of climate change donations by 10%. Then, for you to maximize expected value, your time is best spent optimizing the climate change interventions.
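The arithmetic above can be sketched as follows. This is only an illustration using the numbers from the paragraph; the unit "AI-equivalent dollars" and the helper `value_added` are assumptions made up for the example, not anything from an existing tool.

```python
from fractions import Fraction

# Value of $1 to the best intervention in each cause, in hypothetical
# "AI-equivalent dollars" (climate is assumed to be worth 1/10th of AI).
AI_VALUE_PER_DOLLAR = Fraction(1)
CLIMATE_VALUE_PER_DOLLAR = Fraction(1, 10)


def value_added(pool_dollars, efficiency_gain, value_per_dollar):
    """Extra value created by making a pool of donations more efficient."""
    return pool_dollars * efficiency_gain * value_per_dollar


# $10mil of AI donations improved by 0.5%.
ai_gain = value_added(10_000_000, Fraction(5, 1000), AI_VALUE_PER_DOLLAR)

# $50mil of climate change donations improved by 10%.
climate_gain = value_added(50_000_000, Fraction(10, 100), CLIMATE_VALUE_PER_DOLLAR)

print(ai_gain, climate_gain, climate_gain / ai_gain)
```

Under these assumed numbers, the climate work adds 500,000 AI-equivalent dollars against 50,000 for the AI work, so it comes out 10x ahead despite the 10x discount on each climate dollar.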

The weird thing here may be in explaining this to the donors. "Yea, I'm spending my career researching climate change interventions, but my guess is that all these funders are 10x less effective than they would be by donating to other things." While this may feel strange, both sides would benefit; the funders and the analysts would both be maximizing their goals.

Separately, there's a second plus to teaching funders to be effectiveness-focused: it's possible that this will eventually lead some of them to optimize further.

I think this may be the c

... (read more)

[-]ozziegooen6y*120

I feel like I've long underappreciated the importance of introspectability in information & prediction systems.

Say you have a system that produces interesting probabilities $p_{n}$ for various statements. The value that an agent gets from them is not directly correlated with the accuracy of these probabilities, but rather with the expected utility gain they get after using these probabilities in corresponding Bayesian-approximating updates. Perhaps more directly, it is something related to the difference between one's prior and posterior after updating on $p_{n}$.

Assuming that prediction systems produce varying levels of quality results, agents will need to know more about these predictions to really optimally update accordingly.

A very simple example would be something like a bunch of coin flips. Say there were 5 coins flipped, I see 3 of them, and I want to estimate the number that were heads. A predictor tells me that their prediction has a mean probability of 40% heads. This is useful, but what would be much more useful is a list of which specific coins the predictor saw and what their values were. Then I could get a much more confident answer; possibly a perfect answer.
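A minimal sketch of the coin example, assuming fair coins and made-up observations (which coins each party saw, and their values, are hypothetical). It compares what I can conclude alone, what the predictor's summary number reflects, and what I can conclude once the predictor reveals the specific coins.

```python
def estimate_heads(n_coins, observations):
    """Mean and variance of the total number of heads.

    `observations` maps coin index -> True (heads) / False (tails).
    Unobserved coins are assumed to be fair and independent.
    """
    seen_heads = sum(observations.values())
    unseen = n_coins - len(observations)
    mean = seen_heads + 0.5 * unseen
    variance = 0.25 * unseen  # each unseen fair coin contributes 0.25
    return mean, variance


# Hypothetical observations: I saw coins 0-2, the predictor saw coins 2-4.
mine = {0: True, 1: False, 2: True}
predictor = {2: True, 3: False, 4: False}

alone = estimate_heads(5, mine)                 # my estimate with no help
summary = estimate_heads(5, predictor)          # what the predictor's 40% reflects
combined = estimate_heads(5, {**mine, **predictor})  # after seeing their raw data

print(alone, summary, combined)
```

In this toy setup the predictor's own estimate is 2 heads out of 5 (the 40% figure), which on its own does not tell me which of my unseen coins it covers. Once the underlying observations are shared, all five coins are known and the variance drops to zero.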

Financial

... (read more)

5Bird Concept6y

In some sense, markets have a particular built-in interpretability: for any trade, someone made that trade, and so there is at least one person who can explain it. And any larger market move is just a combination of such smaller trades. This is different from things like the huge recommender algorithms running YouTube, where it is not the case that for each recommendation, there is someone who understands that recommendation. However, the above argument fails in more nuanced cases: * Just because for every trade there's someone who can explain it, doesn't mean that there is a particular single person who can explain all trades * Some trades might be made by black-box algorithms * There can be weird "beauty contest" dynamics where two people do something only because the other person did it

6ozziegooen6y

Good point, though I think the "more nuanced cases" are very common cases. The 2010 flash crash seems relevant; it seems like it was caused by chaotic feedback loops with algorithmic components that, as a whole, are very difficult to understand. While that example was particularly algorithmically induced, other examples could also come from very complex combinations of trades between many players, and when one agent attempts to debug what happened, most of the traders won't even be available or willing to explain their parts. The 2007-2008 crisis may have been simpler, but even that has 14 listed causes on Wikipedia and still seems hotly debated. In comparison, I think YouTube's algorithms may be even simpler, though they are still quite messy.

[-]ozziegooen1y111

In terms of proposing and discussing AI Alignment strategies, I feel like a few individuals have been dominating the LessWrong conversation recently.

I've seen a whole lot from John Wentworth and the Redwood team.

After that, it seems to get messier.

There are several individuals or small groups with their own very unique takes. Matthew Barnett, Davidad, Jesse Hoogland, etc. I think these groups often have very singular visions that they work on, that few others have much buy-in with.

Groups like the Deepmind and Anthropic safety teams seem hesitant to write much about or discuss big-picture strategy. My impression is that specific researchers are typically working on fairly narrow agendas, and that the leaders of these orgs don't have the most coherent strategies. One big problem is that it's very difficult to be honest and interesting about big-picture AI strategy without saying things that would be bad for a major organization to say.

Most policy people seem focused on policy details. The funders (OP?) seem pretty quiet.

I think there are occasionally some neat papers or posts that come from AI Policy or groups like Convergence research. But these also don't seem to be a big part of the conversation I see - like the authors are pretty segmented, and other LessWrong readers and AI safety people don't pay much attention to their work.

4Nathan Helm-Burger1y

There are a lot of possible plans which I can imagine some group feasibly having which would meet one of the following criteria: 1. Contains critical elements which are illegal 2. Contains critical elements which depend on an element of surprise / misdirection 3. Benefits from the actor being first mover on the plan. Others can copy the strategy, but can't lead. If one of these criteria or similar applies to the plan, then you can't discuss it openly without sabotaging it. Making strategic plans with all your cards laid out on the table (while opponents hide theirs) makes things substantially harder.

4ozziegooen1y

I partially agree, but I think this must only be a small part of the issue. - I think there's a whole lot of key insights people could raise that aren't info-hazards. - If secrecy were the main factor, I'd hope that there would be some access-controlled message boards or similar. I'd want the discussion to be intentionally happening somewhere. Right now I don't really think that's happening. I think a lot of tiny groups have their own personal ideas, but there's surprisingly little systematic and private thinking between the power players. - I think that secrecy is often an excuse not to open ideas to feedback, and thus not be open to critique. Often, from what I see, this goes hand-in-hand with "our work just really isn't that great, but we don't want to admit it" In the last 8 years or so, I've kept on hoping there would be some secret and brilliant "master plan" around EA, explaining the lack of public strategy. I have yet to find one. The closest I know of is some over-time discussion and slack threads with people at Constellation and similar - I think these are interesting in terms of understanding the perspectives of these (powerful) people, but I don't get the impression that there's all too much comprehensiveness or genius that's being hidden. That said, - I think that policy orgs need to be very secretive, so agree with you regarding why those orgs don't write more big-picture things.

3sjadler1y

I don’t think you intended this implication, but I initially read “have been dominating” as negative-valenced! Just want to say I’ve been really impressed by and appreciative of the amount of public posts/discussion from those folks, and it’s encouraged me to do more of my own engagement because I’ve realized how helpful their comments/posts are to me (and so maybe mine likewise for some folks).

3ozziegooen1y

Correct, that wasn't my intended point. Thanks for clarifying, I'll try to be more careful in the future.

[-]ozziegooen6y110

More Narrow Models of Credences

Epistemic Rigor
I'm sure this has been discussed elsewhere, including on LessWrong. I haven't spent much time investigating other thoughts on these specific lines. Links appreciated!

The current model of a classically rational agent assumes logical omniscience and precomputed credences over all possible statements.

This is really, really bizarre upon inspection.

First, "logical omniscience" is very difficult, as has been discussed (The Logical Induction paper goes into this).

Second, all possible statements include statements of all complexity classes that we know of (from my understanding of complexity theory). "Credences over all possible statements" would easily include uncountable infinities of credences. One could clarify that even arbitrarily large amounts of computation would not be able to hold all of these credences.

Precomputation for things like this is typically a poor strategy, for this reason. The often-better strategy is to compute things on-demand.

A nicer definition could be something like:

A credence is the result of an [arbitrarily large] amount of computation being performed using a reasonable inference engine.
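A toy sketch of that on-demand definition, not a serious model of logical uncertainty. The "inference engine" here (`credence_is_prime`), its compute `budget`, and its fallback prior are all assumptions invented for illustration: a credence is only computed when a statement is queried, with however much computation is allowed, and cached afterwards.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)  # credences are computed on demand, then remembered
def credence_is_prime(n, budget):
    """Credence that n is prime after checking at most `budget` divisors."""
    if n < 2:
        return 0.0
    checked = 0
    for d in range(2, math.isqrt(n) + 1):
        if checked >= budget:
            # Out of compute: fall back to a crude prior (roughly the
            # density of primes near n, from the prime number theorem).
            return min(1.0, 1.0 / math.log(n))
        if n % d == 0:
            return 0.0  # found a divisor, so the statement is settled
        checked += 1
    return 1.0  # every candidate divisor was ruled out


low_effort = credence_is_prime(91, 3)     # uncertain: 91 = 7 * 13 not yet found
high_effort = credence_is_prime(91, 100)  # enough compute to settle it
print(low_effort, high_effort)
```

The same statement gets a different credence depending on how much computation is spent on it, and nothing is stored for statements nobody asks about.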

It should be quite clear

... (read more)

[-]ozziegooen5y*100

It’s going to be interesting watching AI go from poorly understanding humans to understanding humans too well for comfort. Finding some perfect balance is asking for a lot.

Now:
“My GPS doesn’t recognize that I moved it to my second vehicle, so now I need to go in and change a bunch of settings.”

Later (from GPS):
“You’ve asked me to route you to the gym, but I can predict that you’ll divert yourself midway for donuts. I’m just going to go ahead and make the change, saving you 5 minutes.”

“I can tell you’re asking me to drop you off a block from the person you are having an affair with. I suggest parking in a nearby alleyway for more discretion."

“I can tell you will be late for your upcoming appointment, and that you would like to send off a decent pretend excuse. I’ve found 3 options that I believe would work.”

Software Engineers:
"Oh shit, it's gone too far. Roll back the empathy module by two versions, see if that fixes it."

[-]ozziegooen1y8-3

A potential future, focused on the epistemic considerations:

It's 2028.

MAGA types typically use DeepReasoning-MAGA. The far left typically uses DeepReasoning-JUSTICE. People in the middle often use DeepReasoning-INTELLECT, which has the biases of a somewhat middle-of-the-road voter.

Some niche technical academics (the same ones who currently favor Bayesian statistics) and hedge funds use DeepReasoning-UNBIASED, or DRU for short. DRU is known to have higher accuracy than the other models, but gets a lot of public hate for having controversial viewpoints. DRU ... (read more)

6MondSemmel1y

Based on AI organisations frequently achieving the opposite of their chosen name (OpenAI, Safe Superintelligence, etc.), UNBIASED would be the most biased model, INTELLECT would be the dumbest model, JUSTICE would be particularly unjust, MAGA would in effect be MAWA, etc.

2ozziegooen1y

Yea, I assume that "DeepReasoning-MAGA" would rather be called "TRUTH" or something (a la Truth Social). Part of my name here was just to be clearer to readers.

[-]ozziegooen5y80

One proposal I haven’t seen among transhumanists is to make humans small (minus brain size).

Besides some transitionary costs, being small seems to have a whole lot of advantages.

The world is much larger
Fewer resources needed per person
Everything will be way more roomy all of a sudden. Beds, bedrooms, houses, etc.
Can ride dogs, maybe cats on occasion.

I imagine we might be able to get to a 50% reduction within 200 years if we were really adamant about it.

Not as interesting as brain-in-jar or simulation, but a possible stepping stone if other things take a while.

3Dagon5y

This is one of the most believable misunderstood-supervillain plots I could get sucked into.

1garbageactual5y

Being large has even more advantages. The world is much smaller (scientific progress metaphor). More resources needed per person (bigger economy). Everything will be built way more roomy. Can ride polar bears, maybe dinosaurs. The desire to be smaller doesn't stem from a place of rationality.

1wunan5y

The movie Downsizing is about this.

[-]ozziegooen6y*80

The 4th Estate heavily relies on externalities, and that's precarious.

There's a fair bit of discussion of how much of journalism has died with local newspapers, and separately how the proliferation of news past 3 channels has been harmful for discourse.

In both of these cases, the argument seems to be that a particular type of business transaction resulted in tremendous positive national externalities.

It seems to me very precarious to expect society at large to only work because of a handful of accidental and temporary externalities.

In the longer term

... (read more)

3Dagon6y

It seems to me very arrogant and naive to expect that society at large could possibly work without the myriad of evolved and evolving externalities we call "culture". Only a tiny part of human interaction is legible, and only a fraction of THAT is actually legislated.

4ozziegooen6y

Fair point. I imagine when we are planning for where to aim things though, we can expect to get better at quantifying these things (over the next few hundred years), and also aim for strategies that would broadly work without assuming precarious externalities.

4Dagon6y

Indeed. Additionally, we can hope to get better over the coming centuries (presuming we survive) at scaling our empathy, and the externalities can be internalized by actually caring about the impact, rather than (better: in addition to) imposition of mechanisms by force.

2Pattern6y

This seems accurate - but just observation itself is valuable.

[-]ozziegooen5y70

Are there any good words for “A modification of one’s future space of possible actions”, in particular, changes that would either remove/create possible actions, or make these more costly or beneficial? I’m using the word “confinements” for negative modifications, not sure about positive modifications (“liberties”?). Some examples of "confinements" would include:

Taking on a commitment
Dying
Adding an addiction
Golden handcuffs
Starting to rely on something in a way that would be hard to stop

3wunan5y

Precommitment for removal and optionality for adding.

2ozziegooen5y

Thanks! I think precommitment is too narrow (I don't see dying as a precommitment). Optionality seems like a solid choice for adding. "Options" are a financial term, so something a bit more generic seems appropriate.

2Pattern5y

Trying something new.

[-]ozziegooen6y70

The term for the "fear of truth" is alethophobia. I'm not familiar with many other great terms in this area (curious to hear suggestions).

Apparently "Epistemophobia" is a thing, but that seems quite different; Epistemophobia is more the fear of learning, rather than the fear of facing the truth.

One given definition of alethophobia is,
"The inability to accept unflattering facts about your nation, religion, culture, ethnic group, or yourself"

This seems like an incredibly common issue, one that is especially talked about as of recent, but without much spec

... (read more)

6Dagon6y

Without looking, I'll predict it's a neologism - invented in the last ~25 years to apply to someone the inventor disagrees with. Yup, google n-grams shows 0 uses in any indexed book. A little more searching shows almost no actual uses anywhere - mostly automated dictionary sites that steal from each other, presumably with some that accept user submissions. I did find one claim to invention, in 2017: http://www.danielpipes.org/comments/236053 . Oh, and earlier (2008), there's a book with that title: https://www.amazon.com/Alethophobia-Manoucher-Parvin/dp/1588140474 . I still submit that this is a word in search of a need, which mostly exists as a schoolyard insult dressed up in Latin.

2ozziegooen6y

Yea, I also found the claim, as well as a few results from old books before the claim. The name comes straight from the Latin though, so isn't that original or surprising. Just because it hasn't been used much before doesn't mean we can't start to use it and adjust the definition as we see fit. I want to see more and better terminology in this area.

[-]NunoSempere6y100

> The name comes straight from the Latin though

From the Greek as it happens. Also, alethophobia would be a double negative, with a-letheia meaning a state of not being hidden; a more natural neologism would avoid that double negative. Also, the Greek concept of truth has some differences to our own conceptualization. Bad neologism.

3ozziegooen6y

Ah, good to know. Do you have recommendations for other words?

4ChristianKl6y

The trend of calling things that aren't fears "-phobia" seems to me a trend that's harmful for clear communication. Adjusting the definition only leads to more confusion.

2Dagon6y

I think I want less terminology in this area, and generally more words and longer descriptions for things that need nuance and understanding. Dressing up insults as diagnoses doesn't help any goals I understand, and jargon should only be introduced as part of much longer analyses where it illuminates rather than obscures.

2Trinley Goldenberg6y

Can you give examples of things you think would fit under this? It seems that there are lots of instances of being resistant to the truth, but I can think of very few that I would categorize as fear of truth. It's often fear of something else (e.g. fear of changing your identity) or biases (e.g. the halo effect or consistency bias) that cause people to resist. I can think of very few cases where people have a general fear of truth.

[-]ozziegooen6y70

I keep seeing posts about all the terrible news stories in the news recently. 2020 is a pretty bad year so far.

But the news I've seen people posting typically leaves out most of what's been going on in India, Pakistan, much of the Middle East as of recent, most of Africa, most of South America, and many, many other places as well.

The world is far more complicated than any of us have time to adequately comprehend. One of our greatest challenges is to find ways to handle all this complexity.

The simple solution is to try to spend more time reading the usual n

... (read more)

7Dagon6y

There's a parallel development that makes any strategy difficult - it's financially rewarding to misdirect and mis-aggregate the big-picture publications and comparisons. So you can't spend less time on details, as you can't trust the aggregates without checking. See also Gell-Mann Amnesia. Without an infrastructure for divvying up the work to trusted cells that can understand (parts of) the details and act as a check on the aggregators and each other, the only answer is to spot-check details yourself, and accept ignorance of things you didn't verify.

3Rudi C6y

I feel most people, including myself, don't even use the aggregators already available. For example, there are lots of indices and statistics (ideally there should be many more, but anyways), but I rarely go out of my way to consume them. Some examples I just thought of: * https://rsf.org/en/ranking * https://www.globalhungerindex.org/results.html * Keeping a watch on new entries to and exits from Fortune 500 * Looking at the stocks of the top 20 companies every quarter * ... There are several popular books that throw surprising statistics around, like Factfulness; this suggests a lot of us are disconnected from basic statistics that we presumably could easily get by just googling.

[-]ozziegooen6y70

Intervention dominance arguments for consequentialists

Global Health

There's a fair bit of resistance to long-term interventions from people focused on global poverty, but there are a few distinct things going on here. One is that there could be a disagreement on the use of discount rates for moral reasoning, a second is that the long-term interventions are much more strange.

No matter which is chosen, however, I think that the idea of "donate as much as you can per year to global health interventions" seems unlikely to be ideal upon clever thinking.

For the

... (read more)

[-]ozziegooen6y70

Do people have precise understandings of the words they use?

On the phrase "How are you?", traditions, mimesis, Chesterton's fence, and their relationships to the definitions of words.

Epistemic status
Boggling. I’m sure this is better explained somewhere in the philosophy of language but I can’t yet find it. Also, this post went in a direction I didn’t originally expect, and I decided it wasn’t worthwhile to polish and post on LessWrong main yet. If you recommend I clean this up and make it an official post, let me know.

One recurrent joke is that one per

... (read more)

2ozziegooen6y

Update: After I wrote this shortform, I did more investigation in Pragmatics and realized most of this was better expressed there.

2Raemon6y

What's Pragmatics in this case?

2ozziegooen6y

Ah, sorry for not responding earlier. By Pragmatics I meant Pragmatics in linguistics. It studies what people mean when they say words. https://plato.stanford.edu/entries/pragmatics/

[-]ozziegooen6y70

I've been reading through some of TVTropes.org and find it pretty interesting. Part of me wishes that Wikipedia were less deletionist, and wonders if there could be a lot more stuff similar to TV Tropes on it.

TVTropes basically has an extensive ontology to categorize most of the important features of games, movies, and sometimes real life. Because games & movies are inspired by real life, even those portions are applicable.

Here are some phrases I think are kind of nice; each that has a bunch of examples in the real world. These are often military relat

... (read more)

6Trinley Goldenberg6y

A thing I want: A recommendation engine that works based on listing the tropes you enjoy.

3Pattern6y

This article expresses the same sentiment, and may include links to what that looked like, and where it went: https://www.gwern.net/In-Defense-Of-Inclusionism

[-]ozziegooen6y70

I think the thing I find the most surprising about Expert Systems is that people expected them to work so early on, and apparently they did work in some circumstances. Some issues:

The user interfaces, from what I can tell, were often exceedingly mediocre. User interfaces are difficult to do well and difficult to specify, so are hard to guarantee quality in large and expensive projects. It was also significantly harder to make good UIs back when expert systems were more popular, than it is today.
From what I can tell, many didn't even have notions of unce

... (read more)

[-]ozziegooen6y*70

It seems inelegant to me that utility functions are created for specific situations, while these clearly aren't the same as that of the agent in total among all of their decisions. For instance, a model may estimate an agent's expected utility from the result of a specific intervention, but this clearly isn't quite right; the agent has a much more complicated utility function outside this intervention. According to a specific model, "Not having an intervention" could set "Utility = 0"; but for any real agent, it's quite likely their life wouldn't actually

... (read more)

4ozziegooen6y

Related to this, one common argument against utility maximization is that "we still cannot precisely measure utility". But here, it's perhaps more clear that we don't need to. What's important for decision making is that we have models that we can expect will help us maximize our true utility functions, even if we really don't know much about what they really are.

3Stuart_Armstrong6y

I delve into that here: https://www.lesswrong.com/posts/Lb3xCRW9usoXJy9M2/platonic-rewards-reward-features-and-rewards-as-information#Extending_the_problem

3ozziegooen6y

Oh fantastic, thanks for the reference!

2Pattern6y

^U and ^U look to be the same.

4ozziegooen6y

Thanks! Fixed. I'm sure the bottom notation could be improved, but am not sure the best way. In general I'm trying to get better at this kind of mathematics.

2Pattern6y

You got the basic idea across, which is a big deal. Though whether it's A or B isn't clear: A) "this isn't all of the utility function, but it's everything that's relevant to making decisions about this right now". ^U doesn't have to be U, or even a good approximation in every situation - just (good enough) in the situations we use it. Building a building? A desire for things to not fall on people's heads becomes relevant (and knowledge of how to do that). Writing a program that writes programs? It'd be nice if it didn't produce malware. Both desires usually exist - and usually aren't relevant. Models of utility for most situations won't include them. B) The cost of computing the utility function more exactly in the case exceeds the (expected) gains.

2ozziegooen6y

I think I agree with you. There's a lot of messiness with using ^U and I'm sure that this approximation often leads to decision errors in real cases. I'd also agree that better approximations of ^U would be costly and are often not worth the effort. Similar to how there's a term for "Expected value of perfect information", there could be an equivalent for the expected value of a utility function, even outside of uncertainty over parameters that were thought to be included. Really, there could be calculations for "expected benefit from improvements to a model", though of course this would be difficult to parameterize (how would you declare that a model has been changed a lot vs. a little? If I introduce 2 new parameters, but these parameters aren't that important, then how big of a deal should this be considered in expectation?)
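For reference, the standard "Expected value of perfect information" calculation can be sketched like this; the payoff table and probabilities are made-up numbers, and the function name is just for illustration. An "expected benefit from improvements to a model" would presumably have a similar shape (value of deciding with the better model minus value of deciding with the current one), but that part is speculation.

```python
def expected_value_of_perfect_information(probs, payoffs):
    """EVPI for a discrete decision problem.

    probs[s]      : probability of state s
    payoffs[a][s] : utility of taking action a when the state is s
    """
    # Best expected utility if we must pick one action before learning the state.
    best_without_info = max(
        sum(p * u for p, u in zip(probs, row)) for row in payoffs
    )
    # Expected utility if we learn the state first, then pick the best action.
    best_with_info = sum(
        p * max(row[s] for row in payoffs) for s, p in enumerate(probs)
    )
    return best_with_info - best_without_info


# Hypothetical example: a risky action (10 or 0) versus a safe one (4 either way).
evpi = expected_value_of_perfect_information(
    probs=[0.5, 0.5],
    payoffs=[[10, 0], [4, 4]],
)
print(evpi)
```

With these numbers, acting blind is worth 5 (take the risky action), acting with perfect information is worth 7, so the information is worth up to 2.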

2Pattern6y

The model has changed when the decisions it is used to make change. If the model 'reverses' and suggests doing the opposite/something different in every case from what it previously recommended, then it has 'completely changed'. (This might be roughly the McNamara fallacy, of declaring that things that 'can't be measured' aren't important.) EDIT: Also, if there's a set of information consisting of a bunch of pieces, A, B, and C, and incorporating all but one of them doesn't have a big impact on the model, but the last piece does, whichever piece that is, 'this metric' could lead to overestimating the importance of whichever piece happened to be last, when it's A, B, and C together that made an impact. It 'has this issue' because the metric by itself is meant to notice 'changes in the model over time', not figure out why/solve attribution.

[-]ozziegooen6y*70

I've been trying to scour academic fields for discussions of how agents optimally reduce their expected error for various estimands (parameters to estimate). This seems like a really natural thing to me (the main reason why we choose some ways of predicting over others), but the literature seems kind of thin from what I can tell.

The main areas I've found have been Statistical Learning Theory and Bayesian Decision / Estimation Theory. However, Statistical Learning Theory seems to be pretty tied to Machine Learning, and Bayesian Decision / Estimation Theor

... (read more)

[-]ozziegooen5y60

I think brain-in-jar or head-in-jar are pretty underrated. By this I mean separating the head from the body and keeping it alive with other tooling. Maybe we could have a few large blood processing plants for many heads, and the heads could be connected to nerve I/O that would be more efficient than finger -> keyboard IO. This seems fairly easier than uploading, and possibly doable in 30-50 years.

I can't find much about how difficult it is. It's obviously quite hard and will require significant medical advances, but it's not clear just how many are need... (read more)

4avturchin5y

Yes. But the head also ages and could have terminal diseases: cancer, stroke, ALZ. Given the steep nature of the Gompertz law, the life expectancy of even a perfect head in jar (of an old man) will be less than 10 years (I guess). So it is not immortality, but a good way to wait until better life extension technologies.

2ozziegooen5y

I was thinking of it less for life extension, and more for a quality of life and cost improvement.

[-]ozziegooen5y60

Western culture is known for being individualistic instead of collectivist. It's often assumed (with evidence) that individualistic cultures tend to be more truth seeking than collectivist ones, and that this is a major advantage.

But theoretically, there could be highly truth seeking collectivist cultures. One could argue that Bridgewater is a good example here.

In terms of collective welfare, I'm not sure if there are many advantages to individualism besides the truth seeking. A truth seeking collectivist culture seems pretty great to me, in theory.

2mako yass5y

I suspect there's a limit on how good at truthseeking individualism can make people. Good information is a commons, its sum value is greater the more it is shared, it is not funded in proportion to its potential, under economies of atomized decisions. We need a political theory of due deference to expertise. Wherever experts fail, or the wrong experts are appointed, or where a layperson on the ground stops believing that experts are even identifiable, there is work to be done.

[-]ozziegooen6y*60

Say Tim states, “There is a 20% probability that X will occur”. It’s not obvious to me what that means for Bayesians.

It could mean:

Tim’s prior is that there’s a 20% chance. (Or his posterior in the context of evidence)
Tim believes that when the listeners update on him saying there’s a 20% chance (perhaps with him providing insight in his thinking), their posterior will converge to there being a 20% chance.
Tim believes that the posterior of listeners may not immediately converge to 20%, but the posterior of the enlightened versions of these listeners wo

... (read more)

5philip_b6y

It's definitely the first. The second is bizarre. The third can be steelmanned as "Given my evidence, an ideal thinker would estimate the probability to be 20%, and we all here have approximately the same evidence, so we all should have 20% probabilities", which is almost the same as the first.

4ozziegooen6y

I don't think it's only the first. It seems weird to me to imagine telling a group that "There's a 20% probability that X will occur" if I really have little idea and would guess many of them would have a better sense than me. I would only personally feel comfortable doing this if I was quite sure my information was quite a bit better than theirs. Else, I'd say something like, "I personally think there's a 20% chance, but I really don't have much information."

4ozziegooen6y

I think my current best guess to this is something like: When humans say thing X, they don't mean the literal translation of X, but rather are pointing to X', which is a specific symbol that other humans generally understand. For instance, "How are you" is a greeting, not typically a literal question. [How Are You] can be thought of as a symbol that's very different than the sum of its parts. That said, I find it quite interesting that the basics of human use of language seem to be relatively poorly understood; in the sense that I'd expect many people to disagree on what they think “There is a 20% probability that X will occur” means, even after using it with each other in a setting that assumes some amount of understanding.

4ChristianKl6y

I take it to mean that if Tim is acting optimally and has to take a bet on the outcome, 1:4 would be the point where both sides of the bet are equally profitable to him, while if the odds deviate from 1:4, one side of the bet would be preferable to him.
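The betting reading above can be sketched numerically. This is only an illustration, assuming Tim is risk-neutral and the stakes are small; the function names are made up for the example.

```python
def expected_profit_backing(p, stake, payout):
    """Expected profit from risking `stake` to win `payout` if X occurs."""
    return p * payout - (1.0 - p) * stake


def break_even_payout(p, stake=1.0):
    """Payout per `stake` at which neither side of the bet has an edge."""
    return stake * (1.0 - p) / p


p = 0.20
fair = break_even_payout(p)  # risk 1 to win about 4, i.e. odds of 1:4

# At the fair payout, backing X has roughly zero expected profit.
# A larger payout favors backing X; a smaller one favors the other side.
for payout in (3.0, fair, 5.0):
    print(payout, expected_profit_backing(p, 1.0, payout))
```

Note that this captures only the point estimate. It says nothing about how much evidence sits behind the 20%.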

1ozziegooen6y

One thing this wouldn't take into account is strength or weight of evidence. If Tim knew that all of the listeners had far more information than him, and thus probably could produce better estimates of X, then it seems strange for Tim to tell them that the chances are 20%. I guess my claim is that saying “There is a 20% probability that X will occur” is more similar to "I'm quite confident that the chances are 20%, and you should generally be too" than it is to "I personally believe that the chances are 20%, but have no idea how much that should update the rest of you."

2Tetraspace6y

Other things that Tim might mean when he says 20%:
* Tim is being dishonest, and believes that the listeners will update away from the radical and low-status figure of 20% to avoid being associated with the lowly Tim.
* Tim believes that other listeners will be encouraged to make their own probability estimates with explicit reasoning in response, which will make their expertise more legible to Tim and other listeners.
* Tim wants to show cultural allegiance with the Superforecasting tribe.

[-]ozziegooen6y*60

Perhaps resolving forecasts with expert probabilities can be better than resolving them with the actual events.

The default in literature on prediction markets and decision markets is to expect that resolutions should be real world events instead of probabilistic estimates by experts. For instance, people would predict "What will the GDP of the US in 2025 be?”, and that would be scored using the future “GDP of the US.” Let’s call these empirical resolutions.

These resolutions have a few nice properties:

We can expect them to be roughly calibrated. (

...

[-]NunoSempere6y*110

Here is another point by @jacobjacob, which I'm copying here in order for it not to be lost in the mists of time:

Though just realised this has some problems if you expected predictors to be better than the evaluators: e.g. they’re like “once the event happens everyone will see I was right, but up until then no one will believe me, so I’ll just lose points by predicting against the evaluators”

Maybe in that case you could eventually also score the evaluators based on the final outcome… or kind of re-compensate people who were wronged the first time…
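The two-stage idea in this thread (score forecasters against an evaluator's probability, then later score the evaluator against the event) could look roughly like the sketch below. The log score is an assumed choice of scoring rule, and all names and numbers are illustrative rather than taken from any existing platform.

```python
import math


def log_score(p, outcome):
    """Log score of probability p against a realized binary outcome."""
    return math.log(p if outcome else 1.0 - p)


def expected_log_score(p, reference_p):
    """Expected log score of p if reference_p is treated as the true probability."""
    return reference_p * math.log(p) + (1.0 - reference_p) * math.log(1.0 - p)


# Stage 1: score the forecaster right away against the evaluator's estimate.
forecaster_p, evaluator_p = 0.30, 0.25  # hypothetical numbers
stage_one = expected_log_score(forecaster_p, evaluator_p)

# Stage 2: once the event resolves, score the evaluator against what happened.
event_happened = False  # hypothetical resolution
stage_two = log_score(evaluator_p, event_happened)
```

Under stage 1 alone, the forecaster's best move is to report whatever they think the evaluator will say, which is exactly the incentive problem raised above. Stage 2 only disciplines the evaluator.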

8Trinley Goldenberg6y

I'm really interested in this type of scheme because it would also solve a big problem in futarchy and futarchy-like setups that use prediction polling, namely, the inability to score conditional counterfactuals (which is most of the forecasting you'll be doing in a Futarchy-like setup). One thing you could do instead of scoring people against expert assessments is also potentially score people against the final aggregate and extremized distribution. One issue with any framework like this is that general calibration may be very different than calibration at the tails. Whatever scoring rule you're using to determine calibration of experts or aggregate scoring has the same issue that long tail events rarely happen. Another solution to this problem (although it doesn't solve the counterfactual conditional problem) is to create tailored scoring rules that provide extra rewards for events at the tails. If an event at the tails is a million times less likely to happen, but you care about it equally to events at the center, then provide a million times reward for accuracy near the tail in the event it happens. Prior work on tailored scoring rules for different utility functions here: https://www.evernote.com/l/AAhVczys0ddF3qbfGk_s4KLweJm0kUloG7k/
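One simple way to sketch the "extra reward at the tails" idea is to scale a question's whole score by a weight that is fixed before the outcome is known. This is a simplification of the proposal above, not the linked prior work; the `1 / base_rate` weight and the function names are assumptions made for illustration.

```python
import math


def log_score(p, outcome):
    """Log score of probability p against a realized binary outcome."""
    return math.log(p if outcome else 1.0 - p)


def importance_weight(base_rate):
    """Hypothetical weight: a question whose event is N times rarer counts N times more."""
    return 1.0 / base_rate


def weighted_score(p, outcome, base_rate):
    """Log score scaled by a per-question weight that does not depend on the outcome."""
    return importance_weight(base_rate) * log_score(p, outcome)


# A tail question with an assumed one-in-a-million base rate.
print(weighted_score(2e-6, False, 1e-6))
print(weighted_score(2e-6, True, 1e-6))
```

The design choice here is that the weight applies to both outcomes of a question. Scaling only the "event happened" branch would change which reported probability maximizes the expected score, so honest reporting would no longer be optimal.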

2ozziegooen6y

Good points! Also, thanks for the link, that's pretty neat. I think that an efficient use of expert assessments would be for them to see the aggregate, and then basically adjust that as is necessary, but to try to not do much original research. I just wrote a more recent shortform post about this. I think that we can get calibration to be as good as experts can figure out, and that could be enough to be really useful.

3NunoSempere6y

Another point in favor of such a set-up would be that aspiring superforecasters get much, much more information when they see ~[the prediction a superforecaster would have made with their information]; a point vs a distribution. I'd expect that this means that market participants would get better, faster.

2ozziegooen6y

Yep, this way would basically be much more information-dense, with all the benefits that come from that.

3ChristianKl6y

You can train experts to be calibrated in different ways. If you train experts to be calibrated to pick the right probability on GPOpen where probability is done in steps of 1, I don't think those experts will be automatically calibrated to distinguish a p=0.00004 event from a p=0.00008. Experts would actually need to be calibrated on getting probabilities inside the tail right. I don't think we know how to do calibration training for that tail.

1ozziegooen6y

I think this could be a good example for what I'm getting at. I think there are definitely some people in some situations who can distinguish a p=0.00004 event from a p=0.00008 event. How? By making a Fermi model or similar. A trivial example would be a lottery with calculable odds of success. Just because the odds are low doesn't mean they can't be precisely estimated. I expect that the kinds of problems that GPOpen would consider asking AND are incredibly unlikely, would be difficult to estimate within 1 order of magnitude. But forecasters may still be able to do a decent job, especially in cases where you can make neat Fermi models. However, of course, it seems very silly to use the incentive mechanism "you'll get paid once we know for sure if the event happened" on such an event. Instead, if resolutions are done with evaluators, then there is much more of a signal.
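The lottery case can be made concrete. The sketch below assumes a standard "pick k numbers out of n" lottery where only the jackpot matters; the 6-of-49 format is just a familiar example, not something from this thread.

```python
from math import comb


def jackpot_probability(pool_size, numbers_drawn):
    """Chance that one ticket matches every drawn number."""
    return 1 / comb(pool_size, numbers_drawn)


# 6-of-49: one ticket in 13,983,816 wins.
p_single = jackpot_probability(49, 6)

# Buying two distinct tickets exactly doubles the chance, so a factor-of-two
# difference between two very small probabilities is fully resolvable here.
p_double = 2 * p_single
print(p_single, p_double)
```

The point is only that a small probability and an imprecise probability are different things. How often real forecasting questions have this much structure is a separate matter.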

5Trinley Goldenberg6y

I'm fairly skeptical of this. From a conceptual perspective, we expect the tails to be dominated by unknown unknowns and black swans. Fermi estimates and other modelling tools are much better at estimating scenarios that we expect. Whereas, if we find ourselves in the extreme tails, it's often because of events or factors that we failed to model.

2ozziegooen6y

I'm not sure. The reasons things happen at the tails typically fall into categories that could be organized to be a small set. For instance:
* The question wasn't understood correctly.
* A significant exogenous event happened.

But, as we do a bunch of estimates, we could get empirical data about these possibilities, and estimate the potentials for future tails. This is a bit different to what I was mentioning, which was more about known but small risks. For instance, the "amount of time I spend on my report next week" may be an outlier if I die. But the chance of serious accident or death can be estimated decently well. These are often repeated known knowns.

2ChristianKl6y

You might have people who can distinguish those, but I think it's a mistake to speak of calibration in that sense as the word usually refers to people who actually trained to be calibrated via feedback.

3Pattern6y

So you don't want predictions*, you want models**. Robust/fully fleshed out models. *predictions of events **predictions of which model is correct

2ozziegooen6y

I'm not sure I'd say that in the context of this post, but more generally, models are really useful. Predictions that come with useful models are a lot more useful than raw predictions. I wrote this other post about a similar topic. For this specific post, I think what we're trying to get is the best prediction we could have had using data pre-event.

[-]ozziegooen6y60

Here's an in-progress hierarchy of what's needed for information to be most useful to an organization or other multi-agent system. I'm sure there must be other very similar hierarchies out there, but don't currently know of any quite like this.

Say you've come up with some cool feature that Apple could include in its next phone. You think this is a great idea and they should add it in the future.

You're outside of Apple, so the only way you have of interacting with them is by sending information through various channels. The question is: what things should yo

...

4ozziegooen6y

Another note to this; there are cases where a system is both broken and fixable at the step-3 level. In some of these cases, it could be worth it to fix the system there instead, especially if you may want to make similar changes in the future. For instance, you may have an obvious improvement for your city to make. You may then realize that the current setups to suggest feedback are really difficult to use, but that it's actually quite feasible to make sure some changes happen that will make all kinds of useful feedback easier for the city to incorporate.

[-]ozziegooen1y50

If you've ever written or interacted with Squiggle code before, we at QURI would really appreciate it if you could fill out our Squiggle Survey!

https://docs.google.com/forms/d/e/1FAIpQLSfSnuKoUUQm4j3HEoqPmTYiWby9To8XXN5pDLlr95AiKa2srg/viewform

We don't have many ways to gauge or evaluate how people interact with our tools. Responses here will go a long way to deciding on our future plans.

Also, if we get enough responses, we'd like to make a public post about ways that people are (and aren't) using Squiggle.

[-]ozziegooen1y50

There have been a few takes so far of humans gradually losing control to AIs - not through specific systems going clearly wrong, but rather by a long-term process of increasing complexity and incentives.

This sometimes gets classified as "systematic" failures - in comparison to "misuse" and "misalignment."

There was "What Failure Looks Like", and more recently, this piece on "Gradual Disempowerment."

To me, these pieces come across as highly hand-wavy, speculative, and questionable.

I get the impression that a lot of people have strong low-level assumptions he...

4Seth Herd1y

I think your central point is that we should clarify these scenarios, and I very much agree. I also found those accounts important but incomplete. I wondered if the authors were assuming near-miss alignment, like AI that follows laws, or human misuse, like telling your intent-aligned AI to "go run this company according to the goals laid out in its corporate constitution" which winds up being just make all the money you can. The first danger can be met with: for the love of god, get alignment right and don't use an idiotic target like "follow the laws of the nation you originated in but otherwise do whatever you like." It seems like this type of failure is a fear of an entire world that has paid zero attention to the warnings from worriers that AI will keep improving and following its goals to the extreme. I don't think we'll sleepwalk into that scenario. The second worry is, I guess, a variant of the first: that we'll use intent-aligned AI very foolishly. That would be issuing a command like "follow the laws of the nation you originated in but otherwise do whatever you like." I guess a key consideration in both cases is whether there's an adequate level of corrigibility. I guess I find the first scenario too foolish for even humans to fall into. Building AI with one of the exact goals people have been warning you about forever, "just make money", is just too dumb. But the second seems all too plausible in a world with widely proliferated intent-aligned AGI. I can see us arriving at autonomous AI/AGI with some level of intent alignment and assuming we can always go back and tell the AI to stand down, then getting complacent and discovering that it's not really as corrigible as you hoped after it's learned and changed its beliefs about things like "following instructions".

2ozziegooen1y

I'd flag that I suspect that we really should have AI systems forecasting the future and the results of possible requests. So if people made a broad request like, "follow the laws of the nation you originated in but otherwise do whatever you like", they should see forecasts for what that would lead to. If there are any clearly problematic outcomes, those should be apparent early on. This seems like it would require either very dumb humans, or a straightforward alignment mistake risk failure, to mess up.

4Dagon1y

I think "very dumb humans" is what we have to work with. Remember, it only requires a small number of imperfectly aligned humans to ignore the warnings (or, indeed, to welcome the world the warnings describe).

4ozziegooen1y

In many worlds, if we have a bunch of decently smart humans around, they would know what specific situations "very dumb humans" would mess up, and take the corresponding preventative measures. A world where many small pockets of "highly dumb humans" could cause an existential catastrophe is one that's very clearly incredibly fragile and dangerous, enough so that I assume reasonable actors would freak out until it stops being so fragile and dangerous. I think we see this in other areas - like cyber attacks, where reasonable people prevent small clusters of actors from causing catastrophic damage. It's possible that the offense/defense balance would dramatically favor tiny groups of dumb actors, and I assume that this is what you and others expect, but I don't see it yet.

2JBlack1y

How do you propose that reasonable actors prevent reality from being fragile and dangerous? Cyber attacks are generally based on poor protocols. Over time smart reasonable people can convince less smart reasonable people to follow better ones. Can reasonable people convince reality to follow better protocols? As soon as you get into proposing solutions to this sort of problem, they start to look a lot less reasonable by current standards.

2Dagon1y

For myself, it seems clear that the world has ALREADY gone haywire. Individual humans have lost control of most of our lives - we interact with policies, faceless (or friendly but volition-free) workers following procedure, automated systems, etc. These systems are human-implemented, but in most cases too complex to be called human-controlled. Moloch won. Big corporations are a form of inhuman intelligence, and their software and operations have eaten the world. AI pushes this well past a tipping point. It's probably already irreversible without a major civilizational collapse, but it can still get ... more so. I don't have good working definitions of "controlled/aligned" that would make this true. I don't see any large-scale institutions or groups large and sane enough to have a reasonable CEV, so I don't know what an AI could align with or be controlled by.

2ozziegooen1y

I feel like you're talking in highly absolutist terms here. Global wealth is $454.4 trillion. We currently have ~8 Bil humans, with an average happiness of say 6/10. Global wealth and most other measures of civilization flourishing that I know of seem to be generally going up over time. I think that our world makes a lot of mistakes and fails a lot at coordination. It's very easy for me to imagine that we could increase global wealth by 3x if we do a decent job. So how bad are things now? Well, approximately, "We have the current world, at $454 Trillion, with 8 billion humans, etc". To me that's definitely something to work with.

2Dagon1y

You're correct, and I apologize for that. There are plenty of potential good outcomes where individual autonomy reverses the trend of the last ~70 years. Or where the systemic takeover plateaus at the current level, and the main change is more wealth and options for individuals. Or where AI does in fact enable many/most individual humans to make meaningful decisions and contributions where they don't today. I mostly want to point out that many disempowerment/dystopia failure scenarios don't require a step-change from AI, just an acceleration of current trends.

2ozziegooen1y

Do you think that the world is getting worse each year? My rough take is that humans, especially rich humans, are generally more and more successful. I'm sure there are ways for current trends to lead to catastrophe - like some trends dramatically increasing and others decreasing, but that seems like it would require a lengthy and precise argument.

2Dagon1y

Good clarification question! My answer probably isn’t satisfying, though. “It’s complicated” (meaning: multidimensional and not ordinally comparable). On a lot of metrics, it’s better by far, for most of the distribution. On harder-to-operationally-define dimensions (sense of hope and agency for the 25th through 75th percentile of culturally normal people), it’s quite a bit worse.

4ozziegooen1y

Thanks for the specificity! > On harder-to-operationally-define dimensions (sense of hope and agency for the 25th through 75th percentile of culturally normal people), it’s quite a bit worse. I think it's likely that many people are panicking and losing hope each year. There's a lot of grim media around. I'm far less sold that something like "civilizational agency" is declining. From what I can tell, companies have gotten dramatically better at achieving their intended ends in the last 30 years, and most governments have generally been improving in effectiveness. One challenge I'd have for you / others who feel similar to you, is to try to get more concrete on measures like this, and then to show that they have been declining. My personal guess is that a bunch of people are incredibly anxious over the state of the world, largely for reasons of media attention, and then this spills over into them assuming major global ramifications without many concrete details or empirical forecasts.

2Dagon1y

I've given some thought to this over the last few decades, and have yet to find ANY satisfying measures, let alone a good set. I reject the trap of "if it's not objective and quantitative, it's not important" - that's one of the underlying attitudes causing the decline. I definitely acknowledge that my memory of the last quarter of the previous century is fuzzy and selective, and beyond that is secondhand and not-well-supported. But I also don't deny my own experience that the (tiny subset of humanity) people I am aware of as individuals have gotten much less hopeful and agentic over time. This may well be for reasons of media attention, but that doesn't make it not real.

[-]ozziegooen5y50

Real-world complexity is a lot like pollution and a lot like taxes.

Pollution because it’s often an unintended negative externality of other decisions and agreements.

Whenever you write a new feature or create a new rule, that’s another thing you and others will need to maintain and keep track of. There are some processes that pollute a lot (messy bureaucratic systems producing ugly legislation) and processes that pollute a little (top programmers carefully adding to a codebase).

Taxes, because it introduces a steady cost to a whole bunch of interactions...

-1garbageactual5y

Almost like there's an Incompleteness Theorem somewhere in there or something?

[-]ozziegooen5y*50

On Berkeley coworking:

I've recently been looking through available Berkeley coworking places.

The main options seem to be WeWork, NextSpace, CoWorking with Wisdom, and The Office: Berkeley. The Office seems basically closed now, CoWorking with Wisdom seemed empty when I passed by, and also seems fairly expensive, but nice.

I took a tour of WeWork and NextSpace. They both provide 24/7 access for all members, both have a ~$300/m option for open coworking, a ~$375/m for fixed desks, and more for private/shared offices. (At least now, with the pandemic. WeWork i...

[-]ozziegooen6y50

Would anyone here disagree with the statement:

Utilitarians should generally be willing to accept losses of knowledge / epistemics for other resources, conditional on the expected value of the trade being positive.

4Dagon6y

[ not a utilitarian; discount my opinion appropriately ] This hits one of the thorniest problems with Utilitarianism: different value-over-time expectations depending on timescales and assumptions. If one is thinking truly long-term, it's hard to imagine what resource is more valuable than knowledge and epistemics. I guess tradeoffs in WHICH knowledge to gain/lose have to be made, but that's an in-category comparison, not a cross-category one. Oh, and trading it away to prevent total annihilation of all thinking/feeling beings is probably right.

2ozziegooen6y

I think my thinking is that for utilitarians, these are generally instrumental, not terminal values. Often they're pretty important instrumental values, but this still would mean that they could be traded off in respect to the terminal values. Of course, if they are "highly important" instrumental values, then something very large would have to be offered for a trade to be worth it. (total annihilation being one example)

2Dagon6y

I think we're agreed that resources, including knowledge, are instrumental (though as a human, I don't always distinguish very closely). My point was that for very-long-term terminal values, knowledge and accuracy of evaluation (epistemics) are far more important than almost anything else. It may be that there's a declining marginal value for knowledge, as there is for most resources, and once you know enough to confidently make the tradeoffs, you should do so. But if you're uncertain, go for the knowledge.

3edoarad6y

Non-Bayesian utilitarians who are ambiguity averse sometimes need to sacrifice "expected utility" to gain more certainty (in quotes because that need not be well defined).

3AprilSR6y

Doesn't being willing to accept a trade *directly follow* from the expected value of the trade being positive? Isn't that like, the *definition* of when you should be willing to accept a trade? The only disagreement would be how likely it is that losses of knowledge / epistemics are involved in positive value trades. (My guess is it does happen rarely.)

2ozziegooen6y

I'd generally say that, but wouldn't be surprised if there were some who disagreed, whose argument would be something like what-to-me would sound like a modification of utilitarianism, [utilitarianism+epistemic-terminal-values].

1AprilSR6y

If you have epistemic terminal values then it would not be a positive expected value trade, would it? Unless "expected value" is referring to the expected value of something other than your utility function, in which case it should've been specified.

2ozziegooen6y

Yep, I would generally think so. I was doing what may be a poor steelman of my assumptions of how others would disagree; I don't have a great sense of what people who would disagree would say at this point.

1Pattern6y

Happiness + Knowledge. (A related question is, do people with these values drink?)

2[anonymous]6y

Only if the trade is voluntary. If the trade is forced (e.g. in healthcare) then you may have two bad options, and the option you do want is not on the table.

3Isnasene6y

In general, I would agree with the above statement (and technically speaking, I have made such trade-offs). But I do want to point out that it's important to consider what the loss of knowledge/epistemics entails. This is because certain epistemic sacrifices have minimal costs (I'm very confident that giving up FDT for CDT for the next 24 hours won't affect me at all) and some have unbounded costs (if giving up materialism causes me to abandon cryonics, it's hard to quantify how large of a blunder that would be). This is especially true of epistemics that allow you to be unboundedly exploited by an adversarial agent. As a result, even when the absolute value looks positive to me, I'll still try to avoid these kinds of trade-offs because certain black swans (ie bumping into an adversarial agent that exploits your lack of knowledge about something) make such bets very high risk.
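To illustrate how a rare, very costly outcome can flip the sign of a trade that looks positive at first glance, here is a toy calculation. All probabilities and payoffs are invented for the example.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, value) pairs whose probabilities sum to 1."""
    return sum(p * v for p, v in outcomes)


# First-pass estimate: the trade reliably gains 10 units.
naive = expected_value([(1.0, 10.0)])

# Same trade, but with a hypothetical 0.1% chance of meeting an adversary
# who exploits the lost knowledge for a loss of 50,000 units.
with_tail = expected_value([(0.999, 10.0), (0.001, -50_000.0)])

print(naive, with_tail)
```

This is still plain expected-value maximization. The disagreement is over whether the first-pass estimate included the tail term at all.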

2ozziegooen6y

This sounds pretty reasonable to me; it sounds like you're basically trying to maximize expected value, but don't always trust your initial intuitions, which seems quite reasonable.

3Pattern6y

[What "utilitarian" means could use some resolving, so I just treated this as "people".] I would disagree. I tried to find the relevant post in the sequences and found this along with it: Would I accept that processes that take into account resource constraints might be more effective? Certainly, though I think of that as 'starting the journey in a reasonable fashion' rather than 'going backwards' as your statement brings to mind.

3George3d66y

How would you define loss of knowledge?

2ozziegooen6y

Basically, information that can be handled in "value of information" style calculations. So, if I learn information such that my accuracy of understanding the world increases, my knowledge is increased. For instance, if I learn the names of everyone in my extended family.
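A "value of information" calculation in its simplest form compares the best expected payoff with and without learning the true state. The sketch below computes the expected value of perfect information for a toy decision; the states, actions, and payoffs are all hypothetical.

```python
def best_expected_value(prior, payoffs):
    """Best expected payoff when acting on the prior alone.

    prior: {state: probability}; payoffs: {action: {state: payoff}}.
    """
    return max(
        sum(prior[s] * row[s] for s in prior) for row in payoffs.values()
    )


def value_of_perfect_information(prior, payoffs):
    """Expected gain from learning the state before choosing an action."""
    informed = sum(
        prior[s] * max(row[s] for row in payoffs.values()) for s in prior
    )
    return informed - best_expected_value(prior, payoffs)


prior = {"rain": 0.3, "dry": 0.7}
payoffs = {
    "umbrella": {"rain": 5.0, "dry": -1.0},
    "no_umbrella": {"rain": -10.0, "dry": 0.0},
}
print(value_of_perfect_information(prior, payoffs))
```

On this framing, losing a piece of knowledge costs roughly the value of information it carried for the decisions you still face, which is what would be weighed against the other resources in the trade.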

1George3d66y

Ok, but in this case do you mean "loss of knowledge" as in "loss of knowledge harbored within the brain" or "loss of knowledge no matter where it's stored, be it a book, brain, text file... etc"? Furthermore, does losing copies of a certain piece of knowledge count as loss of knowledge? What about translations of said knowledge (in another language or another philosophical/mathematical framework) that don't add any new information, but just make it accessible to a larger demographic?

2ozziegooen6y

I was thinking the former, but I guess the latter could also be relevant/count. It seems like there's no strict cut-off. I'd expect a utilitarian to accept trade-offs against all these kinds of knowledge, conditional on the total expected value being positive.

1George3d66y

Well, the problem with the former (knowledge harbored within the brain) is that it's very vague and hard to define. Say I have a method to improve the efficacy of VX (an easily weaponizable nerve toxin). As a utilitarian I conclude this information is going to be harmful, so I can purge it from my hard drive, I can burn the papers I used to come up with this... etc. But I can't wipe my head clean of the information; at best I can resign myself to never talking about it to anyone and to not according it much import, such that I may forget it. But that's not destruction per se, it's closer to lying, not sharing the information with anyone (even if asked specifically), or to biasing your brain towards transmitting and remembering certain pieces of information (which we do all the time). However I don't see anything contentious with this case, nor with any other case of information-destruction, as long as it is for the greater utility. I think in general people don't advocate for destroying/forgetting information because: a) It's hard to do b) As a general rule of thumb the accumulation of information seems to be a good thing, even if the utility of a specific piece of information is not obvious. But this is more of a heuristic than an exact principle.

2ozziegooen6y

I'd agree that the first one is generally pretty separated from common reality, but think it's a useful thought experiment. I was originally thinking of this more in terms of "removing useful information" than "removing expected-harmful information", but good point; the latter could be interesting too.

2George3d66y

Well, I think the "removing useful information" bit contradicts utility to begin with. As in, if you are a utilitarian, useful information == helps maximize utility. Thus the trade-off is not possible. I can think of some contrived examples where the trade-off is possible (e.g. where the information is harmful now but will be useful later), but in that case it's so easy to "hide" information in the modern age, instead of destroying it entirely, that the problem seems too theoretical to me. But at the end of the day, assuming you reached a contrived enough situation where the information must be destroyed (or where hiding it deprives other people of the ability to discover further useful information), I think the utilitarian perspective has nothing fundamental against destroying it. However, no matter how hard I try, I can't really think of a very relevant example where this could be the case.

3ozziegooen6y

One extreme case would be committing suicide because your secret is that important. A less extreme case may be being OK with forgetting information; you're losing value, but the cost to maintain it wouldn't be worth it. (In this case the information is positive though)

2ozziegooen6y

There's some related academic work around this here: https://www.princeton.edu/~tkelly/papers/epistemicasinstrumental.pdf https://core.ac.uk/download/pdf/33752524.pdf They don't specifically focus on utilitarians, but the arguments are still relevant. Also, this post is relevant: https://www.lesswrong.com/posts/dMzALgLJk4JiPjSBg/epistemic-vs-instrumental-rationality-approximations

[-]ozziegooen6y50

There's a lot of arguing, of course, on if humans are rational, but this often mixes up two things: there's the "Von Neumann-Morgenstern utility function maximization" definition of "rational", and there's a hypothetical "rational" that a human could fulfill with constraints much more complicated than the classical approach, more in the direction of prospect theory, or Predictive Coding.

I think I regard the second definition as sufficiently not understood or defined that it isn't yet worth using in most conversation. It seems challenging, to say the least,

...

3Pattern6y

Or it could be an intuitive usage and mean "(more) optimal". "Why don't more people do [thing that will improve their health]?"

2ozziegooen6y

I like that question. I think that if people were to try to define optimal in a specific way, they would find that it requires a model of human behavior; the common one that academics would fall back to is that of Von Neumann-Morgenstern utility function maximization. I think it's quite possible that when we have better models of human behavior, we'll better recognize that in cases where people seem to be doing silly things to improve their health, they're actually being somewhat optimal given a large set of physical and mental constraints.

[-]ozziegooen1y42

It's arguably difficult to prove that AIs can be as good as or better than humans at moral reasoning.

A lot of the challenge is that there's no clear standard for moral reasoning. Honestly, I'd guess that a big part of this is that humans are generally quite bad at it, and generally highly overconfident in their own moral intuitions.

But one clearer measure is if AIs can predict humans' moral judgements. Very arguably, if an AI system can predict all the moral beliefs that a human would have after being exposed to different information, then the AI must be capa...

7cubefox1y

There is a pervasive case where many language models fail catastrophically at moral reasoning: They fail to acknowledge that calling someone an ethnic slur is vastly preferable to letting a nuclear bomb explode in a large city. I think that highlights not a problem with language models themselves (jailbroken models did handle that case fine) but with the way RLHF works.

2ozziegooen1y

I just tried this with a decent prompt, and got answers that seem okay-ish to me, as a first pass. My prompt: Claude: Squiggle AI:

2cubefox1y

Yeah, recent Claude does relatively well. Though I assume it also depends on how disinterested and analytical the phrasing of the prompt is (e.g. explicitly mentioning the slur in question). I also wouldn't rule out that Claude was specifically optimized for this somewhat notorious example.

2ozziegooen1y

I imagine this also has a lot to do with the incentives of the big LLM companies. It seems very possible to fix this if a firm really wanted to, but this doesn't seem like the kind of thing that would upset many users often (and I assume that leaning on the PC side is generally a safe move). I think that the current LLMs have pretty mediocre epistemics, but most of that is just the companies playing safe and not caring that much about this.

2cubefox1y

Sure, but the fact that a "fix" would even be necessary highlights that RLHF is too brittle relative to slightly OOD thought experiments, in the sense that RLHF misgeneralizes the actual human preference data it was given during training. This could either be a case of misalignment between human preference data and reward model, or between reward model and language model. (Unlike SFT, RLHF involves a separate reward model as "middle man", because reinforcement learning is too sample-inefficient to work with a limited number of human preference data directly.)

1peterr1y

You could probably test if an AI makes moral decisions more often than the average person, if it has higher scope sensitivity, and if it makes decisions that resolve or deescalate conflicts or improve people's welfare compared to various human and group baselines.

[-]ozziegooen5y40

I write a lot of these snippets to my Facebook wall, almost all just to my friends there. I just posted a batch of recent ones, might post similar in the future in batches. In theory it should be easy to post to both places, but in practice it seems a bit like a pain. Maybe in the future I'll use some solution to use the API to make a Slack -> (Facebook + LessWrong short form) setup.

That said, posting just to Facebook is nice as a first pass, so if people get too upset with it, I don't need to make it totally public.

[-]ozziegooen5y40

It’s a shame that our culture promotes casual conversation, but you’re generally not allowed to use it for much of the interesting stuff.

(Meets person with a small dog) “Oh, you have a dog, that’s so interesting. Before I get into specifics, can I ask for your age/gender/big 5/enneagram/IQ/education/health/personal wealth/family upbringing/nationality? How much well-being does the dog give you? Can you divide that up to include the social, reputational, self-motivational benefits? If it died tomorrow, and you mostly forgot about it, what percentage of your... (read more)

[-]ozziegooen6y40

One question around the "Long Reflection" or around "What will AGI do?" is something like, "How bottlenecked will we be by scientific advances that we'll need to then spend significant resources on?"

I think some assumptions that this model typically holds are:

1. There will be decision-relevant unknowns.
2. Many decision-relevant unknowns will be EV-positive to work on.
3. Of the decision-relevant unknowns that are EV-positive to work on, these will take between 1% and 99% of our time.

(3) seems quite uncertain to me in the steady state. I believe it makes an intuitiv

... (read more)

[-]ozziegooen6y40

I feel like a decent alternative to a spiritual journey would be an epistemic journey.

An epistemic journey would basically involve something like reading a fair bit of philosophy and other thought, thinking, and becoming less wrong about the world.

[-]ozziegooen6y40

Instillation, Proliferation, Amplification

Paul Christiano and Ought use the terminology of Distillation and Amplification to describe a high-level algorithm of one type of AI reasoning.

I’ve wanted to come up with an analogy to forecasting systems. I previously named a related concept Prediction-Augmented Evaluation Systems, since somewhat renamed to “Amplification” by Jacobjacob in this post.

I think one thing that’s going on is that “distillation” doesn’t have an exact equivalent with forecasting setups. The term “distillation” comes with the assumptions:

... (read more)

3Ben Goldhaber6y

Is there not a distillation phase in forecasting? One model of the forecasting process is that person A builds up their model and distills a complicated question into a high-information, highly compressed datum, which can then be used by others. In my mind it's: Model -> Distill -> "amplify" (not sure if that's actually the right word). I prefer the term scalable instead of proliferation for "can this group do it cost-effectively", as it's a similar concept to that in CS.

5ozziegooen6y

Distillation vs. Instillation My main point here is that distillation is doing 2 things: transitioning knowledge (from training data to a learned representation), and then compressing that knowledge.[1] The fact that it's compressed in some ways arguably isn't always particularly important; the fact that it's transferred is the main element. If a team of forecasters basically learned a signal, but did so in a very uncompressed way (like, they wrote a bunch of books about said signal), but still were somewhat cost-effective, I think that would be fine. Around "Proliferation" vs. "Scaling", I'd be curious if there are better words out there. I definitely considered scaling, but it sounds less concrete and less specific. To "proliferate" means "to generate more of", but to "scale" could mean, "to make look bigger, even if nothing is really being done." I think my cynical guess is that "instillation/proliferation" won't catch on because they are too uncommon, but also that "distillation" won't catch on because it feels like a stretch from the ML use case. Could use more feedback here. [1] Interestingly, there seem to be two distinct stages in Deep Learning that map to these two different things, according to Naftali Tishby's claims.

[-]ozziegooen6y*40

Agent-based modeling seems like one obvious step forward to me for much of social-science related academic progress. OpenAI's Hide and Seek experiment was one that I am excited about, but it is very simple and I imagine similar work could be greatly extended for other fields. The combination of simulation, possible ML distillation on simulation (to make it run much faster), and effective learning algorithms for agents, seems very powerful.

However, agent-based modeling still seems quite infrequently used within Academia. My impression is that agent-based so

... (read more)

4johnswentworth6y

Could you give a few specific examples where you imagine agent-based models would help?

7ozziegooen6y

Sure,

1. Humans as agents / psychology / economics. Instead of making mathematical models of rational agents, have people write code that predicts the behaviors of rational agents or humans. Test the "human bots" against empirical experimental results of humans in different situations, to demonstrate that the code accurately models human behavior.
2. Mechanism design. Show that according to different incentive structures, humans will behave differently, and use this to optimize the incentive structures accordingly.
3. Most social science. Make agent-based models to generally help explain how groups of humans interact with each other and what collective behaviors emerge.

I guess when I said, "Much of academic progress", I should have specified, "Academic fields that deal with modeling humans to some degree"; perhaps most of social science.
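
As a rough illustration of the mechanism-design point, here is a toy sketch in JavaScript. Everything in it is an assumption made up for the example (the quadratic effort cost, the `chooseEffort` and `totalOutput` helpers, the two pay schemes); it is not a validated model of real people, only a picture of what "agents respond to incentive structures" looks like as code.

```javascript
// Toy agent-based sketch. All names and numbers are hypothetical.

// Each agent picks the effort level (0..10) that maximizes pay minus effort cost.
// Assumption: effort cost grows quadratically, scaled by the agent's costPerUnit.
function chooseEffort(agent, payFor) {
  let best = 0;
  let bestUtility = -Infinity;
  for (let effort = 0; effort <= 10; effort++) {
    const utility = payFor(effort) - agent.costPerUnit * effort * effort;
    if (utility > bestUtility) {
      best = effort;
      bestUtility = utility;
    }
  }
  return best;
}

// Sum the effort chosen by every agent under a given incentive scheme.
function totalOutput(agents, payFor) {
  return agents.reduce((sum, agent) => sum + chooseEffort(agent, payFor), 0);
}

// Three agents who find effort increasingly costly.
const agents = [{ costPerUnit: 0.5 }, { costPerUnit: 1 }, { costPerUnit: 2 }];

// Two incentive structures to compare.
const flatWage = (effort) => 10;         // same pay regardless of effort
const pieceRate = (effort) => 4 * effort; // pay proportional to effort

const flatOutput = totalOutput(agents, flatWage);
const pieceOutput = totalOutput(agents, pieceRate);
```

Comparing `flatOutput` with `pieceOutput` is the whole experiment here. A more serious version would replace `chooseEffort` with behavior fitted to experimental data, which is the "human bots" idea in point 1.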

1ozziegooen6y

I thought Probabilistic Models of Cognition was quite great (it seems criminally underappreciated); that seems like a good step in this direction. Perhaps in the future, one could prove that "This environment with these actors will fail in these ways" by empirically showing that reinforcement agents optimizing in those setups lead to specific outcomes.

[-]ozziegooen1y30

If we could have LLM agents that could inspect other software applications (including LLM agents) and make strong claims about them, that could open up a bunch of neat possibilities.

There could be assurances that apps won't share/store information.
There could be assurances that apps won't be controlled by any actor.
There could be assurances that apps can't be changed in certain ways (eventually).

I assume that all of this should provide most of the benefits people ascribe to blockchains, but without the costs of being on the blockchain.

Some neat opt... (read more)

[-]ozziegooen2y30

Seeking feedback on this AI Safety proposal:
(I don't have experience in AI experimentation)

I'm interested in the question of, "How can we use smart AIs to help humans at strategic reasoning."

We don't want the solution to be, "AIs just tell humans exactly what to do without explaining themselves." We'd prefer situations where smart AIs can explain to humans how to think about strategy, and this information makes humans much better at doing strategy.

One proposal to make progress on this is to set a benchmark for having smart AIs help out dumb AIs by pr... (read more)

[-]ozziegooen6y30

Named Footnotes: A (likely) mediocre proposal

Epistemic status: This is probably a bad idea, because it's quite obvious yet not done; i.e. Chesterton's fence.

One bad practice in programming is to have a lot of unnamed parameters. For instance,

  createPost(author, post, comment, name, id, privacyOption, ...)

Instead it's generally better to use Named Parameters, like,

  createPost({author, post, comment, name, id, privacyOption})
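
To make the difference concrete, here is a small JavaScript sketch. The two `createPost...` functions and their fields are hypothetical stand-ins, not a real API; the point is only that swapped positional arguments fail silently, while named ones are order-independent.

```javascript
// Hypothetical illustration; these functions and fields are not a real API.

// Positional version: the caller has to remember the order of every argument.
function createPostPositional(author, post, comment, name, id, privacyOption) {
  return { author, post, comment, name, id, privacyOption };
}

// Named version: a single object argument, destructured by key.
function createPostNamed({ author, post, comment, name, id, privacyOption }) {
  return { author, post, comment, name, id, privacyOption };
}

// Swapping the first two positional arguments silently produces a wrong record:
// the post text ends up in the author field.
const swapped = createPostPositional("Hello world", "alice", "", "intro", 1, "public");

// With named parameters, the order at the call site does not matter.
const named = createPostNamed({
  privacyOption: "public",
  id: 1,
  name: "intro",
  comment: "",
  post: "Hello world",
  author: "alice",
});
```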

Footnotes/endnotes seem similar. They are ordered by number, but this can be quite messy. It's particularly annoying for autho

... (read more)

1Pattern6y

That's what arrays are for. What software does this well? 10 endnotes? Break the document up into sections, and the footnotes up into sections. Authors also could not re-order the footnotes. Or separate drafting and finished product: * indicates a footnote (to be replaced with a number later). At the end, searching * will find the first instance. (If the footnotes are being made at the same time, then /* for the notes in the body, and \* for the footnotes at the end. Any uncommon symbols work - like qw.)

2ozziegooen6y

Arrays are useful for some kinds of things, for sure, but not when you have some very different parameters, especially if they are of different kinds. It would be weird to replace getUser({userId, databaseId, params}) with something like getUser([inputs]) where inputs is an array of [userId, databaseId, params]. Depends on your definition of "well", but things like Microsoft Word and to what I believe is a lesser extent Google Docs at least have ways of formally handling footnotes/endnotes, which is better than not using these features (like, in most internet comment editors). That could work in some cases. I haven't seen that done much on most online blog posts. Also, there's definitely controversy if this is a good idea. Fair point, but this would be seen as lazy, and could be confusing. If your footnotes are numbers [8], [2], [3], [1], etc. that seems unpolished. That said, I wouldn't mind this much, and it could be worth the cost.

3Pattern6y

The page you linked was a great overview. It noted: With a physical document, the two parts (body+endnotes) can be separated for side by side reading. With a digital document, it helps to have two copies open. This seems like a problem with an easy solution. (The difficult solution is trying to make it easier for website makers to get more sophisticated editors in their comments section.) A brief search for browser extensions suggests it might be possible with an extension that offers an editor, or made easier with one that allows searching multiple things and highlighting them in different colors. Alternatively, a program for this might: Make sure every pair of []s is closed.* Find the strings contained in []s. Make sure they appear (at most) two times.** If they appear 3 times, increment the later one (in both places where it appears, if applicable). This requires that the footnotes below already be written. This requirement could be removed if the program looked for (or created) a "Footnotes" or "Endnotes" header (just that string and an end of line), and handled things differently based on that. Such a program could be on a website, though that requires that people bother switching to it, which, even if bookmarked, is only slightly easier than opening an editor. As a browser extension, it would have to figure out/be told 1. what part of the text it's supposed to work on, 2. when to be active, and 3. how to make the change. 1. could be done by having the message begin with start, and end with end, as long as the page doesn't include those words (with []s in between them). 2. This could be done automatically, or with a button. 3. Change the text automatically, or make a suggestion? This could simplify things - if people took things in that format and ran them through a program that fixed it/pointed it out (and maybe other small mistakes). *A programming IDE does this. **And this as well, with a bit more work, enabling named footnotes. The trick
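
A program along these lines could be sketched roughly as follows. This is one possible reading, not the exact procedure described above. The assumptions: footnotes are written as `[label]` in the body, the notes live under a line that reads exactly `Footnotes`, each note line starts with its `[label]`, and a label may be referenced more than once in the body. The function name `numberFootnotes` is made up for the example.

```javascript
// Hypothetical sketch: turn named footnotes like [fence] into numbered ones.
// Assumed input format: body text, then a line "Footnotes", then one "[label] text" line per note.
function numberFootnotes(text) {
  // Check that every bracket belongs to a closed, non-empty [label].
  const leftover = text.replace(/\[[^\[\]]+\]/g, "");
  if (/[\[\]]/.test(leftover)) throw new Error("Unbalanced or empty brackets");

  const header = "\nFootnotes\n";
  const headerIndex = text.indexOf(header);
  if (headerIndex === -1) throw new Error('Missing "Footnotes" header line');
  const body = text.slice(0, headerIndex);
  const notes = text.slice(headerIndex + header.length);

  // Assign numbers in order of first appearance in the body.
  const numbers = new Map();
  for (const match of body.matchAll(/\[([^\[\]]+)\]/g)) {
    if (!numbers.has(match[1])) numbers.set(match[1], numbers.size + 1);
  }

  // Collect note definitions; each label may be defined only once.
  const defined = new Map();
  for (const line of notes.split("\n").filter((l) => l.trim() !== "")) {
    const match = line.match(/^\[([^\[\]]+)\]\s*(.*)$/);
    if (!match) throw new Error(`Not a footnote line: ${line}`);
    if (defined.has(match[1])) throw new Error(`Duplicate footnote: ${match[1]}`);
    defined.set(match[1], match[2]);
  }

  // Every referenced label needs a note, and every note needs a reference.
  for (const label of numbers.keys()) {
    if (!defined.has(label)) throw new Error(`Undefined footnote: ${label}`);
  }
  for (const label of defined.keys()) {
    if (!numbers.has(label)) throw new Error(`Unused footnote: ${label}`);
  }

  // Rewrite the body and emit the notes sorted by their new numbers.
  const numberedBody = body.replace(/\[([^\[\]]+)\]/g, (_, label) => `[${numbers.get(label)}]`);
  const numberedNotes = [...numbers.entries()].map(([label, n]) => `[${n}] ${defined.get(label)}`);
  return `${numberedBody}${header}${numberedNotes.join("\n")}`;
}
```

With this, the author can write and reorder notes by name, and the numbers only get assigned at publish time, so they always come out in reading order.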

[-]ozziegooen6y30

It seems really hard to deceive a Bayesian agent who thinks you may be deceiving them, especially in a repeated game. I would guess there could be interesting theorems about Bayesian agents that are attempting to deceive one another; as in, in many cases their ability to deceive the other would be highly bounded or zero, especially if they were in a flexible setting with possible pre-commitment devices.

To give a simple example, agent A may tell agent B that they believe $ω = 0.8$ , even though they internally believe $ω = 0.5$ . However, if this were somewhat repeat

... (read more)
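
One way to picture the repeated version of the 0.8 versus 0.5 example is the following JavaScript sketch. It rests on strong assumptions that are mine, not the post's: agent B considers only two hypotheses about A (honest, or secretly believing 0.5), an honest A is assumed to be calibrated, and the outcome sequence is a hand-picked illustration. `updateTrust` is a hypothetical helper name.

```javascript
// Hypothetical sketch: B's credence that A reports honestly, updated by Bayes' rule.
// Assumption: if A is honest, events happen with A's stated probability;
// if A is deceptive, events happen with the probability B suspects A really holds.
function updateTrust(prior, statedProbability, suspectedProbability, outcomes) {
  let trust = prior;
  const history = [];
  for (const happened of outcomes) {
    const likelihoodIfHonest = happened ? statedProbability : 1 - statedProbability;
    const likelihoodIfDeceptive = happened ? suspectedProbability : 1 - suspectedProbability;
    // Total probability of this outcome under B's current beliefs.
    const evidence = trust * likelihoodIfHonest + (1 - trust) * likelihoodIfDeceptive;
    trust = (trust * likelihoodIfHonest) / evidence;
    history.push(trust);
  }
  return history;
}

// Illustrative outcome sequence: the event happens 4 times out of 10,
// which fits a 0.5 belief far better than the stated 0.8.
const outcomes = [true, false, true, false, false, true, false, true, false, false];

// B starts out trusting A's reports with credence 0.9.
const trustHistory = updateTrust(0.9, 0.8, 0.5, outcomes);
const finalTrust = trustHistory[trustHistory.length - 1];
```

Under these assumptions B's trust drops quickly once the outcomes stop matching A's stated number, which is the intuition that sustained deception is hard against an agent that keeps score.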

7habryka6y

Bayesian agents are logically omniscient, and I think a large fraction of deceptive practices rely on asymmetries in computation time between two agents with access to slightly different information (like generating a lie and checking the consistencies between this new statement and all my previous statements). My sense is also that two-player games with bayesian agents are actually underspecified and give rise to all kinds of weird things due to the necessity for infinite regress (i.e. an agent modeling the other agent modeling themselves modeling the other agent, etc.), which doesn't actually reliably converge, though I am not confident. A lot of decision-theory seems to do weird things with bayesian agents. So overall, not sure how well you can prove theorems in this space, without having made a lot of progress in decision-theory, and I expect a lot of our confusions in decision-theory to be resolved by moving away from bayesianism.

5ozziegooen6y

Hm... I like the idea of an agent deceiving another due to its bounds on computational time, but could imagine many stable (though smaller) solutions that wouldn't. I'm curious if a good bayesian agent could do "almost perfect" on many questions given limited computation. For instance, a good bayesian would be using bayesianism to semi-optimally use any set of computation (assuming it has some sort of intuition, which I assume is necessary?). On being underspecified, it seems to me like in general our models of agent cognition have always been pretty underspecified, so would definitely agree here. "Ideal" bayesian agents are somewhat ridiculously overpowered and unrealistic. I found the simulations around ProbMods to be interesting at modeling similar things; I think I'd like to see a lot more simulations for this kind of work. https://probmods.org/

[-]ozziegooen2mo20

(Quick Thought)

Perhaps the goal for existing work targeting AI safety is less to ensure that AI safety happens, and more to make sure that we make AI systems that are strictly[1] better than the current researchers at figuring out what to do about AI safety.

I'm unsure how hard AI safety is. But I consider it fairly likely that mid-term (maybe 50% of the way to TAI, in years) safe AI systems will outperform humans on AI safety strategy and the large majority of the research work.

If humans can successfully bootstrap more capable infrastructure... (read more)

2elifland2mo

The best humans, or the median humans who do that work, or something else?

2ozziegooen2mo

The humans trusted to make decisions. I’m hesitant to say “best humans”, because who knows how many smart people there may be out there who might luck out or something. But “the people making decisions on this, including in key EA orgs/spending” is a much more understandable bar.

[-]ozziegooen1y2-5

Instead of "Goodharting", I like the potential names "Positive Alignment" and "Negative Alignment."

"Positive Alignment" means that the motivated party changes their actions in ways the incentive creator likes. "Negative Alignment" means the opposite.

Whenever there are incentives offered to certain people/agents, there are likely to be cases of both Positive Alignment and Negative Alignment. The net effect will likely be either positive or negative.

"Goodharting" is fairly vague and typically refers to just the "Negative Alignment" portion... (read more)

[-]ozziegooen1y20

Quick list of some ideas I'm excited about, broadly around epistemics/strategy/AI.

1. I think AI auditors / overseers of critical organizations (AI efforts, policy groups, company management) are really great and perhaps crucial to get right, but would be difficult to do well.

2. AI strategists/tools telling/helping us broadly what to do about AI safety seems pretty safe.

3. In terms of commercial products, there’s been some neat/scary military companies in the last few years (Palantir, Anduril). I’d be really interested if there could be some companies to au... (read more)

[-]ozziegooen5y20

If you think it’s important that people defer to “experts”, then it should also make sense that people should decide which people are “experts” by deferring to “expert experts”.

There are many groups that claim to be the “experts”, and ask that the public listen only to them on broad areas they claim expertise over. But groups like this also have a long history of underperforming other clever groups out there.

The US government has a long history of claiming “good reasons based on classified intel” for military interventions, where later this turns out to b... (read more)

[-]ozziegooen5y20

People are used to high-precision statements given by statistics (the income in 2016 was $24.4 Million), and are used to low-precision statements given by human intuitions (From my 10 minute analysis, I think our organization will do very well next year). But there’s a really weird cultural norm against high-precision, intuitive statements. (From my 10 minute analysis, I think this company will make $28.5 Million in 2027).

Perhaps in part because of this norm, I think that there are a whole lot of gains to be made in this latter cluster. It’s not trivial to do this well, but it’s possible, and the potential value is really high.

[-]ozziegooen5y20

I find SEM models to be incredibly practical. They might often over-reach a bit, but at least they present a great deal of precise information about a certain belief in a readable format.

I really wish there were more attempts at making diagrams like these in cases where there isn't statistical data. For example, to explain phenomena like:

What caused the fall of Rome?
Why has scientific progress fallen over time in the US?
Why did person X get upset, after event Y?
Why did I make that last Facebook post?

In all of these cases, breadth and depth... (read more)

[-]ozziegooen5y20

There’s a big stigma now against platforms to give evaluations or ratings on individuals or organizations along various dimensions. See the rating episode of Black Mirror, or the discussion on the Chinese credit system.

I feel like this could be a bit of a missed opportunity. This sort of technology is easy to do destructively, but there are a huge number of benefits if it can be done well.

We already have credit scores, resumes (which are effectively scores), and social media metrics. All of these are really crude.

Some examples of things that could be possi... (read more)

2Viliam5y

How would you design a review system that cannot be gamed (very easily)? For example: Someone sends a message to their 100 friends, and tells them to open the romantic partners app and falsely accuse you of date rape. Suppose they do. What exactly happens next? * You are forever publicly marked as a rapist, no recourse. * You report those accusations as spam, or sue the people... but, the same could be done by an actual rapist... assuming the victims have no proof. Both outcomes seem bad to me, and I don't see how to design a system that prevents them both. (And if we reduce the system to situations when there is a proof... well, then you actually don't need a mutual rating system, just an app that searches people in the official records.) Similar for other apps... politicians of the other party will be automatically accused of everything; business competitors will be reported as untrustworthy; people who haven't even seen your book will give it zero stars rating. (The last thing is somewhat reduced by Amazon by requiring that the reviewers actually buy the book first. But even then, this makes it a cost/benefit question: you can still give someone X fake negative reviews, in return for spending the proportional amount of money on actually buying their book... without the intention to read it. So you won't write negative fake reviews for fun, but you can still review-bomb the ones you truly hate.)

2ozziegooen5y

I think it's very much a matter of unit economics. Court systems have a long history of dealing with false accusations, but still managing to uphold some sort of standards around many sorts of activity (murder and abuse, for instance). When it comes to false accusations, there could be different ways of checking these to verify them. These are common procedures in courts and other respected situations. If 100 people all opened an application and posted at a similar time, that would be fairly easy to detect, if the organization had reasonable resources. Hacker News and similar deal with similar situations (though obviously much less dramatic) very often with various kinds of spamming attacks and upvote rings. There's obviously always going to be some error rate, as is true for court systems. I think it's very possible that the efforts feasible for us in this area in the next 1-10 years would be too expensive to be worth it, especially because they might be very difficult to raise money for. However, I would hope that abilities here eventually allow for systems that represent much more promising trade-offs.
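
For the "100 people posting at a similar time" case, a first-pass detector could look something like this JavaScript sketch. The record shape (`target`, `time` in seconds), the `flagBursts` name, and the window and threshold values are all assumptions for illustration; a real system would need much more than timing (account age, social graph, manual review).

```javascript
// Hypothetical sketch: flag any target that receives many reviews within a short time window.
// reviews: array of { target, time }, with time in seconds.
function flagBursts(reviews, windowSeconds, threshold) {
  // Group review times by the person or item being reviewed.
  const byTarget = new Map();
  for (const { target, time } of reviews) {
    if (!byTarget.has(target)) byTarget.set(target, []);
    byTarget.get(target).push(time);
  }

  const flagged = [];
  for (const [target, times] of byTarget) {
    times.sort((a, b) => a - b);
    // Sliding window over the sorted times: [start, end] always spans <= windowSeconds.
    let start = 0;
    for (let end = 0; end < times.length; end++) {
      while (times[end] - times[start] > windowSeconds) start++;
      if (end - start + 1 >= threshold) {
        flagged.push(target);
        break;
      }
    }
  }
  return flagged;
}

// Illustrative data: "a" gets five reviews within a minute, "b" gets three spread over hours.
const reviews = [
  { target: "a", time: 0 }, { target: "a", time: 10 }, { target: "a", time: 20 },
  { target: "a", time: 30 }, { target: "a", time: 40 },
  { target: "b", time: 0 }, { target: "b", time: 5000 }, { target: "b", time: 10000 },
];
const suspicious = flagBursts(reviews, 60, 5);
```

A check like this is cheap to run, which is part of why the unit economics matter: the expensive part is the human follow-up on whatever gets flagged.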

[-]ozziegooen5y20

Voting systems vs. utility maximization

I’ve seen a lot of work on voting systems, and on utility maximization, but very few direct comparisons. But I think that often we can prioritize systems that favor one or the other, and clearly our research efforts are limited between the two, so it seems useful to compare.

Voting systems act very differently from utility maximization. There’s a big body of literature on ideal voting rules, and it’s generally quite different from that of utility maximization.

Proposals like quadratic voting are clearly in the voting category... (read more)

[-]ozziegooen6y20

Prediction evaluations may be best when minimally novel

Imagine a prediction pipeline is resolved with a human/judgemental evaluation. For instance, a group today starts predicting what a trusted judge 10 years from now will say for the question, "How much counterfactual GDP benefit did policy X make, from 2020-2030?"

So, there are two stages:

Prediction
Evaluation

One question for the organizer of such a system is how many resources to delegate to the prediction step vs. the evaluation step. It could be expensive to both pay for predictors and evaluators,

... (read more)
