All of Jay Bailey's Comments + Replies

Significant Digits is (or was, a few years ago) considered the best one, to my recollection.

2MondSemmel
Thanks! I haven't had time to skim either report yet, but I thought it might be instructive to ask the same questions to Perplexity Pro's DR mode (it has a "Deep Research" mode with the same name as OpenAI's but otherwise no relation to it; in particular, it's free with their regular monthly subscription, so it can't be particularly powerful): see here. (This is the second report I generated, as the first one froze indefinitely while the app generated the conclusion, and thus the report couldn't be shared.)

I bought a month of Deep Research and am open to running queries if people have a few but don't want to spend 200 bucks for them. Will spend up to 25 queries in total.

A paragraph or two of detail is good - you can send me supporting documents via wnlonvyrlpf@tznvy.pbz (ROT13) if you want. Offer is open publicly or via PM.

How likely is it that AI will surpass humans, take over all power, and cause human extinction some time during the 21st century?

3cubefox
cc @Annapurna 
4MondSemmel
How about "analyze the implications for risk of AI extinction, based on how OpenAI's safety page has changed over time"? Inspired by this comment (+ follow-up w/ Internet archive link).

Having reflected on this decision more, I have decided I no longer endorse those feelings in point B of my second-to-last paragraph. In fact, I've decided that "I donated roughly 1k to a website that provided way more expected value than that to me over my lifetime, and also if it shut down I think that would be a major blow to one of the most important causes in the world" is something to be proud of, not embarrassed by, and something worthy of being occasionally reminded of.

So if you're still sending them out I'd gladly take one after all :)

I've been procrastinating on this, but I heard it was the last day to do this, so here I am. I've utilised LessWrong for years, but am also a notoriously cheap bastard. I'm working on this. That said, I feel I should pay something back, for what I've gotten out of it.

When I was 20 or so, I was rather directionless, and didn't know what I wanted to do in life, bouncing between ideas, never finishing them. I was reading LessWrong at the time. At some point, a LessWrong-ism popped into my head - "Jay - this thing you're doing isn't working. Your interests cha... (read more)

2Ben Pace
When the donation came in 15 mins ago, I wrote in slack  So you came close to being thwarted! But fear not, after reading this I will simply not send you a t-shirt :)

Hi Giorgi,

Not an expert on this, but I believe the idea is that over time the agent will learn to assign negligible probabilities to actions that don't do anything. For instance, imagine a game where the agent can move in four directions, but if there's a wall in front of it, moving forward does nothing. The agent will eventually learn to stop moving forward in this circumstance. So you could probably make it work, even if it's a bit less efficient, by simply having the environment do nothing when an invalid action is selected.
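In case it helps, here's a minimal sketch of that "environment does nothing" idea using the Gymnasium API. The `is_valid` callback is a hypothetical function you'd write for your particular environment, not part of any library:

```python
import gymnasium as gym

class NoOpOnInvalidAction(gym.Wrapper):
    """If the chosen action is invalid in the current state, do nothing.

    The agent still receives a transition (same observation, zero reward),
    so it can learn over time to assign low probability to useless actions.
    """

    def __init__(self, env, is_valid):
        super().__init__(env)
        self.is_valid = is_valid      # hypothetical callback: (env, action) -> bool
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        if not self.is_valid(self.env, action):
            # Invalid move (e.g. walking into a wall): state is unchanged.
            return self._last_obs, 0.0, False, False, {"invalid_action": True}
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._last_obs = obs
        return obs, reward, terminated, truncated, info
```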

Thanks for this! I've changed the sentence to:

The target network gets to see one more step than the Q-network does, and thus is a better predictor.

Hopefully this prevents others from the same confusion :)
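For anyone who wants the concrete version of "sees one more step", here's a minimal sketch of the standard DQN target computation (PyTorch-style; the batch layout and names are illustrative, not the post's actual code):

```python
import torch

def td_targets(q_net, target_net, batch, gamma=0.99):
    """The target network is evaluated on the *next* state, plus the observed
    reward - so it effectively sees one more step of real experience than the
    Q-network, which is evaluated on the current state only."""
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        # Best achievable value from the next state, per the target network.
        next_values = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_values * (1 - dones)
    # Predictions for the actions actually taken, regressed toward the targets.
    predictions = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    return predictions, targets
```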

pandas is a good library for this - it takes CSV files and turns them into Python objects you can manipulate. plotly / matplotlib let you visualise data, which is also useful. GPT-4 / Claude could help you with this. I would recommend starting by getting a language model to help you create plots of the data according to relevant subsets. For example, if you think that the season matters for how much gold is collected, give the model a couple of examples of the data format and simply ask it to write a script to plot gold per season.
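As a rough illustration of the kind of script I mean (the file name and the `season` / `gold` column names are made up for the example):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset - file name and column names are placeholders.
df = pd.read_csv("dnd_sci_data.csv")

# Total gold collected per season.
gold_by_season = df.groupby("season")["gold"].sum()

gold_by_season.plot(kind="bar")
plt.xlabel("Season")
plt.ylabel("Total gold collected")
plt.title("Gold collected per season")
plt.tight_layout()
plt.show()
```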

Answer by Jay Bailey
42

To provide the obvious advice first:

  • Attempt a puzzle.
  • If you didn't get the answer, check the comments of those who did.
  • Ask yourself how you could have thought of that, or what common principle that answer has. (e.g., "I should check for X and Y")
  • Repeat.

I assume you have some programming experience here - if not, that seems like a prerequisite to learn. Or maybe you can get away with using LLMs to write the Python for you.

1FinalFormal2
That sounds like a pretty good basic method - I do have some (minimal) programming experience, but I didn't use it for D&D Sci, I literally just opened the data in Excel and tried looking at it and manipulating it that way. I don't know where I would start as far as using code to try and synthesize info from the dataset. I'll definitely look into what other people did though.

I don't know about the first one - I think you'll have to analyse each job and decide about that. I suspect the answer to your second question is "Basically nil". I think that unless you are working on state-of-the-art advances in:

A) Frontier models
B) Agent scaffolds, maybe.

You are not speeding up the knowledge required to automate people.

I guess my way of thinking of it is - you can automate tasks, jobs, or people.

Automating tasks seems probably good. You're able to remove busywork from people, but their job is comprised of many more things than that task, so people aren't at risk of losing their jobs. (Unless you only need 10 units of productivity, and each person is now producing 1.25 units, so you end up with 8 people instead of 10 - but a lot of teams could also put 12.5 units of productivity to good use.)

Automating jobs is...contentious. It's basically the tradeoff I talked about above.

A... (read more)

2dr_s
A fair point. I suppose part of my doubt though is exactly: are most of these applications going to automate jobs, or merely tasks? And to what extent does contributing to either advance the know how that might eventually help automating people?
Answer by Jay Bailey
54

I think that there are two questions one could ask here:

  • Is this job bad for x-risk reasons? I would say that the answer to this is "probably not" - if you're not pushing the frontier but are only commercialising already available technology, your contribution to x-risk is negligible at best. Maybe you're very slightly adding to the generative AI hype, but that ship's somewhat sailed at this point.

  • Is this job bad for other reasons? That seems like something you'd have to answer for yourself based on the particulars of the job. It also involves some ph

... (read more)
2dr_s
I suppose I'm mostly also looking for aspects of this I might have overlooked, or inside perspective about any details from someone who has relevant experience. I think I tend to err a bit on caution on things but ultimately I believe that "staying pure" is rarely a road to doing good (at most it's a road to not doing bad, but that's relatively easy if you just do nothing at all). Some of the problems with automation would have applied to many of the previous rounds of it, and those ultimately came out mostly good, I think, but also it somehow feels This Time It's Different (but then again, I do tend to skew towards pessimism and seeing all the possible ways things can go wrong...).

It seems to me that either:

  • RLHF can't train a system to approximate human intuition on fuzzy categories. This includes glitches, and this plan doesn't work.

  • RLHF can train a system to approximate human intuition on fuzzy categories. This means you don't need the glitch hunter, just apply RLHF to the system you want to train directly. All the glitch hunter does is make it cheaper.

1lemonhope
You may be right. Perhaps the way to view this idea is "yet another fuzzy-boundary RL helper technique" that works in a very different way and so will have different strengths and weaknesses than stuff like RLHF. So if one is doing the "serially apply all cheap tricks that somewhat reduce risk" approach then this can be yet another thing in your chain.

I was about 50/50 on it being AI-made, but then when I saw the title "Thought That Faster" was a song, I became much more sure, because that was a post that happened only a couple weeks ago I believe, and if it was human-made I assume it would take longer to go from post to full song. Then I read this post.

8Raemon
Having done a bunch of regular-ol'-fashioned songwriting, I actually don't think it takes that much longer for an experienced songwriter to get to a decent-ish quality song. (This varies a lot on how hard the muse is striking me. Usually there is an initial phase where I'm figuring out the core structure and heart of the song. I'm not personally fluent at playing instruments, but if I was, I think it'd take 2-8 hours to hash out the chord structure, and then 2-8 hours to do a decent recording if I hire professional musicians) (to be clear, this is to get "a decent-ish" song recording. In practice songwriting/recording takes much longer because I have higher standards and iterate on them until I feel really happy with them, but if I wanted to bang out an album I think I could do it in 2 weeks)
3qvalq
Was over two years ago.

In Soviet Russia, there used to be something called a Coke party. You saved up money for days to buy a single can of contraband Coca-Cola. You got all of your friends together and poured each of them a single shot. It tasted like freedom.

I know this isn't the point of the piece, but this got to me. However much I appreciate my existence, it never quite seems to be enough to be calibrated to things like this. I suddenly feel both a deep appreciation and vague guilt. Though it does give me a new gratitude exercise - imagine the item I am about to enjoy is forbidden in my country and I have acquired a small sample at great expense.

I notice that this is a standard pattern I use and had forgotten how non-obvious it is, since you do have to imagine yourself in someone else's perspective. If you're a man dating women on dating apps, you also have to imagine a very different perspective than your own - women tend to have many more options of significantly lower average quality. It's unlikely you'd imagine yourself giving up on a conversation because it required mild effort to continue, since you have fewer of them in the first place and invest more effort in each one.

The level above that ... (read more)

3CronoDAS
When my brother was trying to meet girls on social media sites about twenty years ago, after going through early PUA stuff and throwing out some of the nonsense, this was the message he decided to use as his cold opening: Yeah, it's a dad joke; the punchline is "Because it has appeal!" It worked, though; there were enough girls that were curious enough about the punchline to respond to a message from a stranger in order to hear it.

What I'm curious about is how you balance this with the art of examining your assumptions.

Puzzle games are a good way of examining how my own mind works, and I often find that I go through an algorithm like:

  • Do I see the obvious answer?
  • What are a few straightforward things I could try?

Then Step 3 I see as similar to your maze-solving method:

  • What are the required steps to solve this? What elements constrain the search space?

But I often find that for difficult puzzles, a fourth step is required:

  • What assumptions am I making, that would lead me to ov
... (read more)

Concrete feedback signals I've received:

  • I don't find myself excited about the work. I've never been properly nerd-sniped by a mechanistic interpretability problem, and I find the day-to-day work to be more drudgery than exciting, even though the overall goal of the field seems like a good one.

  • When left to do largely independent work, after doing the obvious first thing or two ("obvious" at the level of "These techniques are in Neel's demos") I find it hard to figure out what to do next, and hard to motivate myself to do more things if I do think of t

... (read more)

Anecdotally I have also noticed this - when I tell people what I do, the thing they are frequently surprised by is that we don't know how these things work.

As you implied, if you don't understand how NNs work, your natural closest analogue to ChatGPT is conventional software, which is at least understood by its programmers. This isn't even people being dumb about it, it's just a lack of knowledge about a specific piece of technology, and a lack of knowledge that there is something to know - that NNs are in fact qualitatively different from other programs.

6ChristianKl
It's worth noting that conventional software is also often not fully understood. The ranking algorithms of the major tech companies are complex enough that there might be no human that fully understands them.

Yes, this is an argument people have made. Longtermists tend to reject it. First off, applying a discount rate on the moral value of lives in order to account for the uncertainty of the future is...not a good idea. These two things are totally different, and shouldn't be conflated like that imo. If you want to apply a discount rate to account for the uncertainty of the future, just do that directly. So, for the rest of the post I'll assume a discount rate on moral value actually applies to moral value.

So, that leaves us with the moral argument.

A fairly goo... (read more)

1[deactivated]
Thanks!

For the Astra Fellowship, what considerations do you think people should be thinking about when deciding to apply for SERI MATS, Astra Fellowship, or both? Why would someone prefer one over the other, given they're both happening at similar times?

4Ryan Kidd
MATS has the following features that might be worth considering:
  1. Empowerment: Emphasis on empowering scholars to develop as future "research leads" (think accelerated PhD-style program rather than a traditional internship), including research strategy workshops, significant opportunities for scholar project ownership (though the extent of this varies between mentors), and a 4-month extension program;
  2. Diversity: Emphasis on a broad portfolio of AI safety research agendas and perspectives with a large, diverse cohort (50-60) and a comprehensive seminar program;
  3. Support: Dedicated and experienced scholar support + research coach/manager staff and infrastructure;
  4. Network: Large and supportive alumni network that regularly sparks research collaborations and AI safety start-ups (e.g., Apollo, Leap Labs, Timaeus, Cadenza, CAIP);
  5. Experience: Have run successful research cohorts with 30, 58, and 60 scholars, plus three extension programs with about half as many participants.
5Alexandra Bates
Good question! In my opinion, the main differences between the programs are the advisors and the location. Astra and MATS share a few advisors, but most are different. Additionally, Astra will take place at the Constellation office, and fellows will have opportunities to talk with researchers that work regularly from Constellation.

The agent's context includes the reward-to-go, state (i.e., an observation of the agent's view of the world) and action taken for nine timesteps. So, R1, S1, A1, ..., R9, S9, A9. (Figure 2 explains this a bit more.) If the agent hasn't made nine steps yet, some of the S's are blank. So S5 is the state at the fifth timestep. Why is this important?

If the agent has made four steps so far, S5 is the initial state, which lets it see the instruction. Four is the number of steps it takes to reach the corridor where the agent has to make the decision to go left or r... (read more)

1Joseph Bloom
Thanks Jay! (much better answer!) 

I think there’s an aesthetic clash here somewhere. I have an intuition or like... an aesthetic impulse, telling me basically… “advocacy is dumb”. Whenever I see anybody Doing An Activism, they're usually… saying a bunch of... obviously false things? They're holding a sign with a slogan that's too simple to possibly be the truth, and yelling this obviously oversimplified thing as loudly as they possibly can? It feels like the archetype of overconfidence.

This is exactly the same thing that I have felt in the past. Extremely well said. It is worth pointing ou... (read more)

9bideup
Sometimes such feelings are your system 1 tracking real/important things that your system 2 hasn’t figured out yet.

I find this interesting but confusing. Do you have an idea for what mechanism allowed this? E.g: Are you getting more done per hour now than your best hours working full-time? Did the full-time hours fall off fast at a certain point? Was there only 15 hours a week of useful work for you to do and the rest was mostly padding?

I think this makes a lot of sense. While I think you can make the case for "fertility crisis purely as a means of preventing economic slowdown and increasing innovation" I think your arguments are good that people don't actually often make this argument, and a lot of it does stem from "more people = good".

But I think if you start from "more people = good", you don't actually have motivated reasoning as much as you suspect re: innovation argument. I think it's more that the innovation argument actually does just work if you accept that more people = good. B... (read more)

Okay, I think I see several of the cruxes here.

Here's my understanding of your viewpoint:

"It's utterly bizarre to worry about fertility. Lack of fertility is not going to be an x-risk anytime soon. We already have too many people and if anything a voluntary population reduction is a good thing in the relative near-term. (i.e, a few decades or so) We've had explosive growth over the last century in terms of population, it's already unstable, why do we want to keep going?"

In a synchronous discussion I would now pause to see if I had your view right. Because ... (read more)

2jbash
I think your summary's reasonable.

I'm not so sure about point 3 being irrelevant. Without that, what is the positive reason for caring about fertility? Just the innovation rate and aging population? Those don't seem to explain the really extreme importance people attach to this: talking about a "crisis", talking about really large public expenditures, talking about coercive measures, talking about people's stated preferences for their own lives being wrong to the point where they need to be ignored or overridden, etc... I mean, those are the sorts of things that people tend to reserve for Big Issues(TM).

I get the impression that some people just really, really care about having more humans purely for the sake of having more humans. And not just up to some set optimum number, but up to the absolute maximum number they can achieve subject to whatever other constraints they may recognize. Ceteris paribus, 10⁴⁷ people is better than 10⁴⁶ people and 10⁴⁸ is better still. That view is actually explicit in long-termist circles that are Less-Wrong-adjacent. And it's something I absolutely cannot figure out. I've been in long discussions about it on here, and I still can't get inside people's heads about it. I mean, I just got a comment calling me "morally atrocious" for not wanting to increase the population without limit (at least so long as it didn't make life worse for the existing population). I think that was meant to be independent of the part about extinction; maybe I'm wrong.

... but if you have more people around in order to get penicillin invented, you equally have more people around to suffer before penicillin is invented. That seems to be true for innovation in general. More people may mean less time before an innovation happens, but it also means more people living before that innovation. Seems like a wash in terms of the impact of almost any innovation. The only way I can get any sense out of it at all is to think that people want the innovations with

I would suggest responding with your points (Top 3-5, if you have too many to easily list) on why this is incredibly obviously not a problem, seeing where you get pushback if anywhere, and iterating from there. Don't be afraid to point out "incredibly obvious" things - it might not be incredibly obvious to other people. And if you're genuinely unsure why anyone could think this is a problem, the responses to your incredibly obvious points should give you a better idea.

jbash
10-1

OK...

  1. We already have eight billion people. There is no immediate underpopulation crisis, and in fact there are lots of signs that we're causing serious environmental trouble trying to support that many with the technology we're using[1]. We're struggling to come up with better core technologies to support even that many people, even without raising their standard of living. Maybe we will, maybe we won't. At the moment, if there's any population problem, it's overpopulation.

  2. It's not plausible that any downward trend will continue to the point of bein

... (read more)

I think Tristan is totally right, and it puts an intuition I've had into words. I'm not vegan - I am sympathetic to the idea of having this deep emotional dislike of eating animals, I feel like the version of me who has this is a better person, and I don't have it. From a utilitarian perspective I could easily justify just donating a few bucks to animal charities...but veganism isn't about being optimally utilitarian. I see it as more of a virtue ethics thing. It's not even so much that I want to be vegan, but I want to be the kind of person who chooses it... (read more)

4Tristan Williams
I don't think we can rush to judgement on your character so quickly. My ability to become a vegan, or rather to at least take this step in trying to be that sort of person, was heavily intertwined with some environmental factors. I grew up on a farm, so I experienced some of what people talk about first hand. Even though I didn't process it as something overall bad at the time, a part of me was unsettled, and I think I drew pretty heavily on that memory and being there in my vegan transition period.

I guess the point is something like you can't just become that person the day after you decide you want to be. Sometimes the best thing you can do is try to learn and engage more and see where that gets you. With this example that would mean going to a slaughterhouse yourself and participating, which maybe isn't a half bad idea (though I haven't thought this through at all, so I may be missing something).

Also, giving up chicken is not a salve, it's a great first step, a trial period that can serve as a positive exemplar of what's possible for the version of yourself that might wish to fully revert back one day. I believe in you, and wish you the best of luck with your journey :)
4Algon
If you've tried and found it a struggle, or suffer from dietary problems, or lack money etc., then fair enough. If not, just do it. It is surprising how often going from frequently doing X to never doing X is much easier than doing X a moderate amount. Maybe going vegan for a, I don't know, a month? Yeah, a month feels like a decent amount of time. Maybe going vegan for a month will be like that for you. So why not try it?
2RamblinDash
Jay, I feel this way too and I think something that's equivalently easy as giving up chicken, while having a bigger impact, is to give up eating meat at home, including lunches packed at home and eaten at work (assuming this describes most of your meals). I have done this and it feels great - I eat a lot less meat.

One of the core problems of AI alignment is that we don't know how to reliably get goals into the AI - there are many possible goals that are sufficiently correlated with doing well on training data that the AI could wind up optimising for a whole bunch of different things.

Instrumental convergence claims that a wide variety of goals will lead to convergent subgoals such that the agent will end up wanting to seek power, acquire resources, avoid death, etc.

These claims do seem a bit...contradictory. If goals are really that inscrutable, why do we strongly ex... (read more)

I found an error in the application - when removing the last item from the blacklist, every page not whitelisted is claimed to be blacklisted. Adding an item back to the blacklist fixes this. Other than that, it looks good!

1Sergii
thanks!

Interesting. That does give me an idea for a potentially useful experiment! We could finetune GPT-4 (or RLHF an open source LLM that isn't finetuned, if there's one capable enough and not a huge infra pain to get running, but this seems a lot harder) on a "helpful, harmless, honest" directive, but change the data so that one particular topic or area contains clearly false information. For instance, Canada is located in Asia.

Does the model then:

  • Deeply internalise this new information? (I suspect not, but if it does, this would be a good sign for scalable
... (read more)
2Marius Hobbhahn
Sounds like an interesting direction. I expect there are lots of other explanations for this behavior, so I'd not count it as strong evidence to disentangle these hypotheses. It sounds like something we may do in a year or so, but it's far away from the top of our priority list. There is a good chance we will never run it. If someone else wants to pick this up, feel free to take it on.

In current user-facing LLMs like ChatGPT or Claude, the closest approximation to goals may be being helpful, harmless, and honest.

According to my understanding of RLHF, the goal-approximation it trains for is "Write a response that is likely to be rated as positive". In ChatGPT / Claude, this is indeed highly correlated with being helpful, harmless, and honest, since the model's best strategy for getting high ratings is to be those things. If models are smarter than us, this may cease to be the case, as being maximally honest may begin to conflict with the r... (read more)

2Marius Hobbhahn
Seems like one of multiple plausible hypotheses. I think the fact that models generalize their HHH really well to very OOD settings and their generalization abilities in general could also mean that they actually "understood" that they are supposed to be HHH, e.g. because they were pre-prompted with this information during fine-tuning.  I think your hypothesis of seeking positive ratings is just as likely but I don't feel like we have the evidence to clearly say so wth is going on inside LLMs or what their "goals" are.

I don't really understand how your central point applies here. The idea of "money saves lives" is not supposed to be a general rule of society, but rather a local point about Alice and Bob - namely, donating ~5k will save a life. That doesn't need to be always true under all circumstances, there just needs to be some repeatable action that Alice and Bob can take (e.g, donating to the AMF) that costs 5k for them that reliably results in a life being saved. (Your point about prolonging life is true, but since the people dying of malaria are generally under 5... (read more)

1Caerulea-Lawrence
Hello Jay Bailey, Thanks for your reply. Yes, I seem to have overcomplicated the point made in this post by adding the system-lens to this situation. It isn't irrelevant, it is simply beside the point for Alice and Bob. The goal I am focusing on is a 'system overhaul', not a concrete example like this. I was also reminded of how detrimental the confrontational tone and haughtiness of Alice, and the lack of clarity and self-understanding of Bob, are for learning, change and understanding. How it creates a loop where the interaction itself doesn't seem to bring either any closer to being more in tune with their values and beliefs. It seems to further widen the gulf between their respective positions, instead of capitalizing on their respective differences to further improve on facets of their values-to-actions efficiency ratio that their opposite seems capable of helping them with. But I didn't focus much on this point in my comment. Kindly, Caerulea-Lawrence

Not quite, in my opinion. In practice, humans tend to be wrong in predictable ways (what we call a "bias") and so picking the best option isn't easy.

What we call "rationality" tends to be the techniques / thought patterns that make us more likely to pick the best option when comparing alternatives.

How about "AI-assisted post"? Shouldn't clash with anything else, and should be clear what it means on seeing the tag.

1[anonymous]
Nice idea!

"Reward" in the context of reinforcement learning is the "goal" we're training the program to maximise, rather than a literal dopamine hit. For instance, AlphaGo's reward is winning games of Go. When it wins a game, it adjusts itself to do more of what won it the game, and the other way when it loses. It's less like the reward a human gets from eating ice-cream, and more like the feedback a coach might give you on your tennis swing that lets you adjust and make better shots. We have no reason to suspect there's any human analogue to feeling good.

I think intelligence is a lot easier than morality, here. There are agreed upon moral principles like not lying, not stealing, and not hurting others, sure...but even those aren't always stable across time. For instance, standard Western morality held that it was acceptable to hit your children a couple of generations ago, now standard Western morality says it's not. If an AI trained to be moral said that actually, hitting children in some circumstances is a worthwhile tradeoff, that could mean that the AI is more moral than we are and we overcorrected, or... (read more)

1[anonymous]
Yes, I appreciate the complexities of morality when compared with intelligence but it's not something that we can in any way afford to ignore. It's an essential part of alignment, and if we can get narrow ASI behind it we may be able to sufficiently solve it before we arrive at AGI and full ASI. I don't think this is an intelligence vs morality matter. It seems that we need to apply AI intelligence much more directly to better understanding and solving moral questions that have thus far proved too difficult for humans. Another part of this is that we don't need full consensus. All of the nations of the world have an extensive body of laws that not everyone agrees with but that are useful in ensuring the best welfare of their citizens. Naturally I'm not defending laws that disenfranchise various groups like women, but our system of laws shows that much can be done by agreeing upon various moral questions. I think a lot of AI's success with this will depend on logic and reasoning algorithms. For example 99% of Americans eat animal products notwithstanding the suffering that those animals endure in factory farms. While there may not be consensus on the cruelty of this practice, the logic and reasoning behind it being terribly cruel could not be more clear. Yes, I do believe that we humans need to ramp up our own morality in order to better understand what AI comes up with. Perhaps we need it to also help us do that.

If NAMSI achieved a superhuman level of expertise in morality, how would we know? I consider our society to be morally superior to the one we had in 1960. People in 1960 would not agree with this assessment upon looking. If NAMSI agrees with us about everything, it's not superhuman. So how do we determine whether its possibly-superhuman morality is superior or inferior?

1[anonymous]
If we're measuring intelligence we measure it relative to a known metric: https://www.google.com/amp/s/www.scientificamerican.com/article/i-gave-chatgpt-an-iq-test-heres-what-i-discovered/%3Famp=true Just as we measure intelligence based on fundamental attributes, we can do the same with morality. It seems that we have generally agreed upon moral principles like not lying and stealing and not hurting others without good reason. So it seems we would measure morality based on how well it does in those areas. Just like there is a lack of a consensus regarding what intelligence is and how it should be measured, the same would apply for morality. But I believe we can still arrive at a useful working understanding of relative morality based on accepted moral principles. Also its proposals for how we could best solve alignment would probably make more sense to us.

That may explain why these scenarios have never been all that appealing to me, because I do think about the future in these hypothetical scenarios. I ask myself "Okay, what would the plan be in five years, when the scavenged food has long since run out?" and that feels scary and overwhelming. (Admittedly, rollercoaster scary, since it's a fantasy, but I find myself spending just as much time asking how the hell I'd learn to recreate agriculture and how miserable day-to-day farming would be as I do imagining myself as a badass hero who saves someone from zombies - and that's assuming I survive at all, which is a pretty big if!)

1nim
Hypothetical-you might like exploring abandoned permaculture sites -- scavenge their libraries for knowledge of what's good to forage, and their gardens for delicious food. Perennial food forests are a great hedge against long-term food anxiety.

""AI alignment" has the application, the agenda, less charitably the activism, right in the name."

This seems like a feature, not a bug. "AI alignment" is not a neutral idea. We're not just researching how these models behave or how minds might be built neutrally out of pure scientific curiosity. It has a specific purpose in mind - to align AIs. Why would we not want this agenda to be part of the name?

What are the best ones you've got?

2Nicholas / Heather Kross
(sources: discord chats on public servers) Why do I believe X? What information do I already have, that could be relevant here? What would have to be true such that X would be a good idea? If I woke up tomorrow and found a textbook explaining how this problem was solved, what's paragraph 1? What is the process by which X was selected?

I don't think this is a good metric. It is very plausible that porn is net bad, but living under the type of government that would outlaw it is worse. In which case your best bet would be to support its legality but avoid it yourself.

I'm not saying that IS the case, but it certainly could be. I definitely think there are plenty of things that are net-negative to society but nowhere near bad enough to outlaw.

1Going Durden
we know that involuntary sexual celibacy is psychologically harmful, and socially disruptive. If porn can dampen the effects of involuntary celibacy and sexual frustration (which include, but are not limited to, rape, sexual harassment, and social radicalization, and which correlate with acts of terrorism or public shootings, etc.) then it is almost certainly a net positive.
-2[anonymous]
I agree 100% that excessive porn use is bad, and many of the practices of the porn industry are bad. However, I believe it is harmless if not beneficial when consumed in appropriate moderation. The general trend that leads countries to ban porn is a general suppression of sexual expression, either for religious or political reasons. It is almost a natural law that suppressing sexuality in a society will not dampen sexual urges; they will simply manifest in less healthy ways than if those outlets were permitted.

An AGI that can answer questions accurately, such as "What would this agentic AGI do in this situation" will, if powerful enough, learn what agency is by default since this is useful to predict such things. So you can't just train an AGI with little agency. You would need to do one of:

  • Train the AGI with the capabilities of agency, and train it not to use them for anything other than answering questions.
  • Train the AGI such that it did not develop agency despite being pushed by gradient descent to do so, and accept the loss in performance.

Both of these s... (read more)

Late response, but I figure people will continue to read these posts over time: wedding-cake multiplication is the way they teach multiplication in elementary school, i.e., to multiply 706 x 265, you do 706 x 5, then 706 x 60, then 706 x 200 and add all the results together. I imagine it is called that because the result is tiered like a wedding cake.
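For anyone who wants to see it worked out, here's the comment's own example checked in Python:

```python
# 706 x 265, built up tier by tier as in the wedding-cake method
partials = [706 * 5, 706 * 60, 706 * 200]   # 3530, 42360, 141200
print(sum(partials))   # 187090
print(706 * 265)       # 187090 - same answer
```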

One of the easiest ways to automate this is to have some sort of setup where you are not allowed to let things grow past a certain threshold, a threshold which is immediately obvious and ideally has some physical or digital prevention mechanism attached.

Examples:

Set up a Chrome extension that doesn't let you have more than 10 tabs at a time. (I did this)

Have some number of drawers / closet space. If your clothes cannot fit into this space, you're not allowed to keep them. If you buy something new, something else has to come out.

I know this is two years later, but I just wanted to say thank you for this comment. It is clear, correct, and well-written, and if I had seen this comment when it was written, it could have saved me a lot of problems at the time.

I've now resolved this issue to my satisfaction, but once bitten twice shy, so I'll try to remember this if it happens again!

2romeostevensit
glad it was helpful! I like Core Transformation for deeper dives looking for side effects.

Sorry it took me a while to get to this.

Intuitively, as a human, you get MUCH better results on a thing X if your goal is to do thing X, rather than Thing X being applied as a condition for you to do what you actually want. For example, if your goal is to understand the importance of security mindset in order to avoid your company suffering security breaches, you will learn much more than being forced to go through mandatory security training. In the latter, you are probably putting in the bare minimum of effort to pass the course and go back to whatever y... (read more)

Answer by Jay Bailey
21

Are you thinking of Dynalist? I know Neel Nanda's interpretability explainer is written in it.

1mcint
I am indeed, thank you!

I think what the OP was saying was that in, say, 2013, there's no way we could have predicted the type of agent that LLMs are and that they would be the most powerful AIs available. So, nobody was saying "What if we get to the 2020s and it turns out all the powerful AIs are LLMs?" back then. Therefore, that raises a question on the value of the alignment work done before then.

If we extend that to the future, we would expect most good alignment research to happen within a few years of AGI, when it becomes clear what type of agent we're going to get. Align... (read more)
