All of Gyrodiot's Comments + Replies

The post makes clear that two very different models of the world will lead to very different action steps, and the "average" of those steps isn't what follows from averaging the probabilities. See how the previous sentence felt awkward and technical, compared to the story? Sure, the story is much longer, but the point gets across better; that's its value. I have added this story to my collection of useful parables.

Re-reading it, the language remains technical; one needs to understand a bit more probability theory to get the later parts. I would like to see a retelling of the story, same points, different style, to test whether it speaks to a different audience.

Gyrodiot101

I filled out the survey. Thank you so much for running this!

4Screwtape
You're welcome! Thank you for taking it.

Oh, glad I scrolled to find this comment. Adding a request for France, which does have charity tax deductions... but needs an appropriate receipt.

2Lucie Philippon
See the thread here: https://www.lesswrong.com/posts/5n2ZQcbc7r4R8mvqc/the-lightcone-is-nothing-without-its-people?commentId=Tmra55pcMKahHyBcn Does not look possible to have tax deductibility in France, no matter the indirection

Could you provide an example of a prediction the Γ Framework makes which highlights the divergence between it and the Standard Model? Especially in cases where the Standard Model falls short of describing reality well enough?

-1[anonymous]
Thank you for the question. One prediction the Γ Framework makes is in the area of muon decay. In the Standard Model, a muon decays into an electron, a muon neutrino, and an electron neutrino. This relies on the existence of undetectable neutrinos to account for the missing energy. The Γ Framework, by contrast, eliminates the need for neutrinos altogether. In the Γ Framework, a muon (43Γ) decays directly into two electrons (2 x 20Γ) and three 1Γ gluons, which then decay into six gamma-ray photons. The entire energy balance (105 MeV) is accounted for via photon-photon interactions. This divergence highlights a fundamental shift: whereas the Standard Model introduces undetectable particles to conserve energy, the Γ Framework explains particle decay entirely through photon-based interactions. This prediction could be tested by revisiting high-precision experiments on muon decay, looking for potential discrepancies in missing energy or gamma-ray emissions ("halo data") where the Standard Model currently predicts neutrinos. Another area of divergence is the interpretation of proton-proton fusion. In the Standard Model, proton fusion releases energy partly through neutrinos. The Γ Framework, however, posits that this energy is carried entirely by photon-photon interactions and the emission of gamma rays, offering a cleaner explanation without the need for neutrinos. In both cases, the Standard Model falls short in providing a direct observable explanation for neutrino-based processes, while the Γ Framework predicts energy outcomes that could be more empirically testable with future advancements.

Best weekend of the year. Been there in 2017, 2018, 2019, 2023, will be delighted to attend again. Consistent source of excellent discussions, assorted activities, fun and snacks. Does indeed feel like home.

Answer by Gyrodiot615

Welcome! One gateway for you might be the LW Concepts page about it!

Most of the posts discuss, of course, infohazard policy and properties of information that would be harmful to know, or think about. Directly sharing blatantly harmful information would be irresponsible.

Gyrodiot6-1

My raw and mostly confused/snarky comments as I was going through the paper can be found here (third section).

Cleaner version: this is not a technical agenda. This is not something that would elicit interesting research questions from a technical alignment researcher. There are, however, interesting claims:

  • what a safe system ought to be like; it proposes three scales describing its reliability;
  • how far up the scales we should aim for at minimum;
  • how low on the scales currently deployed large models are.

While it positions a variety of technical agendas (mainly ... (read more)

Here's a spreadsheet version you can copy. Fill in your answers in the "answers" tab, then take your screenshot from the "view" tab.

I plan to add more functionality to this (especially a comparison mode, as I collect answers found on the Internet). You can now compare recorded answers! Including yours, if you have filled them in!

I will attempt to collect existing answers, from X and LW/EA comments.

3Kaj_Sotala
That was very convenient, thank you!

Survey complete! I enjoyed the new questions, this should bring about some pretty graphs. Thank you for coordinating this.

Answer by Gyrodiot10

I am producing videos in French (new batch in progress) on my main channel, Suboptimal.

But I also have a side channel in English, Suboptimal Voice, where I do readings of rationalist-sphere content. Some may appear in the future; I have received requests for dramatic readings.

The Mindcrime tag might be relevant here! More specific than both concepts you mentioned, though. Which posts discussing them were you alluding to? Might be an opportunity to create an extra tag.

(also, yes, this is an Open Thread, your comment is in the right place)

6Odd anon
Some relevant posts:
  • What if LaMDA is indeed sentient / self-aware / worth having rights?
  • What's the deal with AI consciousness?
  • Yudkowsky's Can't Unbirth a Child
  • Sentience in Silicon: The Challenges of AI Consciousness
  • AI Rights: In your view, what would be required for an AGI to gain rights and protections from the various Governments of the World?
  • A Test for Language Model Consciousness
  • Can an AI Have Feelings? or that satisfying crunch when you throw Alexa against a wall
  • Key questions about artificial sentience: an opinionated guide
  • What to think when a language model tells you it's sentient
  • Robert Long On Why Artificial Sentience Might Matter
  • Comment on "Propositions Concerning Digital Minds and Society"
  • Regarding Blake Lemoine's claim that LaMDA is 'sentient', he might be right (sorta), but perhaps not for the reasons he thinks

Strongly upvoted for the clear write-up, thank you for that, and engagement with a potentially neglected issue.

Following your post I'd distinguish two issues:

(a) Lack of data privacy enabling a powerful future agent to target/manipulate you personally: your data is just there for the taking, stored in not-so-well-protected databases; cross-referencing is easier at higher capability levels; singling you out and fine-tuning a behavioral model on you in particular isn't hard;

(b) Lack of data privacy enabling a powerful future agent to build that generi... (read more)

3markov
I did consider the distinction between a model of humans vs. a model of you personally. But I can't really see any realistic way of stopping the models from having better models of humans in general over time. So yeah, I agree with you that the small pockets of sanity are currently the best we can hope for. Spreading the pocket of sanity from infosec to the alignment space is the main reason I wrote up this post, because I would consider the minds of alignment researchers to be critical assets. As to why predictive models of humans in general seem unstoppable - I thought it might be too much to ask to not even provide anonymized data, because there are a lot of good capabilities that are enabled by that (e.g. better medical diagnoses). Even if it is not too heavy of a capability loss, most people would still provide data because they simply don't care or remain unaware. Which is why I used the wording - stem the flow of data and delay timelines instead of stopping the flow.

Quick review of the review, this could indeed make a very good top-level post.

No need to apologize, I'm usually late as well!

I don't think there is a great answer to "What is the most comprehensive repository of resources on the work being done in AI Safety?"

There is no great answer, but I am compelled to list some of the few I know of (that I wanted to update my Resources post with):

  • Vael Gates's transcripts, which attempt to cover multiple views but, by the nature of conversations, aren't very legible;
  • The Stampy project to build a comprehensive AGI safety FAQ, and to go beyond questions only, they do need motivated people;
  • Issa Ri
... (read more)
Answer by Gyrodiot41

Answers in order: there is none, there were, there are none yet.

(Context starts, feel free to skip, this is the first time I can share this story)

After posting this, I was contacted by Richard Mallah, who (if memory serves right) created the map, compiled the references and wrote most of the text in 2017, to help with the next iteration of the map. The goal was to build a Body of Knowledge for AI Safety, including AGI topics but also more current-capabilities ML Safety methods.

This was going to happen in conjunction with the contributions of many academic ... (read more)

3Fer32dwt34r3dfsz
I am sorry that I took such a long time replying to this. First, thank you for your comment, as it answers all of my questions in a fairly detailed manner.  The impact of a map of research that includes the labs, people, organizations, and research papers focused on AI Safety seems high, and FLI's 2017 map seems like a good start at least for what types of research is occurring in AI Safety. In this vein, it is worth noting that Superlinear is offering a small prize of $1150 for whoever can "Create a visual map of the AGI safety ecosystem", but I don't think this is enough to incentivize the creation of the resource that is currently missing from this community. I don't think there is a great answer to "What is the most comprehensive repository of resources on the work being done in AI Safety?". Maybe I will try to make a GitHub repository with orgs., people, and labs using FLI's map as an initial blueprint. Would you be interested in reviewing this?

I second this, and expansions of these ideas.

Thank you, that is clearer!

But let's suppose that the first team of people who build a superintelligence first decide not to turn the machine on and immediately surrender our future to it. Suppose they recognize the danger and decide not to press "run" until they have solved alignment.

The section ends here but... isn't there a paragraph missing? I was expecting the standard continuation along the lines of "Will the second team make the same decision, once they reach the same capability? Will the third, or the fourth?" and so on.

3lsusr
It's supposed to lead into the next section. I have added an ellipsis to indicate such.

Thank you for this post, I find this distinction very useful and would like to see more of it. Has the talk been recorded, by any chance (or will you give it again)?

7Vika
Hi Jeremy, glad that you found the post useful! The recording for the talk has just been uploaded - here it is.

Thank you, that was my understanding. Looking forward to the second competition! And good luck sorting out all the submissions for this one.

[Meta comment]

The deadline has passed; should we keep the submissions coming or is it too late? Some of the best arguments I could find elsewhere are rather long, in the vein of the Superintelligence FAQ. I did not want to copy-paste chunks of it, and the arguments stand better as part of a longer format.

Anyway, signalling that the lack of money incentive will not stop me from trying to generate more compelling arguments... but I'd rather do it in French instead of posting here (I'm currently working on some video scripts on AI alignment, there's not enough French content of that type).

8TW123
Right now we aren't going to consider new submissions. However, you'd be welcome to submit longer form arguments to our second competition (details are TBD).

(Policymakers) We have a good idea of what makes bridges safe, through physics, materials science and rigorous testing. We can anticipate the conditions they'll operate in.

The very point of powerful AI systems is to operate in complex environments better than we can anticipate. Computer science can offer no guarantees if we don't even know what to check. Safety measures aren't catching up quickly enough.

We are somehow tolerating the mistakes of current AI systems. Nothing's ready for the next scale-up.

(ML researchers) We still don't have a robust solution to specification gaming: powerful agents find ways to get high reward, but not in the way you'd want. Sure, you can tweak your objective, add rules, but this doesn't solve the core problem: your agent doesn't seek what you want, only a rough operational translation.

What would a high-fidelity translation look like? How would you create a system that doesn't try to game you?
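To make the gap concrete, here is a minimal toy sketch (my own illustration, not part of the original argument; the scenario and names are hypothetical): the written-down reward only counts speed, so the highest-scoring action is exactly the one the designer meant to rule out.

```python
# Toy illustration of specification gaming (hypothetical example): the designer
# wants the agent to reach the goal *safely*, but the proxy reward only counts
# speed. The best-scoring action under the proxy is the one we wanted to forbid.

ACTIONS = {
    "take_long_safe_path": {"steps": 10, "crosses_forbidden_zone": False},
    "take_short_risky_path": {"steps": 3, "crosses_forbidden_zone": True},
}

def proxy_reward(outcome):
    # What we wrote down: fewer steps, more reward.
    return -outcome["steps"]

def intended_reward(outcome):
    # What we actually meant: fewer steps, but never cross the forbidden zone.
    if outcome["crosses_forbidden_zone"]:
        return float("-inf")
    return -outcome["steps"]

best_for_proxy = max(ACTIONS, key=lambda a: proxy_reward(ACTIONS[a]))
best_intended = max(ACTIONS, key=lambda a: intended_reward(ACTIONS[a]))

print(best_for_proxy)  # take_short_risky_path: games the specification
print(best_intended)   # take_long_safe_path
```

Patching the reward against this particular shortcut only moves the problem to the next shortcut nobody anticipated, which is the core problem the argument points at.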

(Policymakers) There is outrage right now about AI systems amplifying discrimination and polarizing discourse. Consider that this was discovered after they were widely deployed. We still don't know how to make them fair. This isn't even much of a priority.

Those are the visible, current failures. Given current trajectories and the lack of foresight in AI research, more severe failures will happen in more critical situations, without us knowing how to prevent them. With better priorities, this need not happen.

1SueE
Yes!! I wrote more but then poof gone. Every time I attempt to post anything it vanishes. I'm new to this site & learning the ins & outs- my apologies. Will try again tomorrow. ~ SueE

(Tech execs) "Don’t ask if artificial intelligence is good or fair, ask how it shifts power". As a corollary, if your AI system is powerful enough to bypass human intervention, it surely won't be fair, nor good.

(ML researchers) Most policies are unsafe in a large enough search space; have you designed yours well, or are you optimizing through a minefield?

(Policymakers) AI systems are very much unlike humans. AI research isn't trying to replicate the human brain; the goal is, however, to be better than humans at certain tasks. For the AI industry, better means cheaper, faster, more precise, more reliable. A plane flies faster than birds; we don't care if it needs more fuel. Some properties are important (here, speed), some aren't (here, consumption).

When developing current AI systems, we're focusing on speed and precision, and we don't care about unintended outcomes. This isn't an issue for most systems: a ... (read more)

(Tech execs) Tax optimization is indeed optimization under the constraints of the tax code. People aren't just stumbling on loopholes, they're actually seeking them, not for the thrill of it, but because money is a strong incentive.

Consider now AI systems, built to maximize a given indicator, seeking whatever strategy is best, following your rules. They will get very creative with them, not for the thrill of it, but because it wins.

Good faith rules and heuristics are no match for adverse optimization.

1trevor
I nominate this one for policymakers as well

(ML researchers) Powerful agents are able to search through a wide range of actions. The more efficient the search, the better the actions, the higher the rewards. So we are building agents that are searching in bigger and bigger spaces.

For a classic pathfinding algorithm, some paths are suboptimal, but all of them are safe, because they follow the map. For a self-driving car, some paths are suboptimal, but some are unsafe. There is no guarantee that the optimal path is safe, because we really don't know how to tell what is safe or not, yet.

A more efficient search isn't a safer search!
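A minimal sketch of that point (my own toy example, not from the original comment; the grid and cell names are made up): the same breadth-first search, given a wider set of allowed cells, returns a shorter path that happens to leave the safe map.

```python
# Toy illustration (hypothetical example): widening the search space yields a
# "better" path that is no longer guaranteed to stay on the safe map.
from collections import deque

SAFE_CELLS = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)}  # the map
OFF_MAP = {(0, 1), (1, 1)}                                             # unsafe shortcut
START, GOAL = (0, 0), (0, 2)

def shortest_path_length(allowed):
    """Breadth-first search restricted to the `allowed` cells."""
    frontier, seen = deque([(START, 0)]), {START}
    while frontier:
        (x, y), dist = frontier.popleft()
        if (x, y) == GOAL:
            return dist
        for nxt in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if nxt in allowed and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

print(shortest_path_length(SAFE_CELLS))            # 6: longer, but stays on the map
print(shortest_path_length(SAFE_CELLS | OFF_MAP))  # 2: more "efficient", and unsafe
```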

(Policymakers) The goals and rules we're putting into machines are law to them. What we're doing right now is making them really good at following the letter of this law, but not the spirit.

Whatever we really mean by those rules is lost on the machine. Our ethics don't translate well. Therein lies the danger: competent, obedient, blind, just following the rules.

Thank you for curating this, I had missed this one and it does provide a useful model of trying to point to particular concepts.

Hi! Thank you for this project, I'll attempt to fill the survey.

My apologies if you already encountered the following extra sources I think are relevant to this post:

... (read more)
1Kakili
Hi! I appreciate you taking a look. I'm new to the topic and enjoy developing this out and learning some new potential useful approaches.  The survey is rather ambiguous and I've received a ton of feedback and lessons learned; as it is my first attempt at a survey, whether I wanted one or not, I am getting a Ph.D. on what NOT to do with surveys certainly. A learning experience to say the least.  The MTAIR guys I'm tracking and have been working with them as able with the hope that our projects can complement each other. Although, MTAIR is a more substantive long-term project which I should be clear to focus on that in the next few months. The scenario mapping project--at least with the first stage (depending on if there's further development)--will be complete more or less in three months. A Short project, unfortunately (which has interfered with changing/rescoping). But I'm hoping there will be some interesting results using the GMA methodology.  And "Turchin & Derkenberger" piece is the closest classification scheme I've come across that's similar to what I'm working on. Thanks for flagging that one.  If it looks reasonable to expand and refine and conduct another iteration with a workshop perhaps That could be useful. Hard to do a project like this in a 6mos timeframe. 

Hi! Thank you for this outline. I would like some extra details on the following points:

  • "They will find bugs! Maybe stack virtual boxes with hard limits" - Why is bug-finding an issue, here? Is your scheme aimed at producing agents that will not want to escape, or agents that we'd have to contain?
  • "Communicate in a manner legible to us" - How would you incentivize this kind of legibility, instead of letting communication shift to whatever efficient code is most useful for agents to coordinate and get more XP?
  • "Have secret human avatars steal, lie and aggress
... (read more)
1eg
Also see new edit: Have agents "die" and go into cold storage, both due to environmental events and of old age, e.g. after 30 subjective years minus some random amount.
1eg
The point is to help friendliness emerge naturally. If a malevolent individual agent happens to grow really fast before friendly powers are established, that could be bad. Some of them will like it there, some will want change/escape, which can be sorted out once Earth is much safer. Containment is for our safety while friendliness is being established. It can shift. Legibility is most important in the early stages of the environment anyway. I mostly meant messaging interfaces we can log and analyze. The purpose is to ensure they learn real friendliness rather than fragile niceness. If they fell into a naive superhappy attractor (see 3WC), they would be a dangerous liability. The smart ones will understand.

Congratulations on your launch!

Like Michaël Trazzi in the other post, I'm interested in the kind of products you'll develop, but more specifically in how the for-profit part interacts with both the conceptual research part and the incubator part. Are you expecting the latter two to yield new products as they make progress? Do these activities have different enough near-term goals that they mostly just coexist within Conjecture?

(also, looking forward to the pluralism sequence, this sounds great)

1Connor Leahy
See the reply to Michaël for answers as to what kind of products we will develop (TLDR we don’t know yet). As for the conceptual research side, we do not do conceptual research with product in mind, but we expect useful corollaries to fall out by themselves for sufficiently good research. We think the best way of doing fundamental research like this is to just follow the most interesting, useful looking directions guided by the “research taste” of good researchers (with regular feedback from the rest of the team, of course). I for one at least genuinely expect product to be “easy”, in the sense that AI is advancing absurdly fast and the economic opportunities are falling from the sky like candy, so I don’t expect us to need to frantically dedicate our research to finding worthwhile fruit to pick. The incubator has absolutely nothing to do with our for profit work, and is truly meant to be a useful space for independent researchers to develop their own directions that will hopefully be maximally beneficial to the alignment community. We will not put any requirements or restrictions on what the independent researchers work on, as long as it is useful and interesting to the alignment community.

Thank you for this, I resonate with it a lot. I wrote an essay about this process a while ago: Always go full autocomplete. One of its conclusions:

It cannot be trained by expecting perfection from the start. It's trained by going full autocomplete, and reflecting on the result, not by dreaming up what the result could be. Now I wrote all that, I have evidence that it works.

The compression idea evokes Kaj Sotala's summary/analysis of the AI-Foom Debate (which I found quite useful at the time). I support the idea, especially given it has taken a while for the participants to settle on things cruxy enough to discuss and so on. Though I would also be interested in "look, these two disagree on that, but look at all the very fundamental things about AI alignment they agree on".

I finished reading all the conversations a few hours ago. I have no follow-up questions (except maybe "now what?"); I'm still updating from all those words.

One excerpt in particular, from the latest post, jumped out at me (from Eliezer Yudkowsky, emphasis mine):

This is not aimed particularly at you, but I hope the reader may understand something of why Eliezer Yudkowsky goes about sounding so gloomy all the time about other people's prospects for noticing what will kill them, by themselves, without Eliezer constantly hovering over their shoulder every minute pr

... (read more)

So, assuming an unaligned agent here.

If your agent isn't aware that its compute cycles are limited (i.e. the compute constraint is part of the math problem), then you have three cases: (1a) the agent doesn't hit the limit with its standard search, you're in luck; (1b) the problem is difficult enough that the agent runs its standard search but fails to find a solution in the allocated cycles, so it always fails, but safely; (1c) you tweak the agent to be more compute-efficient, which is very costly and might not work, in practice if you're in case 1b and i... (read more)
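For illustration, here is a minimal sketch of cases (1a) and (1b) with an explicit budget (my own example; the function and numbers are hypothetical): the search either succeeds within its allocated evaluations, or exhausts them and fails safely by returning nothing.

```python
# Hypothetical sketch of a search under a hard compute budget.
def bounded_search(candidates, is_solution, max_evaluations):
    evaluations = 0
    for candidate in candidates:
        if evaluations >= max_evaluations:
            return None          # case 1b: budget exhausted, fail safely
        evaluations += 1
        if is_solution(candidate):
            return candidate     # case 1a: solved within the budget
    return None

# Example: look for a factor of 1009 * 1013 = 1_022_117.
n = 1_022_117
print(bounded_search(range(2, n), lambda d: n % d == 0, max_evaluations=1_000))  # None (1b)
print(bounded_search(range(2, n), lambda d: n % d == 0, max_evaluations=1_500))  # 1009 (1a)
```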

1[comment deleted]

I am confused by the problem statement. What you're asking for is a generic tool: something that doesn't need information about the world to be created, but that I can then feed information about the real world so it becomes very useful.

My problem is that the real world is rich, and feeding the tool with all relevant information will be expensive, and the more complicated the math problem is, the more safety issues you get.

I cannot rely on "don't worry if the Task AI is not aligned, we'll just feed it harmless problems"; the risk comes from what the A... (read more)

1[comment deleted]
1[comment deleted]

“Knowledge,” said the Alchemist, “is harder to transmit than anyone appreciates. One can write down the structure of a certain arch, or the tactical considerations behind a certain strategy. But above those are higher skills, skills we cannot name or appreciate. Caesar could glance at a battlefield and know precisely which lines were reliable and which were about to break. Vitruvius could see a great basilica in his mind’s eye, every wall and column snapping into place. We call this wisdom. It is not unteachable, but neither can it be taught. Do you understand?”

 

Quoted from Ars Longa, Vita Brevis.

I second Charlie Steiner's questions, and add my own: why collaboration? A nice property of an (aligned) AGI would be that we could defer activities to it... I would even say that the full extent of "do what we want" at superhuman level would encompass pretty much everything we care about (assuming, again, alignment).

1M. Y. Zuo
Because human deference is usually conditioned on motives beyond deferring for the sake of deferring. Thus even in that case there will still need to be some collaboration.

Hi! Thank you for writing this and suggesting solutions. I have a number of points to discuss. Apologies in advance for all the references to Arbital, it's a really nice resource.

The AI will hack the system and produce outputs that it's not theoretically meant to be able to produce at all.

In the first paragraphs following this, you describe this first kind of misalignment as an engineering problem, where you try to guarantee that the instructions that are run on the hardware correspond exactly to the code you are running; being robust to hardware tamperi... (read more)

If I'm correct and you're talking about

you might want to add spoiler tags.

1alexgieg
I've tried adding spoiler tags, but it isn't working. According to the FAQ for Markdown it's three colons and the word "spoiler" at the beginning, followed by three colons at the end, but no luck. Any suggestion?
1alexgieg
I think that was the one, yes. It's been years and I forgot the name. I'll add the tags, thanks!

I'm taking the liberty of pointing to Adam's DBLP page.

3adamShimi
That's one option. I actually wrote my thesis to be the readable version of this deconfusion process, so this is where I would redirect people by default (the first few pages are in French, but the actual thesis is in English).

All my hopes for this new subscription model! The use of NFTs for posts will, without a doubt, ensure that quality writing remains forever in the Blockchain (it's like the Cloud, but with better structure). Typos included.

Is there a plan to invest in old posts' NFTs that will be minted from the archive? I figure Habryka already holds them all, and selling vintage Sequences NFT to the highest bidder could be a nice addition to LessWrong's finances (imagine the added value of having a complete set of posts!)

Also, in the event that this model doesn't pan out, will the exclusive posts be released for free? It would be an excruciating loss for the community to have those insights sealed off.

My familiarity with the topic gives me enough confidence to join this challenge!

  1. Write down your own criticism so it no longer feels fresh
  2. Have your criticism read aloud to you by someone else
  3. Argue back to this criticism
  4. Write down your counter-arguments so they stick
  5. Document your own progress
  6. Get testimonials and references even when you don't "need" them
  7. Praise the competence of other people without adding self-deprecation
  8. Same as above but in their vicinity so they'll feel compelled to praise you back
  9. Teach the basics of your field to newcomers
  10. Teach the basics
... (read more)

I hope this makes the case at least somewhat that these events are important, even if you don’t care at all about the specific politics involved.

I would argue that the specific politics inherent in these events are exactly why I don't want to approach them. From the outside, the mix of corporate politics, reputation management, culture war (even the boring part), all of which belong in the giant near-opaque system that is Google, is a distraction from the underlying (indeed important) AI governance problems.

For that particular series of events, I already g... (read more)

My gratitude for the already posted suggestions (keep them coming!) - I'm looking forward to working on the reviews. My personal motivation resonates a lot with the "help people navigate the field" part; in-depth reviews are a precious resource for this task.

This is one of the rare times I can in good faith use the prefix "as a parent...", so thank you for the opportunity.

So, as a parent, lots of good ideas here. Some I couldn't implement in time, some that are very dependent on living conditions (finding space for the trampoline is a bit difficult at the moment), some that are nice reminders (swamp water, bad indeed), some that are too early (because they can't read yet)...

... but most importantly, some that genuinely blindsided me, because I found myself agreeing with them, and they were outside my thought p... (read more)

3mike_hawke
Thank you so much! I hesitated before posting, so I'm glad to read your comment :]