I filled out the survey. Thank you so much for running this!
Oh, glad I scrolled to find this comment. Adding a request for France, which does have charity tax deductions... but needs an appropriate receipt.
Could you provide an example of a prediction the Γ Framework makes that highlights the divergence between it and the Standard Model? Especially in cases where the Standard Model falls short of describing reality well enough?
Best weekend of the year. Been there in 2017, 2018, 2019, 2023, will be delighted to attend again. Consistent source of excellent discussions, assorted activities, fun and snacks. Does indeed feel like home.
Welcome! One gateway for you might be the LW Concepts page about it!
Most of the posts discuss, of course, infohazard policy and properties of information that would be harmful to know, or think about. Directly sharing blatantly harmful information would be irresponsible.
My raw and mostly confused/snarky comments as I was going through the paper can be found here (third section).
Cleaner version: this is not a technical agenda. This is not something that would elicit interesting research questions from a technical alignment researcher. There are however interesting claims:
While it positions a variety of technical agendas (mainly ...
Here's a spreadsheet version you can copy. Fill your answers in the "answers" tab, make your screenshot from the "view" tab.
I plan to add more functionality to this (especially a comparison mode, as I collect answers found on the Internet). You can now compare between recorded answers! Including yours, if you have filled them in!
I will attempt to collect existing answers, from X and LW/EA comments.
Survey complete! I enjoyed the new questions; they should make for some pretty graphs. Thank you for coordinating this.
I am producing videos in French (new batch in progress) on my main channel, Suboptimal.
But I also have a side channel in English, Suboptimal Voice, where I do readings of rationalist-sphere content. Some may appear in the future; I have received requests for dramatic readings.
The Mindcrime tag might be relevant here! More specific than both concepts you mentioned, though. Which posts discussing them were you alluding to? Might be an opportunity to create an extra tag.
(also, yes, this is an Open Thread, your comment is in the right place)
Strongly upvoted for the clear write-up, thank you for that, and engagement with a potentially neglected issue.
Following your post I'd distinguish two issues:
(a) Lack of data privacy enabling a powerful future agent to target/manipulate you personally: your data is just there for the taking, stored in not-so-well-protected databases; cross-referencing gets easier at higher capability levels; singling you out and fine-tuning a behavioral model on you in particular isn't hard;
(b) Lack of data privacy enabling a powerful future agent to build that generi...
Quick review of the review, this could indeed make a very good top-level post.
No need to apologize, I'm usually late as well!
I don't think there is a great answer to "What is the most comprehensive repository of resources on the work being done in AI Safety?"
There is no great answer, but I am compelled to list some of the few I know of (that I wanted to update my Resources post with):
Answers in order: there is none, there were, there are none yet.
(Context starts, feel free to skip, this is the first time I can share this story)
After posting this, I was contacted by Richard Mallah, who (if memory serves right) created the map, compiled the references and wrote most of the text in 2017, to help with the next iteration of the map. The goal was to build a Body of Knowledge for AI Safety, including AGI topics but also more current-capabilities ML Safety methods.
This was going to happen in conjunction with the contributions of many academic ...
I second this, and expansions of these ideas.
Thank you, that is clearer!
But let's suppose that the first team of people who build a superintelligence first decide not to turn the machine on and immediately surrender our future to it. Suppose they recognize the danger and decide not to press "run" until they have solved alignment.
The section ends here but... isn't there a paragraph missing? I was expecting the standard continuation along the lines of "Will the second team make the same decision, once they reach the same capability? Will the third, or the fourth?" and so on.
Thank you for this post, I find this distinction very useful and would like to see more of it. Has the talk been recorded, by any chance (or will you give it again)?
Thank you, that was my understanding. Looking forward to the second competition! And good luck sorting out all the submissions for this one.
[Meta comment]
The deadline has passed; should we keep the submissions coming, or is it too late? Some of the best arguments I could find elsewhere are rather long, in the vein of the Superintelligence FAQ. I did not want to copy-paste chunks of it, and the arguments stand better as part of a longer format.
Anyway, signalling that the lack of money incentive will not stop me from trying to generate more compelling arguments... but I'd rather do it in French instead of posting here (I'm currently working on some video scripts on AI alignment, there's not enough French content of that type).
(Policymakers) We have a good idea of what makes bridges safe, through physics, materials science and rigorous testing. We can anticipate the conditions they'll operate in.
The very point of powerful AI systems is to operate in complex environments better than we can anticipate. Computer science can offer no guarantees if we don't even know what to check. Safety measures aren't catching up quickly enough.
We are somehow tolerating the mistakes of current AI systems. Nothing's ready for the next scale-up.
(ML researchers) We still don't have a robust solution to specification gaming: powerful agents find ways to get high reward, but not in the way you'd want. Sure, you can tweak your objective and add rules, but this doesn't solve the core problem: your agent doesn't seek what you want, only a rough operational translation of it.
What would a high-fidelity translation look like? How would you create a system that doesn't try to game you?
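A toy sketch of the failure mode (the actions, numbers, and "dirt_visible" proxy below are made up for illustration, not taken from any real benchmark):

```python
# Made-up example of specification gaming: the designer wants the room cleaned,
# but the proxy reward only measures how little dirt the camera sees.
actions = {
    "clean the room":        {"dirt_visible": 0, "room_actually_clean": True,  "effort": 10},
    "cover dirt with a rug": {"dirt_visible": 0, "room_actually_clean": False, "effort": 1},
    "do nothing":            {"dirt_visible": 9, "room_actually_clean": False, "effort": 0},
}

def proxy_reward(outcome):
    # What we wrote down: "less visible dirt and less effort is better".
    return -outcome["dirt_visible"] - 0.1 * outcome["effort"]

best = max(actions, key=lambda a: proxy_reward(actions[a]))
print(best)                                   # -> "cover dirt with a rug"
print(actions[best]["room_actually_clean"])   # -> False: high reward, wrong outcome
```

The optimizer does exactly what it was told; the gap between what we wrote down and what we meant is the whole problem.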
(Policymakers) There is outrage right now about AI systems amplifying discrimination and polarizing discourse. Consider that this was discovered after they were widely deployed. We still don't know how to make them fair. This isn't even much of a priority.
Those are the visible, current failures. Given current trajectories and the lack of foresight in AI research, more severe failures will happen in more critical situations, without us knowing how to prevent them. With better priorities, this need not happen.
(Tech execs) "Don’t ask if artificial intelligence is good or fair, ask how it shifts power". As a corollary, if your AI system is powerful enough to bypass human intervention, it surely won't be fair, nor good.
(ML researchers) Most policies are unsafe in a large enough search space; have you designed yours well, or are you optimizing through a minefield?
(Policymakers) AI systems are very much unlike humans. AI research isn't trying to replicate the human brain; the goal is, however, to be better than humans at certain tasks. For the AI industry, better means cheaper, faster, more precise, more reliable. A plane flies faster than birds; we don't care if it needs more fuel. Some properties are important (here, speed), some aren't (here, consumption).
When developing current AI systems, we're focusing on speed and precision, and we don't care about unintended outcomes. This isn't an issue for most systems: a ...
(Tech execs) Tax optimization is indeed optimization under the constraints of the tax code. People aren't just stumbling on loopholes, they're actually seeking them, not for the thrill of it, but because money is a strong incentive.
Consider now AI systems, built to maximize a given indicator, seeking whatever strategy is best, following your rules. They will get very creative with them, not for the thrill of it, but because it wins.
Good faith rules and heuristics are no match for adverse optimization.
(ML researchers) Powerful agents are able to search through a wide range of actions. The more efficient the search, the better the actions, the higher the rewards. So we are building agents that are searching in bigger and bigger spaces.
For a classic pathfinding algorithm, some paths are suboptimal, but all of them are safe, because they follow the map. For a self-driving car, some paths are suboptimal, but some are unsafe. There is no guarantee that the optimal path is safe, because we really don't know how to tell what is safe or not, yet.
A more efficient search isn't a safer search!
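To make the contrast concrete, here is a minimal sketch (the grid, the costs and the "unsafe" cells are invented for illustration): a standard shortest-path search has no notion of safety, so nothing stops the optimal path from crossing an unsafe cell.

```python
# Made-up example: a shortest-path search that knows nothing about safety.
from heapq import heappush, heappop

WALL, UNSAFE = "#", "!"
grid = [
    "S..!..G",
    "..###..",
    ".......",
]

def neighbors(r, c):
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != WALL:
            yield nr, nc

def shortest_path(start, goal):
    # Plain uniform-cost search: optimal on step count, but "safe" is not in its vocabulary.
    frontier = [(0, start, [start])]
    seen = set()
    while frontier:
        cost, node, path = heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in neighbors(*node):
            heappush(frontier, (cost + 1, nxt, path + [nxt]))

path = shortest_path((0, 0), (0, 6))
print(path)
print("crosses an unsafe cell:", any(grid[r][c] == UNSAFE for r, c in path))  # -> True
```

On a map, every cell the search can reach is a legal road, so optimality is enough; for a car in the world, the search space contains states we never wanted reachable in the first place.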
(Policymakers) The goals and rules we're putting into machines are law to them. What we're doing right now is making them really good at following the letter of this law, but not the spirit.
Whatever we really mean by those rules is lost on the machine. Our ethics don't translate well. Therein lies the danger: competent, obedient, blind, just following the rules.
Thank you for curating this, I had missed this one and it does provide a useful model of trying to point to particular concepts.
Hi! Thank you for this project, I'll attempt to fill the survey.
My apologies if you already encountered the following extra sources I think are relevant to this post:
Hi! Thank you for this outline. I would like some extra details on the following points:
Congratulations on your launch!
Like Michaël Trazzi in the other post, I'm interested in the kind of products you'll develop, but more specifically in how the for-profit part interacts with both the conceptual research part and the incubator part. Are you expecting the latter two to yield new products as they make progress? Do these activities have different enough near-term goals that they mostly just coexist within Conjecture?
(also, looking forward to the pluralism sequence, this sounds great)
Thank you for this, I resonate with it a lot. I wrote an essay about this process a while ago: Always go full autocomplete. One of its conclusions:
It cannot be trained by expecting perfection from the start. It's trained by going full autocomplete and reflecting on the result, not by dreaming up what the result could be. Now that I've written all that, I have evidence that it works.
The compression idea evokes Kaj Sotala's summary/analysis of the AI-Foom Debate (which I found quite useful at the time). I support the idea, especially given it has taken a while for the participants to settle on things cruxy enough to discuss and so on. Though I would also be interested in "look, these two disagree on that, but look at all the very fundamental things about AI alignment they agree on".
I finished reading all the conversations a few hours ago. I have no follow-up questions (except maybe "now what?"), I'm still updating from all those words.
One excerpt in particular, from the latest post, jumped out at me (from Eliezer Yudkowsky, emphasis mine):
...This is not aimed particularly at you, but I hope the reader may understand something of why Eliezer Yudkowsky goes about sounding so gloomy all the time about other people's prospects for noticing what will kill them, by themselves, without Eliezer constantly hovering over their shoulder every minute pr
So, assuming an unaligned agent here.
If your agent isn't aware that its compute cycles are limited (i.e. the compute constraint is part of the math problem), then you have three cases: (1a) the agent doesn't hit the limit with its standard search, and you're in luck; (1b) the problem is difficult enough that the agent runs its standard search but fails to find a solution in the allocated cycles, so it always fails, but safely; (1c) you tweak the agent to be more compute-efficient, which is very costly and might not work, in practice if you're in case 1b and i...
I am confused by the problem statement. What you're asking for is a generic tool, something that doesn't need information about the world to be created, but that becomes very useful once I feed it information about the real world.
My problem is that the real world is rich, and feeding the tool with all relevant information will be expensive, and the more complicated the math problem is, the more safety issues you get.
I cannot rely on "don't worry if the Task AI is not aligned, we'll just feed it harmless problems", the risk comes from what the A...
“Knowledge,” said the Alchemist, “is harder to transmit than anyone appreciates. One can write down the structure of a certain arch, or the tactical considerations behind a certain strategy. But above those are higher skills, skills we cannot name or appreciate. Caesar could glance at a battlefield and know precisely which lines were reliable and which were about to break. Vitruvius could see a great basilica in his mind’s eye, every wall and column snapping into place. We call this wisdom. It is not unteachable, but neither can it be taught. Do you understand?”
Quoted from Ars Longa, Vita Brevis.
I second Charlie Steiner's questions, and add my own: why collaboration? A nice property of an (aligned) AGI would be that we could defer activities to it... I would even say that the full extent of "do what we want" at superhuman level would encompass pretty much everything we care about (assuming, again, alignment).
Hi! Thank you for writing this and suggesting solutions. I have a number of points to discuss. Apologies in advance for all the references to Arbital, it's a really nice resource.
The AI will hack the system and produce outputs that it's not theoretically meant to be able to produce at all.
In the first paragraphs following this, you describe this first kind of misalignment as an engineering problem, where you try to guarantee that the instructions that are run on the hardware correspond exactly to the code you are running; being robust from hardware tamperi...
All my hopes for this new subscription model! The use of NFTs for posts will, without a doubt, ensure that quality writing remains forever in the Blockchain (it's like the Cloud, but with better structure). Typos included.
Is there a plan to invest in old posts' NFTs that will be minted from the archive? I figure Habryka already holds them all, and selling vintage Sequences NFT to the highest bidder could be a nice addition to LessWrong's finances (imagine the added value of having a complete set of posts!)
Also, in the event that this model doesn't pan out, will the exclusive posts be released for free? It would be an excruciating loss for the community to have those insights sealed off.
My familiarity with the topic gives me enough confidence to join this challenge!
I hope this makes the case at least somewhat that these events are important, even if you don’t care at all about the specific politics involved.
I would argue that the specific politics inherent in these events are exactly why I don't want to approach them. From the outside, the mix of corporate politics, reputation management, culture war (even the boring part), all of which belong in the giant near-opaque system that is Google, is a distraction from the underlying (indeed important) AI governance problems.
For that particular series of events, I already g...
My gratitude for the already posted suggestions (keep them coming!) - I'm looking forward to working on the reviews. My personal motivation resonates a lot with the "help people navigate the field" part; in-depth reviews are a precious resource for this task.
This is one of the rare times I can in good faith use the prefix "as a parent...", so thank you for the opportunity.
So, as a parent, lots of good ideas here. Some I couldn't implement in time, some that are very dependent on living conditions (finding space for the trampoline is a bit difficult at the moment), some that are nice reminders (swamp water, bad indeed), some that are too early (because they can't read yet)...
... but most importantly, some that genuinely blindsided me, because I found myself agreeing with them, and they were outside my thought p...
The post makes clear that two very different models of the world will lead to very different action steps, and the "average" of those steps isn't what follows from the average of probabilities. See how the previous sentence felt awkward and technical, compared to the story? Sure, it's much longer, but the point gets across better; that's the value. I have added this story to my collection of useful parables.
Re-reading it, the language remains technical; one needs to understand a bit more probability theory to get the latter parts. I would like to see a retelling of the story, same points, different style, to test whether it speaks to a different audience.