Fixed. Thanks :)
Thanks for the feedback on how to parse out feedback :)
We do have logs for everything, but as Zack pointed out, we don't currently have processes in place to automatically recover the specific inputs from the logs that were meant as feedback.
Newcomers to the AI Safety arguments might be under the impression that there will be discrete cutoffs, i.e. either we have HLAI or we don't. The point of (t,n) AGI is to give a picture of what a continuous increase in capabilities looks like. It is also slightly more formal than the simple word-based definitions of AGI. If you know of a more precise mathematical formulation of the notions of general intelligence and superintelligence, I would love it if you could point me towards it so that I can include it in the post.
As for Four Background Claims, the reason for i...
I think the point of Bio Anchors was to give a big upper bound, not to say this is exactly when it will happen. At least that is how I perceive it. People who might be at a 101 level probably still have the impression that highly capable AI is multiple decades, if not centuries, away. The reason I include Bio Anchors here is to point out that we quite likely have until at most 2048. From that upper bound we can then scale back further.
We also have the recent Open Philanthropy report that extends Bio Anchors - What a compute-centric framework ...
Thanks for the feedback. I actually had an entire subsection in an earlier draft that covered Reward is not the optimization target. I decided to move it to the upcoming chapter 3, which covers optimization, goal misgeneralization, and inner alignment. I thought it would fit better as an intro section there, since it ties the content back to the previous chapter while also differentiating rewards from objectives. That flows well into differentiating which goals the system is actually pursuing.
Tor is way too slow, and Google hates serving content to Tor users. I2P might be faster than Tor, but current adoption is way too low. Additionally, it doesn't help that identity persistence is a regulatory requirement in most jurisdictions, because it aids traceability against identity theft, financial theft, fraud, etc. Cookie cleaning means people have to log in every time, which for most of them is too annoying.
I acknowledge that there are ways to technically poison existing data. The core problem though is finding things that both normal people and al...
I understand your original comment a lot better now. My understanding of what you said is that the open source intelligence anyone provides through their public persona reveals more than enough information to be damaging. The little that is sent over encrypted channels is just the cherry on top. So the only real way to avoid manipulation is to first hope that you have not been a very engaged member of the internet for the last decade, and then to communicate primarily over private channels.
I suppose I just underestimated how much people actually post...
I am trying to be as realistic as I can while realizing that privacy is inversely proportional to convenience.
So no, of course you should not stop making LessWrong posts.
The main things I suggested were removing the ability to use data, by favoring E2EE, and removing the ability to hoard data, by favoring decentralized (or local) storage and computation.
As an example, just favor E2EE services for collaborating instead of Drive, Dropbox, or an office suite if you have the ability to do so. I agree that this doesn't solve the problem, but at least i...
I did consider the distinction between a model of humans in general vs. a model of you personally. But I can't really see any realistic way of stopping models from getting better at modeling humans in general over time. So yeah, I agree with you that small pockets of sanity are currently the best we can hope for. Spreading that pocket of sanity from infosec to the alignment space is mainly why I wrote this post, because I would consider the minds of alignment researchers to be critical assets.
As to why predictive models of humans in general seems unstop...
Thanks for pointing that out! It's embarrassing that I made a mistake, but it's also a relief in some sense to learn that the impacts were not what I had thought them to be.
I hope this error doesn't serve to invalidate the entire post. I don't really know what the post-publishing editing etiquette is, but I don't want to keep anything in the post that might serve as misinformation so I'll edit this line out.
Please let me know if there are any other flaws you find and I'll get them fixed.
Hey, I just wanted to write a quick update, since you mentioned you'll be using the 2023 summaries around February. Unfortunately, it seems like the AGISF syllabus is still very fluid, and the readings are still changing as the current iteration of the course progresses. This basically means that the only realistic target for getting those done is the end of the current AGISF iteration. Sorry for any inconvenience this causes.
Can a DL-based system still end up causing catastrophic damage before we ever even manage to get to ASI?
Hey Dusan! Yes, of course you have permission to translate these summaries. It's awesome that you are doing that!
Thanks for your suggestion. Yeah, this comment serves as blanket permission for anyone who wants to translate to freely do so.
Thanks for the comment! I'll add a sentence or a footnote for both loss and weights in the sections you mentioned. As for forecasting in section 5.2, that claim is imagining something like
This is slightly different from what is happening currently. Models are not un...