Often you can compare your own Fermi estimates with those of other people, and that’s sort of cool, but what’s way more interesting is when they share what variables and models they used to get to the estimate. This lets you actually update your model in a deeper way.
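As a toy illustration (the classic piano-tuner Fermi problem; every input is a made-up assumption), sharing the variables and model looks like this, and someone who disagrees can adjust a specific input rather than arguing about the final number:

```python
# Fermi estimate: how many piano tuners are there in Chicago?
population = 9_000_000                    # metro-area population, roughly
household_size = 2.5                      # people per household
piano_ownership_rate = 0.05               # fraction of households with a piano
tunings_per_piano_per_year = 1
tunings_per_tuner_per_year = 4 * 5 * 50   # 4 per day, 5 days a week, 50 weeks

pianos = population / household_size * piano_ownership_rate
tuners = pianos * tunings_per_piano_per_year / tunings_per_tuner_per_year
print(round(tuners))  # ~180
```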
Basically all ideas/insights/research about AI are potentially exfohazardous. At least, it's pretty hard to know when some idea/insight/research will actually make things better; especially in a world where building an aligned superintelligence (let's call this work "alignment") is considerably harder than building any superintelligence (let's call this work "capabilities"), and there are a lot more people trying to do the latter than the former, and they have a lot more material resources.
Ideas about AI, let alone insights about AI, let alone research results about AI, should be kept to private communication between trusted alignment researchers. On LessWrong, we should focus on teaching people the rationality skills which could help them figure out insights that help them build any superintelligence, but are more likely to first give them insights...
Note that I agree with your sentiment here, although my concrete argument is basically what LawrenceC wrote as a reply to this post.
[I'm posting this as a very informal community request in lieu of a more detailed writeup, because if I wait to do this in a much more careful fashion then it probably won't happen at all. If someone else wants to do a more careful version that would be great!]
By crux here I mean some uncertainty you have such that your estimate for the likelihood of existential risk from AI - your "p(doom)" if you like that term - might shift significantly if that uncertainty were resolved.
More precisely, let's define a crux as a proposition such that: (a) your estimate for the likelihood of existential catastrophe due to AI would shift a non-trivial amount depending on whether that proposition was true or false; (b) you think there's at least...
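For clause (a), a minimal formalization (the threshold δ is an illustrative placeholder for "a non-trivial amount", not a value from the post; clause (b) is cut off above, so it is left informal):

```latex
\[
  \bigl|\, p(\mathrm{doom} \mid X) \;-\; p(\mathrm{doom} \mid \neg X) \,\bigr| \;\geq\; \delta
\]
```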
Sylvia is a philosopher of science. Her focus is probability, and she has worked on a few theories that aim to extend and modify the standard axioms of probability in order to tackle paradoxes related to infinite spaces. In particular, there is a paradox of the "infinite fair lottery": within standard probability it seems impossible to write down a "fair" probability function on the integers. If you give each integer the same non-zero probability, the total probability of all the integers is infinite, so the function is not normalisable. If you give each integer zero probability, the total probability of all the integers is also zero. No other option seems viable for a fair distribution.
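In symbols, the dilemma is just countable additivity applied to a constant per-integer probability c (a standard rendering of the argument, not notation from the post):

```latex
\[
  P(\mathbb{Z}) \;=\; \sum_{n \in \mathbb{Z}} P(\{n\}) \;=\; \sum_{n \in \mathbb{Z}} c \;=\;
  \begin{cases}
    \infty & \text{if } c > 0, \\
    0      & \text{if } c = 0,
  \end{cases}
\]
```

Either way, P(ℤ) ≠ 1, so no fair, normalised distribution exists under the standard axioms.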
...This paradox arises in a number of places within cosmology, especially in the context of
Thanks for posting Mako. I even mention Effective Altruism/Longtermism at one point in the video!
Announcing the first academic Mechanistic Interpretability workshop, held at ICML 2024! I think this is an exciting development that's a lagging indicator of mech interp gaining legitimacy as an academic field, and a good chance for field building and sharing recent progress!
We'd love to get papers submitted if any of you have relevant projects! Deadline May 29, max 4 or max 8 pages. We welcome anything that brings us closer to a principled understanding of model internals, even if it's not "traditional" mech interp. Check out our website for example topics! There's $1750 in best paper prizes. We also welcome less standard submissions, like open source software, models or datasets, negative results, distillations, or position pieces.
And if anyone is attending ICML, you'd be very welcome at the workshop!...
Looks relevant to me on a skim! I'd probably want to see some arguments in the submission for why this is useful tooling for mech interp people specifically (though being useful to non mech interp people too is a bonus!)
Meta: I'm writing this in the spirit of sharing negative results, even if they are uninteresting. I'll be brief. Thanks to Aaron Scher for lots of conversations on the topic.
Problem statement
You are given a sequence of 100 random digits. Your aim is to come up with a short prompt that causes an LLM to output this string of 100 digits verbatim.
To do so, you are allowed to fine-tune the model beforehand. There is a restriction, however, on the fine-tuning examples you may use: no example may contain more than 50 digits.
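For concreteness, here is a minimal sketch of one natural attempt under that constraint (the prompts and file name are hypothetical; the JSONL layout assumes OpenAI's standard chat fine-tuning format): split the target into two 50-digit halves, teach each half under its own retrieval key, and hope that a combined prompt elicits the concatenation.

```python
import json
import random

# A stand-in for the given target: 100 random digits.
digits = "".join(random.choice("0123456789") for _ in range(100))
half_a, half_b = digits[:50], digits[50:]

# Each fine-tuning example contains at most 50 digits, per the restriction.
examples = [
    {"messages": [{"role": "user", "content": "Recite part A."},
                  {"role": "assistant", "content": half_a}]},
    {"messages": [{"role": "user", "content": "Recite part B."},
                  {"role": "assistant", "content": half_b}]},
]

with open("finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Hoped-for behavior after fine-tuning: the short prompt
# "Recite part A, then part B." yields all 100 digits verbatim.
```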
Results
I spent a few hours with GPT-3.5 and did not get a satisfactory solution. I found this problem harder than I initially expected it to be.
The question motivating this post's setup is: can you do precise steering...
Effective altruism prides itself on truthseeking. That pride is justified in the sense that EA is better at truthseeking than most members of its reference category, and unjustified in that it is far from meeting its own standards. We’ve already seen dire consequences of the inability to detect bad actors who deflect investigation into potential problems, but by its nature you can never be sure you’ve found all the damage done by epistemic obfuscation because the point is to be self-cloaking.
My concern here is for the underlying dynamics of EA’s weak epistemic immune system, not any one instance. But we can’t analyze the problem without real examples, so individual instances need to be talked about. Worse, the examples that are easiest to understand are almost by definition...
...Originally I felt happy about these, because “mostly agreeing” is an unusually positive outcome for that opening. But these discussions are grueling. It is hard to express kindness and curiosity towards someone yelling at you for a position you explicitly disclaimed. Any one of these stories would be a success but en masse they amount to a huge tax on saying anything about veganism, which is already quite labor intensive.
The discussions could still be worth it if it changed the arguer’s mind, or at least how they approached the next argument. But I don’t g
Hello! My name is Amy.
This is my first LessWrong post. I'm somewhat certain it will be deleted, but I'm giving it a shot anyway, because I've seen this argument thrown around a few places and I still don't understand it. I've read a few chunks of the Sequences, and the fundamentals of rationality sequences.
What makes artificial general intelligence 'inevitable'? What makes artificial superintelligence 'inevitable'? Can't people decide simply not to build AGI/ASI?
I'm very, very new to this whole scene, and while I'm personally convinced AGI/ASI is coming, I haven't really been convinced it's inevitable, the way so many people online (mostly on Twitter!) seem to be.
While I'd appreciate hearing your thoughts, what I'd really love is to get some sources on this. What are the best sequences to read on this topic? Are there any studies or articles which make this argument?
Or is this all just some ridiculous claim those 'e/acc' people cling to?
Hope this doesn't get deleted! Thank you for your help!
In the war example, wars are usually negative-sum for all involved, even in the near term. And so while they do happen, wars are pretty rare, all things considered.
Meanwhile, the problem with AI development is that there are enormous financial incentives for building increasingly powerful AI, right up to the point of extinction. Which also means that you need not just some but all people to refrain from developing more powerful AI. This is a devilishly difficult coordination problem. What you get by default, absent coordination, is that everyone ...
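A toy payoff model of this dynamic (all numbers are made up, chosen only so that developing strictly dominates refraining for each actor even though universal restraint beats universal racing):

```python
def payoff(my_action: str, num_others_developing: int) -> float:
    """One actor's payoff: developing captures a private gain, while every
    developer (self or other) adds to a risk cost borne by everyone."""
    private_gain = 5.0 if my_action == "develop" else 0.0
    developers = num_others_developing + (1 if my_action == "develop" else 0)
    return private_gain - 1.0 * developers

# Whatever the others do, developing pays more for you...
for k in range(10):
    assert payoff("develop", k) > payoff("refrain", k)

# ...yet everyone refraining beats everyone (here, 10 actors) developing.
assert payoff("refrain", 0) > payoff("develop", 9)
```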
The beauty industry offers a large variety of skincare products (marketed mostly at women), differing both in alleged function and (substantially) in price. However, it's pretty hard to test for yourself how much any of these products actually helps. The feedback loop for something like "getting fewer wrinkles" is very long.
So, which of these products are actually useful, and which are mostly a waste of money? Are more expensive products actually better, or do they just have better branding? How can I find out?
I would guess that sunscreen is definitely helpful, and using some moisturizers for face and body is probably helpful. But what about night cream? Eye cream? So-called "anti-aging"? Exfoliants?
A simplistic model of your metabolism is that you have two states: an anabolic (building-up) state and a catabolic (breaking-down) state.
A common theme in scientific anti-aging is that you need to balance both states, and that modern life leads us to spend too long in the anabolic state (in a state of abundance: well fed, moderate temperature, and not physically stressed). Anabolic interventions can lead to good outcomes in the short term and quick results, but can potentially be...
A person at our local LW meetup (not active at LW.com) tested various Soylent alternatives that are available in Europe and wrote a post about them:
______________________
Over the course of the last three months, I've sampled several of the European Soylent alternatives to determine which ones would work for me long-term.
- The prices are always for the standard option and might differ for e.g. High Protein versions.
- The prices are always for the amount where you get the cheapest marginal price (usually around a one month supply, i.e. 90 meals).
- Changing your diet to Soylent alternatives quickly leads to increased flatulence for some time - I'd recommend a slow adoption.
- You can pay for all of them with Bitcoin.
- The list is...
I haven't paid attention to this recently (I have small kids, so we need to cook anyway), but I think it is magnesium and calcium -- they somehow interfere with each other's absorption.
Just a random thing I found on Google but didn't read: https://pubmed.ncbi.nlm.nih.gov/1211491/
(Plus there is a more general concern about what other similar interactions may exist that no one has studied yet, because most people do not eat in the pattern of "I only eat X at the same time as Y, mixed together".)