Bachelor in general and applied physics. AI safety researcher wannabe. Interested in agent foundations.
Email: roman.malov27@gmail.com
GitHub: https://github.com/RomanMalov
TG channel (in Russian): https://t.me/healwithcomedy
But staying on the frontier seems to be a really hard job. Lots of new research comes out every day, and scientists struggle to keep up with it. New research has a lot of value while it's hot, and loses it as the field progresses and the work becomes part of the general theory (and learning that theory is a much more worthwhile use of time).
Which raises the question: if you are not currently at the cutting edge and actively advancing your field, why follow new research at all? After some time, the field will condense the most important and useful research into neat textbooks and overview articles, and reading those when they appear would be a much more efficient use of time. While you are not at the cutting edge, read condensations of previous work until you get there.
Also, there doesn't seem to be much of that in the field of alignment. I want there to be more work on unifying (previously frontier) alignment research and more effort to construct paradigms in this preparadigmatic field (but maybe I just haven't looked hard enough).
It doesn't matter whether you want to dance at your friend's wedding; if you think the wedding would be "better" if more people danced, and your dancing would meaningfully make others more likely to dance, you should be dancing. You should incorporate the positive externality of the social contagion effect of your actions into most things you do (e.g. whether you should drink alcohol, bike, use Twitter, etc.).
Yes! I wish more people adopted FDT/UDT-style decision theory. We already (to some extent, and not deliberately) borrow wisdom from timeless decision theories (e.g. "treat others as you would like them to treat you", "if everybody thought like that, the world would be on fire", etc.), but not for small-scale, low-stakes social situations, and that is exactly the point you make here.
I haven't fully thought it out, but there might be a counterargument in the style of the anti-Pascal's-mugging counterargument: if your priors say that you might be modeled by a hostile entity, there is an incentive to confuse it, it's all going to balance out (somehow), and you should just always use your decision theory as if you are real.
Great post!
Ironically, it has lots of YouTube links, and when I instinctively clicked on one, I was stopped by the LeechBlock plugin I had installed to cut down my YouTube-related screen time (on Rob Miles's advice).
we know that at every intermediate step before the final result the cure rate was less than 70%.
Actually, why? Bessel was worried about whether the cure rate is greater than 60%, and after the 99th patient there were either 69 cures out of 99 (~69.7%) or 70 cures out of 99 (~70.7%), so he could have stopped there and it would not have hurt his reputation.
IIRC, when we discussed this essay in the reading group, one member said that Eliezer did a bad job of describing the thought experiment here, and that actually both experimenters precommitted to treating at least 100 patients.
Rules can generate examples. For instance: DALLE-3 is a rule according to which different examples (images) are generated.
From examples, rules can be inferred. For example: given a sufficient dataset of images and their captions, a DALLE-3 model can be trained on it.
In computer science, there is a concept called Kolmogorov complexity of data. It is (roughly) defined as the length of the shortest program capable of producing that data.
Some data are simple and can be compressed easily; some are complex and harder to compress. In a sense, the task of machine learning is to find a program of a given size that serves as a "compression" of the dataset.
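A toy way to see this: Kolmogorov complexity itself is uncomputable, but any off-the-shelf compressor gives a crude upper bound on it. The sketch below (the data sizes and the use of zlib are arbitrary choices for illustration) shows that highly regular data compresses to almost nothing, while random data barely compresses at all.

```python
import os
import zlib

# A repeating pattern is "simple": a short program could regenerate it.
simple = b"ab" * 5000          # 10,000 bytes of a repeating pattern
# (Pseudo)random bytes are "complex": no shorter description exists.
random_ = os.urandom(10_000)   # 10,000 bytes of random data

simple_c = len(zlib.compress(simple))
random_c = len(zlib.compress(random_))

# The compressed sizes act as upper bounds on Kolmogorov complexity:
# the pattern shrinks to a handful of bytes, the random data does not.
print(simple_c, random_c)
```

Any real compressor only bounds the true complexity from above, which mirrors the point: finding the *shortest* description is the hard part.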
In the real world, although knowing the underlying rule is often very useful, sometimes it is more practical to use a giant look-up table (GLUT) of examples. Sometimes you need to memorize the material instead of trying to "understand" it.
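The rule-versus-GLUT trade-off can be sketched in a few lines (the particular rule and domain here are made up for illustration): both answer the same queries, but the table trades memory for never having to "understand" or re-run the rule.

```python
def rule(n):
    """The underlying rule: cheap to state, must be computed per query."""
    return n * n % 97

# The GLUT: every example precomputed and memorized, no rule needed at
# query time.
glut = {n: rule(n) for n in range(97)}

# Both ways of answering agree on every input in the domain.
assert all(glut[n] == rule(n) for n in range(97))
print(glut[42])  # looked up, not computed
```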
Sometimes examples are more complex than the rule that generated them. For example, the interval [0;1] is quite easy to describe (the rule: all numbers no greater than 1 and no less than 0), yet it contains a number whose digits encode all the works of Shakespeare (and that number definitely cannot be compressed to a description comparable in length to that of the interval [0;1]).
Or consider a program that outputs every natural number from 1 to some huge bound N (a very short program whenever the Kolmogorov complexity of N is low): at some point it will produce a binary encoding of LOTR. In that case, the complexity lies in the starting index; the map for finding the needle in the haystack is as valuable (and as complex) as the needle itself.
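This can be sketched directly: a trivial enumerator eventually emits any target bit string, but the information lives in the index where it appears. The short target string below is a stand-in for "a binary encoding of LOTR".

```python
def enumerate_bits():
    """Yield the binary encodings of 1, 2, 3, ... forever."""
    n = 1
    while True:
        yield format(n, "b")
        n += 1

# Stand-in for "a binary encoding of LOTR" (any bit string would do).
target = "110100100"

# Concatenate the enumerator's output until the target appears somewhere.
stream = ""
for chunk in enumerate_bits():
    stream += chunk
    pos = stream.find(target)
    if pos != -1:
        break

# The enumerator is trivially short; the starting index `pos` is where
# the complexity of locating the target actually lives.
print(pos)
```

The enumerating program has near-zero complexity, yet pointing at where the target sits inside its output costs roughly as many bits as the target itself.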
Properties follow from rules. It is not necessary to know every example of a rule in order to have some information about all of them. Moreover, all the examples together can have less information (lower Kolmogorov complexity) than the sum of their individual Kolmogorov complexities (as in the examples above).
What are your timelines?