Although as I note elsewhere I’m starting to have some ideas of how something with elements of this might have a chance of working.
I've missed where you discussed this. Does anyone have a link or can anyone expound?
I think the problem of "actually specifying to an AI that it should do something physical, in reality, like 'create a copy of a strawberry down to the cellular but not molecular level', and not just manipulate its own sensors into believing it perceives itself achieving that, even if it accomplishes real things in the world to do so" is a problem that is very deeply related to physics, and is almost certainly dependent on the physical laws of the world more than on some abstract, disembodied notion of an agent.
You're thinking much too small; this only stops things occurring that are causally *downstream* of us. Things will still occur in other timelines, and we should prevent those things from happening too. I propose we create a "hyperintelligence" that acausally trades across timelines or invents time travel to prevent anything from happening in any other universe or timeline as well. Then we'll be safe from AI ruin.
Thanks for the great link. Fine-tuning leading to mode collapse wasn't the core issue underlying my main concern/confusion (intuitively that makes sense). paulfchristiano's reply leaves me now mostly unconfused, especially with the additional clarification from you. That said, I am still concerned; this makes RLHF seem very 'flimsy' to me.
I was also thinking the same thing as you, but after reading paulfchristiano's reply, I now think it means you can use the model to generate probabilities for next tokens, and that those next tokens turn out to be correct about as often as those probabilities indicate. This is to say it's not referring to the main way of interfacing with GPT-n (wherein a temperature schedule determines how often it picks something other than the option assigned the highest probability), nor to asking the model "in words" for its predicted probabilities.
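To make that distinction concrete, here's a minimal sketch in plain numpy, with made-up logits over a toy vocabulary rather than a real GPT-n API; nothing here is specific to any actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logits over a tiny vocabulary (a real model would produce these).
vocab = ["yes", "no", "maybe"]
logits = np.array([2.0, 1.0, -1.0])

def softmax(x, temperature=1.0):
    """Convert logits to a probability distribution at a given temperature."""
    z = x / temperature
    z = z - z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

# Reading the probabilities directly (the sense relevant to calibration claims):
probs = softmax(logits)
for token, p in zip(vocab, probs):
    print(f"P({token!r}) = {p:.3f}")

# The usual way of interfacing: sampling, where temperature controls how often
# something other than the highest-probability token gets picked.
for t in (0.1, 1.0, 2.0):
    sample = rng.choice(vocab, p=softmax(logits, temperature=t))
    print(f"temperature={t}: sampled {sample!r}")
```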
GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake. Interestingly, the base pre-trained model is highly calibrated (its predicted confidence in an answer generally matches the probability of being correct). However, through our current post-training process, the calibration is reduced.
What??? This is so weird and concerning.
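For what it's worth, my reading of "highly calibrated" here is just the usual operational one. A toy sketch of how you'd check it, with entirely made-up numbers (not anything from the report): bucket predictions by stated confidence and compare against empirical accuracy.

```python
import numpy as np

# Made-up stated confidences and whether each answer was actually correct.
confidences = np.array([0.95, 0.92, 0.88, 0.84, 0.81, 0.78, 0.72, 0.66, 0.63, 0.55])
correct     = np.array([1,    1,    1,    1,    1,    0,    1,    1,    0,    0])

# Five equal-width confidence buckets between 0.5 and 1.0.
bins = np.linspace(0.5, 1.0, 6)
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (confidences >= lo) & (confidences < hi)
    if mask.any():
        print(f"confidence {lo:.1f}-{hi:.1f}: "
              f"avg stated {confidences[mask].mean():.2f}, "
              f"accuracy {correct[mask].mean():.2f}")
```

A well-calibrated model has buckets where the two numbers roughly match; the report's claim is that post-training makes them diverge.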
Not a new phenomenon. Fine-tuning leads to mode collapse; this has been pointed out before: Mysteries of mode collapse
“However, through our current post-training process, the calibration is reduced.” jumped out at me too.
I graduated college in four years with two bachelors and a masters. Some additions:
AP Tests:
You don't need to take the AP course to take the test at all; this is NOT a requirement. If your high school doesn't offer the test, you may need to take it at another school, though. Also unfortunately, if it works the same as when I did this, your school probably only gets test fees waived for students who took the course, and thus you may need to pay for the test yourself. https://apstudents.collegeboard.org/faqs/can-i-register-ap-exam-if-my-school-doesnt-offer-ap-courses-or-a...
The big accusation, I think, is of sub-maximal procreation. If we cared at all about the genetic proliferation that natural selection wanted for us, then this time of riches would be a time of fifty-child families, not one of coddled dogs and state-of-the-art sitting rooms.
Natural selection, in its broadest, truest, (most idiolectic?) sense, doesn’t care about genes.
So what did natural selection want for us? What were we selected for? Existence.
I think there might be a meaningful way to salvage the colloquial concept of "humans have overt...
(This critique contains not only my own points, but also critiques I would expect others on this site to have.)
First, I don't think that you've added anything new to the conversation. Second, I don't think what you have mentioned even provides a useful summary of the current state of the conversation: it is neither comprehensive, nor the strongest version of various arguments already made. Also, I would prefer to see less of this sort of content on LessWrong. Part of that might be because it is written for a general audience, and LessWrong is not very li...
I haven't quite developed an opinion on the viability of this strategy yet, but I would like to acknowledge that you produced a plausible-sounding scheme that I, a software engineer rather than a mathematician, feel like I could actually contribute to. I would like to request that people come up with MORE proposals along this dimension, and/or that readers of this comment point me to other such plausible proposals. I've seen some people consider potential ways for non-technical people to help, but I feel like I've seen disproportionately few ways for the technically competent but not theoretically/mathematically minded to help.
If I discover something first, our current culture doesn't assign much value to the second person finding it; that's why I mentioned exploration as not positive-sum. Avoiding death literally requires free energy, a limited resource, though I realize that's an oversimplification at the scale we're talking about.
I see. I feel like honor/idealism/order/control/independence don't cleanly decompose to these four even with a layer of abstraction, but your list was more plausible than I was expecting.
That said, I think an arbitrary inter-person interaction with respect to these desires is pretty much guaranteed to be zero or negative sum, as they all depend on limited resources. So I'm not sure what aligning on the values would mean in terms of helping cooperation.
I don't think most people are consciously aware of this, but I think most people are unconsciously aware that "it is merely their priorities that are different, rather than their fundamental desires and values." Furthermore, our society largely looks structured as though only the priorities differ, but the priorities differ significantly enough because of the human-sparseness of value-space.
I am skeptical of psychology research in general, but my cursory exploration has suggested to me that it is potentially reasonable to think there are 16. My best estimate is probably that there are literally 100 or more, but that most of those dimensions largely don't have big variance/recognizable gradations/are lost in noise. I think humans are reasonably good at detecting 1 part in 20, and that the 16 estimate above is a reasonable ballpark, meaning I believe that 20^16=6.5E20 is a good approximation of the number of states in the discretized value spa...
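As a quick check on that arithmetic (the 16 dimensions and the 1-part-in-20 resolution are my own rough assumptions from above, not established figures):

```python
# Rough state count for a discretized value space:
# `dimensions` salient value dimensions, each distinguishable to 1 part in `gradations`.
dimensions = 16
gradations = 20

states = gradations ** dimensions
print(f"{gradations}^{dimensions} = {states:.2e}")  # ~6.55e+20, i.e. the 6.5E20 above
```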
I'm not sure why adjacency has to be "proper"; I'm just talking about social networks, where people can be part of multiple groups and transmit ideas and opinions between them.
I approximately mean something as follows:
Take the vector-value model I described previously. Consider some distance metric (such as the L2 norm), D(a, b) where a and b are humans/points in value-space (or mind-space, where a mind can "reject" an idea by having it be insufficiently compatible). Let k be some threshold for communicability of a particular idea. Assum...
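Since the comment above is cut off, here is a minimal sketch of just the pieces it names: points in value-space, an L2 distance D(a, b), and a communicability threshold k. The specific vectors and the communicability rule below are placeholder assumptions for illustration, not the full model.

```python
import numpy as np

def D(a, b):
    """L2 distance between two people represented as points in value-space."""
    return np.linalg.norm(np.asarray(a) - np.asarray(b))

def communicable(a, b, k):
    """Toy rule: a particular idea can pass between a and b only if D(a, b) <= k."""
    return D(a, b) <= k

# Three hypothetical people in a 4-dimensional value-space.
alice = [0.9, 0.2, 0.5, 0.1]
bob   = [0.8, 0.3, 0.4, 0.2]
carol = [0.1, 0.9, 0.9, 0.8]

k = 0.5  # assumed communicability threshold for some particular idea
print(communicable(alice, bob, k))    # True: close in value-space
print(communicable(alice, carol, k))  # False: too far apart
```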
I was thinking of issues like the economy, healthcare, education, and the environment.
I disagree and will call any national or global political issue high-hanging fruit. I believe there is low-hanging fruit at the local level, but coordination problems involving a million or more people are hard.
They can influence the people ideologically adjacent to them, who can influence the people adjacent to them, et cetera.
In my experience, it's not clear that there is really much "proper adjacency." Sufficiently high dimensional spaces make any sort of clustering amb...
I don't think there are many potential negative consequences to trying. My response wasn't a joke so much as taking issue with
It is apparent to me that making human politics more constructive is a low-hanging fruit
I think it really, really is not low-hanging fruit. The rights-and-personhood line seems quite a reasonable course of discussion to go down, but you're frequently talking to people who don't want to apply reason, at least not at the level of conversation.
Religion is a "reasonable choice" in that you buy a package and it's pretty solid and defended ...
Great, now solve pro-choice vs pro-life.
I think most people would agree that at some point there are likely to be diminishing returns. My view, and I think the prevailing view on LessWrong, is that the biological constraints you mentioned are actually huge constraints that silicon-based intelligence won't/doesn't have, and that the lack of these constraints will push the point of diminishing returns far past the human level.
You can find it here. https://www.glowfic.com/replies/1824457#reply-1824457
I would describe it as extremely minimal spoilers as long as you read only the particular post and not preceding or later ones. The majority of the spoiler risk is knowing that the content of the story is even partially related, which you would already learn by reading this post. The remainder of the spoilers is some minor characterization.
At the same time, it's basically the only filtering criterion provided besides "software developer job." Having worked a few different SWE jobs, I know that some company cultures which people love are cultures I hate, and vice versa. I would point someone in completely different directions based on their response. Not because I think it's likely they'd get their multidimensional culture preferences communicated exactly perfectly, but because the search space is so huge that it's good to at least have an estimator for ordering which things to look into.
I don't have strong preferences about what the company does. I mostly care about working with a team that has a good culture.
This is pretty subjective, and I would find it helpful to know what sort of culture you're looking for.
so I have forwarded all of these domains to my home page
On my end this does not appear to be working.
Also, nice work.
Disambiguation is a great feature of language, but we can also opt instead to make things maximally ambiguous with my favorite unit system: CCC. All measurements expressed with only the letter C.
A sketch of a solution that doesn't involve (traditional) world leaders could look like "Software engineers get together and agree that the field is super fucked, and start imposing stronger regulations and guidelines, like those traditional engineering disciplines use, but on software." This is a way of lowering the cost of the alignment tax in the sense that, if software engineers all have a security mindset, or have to go through a security review, there is more process and knowledge related to potential problems and a way of executing a technical solution at the last moment. However, this description is itself entirely political, not technical, yet it could easily never reach the awareness of world leaders or the general populace.
My conclusion: Let's start the meme that Alignment (the technical problem) is fundamentally impossible (maybe it is? why think you can control something supposedly smarter than you?) and that you will definitely kill yourself if you get to the point where finding a solution to Alignment is what could keep you alive. Pull a Warhammer 40k, start banning machine learning, and for that matter, maybe computers (above some level of performance) and software. This would put more humans in the loop for the same tasks we have now, which offers more opportunities to...
I think there's an important distinction Valentine tries to make with respect to your fourth bullet (and if not, I will make it). You perhaps describe the right idea, but the wrong shape. The problem is more like "China and the US both have incentives to bring about AGI and don't have incentives towards safety." Yes, deflecting at the last second with some formula for safe AI would save you, but that's as stupid as jumping away from a train at the last second. Move off the track hours ahead of time, and just broker a peace between countries to not make AGI.
Yes, there are those who are so terrified of Covid that they would advise practicing social distancing in the wake of nuclear Armageddon. This is an insight into that type of thinking. I do think keeping your mask on would be wise, but for obvious other reasons.
I saw this too and was very put off to find social distancing mentioned in a nuclear explosion survival guide; glad I'm not the only one who noticed. I doubt many would survive (myself included) without the aid of other humans in such an apocalyptic situation, you know, like a crowded...
Ah, I forgot to emphasize that these were things to look into to get better. I don't claim to know EY's lineage. That said, how many people do you think are well versed in cryptography? If someone said, "I am one of very few people who is well versed in cryptography," that doesn't sound particularly wrong to me (if they are indeed well versed). I guess I don't know exactly how many people EY thinks are in this category with him, but the number of people versed enough in cryptography to, say, make their own novel and robust scheme is probably on the order of 1,000-10,000 w...
Cryptography was mentioned in this post in a relevant manner, though I don't have enough experience with it to advocate for it with certainty. Some lineages of physics (EY points to Feynman) try to evoke this, though its pervasiveness has decreased. You may have some luck with Zen. Generally speaking, I think if you look at the Sequences, the themes of physics, security mindset, and Zen are invoked for a reason.
Color blindness is a blind spot in color space.
This is true, but for it to be true you need to take a maybe-not-obvious view either of what "colour space" is or of what the other space is in which we're thinking about blind spots. The following may all already have been in Kyle's head, but it took a minute to get it into mine, and I may save someone else some trouble.
So what's not true is that colour-blindness involves being unable to see colours in some subset of an RGB-like space, in the same sort of way as a blind spot is being unable to see things in a particular subset of (x,y,z) space. Having a bl...
I think you forgot to insert "Vaccinations graphs"
It is almost a fully general counterargument. It argues against all knowledge, but to different degrees. You can at least compare the references of symbols to finite calculations that you have already done within your own head, and then use Occam's Razor.
I don't accept "math" as a proper counterexample. Humans doing math aren't always correct; how do you reason about when math is correct?
My argument is less about "finite humans cannot think about infinities perfectly accurately" and more about "your belief that humans can think about infinities at all is predicated upon the assumption (which can only be taken on faith) that the symbols you manipulate relate to reality and its infinities at all."
By what means are you coming to your reasoning about infinite quantities? How do you know the quantities you are operating on are infinite at all?
I am confused how you got to the point of writing such a thoroughly detailed analysis of the application of the math of infinities to ethics while (from my perspective) strawmanning finitism by addressing only ultrafinitism. “Infinities aren’t a thing” is only a "dicey game" if the probability of finitism is less than 100% :). In particular, there's an important distinction between being able to reference the "largest number + 1" and write it down versus referencing it as a symbol as we do, because in our referencing of it as a symbol, in the original fram...
Typo in this sentence: "And probably I we would have had I not started working on The Machine."
I was in a similar position, but I am now at a point where I believe ADHD is negatively affecting my life in a way that has overturned my desire not to take medication. It's hard to predict the future, but if you have a cheap or free way to get a diagnosis, I would recommend doing so for your own knowledge and to maybe make getting prescriptions in the future a smidge easier. I think it's really believable that in your current context there are no or nearly no negative repercussions to your ADHD if you have it, but it's hard to be certain of your future contexts, and even to know what aspects of your context would have to change for your symptoms to become (sufficiently) negative.
To start, I propose a different frame to help you. Ask yourself not "How do I get intuition about information theory?" but instead "How is information theory informing my intuitions?"
It looks to me like it's more central than is Bayes' Theorem, and that it provides essential context for why and how that theorem is relevant for rationality.
You've already noticed that this is "deep" and "widely applicable." Another way of saying these things is "abstract," and abstraction reflects generalizations over some domain of experience. These generalizations ar...
There's a tag for gears-level, and in the original post it looks like everyone in the comments was confused even then about what gears-level meant; in particular, there were a lot of non-overlapping definitions given. The author, Valentine, also expresses confusion.
The definition given, however, is:
1. Does the model pay rent? If it does, and if it were falsified, how much (and how precisely) could you infer other things from the falsification?
...2. How incoherent is it to imagine that the model is accurate but that a given variable could be d
Early twin studies of adult individuals have found a heritability of IQ between 57% and 73%,[6] with the most recent studies showing heritability for IQ as high as 80%.[7]
I really enjoyed the post, but something that maybe wasn't the focus of it really stuck out to me.
...i think i felt a little bit of it with Collin when i was trying to help him find a way to exercise regularly. the memory is very hazy, but i think the feeling was focused on the very long list of physical activities that were ruled out; it seemed the solution could not involve Collin having to tolerate discomfort. much like with Gloria and the "bees", i experienced some kind of emotional impulse to be separate from him, to push him away, to judge him to be ina
I appreciate this post a lot. In particular, I think it's cool that you establish a meta-frame, or at least a class of frames. Also, I've had debates with reachability mismatches in the past, and I hope that I'll be able to just link to this post in the future.
The most frequent debate mismatch I have is on a subject you mention: climate change. I generally take the stance of Clara: the way I view it, it's a coordination problem, and I model individual action, no matter how reachable, as having a completely insubstantial effect. In some ...
To offer a deeper explanation, I personally view the piece as doing the following things:
I don't see any mention of confidence in the article, so I'm having trouble seeing ...
When someone is smarter than you, you cannot tell if they're one level above you or fifty because you literally cannot comprehend their reasoning.
I take issue with this claim, as I believe it to be vastly oversimplified. You can often, if not always, still comprehend their reasoning with additional effort on your part. By analogy, a device capable of performing 10 FLOPS can check the calculation of a device that can perform 10 GFLOPS by taking a factor of 10^9 more time. Even in cases of extreme differences in ability, I think there can be simple ...
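To spell out the arithmetic in that analogy (idealized: it assumes checking means redoing every operation, with no parallelism or shortcuts):

```python
# Idealized back-of-the-envelope for the verification analogy: a slow device
# re-doing one second of a fast device's work.
fast_flops = 10e9   # 10 GFLOPS
slow_flops = 10.0   # 10 FLOPS

ops = fast_flops * 1.0            # operations the fast device performs in 1 second
check_time = ops / slow_flops     # seconds the slow device needs to redo them
print(f"checking 1 s of work takes {check_time:.0e} s, a factor of {fast_flops / slow_flops:.0e}")
```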
What is the correct amount of self praise? Do you have reasons to believe Isusr has made an incorrect evaluation regarding their aptitude? Do you believe that even if the evaluation is correct that the post is still harmful?
I find it quite reasonable that the LessWrong community could benefit from more praise, self or otherwise. I don't have strong signals as to the aptitude of Isusr other than having read some fraction of their posts.
I worry your response comes from an automatic social defense mechanism rather than reflecting "real" beliefs, and I would like to understand what the many upvoters find the issue to be.
I would love an excuse to go back and learn QFT. Looking forward to your QFT AI insights :D