Hello everybody!
I have done some commenting & posting around here, but I think a proper introduction is never bad.
I was a Marxist for a few years, then I fell out of it, discovered SSC and thereby LW three years ago, and started reading the Sequences and the Codex (yes, people name them together now). I very much enjoy the discussions around here, and the fact that LW got resurrected.
I sometimes write things for my personal website about forecasting, obscure programming languages and [REDACTED]. I think I might start cross-posting a bit more (the two last posts on my profile are such cross-posts).
I endorse spending my time reading, meditating, and [REDACTED], but my motivational system often decides to waste time on the internet instead.
I'm looking for a science fiction novel that I believe I first saw mentioned on LessWrong. I don't remember the author, the title, or any of the characters' names. It's about a robot whose intelligence consists of five separate agents, serving different roles, which have to negotiate with each other for control of the body they inhabit and to communicate with humans. That's about all I can remember.
Thanks.
While I'm here, if someone likes Ted Chiang and Greg Egan, who might they read for more of the same? "Non-space-opera rationalist SF that's mainly about the ideas" would be the simplest characterisation. The person in question is not keen on "spaceship stories" like (in his opinion) Iain M. Banks, and was unimpressed by Cixin Liu's "Three-Body Problem". I've suggested HPMoR, of course, but I don't think it took.
I like Bruce Sterling as an author who manages to explore ideas well. His book Distraction is well worth reading for a perspective on how a political system like the current US system might evolve.
Naming Osama bin Laden as an important power player in Zeitgeist, which was published in 2000, illustrates his good geopolitical understanding.
The protagonists, however, aren't rational in the sense that HPMOR protagonists are rational.
The tags / concepts system seems to be working very well so far, and the minimal tagging overhead is now sustainable as new posts roll in. Thank you, mod team!
I am doing an art criticism project that's very important to me, and I'm looking for high-res digital versions of the art in the following books.
Help with getting these via a university library, or pointers to where I could buy an electronic copy of any of these is much appreciated.
You wrote in markdown, but we have a WYSIWYG editor! Just highlight a piece of text to see the edit menu popup, and you can put the link in that way. Or use cmd-k. Anyway, FTFY.
The first two links are broken for me but Amazon seems to have Beksinski 1-4: https://www.amazon.com/Beksinski-1-4-complete-Zdzislaw/dp/B077W8ZCZY/ref=pd_sbs_14_1/141-5378130-9367012?_encoding=UTF8&pd_rd_i=B077W8ZCZY&pd_rd_r=856b6a41-858c-4529-b369-5be519ee9b9a&pd_rd_w=GTPiD&pd_rd_wg=JXU4w&pf_rd_p=ed1e2146-ecfe-435e-b3b5-d79fa072fd58&pf_rd_r=JC0EH8JKZAXCC9ZHXW69&psc=1&refRID=JC0EH8JKZAXCC9ZHXW69
Out of curiosity, what's the art criticism project? The other two things seem very different in kind from this one, on the face of it.
Thanks, I forgot to make it clear I'm looking for digital versions.
I'm making an online museum of ethos (my ethos). I'm using good and bad art and commentary to make my ethos very visible through aesthetics.
I would love to have variable voting so I could give (or take) anywhere between one and my maximum vote strength. The way I'd do it is have each click increase the vote strength by one, and a long press set it to max strength (keep the current tooltip so people know). Then to cancel the vote (whether positive or negative) there would be a small X to the side of the up/down buttons.
I know it has been discussed already, but I just wanted to give this as another data point. It happens to me a lot that I want to give a post more than 1 karma but less than 5, so I would use this a lot if it were possible.
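A rough sketch of the interaction I have in mind (just illustrative Python; the names are hypothetical and not anything from the actual LessWrong codebase):

```python
class VariableVote:
    """Toy model of the proposed vote control."""

    def __init__(self, max_strength: int):
        self.max_strength = max_strength
        self.value = 0  # signed: positive = upvote, negative = downvote

    def click(self, direction: int) -> None:
        # Each click nudges the vote by one in the chosen direction (+1 or -1),
        # capped at the user's maximum strength.
        if direction > 0:
            self.value = min(self.value + 1, self.max_strength)
        else:
            self.value = max(self.value - 1, -self.max_strength)

    def long_press(self, direction: int) -> None:
        # A long press jumps straight to full strength in that direction
        # (keeping the current tooltip so people know about it).
        self.value = self.max_strength if direction > 0 else -self.max_strength

    def cancel(self) -> None:
        # The small X next to the up/down buttons clears the vote entirely.
        self.value = 0
```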
The Invisible People YouTube channel interviews homeless people. At the end, the interviewer always asks what the interviewee would do if they had three wishes. Usually the answers are about receiving help with money, drugs, or previous relationships. Understandably.
But in Working Actor Now Homeless in Los Angeles, the guy's mind immediately went to things like ending deadly disease and world peace. Things that would help others, not himself.
And it wasn't as if he debated doing things for himself vs. for others. My read is that the thought of doing things for himself didn't even really get promoted to conscious attention. It didn't really occur to him. It looked like it was an obvious choice to him that he would use the wishes to help others.
One of the more amazing things I can recall experiencing. It gave me a much needed boost in my faith in humanity.
Another interpretation would be that the system trains people in Los Angeles in a way where there are certain answers allowed to the question of "what would you do if you had a wish", and the allowed answers aren't selfish things.
If an actor goes to a casting and gets asked about his wishes, ending disease and world peace are the safe wishes; drugs aren't.
Dileep George's "AGI comics" are pretty funny! He's only made ~10 of them ever; most are in this series of tweets/comics poking fun at both sides of the Gary Marcus–Yoshua Bengio debate ... see especially this funny take on the definition of deep learning, and one related to AGI timelines. :-)
I have a question about attainable utility preservation. Specifically, I read the post "Attainable Utility Preservation: Scaling to Superhuman", and I'm wondering how an agent using the attainable utility implementation in equations 3, 4, and 5 could actually be superhuman. I've been misunderstanding and mis-explaining things recently, so I'm asking here instead of on the post for now, to avoid wasting an AI safety researcher's time.
The equations incentivize the AI to take actions that will provide an immediate reward in the next timestep, but penalize its ability to achieve rewards in later timesteps.
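For concreteness, the rough shape I have in mind is something like the following (my paraphrase, not the exact equations 3–5 from the post): the agent's effective reward is its primary reward minus a scaled penalty on how much the action changes an attainable-utility term, with $\varnothing$ denoting inaction.

$$R_{\text{AUP}}(s,a) \;=\; R(s,a) \;-\; \frac{\lambda}{\text{scale}}\,\bigl|\,Q_{\text{aux}}(s,a) - Q_{\text{aux}}(s,\varnothing)\,\bigr|$$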
But what if the only way to receive a reward is to do something that will only give a reward several timesteps later? In realistic situations, when can you ever actually accomplish the goal you're trying to accomplish in a single atomic action?
For example, suppose the AI is rewarded for making paperclips, but all it can do in the next timestep is start moving its arm towards wire. If it's just rewarded for making paperclips, and it can't make a paperclip in the next timestep, the AI would instead focus on minimizing impact and not do anything.
I know you could adjust the reward function to reward the AI doing things that you think will help it accomplish your primary goal in the future. For example, you know the AI moving its arm towards the wire is useful, so you could reward that. But then I don't see how the AI could do anything clever or superhuman to make paperclips.
Suppose the AI can come up with a clever means of making paperclips by creating a new form of paperclip-making machine. Presumably, it would take many actions to build before it could be completed. And the person responsible for giving out rewards wouldn't be able to anticipate that the exact device the AI is making would be helpful, so I don't see how they could get the AI to make the clever machine. Or do anything else clever.
Then wouldn't such a reduced-impact agent pretty much just do whatever a human would think is most helpful for making paperclips? But then wouldn't the AI pretty much just be emulating human, not superhuman, behavior?
I basically don't see the human mimicry frame as a particularly relevant baseline. However, I think I agree with parts of your concern, and I hadn't grasped your point at first.
The [AUP] equations incentivize the AI to take actions that will provide an immediate reward in the next timestep, but penalize its ability to achieve rewards in later timesteps.
I'd consider a different interpretation. The intent behind the equations is that the agent executes plans using its "current level of resources", while being seriously penalized for gaining resources. It's as if you're allowed to explore, you're currently on land that's 1,050 feet above sea level, and you can only walk on land with elevation between 1,000 and 1,400 feet. That's the intent.
The equations don't fully capture that, and I'm pessimistic that there's a simple way to capture it:
But what if the only way to receive a reward is to do something that will only give a reward several timesteps later? In realistic situations, when can you ever actually accomplish the goal you're trying to accomplish in a single atomic action?
For example, suppose the AI is rewarded for making paperclips, but all it can do in the next timestep is start moving its arm towards wire. If it's just rewarded for making paperclips, and it can't make a paperclip in the next timestep, the AI would instead focus on minimizing impact and not do anything.
I agree that it might be penalized hard here, and this is one reason I'm not satisfied with equation 5 of that post. It penalizes the agent for moving towards its objective. This is weird, and several other commenters share this concern.
Over the last year, I've come to think that "penalize own AU gain" is worse than "penalize average AU gain", in that the latter penalty equation leads to more sensible incentives. I still think that there might be some good way to penalize the agent for becoming more able to pursue its own goal. Equation 5 isn't it, and I think that part of your critique is broadly right.
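(For readers following along, a rough sketch of the two penalty shapes being contrasted here, not the exact equations from any post: the first penalizes change in the agent's ability to achieve its own goal $R$, the second averages the change over a set of auxiliary reward functions $\mathcal{R}_{\text{aux}}$.)

$$\text{own AU: } \bigl|Q_{R}(s,a) - Q_{R}(s,\varnothing)\bigr| \qquad\quad \text{average AU: } \frac{1}{|\mathcal{R}_{\text{aux}}|}\sum_{R_i \in \mathcal{R}_{\text{aux}}} \bigl|Q_{R_i}(s,a) - Q_{R_i}(s,\varnothing)\bigr|$$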
I hadn't thought about the distinction between gaining and using resources. You can still wreak havoc without getting resources, though, by using them in a damaging way. But I can see why the distinction might be helpful to think about.
It still seems to me that an agent using equation 5 would pretty much act like a human imitator for anything that takes more than one step, so that's why I was using it as a comparison. I can try to explain my reasoning if you want, but I suppose it's a moot point now. And I don't know if I'm right, anyways.
Basically, I'm concerned that most nontrivial things a person wants will take multiple actions, so in most of the steps the AI will be motivated mainly by the reward given in the current step for reward-shaping reasons (as long as it doesn't gain too much power). And doing the action that gives the most immediate reward for reward-shaping reasons sounds pretty much like doing whatever action the human would think is best in that situation. Which is probably what the human (and mimic) would do.
I hadn't thought about the distinction between gaining and using resources. You can still wreak havoc without getting resources, though, by using them in a damaging way. But I can see why the distinction might be helpful to think about.
I explain my thoughts on this in The Catastrophic Convergence Conjecture. Not sure if you've read that, or if you think it's false, or you have another position entirely.
I agree that intelligent agents have a tendency to seek power and that that is a large cause of what makes them dangerous. Agents could potentially cause catastrophes in other ways, but I'm not sure if any are realistic.
As an example, suppose an agent creates powerful self-replicating nanotechnology that makes a pile of paperclips, the agent's goal. However, since the agent didn't want to spend the time engineering a way to stop replication, the self-replicating nanobots eat the world.
Catastrophes like this would probably also be dealt with by AUP-preservation, though. At least, if you use the multi-equation impact measure. (If the impact equation only concerns the agent's ability to achieve its own goal, maybe it would let the world be consumed after putting up a nanotech-proof barrier around all of its paperclip-manufacturing resources. But again, I don't know if that's realistic.)
I'm also concerned agents would create large, catastrophic changes to the world in ways that don't increase their power. For example, an agent that wants to make paperclips might try to create nanotech that assembles the entire world into paperclips. It's not clear to me that this would increase the agent's power much. The agent wouldn't necessarily have any control over the bots, so they would only help with its one utility function. And if the agent is intelligent enough to easily discover how to create such technology, actually creating the bots doesn't sound like it would give it more power than it already had.
If the material for the bots is scarce, then making them prevents the AI from making other things, so they might even provide a net decrease in the agent's power. And once the world is paperclips, the agent would be limited to just having paperclips available, which could make it pretty weak.
I don't know if you consider the described scenario as seeking power. At least, I don't think it would count as an increase in the agent's impact equation.
I'm wondering how an agent using the attainable utility implementation in equations 3, 4, and 5 could actually be superhuman.
In the "superhuman" analysis post, I was considering whether that reward function would incentivize good policies if you assumed a superintelligently strong optimizer optimized that reward function.
For example, suppose the AI is rewarded for making paperclips, but all it can do in the next timestep is start moving its arm towards wire. If it's just rewarded for making paperclips, and it can't make a paperclip in the next timestep, the AI would instead focus on minimizing impact and not do anything.
Not necessarily; an optimal policy maximizes the sum of discounted reward over time, and so it's possible for the agent to take actions which aren't locally rewarding but which lead to long-term reward. For example, in a two-step game where I can get rewarded on both time steps, I'd pick the pair of actions which maximizes $r_1 + \gamma r_2$. In this case, $r_1$ could be 0, but the pair of actions could still be optimal.
I know you could adjust the reward function to reward the AI doing things that you think will help it accomplish your primary goal in the future. For example, you know the AI moving its arm towards the wire is useful, so you could reward that. But then I don't see how the AI could do anything clever or superhuman to make paperclips.
This idea is called "reward shaping" and there's a good amount of literature on it!
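(For reference, the classic potential-based form from Ng, Harada & Russell, which provably preserves optimal policies: you add a shaping term derived from a potential function $\Phi$ over states.)

$$R'(s,a,s') \;=\; R(s,a,s') \;+\; \gamma\,\Phi(s') \;-\; \Phi(s)$$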
Is there much the reduced-impact agent with reward shaping could do that an agent using human mimicry couldn't?
Perhaps it could improve over mimicry by being able to consider all actions, while a human mimic would, in effect, only consider the actions a human would. But I don't think there are usually many single-step actions to choose from, so I'm guessing this isn't a big benefit. Could the performance improvement come from understanding the current state better than a mimic could? I'm not sure when this would make a big difference, though.
I'm also still concerned the reduced-impact agent would find some clever way to cause devastation while avoiding the impact penalty, but I'm less concerned about human mimics causing devastation. Are there other, major risks to using mimicry that the reduced-impact agent avoids?
I've started browsing and posting here a bit so I should introduce myself.
I've been writing online for around five months and put some draft chapters of a book on my website. The objective is to think about how to immunise a society from decline, which basically means trying to find the right balance between creativity and cohesion (not that they are inversely related—it’s quite possible to have neither). Because I can’t buy into any worldview out there today, I’ve tried to systematise my thoughts into a philosophy I call Metasophism. It’s a work in progress, and most of what I read and write links into that in some way.
Prediction mechanisms are commonly discussed here, and I've partly integrated them, but I need to think more about that, which I think this site will help with.
How did I end up here? A commenter on an early post of mine mentioned LW, which I didn’t then frequent even though I was familiar with some of the writers here. That caused me to check it out, and the epistemic culture caused me to stick around.
It seems you've had some success in thinking things through rigorously on your own, so kudos to you! I can relate to not buying into any existing worldview.
Glad you did stick around. Anything in particular about the epistemic culture that you think works especially well or poorly?
Thanks! There seems to be an openness towards error correction which is admirable and unfortunately uncommon.
Do the newest numbers indicate that the new Covid strain isn't that bad after all, for whatever reason? If not, why not?
Edit: Zvi gave a partial answer here.
Killing or disabling their host is generally disadvantageous to viruses, so viruses exist in the space where they replicate enough within a host to spread, while still allowing the host to go out and infect other people.
If a virus mutation manages to remain infectious while having less effect on a person, that person is more likely not to self-isolate and to pass on the virus.
Hello, lesswrongsters (if I may call you that),
What do you think about the following statement: "You should be relatively skeptical about each of your past impressions, but you should be absolutely non-skeptical about your most current one at a given moment. Not because it was definitely true, but because there is practically no other option."
Please, give me your opinions, criticism, etc. about this.
You should be skeptical about your most current one! It is likely better informed than previous ones, but that doesn't mean you're done processing.
BUT, you need to exercise that skepticism by knowing what your best understanding strongly predicts and what discrepancies should surprise you, not by trying to make yourself give humbler answers.
The first sentence of the quote sounds like a mix of the Buddhist concept of the now plus the financial concept of how the current price of a security reflects all information about its price.
OK, I will put it a bit more straightforwardly.
My Christian friend claimed that atheists/rationalists/skeptics/evolutionists cannot trust even their own reason (because, in his opinion, it is the product of their imperfect brains).
So I wanted to counter-argue reasonably, and my statement above seems to me relatively reasonable and relevant. I don't know whether it would convince my Christian friend, but it at least convinces me :) .
Thanks in advance for your opinions, etc.
atheists/rationalists/skeptics/evolutionists cannot trust even their own reason
Well, I don't. But at the end of the day, some choices need to be made, and following my own reason seems better than... well, what is the alternative here... following someone else's reason, which is just as untrustworthy.
Figuring out the truth for myself, and convincing other people are two different tasks. In general, truth should be communicable (believing something for mysterious inexplicable reasons is suspicious); the problem is rather that the other people cannot be trusted to be in a figuring-out-the-truth mode (and can be in defending-my-tribe or trying-to-score-cheap-debate-points mode instead).
Part of being a good skeptic is being skeptical of one's own reasoning. You need to be skeptical of your own thinking to be able to catch errors in it.
Consider how justified trust can come into existence.
You're traveling through the forest. You come to a moldy-looking bridge over a ravine. It looks a little sketchy. So naturally you feel distrustful of the bridge at first. So you look at it from different angles, and shake it a bit. And put a bit of weight on it. And eventually, some deep unconscious part of you will decide either that it's untrustworthy and you'll find another route, or that it's trustworthy and you'll cross the bridge.
We don't understand that process, but it's reliable anyway.
Yes, but it can happen that over the course of our individual existence two "justified opinions" that are inconsistent with each other occur in our minds. (And if they didn't, we would be doomed to believe all the flawed opinions from our childhood, without the possibility of updating them, because we would reject new, inconsistent opinions, etc.)
And moreover, we are born with some "priors" which are not completely true but relatively useful.
And there are some perceptual illusions.
And Prof. Richard Dawkins claims that there are relatively frequent hallucinations that could make us think that a miracle is happening (if I understood him correctly). By relatively frequent I mean that probably any healthy person could experience a hallucination at least once in a lifetime (often without realizing it).
And of course, there are mental fallacies and biases.
And if the process is reliable, why do different people have different opinions and inconsistent "truths"?
Thus, I think that the process is relatively reliable but not totally reliable.
PS: I am relatively new here. So hopefully, my tone is not aggressively persuasive. If any of you have a serious problem with my approach, please criticize me.
>Thus, I think that the process is relatively reliable but not totally reliable.
Absolutely. That's exactly right.
>My Christian friend claimed that atheists/rationalists/skeptics/evolutionists cannot trust even their own reason (because, in his opinion, it is the product of their imperfect brains).
It sounds like there's a conflation between 'trust' and 'absolute trust'. Clearly we have some useful notion of trust, because we can navigate potentially dangerous situations relatively safely. So, using plain language, it's false to say that atheists can't trust their own judgement. Clearly they can in some situations. Are you saying atheists can't climb a ladder safely?
It sounds like he wants something to trust in absolutely. Has he faced the possibility that that might just not exist?
Why does PredictionBook 1) allow you to make 100% credence predictions and 2) bucket 99% credence in the 90% bucket instead of the 100% bucket?
Does anyone know?
It means I either need to have an unsightly graph where my measured accuracy falls to 0 in the 100% bucket, or take the unseemly approach of putting 100% (rounding up, of course, not literally 100%) on some extremely likely prediction.
The bucketing also means that if I make many 99% predictions, but few 90% predictions (for instance), I'll appear uncalibrated even if I have perfect calibration, since items in that bucket would be accurate more than 90% of the time. Not realizing this, I might think that I need to adjust more.
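A toy simulation of the effect (plain Python, nothing to do with PredictionBook's actual code; the numbers are made up): a perfectly calibrated forecaster who mostly makes 99% predictions shows an observed accuracy well above 90% in that bucket.

```python
import random

random.seed(0)

# Hypothetical mix of predictions from a perfectly calibrated forecaster:
# mostly 99% calls, plus a few genuine 90% calls.
predictions = [0.99] * 200 + [0.90] * 20

# Each prediction resolves true with exactly its stated probability.
outcomes = [random.random() < p for p in predictions]

# Bucketing as described above: everything from 90% up to (but not including)
# 100% lands in the "90%" bucket.
bucket = [(p, o) for p, o in zip(predictions, outcomes) if 0.90 <= p < 1.00]
accuracy = sum(o for _, o in bucket) / len(bucket)

print(f"90% bucket: {len(bucket)} predictions, observed accuracy {accuracy:.2%}")
# Observed accuracy comes out near 98%, which looks miscalibrated against the
# bucket's 90% label even though every individual forecast was perfect.
```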
I'm trying to find a post (maybe a comment?) from the past few years. The idea was: say you have 8 descriptive labels. These labels could correspond to clusters in thing-space, or they could correspond to axes. I think it was about types of mathematicians.
Does the feeling of having one's nails too long have a name and a place in the classification of sensations? I mean, some people find it uncomfortable to just have theirs "longer than the nail bed" (me too). It's not like our nails even start to get in the way of doing things, we just have a sensation (kind of an itch), I'd say like "holding" something with the edge of the nail and the nailbed. Is this from touch receptors detecting a different creasing of skin? It goes away when I pay deliberate attention to it.
Considering how much LW and the Sequences talk about doing Bayesian updates, I figured it's worth talking about a downside I am experiencing. I closely monitor Metaculus and adjust my vaccination expectations accordingly. I have certain events that are on hold until I get vaccinated. Therefore, the optimal strategy would seem to be: (1) schedule those events based on the current forecast of when I'll be vaccinated, and (2) reschedule them whenever the forecast shifts.
What I didn't expect was the level of effort required by step 2 after the first run through. It gets tiring to call people every few months and say something like "Oops, we have to reschedule again..."
If it’s worth saying, but not worth its own post, here's a place to put it.
If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the new Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.