All of Pattern's Comments + Replies

T: "Gah, I really need to write this up as a blog post. Giving an example that I'm not really into at the moment seems kind of bad. But ok, so, [[goes on a rambling five-minute monologue that starts bored and boring, until visibly excited about some random thing like the origin of writing or the implausibility of animals with toxic flesh evolving or emergent modularity enabled by gene regulatory networks or the revision theory of truth or something; see the appendix for examples]]"

Still reading the rest of this.

 

"Playful Thinking" (Curiosity driven ex... (read more)

It might be a browser compatibility issue?

This should be spoilered. I typed it, and didn't copy-paste it. 

This seems like it might be useful to post to that subreddit.

What are people using to load and analyze the data?

1simon
I started out with Excel, but it could only load, as abstractapplic noted, about 2/3 of the dataset. I considered using just that, or splitting the data, but then decided that since I had been thinking of trying out doing data analysis in Haskell, I would abandon Excel and try out Haskell. After various hangups, including most perniciously the stubborn refusal of the Parsec library to modify its operation to conform to my mental model of how it works, I still haven't actually loaded the data in my program in a usable form. But I'm hoping I'll manage it soon. And then I have to figure out how to actually process and get data out in a usable form...
4abstractapplic
I used the python package Pandas. (I also tried Excel, but the dataset was too large to load everything in. In retrospect, I realize I could have just loaded in the first million rows - 2/3 of the dataset, more than enough to get statistically significant results from - and analyzed that, possibly keeping the remaining ~400k rows as a testing set.)
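(A minimal sketch of that load-and-split in pandas - the file name is a stand-in, since the comment doesn't give one:)

```python
import pandas as pd

# Hypothetical file name - substitute the actual dataset.
df = pd.read_csv("dataset.csv")

# Analyze the first million rows (about 2/3 of the data)...
train = df.iloc[:1_000_000]
# ...and hold out the remaining ~400k rows as a testing set.
test = df.iloc[1_000_000:]
```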

Is your writing online anywhere?

As a speaker of a native language that has only gender-neutral pronouns and no gendered ones, I often stumble and misgender people out of disregard for that info, because that is just not how referring works in my brain. I suspect that native speakers don't have this property and the self-reports are about them.

What language is this?

2Slider
The one that has the word "astalo". (I am keeping my identity small by not needlessly invoking national identities) It seems I also had a misunderstanding about the word. It rather means something used as a melee weapon that is not a melee weapon as an object - something that in DnD terms would be an "improvised weapon". But it seems that the affordance of a ranged weapon is not included in that; the "melee" there is essential (and even then, blunt damage is in and slashing and piercing are out). Still a term that is deliberately very wide, but since it also functions to mean very specific things, getting it wrong is kinda bad.

It reminds me of a move made in a lawsuit.

But you said that I should use orange juice as a replacement because it's similarly sweet.

Does ChatGPT think tequila is sweet, orange juice is bitter...or is it just trying to sell you drinks?*

tequila has a relatively low alcohol content

Relative to what ChatGPT drinks, no doubt.

And tequila doesn’t have any sugar at all.

*Peer pressure you into drinking it, maybe.

At best this might describe some drinks that have tequila in them. Does it know the difference between "tequila" and "drinks with tequila"?

 

Does ChatGPT not differentiate between sweet and sug... (read more)

these success stories seem to boil down to just buying time, which is a good deal less impressive.

The counterpart to 'faster vaccination approval' is 'buying time' though. (Whether or not it ends up being well used, it is good at the time. The other reason to focus on it is: how much can you affect pool testing versus vaccination approval speed? Other stuff like improving statistical techniques might be easier for a lot of people than changing a specific organization.)

Overall this was pretty good.

 

That night, Bruce dreamt of being a bat, of swooping in to save his parents. He dreamt of freedom, and of justice, and of purity. He dreamt of being whole. He dreamt of swooping in to protect Alfred, and Oscar, and Rachel, and all of the other good people he knew.

The part about "purity" didn't make sense.

 

Bruce would act.

This is a bit of a change from before - something more about the mistake seems like it would make more sense, not worry. ('Bruce would get it right this time', or something about 'Bruce would act (and it would make things better this time)'.) 'Bruce wouldn't be afraid', maybe?

I was thinking

The rules don't change over time, but what if, on...the equivalent of the summer solstice, fire spells get +1 fire mana or something? I.e., periodic behavior. Wait, I misread that. I meant more like: rules might be different, say, once every hundred years (the anniversary of something important) - like there are more duels that day, so you might have to fight multiple opponents, or something. 

This is a place where people might look at the game Fluxx, and go 'the rules don't change'. 

Our world is so inadequate that seminal psychology experiments are described in mangled, misleading ways. Inadequacy abounds, and status only weakly tracks adequacy. Even if the high-status person belongs to your in-group. Even if all your smart friends are nodding along.

It says he started with the belief. Not that he was right, or ended with it. Keeping the idea contained to the source, so it's clear it's not being asserted as true, could be improved, yes.

2TurnTrout
I think Pavlov knew that food-salivation wasn't hardwired, and IIRC he makes the point in detail in the lectures. AFAICT many (but certainly not all, and perhaps not even most) contemporary retellings of the experiment are extremely sloppy in this way, and the quoted source doesn't go on to correct the misapprehension.  I would put it as: At the beginning of the experiment, adult dogs salivate when they see food. Therefore, relatively speaking, food-salivation is the "unconditioned" stimulus, since you don't have to condition the dog during the experiment in order to produce the response of salivation.

This is what would happen if you were magically given an extraordinarily powerful AI and then failed to align it,

Magically given a very powerful, unaligned, AI. (This 'the utility function is in code, in one place, and can be changed' assumption needs re-examination. Even if we assert it exists in there*, it might be hard to change in, say, a NN.)

* Maybe this is overgeneralizing from people, but what reason do we have to think an 'AI' will be really good at figuring out its utility function (so it can make changes without changing it, if it so desires). ... (read more)

Spoilering/hiding questions. Interesting.

Do the rules of the wizards' duels change depending on the date?

I'll aim to post the ruleset and results on July 18th (giving one week and both weekends for players).  If you find yourself wanting extra time, comment below and I can push this deadline back.

The dataset might not have enough info for this/rules might not be deep enough, but a wizards' duel between analysts, or 'players', also sounds like it could be fun.

2aphyer

I think that is a flaw of comments, relative to 'google docs'. In long documents where the referenced areas aren't tagged in comments, it might be hard to find other people asking the same question you did, even if someone wondered about the same section. (And the difficulty of ascertaining that quickly seems unfortunate.)

It also possesses the ability to levitate and travel through solid objects. 

How is it contained?

It's still a trivial inconvenience sometimes, but:

Two tabs:

  • one for writing the response comment while reading

  • one for the reading

 

Note: sometimes people downvote typo comments. It doesn't happen often, but it seems like it sometimes happens after the author fixes the typo.

For example, if our function measures the probability that some particular glass is filled with water, the space near the maximum is full of worlds like “take over the galaxy and find the location least likely to be affected by astronomical phenomena, then build a megastructure around the glass designed to keep it full of water”.

If the function is 'fill it and see it is filled forever' then strange things may be required to accomplish that (to us) strange goal.

 

Idea:

Don’t specify our goals to AI using functions.

Flaw:

Current deep learning methods use f

... (read more)
2En Kepeig
Thanks for the responses, I'll try to address them individually.

I agree that this doesn't adequately represent our goal, but I think the problem persists even when we add lots of qualifications like "make sure the glass is filled with water for the next five minutes and then lose interest". The maximum of that function might not include a large-scale plan due to limited time, but it could include destroying everything within range except for the facility to prevent interference. It's possible that adding enough qualifications would solve this, but it wouldn't be easy to verify.

I don't know how to achieve the same capabilities as current or future machine learning without specifying goals using functions. In that sense, I think it would be hard to match something like GPT without deep learning, and so more legible alternatives wouldn't be competitive. (I might be understating this. It seems like function-based learning is the only method we have that works.)

I was thinking of Robin Hanson's idea that the competitive market of many AIs would prevent any individual AI from taking over. I don't think that would work either, but I agree that intentionally designing opposing AIs would be even worse.

It seems like humans are often kept safe from each other by limited resources and limited thinking time, so I agree that this could be a promising approach. But we would have to prevent a limited AI from increasing its own capabilities. Maybe it's not as easy as copying a piece of software, but probably easier than building a nuclear weapon in terms of resources. If running it requires an uncommon amount of computing, then you're right, it would be hard to copy.

You're right, achieving the global maximum for many functions would be unfeasible. The risk comes when the space of high-value bad outcomes overlaps with the space of feasible strategies for the AI. This is not necessarily at or even near the global maximum. This way of framing the problem might be more accura

Yeah. When something is very unclear, it's like

Is it good or bad? It's impossible to decipher; I can't tell. Is it true or false? No way to tell. (It doesn't happen often, but when it does, it's usually downvoted.)

 

ETA: I'm not sure at the moment what other aspects there are.

It didn't state that explicitly re sorting, but looking at:

It has no other direct effects on the site or content visibility.

I see what you mean. (This would have been less of a question in a 'magic-less sorting system'.)

Agree/Disagree are weird when evaluating your comment.

Agree with you asking the question (it's the right question to ask) or disagree with your view?

 

I read Duncan's comment as requesting that the labeling of the buttons be more explicit in some way, though I wasn't sure if it was your way. (Also Duncan disagreeing with what they reflect).

Upvote (Like**)

  • Quality*

Agreement (Truth)

  • Veracity

Not present***: Value? Judgement? (Good/Bad)

  • Good/Bad

 

*How well something is written?

**This is in ()s because it's the word that shows up in bold when hovering over a button.

***That is a harsher bold than I was going for.

2Duncan Sabien (Deactivated)
I guess a different point is that, given what I understand to be the goals of LessWrong, I'm confused about valid reasons for liking something other than either:

* This just seems true, irrespective of any of its other properties (e.g. whether it reduces the heat of a conversation or not)

* This just seems like it moves the conversation in a better/more productive direction, irrespective of any of its other properties (e.g. whether it's true or not)

Writing quality is a good one to mention; I suppose I have upvoted things purely on the grounds that I wanted to incentivize [more like this] for a comment that was clear and clearly effortful.

I think some aspects of 'voting' might benefit from being public. 'Novelty' is one of them. (My first thought when you said 'can't be downvoted' was 'why?'. My filtering desires for this might be...complex. The simple feature being:

I want to be able to sort by novelty. (But also be able to toggle 'remove things I've read from the list'. A toggle, because I might want it to be convenient to revisit (some) 'novel' ideas.))

2Ben Pace
Hm, k, have added an edit.

you should also have to be thinking about

Consider replacing this long phrase (above) with 'consider'.

1tutor vals
Partially agreed on replacing 'have to be thinking about' with 'consider', i.e.:  If you're really into manipulating public opinion, you should also consider strong upvoting [...] Disagreed on replacing the "should also" part, because it reminds you this is only hypothetical and not actually good behaviour. 

Upvoting/downvoting self

  • Sorting importance

'Agreeing'/'Disagreeing'

  • 'I have discovered that this (post (of mine)) is wrong in important ways'
  • or
  • Looking back, this has still stood the test of time.

These methods aren't necessarily very effective (here).

 

Arguably, this can be done better by:

Having them be public (likely in text). What you think of your work is also important. ('This is wrong. I'm leaving it up, but also see this post explaining where I went wrong, etc.')

 

See the top of this article for an example: https://www.gwern.net/Fake-Journal-Clu... (read more)

How do sorting algorithms (for comments) work now?

2Ben Pace
The same as always. Karma score, with a hint of magic (i.e. putting new comments higher for a period on the order of a few hours). As it says in the OP section titled "How the system works", agree/disagree voting has no effect on sorting.
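(A minimal sketch of what that described behavior could look like - the window and bonus numbers below are invented, not LW's actual code, which the comment doesn't give:)

```python
import time

def sort_key(comment, now=None):
    """Karma, plus a temporary bonus for new comments.
    The 6-hour window and +5 bonus are made-up numbers; the comment
    only says new comments rank higher 'for a period on the order
    of a few hours'."""
    now = time.time() if now is None else now
    age_hours = (now - comment["posted_at"]) / 3600
    bonus = 5 if age_hours < 6 else 0
    return comment["karma"] + bonus

# Highest (karma + recency bonus) first; agree/disagree is ignored.
comments = [
    {"karma": 10, "posted_at": time.time() - 86400},  # a day old
    {"karma": 2, "posted_at": time.time() - 600},     # ten minutes old
]
comments.sort(key=sort_key, reverse=True)
```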

For companies, this is something like the R&D budget. I have heard that construction companies have very little or no R&D. This suggests that construction is a "background assumption" of our society.


Or that research is happening elsewhere. Our society might not give it as much focus as it could though.

 In the context of quantilization, we apply limited steam to projects to protect ourselves from Goodhart. "Full steam" is classically rational, but we do not always want that. We might even conjecture that we never want that. 

So you never do anything with your full strength, because getting results is bad?

Well, by 'we' you mean both 'you' and 'a thing you are designing with quantilization'.

7abramdemski
Anything name-able and not hopelessly vague seems to be bad to full-strength optimize. Although we should be open to exceptions to that. As a life philosophy, it might be pretty uninspiring.
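(For reference, a toy Monte Carlo sketch of quantilization - the parameter names and numbers are mine: instead of taking the argmax, sample from the top q fraction of a base distribution, so how much 'steam' you apply is adjustable via q.)

```python
import random

def quantilize(base_sample, utility, q=0.1, n=10_000):
    """Draw n actions from the base distribution, then pick
    uniformly among the top q fraction by utility.
    q -> 0 approaches full-strength optimization ('full steam');
    q = 1 is just the base distribution (no optimization)."""
    samples = [base_sample() for _ in range(n)]
    samples.sort(key=utility, reverse=True)
    top = samples[: max(1, int(q * n))]
    return random.choice(top)

# Example: mild optimization of x^2 over a uniform base distribution.
action = quantilize(lambda: random.uniform(-1, 1), lambda x: x * x, q=0.25)
```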

It seems to me that in a competitive, 2-player, minimize-resource-competition StarCraft, you would want to go kill your opponent so that they could no longer interfere with your resource loss?

I would say that in general it's more about what your opponent is doing. If you are trying to lose resources and the other player is trying to gain them, you're going to get along fine. (This would likely be very stable and common if players can kill units and scavenge them for parts.) If both of you are trying to lose them...

Trying to minimize resources is a... (read more)

stab

You assume they're dead. (It gives you a past measurement - no guarantee someone won't become evil later.)

Okay, but no testing it on yourself, or anyone else you don't want dead. You'd be lucky to lose only a finger, or a hand.

1Aleksi Liimatainen
Sure, I'll be careful. I only need it for my expedition to the Platonic Realm anyway.

It's a shame we can't see the disagree number and the agree number, instead of their sum.

3Maxime Riché
You can see the sum of the votes and the number of votes (by having your mouse over the number). This should be enough to give you a rough idea of the ratio between + and - votes :) 
6Charbel-Raphaël
And also the number of views

So far as I know, every principle of this kind, except for Jessica Taylor's "quantilization", and "myopia" (not sure who correctly named this as a corrigibility principle), was invented by myself; eg "low impact", "shutdownability".  (Though I don't particularly think it hopeful if you claim that somebody else has publication priority on "low impact" or whatevs, in some stretched or even nonstretched way; ideas on the level of "low impact" have always seemed cheap to me to propose, harder to solve before the world ends.)

Low impact seems so easy to pro... (read more)

3. AI which ultimately wants to not exist in future as a terminal goal. Fulfilling the task is on the simplest trajectory to non-existence
 

The first part of that sounds like it might self-destruct. And if it doesn't care about anything else...that could go badly. Maybe nuclear-badly, depending... The second part makes it make more sense, though.

 

9. Ontological uncertainty about level of simulation.

So it stops being trustworthy if it figures out it's not in a simulation? Or that it is being simulated?

Also:

Being able to change a system after you've built it.

 

(This also refers to something else - being able to change the code. Like, is it hard to understand? Are there modules? etc.)

I think those are just two principles, not four.

 

Myopia seems like it includes/leads to 'shutdownability', and some other things.

Low impact: How low? Quantilization is meant as a form of adjustable impact. There's been other work* around this (formalizing power/affecting others' ability to achieve their goals).

*Like this, by TurnTrout: https://www.lesswrong.com/posts/yEa7kwoMpsBgaBCgb/towards-a-new-impact-measure

I think there might be more from TurnTrout, or relating to that. (Like stuff that was intended to explain it 'better' or as the ideas ch... (read more)

I would set up a "council" of AGI-systems (a system of systems), and when giving it requests in an oracle/genie-like manner I would see if the answers converged. At first it would be the initial AGI-system, but I would use that system to generate new systems to the "council".
 

I like this idea. Although, if things don't converge, i.e. there is disagreement, this could potentially serve as identifying information that is needed to proceed, or to reckon further/more efficiently.

Votes aren't public. (Feedback can be.)

-Tell operators anything about yourself they may want to or should know. 

...

but of course explain what you think the result will be to them

Possible issue: They won't have time to listen. This will limit the ability to:

defer to human operators.

 

Also, does 'defer to human operators' take priority over 'humans must understand consequences'?

better than the tag overall

What does this mean? Improve on what you've (the OP has) already written that's tagged 'corrigibility' here (on LW)?

 

The overall point makes sense - see how far you can go on:

'principles for corrigibility'.

The phrasing at the end of the post was a little weird though.

The article is short enough - one page! - that you should read it instead of the description that follows. One thing I appreciate about it is that it covers just one subject, briefly, and does so well.

I'm not sure if I have the right to copy the article over, so I didn't. I came across a screenshot of it online, and looked up the source above.

 

This article is about how feeling stupid is a sign of ignorance, but it's something that happens when you're learning (e.g. grad+), especially when you're working on projects to find out things that no one else has yet. (e.g. ... (read more)

So, again, you end up needing alignment to generalize way out of the training distribution

I assume this is 'you need alignment if you are going to try to 'generalize way out of the training distribution and give it a lot of power'' (or you will die).

And not something else like 'it must stay 'aligned' - and not wirehead itself - to pull something like this off, even though it's never done that before'. (And thus 'you need alignment to do X', not because you will die if you do, but because alignment means something like 'the ability to generalize way out of ... (read more)

'This problem seems hard. Perhaps making AI that's generally good, and then having the AI do it would be easier.'

4Vladimir_Nesov
If charity is taken to mean curiosity about reasons for others' claims/behavior, as I specified in this thread, then lack of charity is a systematic failure to pay some attention to figuring out those reasons. Curiosity is liveness of figuring things out, a rejection of not making progress on any of its subjects. Healthy curiosity keeps the investigation of all relevant topics going, and the actual reasons for someone's ridiculous claims/behavior are relevant to engaging with them.
4benjamincosman
Bayes’ Theorem, presumably.
0lc
lol

How technical is the use of the word 'distributed' here?

 

While arranging my evening, I may perform some Bayesian updates. Maybe I learn that the movie is not available on Netflix, so I ask a friend if they have a copy, then check Amazon when they don’t. This process is reasonably well-characterized as me having a centralized model of the places I might find the movie, and then Bayes-updating that model each time I learn another place where I can/can’t find it.

It seems more like going through a list of places and checking off 'not there' than Bayesian ... (read more)
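(To make the comparison concrete - the numbers below are made up: 'checking off not there' is the degenerate Bayes update where the observation is certain, so the two descriptions aren't really in conflict.)

```python
# Prior over where the movie might be found (made-up numbers).
prior = {"netflix": 0.5, "friend": 0.3, "amazon": 0.2}

def update_not_there(belief, place):
    """Observing 'definitely not at `place`' zeroes that entry and
    renormalizes - a Bayes update with a certain observation,
    which looks exactly like checking an item off a list."""
    belief = {k: (0.0 if k == place else v) for k, v in belief.items()}
    total = sum(belief.values())
    return {k: v / total for k, v in belief.items()}

belief = update_not_there(prior, "netflix")
# -> {'netflix': 0.0, 'friend': 0.6, 'amazon': 0.4}
```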

1. Yeah, this is tricky. I didn't like the terminology, but I didn't have a replacement. It's hard to come up with a term for this (for reasons discussed at length in the post). I was looking more at 'both are 'boundaries'' and disambiguating that it is your boundary (versus the social one) that you are sort of opting in/asking others to work with you to define. (Opting-in (by self) to boundary exploration (of self by others).) 'Boundary exploration' still doesn't sound good, though 'boundary violation' sounds worse. Emphasizing the opt-in part in the term... (read more)

(Prompt:)

The important part would be:

1. The post communicates its point but the terminology could be better. (Which is probably why there are so many "hedges".)
 

Less important:

2. In order to scale up, some things do require opt in/advance notice. Some possibilities are (largely) exclusive of each other. (A costume party and a surprise water balloon fight.)
3. The post mentions different subcultures have different rules, but talks about society boundaries like they are one thing only.

 

(Purpose:)

Overall, I made notes as I read the post. (This post i... (read more)

2Duncan Sabien (Deactivated)
(Thanks)

1. There's been a bid elsewhere for "boundaries" to refer exclusively to the individually-specified thing, and "norms" to be used to indicate the social boundary.  This ... tracks, and seems good, although it leaves out that people e.g. say "Boundaries, Phil, geez!" in reinforcement of social ones, and that the word "norms" refers to many things besides boundaries. But I don't object to using those as the terms if enough other people think they make sense.

2. No disagreement that some things (many, even) require opting in or advance notice.

3. I think they largely are one-thing-only within a subculture (where e.g. "LW" would count as a subculture, and "LWers who live in California when they meet in person" would count as a somewhat different one).  I think there is approximately always, for any given collection of humans in any given time and place, a surprisingly-consistent-across-people sense of what the norms are.