All of ZY's Comments + Replies

A side note -

We don’t own slaves, women can drive, while they couldn’t in Ancient Rome, and so on.

Seems to be a very low bar for being "civilized"

Enemies vs Malefactors

Model Integrity: MAI on Value Alignment

focusing less on intent and more on patterns of harm

In a general context, understanding intent though will help to solve the issue fundamentally. There might be two general reasons behind harmful behaviors: 1.do not know this will cause harm, or how not to cause harm, aka uneducated on this behavior/being ignorant, 2.do know this will cause harm, and still decided to do so. There might be more nuances but these two are probably the two high level categories. Knowing what the intent is helps to create strategies to address the issue - 1.more education? 2.more punishments/legal actions?

Daniel Kokotajlo's Shortform

In my opinion, theoretically, the key to have "safe" humans and "safe" models, is "to do no harm" under any circumstances, even when they have power. This is roughly what law is about, and what moral values should be about (in my opinion)

Daniel Kokotajlo's Shortform

Yeah nice; I heard youtube also has something similar for checking videos as well

ZY6mo30

It is interesting; I am only a half musician but I wonder what a true musician think about the music generation quality generally; also this reminds me of the Silicon Valley show's music similarity tool to check for copyright issues; that might be really useful nowadays lmao

1yams5mo

A great many tools like this already exist and are contracted by the major labels. When you post a song to streaming services, it’s checked against the entire major label catalog before actually listing on the service (the technical process is almost certainly not literally this, but it’s something like this, and they’re very secretive about what’s actually happening under the hood).

Have we seen any "ReLU instead of sigmoid-type improvements" recently

Reducing x-risk might be actively harmful

On the side - could you elaborate why you think "relu better than sigmoid" is a "weird trick", if that is implied by this question?

The reason that I thought to be commonly agreed is that it helps with the vanishing gradient problem (this could be shown from the graphs).

ZY6mo4-3

I personally agree with your reflection on suffering risks (including factory farming, systemic injustices, and wars) and the approach to donating to different cause areas. My (maybe unpopular under "prioritizing only 1" type of mindset) thought is: maybe we should avoid prioritizing only one single area (especially collectively), but recognize that in reality there are always multiple issues we need to fight about/solve. Personally we could focus professionally on one issue, and volunteer for/donate to another cause area, depending on our knowledge, inter... (read more)

Rauno's Shortform

ZY6mo30

Yeah that makes sense; the knowledge should still be there, just need to re-shift the distribution "back"

Shortform

ZY6mo*64

Haven't looked too closely at this, but my initial two thoughts:

child consent is tricky.
likely many are foreign children, which may or may not be in the 75 million statistic

It is good to think critically, but I think it would be beneficial to present more evidence before making the claim or conclusion

Rauno's Shortform

ZY6mo*32

This is very interesting, and thanks for sharing.

One thing that jumps out at me is they used an instruction format to prompt base models, which isn't typically the way to evaluate base models. It should be reformatted to a completion type of task. If this is redone, I wonder if the performance of the base model will also increase, and maybe that could isolate the effect further to just RLHF.
I wonder if this has anything to do with also the number of datasets added on by RLHF (assuming a model go through supervised/instruction finetuning first, and th

... (read more)

2p.b.6mo

There was one comment on twitter that the RLHF-finetuned models also still have the ability to play chess pretty well, just their input/output-formatting made it impossible for them to access this ability (or something along these lines). But apparently it can be recovered with a little finetuning.

eggsyntax's Shortform

ZY6mo40

I find it useful sometimes to think about "how to differentiate this term" when defining a term. In this case, in my mind it would be thinking about "reasoning", vs "general reasoning" vs "generalization".

Reasoning: narrower than general reasoning, probably would be your first two bullet points combined in my opinion
Generalization: even more general than general reasoning (does not need to be focused on reasoning). Seems could be the last two bullet points you have, particularly the third
General reasoning (this is not fully thought through): Now that we ta

... (read more)

3eggsyntax6mo

Interesting approach, thanks!

Consider tabooing "I think"

ZY6mo*73

In my observation (trying to avoid I think!), "I think" is intended to (or actually should have been used to) point out perspective differences (which helps to lead to more accurate conclusions, including collaborative and effective communication), rather than confidence. In the latter case of misuse, it would be good if people clarify "this term is about confidence, not perspective in my sentence".

What is malevolence? On the nature, measurement, and distribution of dark traits

Lessons learned from talking to >100 academics about AI safety

True. I wonder for the average people, if being self-aware would at least unconsciously be a partial "blocker" on the next malevolence action they might do, and that may evolve across time too (even if it may take a bit longer than a mostly-good)

ZY6mo30

I highly agree with almost all of these points, and those are very consistent with my observation. As I am still relatively new to lesswrong, one big observation (based on my experience) I still see today, is disconnected concepts, definitions, and or terminologies with the academic language. Sometimes I see terminology that already exists in academia and introducing new concepts with the same name may be confusing without using channels academics are used to. There are some terms that I try to search on google for example, but the only relevant ones are f... (read more)

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)

Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

What would be some concrete examples/areas to work on for human flourishing? (Just saw a similar question on the definition; I wonder what could be some concrete areas or examples)

Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

True; and they would only need to merge up to they reach a "swing state" type of voting distribution.

ZY6mo*30

That would be interesting; on the other hand, why not just merge all the states? I guess it would be a more dramatic change and may be harder to execute and unnecessary in this case.

7Adam Scherlis6mo

It stops being in the interests of CATXOKLA to invite more states once they're already big enough to dominate national electoral politics.

Non-human centric view of existence

Context-dependent consequentialism

Yes, what I meant is exactly "there is no must, but only want". But it feels like a "must" in some context that I am seeing, but I do not recall exactly where. And yeah true, there may be some survival bias.

I agree it is tragedy from human race's perspective, but I think what I meant is from a non-human perspective to view this problem. For example, to an alien who is observing earth, human is just another species that rise up as a dominant species, as a thought experiment.

(On humans prefer to be childless - actually this already slowed down in many countries due to cost of raising a child etc, but yeah this is a digress on my part.)

ZY6mo*10

My two cents:

The system has a fixed goal that it capably works towards across all contexts.
The system is able to capably work towards goals, but which it does, if any, may depend on the context.

From these two above, seems it would be good for you to define/clarify what exactly you mean by "goals". I can see two definitions: 1. goals as in a loss function or objective that the algorithm is optimizing towards, 2. task specific goals like summarize an article, planning. There may be some other goals that I am unaware of, or this is obvious elsewhere in some c... (read more)

Trading Candy

I think that is probably not a good reason to be libertarian in my opinion? Could you also share maybe how much older were your than you siblings? If you are not that far apart, you and your siblings came from the same starting line, distributing is not going to happen in real life economically nor socially even if not libertarian (in real life, where we need equity is when the starting line is not the same and is not able to be changed by choice. A more similar analogy might be some kids are born with large ears, and large ears are favored by the society, and the large eared kids always get more candy). If you are ages apart with you being a lot older, it may make some limited sense to for your parents to re-distribute.

3johnswentworth6mo

To be clear, I don't really think of myself as libertarian these days, though I guess it'd probably look that way if you just gave me a political alignment quiz. To answer your question: I'm two years older than my brother, who is two years older than my sister.

ZY's Shortform

Failures in Kindness

ZY7mo20

I am not quite sure about the writing/examples in computational kindness and responsibility offloading, but I think I feel the general idea.

For computational kindness, I think it is just really the difference in how people prefer to communicate, or making plans it seems, with the example on trip planning. I, for example, personally prefer being offered with their true thoughts - if they are okay with just really anything, or not. Anything is fine as long as that is what they really think or prefer (side talk: I generally think communicating real pref... (read more)

If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?

ZY7mo30

Ah thanks. Do you know why these former rationalists were "more accepting" of irrational thinking? And to be extremely clear, does "irrational" here mean not following one's preference with their actions, and not truth seeking when forming beliefs?

I don't understand either. If it is meant what it meant, this is a very biased perception and not very rational (truth seeking or causality seeking). There should be better education systems to fix that.

ZY7mo*10

On what evidence do I conclude what I think is know is correct/factual/true and how strong is that evidence? To what extent have I verified that view and just how extensively should I verify the evidence?

For this, aside from traditional paper reading from credible sources, one good approach in my opinion is to actively seek evidence/arguments from, or initiate conversations with people who have a different perspective with me (on both side of the spectrum if the conclusion space is continuous).

I am interested in learning more about this, but not sure what "woo" means; after googling, is it right to interpret as "unconventional beliefs" of some sort?

4gilch7mo

It's short for "woo-woo", a derogatory term skeptics use for magical thinking. I think the word originates as onomatopoeia from the haunting woo-woo Theremin sounds played in black-and-white horror films when the ghost was about to appear. It's what the "supernatural" sounds like, I guess. It's not about the belief being unconventional as much as it being irrational. Just because we don't understand how something works doesn't mean it doesn't work (it just probably doesn't), but we can still call your reasons for thinking so invalid. A classic skeptic might dismiss anything associated categorically, but rationalists judge by the preponderance of the evidence. Some superstitions are valid. Prescientific cultures may still have learned true things, even if they can't express them well to outsiders.

Originality vs. Correctness

I personally agree with you on the importance of these problems. But I myself might also be a more general responsible/trustworthy AI person, and I care about other issues outside of AI too, so not sure about a more specific community, or what the definition is for "AI Safety" people.

For funding, I am not very familiar and want to ask for some clarification: by "(especially cyber-and bio-)security", do you mean generally, or "(especially cyber-and bio-)security" caused by AI specifically?

davekasten's Shortform

ZY7mo*10

Does "highest status" here mean highest expertise in a domain generally agreed by people in that domain, and/or education level, and/or privileged schools, and/or from more economically powerful countries etc? It is also good to note that sometimes the "status" is dynamic, and may or may not imply anything causal with their decision making or choice on priorities.

One scenario is "higher status" might correlates with better resources to achieve those statuses, and a possibility is as a result they haven't experienced or they are not subject to many near-ter... (read more)

1davekasten7mo

I mean, functionally all of those things. (Well, minus the country dynamic. Everyone at this event I talked to was US, UK, or Canadian, which is all sorta one team for purposes of status dynamics at that event)

Could randomly choosing people to serve as representatives lead to better government?

Could you define what you mean by "correctness" in this context? I think there might be some nuances into this, in terms of what "correct" means, and under what context

Jimrandomh's Shortform

ZY7mo*36

Based on the words from this post alone -

I think that would depend on what the situation is; in the scenario of price increases, if the business is a monopoly or have very high market power, and the increase is significant (and may even potentially cause harm), then anger would make sense.

What is malevolence? On the nature, measurement, and distribution of dark traits

Thanks! I think the term duration is interesting and creative.

Do you think for the short-term ones there might be pre-studies they need to do for the exact topics they need to learn on? Or maybe could design the short-term ones for topics that can be learnt quickly and solved quickly? I am a little worried about the consistency in policy as well (for example even with work, when a person on a project take vacation, and someone need to cover for them, there are a lot of onboarding docs, and prior knowledge to transfer), but could not find a good way just yet. I will think more about these.

ZY7mo20

Amazingly detailed article covering malevolence, interaction with power, and the other nuances! Have been thinking of exploring similar topics, and found this very helpful. Besides the identified research questions, some of which I highly agree with, one additional question I was wondering is: do self-awareness of one's own malevolence factors help one to limit the malevolence factors? if so how effective would that be? how would this change when they have power?

3Viliam6mo

Probably the effect would be nonlinear, like the evil people would just laugh, the average might get depressed and give up, and the mostly-good would strive to achieve perfection (or conclude that they are already good enough compared to others, and relax their efforts?).

Could randomly choosing people to serve as representatives lead to better government?

Interesting idea, and I think there is a possibility that the responsibility will make the "normal people" make better choices or learn more even though they do not know policy, etc in the first place.

A few questions:

Do you think there is a situation where selected random people do not want to be in office/leadership and want to pursue their own passion/career and thus due to this reason may do a bad job? Is this mandatory?
What are some nuances about population and diversity? (I am not sure yet)

2John Huang7mo

>Do you think there is a situation where selected random people do not want to be in office/leadership and want to pursue their own passion/career and thus due to this reason may do a bad job? Is this mandatory? I think a robust way to design the assembly (or multiple assemblies like with Bouricius's model) is to have many different people serving different term lengths. Some people may serve a term of only a couple days or weeks. Others might serve for years. For short-term service, I would make that mandatory. Everyone is required to come. For long term service, maybe those should be voluntary. As far as incentives go, there's a range of enforcement options for "mandatory" service. Perhaps you can just pay a big fine, as a percentage of your income, as an alternative to service. There probably ought to be mechanisms to defer service so you can time things a bit better with your life circumstances. The typical Citizens' Assembly will also offer benefits such as child care, parental care. A high paying salary will encourage the lower and middle class to participate. I have trouble coming up with ways to help small business owners to participate though. Could a small business owner drop their work for an entire year, even if it was well paid -- especially if the small business is so small there are no managers to cover their role? Perhaps there could be alternatives for them, such as part time work coupled with work-from-home. >What are some nuances about population and diversity? (I am not sure yet) I have yet to hear about a case where Deliberative decision making techniques were tried and failed due to excessive diversity or cultural factors. I'm not an expert on the latest and greatest research here so I may be wrong. I do know that deliberation experiments have been performed all around the world, including East Asia, Africa, and India. An example deliberative poll was performed in Uganda, paper linked here: https://direct.mit.edu/daed/a

Cipolla's Shortform