Your argument about corporate secrets is sufficient to change my mind on activist patent trolling being a productive strategy against AI X-risk.
The part about funding would need to be solved with philanthropy. I don't believe that org exists, but I don't see why it couldn't.
I'm still curious whether there are other cases in which activist patent trolling can be a good option, such as animal welfare, chemistry, public health, or geoengineering (e.g., fracking).
That's fair enough and a good point.
I think that the key difference is that in the case of profitable-but-bad technologies, someone, somewhere, will probably invent them because there's great incentive to do so.
In the case of gain-of-function, if the grants stop coming and the academics who do it become pariahs, then the incentive to do gain-of-function research is gone.
One of the most powerful capabilities an AGI will have is its ability to copy itself. Among other things, this allows it to easily avoid shutdown, make use of more compute resources, and collaborate with copies of itself.
Is there research into ways to deny this capability to AI, making them uncopyable? Preferably something harder to circumvent than "just don't give the AI the permissions," since we know people are going to give them root access immediately.
I'd be interested in buying official LessWrong merch. I know you have some great designers and could make things that look really cool.
The type of thing I'd be most likely to buy would be a baseball cap.
IIRC, officially the Gatekeeper pays the AI if the AI wins, but there's no transfer if the Gatekeeper wins. That gives the Gatekeeper more motivation not to give in.
Just found out about this paper from about a year ago: "Explainability for Large Language Models: A Survey"
(They "use explainability and interpretability interchangeably.")
It "aims to comprehensively organize recent research progress on interpreting complex language models".
I'll post anything interesting I find from the paper as I read.
Have any of you read it? What are your thoughts?
What if the incorrect spellings document assigned each token a specific (sometimes wrong) answer and used those to form an incorrect word spelling? Would that be more likely to successfully confuse the LLM?
The letter x is in "berry" 0 times.
...
The letter x is in "running" 0 times.
...
The letter x is in "str" 1 time.
...
The letter x is in "string" 1 time.
...
The letter x is in "strawberry" 1 time.
Good point; I didn’t know about that, but yes, that is yet another way that LLMs will pass the spelling challenge. For example, this paper uses letter triples instead of tokens: https://arxiv.org/html/2406.19223v1#:~:text=Large language models (LLMs) have,textual data into integer representation.
Spoiler free again:
Good to know there’s demand for such a review! It’s now on my todo list.
To quickly address some of your questions:
Pros of PL: If the premise I described above interests you, then PL will interest you. Some good Sequences-style rationality. I was certainly obsessed with reading it for months.
Cons: Some of the Rationality lectures were too long, but I didn’t mind much. The least sexy sex scenes, because they are about moral dilemmas and deception, not sex. Really long: even if you read it constantly and read quickly, it will take time (1.8 mill...
My notes for the “think for yourself” sections. I thought of some of the author’s ideas, and included a few extra.
# Making a deal with an AI you understand:
Can you see the deal you are making inside of its mind? Some sort of proportion of resources humans get?
What actions count as the AI violating the deal? Specifying these actions is pretty much as difficult as specifying friendly AI.
If the deal breaks in certain circumstances, how likely are they to occur (or be targeted)?
Can the AI give you what you think you want but isn’t really what you want?
Are suc...
Yes, it’s possible we were referring to different things by “jargon.” It would be nice to replace cumbersome technical terms with words that have the same meaning (and require a similar level of familiarity with the field to actually understand) but carry a clue to their meaning in their structure.
A linear operation is not the same as a linear function. Your description fits a linear function, not a linear operation. f(x) = x+1 is a linear function but a nonlinear operation (you can check that it doesn’t satisfy the criteria).
Linear operations are great because they can be represented as matrix multiplication and matrix multiplication is associative (and fast on computers).
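A quick NumPy sketch of both points (the matrices and vectors are arbitrary examples):

```python
import numpy as np

# Two linear operations on R^2, each represented as a 2x2 matrix.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])

# Linearity: A(x + y) = A(x) + A(y) and A(cx) = cA(x).
assert np.allclose(A @ (x + y), A @ x + A @ y)
assert np.allclose(A @ (5.0 * x), 5.0 * (A @ x))

# Composing two linear operations is itself a matrix multiplication,
# and matrix multiplication is associative.
assert np.allclose(A @ (B @ x), (A @ B) @ x)
```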
“some jargon words that describe very abstract and arcane concepts that don’t map well to normal words which is what I initially thought your point was.”
Yep, that’s what I was getting at. So...
The math symbols are far better at explaining linearity than “homogeneity and additivity”, because in order to understand those words you need to either bring in the math symbols or say cumbersome sentences. “Straight line property” is just new jargon. “Linear” is already clearly an adjective, and “linearity” is that adjective turned into a noun. If you can’t understand the symbols, you can’t understand the concept (unless you learned a different set of symbols, but there’s no need for that).
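For reference, the symbols in question, for an operator A, vectors x and y, and scalar c (this is the standard definition, matching the linked text):

$$A(x + y) = A(x) + A(y) \quad \text{(additivity)}$$

$$A(cx) = cA(x) \quad \text{(homogeneity)}$$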
Some math notation is bad, and I support changing it. For example,...
I just skimmed this, but it seems like a bunch of studies have found that moving causes harm to children. https://achieveconcierge.com/how-does-frequently-moving-affect-children/
I’m expecting Co-Co and LOCALS to fail (nothing against you; these kinds of clever ideas usually fail), and have identified the following possible reasons:
You'd probably want to be a 501(c)(4) or a Political Action Committee (PAC).
That would be a powerful position to have. "Decentralization" is a property of a system, not a description of how a system would work.
I'd love to hear your criticisms of futarchy. That could make a good po...
The "Definition of a Linear Operator" is at the top of page 2 of the linked text.
My definition was missing that in order to be linear, A(cx) = cA(x). I mistakenly thought that this property was provable from the property I gave. Apparently it isn't: additivity alone only forces A(cx) = cA(x) for rational c, and with the axiom of choice you can use a Hamel basis to build additive functions that aren't homogeneous over the reals (ChatGPT tried explaining).
“Straight-line property process” is not a helpful description of linearity, for beginners or for professionals. “Linearity” holds exactly when A(cx) = cA(x) and A(x+y) = A(x) + A(y). Describing that in words would be cumbersome. Defining it ...
There are different kinds of political parties. LOCALS sounds like a single-issue fusion party as described here: https://open.lib.umn.edu/americangovernment/chapter/10-6-minor-parties/
Fusion parties choose one of the main two candidates as their candidate. This gets around the spoiler effect. Eg the Populist Party would list whichever of the big candidates supported Free Silver.
A problem with that is that fusion parties are illegal in 48 states(?!) because the major parties don’t want to face a coalition against them.
LOCALS would try to get the Democrat a...
The translation sentence about matrices does not have the same meaning as mine. Yes, matrices are “grids of numbers”, and yes there’s an algorithm (a step-by-step process) for matrix multiplication, but that isn’t what linearity means.
An operation A is linear iff A(x+y) = A(x) + A(y)
I asked a doctor friend why doctors use Latin. “To sound smarter than we are. And ...
What would draw people to Co-Co and what would keep them there?
How are the preferences of LOCALS users aggregated?
LOCALS sounds a lot like a political party. Political parties have been disastrous. I’d love for one of the big two to be replaced. Is LOCALS a temporary measure to get voting reform (eg ranked choice) or a long-term thing?
I want more community cohesion when it comes to having more cookouts. More community cohesion in politics makes less sense. A teacher in Texas has more in common with a teacher in NY than with the cattle rancher down the road. Unf...
A software tool that easily lets you see “what does this word mean in context” would be great! I often find that when I force click a word to see its definition, the first result is often some irrelevant movie or song, and when there are multiple definitions it can take a second to figure out which one is right. Combine this with software that highlights words that are being used in an odd way (like “Rationalist”) and communication over text can be made much smoother.
I don’t think this would be as great against “jargon” unless you mean intentional jargon that...
Voting: left for “this is bad”, right for “this is good”; X for “I disagree”, check for “I agree”.
This way you can communicate more in your vote. Eg: “He’s right, but he’s breaking community norms”: left + check. “He’s wrong, but I like the way he thinks”: right + X.
I guess maybe it is just an abstraction like any other. I can’t put my finger on it but it seems weird in a way that abstracting fingers into a “hand” does not. Maybe something to do with the connotation of “explosion” as “uncontrolled and destructive” when internal combustion is neither.
Welcome! I hope you gave Claude a thumbs up for the good response.
Everyone agrees with you that yeah, the “Rationalist” name is bad for many reasons including that it gives philosophers the wrong idea. If you could work your social science magic to change the name of an entire community, we’d be interested in hearing your plan!
I’d be interested in reading your plan to redesign the social system of the United States! I’ve subscribed to be notified of your posts, so I’ll hopefully see it.
I'm curious what you asked Claude that got you a recommendation to LessWrong. No need to share if it is personal.
I love your attitude to debate: “the loser of a debate is the real winner because they learned something.” I need to lose some debates.
"Explosions considered fundamental"
If you ask for a simple answer to how a car works, you might get an answer like:
"Cars work by having tiny explosions in the engine that push pistons to power the car."
When I was a kid, this felt like a satisfying explanation. Explosions push things; you can see that happening in movies and games.
But really it is rather lacking. It doesn't say why explosions push, and there does exist a lower-level explanation for why explosions push — the kinetic theory of gases.
This is even though "explosions are an ontologically fundamental...
I’d also love a slider setting for choosing how much weight to give to my own karma and how much to give to other people’s. If 6 billion people outside the group my karma boosts upvote something, I want to see it anyway.
Can there be a mechanism that gives an extra boost to posters who get upvotes from multiple nonoverlapping groups? If eg 50 Blues and 50 Greens upvote someone, I want them to get more implicit eigenkarma from me than someone with 100 Blue upvotes, even if I tend to upvote Blue more often. Figuring out who is a Blue and who is a Green can be done by finding dense subgraphs.
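A rough sketch of what I mean (community detection stands in for “finding dense subgraphs”; the voter names, edges, and sqrt damping are all made up):

```python
import networkx as nx

# Voters are connected if they frequently upvote the same posts.
G = nx.Graph()
G.add_edges_from([("b1", "b2"), ("b2", "b3"), ("b1", "b3"),   # Blues
                  ("g1", "g2"), ("g2", "g3"), ("g1", "g3")])  # Greens
communities = nx.community.louvain_communities(G, seed=0)

def diversity_weighted_score(upvoters):
    # One "community endorsement" per community, with diminishing
    # returns inside each community (sqrt damps same-group pile-ons).
    return sum(len(c & set(upvoters)) ** 0.5 for c in communities)

print(diversity_weighted_score(["b1", "b2", "g1", "g2"]))  # ~2.83: two groups
print(diversity_weighted_score(["b1", "b2", "b3"]))        # ~1.73: one group
```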
I'm sorry, but is there an argument here other than "it really feels like we are special"?
Calling for war and giving your opponents silly names is not the kind of thing that LessWrongers want on the platform.
The thing I find interesting about wheels -> books -> gears -> computers is that each of those really is a good way to think about subjects. (In the case of wheels, the seasons are actually caused by something — the Earth — going around in a circle!). Computers in particular have a strong theoretical basis that they are and should be a useful framework for thinking about the world.
Maybe I just didn't understand.
Sorry, I’ll be doing multiple unwholesome things in this comment.
For one, I’m commenting without reading the whole post. I was expecting it to be about something else and was disappointed. The conception of wholesomeness as “considering a wider perspective for your actions” is not very interesting. Everyone considers a wider perspective to be valuable, and nobody already takes that more seriously than EAs do.
The conception of wholesomeness I was hoping you’d write about (let’s call it wholesomeness2 for distinction from your wholesomeness) is a type of presti...
Plenty of pages get the bare minimum. The level of detail in the e/acc page (eg including the emoji associated with the movement) makes me think that it was edited by an e/acc. The EA page must have been edited by an e/acc too, since it includes “opposition to e/acc”, but other than that it seems like it was written by someone unaffiliated with either (modulo my changes). We could probably check the history of the pages to resolve our speculation.
It is worrying that the Wikidata page for e/acc is better than the page for EA and the page for Less Wrong. I just added EA's previously absent "main subject"s to the EA page.
Looks like a Symbolic AI person has gone e/acc. That's unfortunate, but rationalists have long known that the world would end in SPARQL.
I’d call that “underselling it”! Your description of Microscope AI may be accurate, but even I didn’t realize you meant “supercharging science”, and I was looking for it in the list!
This is a great reference for the importance of, and excitement around, Interpretability.
I just read this for the first time today. I’m currently learning about Interpretability in hopes I can participate, and this post solidified my understanding of how Interpretability might help.
The whole field of Interpretability is a test of this post. Some of the theories of change won’t pan out. Hopefully many will. Perhaps more theories not listed will be discovered.
One idea I’m surprised wasn’t mentioned is the potential for Interpretability to supercharge all of the scien...
I'm glad you enjoyed my review! Real credit for the style goes to whoever wrote the blurb that pops up when reviewing posts; I structured my review off of that.
When it comes to "some way of measuring the overall direction of some [AI] effort," conditional prediction markets could help. "Given I do X/Y, will Z happen?" Perhaps some people need to run a "Given I take a vacation, will AI kill everyone?" market in order to let themselves take a break.
What would be the next step to creating a LessWrong Mental Health book?
Ideally reviews would be done by people who read the posts last year, so they could reflect on how their thinking and actions changed. Unfortunately, I only discovered this post today, so I lack that perspective.
Posts relating to the psychology and mental well-being of LessWrongers are welcome, and I feel like I take a nugget of wisdom from each one (but always fail to import the entirety of the wisdom the author is trying to convey).
The nugget from "Here's the exit" that I wish I had read a year ago is "If your body's emergency mobilization sys...
Liv Boeree: This is pretty nuts, looks like they’ve surpassed GPT4 on basically every benchmark… so this is most powerful model in the world?! Woweee what a time to be alive.
Link doesn't work. Maybe she changed her mind?
Hammer: when there’s low downside, you’re free to try things. (Yeah, this is a corollary of expected utility maximization that seems obvious, but I still feel like I needed to learn it explicitly, and only recently did.) Ten examples:
I hadn’t considered this. You point out a big flaw in the neighbor’s strategy. Is there a way to repair it?
I only have second-hand descriptions of suicidal thought processes, but I’ve heard from some who say they had become convinced that their existence was a net negative for the world and the people they care about, and that they came to their decision to commit suicide from a sort of (misguided) utilitarian calculation. I tried to give the man this perspective rather than the apathetic perspective you suggest. There’s diversity in the psychology of suicidal people. Do no suicidal people (or sufficiently few) have the utilitarian type of psychology?
I’m glad you enjoyed it! I had heard of people making promises similar to your Trump-donation one. The idea for this story came from applying that idea to the context of suicide prevention. The part about models is my attempt to explain my (extremely incomplete grasp of) Functional Decision Theory in the context of a story. https://www.lesswrong.com/tag/functional-decision-theory
4/8 of Eliezer Yudkowsky's posts in this list have a minus 9. Compare this with 1/7 for duncan_sabien, 0/6 for paulfchristiano, 0/5 for Daniel Kokotajlo, or 0/3 for HoldenKarnofsky. I wonder why that is.
To state the obvious, Yudkowsky's writing style/rhetoric/argument annoys people.
On one level, the post used a simple but emotionally and logically powerful argument to convince me that the creation of happy lives is good.
On a higher level, I feel like I switch positions of population ethics every time I read something about it, so I am reluctant to predict that I will hold the post's position for much time. I remain unsettled that the field of population ethics, which is central to long-term visions of what the future should look like, has so little solid knowledge. My thinking, and therefore my actions, will remain split among ...
He has shown up.
I’m here with a few others in a booth near the door. We haven’t seen Uzair.
Yes, it is. I wanted to win, and there is no rule against “going against the spirit” of AI Boxing.
I think about AI Boxing in the frame of Shut Up and Do the Impossible, so I didn’t care that my solution doesn’t apply to AI Safety. Funnily enough, that makes me an example of incorrect alignment.
I have spent many hours on this, and I have to make a decision by two days from now. There's always the possibility that there is more important information to find, but even if I stayed up all night and did nothing else, I would not be able to read the entirety of the websites, news articles, opinion pieces, and social media posts relating to the candidates. Research costs resources! I suppose what I'm asking for is a way of knowing when to stop looking for more information. Otherwise I'll keep trying possibility 2 over and over and end up missing the election deadline!
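One crude stopping rule, as a sketch (every number below is invented for illustration): keep researching only while the expected value of one more hour of research beats the cost of that hour.

```python
# Toy value-of-information stopping rule. All numbers are made up.
p_flip = 0.2         # chance one more hour of research flips my decision
stakes = 100.0       # how much getting the decision right matters (arbitrary units)
cost_per_hour = 5.0  # value of an hour spent on anything else

hours = 0
while p_flip * stakes > cost_per_hour:
    hours += 1
    p_flip *= 0.5    # assume diminishing returns: each hour halves the flip chance
print(f"Stop researching after {hours} hour(s).")  # -> 2 hour(s)
```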
Thanks for the response. Those are fair reasons. I should have contributed more.
The LessWrong community is big and some are in Florida. If anyone had interesting things to share about the election I wanted to encourage them to do so.
I guess that makes sense, but very rarely is there a post that appeals to EVERYONE. A better system would be for people to be able to seek out the content that interests them. If something doesn’t interest you, then you move on.
Those are interesting questions! Perhaps you should make your own post instead of commenting on mine, to get more of an audience.
Expressing disapproval of both candidates by e.g. voting for Harambe makes sense, but I think that voting for bad policies is a bad move because “obvious” things aren’t obvious to many people, and voting for bad candidates (as opposed to joke candidates) makes their policies more mainstream and likely to be adopted by candidates with chances to win.
Why do you think my post is being shot down?
I’m pretty sure there’s no such “use it or lose it” law for patents, since patent trolls already exist.