Agreed code as coordination mechanism
Code nowadays can do lots of things, from buying items to controlling machines. This makes code a possible coordination mechanism: if multiple people can agree on what code should run in particular scenarios, that code can take the coordinated actions on their behalf.
This would require moving away from the “one person committing code and another person reviewing” code model.
This could start with many people reviewing the code, people could write their own t...
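A minimal sketch of the "many people agree on what code runs" idea, not a description of any existing system. Everything named here (`AgreedAction`, `threshold`, the `decide(scenario)` convention) is a hypothetical choice made for illustration: each participant approves one exact version of the code by its hash, and the code runs only once enough approvals for that version are in.

```python
import hashlib
from dataclasses import dataclass, field


def code_digest(source: str) -> str:
    """Return a SHA-256 hex digest identifying one exact version of the code."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


@dataclass
class AgreedAction:
    """Code that may run only after enough participants approve this exact version."""
    source: str
    participants: frozenset
    threshold: int  # hypothetical rule: how many approvals count as agreement
    approvals: set = field(default_factory=set)

    def approve(self, person: str, digest: str) -> None:
        # An approval counts only if it names the digest of the current code.
        if person not in self.participants:
            raise ValueError(f"{person} is not a participant")
        if digest != code_digest(self.source):
            raise ValueError("approval refers to a different version of the code")
        self.approvals.add(person)

    def is_agreed(self) -> bool:
        return len(self.approvals) >= self.threshold

    def run(self, scenario: dict):
        """Execute the agreed code, which must define decide(scenario)."""
        if not self.is_agreed():
            raise PermissionError("not enough approvals to run this code")
        namespace: dict = {}
        exec(self.source, namespace)  # illustration only: no sandboxing at all
        return namespace["decide"](scenario)
```

Tying approvals to a digest means that any change to the code, however small, invalidates earlier approvals. A real system would also need sandboxing and authenticated approvals, both of which this sketch leaves out.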
Looks like someone has worked on this kind of thing for different reasons https://www.worlddriven.org/
What would a "qualia-first-calibration" app look like?
Or, maybe: "metadata-first calibration"
The thing with putting probabilities on things is that often, the probabilities are made up. And the final probability throws away a lot of information about where it actually came from.
I'm experimenting with primarily focusing on "what are all the little-metadata-flags associated with this prediction?". I think some of this is about "feelings you have" and some of it is about "what do you actually know about this topic?"
The sort of app I'm imagining would he...
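One way the data model for such an app might look, as a rough sketch only. The split into `feeling_flags` and `knowledge_flags` mirrors the two kinds of metadata mentioned above; the class and function names are hypothetical, and the flags are free-form strings because no particular set of flags is assumed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Prediction:
    """A prediction whose metadata flags come first; the number is optional."""
    claim: str
    feeling_flags: List[str] = field(default_factory=list)    # "feelings you have"
    knowledge_flags: List[str] = field(default_factory=list)  # "what you actually know"
    probability: Optional[float] = None  # may be left blank entirely

    def flags(self) -> List[str]:
        """All flags, tagged with the kind of metadata they represent."""
        return ([f"feeling: {f}" for f in self.feeling_flags]
                + [f"knowledge: {k}" for k in self.knowledge_flags])


def group_by_flag(predictions: List[Prediction]) -> dict:
    """Map each flag to the claims carrying it, so a flag can be reviewed later."""
    groups: dict = {}
    for p in predictions:
        for flag in p.flags():
            groups.setdefault(flag, []).append(p.claim)
    return groups
```

Grouping by flag rather than by probability bucket is the design choice here: once predictions resolve, you could check how often predictions carrying a given flag came true, without ever having reduced them to a single number.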
"what are all the little-metadata-flags associated with this prediction?"
Some metadata flags I associate with predictions:
So the usual refrain from Zvi and others is that the specter of China beating us to the punch with AGI is not real because of limits on compute, etc. I think Zvi has tempered his position on this in light of Meta's promise to release the weights of its 400B+ model. Now there is word that SenseTime just released a model that beats GPT-4 Turbo on various metrics. Of course, maybe Meta chooses not to release its big model, and maybe SenseTime is bluffing--I would point out though that Alibaba's Qwen model seems to do pretty okay in the arena...anyway, my point is that I don't think the "what if China" argument can be dismissed as quickly as some people on here seem to be ready to do.
Yes, yes. Probably not. And they already have a Sora clone called Vidu, for heaven's sake.
We spend all this time debating: should greedy companies be in control, should government intervene, will intervention slow progress to the good stuff: cancer cures, longevity, etc. All of these arguments assume that WE (which I read as a gloss for the West) will have some say in the use of AGI. If the PRC gets it, and it is as powerful as predicted, these arguments become academic. And this is not because the Chinese are malevolent. It's because AGI would fall into ...
I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:
Thank you for detailing your thoughts. Some differences for me:
My current main cruxes:
If there is reasonable consensus on any one of those, I'd much appreciate knowing about it. Otherwise, I think these should be research priorities.
I offer not consensus, but my own opinions:
Will AI get takeover capability? When?
0-5 years.
Single ASI or many AGIs?
There will be a first ASI that "rules the world" because its algorithm or architecture is so superior. If there are further ASIs, that will be because the first ASI wants there to be.
Will we solve technical alignment?
Contingent.
Value alignment, intent alignment, or CEV?
For an ASI you need the equivalent of CEV: values complete enough to govern an entire transhuman civilization.
Defense>offense or offense>defense?
Of...
AGI doom by noise-cancelling headphones:
ML is already used to train what sound-waves to emit to cancel those from the environment. This works well with constant high-entropy sound waves easy to predict, but not with low-entropy sounds like speech. Bose or Soundcloud or whoever train very hard on...
FWIW it was obvious to me
I've found an interesting "bug" in my cognition: a reluctance to rate subjective experiences on a subjective scale useful for comparing them. When I fuzz this reluctance against many possible rating scales, I find that it seems to arise from the comparison-power itself.
The concrete case is that I've spun up a habit tracker on my phone and I'm trying to build a routine of gathering some trivial subjective-wellbeing and lifestyle-factor data into it. My prototype of this system includes tracking the high and low points of my mood through the day as recalled ...
I'm not alexithymic; I directly experience my emotions and have, additionally, introspective access to my preferences. However, some things manifest directly as preferences which, I have been shocked to realize in my old age, were in fact emotions all along. (In rare cases these are even stronger than the directly-felt ones, despite reliably seeming on initial inspection to be simply neutral metadata.)
Here's an example for you: I used to turn the faucet on while going to the bathroom, thinking it was due simply to having a preference for somewhat-masking the sound of my elimination habits from my housemates, then one day I walked into the bathroom listening to something-or-other via earphones and forgetting to turn the faucet on only to realize about halfway through that apparently I actually didn't much care about such masking, previously being able to hear myself just seemed to trigger some minor anxiety about it I'd failed to recognize, though its ab...
I'm against intuitive terminology [epistemic status: 60%] because it creates the illusion of transparency; opaque terms make it clear you're missing something, but if you already have an intuitive definition that differs from the author's it's easy to substitute yours in without realizing you've misunderstood.
I like the rough thoughts way though. I'm not here to like read a textbook.
Nathan and Carson's Manifold discussion.
As of the last edit my position is something like:
"Manifold could have handled this better, so as not to force everyone with large amounts of mana to have to do something urgently, when many were busy.
Beyond that they are attempting to satisfy two classes of people:
To this end, and modulo the above hassle this decision is good.
It is unclear to me whether there...
Nevertheless lots of people were hassled. That has real costs, both to them and to you.
Various sailors made important discoveries back when geography was cutting-edge science. And they don't seem particularly bright.
Vasco da Gama discovered that Africa was circumnavigable.
Columbus was wrong about the shape of the Earth, and he discovered America. He died convinced that his newly discovered islands were just off the coast of Asia, so that's a negative sign for his intelligence (or a positive sign for his arrogance, which he had in plenty.)
Cortez discovered that the Aztecs were rich and easily conquered.
Of course, lots of other wou...
I expect large parts of interpretability work could be safely automatable very soon (e.g. GPT-5 timelines) using (V)LM agents; see A Multimodal Automated Interpretability Agent for a prototype.
Notably, MAIA (GPT-4V-based) seems approximately human-level on a bunch of interp tasks, while (overwhelmingly likely) being non-scheming (e.g. current models are bad at situational awareness and out-of-context reasoning) and basically-not-x-risky (e.g. bad at ARA).
Given the potential scalability of automated interp, I'd be excited to see plans to use large amo...
Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of things.
This was indeed my impression (except for potentially using steering vectors, which I think are mentioned in one of the sections in 'Catching AIs red-handed'), but I think not using any internals might be overconservative / might increase the monitoring / safety tax too much (I think this is probably true more broadly of the current control agenda framing).
Yeah. It's possible to give quite accurate definitions of some vague concepts, because the words used in such definitions also express vague concepts. E.g. "cygnet" - "a young swan".
Today I learned that being successful can involve feelings of hopelessness.
When you are trying to solve a hard problem, where you have no idea if you can solve it, let alone if it is even solvable at all, your brain makes you feel bad. It makes you feel like giving up.
This is quite strange, because most of the time when I am in such a situation and manage to make a real effort anyway, I seem to always surprise myself with how much progress I manage to make. Empirically, this feeling of hopelessness does not seem to track the actual likelihood that you will completely fail.
I would highly recommend getting someone else to debug your subconscious for you. At least it worked for me. I don’t think it would be possible for me to have debugged myself.
My first therapist was highly directive. He’d say stuff like “Try noticing when you think X, and asking yourself what happened immediately before that. Report back next week.” And listing agenda items and drawing diagrams on a whiteboard. As an engineer, I loved it. My second therapist was more in the “providing supportive comments while I tal...
"List sorting does not play well with few-shot" mostly doesn't replicate with davinci-002.
When using length-10 lists (it crushes length-5 no matter the prompt), I get:
So few-shot hurts, but the fancy prompt does not seem to help. Code here.
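The linked code is not reproduced here. As a rough sketch of how a comparison like this might be set up (not the author's actual harness): the prompt wording below is made up, and `complete` is a hypothetical callable that wraps whatever model is being tested and returns its text completion. A third prompt builder for the "fancy prompt" could be plugged in the same way.

```python
import random


def zero_shot_prompt(xs):
    """Plain instruction followed by the list to sort (wording is an assumption)."""
    return f"Sort the list in ascending order.\nList: {xs}\nSorted:"


def few_shot_prompt(xs, examples):
    """Prepend solved examples, then ask for the target list in the same format."""
    shots = "".join(f"List: {e}\nSorted: {sorted(e)}\n\n" for e in examples)
    return f"{shots}List: {xs}\nSorted:"


def parse_list(completion):
    """Parse the first line of a completion like ' [1, 2, 3]' into ints, else None."""
    text = completion.strip()
    line = text.splitlines()[0] if text else ""
    try:
        return [int(tok) for tok in line.strip("[] ").split(",")]
    except ValueError:
        return None


def accuracy(complete, make_prompt, lists):
    """Fraction of lists for which the model's answer is exactly the sorted list."""
    hits = sum(parse_list(complete(make_prompt(xs))) == sorted(xs) for xs in lists)
    return hits / len(lists)


def make_lists(n, length=10, seed=0):
    """Random integer lists; length 10 matches the setting discussed above."""
    rng = random.Random(seed)
    return [[rng.randrange(100) for _ in range(length)] for _ in range(n)]
```

Scoring by exact match on the parsed list is one possible choice; a stricter or looser metric would change the numbers but not the structure of the comparison. Usage would look like `accuracy(complete, zero_shot_prompt, make_lists(50))` against `accuracy(complete, lambda xs: few_shot_prompt(xs, make_lists(3, seed=1)), make_lists(50))`.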
I'm interested if anyone knows another case where a fancy prompt increases performance more than few-shot prompting, where a fancy prompt is a prompt that does not contain information that a human would use to solve the task. ...
American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline: https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/
Classic type of argument-gone-wrong (also IMO a way autistic 'hyperliteralism' or 'over-concreteness' can look in practice, though I expect that isn't always what's behind it): Ashton makes a meta-level point X based on Birch's meta point Y about object-level subject matter Z. Ashton thinks the topic of conversation is Y and Z is only relevant as the jumping-off point that sparked it, while Birch wanted to discuss Z and sees X as only relevant insofar as it pertains to Z. Birch explains that X is incorrect with respect to Z; Ashton, frustrated, reiterates ...
Meta/object level is one possible mixup but it doesn't need to be that. Alternative example, is/ought: Cedar objects to thing Y. Dusk explains that it happens because Z. Cedar reiterates that it shouldn't happen, Dusk clarifies that in fact it is the natural outcome of Z, and we're off once more.
If you use uBlock (or Adblock, or AdGuard, or anything else that uses EasyList syntax), you can add the custom rules
lesswrong.com##.NamesAttachedReactionsCommentBottom-footerReactionsRow
lesswrong.com##.InlineReactHoverableHighlight-highlight:remove-class(InlineReactHoverableHighlight-highlight)
which will remove the reaction section underneath comments and the highlights corresponding to those reactions.
The former of these you can also do through the element picker.
decision theory is no substitute for utility function
some people, upon learning about decision theories such as LDT and how it cooperates on problems such as the prisoner's dilemma, end up believing the following:
it's possible that this is true for some people, but in general i expect that to be a mistaken anal...