Proposal: if you're a social media or other content based platform, add a long-press to the "share" button which allows you to choose between "hate share" and "love share".
Therefore:
* quick tap: keep the current functionality, you get to send the link wherever / copy to clipboard
* long press and swipe to either hate or love share: you still get to send the link (optionally, the URL has some argument indicating it's a hate / love share, if the link is a redirect through the social media platform)
This would allow users to separate out between things that are...
You're right, this is not a morality-specific phenomenon. I think there's a general formulation of this that just has to do with signaling, though I haven't fully worked out the idea yet.
For example, if in a given interaction it's important for your interlocutor to believe that you're a human and not a bot, and you have something to lose if they are skeptical of your humanity, then there are lots of negative externalities that come from the Internet being filled with indistinguishable-from-human chatbots, irrespective of its morality.
Since you marked as a crux the fragment "absent acceleration they are likely to die some time over the next 40ish years" I wanted to share two possibly relevant Metaculus questions. Both of these seem to suggest numbers longer than your estimates (and these are presumably inclusive of the potential impacts of AGI/TAI and ASI, so these don't have the "absent acceleration" caveat).
OK, agreed that this depends on your views of whether cryonics will work in your lifetime, and of "baseline" AGI/ASI timelines absent your finger on the scale. As you noted, it also depends on the delta between p(doom while accelerating) and baseline p(doom).
I'm guessing there's a decent number of people who think current (and near future) cryonics don't work, and that ASI is further away than 3-7 years (to use your range). Certainly the world mostly isn't behaving as if it believed ASI was 3-7 years away, which might be a total failure of people acting on their beliefs, or it may just reflect that their beliefs are for further out numbers.
Simple math suggests that anybody who is selfish should be very supportive of acceleration towards ASI even for high values of p(doom).
Suppose somebody over the age of 50 thinks that p(doom) is on the order of 50%, and that they are totally selfish. It seems rational for them to support acceleration, since absent acceleration they are likely to die some time over the next 40ish years (since it's improbable we'll have life extension tech in time) but if we successfully accelerate to ASI, there's a 1-p(doom) shot at an abundant and happy eternity.
Possibly some form of this extends beyond total selfishness.
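A rough sketch of the "simple math", under assumptions that are mine rather than anything established: the utility numbers are arbitrary placeholders, the agent is purely selfish, and `p_asi_in_lifetime` is a hypothetical knob for how likely ASI is to arrive while the agent is still alive (1.0 standing in for "accelerate", 0.0 for "no ASI in my lifetime").

```python
def selfish_expected_utility(p_doom, p_asi_in_lifetime,
                             u_eternity=1000.0, u_normal_life=1.0, u_doom=0.0):
    """Expected utility for a purely selfish agent (all utilities are placeholders).

    With probability p_asi_in_lifetime, ASI arrives while the agent is alive:
    doom with probability p_doom, an abundant eternity otherwise.
    With the remaining probability the agent lives out a normal lifespan.
    """
    asi_branch = (1 - p_doom) * u_eternity + p_doom * u_doom
    return p_asi_in_lifetime * asi_branch + (1 - p_asi_in_lifetime) * u_normal_life


def break_even_p_doom(u_eternity=1000.0, u_normal_life=1.0):
    """p(doom) at which certain ASI and no ASI are equally attractive (u_doom = 0)."""
    return 1 - u_normal_life / u_eternity


accelerate = selfish_expected_utility(p_doom=0.5, p_asi_in_lifetime=1.0)
baseline = selfish_expected_utility(p_doom=0.5, p_asi_in_lifetime=0.0)
threshold = break_even_p_doom()

print(accelerate, baseline, threshold)
```

The conclusion is driven almost entirely by how large `u_eternity` is relative to `u_normal_life`, and by the assumption that the no-acceleration baseline has little chance of ASI (or working life extension) arriving in time.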
So, if your ideas have potential important upside, and no obvious large downside, please share them.
What would be some examples of obviously large downside? Something that comes to mind is anything that tips the current scales in a bad way, like some novel research result that directs researchers to more rapid capabilities increase without a commensurate increase in alignment. Anything else?
Immorality has negative externalities which are diffuse, and hard to count, but quite possibly worse than its direct effects.
Take the example of Alice lying to Bob about something, to her benefit and his detriment. I will call the effects of the lie on Alice and Bob direct, and the effects on everybody else externalities. Concretely, the negative externalities here are that Bob is, on the margin, going to trust others in the future less for having been lied to by Alice than he would if Alice had been truthful. So in all of Bob's future interactions, his tr...
Agreed that ultimately everything is reverse-engineered, because we don't live in a vacuum. However, I feel like there's a meaningful distinction between:
1. let me reverse engineer the principles that best describe our moral intuition, and let me allow parsimonious principles to make me think twice about the moral contradictions that our actual behavior often implies, and perhaps even allow my behavior to change as a result
2. let me concoct a set of rules and exceptions that will justify the particular outcome I want, which is often the one that best suits...
The more complex the encoding of a system (e.g. of ethics) is, the more likely it is that it's reverse-engineered in some way. Complexity is a marker of someone working backwards to encapsulate messy object-level judgment into principles. Conversely, a system that flows outward from principles to objects will be neatly packed in its meta-level form.
In linear algebra terms, as long as the space of principles has fewer dimensions than the space of objects, we expect principled systems / rules to have a low-rank representation, with a dimensionality approachi...
What's the cost of keeping stuff around vs discarding it and buying it back again?
When you have some infrequently-used items, you have to decide between keeping them around (default, typically) or discarding them and buying them again later when you need them.
If you keep them around, you clearly lose use of some of your space. Suppose you keep these in your house / apartment. The cost of keeping them around is then proportional to the amount of either surface area or volume they take up. Volume is the appropriate measure to use especially if you have...
This raises the question of what it means to want to do something, and who exactly (or which cognitive system) is doing the wanting.
Of course I do want to keep watching YT, but I also recognize there's a cost to it. So on some level, weighing the pros and cons, I (or at least an earlier version of me) sincerely do want to go to bed by 10:30pm. But, in the moment, the tradeoffs look different from how they appeared from further away, and I make (or, default into) a different decision.
An interesting hypothetical here is whether I'd stay up longer when play t...
I often mistakenly behave as if my payoff structure is binary instead of gradual. I think others do too, and this cuts across various areas.
For instance, I might wrap up my day and notice that it's already 11:30pm, though I'd planned to go to sleep an hour earlier, by 10:30pm. My choice is, do I do a couple of me-things like watch that interesting YouTube video I'd marked as "watch later", or do I just go to sleep ASAP? I often do the former and then predictably regret it the next day when I'm too tired to function well. I've reflected on what's going on i...
What if a major contributor to the weakness of LLMs' planning abilities is that the kind of step-by-step description of what a planning task looks like is content that isn't widely available in common text training datasets? It's mostly something we do silently, or we record in non-public places.
Maybe whoever gets the license to train on Jira data is going to get to crack this first.
Right - successful private companies (like nearly all the hot AI labs) are staying private for far longer (indefinitely?) so this bet will not capture any of the value they create for themselves.
It might also be that AGI is broadly deflationary, in that it will mostly melt moats and, with them, corporate margins (in most cases, except maybe the ones of the first company to roll out AGI).
Daniel Gross' [AGI Trades](https://dcgross.com/agitrades) (in particular the first question under "Markets") comes to mind.
It just seems far from certain to me that this be...
What gives you confidence that much value will accrue to the equity of the companies in those indices?
It seems like, in the past, technological revolutions mostly increase churn and are anti-incumbent in some way e.g. (this may be false in particular, but just to illustrate my argument with a concrete-sounding example) ORCL has over 150k employees whose jobs might get nuked if AGI can painlessly and securely transfer its clients to OSS instead of expensive enterprise solutions.
If I try to think about what's the most incumbent-friendly environment, almost by definition it ought to be one where not much is changing, but you're trying to capture value in the opposite scenario.
(sci-fi take?) If time travel and time loops are possible, would this not be the (general sketch of the) scenario under which it comes into existence:
1. a lab figures out some candidate particles that could be sent back in time, builds a detector for them and starts scanning for them. suppose the particle has some binary state. if the particle is +1 (-1) the lab buys (shorts) stock futures and exits after 5 minutes
2. the trading strategy will turn out to be very accurate and the profits from the trading strategy will be utilized to fund the research required...
Thanks for these references! I'm a big fan, but for some reason your writing sits in the silly under-exploited part of my 2-by-2 box of "how much I enjoy reading this" and "how much of this do I actually read", so I'd missed all of your posts on this topic! I caught up with some of it, and it's far further along than my thinking. On a basic level, it matches my intuitive model of a sparse-ish network of causality which generates a much much denser network of correlation on top of it. I too would have guessed that the error rate on "good" studies would be lower!
Does belief quantization explain (some amount of) polarization?
Suppose people generally do Bayesian updating on beliefs. It seems plausible that most people (unless trained to do otherwise) subconsciously quantize their beliefs -- let's say, for the sake of argument, by rounding to the nearest 1%. In other words, if someone's posterior on a statement is 75.2%, it will be rounded to 75%.
Consider questions that exhibit group-level polarization (e.g. on climate change, or the morality of abortion, or whatnot) and imagine that there is a series of "facts" that...
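A minimal sketch of just the rounding mechanism, not of the polarization argument itself. The likelihood ratio of 1.01 per "fact" and the count of 100 facts are made-up illustrative numbers, and the helper names are hypothetical. It compares an exact Bayesian updater with one who rounds to the nearest 1% after every update.

```python
def bayes_update(prior, likelihood_ratio):
    """One Bayesian update in odds form: posterior odds = prior odds * LR."""
    odds = prior / (1 - prior)
    posterior_odds = odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


def quantize(p):
    """Round a probability to the nearest 1%."""
    return round(p * 100) / 100


def belief_after(prior, likelihood_ratio, n_facts, quantized):
    """Apply n_facts identical weak updates, optionally rounding after each one."""
    belief = prior
    for _ in range(n_facts):
        belief = bayes_update(belief, likelihood_ratio)
        if quantized:
            belief = quantize(belief)
    return belief


exact_belief = belief_after(0.75, 1.01, n_facts=100, quantized=False)
quantized_belief = belief_after(0.75, 1.01, n_facts=100, quantized=True)

print(exact_belief, quantized_belief)
```

In this toy setup each individual update moves the belief by less than half a percentage point, so the rounding step discards it every time: the quantized updater never leaves 75%, while the exact updater drifts well above it. Whether real belief formation works like this is of course the assumption being explored.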
Causality is rare! The usual statement that "correlation does not imply causation" puts them, I think, on deceptively equal footing. It's really more like correlation is almost always not causation absent something strong like an RCT or a robust study set-up.
Over the past few years I'd gradually become increasingly skeptical of claims of causality just by updating on empirical observations, but it just struck me that there's a good first principles reason for this.
For each true cause of some outcome we care to influence, there are many other "measurables" ...
Perhaps that can work depending on the circumstances. In the specific case of a toddler, at the risk of not giving him enough credit, I think that type of distinction is too nuanced. I suspect that in practice this will simply make him litigate every particular application of any given rule (since it gives him hope that it might work) which raises the cost of enforcement dramatically. Potentially it might also make him more stressed, as I think there's something very mentally soothing / non-taxing about bright line rules.
I think with older kids though, it'...
Pretending not to see when a rule you've set is being violated can be optimal policy in parenting sometimes (and I bet it generalizes).
Example: suppose you have a toddler and a "rule" that food only stays in the kitchen. The motivation is that each time food is brought into the living room there is a small chance of an accident resulting in a permanent stain. There's a cost to enforcing the rule as the toddler will put up a fight. Suppose that one night you feel really tired and the cost feels particularly high. If you enforce the rule, it will be much more p...
Agreed with your example, and I think that just means that L2 norm is not a pure implementation of what we mean by "simple", in that it also induces some other preferences. In other words, it does other work too. Nevertheless, it would point us in the right direction frequently e.g. it will dislike networks whose parameters perform large offsetting operations, akin to mental frameworks or beliefs that require unnecessary and reducible artifice or intermediate steps.
Worth keeping in mind that "simple" is not clearly defined in the general case (forget about machine learning). I'm sure lots has been written about this idea, including here.
Regularization implements Occam's Razor for machine learning systems.
When we have multiple hypotheses consistent with the same data (an underdetermined problem) Occam's Razor says that the "simplest" one is more likely true.
When an overparameterized LLM is traversing the subspace of parameters that solve the training set seeking, say, the smallest L2 norm, it's also effectively choosing the "simplest" solution from the solution set, where "simple" is defined as lower parameter norm i.e. more "concisely" expressed.
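A toy illustration of the "smallest norm among all solutions" idea, assuming a single linear constraint instead of a neural network (so this is an analogy, not how an LLM is actually trained). The constraint `w1 + w2 = 2` and the helper names are made up for the example.

```python
def l2_norm(w):
    """Euclidean norm of a parameter vector."""
    return sum(x * x for x in w) ** 0.5


def fits(a, w, b, tol=1e-9):
    """True if w satisfies the single linear constraint dot(a, w) == b."""
    return abs(sum(x * y for x, y in zip(a, w)) - b) <= tol


def min_norm_solution(a, b):
    """Smallest-L2-norm w with dot(a, w) == b: the projection of 0 onto the solution set."""
    scale = b / sum(x * x for x in a)
    return [scale * x for x in a]


a, b = [1.0, 1.0], 2.0           # one equation, two unknowns: many exact solutions
simple = min_norm_solution(a, b)  # the solution an L2 preference picks
offsetting = [101.0, -99.0]       # also fits exactly, via large cancelling parameters

print(simple, l2_norm(simple))
print(offsetting, l2_norm(offsetting), fits(a, offsetting, b))
```

Both vectors fit the "data" perfectly; the norm preference is the only thing that separates the compact one from the one that relies on large cancelling terms.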
In early 2024 I think it's worth noting that deep-learning based generative models (presently, LLMs) have the property of generating many plausible hypotheses, not all of which are true. In a sense, they are creative and inaccurate.
An increasingly popular automated problem-solving paradigm seems to be bolting a slow & precise-but-uncreative verifier onto a fast & creative-but-imprecise (deep learning based) idea fountain, a la AlphaGeometry and FunSearch.
...Today, in a paper published in Nature, we introduce FunSearch, a method to search for new solut
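A minimal sketch of the general propose-and-verify pattern, and explicitly not the method used by AlphaGeometry or FunSearch: a cheap random proposer stands in for the "idea fountain" and an exact arithmetic check stands in for the verifier. The task (finding a nontrivial divisor of 91) and all function names are invented for illustration.

```python
import random


def propose(rng, n):
    """Fast, 'creative', unreliable: guess a candidate divisor at random."""
    return rng.randint(2, n - 1)


def verify(n, candidate):
    """Slow-but-precise stand-in: accept only candidates that exactly divide n."""
    return n % candidate == 0


def propose_and_verify(n, max_tries=10_000, seed=0):
    """Keep sampling proposals until one passes verification, or give up."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = propose(rng, n)
        if verify(n, candidate):
            return candidate
    return None


divisor = propose_and_verify(91)
print(divisor)
```

The proposer is allowed to be wrong almost all the time; correctness of the final answer comes entirely from the verifier, which is why the pattern depends on verification being cheap relative to generation.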
Upon reflection, the only way this would work is if verification were easier than deception, so to speak. It's not obvious that this is the case. Among humans, for instance, it seems very difficult for a more intelligent person to tell, in the general case, whether a less intelligent person is lying or telling the truth (unless the verifier is equipped with more resources and can collect evidence and so on, which is very difficult to do about some topics such as the verified's internal state) so, in the case of humans, in general, deception seems easier than verification.
So perhaps the daisy-chain only travels down the intelligence scale, not up.
To be sure, let's say we're talking about something like "the entirety of published material" rather than the subset of it that comes from academia. This is meant to very much include the open source community.
Very curious, in what way are most CS experiments not replicable? From what I've seen in deep learning, for instance, it's standard practice to include a working github repo along with the paper (I'm sure you know lots more about this than I do). This is not the case in economics, for instance, just to pick a field I'm familiar with.
I wonder how much of the tremendously rapid progress of computer science in the last decade owes itself to structurally more rapid truth-finding, enabled by:
There are other reasons to expect rapid progress in CS (compared to, say, electrical engineering) but I wonder how much is explained by this replication dynamic.
It feels like (at least in the West) the majority of our ideation about the future is negative, e.g.
Are we at a historically negative point in the balance of "good vs bad ideation about the future" or is this type of collective pessimistic ideation normal?
If the balance towards pessimism is typical, is the promise of salvation in the afterlife in e.g. Christianity a rare example of a powerful and salient positive ideation about our futures (conditioned on some behavior)?
From personal observation, kids learn text (say, from a children's book, and from songs) back-to-front. That is, the adult will say all but the last word in the sentence, and the kid will (eventually) learn to chime in to complete the sentence.
This feels correlated to LLMs learning well when tasked with next-token prediction, and those predictions being stronger (less uniform over the vocabulary) when the preceding sequences get longer.
I wonder if there's a connection to having rhyme "live" in the last sound of each line, as opposed to the first.
Kind of related Quanta article from a few days ago: https://www.quantamagazine.org/what-your-brain-is-doing-when-youre-not-doing-anything-20240205/
For what it's worth (perhaps nothing) in private experiments I've seen that in certain toy (transformer) models, task B performance gets wiped out almost immediately when you stop training on it, in situations where the two tasks are related in some way.
I haven't looked at how deep the erasure is, and whether it is far easier to revive than it was to train it in the first place.
Reflecting on the particular ways that perfectionism differs from the optimal policy (as someone who suffers from perfectionism) and looking to come up with simple definitions, I thought of this:
So, perfectionism will be maximally costly in an environment where you have l...
The parallel to athlete pre game rituals is an interesting one, but I guess I'd be interested in seeing the comparison between the following two groups:
group A: is told to meditate the usual way for 30 minutes / day, and does
group B: is told to just sit there for 30 minutes / day, and does
So both of the groups considered are sitting quietly for 30 minutes, but one group is meditating while the other is just sitting there. In this comparison, we'd be explicitly ignoring the benefit from meditation which acts via the channel of just making it more likely you actually sit there quietly for 30 minutes.
Is meditation provably more effective than "forcing yourself to do nothing"?
Much like sleep is super important for good cognitive (and, of course, physical) functioning, it's plausible that waking periods of not being stimulated (i.e. of boredom) are very useful for unlocking increased cognitive performance. Personally I've found that if I go a long time without allowing myself to be bored, e.g. by listening to podcasts or audiobooks whenever I'm in transition between activities, I'm less energetic, creative, sharp, etc.
The problem is that as a prescriptio...
To be sure, I'm not an expert on the topic.
Declines in male fertility I think are regarded as real, though I haven't examined the primary sources.
Regarding female fertility, this report from Norway outlines the trend that I vaguely thought was representative of most of the developed world over the last 100 years.
Female fertility is trickier to measure, since female fertility and age are strongly correlated, and women have been having kids later, so it's important (and likely tricky) to disentangle this confounder from the data.
Infertility rates are rising and nobody seems to quite know why. Below is what feels like a possible (trivial) explanation that I haven't seen mentioned anywhere.
I'm not in this field personally so it's possible this theory is out there, but asking GPT about it doesn't yield the proposed explanation: https://chat.openai.com/share/ab4138f6-978c-445a-9228-674ffa5584ea
Toy model:
Thanks for the thoughtful reply. I read the fuller discussion you linked to and came away with one big question which I didn't find addressed anywhere (though it's possible I just missed it!)
Looking at the human social instinct, we see that it indeed steers us towards not wanting to harm other humans, but it weakens when extended to other creatures, somewhat in proportion to their difference from humans. We (generally) have lots of empathy for other humans, less so for apes, less so for other mammals (who we factory farm by the billions without most people...
This is drifting a bit far afield from the neurobio aspect of this research, but do you have an opinion about the likelihood that a randomly sampled human, if endowed with truly superhuman powers, would utilize those powers in a way that we'd be pleased to see from an AGI?
It seems to me like we have many salient examples of power corrupting, and absolute power corrupting to a great degree. Understanding that there's a distribution of outcomes, do you have an opinion about the likelihood of benevolent use of great power, among humans?
This is not to say that...
Understood, and agreed, but I'm still left wondering about my question as it pertains to the first sigmoidal curve that shows STEM-capable AGI. Not trying to be nitpicky, just wondering how we should reason about the likelihood that the plateau of that first curve is not already far above the current limit of human capability.
A reason to think so may be something to do with irreducible complexity making things very hard for us at around the same level that it would make them hard for a (first-gen) AGI. But a reason to think the opposite would be that we ha...
As a result, rather than indefinite and immediate exponential growth, I expect real-world AI growth to follow a series of sigmoidal curves, each eventually plateauing before different types of growth curves take over to increase capabilities based on different input resources (with all of this overlapping).
Hi Andy - how are you gauging the likely relative proportions of AI capability sigmoidal curves relative to the current ceiling of human capability? Unless I'm misreading your position, it seems like you are presuming that the sigmoidal curves will...
Might LLMs help with this? You could have a 4.3 million word conversation with an LLM (with longer context windows than what's currently available) which could then, in parallel, have similarly long conversations with arbitrarily many members of the organization, adequately addressing specific confusions individually, and perhaps escalating novel confusions to you for clarification. In practice, until the LLMs become entertaining enough, members of the organization may not engage for long enough, but perhaps this lack of seductiveness is temporary.
There’s a particular type of cognitive failure that I reliably experience, which seems like a pure kind of misconfiguration of the mind, and which I've found very difficult to will myself to not experience, which feels like some kind of fundamental limitation.
The quickest way to illustrate this is with an example: I'm playing a puzzle game that requires ordering 8 letters into a word, and I'm totally stuck. As soon as I look at a hint of what the first letter is, I can instantly find the word.
This seems wrong. In theory, I expect I can just iterate through...