All of michael_mjd's Comments + Replies

I think the state handling child-rearing is the long-term solution. The need for new people is a society-wide problem, not ultimately one of personal responsibility. Of course, people should still be free to do it on their own if they want. It will be weird that not everyone will have traditional parents, but I think we can figure it out. Maybe a mandatory or highly incentivized big brother/sister program would help make it more nurturing.

Awesome ideas! These are some of the things missing for LLMs to have economic impact. Companies expected them to simply automate certain jobs, but that's an all-or-nothing solution that has never worked historically (until it eventually does, but we're not there yet).

One idea I had when reading Scott Aaronson's Reading Burden (https://scottaaronson.blog/?p=8217) is that people with interesting opinions and somewhat of a public presence have a TON of reading to do, not just to keep up with current events, but to observe people's reactions an... (read more)

I'll probably get disagree points, but I wanted to share my reaction: I honestly don't mind the AI's output. I read it all and think it's just an elaboration of what you said. The only problem I noticed is that it's too long.

Then again, I'm not an amazing writer, and my critical skills aren't so great for critiquing style. I will admit I rarely use assistance, because I have a tight set of points I want to include, and explaining them all to the AI is almost the same as writing the post itself.

Brief comments on what's bad about the output:

The instruction is to write an article arguing that AI-generated posts suffer from verbosity, hedging, and unclear trains of thought. But ChatGPT makes that complaint in a single sentence in the first paragraph and then spends 6 paragraphs adding a bunch of its own arguments:

  1. that the "nature of conversation itself" draws value from "human experience, emotion, and authenticity" that AI content replaces with "a hollow imitation of dialogue"
  2. that AI content creates "an artificial sense of expertise," i.e. that a du
... (read more)
5Dagon
Thank you for saying this!  It's easy to have a very limited typical-mind-fallacy view of LessWrong readers, and hearing about preferences very different from my own is extremely important. Depending on your skill with writing clear, concise English, this may be true.  For many, it may be that the effort level is the same between using AI well and just writing it yourself, but the effort type is different, and the quality is improved.  I think the potential value of LLM-assisted writing is very high, but it requires similar levels of clarity and attention to detail either way.  Low-effort posts will remain low-value, high-effort posts could get quite a boost.

Thanks for this post! I have always been annoyed when on Reddit or even here, the response to poverty always goes back to, "but poor people have cell phones!" It all comes down to freedom -- the number of meaningfully distinct actions a person can take in the world to accomplish their goals. If there are few real alternatives, and one's best options all involve working until exhaustion, it is not true freedom.

I agree, the poverty-restoring equilibrium is probably more complex than UBI -- maybe it's part of Moloch. I think the rents increasing by the UBI ... (read more)

Not my worst prediction, given the latest news!

2O O
I predict the move to Texas will be largely fake and just whining to get CA politicians to listen to his policy suggestions. They will still have a large office in California. 

That's fair. Here are some things to consider:

1 - I think 2017 was not that long ago. My hunch is that the low-level architecture of the network itself is not a bottleneck yet; I'd lean more on training procedures and algorithms. I'd point to RLHF and MoE as significant developments, and those are even more recent.

2 - I give maybe a 30% chance of a stall, in the case that little commercial disruption comes of LLMs. I think there will still be enough research going on at the major labs, and even universities at a smaller scale give a decent chance at efficiency gain... (read more)

2eggsyntax
Both of those seem plausible, though the second point seems fairly different from your original 'timelines are fundamentally driven by scale and compute'.

I think it is plausible, but not obvious, that large language models have a fundamental issue with reasoning. Even if this is the case, however, I don't think it greatly impacts timelines. Here is my thinking:

I think timelines are fundamentally driven by scale and compute. We have a lot of smart people working on the problem, and there are a lot of obvious ways to address these limitations. Of course, given how research works, most of these ideas won't work, but I am skeptical of the idea that such a counter-intuitive paradigm shift is needed that nobody has e... (read more)

3eggsyntax
Two possible counterarguments:
* I've heard multiple ML researchers argue that the last real breakthrough in ML architecture was transformers, in 2017. If that's the case, and if another breakthrough of that size is needed, then the base rate maybe isn't that high.
* If LLMs hit significant limitations, because of the reasoning issue or because of a data wall, then companies & VCs won't necessarily keep pouring money into ever-bigger clusters, and we won't get the continued scaling you suggest.

Is there a post in the Sequences about when it is justifiable not to go down a rabbit hole? It's a fairly general question, but the specific context is a tale as old as time. My brother, who has been an atheist for decades, moved to Utah. After 10 years, he now asserts that he was wrong and that his "rigorous pursuit" of verifying with logic and his own eyes leads him to believe the Bible is literally true. I worry about his mental health so I don't want to debate him, but felt like I should give some kind of justification for why I'm not personally ... (read more)

3kromem
If your brother has a history of being rational and evidence-driven, you might encourage him to spend some time lurking on /r/AcademicBiblical on Reddit. They require citations for each post or comment, so he may be frustrated if he tries to participate, especially if in the midst of a mental health crisis. But lurking would be very informative very quickly. I was a long-time participant there before leaving Reddit, and it's a great place for evidence-driven discussion of the texts. It's a mix of atheists, Christians, Jews, Muslims, Norse pagans, etc. (I'm an agnostic myself who strongly believes we're in a simulation, so it really was all sorts there.) Might be a healthy reality check to apologist literalism, even if not necessarily disrupting a newfound theological inclination. The nice thing about a rabbit hole is that, while not always, it's often the case that someone else has traveled down whatever one you aren't up for descending into. (Though I will say in its defense, that particular field is way more interesting than you'd ever think if you never engaged with the material through an academic lens. There are a lot of very helpful lessons in critical analysis wrapped up in the field, given the strong anchoring and survivorship biases and how that's handled both responsibly and irresponsibly by different camps.)
2gilch
Theism is a symptom of epistemic deficiency. Atheism follows from epistemic sufficiency, but not all atheists are rational or sane. The epistemically virtuous do not believe on insufficient evidence, nor ignore or groundlessly dismiss evidence relevant to beliefs they hold. That goes for both of you. The Litany of Tarsky is the correct attitude for a rationalist, and it's about not thumbing the scales. If your brother were sane (to rationalist standards), he would not hold such a belief, given the state of readily available evidence. If he hasn't figured this out, it's either because he's put his thumb on the scales or refuses to look. Organized religions (that have survived) teach their adherents not to look (ironically), and that it is virtuous to thumb the scales (faith), and that is something they have in common with cults, although not always to the same degree. These tactics are dark arts—symmetric weapons, that can promote any other beliefs (false or otherwise) just as easily. If you feel like talking to him about it, but don't want it to devolve into a debate, Street Epistemology is a pretty good approach. It can help dislodge irrational beliefs without attacking them directly, by instead promoting better epistemics (by Socratically poking holes in bad epistemics). To answer your direct question, I think Privileging the Hypothesis is pretty relevant. Einstein's Arrogance goes into more detail about the same key rationality concept of locating the hypothesis.
5nim
More concrete than your actual question, but there's a couple options you can take:
* acknowledge that there's a form of social truth whereby the things people insist upon believing are functionally true. For instance, there may be no absolute moral value to criticism of a particular leader, but in certain countries the social system creates a very unambiguous negative value to it. Stick to the observable -- if he does an experiment, replicate that experiment for yourself and share the results. If you get different results, examine why. IMO, attempting in good faith to replicate whatever experiments have convinced him that the world works differently from how he previously thought would be the best steelman for someone framing religion as rationalism.
* There is of course the "which bible?" question. Irrefutable proof of the veracity of the old testament, if someone had it, wouldn't answer the question of which modern religion incorporating it is "most correct".
* It's entirely valid and consistent with rationalism to have the personal preference to not accept any document as fully and literally true. If you can gently find out how he handles the internal contradictions (https://en.wikipedia.org/wiki/Internal_consistency_of_the_Bible), you've got a ready-made argument for taking some things figuratively.

And as unsolicited social advice, distinct from the questions of rationalism -- don't strawman him into someone who criticizes your atheism until he as an actual human tells you what if any actual critiques he has. That's not nice. What is nice is to frame it as a harm reduction option, because organized religion can be great for some people with mental health struggles, and tell him the truth about what you see in his current behavior that you like and support. For instance if his church gets him more involved with the community, or encourages him to do more healthy behaviors or less unhealthy ones, maintain common ground by endorsing the outcomes of his bel

I would pay to see this live at a bar or one of those county fairs (we had a GLaDOS cover band once, so it's not out of the question).

If we don't get a song like that, take comfort that GLaDOS's songs from the Portal soundtrack are basically the same idea as the Sydney reference. Link: https://www.youtube.com/watch?v=dVVZaZ8yO6o

Let me know if I've missed something, but it seems to me the hard part is still defining harm. In the one case, where we use the model to calculate the probability of harm, if it has goals, it may be incentivized to minimize that probability. In the case where we have separate auxiliary models whose goal is to actively look for harm, we have a deceptively adversarial relationship between them: the optimizer can try to fool the harm-finding LLMs. In fact, in the latter case, I'm imagining models which do a very good job at always finding some ... (read more)
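To make that adversarial worry concrete, here is a toy numerical sketch of my own (all quantities are made up): if the optimizer's reward is even mildly correlated with true harm, and the harm-finding model only sees a noisy proxy of harm, then picking the highest-reward action that passes the harm filter systematically selects actions whose harm was under-estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true_harm = rng.uniform(0, 1, n)                        # the harm we actually care about
reward = 0.7 * true_harm + 0.3 * rng.uniform(0, 1, n)   # toy assumption: risky plans tend to score well
proxy_harm = true_harm + rng.normal(0, 0.3, n)          # the harm-finding model sees a noisy estimate

allowed = np.flatnonzero(proxy_harm < 0.5)              # actions the auxiliary model signs off on

random_pick = rng.choice(allowed)                       # an unoptimized agent
optimized_pick = allowed[np.argmax(reward[allowed])]    # an optimizer searching against the filter

print("mean true harm of allowed actions:   ", true_harm[allowed].mean().round(2))
print("true harm of a random allowed action:", true_harm[random_pick].round(2))
print("true harm of the optimized action:   ", true_harm[optimized_pick].round(2))
```

The optimizer never "intends" to deceive here; selection pressure against the noisy filter is enough to produce the adversarial effect.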

That's fair; I read the post but did not re-read it, and asking for "more" examples out of such a huge list seems like asking a bit too much. Still, I find the process of finding these examples somewhat fun, and for whatever reason had not found many of them too shocking, so I felt the instinct to keep searching.

Dissociative identity disorder would be an interesting case; I have heard there was much debate on whether it was real. Since you know someone, I assume it's not exactly like you see in movies, and probably falls on a spectrum as discussed in this post?

One fear I have is that the open source community will come out ahead, and push for greater weight sharing of very powerful models.

Edit: To make this more specific, I mean that the open source community will become more attractive, because they will say: you cannot rely on individual companies whose models may or may not be available; you must build on top of open source. Related tweet:

https://twitter.com/ylecun/status/1726578588449669218

Whether their plan works or not, dunno.

One thing that would help me, not sure if others agree, would be some more concrete predictions. I think the historical examples of autism and being gay make sense, but they are quite normalized now, such that one can almost say, "That was previous generations. We are open-minded and rational now." What are some new applications of this logic that would surprise us? Are these omitted due to some info hazard? Surely we can find some that are not. I am honestly having a hard time coming up with them myself, but here goes:

  • There are more regular people who believe AI
... (read more)
5ChristianKl
Crime does not need to be perfect to go undetected. We have a good idea of the base rates of murder and burglary, so I would expect that most people do not know one of those. Burglary also doesn't seem to be a one-off crime, but is mostly done by organized gangs. Embezzlement, on the other hand, happens in different strengths: plenty of employees embezzle pens and paper from their employers. Theft from supermarkets would be one example; 1/4 of Britons admit to stealing at self-service checkouts. You likely know more thieves than you think. Different kinds of fraud also happen more often relative to their visibility. That likely includes things like false data in scientific publications.
5CronoDAS
It's probably not actually that hard to get away with one burglary, but the more crimes you commit, the more likely you are to get caught for at least one of them: if you roll dice enough times, eventually they come up all 1s.
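As a quick worked example (toy numbers, not real crime statistics): if each burglary independently has probability p of leading to an arrest, the chance of being caught at least once in n burglaries is 1 - (1 - p)^n.

```python
# Toy numbers, not real statistics: p = per-burglary probability of getting caught.
p = 0.1
for n in (1, 5, 20, 50):
    print(f"{n:2d} burglaries -> P(caught at least once) = {1 - (1 - p) ** n:.2f}")
```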

It's less "you probably know a burglar" and more "successful burglaries are probably 10-100x more common than you would think, if you based your prediction solely on visible evidence."

The two that seem most obvious to me: well-behaved psychopaths (i.e. people who have little or no empathetic response but who have learned to follow the social rules anyway, for the sake of headache avoidance) and non-practicing pedophiles (i.e. people who are attracted to children but are zero percent interested in raping anyone) probably really actually are quite common.

Does the list starting shortly after "Here are some examples of things" not fit that desire for more examples?

I think the trouble is that this theory predicts everything: anything one can offer as an example of dark matter is either 1. something that used to be very discouraged and now isn't as discouraged, like being gay, or 2. something that is currently very discouraged and therefore hidden, like being a sex worker. "What am I not seeing evidence for" is a hard thing to notice!

As far as I know, I don't personally know anyone HIV positive. Maybe that's because there a... (read more)

I share your disagreement with the original author as to the cause of the relief. For me, I find the modern day and age very confusing, and it is difficult to measure one's value to society. Any great idea you can think of, probably someone else has thought of it, and you have little chance to be important. In a zombie apocalypse, instead of thinking how to out-compete your fellow man with some amazing invention, you fall back to survival. Important things in that world, like foraging for food, fending off zombies, etc., have quicker rewards, and it's easier in som... (read more)

If we know they aren't conscious, then it is a non-issue: a random sample from conscious beings would land on the SAI with probability 0. I'm concerned we create something accidentally conscious.

I am skeptical it is easy to avoid. If it can simulate a conscious being, why isn't that simulation conscious? If consciousness is a property of the physical universe, then an isomorphic process would have the same properties. And if it can't simulate a conscious being, then it is not a superintelligence.

It can, however, possibly have a non-conscious outer-program... and avoid simulating people. That seems like a reasonable proposal.

Agree. Obviously alignment is important, but it has always creeped me out in the back of my mind, some of the strategies that involve always deferring to human preferences. It seems strange to create something so far beyond ourselves, and have its values be ultimately that of a child or a servant. What if a random consciousness sampled from our universe in the future comes from it with probability almost 1? We probably have to keep that in mind too. Sigh, yet another constraint we have to add!

6Zac Hatfield-Dodds
Would you say the same of a steam engine, or Stockfish, or Mathematica? All of those vastly exceed human performance in various ways! I don't see much reason to think that very very capable AI systems are necessarily personlike or conscious, or have something-it-is-like-to-be-them - even if we imagine that they are designed and/or trained to behave in ways compatible with and promoting of human values and flourishing. Of course if an AI system does have these things I would also consider it a moral patient, but I'd prefer that our AI systems just aren't moral patients until humanity has sorted out a lot more of our confusions.
4dr_s
At which point maybe the moral thing is to not build this thing.

Hi Critch,

I am curious to hear more of your perspectives, specifically on two points I feel least aligned with, the empathy part, and the Microsoft part. If I hear more I may be able to update in your direction.

Regarding empathy with people working on bias and fairness, concretely, how do you go about interacting with and compromising with them?

My perspective: it's not so much that I find these topics not sufficiently x-risky (though that is true, too), but that I perceive a hostility to the very notion of x-risk from a subset of this same group. The... (read more)

This might be a good time for me to ask a basic question on mechanistic interpretability:

Why does targeting single neurons work? Does it work? One would think that if there is a single-dimensional quantity to measure, why would it align with the standard basis? Why wouldn't it be aligned to a random one-dimensional linear subspace? Then, examining single neurons is likely to give you some weighted combination of concepts, rather than a single interpretation...

8Ben Amitay
It's not a full answer, but: to the degree that it is true that the quantities align with the standard basis, it must somehow be a result of the asymmetry of the activation function. For example, ReLU trivially depends on the choice of basis. If you focus on the ReLU example, it sort of makes sense: if multiple unrelated concepts are expressed in the same neuron, and one of them pushes the neuron in the negative direction, it may make the ReLU destroy information about the other concepts.
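A tiny numerical illustration of that basis dependence (a toy sketch of my own, not from any paper): a purely linear layer is indifferent to a rotation of its inputs, while an elementwise ReLU is not, which is one reason the neuron (standard) basis can end up privileged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))          # activations expressed in the "neuron" basis

# A random rotation of activation space (orthogonal matrix via QR).
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))

relu = lambda a: np.maximum(a, 0)

# Linear map: rotating the inputs and absorbing the rotation into the weights changes nothing.
W = rng.normal(size=(8, 8))
linear_diff = np.abs(x @ W - (x @ Q) @ (Q.T @ W)).max()

# ReLU: applying it in a rotated basis and rotating back gives a different answer --
# the elementwise nonlinearity singles out the basis it is applied in.
relu_diff = np.abs(relu(x) - relu(x @ Q) @ Q.T).max()

print("linear layer, rotated vs not:", linear_diff)   # ~1e-15 (identical)
print("ReLU layer, rotated vs not:  ", relu_diff)     # order 1 (very different)
```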

Those are good questions! There's some existing research which addresses some of them.

Single neurons often do represent multiple concepts: https://transformer-circuits.pub/2022/toy_model/index.html

It seems to still be unclear why the dimensions are aligned with the standard basis: https://transformer-circuits.pub/2023/privileged-basis/index.html

Fascinating, thanks for the research. Your analysis makes sense and seems to indicate that for most situations, prompt engineering is always the first plan of attack and often works well enough. Then, a step up from there, OpenAI etc. would most likely experiment with fine-tuning or RLHF as it relates to a specific business need. To train a better chatbot and fill in any gaps, they would probably get more bang for their buck by simply fine-tuning it on a large dataset that matched their needs. For example, if they wanted to do better mathematical reason... (read more)

I agree with the analysis of the ideas overall. I think, however, that AI x-risk does have some issues regarding communication. First of all, I think it's very unlikely that Yann will respond to the wall of text. Even though he is responding, I imagine him to be more on the level of your college professor: he will not reply to a very detailed post. In general, I think that AI x-risk advocates should aim to explain a bit more, rather than take the stance that all the "But What if We Just..." objections have already been addressed. They may have been, but this is not the way to gettin... (read more)

5AnthonyC
This understanding has so far proven to be very shallow and does not actually control behavior, and is therefore insufficient. Users regularly get around it by asking the AI to pretend to be evil, or to write a story, and so on. It is demonstrably not robust. It is also demonstrably very easy for minds (current-AI, human, dog, corporate, or otherwise) to know things and not act on them, even when those actions control rewards.  If I try to imagine LeCun not being aware of this already, I find it hard to get my brain out of Upton Sinclair "It is difficult to get a man to understand something, when his salary depends on his not understanding it," territory.

Essentially yes, heh. I take this as a learning experience for my writing. I don't know what I was thinking, but it is obvious in hindsight that saying to just "switch on backprop" sounds very naive.

I also confess I haven't done the due diligence to find out the largest model this has actually been tried with, or whether someone has tried it with Pythia or LLaMA. I'll do some more googling tonight.

One intuition for why the largest models might be different is that part of the training/fine-tuning going on will have to do with the model's own output. The largest models are the ones where the model's own output is not essentially word salad.

I noted the problem of catastrophic forgetting in the section "why it might not work". In general I agree continual learning is obviously a thing; otherwise I would not have used the established terminology. What I believe, however, is that the problems we face in continual learning in e.g. a 100M-parameter BERT model may not be the same as what we observe in models that can now meaningfully self-critique. We have explored this technique publicly, but have we tried it with GPT-4? The "publicly" part was really just a question of whether OpenAI actually did it on this model or not, and it would be an amazing data point if they could say "We couldn't get it to work."
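Concretely, what I have in mind by "switching backprop on" is something like the following minimal sketch (Pythia-70m as a small stand-in for a much larger model; the self-critique/filtering step is left as a placeholder and all hyperparameters are made up):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-70m"   # small stand-in; the open question is about much larger models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "Q: Summarize what you learned today.\nA:"

for step in range(3):                  # "backprop switched on" during deployment
    # 1. Sample the model's own output.
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=40, do_sample=True)
    text = tok.decode(out[0], skip_special_tokens=True)

    # 2. (Placeholder) self-critique / filtering would go here; only keep "good" outputs.

    # 3. Take a gradient step on the kept output with the usual LM loss.
    batch = tok(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    print(step, loss.item())
```

Whether this helps, does nothing, or just accelerates catastrophic forgetting at GPT-4 scale is exactly the data point I would like to see.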

6faul_sname
Ah, so the point was whether that had been explored publicly on the very largest language models that exist, because of the whole "sometimes approaches that didn't work at small scale start working when you throw enough compute at them" thing? Makes sense.

It's possible it's downvoted because it might be considered dangerous capability research. It just seems highly unlikely that this would not be one of many natural research directions perhaps already attempted, and I figure we might as well acknowledge it and find out what it actually does in practice.

Or maybe the downvotes are because it "obviously won't work", but it's not obvious to me, and I would welcome discussion on that.

3Chris_Leong
I'm worried that no matter how far we go, the next step will be one of the natural research directions.

Thanks, this is a great analysis of the power of agentized LLMs, which I probably need to spend some more time thinking about. I will work my way through the post over the next few days. I briefly skimmed the episodic memory section for now, and I see it is like an embedding-based retrieval system for past outputs/interactions of the model, reminiscent of the way some helper chatbots look up answers in FAQs (a toy sketch of what I mean follows the list below). My overall intuitions on this:

  • It's definitely something, but the method of embedding and retrieval, if static, would be very limiting
  • Someone will prob
... (read more)
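Here is the toy sketch mentioned above of how I understand the basic mechanism (a hashed bag-of-words stands in for a real learned embedding model; everything here is illustrative):

```python
import numpy as np
from collections import Counter

def embed(text, dim=256):
    """Stand-in embedding: hashed bag-of-words. A real agent would use a learned embedding model."""
    v = np.zeros(dim)
    for word, count in Counter(text.lower().split()).items():
        v[hash(word) % dim] += count
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

class EpisodicMemory:
    def __init__(self):
        self.texts, self.vecs = [], []

    def store(self, text):
        self.texts.append(text)
        self.vecs.append(embed(text))

    def retrieve(self, query, k=2):
        sims = np.array(self.vecs) @ embed(query)   # cosine similarity (vectors are unit norm)
        return [self.texts[i] for i in np.argsort(-sims)[:k]]

mem = EpisodicMemory()
mem.store("User asked how to reset their password; sent the account recovery link.")
mem.store("User reported a billing error; escalated to the payments team.")
mem.store("Discussed the weather and weekend plans.")

print(mem.retrieve("password reset is not working"))
```

The limitation I was gesturing at: if both the embedding and the retrieval rule are static, the agent can only ever surface what that fixed similarity metric happens to consider relevant.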

Very interesting write-up. Do you have a high-level overview of why, despite all of this, P(doom) is still 5%? What do you still see as the worst failure modes?

Noticed this as well. I tried to get it to solve some integration problems, and it could try different substitutions and such, but if they did not work, it kind of gave up and said to numerically integrate. Also, it would make small errors, and you would have to point them out, though it was happy to fix them.

I'm thinking that most documents it reads tend to omit the whole search/backtrack phase of thinking. Even work that is posted online and shows all the steps usually filters out the false starts. It's like how most famous mathematicians were known for throwing away their scratchwork, leaving everyone to wonder how exactly they formed their thought processes...

The media does have its biases, but their reaction seems perfectly reasonable to me. Occam's razor suggests this is not only unorthodox, but shows extremely poor judgment. This demonstrates that either (a) Elon is actually NOT as smart as he has been hyped to be, or (b) there's some ulterior motive, but these are long-tailed.

Typically when you join a company, you don't do anything for the first X months and just get the lay of the land. I'm inclined to believe this is not just a local minimum, but typically close to the optimal strategy for a human being (but not ... (read more)

1michael_mjd
Not my worst prediction, given the latest news!

I'll say I definitely think it's too optimistic, and I don't put too much stock in it. Still, I think it's worth thinking about.

Yes, absolutely, we are not following the rule. The reason why I think it might change with an AGI: (1) currently we humans, despite what we say when we talk about aliens, still place a high prior on being alone in the universe, or, from dominant religious perspectives, on being the most intelligent. Those things combine to make us think there are no consequences to our actions against other life. An AGI, itself a proof of conc... (read more)

That is a very fair criticism. I didn't mean to imply this is something I was very confident in, but I was interested in it for three reasons:

1) This value function aside, is this a workable strategy, or is there a solid reason for suspecting the solution is all-or-nothing? Is it reasonable to 'look for' our values with human effort, or does this have to be something searched for using algorithms?
2) It sort of gives a flavor of what's important in life. Of course the human value function will be a complicated mix of different sensory inputs, reproduction, and g... (read more)

2Donald Hobson
At the moment, we don't know how to make an AI that does something simple like making lots of diamonds.  It seems plausible that making an AI that copies human values is easier than hardcoding even a crude approximation to human values. Or maybe not. 

I'm an ML engineer at a FAANG-adjacent company, big enough to train our own sub-1B-parameter language models fairly regularly. I work on training some of these models and finding applications for them in our stack. I've seen the light after reading most of Superintelligence, and I feel like I'd like to help out somehow. I'm in my late 30s with kids and live in the SF Bay Area. I kinda have to provide for them, don't have any family money or resources to lean on, and would rather not restart my career. I also don't think I should abandon ML and try to ... (read more)

3Adrià Garriga-alonso
You should apply to Anthropic. If you're writing ML software at semi-FAANG, they probably want to interview you ASAP. https://www.lesswrong.com/posts/YDF7XhMThhNfHfim9/ai-safety-needs-great-engineers The compensation is definitely enough to take care of your family and then save some money!
1Yonatan Cale
Anthropic offer equity; they can give you more details in private. I recommend applying to both (it's a cheap move with a lot of potential upside); let me know if you'd like help connecting to any of them. If you learn by yourself, I'd totally get one-on-one advice (others linked); people will make sure you're on the best path possible.
3plex
One of the paths which has non-zero hope in my mind is building a weakly aligned non-self improving research assistant for alignment researchers. Ought and EleutherAI's #accelerating-alignment are the two places I know who are working in this direction fairly directly, though the various language model alignment orgs might also contribute usefully to the project.
5James_Miller
Work your way up the ML business  hierarchy to the point where you are having conversations with decision makers.  Try to convince them that unaligned AI is a significant existential risk.  A small chance of you doing this will in expected value terms more than make up for any harm you cause by working in ML given that if you left the field someone else would take your job.
5Linda Linsefors
Given where you live, I recommend going to some local LW events. There are still LW meetups in the Bay Area, right?
7Adam Jermyn
Applying to Redwood or Anthropic seems like a great idea. My understanding is that they're both looking for aligned engineers and scientists and are both very aligned orgs. The worst case seems like they (1) say no or (2) don't make an offer that's enough for you to keep your lifestyle (whatever that means for you). In either case you haven't lost much by applying, and you definitely don't have to take a job that puts you in a precarious place financially.

You might want to consider registering for the AGI Safety Fundamentals Course (or reading through the content). The final project provides a potential way of dipping your toes into the water.

Both 80,000hours and AI Safety Support are keen to offer personalised advice to people facing a career decision and interested in working on alignment (and in 80k's case, also many other problems).

Noting a conflict of interest - I work for 80,000 hours and know of but haven't used AISS. This post is in a personal capacity, I'm just flagging publicly available information rather than giving an insider take.

7lc
Pragmatic AI safety (link: pragmaticaisafety.com) is supposed to be a good sequence for helping you figure out what to do. My best advice is to talk to some people here who are smarter than me and make sure you understand the real problems, because the most common outcome besides reading a lot and doing nothing is to do something that feels like work but isn't actually working on anything important.

Has there been an effort to find a "least acceptable" value function, one that we hope would not annihilate the universe or turn it degenerate, even if the outcome itself is not ideal? My example would be to try to teach a superintelligence to value all other agents facing surmountable challenges in a variety of environments. The degenerate outcome of this is that, if it does not value the real world, it will simply simulate all agents in a zoo. However, if the simulations are of faithful fidelity, maybe that's not literally the worst thing. Plus, the zoo, to truly be a good test of the agents, would approach being invisible.

1AprilSR
The obvious option in this class is to try to destroy the world in a way that doesn't send out an AI to eat the lightcone that might possibly contain aliens who could have a better shot. I am really not a fan of this option.
4Donald Hobson
This doesn't select for humanlike minds. You don't want vast numbers of Ataribots similar to current RL, playing games like pong and pac-man. (And a trillion other autogenerated games sampled from the same distribution)   Even if you could somehow ensure it was human minds playing these games, the line between a fun game and total boredom is complex and subtle.

I can see the argument of capabilities vs safety both ways. On the one hand, by working on capabilities, we may get some insights. We could figure out how much data is a factor, and what kinds of data are needed. We could figure out how long-term planning emerges, and try our hand at inserting transparency into the model. We can figure out whether the system will need separate modules for world modeling vs reward modeling.  On the other hand, if intelligence turns out to be not that hard, and all we need to do is train a giant decision transforme... (read more)

I think we are getting some information. For example, we can see that token level attention is actually quite powerful for understanding language and also images. We have some understanding of scaling laws. I think the next step is a deeper understanding of how world modeling fits in with action generation -- how much can you get with just world modeling, versus world modeling plus reward/action combined?

If the transformer architecture is enough to get us there, it tells us a sort of null hypothesis for intelligence -- that the structure for predicting seq... (read more)

Not rhetorically: what kind of questions do you think would better lead to understanding how AGI works?

Suppose I'm designing an engine. I try out a new design, and it surprises me - it works much worse or much better than expected. That's a few bits of information. That's basically the sort of information we get from AI experiments today.

What we'd really like is to open up that surprising engine, stick thermometers all over the place, stick pressure sensors all over the place, measure friction between the parts, measure vibration, measure fluid flow and conce... (read more)

I think the desire works because most honest people know that if they give a good-sounding answer that is ultimately meaningless, no benefits will come of the answers given. They may eventually stop asking questions, knowing the answers are always useless. It's a matter of estimating future rewards from building relationships.

Now, when a human gives advice to another human, most of the time it is also useless, but not always. Also, it tends to not be straight up lies. Even in the useless case, people still think there is some utility in there, for example, hav... (read more)

One other thing I'm interested in: is there a good mathematical model of "search"? There may not be an obvious answer; I just feel like there is some pattern that could be leveraged. I was playing hide and seek with my kids the other day, and noticed that, in a finite space, you expect there to be finitely many hiding spots. True, but every time you think you've found them all, you end up finding one more. I wonder if figuring out optimizations or discoveries follows a similar pattern. There are some easy ones, then progressively harder ones, but there are far more to be found than one would expect... so how would you model finding these over time, in a very large room?
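One toy way I could imagine formalizing this (my own guess, not an established model): give each hiding spot a random "difficulty" drawn from a heavy-tailed distribution, and say a spot is found once cumulative search effort exceeds its difficulty. You then get the pattern of quick early finds plus a long tail of late surprises.

```python
import numpy as np

rng = np.random.default_rng(0)

n_spots = 50
# Heavy-tailed "difficulty": most spots are easy, a few are very well hidden.
difficulty = rng.lognormal(mean=0.0, sigma=2.0, size=n_spots)

for effort in (0.5, 1, 2, 5, 10, 50, 200):
    found = int((difficulty <= effort).sum())
    print(f"search effort {effort:>5}: {found:2d}/{n_spots} spots found")
```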

1TAG
If there is a mathematical model of search, it is no good to you unless it is computable.

I agree; I have also thought that I am not completely sure of the dynamics of the intelligence explosion. I would like to have more concrete footing to figure out what takeoff will look like, as neither fast nor slow is proven.

My intuition, however, is the opposite. I can't disprove a slow takeoff, but it seems intuitive to me that there are some "easy" modifications that should take us far beyond human level. Those intuitions, though they could be wrong, are these:

- I feel like human capability is limited in some obvious ways. If I had more time and energy to fo... (read more)

AI existential risk is like climate change. It's easy to come up with short slogans that make it seem ridiculous. Yet, when you dig deeper into each counterargument, you find none of them are very convincing, and the dangers are quite substantial. There's quite a lot of historical evidence for the risk, especially in the impact humans have had on the rest of the world. I strongly encourage further, open-minded study.

1michael_mjd
For ML researchers.

It's easy to imagine that the AI will have an off switch, and that we could keep it locked in a box and ask it questions. But just think about it. If some animals were to put you in a box, do you think you would stay in there forever? Or do you think you'd figure a way out that they hadn't thought of?

1michael_mjd
Policy makers

AI x-risk. It sounds crazy for two reasons. One, because we are used to nothing coming close to human intelligence, and two, because we are used to AI being unintelligent. For the first, the only point of comparison is imagining something that is to us what we are to cats. For the second, though we have not quite succeeded yet, it only takes one. If you have been following the news, we are getting close.

1michael_mjd
Policy makers.

Yeah, I tend to agree. Just wanted to make sure I'm not violating norms. In that case, my specific thoughts are as follows, with a thought to implementing AI transparency at the end.

There is the observation that the transformer architecture doesn't have a hidden state like an LSTM. I thought for a while that something like this was needed for intelligence: a compact representation of the state one is in. (My biased view, which I've updated away from, was that the weights represented HOW to think, and less about knowledge.) However, it's really intractabl... (read more)

I think this is absolutely correct. GPT-3/PaLM is scary impressive, but it ultimately relies on predicting missing words, and its actual memory during inference is just the words in its context! What scares me about this is that I think there is some really simple low-hanging fruit to modify something like this to be, at least, slightly more like an agent. Then plugging things like this as components into existing agent frameworks, and finally, having entire research programs think about it and experiment on it. Seems like the problem would crack. You never ... (read more)
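For the contrast I have in mind, here is a minimal sketch (toy dimensions, my own illustration): an LSTM threads an explicit hidden state between steps, while a decoder-style transformer layer carries no state and just re-attends over whatever tokens are currently in the context window.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, seq_len = 16, 10
tokens = torch.randn(1, seq_len, d_model)          # toy "embedded" sequence

# LSTM: an explicit hidden state (h, c) is threaded through time.
lstm = nn.LSTM(d_model, d_model, batch_first=True)
h = c = torch.zeros(1, 1, d_model)
for t in range(seq_len):
    _, (h, c) = lstm(tokens[:, t:t+1, :], (h, c))   # state persists between steps

# Decoder-style transformer layer: no carried state -- each step just
# re-attends over whatever is currently in the context window.
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
out = layer(tokens, src_mask=causal_mask)

print("LSTM hidden state shape:", h.shape)          # (1, 1, d_model) -- its only memory
print("Transformer output shape:", out.shape)       # (1, seq_len, d_model) -- memory = the context itself
```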

5Lone Pine
My opinion is that you're not going to be able to crack the alignment problem if you have a phobia of infohazards. Essentially you need a 'Scout Mindset'. There are already smart people working hard on the problem, including in public such as on podcasts, so realistically the best (or worst) we could do on this forum is attempt to parse out what is known publicly about the scary stuff (eg agency) from DeepMind's papers and then figure out if there is a path forward towards alignment.

As an ML engineer, I think it's plausible. I also think there are some other factors that could act to cushion or mitigate a slowdown. First, I think there is more low-hanging fruit available. Now that we've seen what large transformer models can do in the text domain, and in a text-to-image DALL-E model, I think the obvious next step is to ingest large quantities of video data. We often talk about the sample inefficiency of modern methods compared with humans, but I think humans are exposed to a TON of sensory data in building their world model. This see... (read more)

Answer by michael_mjd30


I work at a large, not-quite-FAANG company, so I'll offer my perspective. It's getting there. Generally, the research results are good, but not as good as they sound in summary. Despite the very real and very concerning progress, most papers are a bit hyped if you take them at face value. The exceptions to some extent are the large language models. However, not everyone has access to these. The open-source versions of them are good but not earth-shattering. I think they might be if the goal is to build generally fluent-sounding chatbots, but this is not the goal of most w... (read more)

I posted something I think could be relevant to this: https://www.lesswrong.com/posts/PfbE2nTvRJjtzysLM/instrumental-convergence-to-offer-hope

The takeaway is that a sufficiently advanced agent, wanting to hedge against the possibility of being destroyed by a greater power, may decide the only surviving plan is to allow the lesser life forms some room to optimize their own utility. It's sort of an asymmetrical infinite game-theoretic chain: if every agent kills lower agents, only the maximum survives, and no one knows if they are the maximum. If there even is a maximum.

3Lone Pine
Interesting. I think this is the reason why people like equality and find Nietzsche so nauseating. (Nietzsche's vision, in my interpretation, was that people with the opportunities to dominate others should take those opportunities, even if it causes millions of average people to suffer.)

War. Poverty. Inequality. Inhumanity. We have been seeing these for millennia caused by nation states or large corporations. But what are these entities, if not greater-than-human-intelligence systems, who happen to be misaligned with human well-being? Now, imagine that kind of optimization, not from a group of humans acting separately, but by an entity with a singular purpose, with an ever diminishing proportion of humans in the loop.

Audience: all, but maybe emphasizing policy makers

Thanks for pointing to ECL, this looks fascinating!
