Thanks! We were wondering about that. Is there any way we could be changed to the frontpage category?
It's the latter.
A bomb would not be an optimizing system, because the target space is not small compared to the basin of attraction. An AI that systematically dismantles things would be an optimizing system if for no other reason than that the AI systematically preserves its own integrity.
It's worse, even, in a certain way, than that: the existence of optimizing systems organized around a certain idea of "natural class" feeds back into more observers observing data that is distributed according to this idea of "natural class", leading to more optimizing systems being built around that idea of "natural class", and so on.
Once a certain idea of "natural class" gains a foothold somewhere, observers will make real changes in the world that further suggest this particular idea of "natural class" to others, and this forms a feedback loop.
If you pin down what a thing refers to according to what that thing was optimized to refer to, then don't you have to look at the structure of the one who did the optimizing in order to work out what a given thing refers to? That is, to work out what the concept "thermodynamics" refers to, it may not be enough to look at the time evolution of the concept "thermodynamics" on its own; I may instead need to know something about the humans who were driving those changes, and the goals held within their minds. But, if this is correct, then doesn't it raise anot...
There seems to be some real wisdom in this post, but given the length and title of the post, you haven't offered much of an exit -- you've just offered a single link to a YouTube channel for a trauma healer. If what you say here is true, then this is a bit like offering an alcoholic friend the sum total of one text message containing a single link to the homepage of Alcoholics Anonymous -- better than nothing, but not worthy of the bombastic title of this post.
friends and family significantly express their concern for my well being
What exact concerns do they have?
Wow, thank you for this context!
I just want to acknowledge the very high emotional weight of this topic.
For about two decades, many of us in this community have been kind of following in the wake of a certain group of very competent people tackling an amazingly frightening problem. In the last couple of years, coincident with a quite rapid upsurge in AI capabilities, that dynamic has really changed. This is truly not a small thing to live through. The situation has real breadth -- it seems good to take it in for a moment, not in order to cultivate anxiety, but in order to really engage w...
That is correct. I know it seems a little weird to generate a new policy on every timestep. The reason it's done that way is that the logical inductor needs to understand the function that maps prices to the quantities that will be purchased, in order to solve for a set of prices that "defeat" the current set of trading algorithms. That function (from prices to quantities) is what I call a "trading policy", and it has to be represented in a particular way -- as a set of syntax trees over trading primitives -- in order for the logical inductor to solve for pri...
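For intuition, here is a minimal Python sketch of the "syntax tree over trading primitives" idea. This is my own illustration with invented node names (Price, Const, Sub, Max), not code or notation from the logical induction paper; the point is just that the whole map from prices to purchase quantities is an explicit expression that a solver can inspect rather than a black box.

```python
# A minimal sketch of a "trading policy" as a syntax tree over trading
# primitives. Illustrative only: the node names and primitive set are my
# own invention, not the logical induction paper's formalism.

from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Price:
    sentence: str   # leaf: the current market price of this sentence

@dataclass
class Const:
    value: float    # leaf: a constant

@dataclass
class Sub:
    left: object
    right: object   # internal node: left - right

@dataclass
class Max:
    left: object
    right: object   # internal node: max(left, right)

Node = Union[Price, Const, Sub, Max]

def evaluate(node: Node, prices: Dict[str, float]) -> float:
    """Evaluate a trading-policy expression at a given assignment of prices."""
    if isinstance(node, Price):
        return prices[node.sentence]
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Sub):
        return evaluate(node.left, prices) - evaluate(node.right, prices)
    if isinstance(node, Max):
        return max(evaluate(node.left, prices), evaluate(node.right, prices))
    raise TypeError(f"unknown node: {node!r}")

# "Buy phi whenever its price is below 0.5, in proportion to the gap."
quantity_of_phi = Max(Const(0.0), Sub(Const(0.5), Price("phi")))
print(evaluate(quantity_of_phi, {"phi": 0.25}))  # 0.25
```

Because the policy is a syntax tree rather than an opaque function, the inductor can reason about how the purchased quantities respond to prices when solving for market-clearing prices.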
Thank you for this extraordinarily valuable report!
I believe that what you are engaging in, when you enter into a romantic relationship with either a person or a language model, is a kind of artistic creation. What matters is not whether the person on the "other end" of the relationship is a "real person" but whether the thing you create is of true benefit to the world. If you enter into a romantic relationship with a language model and produce something of true benefit to the world, then the relationship was real, whether or not there was a "real person" on the other end of it (whatever that would mean, even in the case of a human).
This is a relatively banal meta-commentary on reasons people sometimes give for doing worst-case analysis, and the differences between those reasons. The post reads like a list of things with no clear through-line. There is a gesture at an important idea from a Yudkowsky post (the logistic success curve idea) but the post does not helpfully expound that idea. There is a kind of trailing-off towards the end of the post as things like "planning fallacy" seem to have been added to the list with little time taken to place them in the context of the other thing...
Many people believe that they already understand Dennett's intentional stance idea, and due to that will not read this post in detail. That is, in many cases, a mistake. This post makes an excellent and important point, which is wonderfully summarized in the second-to-last paragraph:
...In general, I think that much of the confusion about whether some system that appears agent-y “really is an agent” derives from an intuitive sense that the beliefs and desires we experience internally are somehow fundamentally different from those that we “merely” infer and a
Have you personally ever ridden in a robot car that has no safety driver?
This post consists of comments on summaries of a debate about the nature and difficulty of the alignment problem. The original debate was between Eliezer Yudkowsky and Richard Ngo, but this post does not contain the content from that debate. It consists mostly of commentary by Jaan Tallinn on that debate, with comments by Eliezer.
The post provides a fascinating level of insight into genuine insider conversations about AI alignment. How do Eliezer and Jaan converse about alignment? Sure, this is a public setting, so perhaps they communicate differentl...
Thanks for the note.
In Life, I don't think it's easy to generate an X-1 time state that leads to a given X time state, unfortunately. The reason is that each cell in an X time state puts a logical constraint on 9 cells in an X-1 time state (the cell itself and its eight neighbors). It is therefore possible to set up certain constraint satisfaction problems in terms of finding an X-1 time state that leads to an X time state, and in general these can be NP-hard.
However, in practice, it is very very often quite easy to find an X-1 time state that leads to a given X time state, so maybe this experiment cou...
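To make the constraint-satisfaction picture concrete, here is a tiny brute-force Python sketch, written as my own illustration (real reverse-Life work uses SAT or constraint solvers rather than enumeration): it takes a target state at time X on a small region and searches every candidate time X-1 state on that region, treating cells outside the region as dead.

```python
# Brute-force predecessor search for a small Game of Life region.
# Illustrative only; not a practical reverse-Life solver.

from itertools import product

def step(grid):
    """One Game of Life step; cells outside the grid are treated as dead."""
    h, w = len(grid), len(grid[0])
    nxt = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            n = sum(grid[r + dr][c + dc]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0)
                    and 0 <= r + dr < h and 0 <= c + dc < w)
            nxt[r][c] = 1 if (n == 3 or (grid[r][c] == 1 and n == 2)) else 0
    return nxt

def find_predecessor(target):
    """Exhaustively search for a time X-1 state that steps to `target`."""
    h, w = len(target), len(target[0])
    for bits in product((0, 1), repeat=h * w):
        candidate = [list(bits[r * w:(r + 1) * w]) for r in range(h)]
        if step(candidate) == target:
            return candidate
    return None  # no predecessor inside this region

# Target at time X: a 2x2 block (a still life) on a 4x4 region.
target = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
print(find_predecessor(target))
```

Even on this 4x4 region the search space is 2^16 = 65,536 candidate states, which is why the general problem is attacked with constraint solvers rather than enumeration.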
Interesting. Thank you for the pointer.
The real question, though, is whether it is possible within our physics.
Oh, the only information I have about that is Dave Greene's comment, plus a few private messages over the years from people who had read the post and were interested in experimenting with concrete GoL constructions. I just messaged the author of the post on the GoL forum asking whether any of that work was spurred by this post.
Thanks - fixed! And thank you for the note, too.
Yeah it might just be a lack of training data in 10-second-or-less interactive instructions.
The thing I really wanted to test with this experiment was actually whether ChatGPT could engage with the real world using me as a guinea pig. The 10-second-or-less thing was just the format I used to try to "get at" the phenomenon of engaging with the real world. I'm interested in improving the format to more cleanly get at the phenomenon.
I do currently have the sense that it's more than just a lack of training data. I have the sense that ChatGPT has learned much l...
I asked a group of friends for "someone to help me with an AI experiment" and then I gave this particular friend the context that I wanted her help guiding me through a task via text message and that she should be in front of her phone in some room that was not the kitchen.
If you look at how ChatGPT responds, it seems to be really struggling to "get" what's happening in the kitchen -- it never really comes to the point of giving specific instructions, and especially never comes to the point of having any sense of the "situation" in the kitchen -- e.g. whet...
I'm very interested in Wei Dai's work, but I haven't followed closely in recent years. Any pointers to what I might read of his recent writings?
I do think Eliezer tackled this problem in the sequences, but I don't really think he came to an answer to these particular questions. I think what he said about meta-ethics is that it is neither that there is some measure of goodness to be found in the material world independent of our own minds, nor that goodness is completely open to construction based on our whims or preferences. He then says "well there ju...
Recursive relevance realization seems to be designed to answer about the "quantum of wisdom".
It does! But... does it really answer the question? Curious about your thoughts on this.
you ask whether you are aligned to yourself (ideals, goals etc) and find that your actuality is not coherent with your aim
Right! Very often, what it means to become wiser is to discover something within yourself that just doesn't make sense, and then to in some way resolve that.
Discovering incoherency seems very different from keeping a model on coherence rails
True. Eliezer is quite vague about the term "coherent" in his write-ups, and some more recent discussions of CEV drop it entirely. I think "coherent" was originally about balancing the extrapo...
Did you ever end up reading Reducing Goodhart?
Not yet, but I hope to, and I'm grateful to you for writing it.
processes for evolving humans' values that humans themselves think are good, in the ordinary way we think ordinary good things are good
Well, sure, but the question is whether this can really be done by modelling human values and then evolving those models. If you claim yes then there are several thorny issues to contend with, including what constitutes a viable starting point for such a process, what is a reasonable dynamic for such a process, and on what basis we decide the answers to these things.
Wasn't able to record it - technical difficulties :(
Yes, I should be able to record the discussion and post a link in the comments here.
If you train a model by giving it reward when it appears to follow a particular human's intention, you probably get a model that is really optimizing for reward, or appearing to follow said humans intention, or something else completely different, while scheming to seize control so as to optimize even more effectively in the future. Rather than an aligned AI.
Right yeah I do agree with this.
...Perhaps instead you mean: No really the reward signal is whether the system really deep down followed the humans intention, not merely appeared to do so [...] That
Well even if language models do generalize beyond their training domain in the way that humans can, you still need to be in contact with a given problem in order to solve that problem. Suppose I take a very intelligent human and ask them to become a world expert at some game X, but I don't actually tell them the rules of game X nor give them any way of playing out game X. No matter how intelligent the person is, they still need some information about what the game consists of.
Now suppose that you have this intelligent person write essays about how one ough...
This is a post about the mystery of agency. It sets up a thought experiment in which we consider a completely deterministic environment that operates according to very simple rules, and ask what it would be for an agentic entity to exist within that.
People in the Game of Life community actually spent some time investigating the empirical questions that were raised in this post. Dave Greene notes:
...The technology for clearing random ash out of a region of space isn't entirely proven yet, but it's looking a lot more likely than it was a year ago, that a work
Thanks for this note Dave
This post attempts to separate a certain phenomenon from a certain very common model that we use to understand that phenomenon. The model is the "agent model", in which intelligent systems operate according to an unchanging algorithm. In order to make sense of there being an unchanging algorithm at the heart of each "agent", we suppose that this algorithm exchanges inputs and outputs with the environment via communication channels known as "observations" and "actions".
This post really is my central critique of contemporary artificial intelligence discourse....
This is an essay about methodology. It is about the ethos with which we approach deep philosophical impasses of the kind that really matter. The first part of the essay is about those impasses themselves, and the second part is about what I learned in a monastery about addressing those impasses.
I cried a lot while writing this essay. The subject matter -- the impasses themselves -- is deeply meaningful to me, and I have the sense that they really do matter.
It is certainly true that there are these three philosophical impasses -- each has been discussed in...
This post trims down the philosophical premises that sit under many accounts of AI risk. In particular it routes entirely around notions of agency, goal-directedness, and consequentialism. It argues that it is not humans losing power that we should be most worried about, but humans quickly gaining power and misusing such a rapid increase in power.
Re-reading the post now, I have the sense that the arguments are even more relevant than when it was written, due to the broad improvements in machine learning models since it was written. The arguments in this po...
Thanks Scott
Thanks for writing this.
Alignment research has a track record of being a long slow slog. It seems that what we’re looking for is a kind of insight that is just very very hard to see, and people who have made real progress seem to have done so through long periods of staring at the problem.
With your two week research sprints, how do you decide what to work on for a given sprint?
Well suffering is a real thing, like bread or stones. It's not a word that refers to a term in anyone's utility function, although it's of course possible to formulate utility functions that refer to it.
The direct information I'm aware of is (1) CZ's tweets about not acquiring, (2) SBF's own tweets yesterday, (3) the leaked P&L doc from Alameda. I don't think any of these are sufficient to decide "SBF committed fraud" or "SBF did something unethical". Perhaps there is additional information that I haven't seen, though.
(I do think that if SBF committed fraud, then he did something unethical.)
If you view people as Machiavellian actors using models to pursue goals, then you will eventually find social interactions to be bewildering and terrifying, because there actually is no way to discern honesty or kindness or good intention if you start from the view that each person is ultimately pursuing some kind of goal in an ends-justify-the-means way.
But neither does it really make sense to say "hey let's give everyone the benefit of the doubt because then such-and-such".
I think in the end you have to find a way to trust something that is not the particular beliefs or goals of a person.
In Buddhist ideology, the reason to pick one set of values over another is to find an end to suffering. The Buddha claimed that certain values tended to lead towards the end of suffering and other values tended to lead in the opposite direction. He recommended that people check this claim for themselves.
In this way values are seen as instrumental rather than fundamental in Buddhism -- that is, Buddhists pick values on the basis of the consequences of holding those values, rather than any fundamental rightness of the values themselves.
Now you may say that t...
There's mounting evidence that FTX was engaged in theft/fraud, which would be straightforwardly unethical.
I think it's way too early to decide anything remotely like that. As far as I understand, we have a single leaked balance sheet from Alameda and a handful of tweets from CZ (CEO of Binance) who presumably got to look at some aspect of FTX internals when deciding whether to acquire. Do we have any other real information?
I'm curious about this too. I actually have the sense that overall funding for AI alignment already exceeded the pool of shovel-ready projects before FTX was involved. This is normal and expected in a field where many people are working on an important problem, but where most of the work is research, and where hardly anyone has promising scalable uses for money.
I think this led to a lot of prizes being announced. A prize is a good way to deploy funding if you don't see enough shovel-ready projects to exhaust it. You offer prizes for anyone who can...
Regarding your point on ELK: to make the output of the opaque machine learning system counterfactable, wouldn't it be sufficient to include the whole program trace? By "program trace" I mean the results of all the intermediate computations performed along the way. Yet including a program trace wouldn't help us much if we don't know what function of that program trace will tell us, for example, whether the machine learning system is deliberately deceiving us.
So yes it's necessary to have an information set that includes the relevant information, but isn't the main part of the (ELK) problem to determine what function of that information corresponds to the particular latent variable that we're looking for?
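As a concrete picture of what "including the whole program trace" would mean, here is a minimal sketch, my own construction rather than anything from the ELK report: every intermediate value of a toy model is recorded, yet the trace by itself does not identify which function of those values corresponds to the latent we actually care about.

```python
# Record every intermediate value computed by a toy two-layer model.
# The trace contains all the information computed along the way, but nothing
# in it labels which function of those values corresponds to a latent like
# "the system is deliberately deceiving us" -- finding that function is the
# remaining (and hard) part of the problem.

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # toy weights, stand-ins for a real model
W2 = rng.normal(size=(2, 4))

def forward_with_trace(x):
    """Run the toy model and record the result of every intermediate computation."""
    trace = {"input": x}
    trace["pre1"] = W1 @ x
    trace["act1"] = np.maximum(trace["pre1"], 0.0)   # ReLU
    trace["output"] = W2 @ trace["act1"]
    return trace["output"], trace

output, trace = forward_with_trace(np.array([1.0, -0.5, 2.0]))
print({name: value.shape for name, value in trace.items()})
```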
If I understand you correctly, the reason that this notion of counterfactable connects with what we normally call a counterfactual is that when an event screens off its own history, it's easy to consider other "values" of the "variable" underlying that event without coming into any logical contradictions with other events ("values of other variables") that we're holding fixed.
For example if I try to consider what would have happened if there had been a snow storm in Vermont last night, while holding fixed the particular weather patterns observed in Vermont ...
I expect you could build a system like this that reliably runs around and tidies your house say, or runs your social media presence, without it containing any impetus to become a more coherent agent (because it doesn’t have any reflexes that lead to pondering self-improvement in this way).
I agree, but if there is any kind of evolutionary variation in the thing then surely the variations that move towards stronger goal-directedness will be favored.
I think that overcoming this Molochian dynamic is the alignment problem: how do you build a powerful system ...
I really appreciate this post!
For instance, employers would often prefer employees who predictably follow rules than ones who try to forward company success in unforeseen ways.
Fascinatingly, EA employers in particular seem to seek employees who do try to forward organization goals in unforeseen ways!
Got it! Good to know.