All of kwiat.dev's Comments + Replies

kwiat.dev102

Trump "announces" a lot of things. It doesn't matter until he actually does them.

2Thane Ruthenis
Well, he didn't do it yet either, did he? His new announcement is, likewise, just that: an announcement. Manifold is still 35% on him not following through on it, for example.
kwiat.dev180

While I participated in a previous edition, and somewhat enjoyed it, I couldn't bring myself to support it now considering Remmelt is the organizer, between his anti-AI-art crusades and his overall "stop AI" activism. It's unfortunate, since technical AI safety research is very valuable, but promoting those anti-AI initiatives makes it a probable net negative in my eyes.

Maybe it's better to let AISC die a hero.

9habryka
I think "stop AI" is pretty reasonable and good, but I agree that Remmelt seems kind of like he has gone off the deep end and that is really the primary reason why I am not supporting AI Safety camp. I would consider filling the funding gap myself if I hadn't seen this happen. My best guess is AISC dying is marginally good, and someone else will hopefully pick up a similar mantle.

Because the same argument could have been made earlier in the "exponential curve". I don't think we should have paused AI (or more broadly CS) in the 50's, and I don't think we should do it now.

1arisAlexis
but you are comparing epochs before and after the Turing test was passed. Isn't that relevant? The Turing test was/is unanimously regarded as an inflection point, and arguably most experts think we have already passed it in 2023.
kwiat.dev-1-4

Modern misaligned AI systems are good, actually. There's some recent news about Sakana AI developing a system where the agents tried to extend their own runtime by editing their code/config. 

This is amazing for safety! Current systems are laughably incapable of posing x-risks. Now, thanks to capabilities research, we have a clear example of behaviour that would be dangerous in a more "serious" system. So we can proceed with empirical research, creating and evaluating methods to deal with this specific risk, so that future systems do not have this failure mode.

The future of AI and AI safety has never been brighter.

Expert opinion is an argument for people who are not themselves particularly informed about the topic. For everyone else, it basically turns into an authority fallacy.

kwiat.dev12

And how would one go about procuring such a rock? Asking for a friend.

The ML researchers saying stuff like AGI is 15 years away have either not carefully thought it through, or are lying to themselves or the survey.

 

Ah yes, the good ol' "If someone disagrees with me, they must be stupid or lying"

6Nathan Helm-Burger
Rude of me to jump to that oh-so-self-flattering conclusion, yes. And certainly me saying that should not be taken as any sort of evidence in support of my view. Instead you should judge my view by:

1. My willingness to make an explicit concrete prediction and put money on it. Admittedly a trivial amount of money on this, but I've made much larger bets on the topic in the past.

2. The fact that my views are self-consistent and have remained fairly stable in response to evidence gathered over the past two years about AI progress. Stable views aren't necessarily a good thing; it could mean I'm failing to update! In this case, the evidence of the past two years confirms the predictions I publicly stated before that time, thus the stability of my prediction is a plus in this case. Contrast this with the dramatic change in the predictions I was criticizing, which came about because recent evidence strongly contradicted their previous views.

3. Note that my prediction of "AGI < 10 years" is consistent with my prediction of "and we should expect lots of far-reaching changes, and novel dangers which will need careful measurement and regulation". As compared to the views of many of the ML experts saying "AGI > 15 years away", and also saying things like "the changes will be relatively small, on the same order of change as the printing press and the Internet", and also "the risks aren't very high; everything will probably be fine, and even if things go wrong, we can easily iteratively fix the problems with only minor negative consequences".

I would argue that even if one held the view that AGI is > 15 years away (but less than 50), it would still not make sense to be so unworried about the potential consequences. I claim that that set of views is "insufficiently thought through", and that if forced to specify all the detailed pieces of their predictions in a lengthy written debate, those views would show themselves to be self-contradictory. I believe that my set of pred

For what it's worth, I think you're approaching this in good faith, which I appreciate. But I also think you're approaching the whole thing from a very, uh, lesswrong.com-y perspective, quietly making assumptions and using concepts that are common here, but not anywhere else.

 

I won't reply to every individual point, because there's lots of them, so I'm choosing the (subjectively) most important ones.

 

This is the actual topic. It's the Black Marble thought experiment by Bostrom,

No it's not, and obviously so. The actual topic is AI safety. It's not... (read more)

5gilch
I see what you're saying, and yes, fully general counterarguments are suspect, but that is totally not what Connor was doing. OK, sure, instrumental goals are not terminal values. Stopping AI progress is not a terminal value. It's instrumental, and hopefully temporary. Bostrom himself has said that stopping progress on AI indefinitely would be a tragedy, even if he does see the need for it now. That's why the argument can't be turned on Connor. The difference is, and this is critical, Beff's stated position (as far as Connor or I can tell) is that acceleration of growth equals the Platonic Good. This is not instrumental for Beff; he's claiming it's the terminal value in his philosophy, i.e., the way you tell what "Good" is. See the difference? Connor thinks Beff hasn't thought this through, and this would be inconsistent with Beff's moral intuitions if pressed. That's the Fisher-Price Nick Land comment. Nick bit the bullet and said all humans die is good, actually. Beff wouldn't even look.
2gilch
It is, and Connor said so repeatedly throughout the conversation. AI safety is a subtopic, a special case, of Connor's main thrust, albeit the most important one. (Machine transcript, emphasis mine.) Non-ergodicity, not necessarily AI: Connor explicitly calls out AGI as not his main point: Beff starts talking before he could finish, so skipping ahead a bit: This is Connor's mindset in the whole debate. Backing up a bit: Also the rolling death comment I mentioned previously. And the comment about crazy wackos.

So I genuinely don't want to be mean, but this reminds me why I dislike so much of philosophy, including many chunks of rationalist writing.

This whole proposition is based on vibes, and is obviously false - just for the sake of philosophy, we decide to ignore the "obvious" part, and roll with it for fun.

 

The chair I'm sitting on is finite. I may not be able to draw a specific boundary, but I can have a bounding box the size of the planet, and that's still finite.

My life as a conscious being, as far as I know, is finite. It started some years ago, it will ... (read more)

1Spiral
No, I don't take this as mean. Criticism was a big part of why I wanted to run it by LessWrong :)

I agree that we are able to draw specific boundaries around "things," e.g. a chair, the world, a single day, a single consciousness, a single life - and that they are very helpful conceptual tools, especially since that is how we experience reality at our "zoom" level of perception. However, when we zoom in to the micro, we have never found the exact edge of something - the point where an object ends.

While I do think this boundary thinking is very rooted in the Aristotelean conceptual framework that lays the foundation for much of the rest of our conceptual frameworks (especially in the West), I'm sure that people and animals all over the world function with a very similar model, as it naturally provides a lot of utility. But the conceptual model we view the world with does affect how we can relate to it. Consider Thomas Kuhn's Paradigm Shifts - everything seems obvious within the conceptual framework we have, and we consider things to be "obviously not true" if they don't fit our current model. But that doesn't mean that our current model truly fits reality.

You are correct that I may not have communicated this well. Or maybe it simply isn't true. It makes a lot of sense the more I think about it; however, I will keep thinking about how to communicate it better and how it can be challenged. I am also very aware of the "anything goes with Eastern philosophy" stereotype, don't worry, but the more time I spend with Daoism in particular, and have the ground assumptions of my Western upbringing challenged, the more it actually makes sense to my lived experience.

What is making sense to me, more and more, is to view the "hard edge" of a chair as a slow transformation from chair molecules and space into other molecules and space. But the same can be said for anything else. I think if I was going to ask a key question, it would be "how do we actually know that there is a beginnin

What are the actual costs of running AISC? I participated in it some time ago, and I'm kinda participating again this year (it's complicated). As far as I can tell, the only things required are some amount of organization and maybe a paid Slack workspace. Is this just about salaries for the organizers?

Eli_142

The answer seems to be yes. 

On the manifund page it says the following: 

Virtual AISC - Budget version

Software etc: $2K
Organiser salaries, 2 ppl, 4 months: $56K
Stipends for participants: $0

Total: $58K

In the Budget version, the organisers do the minimum job required to get the program started, but no continuous support to AISC teams during their projects and no time for evaluations and improvement for future versions of the program.

Salaries are calculated based on $7K per person per month.

Based on the minimum threshold of $28k, that woul... (read more)

Huh, whaddayaknow, turns out Altman was in the end pushed back, the new interim CEO is someone who is pretty safety-focused, and you were entirely wrong.

 

Normalize waiting for more details before dropping confident hot takes.

3Roko
It seems that I was mostly right in the specifics: there was a lot of resistance to getting rid of Altman, and he is back (for now).
Dana110

You're not taking your own advice. Since your message, Ilya has publicly backed down, and Polymarket has Sam coming back as CEO at coinflip odds: Polymarket | Sam back as CEO of OpenAI?

9quetzal_rainbow
I should note that while your attitude is understandable, event "Roko said his confident predictions out loud" is actually good, because we can evaluate his overconfidence and update our models accordingly.
kwiat.dev1714

The board has backed down after Altman rallied staff into a mass exodus

[citation needed]
 

I've seen rumors and speculations, but if you're that confident, I hope you have some sources?

 

(for the record, I don't really buy the rest of the argument either on several levels, but this part stood out to me the most)

4Roko
Well the board are in negotiations to have him back https://www.theverge.com/2023/11/18/23967199/breaking-openai-board-in-discussions-with-sam-altman-to-return-as-ceo "A source close to Altman says the board had agreed in principle to resign and to allow Altman and Brockman to return, but has since waffled — missing a key 5PM PT deadline by which many OpenAI staffers were set to resign. If Altman decides to leave and start a new company, those staffers would assuredly go with him."

I'm never a big fan of this sort of... cognitive rewiring? Juggling definitions? This post reinforces my bias, since it's written from a point of very strong bias itself.

AI optimists think AI will go well and be helpful.

AI pessimists think AI will go poorly and be harmful.

It's not that deep.

 

The post itself is bordering on insulting anyone who has a different opinion from the author (who, no doubt, would prefer the label "AI strategist" to "AI extremist"). I was thinking about going into the details of why, but honestly... this is unlikely to be pro... (read more)

1[anonymous]
Thanks for the feedback, but I don't think it's about "cognitive rewiring." It's more about precision of language and comprehension. You said "AI optimists think AI will go well and be helpful," but doesn't everyone believe that is a possibility? The bigger question is what probability you assign to the "go well and be helpful" outcome. Is there anything we can do to increase the probability? What about specific policies? You say you're an "AI optimist," but I still don't know the scope of what that entails w/ specific policies. Does that mean you support open source AI? Do you oppose all AI regulations? What about an AI pause in development for safety? The terms "AI optimist" and "AI pessimist" don't tell me much on their own.

One inspiration for my post is the now infamous exchange that went on between Yann LeCun and Yoshua Bengio. As I'm sure you saw, Yann LeCun posted this on his Facebook page (& reposted on X):

"The heretofore silent majority of AI scientists and engineers who
- do not believe in AI extinction scenarios or
- believe we have agency in making AI powerful, reliable, and safe and
- think the best way to do so is through open source AI platforms,
NEED TO SPEAK UP !"

https://www.facebook.com/yann.lecun/posts/pfbid02We6SXvcqYkk34BETyTQwS1CFLYT7JmJ1gHg4YiFBYaW9Fppa3yMAgzfaov7zvgzWl

Yoshua Bengio replied as follows:

Let me consider your three points. (1) It is not about 'believing' in specific scenarios. It is about prudence. Neither you nor anyone has given me any rational and credible argument to suggest that we would be safe with future unaligned powerful AIs and right now we do not know how to design such AIs. Furthermore, there are people like Rich Sutton who seem to want us humans to welcome our future overlords and may *give* the gift of self-preservation to future AI systems, so even if we did find a way to make safe AIs, we would still have a socio-political problem to avoid grave misuse, excessive power concentration and the eme

In what sense do you think it will (might) not go well? My guess is that it will not go at all -- some people will show up in the various locations, maybe some local news outlets will pick it up, and within a week it will be forgotten

3the gears to ascension
whether its impact is net good for the world. my impression that protests don't work may have come from anti-protest propaganda, I certainly don't have a clear sense of how I got this sense; but at the same time, I do have this sense.
-3[anonymous]
i do endorse the actual meaning of what i wrote. it is not "insane" and to call it that is callous. i added the edit because i wasn't sure if expressions of stress are productive. i think there's a case to be made that they are when it clearly stems from some ongoing discursive pattern, so that others can know the pain that their words cause. especially given this hostile reaction. --- deleted the rest of this. there's no point for two alignment researchers to be fighting over oldworld violence. i hope this will make sense looking back.

There's a pretty significant difference here in my view -- "carnists" are not a coherent group, not an ideology, they do not have an agenda (unless we're talking about some very specific industry lobbyists who no doubt exist). They're just people who don't care and eat meat.

Ideological vegans (i.e. not people who just happen to not eat meat, but don't really care either way) are a very specific ideological group, and especially if we qualify them like in this post ("EA vegan advocates"), we can talk about their collective traits.

-2[anonymous]
(edit: idk if i endorse comments like this, i was really stressed from the things being said in the comments here) People who fund the torture of animals are not a coherent group, not an ideology, they do not have an agenda. People who don't fund the torture of animals are a coherent group, an ideology, they have an agenda. People who keep other people enslaved are not a coherent group, not an ideology, they do not have an agenda. People who seek to end slavery are a coherent group, an ideology, they have an agenda. Normal people like me are not a coherent group, not an ideology, we do not have an agenda. Atypicals like you are a coherent group, an ideology, you have an agenda. maybe a future, better, post-singularity version of yourself will understand how terribly alienating statements like this are. maybe that person will see just how out-of-frame you have kept the suffering of other life forms to think this way. my agenda is that of a confused, tortured animal, crying out in pain. it is, at most, a convulsive reaction. in desperation, it grasps onto 'instrumental rationality' like the paws of one being pulled into rotating blades flail around them, looking for a hold to force themself back. and it finds nothing, the suffering persists until the day the world ends.
2Jacob Watts
While I agree that there are notable differences between "vegans" and "carnists" in terms of group dynamics, I do not think that necessarily disagrees with the idea that carnists are anti-truthseeking.

It seems untrue that because carnists are not an organized physical group that has meetings and such, they are thereby incapable of having shared norms or ideas/memes. I think in some contexts it can make sense/be useful to refer to a group of people who are not coherent in the sense of explicitly "working together" or having shared newsletters based around a subject or whatever. In some cases, it can make sense to refer to those people's ideologies/norms.

Also, I disagree with the idea that carnists are inherently neutral on the subject of animals/meat. That is, they don't "not care". In general, they actively want to eat meat and would be against things that would stop this. That's not "not caring"; it is "having an agenda", just not one that opposes the current status quo.

The fact that being pro-meat and "okay with factory farming" is the more dominant stance/assumed default in our current status quo doesn't mean that it isn't a legitimate position/belief that people could be said to hold. There are many examples of other memetic environments throughout history where the assumed default may not have looked like a "stance" or an "agenda" to the people who were used to it, but nonetheless represented certain ideological claims. I don't think something only becomes an "ideology" when it disagrees with the current dominant cultural ideas; some things that are culturally common and baked into people from birth can still absolutely be "ideology" in the way I am used to using it. If we disagree on that, then perhaps we could use a different term?

If nothing else, carnists share the ideological assumption that "eating meat is okay". In practice, they often also share ideas about the surrounding philosophical questions and attitudes. I don't think it is beyond the pal
3tailcalled
Could you expand on why you think that it makes a significant difference?

* E.g. if the goal is to model what epistemic distortions you might face, or to suggest directions of change for fewer distortions, then coherence is only of limited concern (a coherent group might be easier to change, but on the other hand it might also more easily coordinate to oppose change).
* I'm not sure why you say they are not an ideology; at least under the model of ideology that I have developed for other purposes, they fit the definition (i.e. I believe carnism involves a set of correlated beliefs about life and society that fit together).
* Also not sure what you mean by carnists not having an agenda; in my experience most carnists have an agenda of wanting to eat lots of cheap delicious animal flesh.
Elizabeth4122

TBF, the meat/dairy/egg industries are specific groups of people who work pretty hard to increase animal product consumption, and are much better resourced than vegan advocates. I can understand why animal advocacy would develop some pretty aggressive norms in the face of that, and for that reason I consider it kind of beside the point to go after them in the wider world. It would basically be demanding unilateral disarmament from the weaker side.

But the fact that the wider world is so confused there's no point in pushing for truth is the point. EA needs to stay better than that, and part of that is deescalating the arms race when you're inside its boundaries. 

Is this surprising though? When I read the title I was thinking "Yea, that seems pretty obvious"

Speaking for myself, I would have confidently predicted the opposite result for the largest models.

My understanding is that LLMs work by building something like a world-model during training by compressing the data into abstractions. I would have expected something like "Tom Cruise's mother is Mary Lee Pfeiffer" to be represented in the model as an abstract association between the names that could then be "decompressed" back into language in a lot of different ways.

The fact that it's apparently represented in the model only as that exact phrase (or maybe a... (read more)
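As a rough illustration of the kind of two-direction probing being discussed (not the paper's actual methodology), here is a toy sketch using the Hugging Face transformers pipeline; the model choice and prompts are illustrative, and a small model like GPT-2 may not know the fact in either direction:

```python
# Toy illustration of probing a fact in both directions; the model and prompts
# are illustrative, and a small model like GPT-2 may not know the fact at all.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

prompts = [
    "Tom Cruise's mother is",      # forward direction
    "Mary Lee Pfeiffer's son is",  # reversed direction
]
for prompt in prompts:
    out = generate(prompt, max_new_tokens=10, do_sample=False)[0]["generated_text"]
    print(repr(out))
```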

5Owain_Evans
I talked to a number of AI researchers about this question before publishing and many of them were surprised.

Often academics justify this on the grounds that you're receiving more than just monetary benefits: you're receiving mentorship and training. We think the same will be true for these positions. 

 

I don't buy this. I'm actually going through the process of getting a PhD at ~40k USD per year, and one of the main reasons why I'm sticking with it is that after that, I have a solid credential that's recognized worldwide, backed by a recognizable name (i.e. my university and my supervisor). You can't provide either of those things.

This offer seems to take the worst of both worlds between academia and industry, but if you actually find someone good at this rate, good for you I suppose

My point is that your comment was extremely shallow, with a bunch of irrelevant information, and in general plagued with the annoying ultra-polite ChatGPT style - in total, not contributing anything to the conversation. You're now defensive about it and skirting around answering the question in the other comment chain ("my endorsed review"), so you clearly intuitively see that this wasn't a good contribution. Try to look inwards and understand why.

It's really good to see this said out loud. I don't necessarily have a broad overview of the funding field, just my experiences of trying to get into it - whether into established orgs, trying to get funding for individual research, or for alignment-adjacent stuff - and ending up in a capabilities research company.

I wonder if this is simply the result of the generally bad SWE/CS market right now. People who would otherwise be in big tech/other AI stuff will be more inclined to do something with alignment. Similarly, if there's less money in tech overall (maybe outside of LLM-based scams), there may be less money for alignment.

8RGRGRG
This is roughly my situation. Waymo froze hiring and had layoffs while continuing to increase output expectations. As a result, I/we had more work. I left in March to explore AI and landed on Mechanistic Interpretability research.

Is it a thing now to post LLM-generated comments on LW?

-11Past Account
kwiat.dev1913

If Orthogonal ever wants to be taken seriously, by far the most important thing is improving its public-facing communication. I invested a more-than-fair amount of time (given the strong prior for "it won't work" with no author credentials, proof-of-concepts, or anything that would quickly nudge that prior) trying to understand QACI, and why it's not just gibberish (both through reading LW posts and interacting with authors/contributors on the Discord server), and I'm still mostly convinced there is absolutely nothing of value in this direction.

And n... (read more)

When you say "X is not a paradox", how do you define a paradox?

Does the original paper even refer to x-risk? The word "alignment" doesn't necessarily imply that specific aspect.

4Cleo Nardo
Nope, no mention of xrisk — which is fine because "alignment" means "the system does what the user/developer wanted", which is more general than xrisk mitigation. But the paper's results suggest that finetuning is much worse than RLHF or ConstitutionalAI at this more general sense of "alignment", despite the claims in their conclusion.

I feel like this is one of the cases where you need to be very precise about your language, and be careful not to use an "analogous" problem which actually changes the situation.

 

Consider the first "bajillion dollars vs dying" variant. We know that right now, there are about 8B humans alive. What happens if the exponential increase exceeds that number? We probably have to assume there's an infinite number of humans, fair enough.

What does it mean that "you've chosen to play"? This implies some intentionality, but due to the structure of the game, where th... (read more)

2Martin Randall
I definitely agree on the need for care in switching between variants. It can also be helpful that they can "change the situation" because this can reveal something unspecified about the original variant. Certainly I was helped by making a second variant, as this clarified for me that the probabilities are different from the deity view vs the snake view, because of anthropics.

In the original variant, it's not specified when exactly players get devoured. Maybe it is instant. Maybe everyone is given a big box that contains either a bazillion dollars, or human-eating snakes, and it opens exactly a year later. In my variant, I was imagining the god initially created a batch of snakes with uncolored eyes, then played dice, then gave them red or blue eyes. So the snakes, like the players, can have experiences prior to the dice being rolled. And yes, no snakes exist before I start. (Why is the god wicked? No love for snakes...) I'll update the text to clarify that no snakes exist until the god of snake creation gets to work.

I think this is a great crystallization of the paradox. In this scenario, it seems like I should believe I have a 1/36 chance of red eyes, and my new friend has a 1/2 chance of red eyes. But my friend has had exactly the same experiences as me, and they reason that the probabilities are reversed.

Counterpoint: this is needlessly pedantic and a losing fight.

My understanding of the core argument is that "agent" in alignment/safety literature has a slightly different meaning than "agent" in RL. It might be the case that the difference turns out to be important, but there's still some connection between the two meanings.

I'm not going to argue that RL inherently creates "agentic" systems in the alignment sense. I suspect there's at least a strong correlation there (i.e. an RL-trained agent will typically create an agentic system), but that's honestly be... (read more)

2TurnTrout
I'm... not demanding that the field of RL change? Where in the post did you perceive me to demand this? For example, I wrote that "I wouldn't say 'reinforcement function' in e.g. a conference paper." I also took care to write "This terminology is loaded and inappropriate for my purposes." Each individual reader can choose to swap to "policy" without communication difficulties, in my experience: (As an aside, I also separately wish RL would change its terminology, but it's a losing fight as you point out, and I have better things to do with my time.)

I would be interested in some advice going a step further -- assuming a roughly sufficient technical skill level (in my case, soon-to-be PhD in an application of ML), as well as an interest in the field, how to actually enter the field with a full-time position? I know independent research is one option, but it has its pros and cons. And companies which are interested in alignment are either very tiny (=not many positions), or very huge (like OpenAI et al., =very selective)

Isn't this extremely easy to directly verify empirically? 

Take a neural network $f$ trained on some standard task, like ImageNet or something. Evaluate $|f(kx) - kf(x)|$ on a bunch of samples $x$ from the dataset, and $|f(x+y) - f(x) - f(y)|$ on samples $x, y$. If it's "almost linear", then the difference should be very small on average. I'm not sure right now how to define "very small", but you could compare it e.g. to the distance distribution $|f(x) - f(y)|$ of independent samples, also depending on what the head is.
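For concreteness, here is a minimal sketch of that check in PyTorch; the pretrained torchvision ResNet, the random stand-in inputs, and the choice of scale k are all illustrative assumptions, and you'd want real normalized dataset samples for a meaningful measurement:

```python
# Minimal sketch of the linearity check described above. The pretrained
# torchvision ResNet and the random stand-in inputs are illustrative
# assumptions; swap in real (normalized) dataset samples for a real test.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def f(x):
    with torch.no_grad():
        return model(x)  # logits, shape (batch, 1000)

x = torch.randn(64, 3, 224, 224)  # stand-in for a batch of dataset samples
y = torch.randn(64, 3, 224, 224)
k = 2.0

homogeneity_gap = (f(k * x) - k * f(x)).norm(dim=1)    # |f(kx) - k f(x)|
additivity_gap = (f(x + y) - f(x) - f(y)).norm(dim=1)  # |f(x+y) - f(x) - f(y)|
baseline = (f(x) - f(y)).norm(dim=1)                   # |f(x) - f(y)| for independent samples

print("homogeneity gap / baseline:", (homogeneity_gap / baseline).mean().item())
print("additivity gap / baseline:", (additivity_gap / baseline).mean().item())
```

If the network were anywhere close to linear as an input-output map, both ratios would come out much smaller than 1.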

FWIW my opinion is that all this "... (read more)

At least how I would put this -- I don't think the important part is that NNs are literally almost linear, when viewed as input-output functions. More like, they have linearly represented features (i.e. directions in activation space, either in the network as a whole or at a fixed layer), or there are other important linear statistics of their weights (linear mode connectivity) or activations (linear probing). 

Maybe beren can clarify what they had in mind, though.

"Overall, it continually gets more expensive to do the same amount of work"

 

This doesn't seem supported by the graph? I might be misunderstanding something, but it seems like research funding essentially followed inflation, so it didn't get more expensive in any meaningful terms. The trend even seems to be a little bit downwards for the real value.

1Metacelsus
Grant award amounts remained the same adjusted for the biomedical price index, but the nominal amounts had to double to match the price index increases.

Looking for research idea feedback:

Learning to manipulate: consider a system with a large population of agents working on a certain goal, either learned or rule-based, but at this point - fixed. This could be an environment of ants using pheromones to collect food and bring it home.

Now add another agent (or some number of them) which learns in this environment, and tries to get other agents to instead fulfil a different goal. It could be ants redirecting others to a different "home", hijacking their work.


Does this sound interesting? If it works, would it potentially be publishable as a research paper? (or at least a post on LW) Any other feedback is welcome!

2romeostevensit
This sounds interesting to me.

But isn't the whole point that the hotel is full initially, and yet can accept more guests?

2Matt Goldenberg
Yeah, the hotel being always half full no matter how many guests it has doesn't seem as cool.

Has anyone tried to work with neural networks predicting the weights of other neural networks? I'm thinking about that in the context of something like subsystem alignment, e.g. in an RL setting where an agent first learns about the environment, and then creates the subagent (by outputting the weights or some embedding of its policy) which actually obtains some reward.
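To make the idea concrete, here is a minimal hypernetwork-style sketch in PyTorch; all the sizes, the environment-embedding input, and the names are illustrative assumptions rather than a reference to any existing system:

```python
# Minimal hypernetwork-style sketch: a "parent" network maps an environment
# embedding to the flat weight vector of a small subagent policy.
# All sizes and names here are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN = 8, 4, 16
# parameter count of a one-hidden-layer policy: obs->hidden, hidden->act (+ biases)
POLICY_PARAMS = (OBS_DIM * HIDDEN + HIDDEN) + (HIDDEN * ACT_DIM + ACT_DIM)

class WeightGenerator(nn.Module):
    def __init__(self, env_embedding_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(env_embedding_dim, 128), nn.ReLU(),
            nn.Linear(128, POLICY_PARAMS),
        )

    def forward(self, env_embedding):
        return self.net(env_embedding)  # flat weight vector for the subagent

def subagent_policy(flat_weights, obs):
    """Run the generated policy on an observation using the predicted weights."""
    i = 0
    w1 = flat_weights[i:i + OBS_DIM * HIDDEN].view(HIDDEN, OBS_DIM); i += OBS_DIM * HIDDEN
    b1 = flat_weights[i:i + HIDDEN]; i += HIDDEN
    w2 = flat_weights[i:i + HIDDEN * ACT_DIM].view(ACT_DIM, HIDDEN); i += HIDDEN * ACT_DIM
    b2 = flat_weights[i:i + ACT_DIM]
    h = torch.relu(obs @ w1.T + b1)
    return h @ w2.T + b2  # action logits

gen = WeightGenerator()
env_embedding = torch.randn(32)       # stand-in for what the parent learned about the env
weights = gen(env_embedding)
obs = torch.randn(OBS_DIM)
print(subagent_policy(weights, obs))  # logits from the generated subagent
```

Since the subagent's forward pass is differentiable with respect to the generated weights, reward signals on the subagent could in principle be pushed back into the parent network, which is where the subsystem-alignment question above would show up.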

This reminds me of an idea bouncing around my mind recently, admittedly not aiming to solve this problem, but possibly exhibiting it.

Drawing inspiration from human evolution: given a sufficiently rich environment where agents have some necessities for survival (like gathering food), they could be pretrained with something like a survival prior which doesn't require any specific reward signals.

Then, agents produced this way could be fine-tuned for downstream tasks, or in a way that makes them obey orders. The problem would arise when an agent is given an ord... (read more)