Ngl I did not fully understand this, but to be clear I don't think understanding alignment through the lens of agency is "excessively abstract." In fact I think I'd agree with the implicit default view that it's largely the single most productive lens to look through. My objection to the status quo is that it seems like the scale/ontology/lens/whatever I was describing is getting 0% of the research attention whereas perhaps it should be getting 10 or 20%.
Not sure this analogy works, but if the NIH were spending $10B on cancer research, I would (prima facie, as a layperson) want >$0 but probably <$2B spent on looking at cancer as an atomic-scale phenomenon, and maybe some amount at an even lower scale.
Note: I'm probably well below the median commenter in terms of technical CS/ML understanding. Anyway...
I feel like a missing chunk of research could be described as “seeing DL systems as ‘normal,’ physical things and processes that involve electrons running around inside little bits of (very complex) metal pieces” instead of mega-abstracted “agents.”
The main reason this might be fruitful is that, at least intuitively and to my understanding, failures like “the AI stops just playing chess really well and starts taking over the world to learn how to play c...
One answer to the question for me:
While writing, something close to "how does this 'sound' in my head naturally, when read, in an aesthetic sense?"
I've thought for a while that "writing quality" largely boils down to whether the writer has a salient and accurate intuition about how the words they're writing come across when read.
Ah late to the party! This was a top-level post aptly titled "Half-baked alignment idea: training to generalize" that didn't get a ton of attention.
...Thanks to Peter Barnett and Justis Mills for feedback on a draft of this post. It was inspired by Eliezer's Lethalities post and Zvi's response.
Central idea: can we train AI to generalize out of distribution?
I'm thinking, for example, of an algorithm like the following:
- Train a GPT-like ML system to predict the next word given a string of text only using, say, grade school-level w
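A toy sketch of the kind of setup the bullet above gestures at, just to make the in-distribution/out-of-distribution split concrete. Everything here is an assumption for illustration: the restricted vocabulary, the tiny corpus, and a trivial bigram predictor standing in for the GPT-like system.

```python
# Minimal sketch (hypothetical corpus and model): restrict the training
# distribution to a simple vocabulary, then measure how a next-word
# predictor does on text drawn from outside that distribution.
from collections import Counter, defaultdict

GRADE_SCHOOL_VOCAB = {"the", "cat", "sat", "on", "mat", "dog", "ran", "home"}

def in_distribution(sentence):
    """Keep only sentences whose every word is in the restricted vocabulary."""
    return all(w in GRADE_SCHOOL_VOCAB for w in sentence.split())

def train_bigram(sentences):
    """Count bigrams; predict the most frequent follower of each word."""
    counts = defaultdict(Counter)
    for s in sentences:
        words = s.split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def ood_accuracy(model, sentences):
    """Next-word accuracy on (possibly out-of-distribution) sentences."""
    hits = total = 0
    for s in sentences:
        words = s.split()
        for a, b in zip(words, words[1:]):
            total += 1
            hits += model.get(a) == b
    return hits / max(total, 1)

train = [s for s in ["the cat sat on the mat", "the dog ran home"] if in_distribution(s)]
model = train_bigram(train)
print(ood_accuracy(model, ["the cat chased the squirrel"]))  # OOD vocabulary
```

The real proposal would of course involve a far bigger model and corpus; the load-bearing part is the split between a restricted training distribution and a broader evaluation distribution.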
Thank you, Solenoid! The SSC podcast is the only reason I'm able to consume posts like Biological Anchors: A Trick That Might Or Might Not Work
Thanks. It's similar in one sense, but (if I'm reading the paper right) a key difference is that in the MAML examples, the ordering of the meta-level and object-level training is such that you still wind up optimizing hard for a particular goal. The idea here is that the two types of training function in opposition, as a control system of sorts, such that the meta-level training should make the model perform worse at the narrow type of task it was trained on.
That said, for sure, the types-of-distribution-shift thing is an issue. It seems like this bias might be less bad at the meta level than at the object level, but I have no idea.
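To gesture at the contrast with MAML, here's a one-parameter caricature; the targets, threshold, and learning rate are all invented, and the trigger rule is just one crude way to put the two objectives in opposition.

```python
# One-parameter caricature (targets, threshold, and learning rate invented).
# Object-level steps pull theta toward the narrow-task optimum; a meta-level
# step pushes back toward a broader optimum whenever theta gets "too good"
# at the narrow task - two objectives in opposition, like a control loop.

NARROW_TARGET = 1.0  # hypothetical narrow-task optimum
BROAD_TARGET = 0.3   # hypothetical broad-distribution optimum
LR = 0.1

def narrow_grad(theta):
    return 2 * (theta - NARROW_TARGET)  # gradient of (theta - NARROW_TARGET)**2

def broad_grad(theta):
    return 2 * (theta - BROAD_TARGET)   # gradient of (theta - BROAD_TARGET)**2

theta = 0.0
for _ in range(50):
    theta -= LR * narrow_grad(theta)        # object level: improve narrow task
    if (theta - NARROW_TARGET) ** 2 < 0.1:  # meta trigger: too specialized
        theta -= LR * broad_grad(theta)     # meta level: push back

print(theta)  # settles between the targets (~0.61), not at NARROW_TARGET
```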
Inspired by Eliezer's Lethalities post and Zvi's response:
Has there been any research or writing on whether we can train AI to generalize out of distribution?
I'm thinking, for example:
Entirely agree. There are certainly chunks of my life (as a privileged first-worlder) I'd prefer not to have experienced, and these generally seem less bad than "an average period of the same duration as a Holocaust prisoner." Given that animals are sentient, I'd put it at ~98% that their lives are net negative.
From my perspective, this is why society at large needs to get better at communicating the content - so you wouldn't have to be good at "anticipating the content."
The meaningfulness point is interesting, but I'm not sure I fully agree. Some topics can be meaningful but not interesting (high-frequency trading to donate money) and vice versa (video game design? No offense to video game designers).
By your description, it feels like the kind of book where an author picks a word and then rambles about it like an impromptu speaker. If this had an extraordinary thesis requiring extraordinary evidence like Manufacturing Consent then lots of anecdotes would make sense. But the thesis seems too vague to be extraordinary.
I get the impression of the kind of book where a dense blog post is stretched out to the length of a book. This is ironic for a book about subtraction.
Yup, very well-put.
Your point about anecdotes got me thinking; an "extraordinary the...
I don't think it is operationalizable, but I fail to see why 'net positive mental states' isn't a meaningful, real value. Maybe the units would be apple*minutes or something, where one unit is equivalent to the pleasure you get by eating an apple for one minute. It seems that this could in principle be calculated with full information about everyone's conscious experience.
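A toy version of what I mean, with entirely invented experiences and numbers, just to show the unit is coherent:

```python
# Toy illustration (all numbers invented): summing hedonic value across
# experiences in "apple-minutes", where 1.0 is the pleasure of eating an
# apple for one minute. Negative values denote suffering.
experiences = [
    ("eating an apple", 3, 1.0),  # (description, minutes, intensity)
    ("mild headache", 60, -0.2),
    ("good conversation", 30, 0.8),
]

net_apple_minutes = sum(minutes * intensity for _, minutes, intensity in experiences)
print(net_apple_minutes)  # 3*1.0 + 60*(-0.2) + 30*0.8 = 15.0
```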
Thanks for your perspective.
I've never been able to do intellectual work with background music, and am baffled by people (e.g., programmers) who work with headphones playing music all day. But maybe for them it does just use different parts of the brain.
For me, there is a huge qualitative difference between lyrical music or even "interesting" classical and electronic music, and very "boring," quiet lyric-less music. Can't focus at all listening to lyrics, but soft ambient music feels intuitively helpful (though this could be illusory). This is especially the case when it's a playlist or song I've heard a hundred times before, so the tune is completely unsurprising.
Yes, I was incorrect about Matuschak's position. He commented on reddit here:
"I think Matuschak would say that, for the purpose of conveying information, it would be much more efficient to read a very short summary than to read an entire book."
...FWIW, I wouldn't say that! Actually, my research for the last couple years has been predicated on the value of embedding focused learning interactions (i.e. spaced repetition prompts) into extended narrative. The underlying theory isn't (wasn't!) salience-based, but basically I believe that strong understanding is pro
I don't know much more than you could find searching around r/nootropics, but my sense is that the relationship between diet and cognition is highly personal, so experimentation is warranted. Some do best on keto, others as a vegan, etc. With respect to particular substances, it seems that creatine might have some cognitive benefits, but once again supplementation is highly personal. DHA helps some people and induces depression in others, for example.
Also, inflammation is a common culprit/risk factor for many mental issues, so I'd expect that a generally "...
Yes, you're correct. As others have noted, there is no unambiguous way of determining which effects are "direct" and which are not. However, suppose decriminalization does decrease drug use. My argument emphasizes that we would need to count the reduction in time spent enjoying drugs as a downside of decriminalization (though I doubt this would outweigh the benefits associated with lower incarceration rates). It seems to me that this point would frequently be neglected.
There is a good amount of this discussion at r/nootropics - of which some is evidence based and some is not. For example, see this post.
Basically agree with this suggestion: broader metrics are more likely to be unbiased over time. Even the electric grid example, though, isn't ideal because we can imagine a future point where going from $0.0001 to $0.000000001 per kilowatt-hour, for example, just isn't relevant.
Total factor productivity and GDP per capita are even better, agreed.
While a cop-out, my best guess is that a mixture of qualitative historical assessments (for example, asking historians, entrepreneurs, and scientists to rank decades by degree of progress) and using a v...
I was thinking the third bullet, though the question of perverse incentives needs fleshing out, which I briefly alluded to at the end of the post:
“Expected consequences”, for example, leaves under-theorized the question of when you should seek out new, relevant information to improve your forecast about some action’s consequences.
My best guess is that this isn't actually an issue, because you have a moral duty to seek out that information, as you know a priori that seeking out such info is net-positive in itself.
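For what it's worth, a toy value-of-information calculation shows the shape of that argument; the probabilities, payoffs, and cost here are all invented:

```python
# Toy value-of-information calculation (all numbers invented): acting now
# vs. paying a small cost to learn the true state first. If VOI exceeds
# the cost, seeking out the information is net-positive.
P_GOOD = 0.6                      # prior that the action's consequences are good
PAYOFF = {"good": 10, "bad": -8}  # hypothetical utilities
INFO_COST = 1

ev_act_now = P_GOOD * PAYOFF["good"] + (1 - P_GOOD) * PAYOFF["bad"]  # 2.8
ev_informed = P_GOOD * PAYOFF["good"] + (1 - P_GOOD) * 0             # skip if bad: 6.0
voi = ev_informed - ev_act_now                                       # 3.2
print(voi > INFO_COST)  # True: seeking the information is worth the cost
```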
Sharing https://earec.net, semantic search for the EA + rationality ecosystem. Not fully up to date, sadly (doesn't have the last month or so of content). The current version is basically a minimum viable product!
On the results page there is also an option to see EA Forum-only results, which allows you to sort by a weighted combination of karma and semantic similarity, thanks to the API.
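For the curious, here's a hypothetical sketch of the kind of scoring that could involve; the field names, weight, and embeddings are assumptions for illustration, not how earec.net actually works:

```python
# Hypothetical ranking sketch: blend normalized karma with cosine similarity
# between the query embedding and each post's embedding.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(posts, query_embedding, karma_weight=0.3):
    """Sort posts by a weighted combination of karma and semantic similarity."""
    max_karma = max(p["karma"] for p in posts) or 1
    def score(p):
        sim = cosine(p["embedding"], query_embedding)
        return karma_weight * (p["karma"] / max_karma) + (1 - karma_weight) * sim
    return sorted(posts, key=score, reverse=True)

posts = [
    {"title": "Post A", "karma": 120, "embedding": [0.1, 0.9]},
    {"title": "Post B", "karma": 15, "embedding": [0.8, 0.2]},
]
print([p["title"] for p in rank(posts, query_embedding=[0.7, 0.3])])
```

Normalizing karma before blending keeps the two signals on comparable scales.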
Unfortunately there's no corresponding system for LessWrong because of (perhaps totally sensible) rate limits (the EA Forum offers a bots site for use cases like t...