brambleboy

Comments
I think some long tasks are like a long list of steps that only require the output of the most recent step, so they don't really need long context. AI improves at those just by becoming more reliable and making fewer catastrophic mistakes. On the other hand, some tasks need the AI to remember and learn from everything it's done so far, and that's where it struggles; see how Claude Plays Pokémon gets stuck in loops and has to relearn things dozens of times.
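
Here's a minimal sketch of that distinction, with hypothetical `do_step` callbacks standing in for whatever the model actually does at each step (not any real agent framework):

```python
# A toy sketch (hypothetical do_step callbacks, not any real agent framework)
# of the two kinds of long tasks described above.

def run_chain_task(n_steps, do_step, start=None):
    """Each step only needs the previous step's output, so context stays small;
    per-step reliability is the bottleneck, since one bad step sinks the chain."""
    state = start
    for _ in range(n_steps):
        state = do_step(state)      # only the latest output is carried forward
    return state

def run_history_task(n_steps, do_step):
    """Each step needs everything done so far, so the agent has to remember
    (and learn from) a context that grows with the length of the task."""
    history = []
    for _ in range(n_steps):
        action = do_step(history)   # the whole history is carried forward
        history.append(action)
    return history

# A chain task succeeds only if every step succeeds, so with per-step success
# rate p the chance of finishing n steps is roughly p ** n.
p, n = 0.99, 200
print(f"~{p ** n:.0%} chance of completing a {n}-step chain at {p:.0%} per-step reliability")
```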

Claude finally made it to Cerulean after the "Critique Claude" component correctly identified that it was stuck in a loop, and decided to go through Mt. Moon. (I think Critique Claude is prompted specifically to stop loops.)

I'm glad you shared this, it's quite interesting. I don't think I've ever had something like that happen to me and if it did I'd be concerned, but I could believe that it's prevalent and normal for some people.

I don't think your truth machine would work, because you misunderstand what makes LLMs hallucinate. Predicting what a maximum-knowledge author would write induces more hallucinations, not fewer. For example, say you prompted your LLM to predict text supposedly written by an omniscient oracle, and then asked "How many fingers am I holding behind my back?" The LLM would predict an answer like "three", because an omniscient author would know that, even though the model has no way of knowing it and the answer is probably wrong.

In other words, you'd want the system to believe "this writer I'm predicting knows exactly what I do, no more, no less", not "this writer knows way more than me". Read "Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?" for evidence of this.

What would work even better is for the system to simply be Writing instead of Predicting What Someone Wrote, but nobody's done that yet (because it's hard).
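
To make the framing difference concrete, here's a toy illustration, with made-up prompt strings and no particular model or API, of why the "omniscient author" framing invites confident guesses while a knowledge-matched framing invites admitting ignorance:

```python
# Hypothetical prompt framings -- the point is what kind of continuation each
# one makes most plausible, not any particular API.

ORACLE_FRAMING = (
    "The following was written by an omniscient oracle who knows every fact "
    "about the world, including things nobody has told it."
)

KNOWLEDGE_MATCHED_FRAMING = (
    "The following was written by a careful author who knows exactly what the "
    "model knows, no more and no less, and says 'I don't know' otherwise."
)

QUESTION = "Q: How many fingers am I holding behind my back?\nA:"

# Under ORACLE_FRAMING, the most plausible continuation is a specific number
# ("Three."), because that's what an omniscient author's text would look like,
# even though the model has no way of knowing the answer.
# Under KNOWLEDGE_MATCHED_FRAMING, the most plausible continuation is something
# like "I can't see your hand, so I don't know."
for framing in (ORACLE_FRAMING, KNOWLEDGE_MATCHED_FRAMING):
    print(framing + "\n\n" + QUESTION + "\n")
```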

I've been trying to put all my long-form reading material in one place myself, and found a brand-new service called Reader which is designed specifically for this purpose. It has support for RSS, newsletters, YouTube transcripts, and other stuff. It's $10/month billed annually, or $13/month billed monthly.

Thanks for responding.

I agree with what you're saying; I think you'd want to maintain your reward stream at least partially. However, the main point I'm trying to make is that in this hypothetical, it seems like you'd no longer be able to think of your reward stream as grounding out your values. Instead it's the other way around: you're using your values to dictate the reward stream. This happens in real life sometimes, when we try to make things we value more rewarding.

You'd end up keeping your values, I think: your beliefs about what you value don't go away, and your behaviors that put them into practice don't immediately go away either, so through those your values are maintained (at least somewhat).

If you can still have values without reward signals that tell you about them, then doesn't that mean your values are defined by more than just what the "screen" shows? That even if you could see and understand every part of someone's reward system, you still wouldn't know everything about their values?

This conception of values raises some interesting questions for me.

Here's a thought experiment: imagine your brain loses all of its reward signals. You're in a depression-like state where you no longer feel disgust, excitement, or anything. However, you're given an advanced wireheading controller that lets you easily program rewards back into your brain. With some effort, you could approximately recreate your excitement when solving problems, disgust at the thought of eating bugs, and so on, or you could create brand-new responses. My questions:

  • What would you actually do in this situation? What "should" you do?
  • Does this cause the model of your values to break down? How can you treat your reward stream as evidence of anything if you made it? Is there anything to learn about the squirgle if you made the video of it?

My intuition says that life does not become pointless now that you're the author of your reward stream. This suggests that the values might be fictional, but the reward signals aren't their one true source, in the same way that Harry Potter could live on even if all the books were lost.


While I don't have specifics either, my impression of ML research is that it's a lot of work to get a novel idea working, even if the idea is simple. If you're trying to implement your own idea, you'll be banging your head against the wall for weeks or months wondering why your loss is worse than the baseline. If you try to replicate a promising-sounding paper, you'll bang your head against the wall because your loss is worse than the baseline. It's hard to tell whether you made a subtle error in your implementation or the idea simply doesn't work for reasons you don't understand, because ML has little in the way of theoretical backing. Even when it works, it won't be optimized, so you need engineers to improve the performance and make it stable when training at scale. If you want to ship a working product quickly, it's best to choose what's tried and true.

At the start of my Ph.D. 6 months ago, I was generally wedded to writing "good code". The kind of "good code" you learn in school and standard software engineering these days: object oriented, DRY, extensible, well-commented, and unit tested.

I think you'd like Casey Muratori's advice. He's a software dev who argues that "clean code" as it's taught is actually bad, and that the way to write good code efficiently is more like the way you did it intuitively before you were taught OOP and such. He advises "Semantic Compression" instead: you just straightforwardly write code that works, then pull out and reuse the parts that turn out to be repeated.
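
A tiny sketch of what that looks like in practice (using a made-up `Canvas` stub, not a real graphics library): write the obvious code first, and only factor something out once it has actually repeated.

```python
class Canvas:
    """Minimal stub that just records drawing calls, so the example runs anywhere."""
    def __init__(self):
        self.calls = []

    def rect(self, x, y, w, h, color):
        self.calls.append(("rect", x, y, w, h, color))

    def text(self, x, y, label):
        self.calls.append(("text", x, y, label))


# Step 1: write the straightforward versions, duplication and all.
def draw_window(c, x, y, w, h):
    c.rect(x, y, w, h, "gray")                   # frame
    c.rect(x + 2, y + 2, w - 4, h - 4, "white")  # inner area


def draw_button(c, x, y, w, h, label):
    c.rect(x, y, w, h, "gray")                   # same two lines as draw_window
    c.rect(x + 2, y + 2, w - 4, h - 4, "white")
    c.text(x + 4, y + 4, label)


# Step 2: the repetition has now actually shown up, so pull it out and reuse it.
def draw_panel(c, x, y, w, h):
    c.rect(x, y, w, h, "gray")
    c.rect(x + 2, y + 2, w - 4, h - 4, "white")


def draw_button_compressed(c, x, y, w, h, label):
    draw_panel(c, x, y, w, h)
    c.text(x + 4, y + 4, label)


canvas = Canvas()
draw_button_compressed(canvas, 0, 0, 40, 12, "OK")
print(canvas.calls)
```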
