
Addendum to applicable advice

-8 Elo 16 August 2016 12:59AM

Original post: http://bearlamp.com.au/addendum-to-applicable-advice/
(part 1: http://bearlamp.com.au/applicable-advice/)


If you see advice in the wild and think something along the lines of "that can't work for me", that's a cached thought.  It could be a true cached thought or a false one.  Some of these thoughts should be examined thoroughly and defeated.

Being able to be any kind of person - specifically, the kind of person that advice works for - is an amazing skill to have.  It's also hard.  You need to examine the advice and work out why it happened to work, and then you need to modify yourself to make that advice applicable to you.

All too often in this life we think of ourselves as immutable, and our problems as fixed, with the only hope of solving them being to find a solution that fits the problem.  I propose it's the other way around.  All too often the solutions are immutable, we are malleable, and the problems can be solved by applying known advice and known knowledge in ways that we have to think of and decide on ourselves.


Is it really the same problem any more, if the task is no longer the original problem but rather finding a new method of applying a known solution to a known problem?

Example: dieting is an easy one.

This week we have been talking about Calories in/Calories out.  It's pretty obvious that CI/CO is true on a black-box system level.  If food goes in (calories in) and work goes out (calories out: BMR, incidental exercise, purposeful exercise), that is what determines your weight.  (Ignoring the fact that drinking a litre of water is a faster way to gain weight than any other way I know of.)  And we know that weight is not literally health, but a proxy for what we consider healthy, because it's the easiest way to track how much fat we store on our bodies (for a normal human who doesn't have massive bulk muscle mass).

"CICO makes for terrible advice."  On one level, yes.  To modify the weight of our black box, we need to change the energy going in and the energy going out so that the box is no longer in the same feedback loop as it was (the one that caused the box to get fat).  On another level, CICO is exactly all the advice you need to change the weight of a black box (or a spherical cow in a vacuum).
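As a toy illustration of the black-box view (my sketch, not from the post; the 7,700 kcal per kg figure is only a rough rule of thumb, not a precise physiological constant):

```python
KCAL_PER_KG_FAT = 7700  # rough rule of thumb, not a precise physiological constant

def weight_change_kg(calories_in: float, calories_out: float, days: int) -> float:
    """Black-box CICO estimate: cumulative energy balance converted to fat mass."""
    return (calories_in - calories_out) * days / KCAL_PER_KG_FAT

# e.g. a steady 250 kcal/day deficit held for 30 days
print(round(weight_change_kg(2250, 2500, 30), 2))  # about -0.97 kg
```

Real humans, of course, are not black boxes: appetite, metabolism and habits all push back, which is exactly the spherical-cow point.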

On the level of human systems: people are not spherical cows in a vacuum.  Where did spherical cows in a vacuum come from?  It's a parody of what we do in physics.  We simplify a system down to its basic parts and generate rules that make sense.  Then we build up to a complicated model and try to find out how to apply those rules.  It's why we can work out where projectiles are going to land: we have projectile-motion physics, and even though air resistance and wind direction often end up changing where our projectile lands, we still have a good guess.  (And we later build estimation systems that use those details for prediction too.)

So CICO is a black-box system, a spherical cow system.  It's wrong.  It's so wrong when you try to apply it to the real world.  But that doesn't matter!  It's significantly better than nothing.  Or the blueberry diet.


The applicable advice of CICO

The point of applicable advice is to look at spherical cows and not say, "I'm no spherical cow!".  Instead, think of ways in which you are a spherical cow.  Ways in which the advice is applicable.  Places where, actually, if I do eat less, that will improve the progress of my weight loss in cases where my problem is that I eat too much (which I guarantee is relevant for lots of people).  CICO might not be your silver bullet, for whatever reason.  It might be grandma, it might be chocolate bars, it might be really really really delicious steak.  Or dinner with friends.  Or "looking like you are able to eat forever in front of other people".  But if you take your problem, add in a bit of CICO, and ask, "how can I make this advice applicable to me?", today you might make progress on your problem.


And now for some fun from Grognor:  Have you tried solving the problem?


Meta: this took 30 minutes to write.  All my thoughts were still clear after recently writing part 1, so this didn't need any longer to process.

Part 1: http://bearlamp.com.au/applicable-advice/
(part 1 on lesswrong: http://lesswrong.com/r/discussion/lw/nu3/applicable_advice/)

How It Feels to Improve My Rationality

5 SquirrelInHell 18 March 2016 09:59AM

Note: this has started as a comment reply, but I thought it got interesting (and long) enough to deserve its own post.

Important note: this post is likely to spark some extreme reactions, because of how human brains are built. I'm including warnings, so please read this post carefully and in the order written, or don't read it at all.

I'm going to attempt to describe my subjective experience of progress in rationality.

Important edit: I learned from the responses to this post that there's a group of people with whom this resonates pretty well, and there's also a substantial group with whom it does not resonate at all, to the degree that they don't know whether what I'm saying even makes sense or is correlated with rationality in any meaningful way. If you find yourself in the second group, please notice that trying to verify whether I'm doing "real rationality" or not is not a way to resolve your doubts. There is no reason why you would need to feel the same. It's OK to have different experiences. How you experience things is not a test of your rationality. It's also not a test of my rationality. All in all, because of publishing this and reading the comments, I've found out some interesting stuff about how some clusters of people tend to think about this :)

Also, I need to mention that I am not an advanced rationalist, and my rationality background is mostly reading Eliezer's sequences and self-experimentation.

I'm still going to give this a shot, because I think it's going to be a useful reference for a certain level in rationality progress.

I even expect myself to find all that I write here silly and stupid some time later.

But that's the whole point, isn't it?

What I can say about how rationality feels to me now, is going to be pretty irrelevant pretty soon.

I also expect a significant part of readers to be outraged by it, one way or the other.

If you think this has no value, maybe try to imagine a rationality-beginner version of you that would find a description such as this useful. If only as a reference that says: yes, there is a difference. No, rationality does not feel like a lot of abstract knowledge that you remember from a book. Yes, it does change you deeply, probably more deeply than you suspect.

In case you want to downvote this, please do me a favour and write a private message to me, suggesting how I could change this so that it stops offending you.

Please stop any feeling of wanting to compare yourself to me or anyone else, or to prove anyone's superiority or inferiority.

If you can't do this please bookmark this post and return to it some other time.

...

...

Ready?

So, here we go. If you are free from againstness and competitiveness, please be welcome to read on, and feel free to tell me how this resonates, and how different it feels inside your own head and on your own level.


Part 1. Pastures and fences

Let's imagine a vast landscape, full of vibrant greenery of various sorts.

Now, my visualization of object-level rationality is staking out territories, like small parcels of a pasture surrounded by fences.

Inside the fences, I tend to have more neat grass than anything else. It's never perfect, but when I keep working on an area, it slowly improves. If neglected, weeds will start growing back sooner or later.

Let's also imagine that the ideas and concepts I generalize as I go about my work become seeds of grass, carried by the wind.

What the work feels like, is that I'm running back and forth between object level (my pastures) and meta-level (scattering seeds).

As a result of this running back and forth I'm able to stake out new territories, or improve previous ones, to have better coverage and fewer weeds.

The progress I make in my pastures feeds back into interesting meta-level insights (more seeds carried by the wind), which in turn tend to spread to new areas even when I'm not helping with this process on purpose.

My pastures tend to concentrate in clusters, in areas that I have worked on the most.

When I have lots of action in one area, the large amounts of seeds generated (meta techniques) are more often carried to other places, and at those times I experience the most change happening in other, especially new and unexplored, areas.

However, even when I can reuse some of my meta-ideas (seeds), to have a nice and clear territory I still need to go over there and put in the manual work of clearing it up.

As I'm getting better and more efficient at this, it becomes less work to gain new territories and improve old ones.

But there's always some amount of manual labor involved.


Part 2. Tells of epistemic high ground

Disclaimer: not using this for the Dark Side requires a considerable amount of self-honesty. I'm only posting this because I believe most of you folks reading this are advanced enough not to shoot yourself in the foot by e.g. using this in arguments.

Note: If you feel the slightest urge to flaunt your rationality level, pause and catch it. (You are welcome.) Please do not start any discussion motivated by this.

So, what clues do I tend to notice when my rationality level is going up, relative to other people?

Important note: This is not the same as "how do I notice if I'm mistaken" or "how do I know if I'm on the right path". These are things I notice after the fact, that I judge to be correlates, but they are not to be used to choose direction in learning or sorting out beliefs. I wrote the list below exactly because it is the less talked about part, and it's fun to notice things. Somehow everyone seems to have thought this is more than I meant it to be.

Edit: check Viliam's comment for some concrete examples that make this list better.

In a particular field:

  • My language becomes more precise. Where others use one word, I now use two, or six.
  • I see more confusion all around.
  • Polarization in my evaluations increases. E.g. two sensible sounding ideas become one great idea and one stupid idea.
  • I start getting strong impulses that tell me to educate people who I now see are clearly confused, and could be saved from their mistake in one minute if I could tell them what I know... (spoiler alert, this doesn't work).

Rationality level in general:

  • I stop having problems in my life that seem to be common all around, and that I used to have in the past.
  • I forget how it is to have certain problems, and I need to remind myself constantly that what seems easy to me is not easy for everyone.
  • Writings of other people move forward on the path from intimidating to insightful to sensible to confused to pitiful.
  • I start to intuitively discriminate between rationality levels of more people above me.
  • Intuitively judging someone's level requires less and less data, from reading a book to reading ten articles to reading one article.

Important note: although I am aware that my mind automatically estimates rationality levels of various people, I very strongly discourage anyone (including myself) from ever publishing such scores/lists/rankings. If you ever have an urge to do this, especially in public, think twice, and then think again, and then shut up. The same applies to ever telling your estimates to the people in question.

Note: Growth mindset!


Now let's briefly return to the post I started out replying to. Gram_Stone suggested that:

You might say that one possible statement of the problem of human rationality is obtaining a complete understanding of the algorithm implicit in the physical structure of our brains that allows us to generate such new and improved rules.

After everything I've seen so far, my intuition suggests Gram_Stone's idealized method wouldn't work from inside a human brain.

A generalized meta-technique could become one of the many seeds that help me in my work, or even a very important one that would spread very widely, but it still wouldn't magically turn raw territory into perfect grassland.


Part 3. OK or Cancel?

The closest I've come to Gram_Stone's ideal is when I witnessed a whole cycle of improving in a certain area being executed subconsciously.

It was only brought to my full attention when an already polished solution in verbal form popped into my head when I was taking a shower.

It felt like a popup on a computer screen that had "Cancel" and "OK" buttons, and after I chose OK the rest continued automatically.

After this single short moment, I found a subconscious habit was already in place that ensured changing my previous thought patterns, and it proved to work reliably long after.


That's it! I hope I've left you better off reading this, than not reading this.

Meta-note about my writing agenda: I've developed a few useful (I hope) and unique techniques and ideas for applied rationality, which I don't (yet) know how to share with the community. To get that chunk of data birthed out of me, I need some continued engagement from readers who would give me feedback and generally show interest (this needs to be done slowly and in the right order, so I would have trouble persisting otherwise). So for now I'm writing separate posts noncommittally, to test reactions and (hopefully) gather some folks that could support me in the process of communicating my more developed ideas.

Is Pragmatarianism (Tax Choice) Less Wrong?

-16 Xerographica 12 February 2015 04:47AM

I sure think it is!  But I could be wrong...

This is my first article/post? here, and to be honest, I have this website open in another tab and I keep refreshing it to see if I still have enough points to post.  I wish I had taken a screenshot every time my karma changed.  First it was 0, then it was -1, then it was back to 0, then I think it jumped up to 5.  I thought I was safe, but then this morning it was down to 0.  So if this post seems "linky", it might be because I'm trying to share as much information as I can while my window of opportunity is still open.

Pragmatarianism (tax choice) is the belief that taxpayers should be able to choose where their taxes go.  Tax choice is the broad concept while pragmatarianism is my own personal spin on it... but sometimes I use "tax choice" when I mean pragmatarianism.  Eh, at this point I don't think it's a big deal.  Really the only thing nice about the word "pragmatarianism" is that it functions as a unique ID... which is extremely helpful when it comes to searches.  Don't have to worry about wading through irrelevant results. 

Here are some links from my blog which should help you decide whether pragmatarianism is more or less wrong...

Pragmatarianism FAQ - a good place to start.  It's pretty short.  

Key concepts - a work in progress.  Some of the concepts are linked to entries which have PDF files with a bunch of relevant quotes and passages.  If you like any of them then please share them in this thread... Quotes Repository.  I shared a few but they didn't fare so well... so I'm guessing that most people here aren't fans of economics... or they aren't fans of my economics. 

Progress as a Function of Freedom - hedging bets, the impossibility of hostile aliens, the problem with "rights".  

What Do Coywolves, Mr. Nobody, Plants And Fungi All Have In Common? - the universal drive to choose the most valuable option, the carrying model as an explanation for our intelligence, a bit on rationality.

Builderism - where better options come from, globalization, debunking Piketty, eliminating poverty. 

My Robin Hanson trilogy...

Is Robin Hanson's Path To Efficient Voting Pragmatic Or Brilliant Or Both? - maybe we should have a civic currency?

Rescuing Robin Hanson From Unmet Demand - how many other people are in the same boat?

Futarchy vs Pragmatarianism - is it logically inconsistent to support one but not the other?  

/trilogy.

AI Box Experiment vs Xero's Rule - my first brainstorm attempt to wrap my mind around the idea of an AI box.

Is A Procreation License Consistent With Libertarianism? - would a procreation license be less wrong?

Why I Love Your Freedom - my critique of the best critique of libertarianism.  A bit on rationality.

So what do you think?  Am I in the right place?  

What else?  Of course I'm an atheist!  And I love sci-fi... and for sure I want to live forever.  The major obstacle is that too many people fail to grasp that progress depends on difference.  I do my best to try and eliminate this obstacle.  Unfortunately I suck at writing and my drawings are even worse.  Oh well.

Let me know if you have any questions.

[Link] Lost and Found

13 [deleted] 01 November 2013 02:26PM

Related: Son of Low Hanging Fruit, Low Hanging Poop

A post from Gregory Cochran and Henry Harpending's blog West Hunter.

Marcus Terentius Varro  was called the most learned of the Romans.  But what did he know, and how did he know it? I ask because of this quote, from Rerum rusticarum libri III  (Agricultural Topics in Three Books):

“Especial care should be taken, in locating the steading, to place it at the foot of a wooded hill, where there are broad pastures, and so as to be exposed to the most healthful winds that blow in the region. A steading facing the east has the best situation, as it has the shade in summer and the sun in winter. If you are forced to build on the bank of a river, be careful not to let the steading face the river, as it will be extremely cold in winter, and unwholesome in summer. 2 Precautions must also be taken in the neighbourhood of swamps, both for the reasons given, and because there are bred certain minute creatures which cannot be seen by the eyes, which float in the air and enter the body through the mouth and nose and there cause serious diseases.” “What can I do,” asked Fundanius, “to prevent disease if  I should inherit a farm of that kind?” “Even I can answer that question,” replied Agrius; “sell it for the highest cash price; or if you can’t sell it, abandon it.”

I get the distinct impression that someone (probably someone other than Varro) came up with an approximation of germ theory 1500 years before Girolamo Fracastoro.  But his work was lost.

Everybody knows, or should know, that the vast majority of Classical literature has not been preserved.  Those lost works contained facts and ideas that might have value today – certainly there are topics that we understand much better because of insights from Classical literature. For example,  Reich and Patterson find that some of the Indian castes have existed for something like three thousand years:  this is easier to believe when you consider that Megasthenes wrote about the caste system as early as 300 BC.

We don’t put much effort into recovering lost Classical literature.  But there are ways in which we could push harder – by increased funding for work on the Herculaneum scrolls, or the Oxyrhynchus papyri collection, for example.  Some old-fashioned motivated archaeology might get lucky and find another set of Amarna cuneiform letters, or a new Antikythera  mechanism.

The Classic Literature Workshop

2 Ritalin 16 June 2013 09:54AM

From EY's Facebook page, there were two posts that got me thinking about fiction and how to work it better and make it stronger:

It would have been trivial to fix _Revenge of the Sith_'s inadequate motivation of Anakin's dark turn; have Padme already in the hospital slowly dying as her children come to term, not just some nebulous "visions". (Bonus points if you have Yoda lecture Anakin about the inevitability of death, but I'd understand if they didn't go there.) At the end, Anakin doesn't try to choke Padme; he watches the ship with her fly out of his reach, away from his ability to use his unnatural Sith powers to save her. Now Anakin's motives are 320% more sympathetic and the movie makes 170% more sense. If I'd put some serious work in, I'm pretty sure I could've had the movie audience in tears.

I still feel a sense of genuine puzzlement on how such disastrous writing happens in movies and TV shows. Are the viewers who care about this such a tiny percentage that it's not worth trying to sell to them? Are there really so few writers who could read over the script and see in 30 seconds how to fix something like this? (If option 2 is really the problem and people know it's the problem, I'd happily do it for $10,000 a shot.) Is it Graham's Design Paradox - can Hollywood moguls just not tell the difference between competent writers making such an offer, and fakers who'll take the money and run? Are the producers' egos so grotesque that they can't ask a writer for help? Is there some twisted sense of superiority bound up with believing that the audience is too dumb to care about this kind of thing, even though it looks to me like they do? I don't understand how a >$100M movie ends up with flaws that I could fix at the script stage with 30 seconds of advice.

A helpful key to understanding the art and technique of character in storytelling, is to consider the folk-psychological notion from Internal Family Systems of people being composed of different 'parts' embodying different drives or goals. A shallow character is then a character with only one 'part'.

A good rule of thumb is that to create a 3D character, that person must contain at least two different 2D characters who come into conflict. Contrary to the first thought that crosses your mind, three-dimensional good people are constructed by combining at least two different good people with two different ideals, not by combining a good person and a bad person. Deep sympathetic characters have two sympathetic parts in conflict, not a sympathetic part in conflict with an unsympathetic part. Deep smart characters are created by combining at least two different people who are geniuses.

E.g. HPMOR!Hermione contains both a sensible young girl who tries to keep herself and her friends out of trouble, and a starry-eyed heroine, neither of whom are stupid. (Actually, since HPMOR!Hermione is also the one character who I created as close to her canon self as I could manage - she didn't *need* upgrading - I should credit this one to J. K. Rowling.) (Admittedly, I didn't actually follow that rule deliberately to construct Methods, I figured it out afterward when everyone was praising the characterization and I was like, "Wait, people are calling me a character author now? What the hell did I just do right?")

If instead you try to construct a genius character by having an emotionally impoverished 'genius' part in conflict with a warm nongenius part... ugh. Cliche. Don't write the first thing that pops into your head from watching Star Trek. This is not how real geniuses work. HPMOR!Harry, the primary protagonist, contains so many different people he has to give them names, and none of them are stupid, nor does any one of them contain his emotions set aside in a neat jar; they contain different mixtures of emotions and ideals. Combining two cliche characters won't be enough to build a deep character. Combining two different realistic people in that character's situation works much better. Two is not a limit, it's a minimum, but everyone involved still has to be recognizably the same person when combined.

Closely related is Orson Scott Card's observation that a conflict between Good and Evil can be interesting, but it's often not half as interesting as a conflict between Good and Good. All standard rules about cliches still apply, and a conflict between good and good which you've previously read about and to which the reader can already guess your correct approved answer, cannot carry the story. A good rule of thumb is that if you have a conflict between good and good which you feel unsure about yourself, or which you can remember feeling unsure about, or you're not sure where exactly to draw the line, you can build a story around it. I consider the most successful moral conflict in HPMOR to be the argument between Harry and Dumbledore in Ch. 77 because it almost perfectly divided the readers on who was in the right *and* about whose side the author was taking. (*This* was done by deliberately following Orson Scott Card's rule, not by accident. Likewise _Three Worlds Collide_, though it was only afterward that I realized how much of the praise for that story, which I hadn't dreamed would be considered literarily meritful by serious SF writers, stemmed from the sheer rarity of stories built around genuinely open moral arguments. Orson Scott Card: "Propaganda only works when the reader feels like you've been absolutely fair to other side", and writing about a moral dilemma where *you're* still trying to figure out the answer is an excellent way to achieve this.)

Character shallowness can be a symptom of moral shallowness if it reflects a conflict between Good and Evil drawn along lines too clear to bring two good parts of a good character into conflict. This is why it would've been hard for Lord of the Rings to contain conflicted characters without becoming an entirely different story, though as Robin Hanson has just remarked, LotR is a Milieu story, not a Character story. Conflicts between evil and evil are even shallower than conflicts between good and evil, which is why what passes for 'maturity' in some literature is so uninteresting. There's nothing to choose there, no decision to await with bated breath, just an author showing off their disillusionment as a claim of sophistication.

 

I was wondering if we could apply this process to older fiction, Great Literature that is historically praised, and excellent by its own time's standards, but which, if published by a modern author, would seem substandard or inappropriate in one way or another.

Given our community's propensity for challenging sacred cows, and the unique tool-set available to us, I am sure we could take some great works of the past and turn them into awesome works of the present.


Of course, it doesn't have to be a laboratory where we rewrite the whole damn things. Just properly-grounded suggestions on how to improve this or that work would be great.

 

P.S. This post is itself a work in progress, and will update and improve as comments come. It's been a long time since I've last posted on LW, so advice is quite welcome. Our work is never over.

 

EDIT: Well, I like that this thread has turned out so lively, but I've got finals to prepare for and I can't afford to keep participating in the discussion to my satisfaction. I'll be back in July, and apologize in advance for being such a poor OP. That said, cheers!

[Transcript] Portion of Jürgen Schmidhuber at Singularity Summit 2009

7 [deleted] 10 January 2012 08:05PM

"Compression Progress: The Algorithmic Principle Behind Curiosity and Creativity":

(full transcript with time stamps emailed to Louie Helm)

 

What should an unsupervised intelligent agent, be it a human baby or an artificial agent, do?  How should it deal with data that is streaming in through the input centers in response to the actions that it's executing?

First of all, and this is a very trivial thing to do in principle at least, you should store all the data that is coming in.  You shouldn't throw away any of the data, if you can.  And it makes sense because within a couple of years we will be able to store one hundred years of lifetime at the resolution of a high-definition TV video.  And maybe human brains can also store one hundred years of human lifetime at a rate -- I once made a rough calculation -- comparable to a low-resolution MPEG video.  

So in principle that is not a problem, but with that by itself you cannot do anything.  You have to find regularities in this history of inputs and actions that you store and, in other words, you have to compress it.  You have to compress that history.

Whenever there's a regularity, a symmetry, whatever, then you can write a program that needs less bits than the raw data and still encodes the entire data.  So that's what compression's about.  Now let's define the simplicity or the subjective compressibility or the subjective beauty of some data point X, given some subjective observer O at a given point in his life, T.  And that is just the number of bits you need to encode the incoming data -- the X -- at this point in time with the given limited compression algorithm that you have.  

For example, most of you know a lot about human faces, and that's because you saw so many of these faces.  Now you are carrying around with you some sort of prototype face, which allows you to encode new faces in the visual field by just encoding the deviations from the prototype.   So whenever a new face comes along and it looks very much like the prototype face, you just need a few extra bits to store that new face.  And your lazy brain likes that because it doesn't want to waste a lot of storage space.  The more the face looks like the prototype face, the fewer bits you need to encode it, and the prettier, in a certain sense, you find it.
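The prototype trick can be demonstrated with plain byte strings (my toy example, not the speaker's; zlib's compressed size stands in for "bits needed", and the "prototype" and "face" byte patterns are made up for illustration):

```python
import zlib

def bits(data) -> int:
    # crude "bits needed" proxy: size of the zlib-compressed data in bits
    return 8 * len(zlib.compress(bytes(data), 9))

prototype = bytes(range(256)) * 4                                       # stand-in "prototype face"
new_face  = bytes((b + i % 3) % 256 for i, b in enumerate(prototype))   # a similar new face

# encode only the deviations from the prototype, as the talk describes
deviations = bytes((n - p) % 256 for n, p in zip(new_face, prototype))
print(bits(new_face), bits(deviations))  # the deviations need far fewer bits
```

Because the new face is close to the prototype, the deviation stream is highly regular and compresses to a small fraction of the raw encoding.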

"Beauty" here is just a word: we just count the bits we need to store the new incoming data.  For example, a face that is very regular doesn't need a lot of bits to be encoded.

The important thing is not the compression by itself, but the first derivative of the compressibility.  Because what's really going on is that, as new data is coming in, your compression algorithm improves all the time and becomes a better predictor of the data.  Whatever you can predict, you can compress, because you don't have to store as extra what you already can predict.  

So prediction and compression are almost the same thing, and to the extent that your learning algorithm is improving the predictor such that it becomes a better predictor on the observed data so far, you are saving bits.  You can count this progress in bits saved.  That's the only interesting signal: it means there's a novel pattern in the input stream where you still have some learning progress.

So what you're interested in is, what is the interestingness of some data X?  Well, it's not the number of bits that you need to encode the data.  It's the first derivative, the change of the number of bits as your subjective learning algorithm based on your subjective previous knowledge is improving the compressibility.  So you have to count the number of bits that you're saving.
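The "bits saved" reward described above can be sketched in a few lines (my illustration, not from the talk; a Laplace-smoothed byte-frequency model stands in for the agent's learning compressor):

```python
import math
from collections import Counter

def code_length_bits(data: bytes, counts: Counter) -> float:
    """Ideal code length of `data` under a Laplace-smoothed byte model."""
    total = sum(counts.values())
    return sum(-math.log2((counts[b] + 1) / (total + 256)) for b in data)

def interestingness(data: bytes, counts: Counter) -> float:
    """Bits saved by letting the model learn from `data`: the intrinsic reward."""
    before = code_length_bits(data, counts)
    after = code_length_bits(data, counts + Counter(data))  # improved compressor
    return before - after

model = Counter(b"abababab")                  # what the agent has seen so far
print(interestingness(b"abababab", model))    # familiar pattern: small reward
print(interestingness(b"xyzxyzxy", model))    # novel pattern: larger reward
```

The novel string yields a bigger drop in code length, so the agent's intrinsic reward would steer it toward data like that.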

Once you have that in place, and you can formally nail it down and implement it in computers and robots, you just need an additional learning algorithm: a reward-optimizing algorithm.  Whenever you save a few bits, it means you have a novel pattern, and you count how novel it is by counting how many bits you saved; that's an internal reward signal, an intrinsic motivation.  That's what you want to maximize for the future.  You want the controller that is directing your arms and your actuators to move such that you get additional data from the environment where your compression algorithm can still make this type of progress.

There are many reward-maximizing algorithms and reinforcement learning algorithms that in principle can do this.  This is the basic principle.  I'm going to explain the rest of my talk only [in terms of] how this explains art and science, and whatever.

Again, in discrete time, the formulation without derivatives, if you don't like that.  The simplicity or compressibility -- or beauty, if you want -- of the data is the number of bits you need to encode it given what you already know about the data.  The interestingness of the data is the change in the number of bits.  So you get the data, you learn a little bit on it which means you can now compress it a little bit better.  So the raw data is like that.  The compressed data is like that.  Then you improve the compressor a little bit.  It learns something.  It becomes a better neural network that predicts the data.  And now it takes so many bits, and this is what you save, and that's your internal reward signal because you have a novel pattern which you didn't know yet.  And that's why you find it interesting.  You can just subtract the number of bits you needed before from the number of bits that you need afterwards and there you go.  So that's the reward signal.

Let me give you a very simple example: a robot sitting in a dark room.  The input doesn't change.  It sits there and no matter what it does, the input is always black, black, black.  So it's extremely compressible input, because the robot can predict it very easily: the next frame is exactly like the previous one.  You can totally compress the input, and it's totally boring, because there is no compression progress -- the robot doesn't see any pattern that it didn't already know.

Now let me give you another extreme example, which is just the opposite.  Suppose you are sitting in front of a screen with white noise.  All these black and white pixels are coming at you with equal probability, conveying maximum traditional Shannon information, or Boltzmann information.  And still this stream of inputs is totally boring, because it's completely incompressible.  You cannot find a short pattern and you cannot improve your current description of the signal, which again means that there is no compression progress, so this is also boring.  What is interesting is something in between: say, certain types of music which you didn't know yet but which were maybe a little bit similar to what you already knew about music, where there is a new little harmony which you hadn't heard in just this way.  There you have a little pattern where you save a couple of bits, and that's what motivates you to listen to the same song again.
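Both extremes can be checked with a toy model. In this sketch (every detail is an illustrative assumption: a memoryless coin-flip predictor stands in for the compressor, and a balanced 0/1 batch stands in for noise it cannot exploit), the dark room is interesting only for a moment -- once its regularity is learned, progress drops toward zero -- while the noise stand-in never yields any compression progress at all.

```python
import math

def batch_bits(batch, p):
    """Bits to encode a batch of 0/1 observations under a fixed P(1) = p."""
    return sum(-math.log2(p if b == 1 else 1.0 - p) for b in batch)

class BernoulliLearner:
    """Laplace-smoothed coin model: the 'compressor' that improves with data."""
    def __init__(self):
        self.ones, self.total = 1, 2

    def train(self, batch):
        self.ones += sum(batch)
        self.total += len(batch)

def compression_progress(learner, batch):
    """Intrinsic reward: bits saved on this batch by learning from it."""
    before = batch_bits(batch, learner.ones / learner.total)
    learner.train(batch)
    after = batch_bits(batch, learner.ones / learner.total)
    return before - after

dark_room = [0] * 50   # constant black frames
noise = [0, 1] * 25    # for this memoryless model, as incompressible as coin flips

dark_learner, noise_learner = BernoulliLearner(), BernoulliLearner()
dark_rewards = [compression_progress(dark_learner, dark_room) for _ in range(5)]
noise_rewards = [compression_progress(noise_learner, noise) for _ in range(5)]
```

The dark-room rewards start large and decay rapidly toward zero as the regularity is absorbed; the noise rewards are zero from the start. Only data that is compressible in a way the learner doesn't yet know sustains reward.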

Again, here we have boring white noise and no internal reward for things like that.  A discovery in physics, for example, is just a very large compression improvement.  Suppose you have one million videos of falling apples, and they all fall in the same way.  You can extract the rule behind this behavior, and it turns out to be a very simple program that essentially describes gravity.  It's a very short program that you can use again and again, for all these many different videos of falling apples, to greatly compress these orange blobs that are falling down.

You cannot compress everything.  There are random fluctuations and noise and whatever that you can't compress, but there is a substantial aspect of the incoming data that you can compress.  And there you can make a lot of compression progress and suddenly save a lot of bits.

The same is true in the arts.  Suppose there's a guy who figured out a way of drawing Obama with just five lines, such that everybody says, "Hey, that's Obama."  The artist has somehow extracted the essence of the face, such that looking at those five lines gives you the same impression as looking at a high-resolution photograph with a million pixels.  Somehow there was compression progress in the artist as he was trying, many times, to come up with a convincing caricature, and there is a similar thing happening in the observer when he sees it for the first time.

So the scientist and the artist have something in common: they always try to make new data which is compressible in a new, previously unknown way.  A novel pattern means: yes, it's compressible, but in a way that I didn't know yet, such that my compressor can make this learning progress and save a couple of bits.