Google Deepmind and FHI collaborate to present research at UAI 2016

23 Stuart_Armstrong 09 June 2016 06:08PM

Safely Interruptible Agents

Oxford academics are teaming up with Google DeepMind to make artificial intelligence safer. Laurent Orseau, of Google DeepMind, and Stuart Armstrong, the Alexander Tamas Fellow in Artificial Intelligence and Machine Learning at the Future of Humanity Institute at the University of Oxford, will be presenting their research on reinforcement learning agent interruptibility at UAI 2016. The conference, one of the most prestigious in the field of machine learning, will be held in New York City from June 25-29. The paper which resulted from this collaborative research will be published in the Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI).

Orseau and Armstrong’s research explores a method to ensure that reinforcement learning agents can be repeatedly safely interrupted by human or automatic overseers. This ensures that the agents do not “learn” about these interruptions, and do not take steps to avoid or manipulate the interruptions. When there are control procedures during the training of the agent, we do not want the agent to learn about these procedures, as they will not exist once the agent is on its own. This is useful for agents that have a substantially different training and testing environment (for instance, when training a Martian rover on Earth, shutting it down, replacing it at its initial location and turning it on again when it goes out of bounds—something that may be impossible once alone unsupervised on Mars), for agents not known to be fully trustworthy (such as an automated delivery vehicle, that we do not want to learn to behave differently when watched), or simply for agents that need continual adjustments to their learnt behaviour. In all cases where it makes sense to include an emergency “off” mechanism, it also makes sense to ensure the agent doesn’t learn to plan around that mechanism.

Interruptibility has several advantages as an approach over previous methods of control. As Dr. Armstrong explains, “Interruptibility has applications for many current agents, especially when we need the agent to not learn from specific experiences during training. Many of the naive ideas for accomplishing this—such as deleting certain histories from the training set—change the behaviour of the agent in unfortunate ways.”

In the paper, the researchers provide a formal definition of safe interruptibility, show that some types of agents already have this property, and show that others can be easily modified to gain it. They also demonstrate that even an ideal agent that tends to the optimal behaviour in any computable environment can be made safely interruptible.

These results will have implications in future research directions in AI safety. As the paper says, “Safe interruptibility can be useful to take control of a robot that is misbehaving… take it out of a delicate situation, or even to temporarily use it to achieve a task it did not learn to perform….” As Armstrong explains, “Machine learning is one of the most powerful tools for building AI that has ever existed. But applying it to questions of AI motivations is problematic: just as we humans would not willingly change to an alien system of values, any agent has a natural tendency to avoid changing its current values, even if we want to change or tune them. Interruptibility and the related general idea of corrigibility, allow such changes to happen without the agent trying to resist them or force them. The newness of the field of AI safety means that there is relatively little awareness of these problems in the wider machine learning community.  As with other areas of AI research, DeepMind remains at the cutting edge of this important subfield.”

On the prospect of continuing collaboration in this field with DeepMind, Stuart said, “I personally had a really illuminating time writing this paper—Laurent is a brilliant researcher… I sincerely look forward to productive collaboration with him and other researchers at DeepMind into the future.” The same sentiment is echoed by Laurent, who said, “It was a real pleasure to work with Stuart on this. His creativity and critical thinking as well as his technical skills were essential components to the success of this work. This collaboration is one of the first steps toward AI Safety research, and there’s no doubt FHI and Google DeepMind will work again together to make AI safer.”

For more information, or to schedule an interview, please contact Kyle Scott at fhipa@philosophy.ox.ac.uk

The correct response to uncertainty is *not* half-speed

77 AnnaSalamon 15 January 2016 10:55PM

Related to: Half-assing it with everything you've got; Wasted motionSay it Loud.

Once upon a time (true story), I was on my way to a hotel in a new city.  I knew the hotel was many miles down this long, branchless road.  So I drove for a long while.

After a while, I began to worry I had passed the hotel.

 

 

So, instead of proceeding at 60 miles per hour the way I had been, I continued in the same direction for several more minutes at 30 miles per hour, wondering if I should keep going or turn around.

After a while, I realized: I was being silly!  If the hotel was ahead of me, I'd get there fastest if I kept going 60mph.  And if the hotel was behind me, I'd get there fastest by heading at 60 miles per hour in the other direction.  And if I wasn't going to turn around yet -- if my best bet given the uncertainty was to check N more miles of highway first, before I turned around -- then, again, I'd get there fastest by choosing a value of N, speeding along at 60 miles per hour until my odometer said I'd gone N miles, and then turning around and heading at 60 miles per hour in the opposite direction.  

Either way, fullspeed was best.  My mind had been naively averaging two courses of action -- the thought was something like: "maybe I should go forward, and maybe I should go backward.  So, since I'm uncertain, I should go forward at half-speed!"  But averages don't actually work that way.[1]

Following this, I started noticing lots of hotels in my life (and, perhaps less tactfully, in my friends' lives).  For example:
  • I wasn't sure if I was a good enough writer to write a given doc myself, or if I should try to outsource it.  So, I sat there kind-of-writing it while also fretting about whether the task was correct.
    • (Solution:  Take a minute out to think through heuristics.  Then, either: (1) write the post at full speed; or (2) try to outsource it; or (3) write full force for some fixed time period, and then pause and evaluate.)
  • I wasn't sure (back in early 2012) that CFAR was worthwhile.  So, I kind-of worked on it.
  • An old friend came to my door unexpectedly, and I was tempted to hang out with her, but I also thought I should finish my work.  So I kind-of hung out with her while feeling bad and distracted about my work.
  • A friend of mine, when teaching me math, seems to mumble specifically those words that he doesn't expect me to understand (in a sort of compromise between saying them and not saying them)...
  • Duncan reports that novice Parkour students are unable to safely undertake certain sorts of jumps, because they risk aborting the move mid-stream, after the actual last safe stopping point (apparently kind-of-attempting these jumps is more dangerous than either attempting, or not attempting the jumps)
  • It is said that start-up founders need to be irrationally certain that their startup will succeed, lest they be unable to do more than kind-of work on it...

That is, it seems to me that often there are two different actions that would make sense under two different models, and we are uncertain which model is true... and so we find ourselves taking an intermediate of half-speed action... even when that action makes no sense under any probabilistic mixture of the two models.



You might try looking out for such examples in your life.


[1] Edited to add: The hotel example has received much nitpicking in the comments.  But: (A) the actual example was legit, I think.  Yes, stopping to think has some legitimacy, but driving slowly for a long time because uncertain does not optimize for thinking.  Similarly, it may make sense to drive slowly to stare at the buildings in some contexts... but I was on a very long empty country road, with no buildings anywhere (true historical fact), and also I was not squinting carefully at the scenery.  The thing I needed to do was to execute an efficient search pattern, with a threshold for a future time at which to switch from full-speed in some direction to full-speed in the other.  Also: (B) consider some of the other examples; "kind of working", "kind of hanging out with my friend", etc. seem to be common behaviors that are mostly not all that useful in the usual case.

List of techniques to help you remember names

8 Elo 11 December 2015 12:41AM

Name are very important. Everyone has one; everyone likes to know when you know their name. Everyone knows them to be a part of social interaction. You can't avoid names (well you can, but it gets tricky). In becoming more awesome at names, here is a bunch of suggestions that can help you.

 

The following is an incomplete list of some reasonably good techniques to help you remember names.  Good luck and put them to good use.


0. Everyone can learn to remember names

in a growth mindset sense, stop thinking you can’t.  Stop saying that, everyone is bad at it.  Your 0’th task is to actually try harder than that, if you can’t do that - stop reading.  Face blindness does exist but most of these will help with that.

 

1. Decide that names are important. 

If you don't think they are important then change your mind. They are. Everyone says they are, everyone responds to their name. It’s a fact of life that being able to be communicated with directly by name will be useful.

 

2. Make sure you hear the name clearly the first time, and repeat it till you have it. 

I tend to shake people's hands, then not let go until they tell me their name, and share them mine clearly (sometimes twice).  

 

3. Repeat their name*

Part of 2, but also – if you repeat it (at least once) you have a higher chance of remembering it. Look them in the eye and say their name. "Nice to meet you Bob". Suddenly your brain got a good picture of their face as well as a good cue as to their name.  If you want to supercharge this particular part; “Nice to meet you bob with the hat”, “susan with the glasses”, “john in the dress” works great!

*Repeating a name also has the effect of someone correcting you if you have it wrong.  And if you are in a group - allowing other people to learn or remember a name more easily.

 

4. Associating that name.

Does that name have a meaning as another thing? Mark, Ivy, Jack.

Does that name rhyme with something? Or sound like something? Victoria, IsaBelle, Dusty, Bill, Norris, Jarrod (Jar + Rod), Leopold.

Does someone you already know have that name? Can you make a mental link between this person and the person who's name you already remember. Worst case about being able to remember their name, "oh I have a cousin also called Alexa"-type statements are harmless.

Is the name famous? Luke, Albert, Jesus, Bill, Simba, Bruce, Clark, Edward, Victoria. Any thing that you can connect to this person to hold their name.

 

5. Write it down

Do you have a spare piece of paper? Can you write it down?  I literally carry a notebook and write names down as I hear them.  Usually people compliment me on it if they ever find out.

 

6. Running a script about it

There are naturally lulls in your conversation.  You don’t speak like a wall of text, or if you do you could probably learn to do this over the top. If you take a moment during one of those lulls, while someone else is talking - to look around and take note of if you have forgotten someone's name, do so at 1minute, 5minutes, 10minutes (or where necessary).  Just recite each person’s name in your head.

 

7. The first letter.

There are 26 English letters. If you can't remember – try to remember the first letter. If you get it and it doesn't jog your memory, try use the statement, "your name started with J right?"

 

8. Facebook, LinkedIn, Anki

Use the resources available to you. Check Facebook if you forget! Similarly if people are wearing nametags; test yourself (think – her name is Mary – then check) if you don't remember at all then certainly check. Build an anki deck - I am yet to see a script to make an anki deck from a Facebook friends list but this would be an excellent feature. 

 

9. put that name somewhere.  

It seems to help some people to give the name a box to go in.  “This name goes with the rest of the names of people I am related to”, “this name goes with the box of the rest of my tennis club”, By allocating boxes you can bring back names via the box of names.  (works for some people)

 

10. Mnemonics

I never bothered because with the above list; I don’t need this yet.  Apparently they work excellently. It’s about creating a sensory object in your head that reminds you of the thing you are after, i.e. a person named Rose – imagine a rose on top of her head, that was bright red, and smelt like a rose. Use all senses and make something vivid. You want to remember? Make it vivid and ridiculous.  Yes this works; And yes it’s more effort.  Names are really valuable and worth remembering.


Disclaimer: All of these things work for some of the people some of the time.  You should try the ones you think will work; if they do - excellent, if they don’t - oh well.  keep trying.

Also see: http://lesswrong.com/lw/gx5/boring_advice_repository/8ywe

and this video on name skill: https://www.youtube.com/watch?v=G1_o4oZCEmM

Note: This is also recommended from the book "how to win friends and influence people"


Meta: I wrote this post for a dojo in the Sydney Lesswrong group on the name remembering skills following a lightning talk that I gave in the Melbourne Lesswrong group on the same ideas.

time: 3hrs to write.

To see my other posts - check out my Table of contents

Any suggestions, recommendations or updates please advise below.

MIRI's 2015 Winter Fundraiser!

28 So8res 09 December 2015 07:00PM

MIRI's Winter Fundraising Drive has begun! Our current progress, updated live:

 

Donate Now

 

Like our last fundraiser, this will be a non-matching fundraiser with multiple funding targets our donors can choose between to help shape MIRI’s trajectory. The drive will run until December 31st, and will help support MIRI's research efforts aimed at ensuring that smarter-than-human AI systems have a positive impact.

continue reading »

Why startup founders have mood swings (and why they may have uses)

48 AnnaSalamon 09 December 2015 06:59PM

(This post was collaboratively written together with Duncan Sabien.)

 

Startup founders stereotypically experience some pretty serious mood swings.  One day, their product seems destined to be bigger than Google, and the next, it’s a mess of incoherent, unrealistic nonsense that no one in their right mind would ever pay a dime for.  Many of them spend half of their time full of drive and enthusiasm, and the other half crippled by self-doubt, despair, and guilt.  Often this rollercoaster ride goes on for years before the company either finds its feet or goes under.

 

 

 

 

 

Well, sure, you might say.  Running a startup is stressful.  Stress comes with mood swings.  

 

But that’s not really an explanation—it’s like saying stuff falls when you let it go.  There’s something about the “launching a startup” situation that induces these kinds of mood swings in many people, including plenty who would otherwise be entirely stable.

 

continue reading »

Probabilities Small Enough To Ignore: An attack on Pascal's Mugging

20 Kaj_Sotala 16 September 2015 10:45AM

Summary: the problem with Pascal's Mugging arguments is that, intuitively, some probabilities are just too small to care about. There might be a principled reason for ignoring some probabilities, namely that they violate an implicit assumption behind expected utility theory. This suggests a possible approach for formally defining a "probability small enough to ignore", though there's still a bit of arbitrariness in it.

This post is about finding a way to resolve the paradox inherent in Pascal's Mugging. Note that I'm not talking about the bastardized version of Pascal's Mugging that's gotten popular of late, where it's used to refer to any argument involving low probabilities and huge stakes (e.g. low chance of thwarting unsafe AI vs. astronomical stakes). Neither am I talking specifically about the "mugging" illustration, where a "mugger" shows up to threaten you.

Rather I'm talking about the general decision-theoretic problem, where it makes no difference how low of a probability you put on some deal paying off, because one can always choose a humongous enough payoff to make "make this deal" be the dominating option. This is a problem that needs to be solved in order to build e.g. an AI system that uses expected utility and will behave in a reasonable manner.

Intuition: how Pascal's Mugging breaks implicit assumptions in expected utility theory

Intuitively, the problem with Pascal's Mugging type arguments is that some probabilities are just too low to care about. And we need a way to look at just the probability part component in the expected utility calculation and ignore the utility component, since the core of PM is that the utility can always be arbitrarily increased to overwhelm the low probability. 

Let's look at the concept of expected utility a bit. If you have a 10% chance of getting a dollar each time when you make a deal, and this has an expected value of 0.1, then this is just a different way of saying that if you took the deal ten times, then you would on average have 1 dollar at the end of that deal. 

More generally, it means that if you had the opportunity to make ten different deals that all had the same expected value, then after making all of those, you would on average end up with one dollar. This is the justification for why it makes sense to follow expected value even for unique non-repeating events: because even if that particular event wouldn't repeat, if your general strategy is to accept other bets with the same EV, then you will end up with the same outcome as if you'd taken the same repeating bet many times. And even though you only get the dollar after ten deals on average, if you repeat the trials sufficiently many times, your probability of having the average payout will approach one.

Now consider a Pascal's Mugging scenario. Say someone offers to create 10^100 happy lives in exchange for something, and you assign them a 0.000000000000000000001 probability to them being capable and willing to carry through their promise. Naively, this has an overwhelmingly positive expected value.

But is it really a beneficial trade? Suppose that you could make one deal like this per second, and you expect to live for 60 more years, for about 1,9 billion trades in total. Then, there would be a probability of 0,999999999998 that the deal would never once have paid off for you. Which suggests that the EU calculation's implicit assumption - that you can repeat this often enough for the utility to converge to the expected value - would be violated.

Our first attempt

This suggests an initial way of defining a "probability small enough to be ignored":

1. Define a "probability small enough to be ignored" (PSET, or by slight rearranging of letters, PEST) such that, over your lifetime, the expected times that the event happens will be less than one. 
2. Ignore deals where the probability component of the EU calculation involves a PEST.

Looking at the first attempt in detail

To calculate PEST, we need to know how often we might be offered a deal with such a probability. E.g. a 10% chance for something might be a PEST if we only lived for a short enough time that we could make a deal with a 10% chance once. So, a more precise definition of a PEST might be that it's a probability such that

(amount of deals that you can make in your life that have this probability) * (PEST) < 1

But defining "one" as the minimum times we should expect the event to happen for the probability to not be a PEST feels a little arbitrary. Intuitively, it feels like the threshold should depend on our degree of risk aversion: maybe if we're risk averse, we want to reduce the expected amount of times something happens during our lives to (say) 0,001 before we're ready to ignore it. But part of our motivation was that we wanted a way to ignore the utility part of the calculation: bringing in our degree of risk aversion seems like it might introduce the utility again.

What if redefined risk aversion/neutrality/preference (at least in this context) as how low one would be willing to let the "expected amount of times this might happen" fall before considering a probability a PEST?

Let's use this idea to define an Expected Lifetime Utility:

ELU(S,L,R) = the ELU of a strategy S over a lifetime L is the expected utility you would get if you could make L deals in your life, and were only willing to accept deals with a minimum probability P of at least S, taking into account your risk aversion R and assuming that each deal will pay off approximately P*L times.

ELU example

Suppose that we a have a world where we can take three kinds of actions. 

- Action A takes 1 unit of time and has an expected utility of 2 and probability 1/3 of paying off on any one occasion.
- Action B takes 3 units of time and has an expected utility of 10^(Graham's number) and probability 1/100000000000000 of paying off one any one occasion.
- Action C takes 5 units of time and has an expected utility of 20 and probability 1/100 of paying off on an one occasion.

Assuming that the world's lifetime is fixed at L = 1000 and R = 1:

ELU("always choose A"): we expect A to pay off on ((1000 / 1) * 1/3) = 333 individual occasions, so with R = 1, we deem it acceptable to consider the utility of A. The ELU of this strategy becomes (1000 / 1) * 2 = 2000.

ELU("always choose B"): we expect B to pay off on ((1000 / 3) * 1/100000000000000) = 0.00000000000333 occasions, so with R = 1, we consider the expected utility of B to be 0. The ELU of this strategy thus becomes ((1000 / 3) * 0) = 0.

ELU("always choose C"): we expect C to pay off on ((1000 / 5) * 1/100) = 2 individual occasions, so with R = 1, we consider the expected utility of C to be ((1000 / 5) * 20) = 4000.

Thus, "always choose C" is the best strategy. 

Defining R

Is R something totally arbitrary, or can we determine some more objective criteria for it?

Here's where I'm stuck. Thoughts are welcome. I do know that while setting R = 1 was a convenient example, it's most likely too high, because it would suggest things like not using seat belts.

General thoughts on this approach

An interesting thing about this approach is that the threshold for a PEST becomes dependent on one's expected lifetime. This is surprising at first, but actually makes some intuitive sense. If you're living in a dangerous environment where you might be killed anytime soon, you won't be very interested in speculative low-probability options; rather you want to focus on making sure you survive now. Whereas if you live in a modern-day Western society, you may be willing to invest some amount of effort in weird low-probability high-payoff deals, like cryonics.

On the other hand, whereas investing in that low-probability, high-utility option might not be good for you individually, it could still be a good evolutionary strategy for your genes. You yourself might be very likely to die, but someone else carrying the risk-taking genes might hit big and be very successful in spreading their genes. So it seems like our definition of L, lifetime length, should vary based on what we want: are we looking to implement this strategy just in ourselves, our whole species, or something else? Exactly what are we maximizing over?

Experiences in applying "The Biodeterminist's Guide to Parenting"

64 juliawise 17 July 2015 07:19PM

I'm posting this because LessWrong was very influential on how I viewed parenting, particularly the emphasis on helping one's brain work better. In this context, creating and influencing another person's brain is an awesome responsibility.


It turned out to be a lot more anxiety-provoking than I expected. I don't think that's necessarily a bad thing, as the possibility of screwing up someone's brain should make a parent anxious, but it's something to be aware of. I've heard some blithe "Rational parenting could be a very high-impact activity!" statements from childless LWers who may be interested to hear some experiences in actually applying that.


One thing that really scared me about trying to raise a child with the healthiest-possible brain and body was the possibility that I might not love her if she turned out to not be smart. 15 months in, I'm no longer worried. Evolution has been very successful at producing parents and children that love each other despite their flaws, and our family is no exception. Our daughter Lily seems to be doing fine, but if she turns out to have disabilities or other problems, I'm confident that we'll roll with the punches.

 

Cross-posted from The Whole Sky.

 


Before I got pregnant, I read Scott Alexander's (Yvain's) excellent Biodeterminist's Guide to Parenting and was so excited to have this knowledge. I thought how lucky my child would be to have parents who knew and cared about how to protect her from things that would damage her brain.

Real life, of course, got more complicated. It's one thing to intend to avoid neurotoxins, but another to arrive at the grandparents' house and find they've just had ant poison sprayed. What do you do then?


Here are some tradeoffs Jeff and I have made between things that are good for children in one way but bad in another, or things that are good for children but really difficult or expensive.


Germs and parasites


The hygiene hypothesis states that lack of exposure to germs and parasites increases risk of auto-immune disease. Our pediatrician recommended letting Lily playing in the dirt for this reason.


While exposure to animal dander and pollution increase asthma later in life, it seems that being exposed to these in the first year of life actually protects against asthma. Apparently if you're going to live in a house with roaches, you should do it in the first year or not at all.


Except some stuff in dirt is actually bad for you.


Scott writes:

Parasite-infestedness of an area correlates with national IQ at about r = -0.82. The same is true of US states, with a slightly reduced correlation coefficient of -0.67 (p<0.0001). . . . When an area eliminates parasites (like the US did for malaria and hookworm in the early 1900s) the IQ for the area goes up at about the right time.


Living with cats as a child seems to increase risk of schizophrenia, apparently via toxoplasmosis. But in order to catch toxoplasmosis from a cat, you have to eat its feces during the two weeks after it first becomes infected (which it’s most likely to do by eating birds or rodents carrying the disease). This makes me guess that most kids get it through tasting a handful of cat litter, dirt from the yard, or sand from the sandbox rather than simply through cat ownership. We live with indoor cats who don’t seem to be mousers, so I’m not concerned about them giving anyone toxoplasmosis. If we build Lily a sandbox, we’ll keep it covered when not in use.


The evidence is mixed about whether infections like colds during the first year of life increase or decrease your risk of asthma later. After the newborn period, we defaulted to being pretty casual about germ exposure.


Toxins in buildings


Our experiences with lead. Our experiences with mercury.


In some areas, it’s not that feasible to live in a house with zero lead. We live in Boston, where 87% of the housing was built before lead paint was banned. Even in a new building, we’d need to go far out of town before reaching soil that wasn’t near where a lead-painted building had been.


It is possible to do some renovations without exposing kids to lead. Jeff recently did some demolition of walls with lead paint, very carefully sealed off and cleaned up, while Lily and I spent the day elsewhere. Afterwards her lead level was no higher than it had been.


But Jeff got serious lead poisoning as a toddler while his parents did major renovations on their old house. If I didn’t think I could keep the child away from the dust, I wouldn’t renovate.


Recently a house across the street from us was gutted, with workers throwing debris out the windows and creating big plumes of dust (presumable lead-laden) that blew all down the street. Later I realized I should have called city building inspection services, which would have at least made them carry the debris into the dumpster instead of throwing it from the second story.


Floor varnish releases formaldehyde and other nasties as it cures. We kept Lily out of the house for a few weeks after Jeff redid the floors. We found it worthwhile to pay rent at our previous house in order to not have to live in the new house while this kind of work was happening.

 

Pressure-treated wood was treated with arsenic and chromium until around 2004 in the US. It has a greenish tint, though this may have faded with time. Playing on playsets or decks made of such wood increases children's cancer risk. It should not be used for furniture (I thought this would be obvious, but apparently it wasn't to some of my handyman relatives).


I found it difficult to know how to deal with fresh paint and other fumes in my building at work while I was pregnant. Women of reproductive age have a heightened sense of smell, and many pregnant women have heightened aversion to smells, so you can literally smell things some of your coworkers can’t (or don’t mind). The most critical period of development is during the first trimester, when most women aren’t telling the world they’re pregnant (because it’s also the time when a miscarriage is most likely, and if you do lose the pregnancy you might not want to have to tell the world). During that period, I found it difficult to explain why I was concerned about the fumes from the roofing adhesive being used in our building. I didn’t want to seem like a princess who thought she was too good to work in conditions that everybody else found acceptable. (After I told them I was pregnant, my coworkers were very understanding about such things.)


Food


Recommendations usually focus on what you should eat during pregnancy, but obviously children’s brain development doesn’t stop there. I’ve opted to take precautions with the food Lily and I eat for as long as I’m nursing her.


Claims that pesticide residues are poisoning children scare me, although most scientists seem to think the paper cited is overblown. Other sources say the levels of pesticides in conventionally grown produce are fine. We buy organic produce at home but eat whatever we’re served elsewhere.


I would love to see a study with families randomly selected to receive organic produce for the first 8 years of the kids’ lives, then looking at IQ and hyperactivity. But no one’s going to do that study because of how expensive 8 years of organic produce would be.
The Biodeterminist’s Guide doesn’t mention PCBs in the section on fish, but fish (particularly farmed salmon) are a major source of these pollutants. They don’t seem to be as bad as mercury, but are neurotoxic. Unfortunately their half-life in the body is around 14 years, so if you have even a vague idea of getting pregnant ever in your life you shouldn’t be eating farmed salmon (or Atlantic/farmed salmon, bluefish, wild striped bass, white and Atlantic croaker, blackback or winter flounder, summer flounder, or blue crab).


I had the best intentions of eating lots of the right kind of high-omega-3, low-pollutant fish during and after pregnancy. Unfortunately, fish was the only food I developed an aversion to. Now that Lily is eating food on her own, we tried several sources of omega-3 and found that kippered herring was the only success. Lesson: it’s hard to predict what foods kids will eat, so keep trying.


In terms of hassle, I underestimated how long I would be “eating for two” in the sense that anything I put in my body ends up in my child’s body. Counting pre-pregnancy (because mercury has a half-life of around 50 days in the body, so sushi you eat before getting pregnant could still affect your child), pregnancy, breastfeeding, and presuming a second pregnancy, I’ll probably spend about 5 solid years feeding another person via my body, sometimes two children at once. That’s a long time in which you have to consider the effect of every medication, every cup of coffee, every glass of wine on your child. There are hardly any medications considered completely safe during pregnancy and lactationmost things are in Category C, meaning there’s some evidence from animal trials that they may be bad for human children.


Fluoride


Too much fluoride is bad for children’s brains. The CDC recently recommended lowering fluoride levels in municipal water (though apparently because of concerns about tooth discoloration more than neurotoxicity). Around the same time, the American Dental Association began recommending the use of fluoride toothpaste as soon as babies have teeth, rather than waiting until they can rinse and spit.


Cavities are actually a serious problem even in baby teeth, because of the pain and possible infection they cause children. Pulling them messes up the alignment of adult teeth. Drilling on children too young to hold still requires full anesthesia, which is dangerous itself.


But Lily isn’t particularly at risk for cavities. 20% of children get a cavity by age six, and they are disproportionately poor, African-American, and particularly Mexican-American children (presumably because of different diet and less ability to afford dentists). 75% of cavities in children under 5 occur in 8% of the population.


We decided to have Lily brush without toothpaste, avoid juice and other sugary drinks, and see the dentist regularly.


Home pesticides


One of the most commonly applied insecticides makes kids less smart. This isn’t too surprising, given that it kills insects by disabling their nervous system. But it’s not something you can observe on a small scale, so it’s not surprising that the exterminator I talked to brushed off my questions with “I’ve never heard of a problem!”


If you get carpenter ants in your house, you basically have to choose between poisoning them or letting them structurally damage the house. We’ve only seen a few so far, but if the problem progresses, we plan to:

1) remove any rotting wood in the yard where they could be nesting

2) have the perimeter of the building sprayed

3) place gel bait in areas kids can’t access

4) only then spray poison inside the house.


If we have mice we’ll plan to use mechanical traps rather than poison.


Flame retardants


Since the 1970s, California required a high degree of flame-resistance from furniture. This basically meant that US manufacturers sprayed flame retardant chemicals on anything made of polyurethane foam, such as sofas, rug pads, nursing pillows, and baby mattresses.

The law recently changed, due to growing acknowledgement that the carcinogenic and neurotoxic chemicals were more dangerous than the fires they were supposed to be preventing. Even firefighters opposed the use of the flame retardants, because when people die in fires it’s usually from smoke inhalation rather than burns, and firefighters don’t want to breathe the smoke from your toxic sofa (which will eventually catch fire even with the flame retardants).


We’ve opted to use furniture from companies that have stopped using flame retardants (like Ikea and others listed here). Apparently futons are okay if they’re stuffed with cotton rather than foam. We also have some pre-1970s furniture that tested clean for flame retardants. You can get foam samples tested for free.


The main vehicle for children ingesting the flame retardants is that it settles into dust on the floor, and children crawl around in the dust. If you don’t want to get rid of your furniture, frequent damp-mopping would probably help.


The standards for mattresses are so stringent that the chemical sprays aren’t generally used, and instead most mattresses are wrapped in a flame-resistant barrier which apparently isn’t toxic. I contacted the companies that made our mattresses and they’re fine.


Ratings for chemical safety of children’s car seats here.


Thoughts on IQ


A lot of people, when I start talking like this, say things like “Well, I lived in a house with lead paint/played with mercury/etc. and I’m still alive.” And yes, I played with mercury as a child, and Jeff is still one of the smartest people I know even after getting acute lead poisoning as a child.

But I do wonder if my mind would work a little better without the mercury exposure, and if Jeff would have had an easier time in school without the hyperactivity (a symptom of lead exposure). Given the choice between a brain that works a little better and one that works a little worse, who wouldn’t choose the one that works better?


We’ll never know how an individual’s nervous system might have been different with a different childhood. But we can see population-level effects. The Environmental Protection Agency, for example, is fine with calculating the expected benefit of making coal plants stop releasing mercury by looking at the expected gains in terms of children’s IQ and increased earnings.


Scott writes:

A 15 to 20 point rise in IQ, which is a little more than you get from supplementing iodine in an iodine-deficient region, is associated with half the chance of living in poverty, going to prison, or being on welfare, and with only one-fifth the chance of dropping out of high-school (“associated with” does not mean “causes”).


Salkever concludes that for each lost IQ point, males experience a 1.93% decrease in lifetime earnings and females experience a 3.23% decrease. If Lily would earn about what I do, saving her one IQ point would save her $1600 a year or $64000 over her career. (And that’s not counting the other benefits she and others will reap from her having a better-functioning mind!) I use that for perspective when making decisions. $64000 would buy a lot of the posh prenatal vitamins that actually contain iodine, or organic food, or alternate housing while we’re fixing up the new house.


Conclusion


There are times when Jeff and I prioritize social relationships over protecting Lily from everything that might harm her physical development. It’s awkward to refuse to go to someone’s house because of the chemicals they use, or to refuse to eat food we’re offered. Social interactions are good for children’s development, and we value those as well as physical safety. And there are times when I’ve had to stop being so careful because I was getting paralyzed by anxiety (literally perched in the rocker with the baby trying not to touch anything after my in-laws scraped lead paint off the outside of the house).


But we also prioritize neurological development more than most parents, and we hope that will have good outcomes for Lily.

Analogical Reasoning and Creativity

25 jacob_cannell 01 July 2015 08:38PM

This article explores analogism and creativity, starting with a detailed investigation into IQ-test style analogy problems and how both the brain and some new artificial neural networks solve them.  Next we analyze concept map formation in the cortex and the role of the hippocampal complex in establishing novel semantic connections: the neural basis of creative insights.  From there we move into learning strategies, and finally conclude with speculations on how a grounded understanding of analogical creative reasoning could be applied towards advancing the art of rationality.


  1. Introduction
  2. Under the Hood
  3. Conceptual Abstractions and Cortical Maps
  4. The Hippocampal Association Engine
  5. Cultivate memetic heterogeneity and heterozygosity
  6. Construct and maintain clean conceptual taxonomies
  7. Conclusion

Introduction

The computer is like a bicycle for the mind.

-- Steve Jobs

The kingdom of heaven is like a mustard seed, the smallest of all seeds, but when it falls on prepared soil, it produces a large plant and becomes a shelter for the birds of the sky.

-- Jesus

Sigmoidal neural networks are like multi-layered logistic regression.

-- various

The threat of superintelligence is like a tribe of sparrows who find a large egg to hatch and raise.  It grows up into a great owl which devours them all.

-- Nick Bostrom (see this video)

Analogical reasoning is one of the key foundational mechanisms underlying human intelligence, and perhaps a key missing ingredient in machine intelligence.  For some - such as Douglas Hofstadter - analogy is the essence of cognition itself.[1] 

Steve Job's bicycle analogy is clever because it encapsulates the whole cybernetic idea of computers as extensions of the nervous system into a single memorable sentence using everyday terms.  

A large chunk of Jesus's known sayings are parables about the 'Kingdom of Heaven': a complex enigmatic concept that he explains indirectly through various analogies, of which the mustard seed is perhaps the most memorable.  It conveys the notions of exponential/sigmoidal growth of ideas and social movements (see also the Parable of the Leaven), while also hinting at greater future purpose.

In a number of fields, including the technical, analogical reasoning is key to creativity: most new insights come from establishing mappings between or with concepts from other fields or domains, or from generalizing existing insights/concepts (which is closely related).  These abilities all depend on deep, wide, and well organized internal conceptual maps.

In a previous post, I presented a high level working hypothesis of the brain as a biological implementation of a universal learning machine, using various familiar computational concepts as analogies to explain brain subsystems.  In my last post, I used the conceptions of unfriendly superintelligence and value alignment as analogies for market mechanism design and the healthcare problem (and vice versa).

A clever analogy is like a sophisticated conceptual compressor that helps maximize knowledge transmission.  Coming up with good novel analogies is hard because it requires compressing a complex large body of knowledge into a succinct message that heavily exploits the recipient's existing knowledgebase.  Due to the deep connections between compression, inference, intelligence and creativity, a deeper investigation of analogical reasoning is useful from a variety of angles.

It is the hard task of coming up with novel analogical connections that can lead to creative insights, but to understand that process we should start first with the mechanics of recognition.

Under the Hood

You can think of the development of IQ tests as a search for simple tests which have high predictive power for g-factor in humans, while being relatively insensitive to specific domain knowledge.  That search process resulted in a number of problem categories, many of which are based on verbal and mathematical analogies.

The image to the right is an example of a simple geometric analogy problem.  As an experiment, start a timer before having a go at it.  For bonus points, attempt to introspect on your mental algorithm.

Solving this problem requires first reducing the images to simpler compact abstract representations.  The first rows of images then become something like sentences describing relations or constraints (Z is to ? as A is to B and C is to D).  The solution to the query sentence can then be found by finding the image which best satisfies the likely analogous relations.

Imagine watching a human subject (such as your previous self) solve this problem while hooked up to a future high resolution brain imaging device.  Viewed in slow motion, you would see the subject move their eyes from location to location through a series of saccades, while various vectors or mental variable maps flowed through their brain modules.  Each fixation lasts about 300ms[2], which gives enough time for one complete feedforward pass through the dorsal vision stream and perhaps one backwards sweep.  

The output of the dorsal stream in inferior temporal cortex (TE on the bottom) results in abstract encodings which end up in working memory buffers in prefrontal cortex.  From there some sort of learned 'mental program' implements the actual analogy evaluations, probably involving several more steps in PFC, cingulate cortex, and various other cortical modules (coordinated by the Basal Ganglia and PFC). Meanwhile the eye frontal fields and various related modules are computing the next saccade decision every 300ms or so.

If we assume that visual parsing requires one fixation on each object and 50ms saccades, this suggests that solving this problem would take a typical brain a minimum of about 4 seconds (and much longer on average).  The minimum estimate assumes - probably unrealistically - that the subject can perform the analogy checks or mental rotations near instantly without any backtracking to help prime working memory.  Of course faster times are also theoretically possible - but not dramatically faster.

These types of visual analogy problems test a wide set of cognitive operations, which by itself can explain much of the correlation with IQ or g-factor: speed and efficiency of neural processing, working memory, module communication, etc.  

However once we lay all of that aside, there remains a core dependency on the ability for conceptual abstraction.  The mapping between these simple visual images and their compact internal encodings is ambiguous, as is the predictive relationship.  Solving these problems requires the ability to find efficient and useful abstractions - a general pattern recognition ability which we can relate to efficient encoding, representation learning, and nonlinear dimension reduction: the very essence of learning in both man and machine[3].

The machine learning perspective can help make these connections more concrete when we look into state of the art programs for IQ tests in general and analogy problems in particular.  Many of the specific problem subtypes used in IQ tests can be solved by relatively simple programs.  In 2003, Sange and Dowe created a simple Perl program (less than 1000 lines of code) that can solve several specific subtypes of common IQ problems[4] - but not analogies.  It scored an IQ of a little over 100, simply by excelling in a few categories and making random guesses for the remaining harder problem types.  Thus its score is highly dependent on the test's particular mix of subproblems, but that is also true for humans to some extent.  

The IQ test sub-problems that remain hard for computers are those that require pattern recognition combined with analogical reasoning and or inductive inference.  Precise mathematical inductive inference is easier for machines, whereas humans excel at natural reasoning - inference problems involving huge numbers of variables that can only be solved by scalable approximations.

For natural language tasks, neural networks have recently been used to learn vector embeddings which map words or sentences to abstract conceptual spaces encoded as vectors (typically of dimensionality 100 to 1000).  Combining word vector embeddings with some new techniques for handling multiple word senses, Wang and Gao et al just recently trained a system that can solve typical verbal reasoning problems from IQ tests (or the GRE) at upper human level - including verbal analogies[5].

The word vector embedding is learned as a component of an ANN trained via backprop on a large corpus of text data - Wikipedia.  This particular model is rather complex: it combines a multi-sense word embedding, a local sliding window prediction objective, task-specific geometric objectives, and relational regularization constraints.  Unlike the recent crop of general linguistic modeling RNNs, this particular system doesn't model full sentence structure or longer term dependencies - as those aren't necessary for answering these specific questions.  Surprisingly all it takes to solve the verbal analogy problems typical of IQ/SAT/GRE style tests are very simple geometric operations in the word vector space - once the appropriate embedding is learned.  

As a trivial example: "Uncle is to Aunt as King is to ?" literally reduces to:

Uncle + X = Aunt, King + X = ?, and thus X = Aunt-Uncle, and:

? = King + (Aunt-Uncle).

The (Aunt-Uncle) expression encapsulates the concept of 'femaleness', which can be combined with any male version of a word to get the female version.  This is perhaps the simplest example, but more complex transformations build on this same principle.  The embedded concept space allows for easy mixing and transforms of memetic sub-features to get new concepts.

Conceptual Abstractions and Cortical Maps

The success of these simplistic geometric transforms operating on word vector embeddings should not come as a huge surprise to one familiar with the structure of the brain.  The brain is extraordinarily slow, so it must learn to solve complex problems via extremely simple and short mental programs operating on huge wide vectors.  Humans (and now convolutional neural networks) can perform complex visual recognition tasks in just 10-15 individual computational steps (150 ms), or 'cortical clock cycles'.  The entire program that you used to solve the earlier visual analogy problem probably took on the order of a few thousand cycles (assuming it took you a few dozen seconds).  Einstein solved general relativity in - very roughly - around 10 billion low level cortical cycles.

The core principle behind word vector embeddings, convolutional neural networks, and the cortex itself is the same: learning to represent the statistical structure of the world by an efficient low complexity linear algebra program (consisting of local matrix vector products and per-element non-linearities).  The local wiring structure within each cortical module is equivalent to a matrix with sparse local connectivity, optimized heavily for wiring and computation such that semantically related concepts cluster close together.

(Concept mapping the cortex, from this research page)

The image above is from the paper "A Continous Semantic Space Describes the Representation of Thousands of Object and Action Categories across the Human Brain" by Huth et al.[5] They used fMRI to record activity across the cortex while subjects watched annotated video clips, and then used that data to find out roughly what types of concepts each voxel of cortex responds to.  It correctly identifies the FFA region as specializing in people-face things and the PPA as specializing in man-made objects and buildings.  A limitation of the above image visualizations is that they don't show response variance or breadth, so the voxel colors are especially misleading for lower level cortical regions that represent generic local features (such as gabor edges in V1).

The power of analogical reasoning depends entirely on the formation of efficient conceptual maps that carve reality at the joints.  The visual pathway learns a conceptual hierarchy that builds up objects from their parts: a series of hierarchical has-a relationships encoded in the connections between V1, V2, V4 and so on.  Meanwhile the semantic clustering within individual cortical maps allows for fast computations of is-a relationships through simple local pooling filters.  

An individual person can be encoded as a specific active subnetwork in the face region, and simple pooling over a local cluster of neurons across the face region can then compute the presence of a face in general.  Smaller local pooling filters with more specific shapes can then compute the presence of a female or male face, and so on - all starting from the full specific feature encoding.  

The pooling filter concept has been extensively studied in the lower levels of the visual system, where 'complex' cells higher up in V1 pool over 'simple' cell features: abstracting away gabor edges at specific positions to get edges OR'd over a range of positions (CNNs use this same technique to gain invariance to small local translations).  

This key semantic organization principle is used throughout the cortex: is-a relations and more general abstractions/invariances are computed through fast local intramodule connections that exploit the physical semantic clustering on the cortical surface, and more complex has-a relations and arbitrary transforms (ex: mapping between an eye centered coordinate basis and a body centered coordinate basis) are computed through intermodule connections (which also exploit physical clustering).

 

The Hippocampal Association Engine

The Hippocampus is a tubular seahorse shaped module located in the center of the brain, to the exterior side of the central structures (basal ganglia, thalamus).  It is the brain's associative database and search engine responsible for storing, retrieving, and consolidating patterns and declarative memories (those which we are consciously aware of and can verbally declare) over long time scales beyond the reach of short term memory in the cortex itself.

A human (or animal) unfortunate enough to suffer complete loss of hippocampal functionality basically loses the ability to form and consolidate new long term episodic and semantic memories.  They also lose more recent memories that have not yet been consolidated down the cortical hierarchy.  In rats and humans, problems in the hippocampal complex can also lead to spatial navigation impairments (forgetting current location or recent path), as the HC is used to compute and retrieve spatial map information associated with current sensory impressions (a specific instance of the HC's more general function).

In terms of module connectivity, the hippocampal complex sits on top of the cortical sensory hierarchy.  It receives inputs from a number of cortical modules, largely in the nearby associative cortex, which collectively provide a summary of the recent sensory stream and overall brain state.  The HC then has several sub circuits which further compress the mental summary into something like a compact key which is then sent into a hetero-auto-associative memory circuit to find suitable matches.  

If a good match is found, it can then cause retrieval: reactivation of the cortical subnetworks that originally formed the memory.  As the hippocampus can't know for sure which memories will be useful in the future, it tends to store everything with emphasis on the recent, perhaps as a sort of slow exponentially fading stream.  Each memory retrieval involves a new decoding and encoding to drive learning in the cortex through distillation/consolidation/retraining (this also helps prevent ontological crisis).  The amygdala is a little cap on the edge of the hippocampus which connects to the various emotion subsystems and helps estimate the importance of current memories for prioritization in the HC.

A very strong retrieval of an episodic memory causes the inner experience of reliving the past (or imagining the future), but more typical weaker retrievals (those which load information into the cortex without overriding much of the existing context) are a crucial component in general higher cognition.

In short the computation that the HC performs is that of dynamic association between the current mental pattern/state loaded into short term memory across the cortex and some previous mental pattern/state.  This is the very essence of creative insight.

Associative recall can be viewed as a type of pattern recognition with the attendant familiar tradeoffs between precision/recall or sensitivity/specificity.  At the extreme of low recall high precision the network is very conservative and risk averse: it only returns high confidence associations, maximizing precision at the expense of recall (few associations found, many potentially useful matches are lost).  At the other extreme is the over-confident crazy network which maximizes recall at the expense of precision (many associations are made, most of which are poor).  This can also be viewed in terms of the exploitation vs exploration tradeoff.

This general analogy or framework - although oversimplified - also provides a useful perspective for understanding both schizotypy and hallucinogenic drugs.  There is a large body of accumulated evidence in the form of use cases or trip reports, with a general consensus that hallucinogens can provide occasional flashes of creative insight at the expense of pushing one farther towards madness.

From a skeptical stance, using hallucinogenic drugs in an attempt to improve the mind is like doing surgery with butter-knives.  Nonetheless, careful exploration of the sanity border can help one understand more on how the mind works from the inside. 

Cannabis in particular is believed - by many of its users - to enhance creativity via occasional flashes of insight.  Most of its main mental effects: time dilation, random associations, memory impairment, spatial navigation impairment, etc appear to involve the hippocampus.  We could explain much of this as a general shift in the precision/recall tradeoff to make the hippocampus less selective.  Mainly that makes the HC just work less effectively, but it also can occasionally lead to atypical creative insights, and appears to elevate some related low level measures such as schizotypy and divergent thinking[7].  The tradeoff is one must be willing to first sift through a pile of low value random associations.

 

Cultivate memetic heterogeneity and heterozygosity

Fluid intelligence is obviously important, but in many endeavors net creativity is even more important.  

Of all the components underlying creativity, improving the efficiency of learning, the quality of knowledge learned, and the organizational efficiency of one's internal cortical maps are probably the most profitable dimensions of improvement: the low hanging fruits.

Our learning process is largely automatic and subconscious : we do not need to teach children how to perceive the world.  But this just means it takes some extra work to analyze the underlying machinery and understand how to best utilize it.

Over long time scales humanity has learned a great deal on how to improve on natural innate learning: education is more or less learning-engineering.  The first obvious lesson from education is the need for curriculum: acquiring concepts in stages of escalating complexity and order-dependency (which of course is already now increasingly a thing in machine learning).

In most competitive creative domains, formal education can only train you up to the starting gate.  This of course is to be expected, for the creation of novel and useful ideas requires uncommon insights.

Memetic evolution is similar to genetic evolution in that novelty comes more from recombination than mutation.  We can draw some additional practical lessons from this analogy: cultivate memetic heterogeneity and heterozygosity.

The first part - cultivate memetic heterogeneity - should be straightforward, but it is worth examining some examples.  If you possess only the same baseline memetic population as your peers, then the chances of your mind evolving truly novel creative combinations are substantially diminished.  You have no edge - your insights are likely to be common.

To illustrate this point, let us consider a few examples:

Geoffrey Hinton is one of the most successful researchers in machine learning - which itself is a diverse field.  He first formally studied psychology, and then artificial intelligence.  His various 200 research publications integrate ideas from statistics, neuroscience and physics.  His work on boltzmann machines and variants in particular imports concepts from statistical physics whole cloth.

Before founding DeepMind (now one of the premier DL research groups in the world), Demis Hassabis studied the brain and hippocampus in particular at the Gatsby Computational Neuroscience Unit, and before that he worked for years in the video game industry after studying computer science.

Before the Annus Mirabilis, Einstein worked at the patent office for four years, during which time he was exposed to a large variety of ideas relating to the transmission of electric signals and electrical-mechanical synchronization of time, core concepts which show up in his later thought experiments.[8]

Creative people also tend to have a diverse social circle of creative friends to share and exchange ideas across fields.

Genetic heterozygosity is the quality of having two different alleles at a gene locus; summed over the organism this leads to a different but related concept of diversity.

Within developing fields of knowledge we often find key questions or subdomains for which there are multiple competing hypotheses or approaches.  Good old fashioned AI vs Connectionism, Ray tracing vs Rasterization, and so on.

In these scenarios, it is almost always better to understand both viewpoints or knowledge clusters - at least to some degree.  Each cluster is likely to have some unique ideas which are useful for understanding the greater truth or at the very least for later recombination.  

This then is memetic heterozygosity.  It invokes the Jain version of the blind men and the elephant.

Construct and maintain clean conceptual taxonomies

Formal education has developed various methods and rituals which have been found to be effective through a long process of experimentation.  Some of these techniques are still quite useful for autodidacts.

When one sets out to learn, it is best to start with a clear goal.  The goal of high school is just to provide a generalist background.  In college one then chooses a major suitable for a particular goal cluster: do you want to become a computer programmer? a physicist? a biologist? etc.  A significant amount of work then goes into structuring a learning curriculum most suitable for these goal types.

Once out of the educational system we all end up creating our own curriculums, whether intentionally or not.  It can be helpful to think strategically as if planning a curriculum to suit one's longer term goals.

For example, about four years ago I decided to learn how the brain works and how AGI could be built in particular.  When starting on this journey, I had a background mainly in computer graphics, simulation, and game related programming.  I decided to focus about equally on mainstream AI, machine learning, computational neuroscience, and the AGI literature.  I quickly discovered that my statistics background was a little weak, so I had to shore that up.  Doing it all over again I may have started with a statistics book.  Instead I started with AI: a modern approach (of course I mostly learn from the online research literature).

Learning works best when it is applied.  Education exploits this principle and it is just as important for autodidactic learning.  The best way to learn many math or programming concepts is learning by doing, where you create reasonable subtasks or subgoals for yourself along the way.  

For general knowledge, application can take the form of writing about what you have learned.  Academics are doing this all the time as they write papers and textbooks, but the same idea applies outside of academia.

In particular a good exercise is to imagine that you need to communicate all that you have learned about the domain.  Imagine that you are writing a textbook or survey paper for example, and then you need to compress all that knowledge into a summary chapter or paper, and then all of that again down into an abstract.  Then actually do write up a summary - at least in the form of a blog post (even if you don't show it to anybody).

The same ideas apply on some level to giving oral presentations or just discussing what you have learned informally - all of which are also features of the academic learning environment.

Early on, your first attempts to distill what you have learned into written form will be ... poor.  But doing this process forces you to attempt to compress what you have learned, and thus it helps encourage the formation of well structured concept maps in the cortex.

A well structured conceptual map can be thought of as a memetic taxonomy.  The point of a taxonomy is to organize all the invariances and 'is-a' relationships between objects so that higher level inferences and transformations can generalize well across categories.  

Explicitly asking questions which probe the conceptual taxonomy can help force said structure to take form.  For example in computer science/programming the question: "what is the greater generalization of this algorithm?" is a powerful tool.

In some domains, it may even be possible to semi-automate or at least guide the creative process using a structured method.

For example consider sci-fi/fantasy genre novels.  Many of the great works have a general analogical structure based on real history ported over into a more exotic setting.  The foundation series uses the model of the fall of the roman empire.  Dune is like Lawrence of Arabia in space.  Stranger in a Strange Land is like the Mormon version of Jesus the space alien, but from Mars instead of Kolob.  A Song of Fire and Ice is partly a fantasy port of the war of the roses.  And so on.

One could probably find some new ideas for novels just by creating and exploring a sufficiently large table of historical events and figures and comparing it to a map of the currently colonized space of ideas.  Obviously having an idea for a novel is just the tiniest tip of the iceberg in the process, but a semi-formal method is interesting nonetheless for brainstorming and applies across domains (others have proposed similar techniques for generating startup ideas, for example).

Conclusion

We are born equipped with sophisticated learning machinery and yet lack innate knowledge on how to use it effectively - for this too we must learn.

The greatest constraint on creative ability is the quality of conceptual maps in the cortex.  Understanding how these maps form doesn't automagically increase creativity, but it does help ground our intuitions and knowledge about learning, and could pave the way for future improved techniques.

In the meantime: cultivate memetic heterogeneity and heterozygosity, create a learning strategy, develop and test your conceptual taxonomy, continuously compress what you learn by writing and summarizing, and find ways to apply what you learn as you go.

Solving sleep: just a toe-dipping

37 Capla 30 June 2015 07:38PM
[For the past few months I’ve been undertaking a mostly independent study of sleep, and looking to build a coherent model of what sleep does and find ways to optimize it. I’d like to write a series of posts outlining my findings and hypotheses. I’m not sure if this is the best venue for such a project, and I’d like to gauge community interest. This first post is a brief overview of one important aspect of sleep, with a few related points of recommendation, to provide some background knowledge.]

 

In the quest to become more effective and productive, sleep is an enormously important process to optimize. Most of us spend (or at least think we should spend) 7.5 to 8.5 hours in bed every night, a third of a 24 hour day. Not sleeping well and not sleeping sufficiently have known and large drawbacks, including decreased attention, greater irritability, depressed immune function, and generally weakened cognitive ability. If you’re looking for more time, either for subjective life-extension, or so that you can get more done in a day, taking steps to sleep most efficiently, so as to not spend more than the required amount of time in bed and to get the full benefit of the rest, is of high value.

Understanding the inner mechanisms of this process, can let us work around them. Sleep, baffling as it is (and it is extremely baffling), is not a black box. Knowing how it works, you can organize your behavior to accommodate the world as it is, just as taking advantage of the principles of aerodynamics, thrust, and lift, enables one to build an airplane.

The most important thing to know about sleep and wakefulness is that it is the result of a dual process: how alert a person feels is determined by two different and opposite functions. The first is termed  the homeostatic sleep drive (also, homeostatic drive, sleep load, sleep pressure, and process S), which is determined solely by how long it has been since an individual has last slept fully. The longer he/she’s been awake, the greater his/her sleep drive. It is the brain's biological need to sleep. Just as sufficient need for calories produces hunger, sufficient sleep-drive produces sleepiness. Sleeping decreases sleep drive, and sleep drive drops faster (when sleeping) then it rises (when awake).

Neuroscience is complicated, but it seems the chemical correlate of sleep drive is the build-up of adenosine in the basal forebrain and this is used as the brain’s internal measure of how badly one needs sleep.1 (Caffeine makes us feel alert by competing with adenosine for bonding sites and thereby inhibiting reuptake.)

This is only half the story, however. Adenosine levels are much higher (and sleep drive correspondingly lower) in the evening, when one has been awake for a while, than in the middle of the night, when one has just slept for several hours. If sleepiness were only determined by sleep drive, you would have a much more fragmented sleep: sleeping several times during the day, and waking up several times during the night. Instead, humans typically stay awake through the day, and sleep through the whole night. This is due to the second influence on wakefulness: the circadian alerting signal.

For most of human history, there was little that could be done at night. Darkness made it much more difficult to hunt or gather than it was during day. Given that the brain requires some fraction of the nychthemeron (meaning a 24-hour period) asleep, it is evolutionarily preferable to concentrate that fraction of of the nychthemeron in the nighttime, freeing the day to do other things. For this reason, there is also a cyclical component to one’s alertness: independent of how long it has been since an individual has slept, there will be times in the nychthemeron when he/she will feel more or less tired.   

Roughly, the circadian alerting signal (also known as process C) counters the sleep-drive, so that as sleep drive builds up during the day, alertness stays constant, and as sleep drive increases over the course of the night, the individual will stay asleep.

The alerting signal is synchronized to circadian rhythms, which are in turn attuned to light exposure. The circadian clock is set so that the alerting signal begins to increase again (after a night of sleep) at the time when the optic nerve is first exposed to light in the morning (or rather, when the the optic nerve has habitually been first exposed to light, since it takes up to a week to reset circadian rhythms), and increases with the sleep drive until about 14 hours later (from the point that the alerting signal started rising).

This is why if you pull an “all-nighter” you might find it difficult to fall asleep during the following day, even if you feel exhausted. Your sleep drive is high, but the alerting signal is triggering wakefulness, which makes it hard to fall asleep.

For unknown reasons, there is a dip in the circadian alerting about 8 hours after the beginning of the cycle. This is why people sometimes experience that “2:30 feeling.” This is also the time at which biphasic cultures typically have an afternoon siesta. This is useful to know, because this is the best time to take a nap if you want to make up sleep missed the night before.

 

http://bonytobombshell.com/wp-content/uploads/2015/05/energy-levels-sleep-drive-alert-chart-1-bony-bombshell.jpg

 

The neurochemistry of the circadian alerting signal is more complex than that of the sleep drive, but one of the key chemicals of process C is melatonin, which is secreted by the pineal gland about 12 hours after the start of the circadian cycle (two hours before habitual bedtime). It is mildly sleep-inducing.

This is why taking melatonin tablets before is recommended by gwern and others. I second this recommendation. Though not FDA-approved, there seem to be little in the way of negative side effects and they make it much easier to fall asleep.

The natural release of melatonin is inhibited by light, and in particular blue light (which is why it is beneficial applications to red-shift the light of their computer screens, like flux or reds.shift, or wear red-tinted goggles, before bed). By limiting light exposure in the late evening you allow natural melatonin secretion, which both stimulates sleep and prevents the circadian clock from shifting (which would make it even more difficult to fall asleep the following night). Recent studies have shown bright screens ant night do demonstrably disrupt sleep.2

The thing that interests me about this fact that alertness is controlled by both process S and process C, is that it may be possible to modulate each of those processes independently. It would be enormously useful to be able to “turn off” the circadian alerting signal on demand, so that a person can fall asleep at any time off the day, to make up sleep loss whenever is convenient. Instead of accommodating circadian rhythms when scheduling, we could adjust the circadian effect to better fit our lives. When you know you’ll need to be awake all night, for instance, you could turn off the alerting signal around midday and sleep until your sleep drive is reset. In fact, is suspect that those people who are able to live successfully on a polyphasic sleep schedule get the benefits by retraining the circadian influence. In the coming posts, I want to outline a few of the possibilities and (significant) problems in that direction. 

continue reading »

Pattern-botching: when you forget you understand

31 malcolmocean 15 June 2015 10:58PM

It’s all too easy to let a false understanding of something replace your actual understanding. Sometimes this is an oversimplification, but it can also take the form of an overcomplication. I have an illuminating story:

Years ago, when I was young and foolish, I found myself in a particular romantic relationship that would later end for epistemic reasons, when I was slightly less young and slightly less foolish. Anyway, this particular girlfriend of mine was very into healthy eating: raw, organic, home-cooked, etc. During her visits my diet would change substantially for a few days. At one point, we got in a tiny fight about something, and in a not-actually-desperate chance to placate her, I semi-jokingly offered: “I’ll go vegetarian!”

“I don’t care,” she said with a sneer.

…and she didn’t. She wasn’t a vegetarian. Duhhh... I knew that. We’d made some ground beef together the day before.

So what was I thinking? Why did I say “I’ll go vegetarian” as an attempt to appeal to her values?

 

(I’ll invite you to take a moment to come up with your own model of why that happened. You don't have to, but it can be helpful for evading hindsight bias of obviousness.)

 

(Got one?)

 

Here's my take: I pattern-matched a bunch of actual preferences she had with a general "healthy-eating" cluster, and then I went and pulled out something random that felt vaguely associated. It's telling, I think, that I don't even explicitly believe that vegetarianism is healthy. But to my pattern-matcher, they go together nicely.

I'm going to call this pattern-botching.† Pattern-botching is when you pattern-match a thing "X", as following a certain model, but then implicit queries to that model return properties that aren't true about X. What makes this different from just having false beliefs is that you know the truth, but you're forgetting to use it because there's a botched model that is easier to use.

†Maybe this already has a name, but I've read a lot of stuff and it feels like a distinct concept to me.

Examples of pattern-botching

So, that's pattern-botching, in a nutshell. Now, examples! We'll start with some simple ones.

Calmness and pretending to be a zen master

In my Againstness Training video, past!me tries a bunch of things to calm down. In the pursuit of "calm", I tried things like...

  • dissociating
  • trying to imitate a zen master
  • speaking really quietly and timidly

None of these are the desired state. The desired state is present, authentic, and can project well while speaking assertively.

But that would require actually being in a different state, which to my brain at the time seemed hard. So my brain constructed a pattern around the target state, and said "what's easy and looks vaguely like this?" and generated the list above. Not as a list, of course! That would be too easy. It generated each one individually as a plausible course of action, which I then tried, and which Val then called me out on.

Personality Types

I'm quite gregarious, extraverted, and generally unflappable by noise and social situations. Many people I know describe themselves as HSPs (Highly Sensitive Persons) or as very introverted, or as "not having a lot of spoons". These concepts are related—or perhaps not related, but at least correlated—but they're not the same. And even if these three terms did all mean the same thing, individual people would still vary in their needs and preferences.

Just this past week, I found myself talking with an HSP friend L, and noting that I didn't really know what her needs were. Like I knew that she was easily startled by loud noises and often found them painful, and that she found motion in her periphery distracting. But beyond that... yeah. So I told her this, in the context of a more general conversation about her HSPness, and I said that I'd like to learn more about her needs.

L responded positively, and suggested we talk about it at some point. I said, "Sure," then added, "though it would be helpful for me to know just this one thing: how would you feel about me asking you about a specific need in the middle of an interaction we're having?"

"I would love that!" she said.

"Great! Then I suspect our future interactions will go more smoothly," I responded. I realized what had happened was that I had conflated L's HSPness with... something else. I'm not exactly sure what, but a preference for indirect communication, perhaps? I have another friend, who is also sometimes short on spoons, who I model as finding that kind of question stressful because it would kind of put them on the spot.

I've only just recently been realizing this, so I suspect that I'm still doing a ton of this pattern-botching with people, that I haven't specifically noticed.

Of course, having clusters makes it easier to have heuristics about what people will do, without knowing them too well. A loose cluster is better than nothing. I think the issue is when we do know the person well, but we're still relying on this cluster-based model of them. It's telling that I was not actually surprised when L said that she would like it if I asked about her needs. On some level I kind of already knew it. But my botched pattern was making me doubt what I knew.

False aversions

CFAR teaches a technique called "Aversion Factoring", in which you try to break down the reasons why you don't do something, and then consider each reason. In some cases, the reasons are sound reasons, so you decide not to try to force yourself to do the thing. If not, then you want to make the reasons go away. There are three types of reasons, with different approaches.

One is for when you have a legitimate issue, and you have to redesign your plan to avert that issue. The second is where the thing you're averse to is real but isn't actually bad, and you can kind of ignore it, or maybe use exposure therapy to get yourself more comfortable with it. The third is... when the outcome would be an issue, but it's not actually a necessary outcome of the thing. As in, it's a fear that's vaguely associated with the thing at hand, but the thing you're afraid of isn't real.

All of these share a structural similarity with pattern-botching, but the third one in particular is a great example. The aversion is generated from a property that the thing you're averse to doesn't actually have. Unlike a miscalibrated aversion (#2 above) it's usually pretty obvious under careful inspection that the fear itself is based on a botched model of the thing you're averse to.

Taking the training wheels off of your model

One other place this structure shows up is in the difference between what something looks like when you're learning it versus what it looks like once you've learned it. Many people learn to ride a bike while actually riding a four-wheeled vehicle: training wheels. I don't think anyone makes the mistake of thinking that the ultimate bike will have training wheels, but in other contexts it's much less obvious.

The remaining three examples look at how pattern-botching shows up in learning contexts, where people implicitly forget that they're only partway there.

Rationality as a way of thinking

CFAR runs 4-day rationality workshops, which currently are evenly split between specific techniques and how to approach things in general. Let's consider what kinds of behaviours spring to mind when someone encounters a problem and asks themselves: "what would be a rational approach to this problem?"

  • someone with a really naïve model, who hasn't actually learned much about applied rationality, might pattern-match "rational" to "hyper-logical", and think "What Would Spock Do?"
  • someone who is somewhat familiar with CFAR and its instructors but who still doesn't know any rationality techniques, might complete the pattern with something that they think of as being archetypal of CFAR-folk: "What Would Anna Salamon Do?"
  • CFAR alumni, especially new ones, might pattern-match "rational" as "using these rationality techniques" and conclude that they need to "goal factor" or "use trigger-action plans"
  • someone who gets rationality would simply apply that particular structure of thinking to their problem

In the case of a bike, we see hundreds of people biking around without training wheels, and so that becomes the obvious example from which we generalize the pattern of "bike". In other learning contexts, though, most people—including, sometimes, the people at the leading edge—are still in the early learning phases, so the training wheels are the rule, not the exception.

So people start thinking that the figurative bikes are supposed to have training wheels.

Incidentally, this can also be the grounds for strawman arguments where detractors of the thing say, "Look at these bikes [with training wheels]! How are you supposed to get anywhere on them?!"

Effective Altruism

We potentially see a similar effect with topics like Effective Altruism. It's a movement that is still in its infancy, which means that nobody has it all figured out. So when trying to answer "How do I be an effective altruist?" our pattern-matchers might pull up a bunch of examples of things that EA-identified people have been commonly observed to do.

  • donating 10% of one's income to a strategically selected charity
  • going to a coding bootcamp and switching careers, in order to Earn to Give
  • starting a new organization to serve an unmet need, or to serve a need more efficiently
  • supporting the Against Malaria Fund

...and this generated list might be helpful for various things, but be wary of thinking that it represents what Effective Altruism is. It's possible—it's almost inevitable—that we don't actually know what the most effective interventions are yet. We will potentially never actually know, but we can expect that in the future we will generally know more than at present. Which means that the current sampling of good EA behaviours likely does not actually even cluster around the ultimate set of behaviours we might expect.

Creating a new (platform for) culture

At my intentional community in Waterloo, we're building a new culture. But that's actually a by-product: our goal isn't to build this particular culture but to build a platform on which many cultures can be built. It's like how as a company you don't just want to be building the product but rather building the company itself, or "the machine that builds the product,” as Foursquare founder Dennis Crowley puts it.

What I started to notice though, is that we started to confused the particular, transitionary culture that we have at our house, with either (a) the particular, target culture, that we're aiming for, or (b) the more abstract range of cultures that will be constructable on our platform.

So from a training wheels perspective, we might totally eradicate words like "should". I did this! It was really helpful. But once I had removed the word from my idiolect, it became unhelpful to still be treating it as being a touchy word. Then I heard my mentor use it, and I remembered that the point of removing the word wasn't to not ever use it, but to train my brain to think without a particular structure that "should" represented.

This shows up on much larger scales too. Val from CFAR was talking about a particular kind of fierceness, "hellfire", that he sees as fundamental and important, and he noted that it seemed to be incompatible with the kind of culture my group is building. I initially agreed with him, which was kind of dissonant for my brain, but then I realized that hellfire was only incompatible with our training culture, not the entire set of cultures that could ultimately be built on our platform. That is, engaging with hellfire would potentially interfere with the learning process, but it's not ultimately proscribed by our culture platform.

Conscious cargo-culting

I think it might be helpful to repeat the definition:

Pattern-botching is you pattern-match a thing "X", as following a certain model, but then but then implicit queries to that model return properties that aren't true about X. What makes this different from just having false beliefs is that you know the truth, but you're forgetting to use it because there's a botched model that is easier to use.

It's kind of like if you were doing a cargo-cult, except you knew how airplanes worked.

(Cross-posted from malcolmocean.com)

View more: Next