EulersApprentice — LessWrong

Replying to Limiting an AGI's Context Temporally

I should clarify that the discounting is not a shackle, per se, but a specification of the utility function. It's a normative specification that results now are better than results later according to a certain discount rate. An AI that cares about results now will not change itself to be more "patient" – because then it will not get results now, which is what it cares about.

The key is that the utility function's weights over time should form a self-similar graph. That is, if results in 10 seconds are twice as valuable as results in 20 seconds, then results in 10 minutes and 10 seconds need to be twice as valuable as results in 10 minutes and 20 seconds. If this is not true, the AI will indeed alter itself so its future self is consistent with its present self.
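A minimal sketch of that self-similarity condition, assuming exponential discounting with a 10-second half-life (chosen only to match the "twice as valuable" example) and, for contrast, a hyperbolic curve with an arbitrary constant. The function names and constants are illustrative and do not come from the original post.

```python
def exponential_weight(t, half_life=10.0):
    """Weight of a result arriving t seconds from now (assumed 10 s half-life)."""
    return 0.5 ** (t / half_life)


def hyperbolic_weight(t, k=0.1):
    """Contrasting curve that is not self-similar (k is an arbitrary assumption)."""
    return 1.0 / (1.0 + k * t)


def preference_ratio(weight, delay):
    """How much better 'in 10 s' is than 'in 20 s', with both pushed back by delay."""
    return weight(10.0 + delay) / weight(20.0 + delay)


# Compare the ratio right now with the same ratio ten minutes (600 s) out.
exp_now = preference_ratio(exponential_weight, 0.0)
exp_later = preference_ratio(exponential_weight, 600.0)
hyp_now = preference_ratio(hyperbolic_weight, 0.0)
hyp_later = preference_ratio(hyperbolic_weight, 600.0)

print("exponential:", exp_now, exp_later)
print("hyperbolic: ", hyp_now, hyp_later)
```

Under these assumptions the exponential ratio is the same at both horizons, so the agent's present and future selves agree. The hyperbolic ratio shrinks toward 1 as the delay grows, which is the kind of inconsistency that would give the AI a reason to modify itself.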

Replying to Limiting an AGI's Context Temporally

EulersApprentice (7y)

Limiting an AGI's Context Temporally

I'd be fine with it throwing a brick at me. It beats it having the patience to take over the entire world. The point is, if it throws a brick at me, I have data on what went wrong with its utility function and I have a lead on how to fix it.

Limiting an AGI's Context Temporally

EulersApprentice

Okay, so I have a proposal for how to advance AI safety efforts significantly.

Humans experience time as exponential decay of utility. One dollar now is worth two dollars some time in the future, which is worth eight dollars even further in the future, and so forth. This is the principle behind compound interest. Most likely, any AI entities we create will have a comparable relationship with time.
So: What if we configured an AI's half-life of utility to be much shorter than ours?
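To make the half-life idea concrete, here is a rough sketch assuming utility halves every fixed interval. The payoff sizes, the one-hour delay, and the two half-lives (one year and one minute) are invented for illustration and are not figures from the post.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def discounted_value(payoff, delay_seconds, half_life_seconds):
    """Present value of a payoff that arrives after delay_seconds."""
    return payoff * 0.5 ** (delay_seconds / half_life_seconds)


def prefers_long_plan(half_life_seconds,
                      small_now=1.0,
                      big_later=1_000_000.0,
                      delay_seconds=3600.0):
    """True if the agent values the big delayed payoff above the small immediate one."""
    later = discounted_value(big_later, delay_seconds, half_life_seconds)
    return later > small_now


# A "patient" agent (assumed one-year half-life) versus an "impatient" one
# (assumed one-minute half-life), both offered a million units in an hour.
patient = prefers_long_plan(SECONDS_PER_YEAR)
impatient = prefers_long_plan(60.0)

print("patient agent waits:", patient)
print("impatient agent waits:", impatient)
```

With a one-minute half-life, an hour spans sixty halvings, so even a million-fold payoff is worth less than one unit in hand. That is the effect the proposal relies on.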

Imagine, if you will, this principle applied to a paperclip maximizer. "Yeah, if I wanted to, I could make a ten-minute phone call to kick-start my diabolical scheme to take over...

Replying to The E-Coli Test for AI Alignment

EulersApprentice (7y)

The E-Coli Test for AI Alignment

Here's my attempt at solving the puzzle you provide – I believe the following procedure will yield a list of approximate values for the E-Coli bacterium. (It'd take a research team and several years, but in principle it is possible.)

1. Isolate each distinct protein present in E-Coli individually. (The research I found (https://www.pnas.org/content/100/16/9232, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4332353/) puts the number of different proteins in E-Coli at 1-4 thousand, which makes this difficult but not completely infeasible.)
2. For each protein, create a general list of its effects on the biochemical environment within the cell.
3. Collect each effect that is redundantly produced by several distinct proteins simultaneously (say, 10+). This gives us a rough estimate of the bacterium's values, though
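The tallying step of this procedure can be sketched in code. This assumes the per-protein effect lists have already been compiled by hand; the protein and effect names below are made-up placeholders rather than real annotations, and the toy run lowers the threshold from 10 to 2 so the tiny example returns something.

```python
from collections import Counter


def redundant_effects(protein_effects, min_proteins=10):
    """Return effects produced by at least min_proteins distinct proteins.

    protein_effects maps a protein name to an iterable of effect labels.
    """
    counts = Counter()
    for effects in protein_effects.values():
        # set() ensures each protein is counted at most once per effect.
        counts.update(set(effects))
    return {effect: n for effect, n in counts.items() if n >= min_proteins}


# Hypothetical toy data: placeholder names, not real E-Coli proteins.
toy_effects = {
    "protein_A": ["raise_ATP", "repair_membrane"],
    "protein_B": ["raise_ATP"],
    "protein_C": ["raise_ATP", "export_toxin"],
    "protein_D": ["repair_membrane"],
}

candidate_values = redundant_effects(toy_effects, min_proteins=2)
print(candidate_values)
```

Counting distinct proteins per effect, rather than raw mentions, keeps a single protein with a repeated annotation from inflating an effect's apparent redundancy.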

EulersApprentice (7y)

The E-Coli Test for AI Alignment

The only example I can think of is with parents and their children. Evolutionarily, parents are optimized to maximize the odds that their children will survive to reproduce, up to and including self-sacrifice to that end. However, parents do not possess ideal information about the current state of their child, so they must undergo a process resembling value alignment to learn what their children need.

Replying to The E-Coli Test for AI Alignment

EulersApprentice (7y)

The E-Coli Test for AI Alignment

At that point I think we’re running the risk of passing the buck forever. (Unless we can prove that process terminates.)

I am inclined to believe that indeed the buck will get passed forever. This idea you raise is remarkably similar to the Procrastination Paradox (which you can read about at https://intelligence.org/files/ProcrastinationParadox.pdf).