All of elbow921's Comments + Replies

The Edge home page featured an online editorial that downplayed AI art on the grounds that it just combines images that already exist. If you look closely enough, human artwork is also a combination of things that already existed.

One example is Blackballed Totem Drawing: Roger 'The Rajah' Brown. James Pate drew this charcoal drawing in 2016. It was the Individual Artist Winner of the Governor's Award for the Arts. At the microscopic scale, this artwork is microscopic black particles embedded in a large sheet of paper. I doubt he made the paper he drew on, and the black...

I like the mantra, "If we choose to give more effort today, then we are sure to go beyond our past mistakes." This mantra is on my desktop screen.

Hedonic Treadmill and the Economy

The hedonic treadmill is when permanent changes to living conditions lead to only temporary increases in happiness. This keeps us always wanting improvements to our lives. We often spend money on the newest iPhones and focus our attention on improving our external circumstances. We ignore the quote:

"What lies before us and what lies behind us are tiny matters compared to what lies within us" 

Some people eat chips to quell their boredom. The hedonic treadmill ensures that, despite improvements in income, people are not ...

I have an idea for a possible utility function combination method. It basically normalizes based on how much utility is at stake in a random dictatorship. The combined utility function has these nice properties:

Pareto-optimality with respect to all input utilities on all lotteries

Adding Pareto-dominated options (threats) does not change players' utilities

Invariance to utility scaling

Invariance to cloning every utility function

Threat resistance

 

The combination method goes like this:

X=list of utility functions to combine

dist(U)=worlds where random utility function ...

For the examples in this article, take only the last monetary value of each option. The indifference curve is log(amount after year) ≈ 0.79*log(amount now) + 0.79. So if U(now) = log(amount now), then U(year) = (log(amount after year) - 0.79)/0.79.
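As a quick check of these two utility definitions, here is a minimal sketch. It assumes natural logarithms (the comment does not say which base is meant), and the function names are hypothetical. It finds the amount after a year that sits on the indifference curve for a given amount now, then confirms that the two utilities agree at that point.

```python
import math

SLOPE = 0.79      # coefficient on log(amount now) in the indifference curve
INTERCEPT = 0.79  # constant term in the indifference curve


def utility_now(amount_now: float) -> float:
    """U(now) = log(amount now). Natural log is an assumption."""
    return math.log(amount_now)


def utility_year(amount_after_year: float) -> float:
    """U(year) = (log(amount after year) - 0.79) / 0.79."""
    return (math.log(amount_after_year) - INTERCEPT) / SLOPE


def indifferent_amount_after_year(amount_now: float) -> float:
    """Amount after a year that lies on the indifference curve."""
    return math.exp(SLOPE * math.log(amount_now) + INTERCEPT)


if __name__ == "__main__":
    now = 100.0
    later = indifferent_amount_after_year(now)
    # The two utilities should match (up to rounding) on the curve.
    print(utility_now(now), utility_year(later))
```

Any delayed amount above the curve gets a higher U(year) than the matching U(now), so comparing the two numbers reproduces the preference the curve encodes.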

There is a hypothetical example of simulating a ridiculous number of humans typing text and seeing what fraction of the people who type out the current text go on to type each possible next token. In the limit, this approaches the best possible text predictor. This would simulate a lot of consciousness.
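The counting step in that thought experiment can be sketched on a toy scale. This is only an illustration under assumptions: the "typists" are a small hand-written list of token sequences, and the function name is hypothetical.

```python
from collections import Counter


def next_token_distribution(samples, prefix):
    """Among samples that begin with `prefix`, return the fraction that
    continue with each next token. Samples that end exactly at the
    prefix have no next token and are ignored."""
    n = len(prefix)
    counts = Counter(
        s[n] for s in samples
        if len(s) > n and list(s[:n]) == list(prefix)
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # nobody typed this prefix
    return {token: count / total for token, count in counts.items()}


if __name__ == "__main__":
    typed = [
        ["the", "cat", "sat"],
        ["the", "cat", "ran"],
        ["the", "cat", "sat"],
        ["a", "dog", "sat"],
    ]
    print(next_token_distribution(typed, ["the", "cat"]))
```

With only a handful of samples most prefixes are never typed at all, which is why the thought experiment needs a ridiculous number of simulated people.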

What if most people would develop superhuman intelligences in their brains without school but, because they have to write essays in school, these superhuman intelligences become aligned with writing essays fast? And no doomsday scenario has happened because they mostly cancel out each other's attempted manipulations and they couldn't program nanobots with their complicated utility functions. ChatGPT writes faster than us and has 20B parameters where humans have 100T parameters, but our neural activations are noisier than floating-point arithmetic.

This is what I am wondering: Does this algorithm, when run, instantiate a subjective experience with the same moral relevance as the subjective experience that happens when mu opioids are released in biological brains?

By 'obvious to the algorithm' I mean that, to the algorithm, A is referenced with no intermediate computation. This is how pleasure and pain feel to me. I do not believe all reinforcement learning algorithms feel pleasure/pain. A simple example that does not suffer is the Simpleton iterated prisoner's dilemma strategy. I believe pain and pleasure are effective ways to implement reinforcement learning. In animals, reinforcement learning is called operant conditioning. See Reinforcement learning on a chicken for a chicken that has experienced it...
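For readers who have not met the Simpleton strategy, here is a minimal sketch. It assumes the common description of the strategy (also known as win-stay, lose-shift): cooperate first, repeat your last move if the opponent cooperated, and switch if the opponent defected. The function names and the "C"/"D" encoding are hypothetical.

```python
def simpleton_move(my_last, opponent_last):
    """Return "C" (cooperate) or "D" (defect) for the next round."""
    if my_last is None:
        return "C"  # first round: cooperate
    if opponent_last == "C":
        return my_last  # opponent cooperated: repeat the last move
    return "D" if my_last == "C" else "C"  # opponent defected: switch


def play(opponent_moves):
    """Play Simpleton against a fixed sequence of opponent moves."""
    history = []
    my_last = opponent_last = None
    for opponent_move in opponent_moves:
        move = simpleton_move(my_last, opponent_last)
        history.append(move)
        my_last, opponent_last = move, opponent_move
    return history


if __name__ == "__main__":
    print(play(["C", "D", "D", "C"]))
```

The whole strategy is a fixed rule over the previous round. It keeps no running estimate of reward, which is one way to read the claim that it is a simple case.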

As this algorithm executes, the last and 2last variables become the program's last 2 outputs. L1's even indexes become the average input (reward?) given the number of ones the program outputted the last 2 times. I called L1's odd indexes 'confidence' because, as they get higher, the corresponding average reward changes less based on evidence. When L1 becomes entangled with the input generation process, the algorithm chooses which outputs make the inputs higher on average. That is why I called the input 'reward'. L2 reads off the average reward given the las...

Richard_Kennaway
In effect, you're saying that all reinforcement learners experience pleasure and suffering. But how do these algorithms "feel from the inside"? What does it mean for the variable A to be "obvious to the algorithm"? We know how we feel, but how do you determine whether there is anything it is like to be that program? Are railway lines screaming in pain when the wheel flanges rub against them? Does ChatGPT feel sorrow when it apologises on being told that its output was bad? I see no reason to attribute emotional states to any of these things.