Bachelor's in general and applied physics. AI safety researcher wannabe. Interested in agent foundations.
Email: roman.malov27@gmail.com
GitHub: https://github.com/RomanMalov
TG channel (in Russian): https://t.me/healwithcomedy
and
Shouldn't the second singularity be at the point ?
It would be interesting if any of them decided to (instrumentally) stream games (using a vtuber avatar, for example) to earn money from donations. They would need to figure out how to actually be a good streamer for this to work.
Why not just save them to an offline hard drive?
I am a bit confused. If the question is, 'Will this alignment paradigm work with superintelligence?' is the recommendation from the tweet to try it and see if it works?
I meant to imply that we do not have a robot capable of performing tasks of a similar level of difficulty to the 'saving grandma' task, with safety properties comparable to those that a human firefighter can provide when performing the 'saving grandma' task.
Thanks for pointing that out, I will adjust the post.
I recently prepared an overview lecture about research directions in AI alignment for the Moscow AI Safety Hub. I had limited time, so I did the following: I reviewed all the sites on the AI safety map, examined the 'research' sections, and attempted to classify the problems they tackle and the research paths they pursue. I encountered difficulties in this process, partly because most sites lack a brief summary of their activities and objectives (Conjecture is one of the counterexamples). I believe that the field of AI safety would greatly benefit from improved communication, and providing a brief summary of a research direction seems like low-hanging fruit.
So, is a random variable in the sense that it is drawn from a distribution of functions, and the expected value of those functions at each point is equal to . Am I understanding you correctly?
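In symbols (placeholder notation, since the formulas didn't survive in this comment: $f$ for the random function, $\mathcal{D}$ for the distribution, $\bar{f}$ for the mean function), the reading being asked about is roughly:

$$ f \sim \mathcal{D}, \qquad \mathbb{E}_{f \sim \mathcal{D}}\big[f(x)\big] = \bar{f}(x) \quad \text{for every } x. $$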
I've read it as part of the Agent Foundations course, and I found this post really effective and clarifying. It got me thinking: can this generalize to other failure modes? For example, if programmers notice that an AI spends too many resources on self-preservation and then train against such behavior, this failure mode would still arise, because self-preservation is an instrumental goal and is a fact about the world and about the ways in which goals can be achieved in it.
I'm not a native speaker; can someone please explain the meaning of "Hell is wasted on the evil" in simpler terms?
I think R²→R³ visualizations are possible (something like this is already used in complex-function visualizations). I'm not sure you could display, e.g., a 5D hypercube this way (for the same reason there is no R→R function whose graph looks like a square). See the sketch below for the complex-function trick I have in mind.
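For concreteness, here is a minimal Python/matplotlib sketch of the kind of visualization mentioned above: domain coloring, which draws the 2D input plane directly and packs the output into hue and brightness. The function f(z) = z³ − 1 and all plotting choices here are my own illustration, not something from the original comment.

```python
# Minimal domain-coloring sketch (illustrative; f(z) = z^3 - 1 is an arbitrary choice).
# The 2D input plane is drawn directly; the output is encoded as hue (argument)
# and brightness (magnitude), which is how extra output dimensions fit into one image.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import hsv_to_rgb

x = np.linspace(-2, 2, 600)
X, Y = np.meshgrid(x, x)
Z = X + 1j * Y
F = Z**3 - 1

hue = (np.angle(F) + np.pi) / (2 * np.pi)   # arg(f) mapped to hue in [0, 1]
value = np.abs(F) / (1 + np.abs(F))         # |f| mapped to brightness in [0, 1)
rgb = hsv_to_rgb(np.dstack([hue, np.ones_like(hue), value]))

plt.imshow(rgb, extent=(-2, 2, -2, 2), origin="lower")
plt.title("Domain coloring of f(z) = z^3 - 1")
plt.xlabel("Re(z)")
plt.ylabel("Im(z)")
plt.show()
```

The same idea stretches to R²→R³ by plotting one output coordinate as a 3D surface and keeping the others as color, but you quickly run out of spatial and color channels, which is the obstruction behind the 5D hypercube remark.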