Roman Malov

Bachelor in general and applied physics. AI safety researcher wannabe. Interested in agent foundations.

TG channel (in Russian):


Why not just save them to an offline hard drive?

I am a bit confused. If the question is, 'Will this alignment paradigm work with superintelligence?' is the recommendation from the tweet to try it and see if it works?

I meant to imply that we do not have a robot capable of performing tasks of similar difficulty to the 'saving grandma' task, with safety properties comparable to those a human firefighter can provide when performing the 'saving grandma' task.

Thanks for pointing that out, I will adjust the post.

I recently prepared an overview lecture about research directions in AI alignment for the Moscow AI Safety Hub. I had limited time, so I did the following: I reviewed all the sites on the AI safety map, examined the 'research' sections, and attempted to classify the problems they tackle and the research paths they pursue. I encountered difficulties in this process, partly because most sites lack a brief summary of their activities and objectives (Conjecture is one of the counterexamples). I believe that the field of AI safety would greatly benefit from improved communication, and providing a brief summary of a research direction seems like low-hanging fruit.

So,  is a random variable in the sense that it is drawn from a distribution of functions, and the expected value of those functions at each point  is equal to . Am I understanding you correctly?

I read it as part of the Agent Foundations course, and I consider this post really effective and clarifying. It got me thinking: can this generalize to other failure modes? For example, if programmers notice that an AI spends too many resources on self-preservation and then train against that behavior, the failure mode would still arise, because self-preservation is an instrumental goal: it is a fact about the world and about the ways in which goals can be achieved in it.

I'm not a native speaker. Can someone please explain the meaning of "Hell is wasted on the evil" in simpler terms?
