Announcing the Farlamp project

Project definition:

I'm studying the impact of overseer failure on RL-based IDA, because I want to know under what conditions the amplification increases or decreases the failure rate, in order to help my reader understand whether we need to combine capability amplification with explicit reliability amplification in all cases.
In this project I will:
1. Take the implementation of iterated distillation and amplification (IDA) from Christiano et al.'s ‘Supervising strong learners by amplifying weak experts’ and adapt it to reinforcement learning. (The current implementation uses supervised learning.)
2. Introduce overseer failures and measure how they influence the overall failure rate. (A toy sketch of both steps follows this list.)
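To make these two steps more concrete, here is a toy sketch I made up for this announcement. It is not the code from the paper, and everything in it is my own invention for illustration: the parity task, the 4-bit question size, and the names `FaultyOverseer`, `TabularRLAgent`, `amplify` and `distill_rl`. The idea is just to show the moving parts: an overseer whose judgements are flipped with some probability, an amplification step that consults the current agent on sub-questions, a crude bandit-style learner standing in for RL-based distillation, and a measurement of the resulting overall failure rate.

```python
import collections
import random


class FaultyOverseer:
    """Overseer whose every judgement is flipped with probability `failure_rate`."""

    def __init__(self, failure_rate):
        self.failure_rate = failure_rate

    def maybe_corrupt(self, answer):
        return 1 - answer if random.random() < self.failure_rate else answer


def amplify(overseer, agent, bits):
    """Amplify(H, A): the overseer answers a parity question by consulting the agent."""
    if len(bits) == 1:
        # Base case: the overseer answers the smallest question itself.
        return overseer.maybe_corrupt(bits[0])
    mid = len(bits) // 2
    # The overseer asks the current agent about the two halves and combines them;
    # the combination step is where an injected overseer failure can strike.
    combined = (agent.answer(bits[:mid]) + agent.answer(bits[mid:])) % 2
    return overseer.maybe_corrupt(combined)


class TabularRLAgent:
    """Stand-in for the distilled agent: a per-question bandit trained from rewards."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.value = collections.defaultdict(lambda: [0.0, 0.0])   # question -> value of answering 0 / 1
        self.counts = collections.defaultdict(lambda: [0, 0])

    def answer(self, bits):
        v = self.value[bits]
        return random.randint(0, 1) if v[0] == v[1] else int(v[1] > v[0])

    def act(self, bits):
        # Epsilon-greedy exploration during distillation.
        return random.randint(0, 1) if random.random() < self.epsilon else self.answer(bits)

    def update(self, bits, action, reward):
        # Incremental mean of the rewards received for this (question, answer) pair.
        self.counts[bits][action] += 1
        v = self.value[bits]
        v[action] += (reward - v[action]) / self.counts[bits][action]


def distill_rl(overseer, agent, samples=30000):
    """'RL distillation': the agent acts, the (possibly failing) amplified overseer
    provides the reward, the agent updates.  Questions of all sizes are interleaved,
    so answers to small questions bootstrap answers to larger ones."""
    for _ in range(samples):
        length = random.choice((1, 2, 4))
        bits = tuple(random.randint(0, 1) for _ in range(length))
        action = agent.act(bits)
        target = amplify(overseer, agent, bits)
        agent.update(bits, action, reward=1.0 if action == target else 0.0)


def overall_failure_rate(agent, trials=2000):
    """Failure rate of the distilled agent on full-size questions, against ground truth."""
    errors = 0
    for _ in range(trials):
        bits = tuple(random.randint(0, 1) for _ in range(4))
        errors += int(agent.answer(bits) != sum(bits) % 2)
    return errors / trials


if __name__ == "__main__":
    for p in (0.0, 0.05, 0.2):
        random.seed(0)
        overseer = FaultyOverseer(failure_rate=p)
        agent = TabularRLAgent()
        distill_rl(overseer, agent)
        print(f"overseer failure rate {p:.2f} -> overall failure rate {overall_failure_rate(agent):.3f}")
```

The real experiment will of course use the amplification code from the paper and a proper RL learner; the sketch only illustrates how an injected overseer failure rate can propagate into the overall failure rate of the distilled agent.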
‘Overseer failures in SupAmp and ReAmp’ contains a more extensive introduction, as well as an explanation of the relevant terms and concepts.
The project repo contains all the public artifacts I have produced so far.
At the moment I'm expanding my ML skills using the book Hands-On Machine Learning with Scikit-Learn & TensorFlow. After this I will start working on the IDA code.
Paul Christiano has been funding this project, giving me a chance to try my hand at research (again). The next evaluation will be in December/January. Depending on the progress, the funding, and with it the project, might be discontinued.
I'm also looking for remote writing partners.