Archetypal Transfer Learning — LessWrong

Edited by MiguelDev, the gears to ascension, et al. last updated 5th Jul 2023

Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. After fine-tuning, the method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model.

Related Tags: Corrigibility, Inner Alignment, Outer Alignment

Posts tagged Archetypal Transfer Learning

Exploring Functional Decision Theory (FDT) and a modified version (ModFDT) (tag relevance 3, karma 12, 3y, 11 comments)

Relevance of 'Harmful Intelligence' Data in Training Datasets (WebText vs. Pile) (tag relevance 3, karma 12, 2y, 0 comments)

GPT-2 XL's capacity for coherence and ontology clustering (tag relevance 3, karma 6, 2y, 2 comments)

On Ilya Sutskever's "A Theory of Unsupervised Learning" (tag relevance 2, karma 10, 2y, 0 comments)

A Multidisciplinary Approach to Alignment (MATA) and Archetypal Transfer Learning (ATL) (tag relevance 2, karma 4, 3y, 2 comments)

Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner & Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium (tag relevance 1, karma 14, 3y, 5 comments)

Research proposal: Leveraging Jungian archetypes to create values-based models (tag relevance 1, karma 5, 3y, 2 comments)
