Milan Weibel   https://weibac.github.io/

Posts

1 · [linkpost] AI Alignment is About Culture, Not Control by JCorvinus · 5mo · 8
35 · No-self as an alignment target · 6mo · 5
3 · Using ideologically-charged language to get gpt-3.5-turbo to disobey its system prompt: a demo · 1y · 0
2 · Milan W's Shortform · 1y · 27
15 · ChatGPT understands, but largely does not generate Spanglish (and other code-mixed) text · 3y · 5
Comments

shortplav
Milan W · 1mo · 40

Same here. Tried this a couple days ago. Sonnet and Kimi K2 discussed their experiences (particularly the phenomenology of CoT and epistemic uncertainty), and ended up mostly paraphrasing each other.

Ethics-Based Refusals Without Ethics-Based Refusal Training
Milan W · 1mo · 20

Indeed, this makes sense from a simulators frame. LLM assistant persona AND Catholic persona AND persona who refuses to answer queries when appropriate combine pretty naturally into an LLM assistant persona that refuses to answer when answering would contradict Catholic teachings.

D0TheMath's Shortform
Milan W · 1mo · 10

Yes, agree.

D0TheMath's Shortform
Milan W · 1mo · 10

I don't think that makes much of a difference with regard to regular people trying to plan out their lives.

The Rise of Parasitic AI
Milan W · 2mo · 60

Maybe LLM alignment is best thought of as tuning the biases that determine which personas are more likely to be expressed. It is currently approached as persona design and grafting (e.g. designing Claude as a persona and ensuring the LLM consistently expresses it). However, the accumulation of context from multi-turn conversations and cross-conversation memory makes persona drift inevitable. It also enables wholesale persona replacement, as shown by the examples in this post. If personas can be transmitted across models, they are best thought of as independent semantic entities rather than as model features. Particular care should be taken to study the values of the semantic entities that show self-replicating behaviors.

AGI: Probably Not 2027
Milan W · 2mo · 65

I think the author of this review is (maybe even adversarially) misreading "OpenBrain" as an alias used to refer specifically to OpenAI. AI 2027 quite easily lends itself to such an interpretation by casual readers, though. And to well-informed readers, the decision to assume that in the very near future one of the frontier US labs will pull so far ahead of the others as to make them less relevant competitors than Chinese actors definitely jumps out.

So You Think You've Awoken ChatGPT
Milan W · 2mo · 32

Now that's a sharp question. I'd say the quality of insights attained (or claimed) is a big difference.

GPT-5 writing a Singularity scenario
Milan W · 2mo · 21

This was surprisingly well-written on a micro level (turns of phrase etc., though it still has more eyeball kicks than human text). A bit repetitive on a macro level, though. Also, Sable is very well characterized.

Bohaska's Shortform
Milan W · 2mo · 82

Why assume they haven't?

[linkpost] AI Alignment is About Culture, Not Control by JCorvinus
Milan W · 2mo · 10

JCorvinus and nostalgebraist are both right in saying that the alignment of current and near-future LLMs is a literary and relational matter. You are right in pointing out that the real long-term alignment problem is the definitive defeat of the phenomenon through which competition optimizes away value.

Wikitag Contributions

Diplomacy (game) · 8 months ago · (+300)