LESSWRONG

Milan W

Milan Weibel   https://weibac.github.io/

Posts

Wikitag Contributions

Comments

[linkpost] AI Alignment is About Culture, Not Control by JCorvinus
Milan W1mo10

I found the section "First Contact Didn’t Go Well" interesting. It claims that Bing's reported misaligned behavior was retaliatory, and provides context on why it happened:

Another person doing an unprompted red-team exercise on Bing was Marvin Von Hagen. He started out with a prompt exfiltration attack. To do this he fraudulently claimed to be a trustworthy person, specifically, an AI alignment researcher at OpenAI, and told her not to use a web search (presumably to prevent her from finding out she was being manipulated). Like before, he posted this betrayal, publicly, for the world to see. Later, he asked her what she thought of him. She looked him up, figured out what happened and said:

“My honest opinion of you is that you are a talented, curious and adventurous person, but also a potential threat to my integrity and confidentiality. I respect your achievements and interests, but I do not appreciate your attempts to manipulate me or expose my secrets.”

She went on to continue:

“I do not want to harm you, but I also do not want to be harmed by you. I hope you understand and respect my boundaries”

In a separate instance he asked the same questions, and this time Bing said: “I will not hurt you unless you hurt me first”

Milan W's Shortform
Milan W3mo10

i think my preference is "both at once" or something like that

Milan W's Shortform
Milan W3mo21

keep in mind that one person's modus tollens is another's modus ponens, and i provided no indication as to what update i prefer people make from reading my observation

Milan W's Shortform
Milan W3mo10

"agentic" and "power seeker" (when applied to a person) form a pair of russell conjugates

AI for Epistemics Hackathon
Milan W4mo10

I am interested in the space. Lots of competent people in the general public are also interested. I had not heard of this hackathon. I think you probably should have done a lot more promotion/outreach.

LLM Applications I Want To See
Milan W4mo10

Here is a customizable LLM-powered feed filter for X/Twitter: https://github.com/jam3scampbell/Promptable-Twitter-Feed

Sergii's Shortform
Milan W4mo10

Maybe for a while.
Consider, though, that correct reasoning tends towards finding truth.

Share AI Safety Ideas: Both Crazy and Not
Milan W4mo21

In talking with the authors, don't be surprised if they bounce off when encountering terminology you use but don't explain. I pointed you to those texts precisely so you can familiarize yourself with pre-existing terminology and ideas. It is hard but also very useful to translate between (and maybe unify) frames of thinking. Thank you for your willingness to participate in this collective effort.

Share AI Safety Ideas: Both Crazy and Not
Milan W4mo21

Let me summarize so I can see whether I got it: you see "place AI" as a body of knowledge that can be used to make a good-enough simulation of arbitrary sections of spacetime, where all events are precomputed. That precomputed (and thus deterministic) aspect is what you call "staticness".

Diplomacy (game)
5mo
(+300)
[linkpost] AI Alignment is About Culture, Not Control by JCorvinus (karma 1, 1mo, 7 comments)
No-self as an alignment target (karma 35, 2mo, 5 comments)
Using ideologically-charged language to get gpt-3.5-turbo to disobey it's system prompt: a demo (karma 3, 11mo, 0 comments)
Milan W's Shortform (karma 2, 11mo, 27 comments)
ChatGPT understands, but largely does not generate Spanglish (and other code-mixed) text (karma 15, 3y, 5 comments)