Bachelor's in general and applied physics. Aspiring AI safety / agent foundations researcher.
I love talking to people, and if you are an alignment researcher we will have at least one topic in common (though I am also very interested in talking about topics that are new to me!), so I encourage you to book a call with me: https://calendly.com/roman-malov27/new-meeting
Email: roman.malov27@gmail.com
GitHub: https://github.com/RomanMalov
TG channels (in Russian): https://t.me/healwithcomedy, https://t.me/ai_safety_digest
Is there something in particular that you think could be more distilled?
What I had in mind is something like a more detailed explanation of recent reward hacking/misalignment results. Like, sure, we have old arguments about reward hacking and misalignment, but what I want is more gears for when particular kinds of reward hacking would happen in which model classes.
Maybe check MIRI, PIBBSS, ARC (theoretical research), and Conjecture; also check who went to ILIAD.
Those are top-down approaches, where you have an idea and then do research on it. That is of course useful, but it's doing more frontier research by expanding the surface area. Trying to apply my distillation intuition to them would be like having some overarching theory unifying all approaches, which seems super hard and maybe not even possible. But looking at the intersections of pairs of agendas might prove useful.
But staying on the frontier seems to be a really hard job. Lots of new research comes out every day, and scientists struggle to follow it. New research has lots of value while it's hot, and loses it as the field progresses and that research becomes part of the general theory (which is a much more worthwhile thing to spend time learning).
Which raises the question: if you are not currently at the cutting edge and actively advancing your field, why follow new research at all? After a while, the field will condense the most important and useful research into neat textbooks and overview articles, and reading those when they appear is a much more efficient use of time. While you are not at the cutting edge, read condensations of previous work until you get there.
Also, it seems like there is not much of that in the field of alignment. I want there to be more work on unifying (previously frontier) alignment research and more effort to construct paradigms in this preparadigmatic field (but maybe I just haven't looked hard enough).
It doesn't matter if you want to dance at your friend's wedding; if you think the wedding would be "better" if more people danced, and your dancing would meaningfully make others more likely to dance, you should be dancing. You should incorporate the positive externality of the social contagion effect of your actions into most things you do (e.g. whether you should drink alcohol, bike, use Twitter, etc.).
Yes! I wish more people adopted FDT/UDT-style decision theory. We already (to some extent, and not deliberately) borrow wisdom from timeless decision theories (e.g. "treat others as you would like them to treat you", "if everybody thought like that, the world would be on fire", etc.), but not for small-scale, low-stakes social situations, and that is exactly the point you bring up here.
I haven't fully thought it out, but there might be a counterargument in the style of the anti-Pascal's-mugging counterarguments: if your priors say that you might be being modeled by a hostile entity, there is an incentive to confuse it, and it all (somehow) balances out, so you should just always use your decision theory as if you are real.
Great post!
It ironically has lots of YouTube links, and when I instinctively clicked on one, I was stopped by the LeechBlock plugin I installed to cut down my YouTube-related screen time (on Rob Miles's advice).
I would also add: if your messages in a group chat spark lots of conversation every time, that chat is underexploited (you are not sending enough messages / you are thinking too much before sending).
(Conversely, if your messages are ignored, spend a little bit more time on them)