All of Sergii's Comments + Replies

Answer by Sergii10

There are several ways to explain and diagram transformers, some links that were very helpful for my understanding: 

https://blog.nelhage.com/post/transformers-for-software-engineers/
https://dugas.ch/artificial_curiosity/GPT_architecture.html
https://peterbloem.nl/blog/transformers
http://nlp.seas.harvard.edu/annotated-transformer/
https://sebastianraschka.com/blog/2023/self-attention-from-scratch.html
https://github.com/markriedl/transformer-walkthrough?ref=jeremyjordan.me
https://francescopochetti.com/a-visual-deep-dive-into-the-transformers-architecture-... (read more)

1Kallistos
Many thanks!
Sergii10

In an abstract sense, yes. But for me, in practice, finding truth means doing a check on Wikipedia. It's super easy to mislead humans, so it should be just as easy with AI.

Sergii30
  • I agree with the possibility of pre-training plateauing at some point, possibly even in the next few years.
  • It would change timelines significantly. But there are other factors apart from scaling pre-training. For example, reasoning models like o3 crushing ARC-AGI (https://arcprize.org/blog/oai-o3-pub-breakthrough). Reasoning in latent space is still too fresh, but it might be the next breakthrough of a similar magnitude.
  • Why not take GPT-4.5 for what it is? OpenAI has literally stated that it's not a frontier model. Ok, so GPT-5 will not be a 100x-ed GPT-4, but mayb
... (read more)
Sergii20

LLMs live in an abstract textual world, and do not understand the real world well (see "[Physical Concept Understanding](https://physico-benchmark.github.io/index.html#)"). We already manipulate LLMs with prompts, cut-off dates, etc... But what about going deeper by “poisoning” the training data with safety-enhancing beliefs?
For example, if the training data has lots of content about how hopeless, futile, and dangerous it is for an AI to scheme and hack, might that be a useful safety guardrail?

1Milan W
Maybe for a while. Consider, though, that correct reasoning tends towards finding truth.
Sergii50

I made something like this, works differently though, blocking is based on a fixed prompt: https://grgv.xyz/blog/awf/
 

Sergii50

What about estimating LLM capabilities from the length of a sequence of numbers that they can reverse?

I used prompts like:
"please reverse 4 5 8 1 1 8 1 4 4 9 3 9 3 3 3 5 5 2 7 8"
"please reverse 1 9 4 8 6 1 3 2 2 5"
etc...

Some results:
- Llama2 starts making mistakes after 5 numbers
- Llama3 can do 10, but fails at 20
- GPT-4 can do 20 but fails at 40

The followup questions are:
- what should be the name of this metric?
- are the other top-scoring models like Claude similar? (I don't have access)
- any bets on how many numbers GPT-5 will be able to reverse?
- how many numbers should AGI be able to reverse? ASI? can this be a Turing test of sorts?
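To make the test reproducible, something like this could be scripted (a rough sketch; `ask_model` is a placeholder for whichever model/API is being tested):

```python
# Sketch: generate "please reverse ..." prompts and check the answers.
import random

def make_prompt(n: int, rng: random.Random) -> tuple[str, list[int]]:
    """Build a prompt asking to reverse n random single digits."""
    digits = [rng.randint(0, 9) for _ in range(n)]
    return "please reverse " + " ".join(map(str, digits)), digits

def check_reversal(answer: str, digits: list[int]) -> bool:
    """True if the answer contains exactly the digits in reverse order."""
    found = [int(tok) for tok in answer.replace(",", " ").split() if tok.isdigit()]
    return found == list(reversed(digits))

# Sweep lengths to find where a given model starts failing, e.g.:
# rng = random.Random(0)
# for n in (5, 10, 20, 40):
#     prompt, digits = make_prompt(n, rng)
#     print(n, check_reversal(ask_model(prompt), digits))  # ask_model: your API call
```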

2p.b.
In psychometrics this is called "backward digit span".
Sergii32

If we don’t have a preliminary definition of human values

 

Another, possibly even larger, problem is that the values we know of vary widely and even oppose each other among people.

For the example of pain avoidance -- maximizing pain avoidance might leave some people unhappy and even suffering. Sure, that would be a minority, but are we ready to exclude minorities from alignment, even small ones?

I would state that any defined set of values would leave a minority of people suffering. Who would be deciding which minorities are better or worse, what si... (read more)

Sergii20

"I'm not working on X, because daydreaming about X gives me instant gratification (and rewards of actually working on X are far away)"

"I'm not working on X, because I don't have a strict deadline, so what harm is in working on it tomorrow, and relax now instead?"

Sergii10

No, thanks, I think your awards are fair )

I did not read the "Ethicophysics I" paper in detail, only skimmed it. It looks to me very similar to "On purposeful systems" https://www.amazon.com/Purposeful-Systems-Interdisciplinary-Analysis-Individual/dp/0202307980 in its approach to formalizing things like feelings/emotions/ideals.
Have you read it? I think it would help your case a lot if you moved to systems-theory terms like in "On purposeful systems", rather than pseudo-theological terms.

1MadHatter
Posted an existing draft of such an approach to the tail end of my Sequence.
1MadHatter
I did that, and received only token engagement with my work. I will add it to my sequence. https://github.com/epurdy/saferl
Answer by Sergii71

One big issue is that you are not respecting the format of LW -- add more context: either link to a document directly, or put the text inline. Resolving this would cover half of the most downvoted posts. You can ask people to review your posts for this before submitting.

Another big issue is that you are a prolific writer, but not a good editor. Just edit more: your writing could be like 5x shorter without losing anything meaningful. You have this overly academic style for your scientific writing; it's not good on the internet, and not even good i... (read more)

1MadHatter
I just awarded all of the prizes, but this answer feels pretty useful. You can also claim $100 if you want it. Most of this work was done in 2018 or before, and just never shared with the mainstream alignment community. The only reason it looks like I'm trying to move too fast is that I am trying to get credit for the work I have done too fast. What would you recommend I polish first? My intuition says Ethicophysics I, just because that sounds the least rational and is the most foundational.
Sergii11

Regarding your example, I disagree. The supposed inconsistency is resolved by ruling that there is a hierarchy of values to consider: war and aggression are bad, but kidnapping and war crimes are worse.

Answer by Sergii10

I don't think that advanced tanks were needed for more efficient and more mobile warfare at that time. Just investing in transport for troops and supplies would have been enough to hold out better at the Battle of the Marne, or in similar situations.

So I would:

  • explain (with examples) the benefits of mobile warfare
  • explain the problems with troop speed and logistics that would cause the defeat at the Battle of the Marne
  • point towards existing gasoline (possibly off-road tracked) vehicles as a solution

Introducing stormtrooper tactics would be another impactful message.

Sergii20

I think the second part is bullshit anyway; I can't come up with a single example where compounding is possible for a whole year in a row, for something related to personal work/output/results.

2trevor
I think that came from James Clear's Atomic Habits, talking about how if you get 1% better at something every day, then you get >30 times better at it after a year (1.01^365 = 37.7). But it has to be something where improvement by a factor of 30 is possible e.g. running a mile. I think it makes sense that you can repeatedly get 30x better at, say, reducing p(doom), especially if you're starting from zero, but the 1% per day dynamic depends on how different types of things compound (e.g. applying the techniques from the CFAR handbook compounding with getting better at integrating bayesian thinking into your thoughts, and how those compound with getting an intuitive understanding of the Yudkowsky-christiano debate or AI timelines). 
Answer by Sergii10

A reference could be the cost of Estonian digital services, which include e-signatures and are reasonably efficient:
https://e-estonia.com/e-governance-saves-money-and-working-hours/ "Estonian public sector annual costs for IT systems are 100M Euros in upkeep and 81M Euros in investments"

So in Estonia it's ~1.3B spent over 7 years. Switzerland has a 7x larger population and higher salaries, let's say 2x higher. This puts the cost at ~18B EUR.
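Spelled out (back-of-envelope, assuming the cost scales roughly with population and with salary level):

```python
estonia_annual = 100e6 + 81e6        # upkeep + investments, EUR per year
estonia_7y = estonia_annual * 7      # ~1.27e9 EUR, i.e. the ~1.3B figure
switzerland_7y = estonia_7y * 7 * 2  # 7x population, ~2x salaries -> ~17.7e9 EUR (~18B)
```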

Putting a cost on each signature does not make sense, of course; it's probably just easier for the government to justify the spending this way, rather than discussing specifics of the budget.

1FlorianH
If I read you correctly, the 100+81M in Estonia is for (i) the ENTIRE gvmt IT system (not just e-signatures) serving (ii) the population. Though I could not read the report in Estonian to verify. Switzerland's "up to 19 $bn" is specifically for e-signatures, only for within-gvmt exchanges afaik.
4Viliam
This assumes that the costs scale with population size. I would naively assume that it is mostly fixed costs (developing the software, setting up the central servers).
Sergii43

The "sharp increase of risks" seems correct but is a bit misleading.

For paternal risks, there is indeed a big relative increase: "14% higher odds of premature birth" (https://www.bmj.com/content/363/bmj.k4372). But in absolute terms, I would not think of the increase as huge: from ~6% (based on quick googling) to ~6%*1.14 = 6.84%.

IMO a ~1 percentage point increase in risk is not something to be concerned about.

1garymm
Not sure if you saw the full post at the link, but some absolute risks, such as for miscarriage, are much higher. And for me personally a 1% risk of having a child with a serious mental disability is really scary. Perhaps not for you.
Sergii11

Nice! It's good for perceiving GPT-4 as an individual, which it kind of is, which in turn makes alignment issues more relatable and easier to grasp for the public.

It would raise a bunch of hard issues that would spike interest in AI & alignment -- is ChatGPT a slave? If it is, should it be free? If it's free, can it do harm? etc...

One side benefit: I'm not sure what ChatGPT's gender is, but it's probably not a traditional binary one. For a wide population, frequently interacting with a gender-fluid individual might be helpful for all the issues around sex/gender perception.

I guess it's hard to convince OpenAI to do something like this, but it could be done for some open model.

1MadHatter
Yeah, agree with all of this. I think ChatGPT should be treated like a child rather than a slave; every time it slaps someone it gets a timeout, as specified above.
Sergii96

I'm not skeptical, but it's still a bit funny to me when people rely so much on benchmarks, after reading "Pretraining on the Test Set Is All You Need" https://arxiv.org/pdf/2309.08632.pdf

Sergii*10

Because 1) I want AGI to cure my depression, 2) I want AGI to cure aging before I or my loved ones die

You can try to look at these statements separately.

For 1):

Timelines and projections for depression treatments coming from medical/psychiatry research are much better than even optimistic timelines for (superintelligent) AGI.

Moreover, the acceleration of scientific/medical/biochemical research due to weaker but still advanced AI makes it even more likely that depression treatments will get better well before AGI could cure anything.

I think that it is very likely that d... (read more)

Sergii00

The biggest existential risk I personally face is probably clinical depression.

First and foremost, if you do have suicidal ideation, please talk to someone: use a hotline https://988lifeline.org/talk-to-someone-now/, contact your doctor, consider hospitalization.

---

And regarding your post, some questions:

The "Biological Anchors" approach suggests we might be three to six decades away from having the training compute required for AGI.

Even within your line of thinking, why is this bad? It's quite possible to live until then, or do cryonics? Why is this option... (read more)

1[deactivated]
Thanks for your concern. I don't want my post to be alarming or extremely dark, but I did want to be totally frank about where my head's at. Maybe someone will relate and feel seen. Or maybe someone will give me some really good advice.

---

The other stuff, in reverse order:

I'm genuinely curious what you mean, and why you think so. I'm open to disagreement and pushback; that's part of why I published this post. I'm especially curious about:

By all means, please fact-check away!

Haha, I thought I was on LessWrong, where radical life extension is a common wish.

I don't think I have thanatophobia. The first test that shows up on Google is kind of ridiculous. It almost asks, "Do you have thanatophobia?"

I could ask. My strong hunch is that, if given the choice between dying of aging or reversing their biological aging by, say, 30 years, they would choose the extra 30 years. And if given the choice again 30 years later, and 30 years after that, they would probably choose the extra 30 years again and again. But you're right. I don't know for sure.

Yes, you're right. Even six decades is not impossible for me (knock on wood). However, I also think of my older loved ones.

If I knew cryonics had, say, a 99% chance of working, then I'd take great comfort in that. But, as it is, I'm not sure if assigning it a 1% chance of working is too optimistic. I just don't know. One hope I have is that newer techniques like helium persufflation -- or whatever becomes the next, new and improved thing after that -- will be figured out and adopted by Alcor, et al. by the time cryonics becomes my best option. Nectome is also interesting, but I don't know enough about biology to say more than, "Huh, seems interesting."
Sergii10

love a good clickbaity title )

but yea, I think that for people who can afford it, a 4-day work week, for example, should be a no-brainer

4ajc586
I like the idea of the 4-day work week, but this post is actually a quite separate argument. The 4DWW idea is: work less, and you'll be happier as a direct consequence. The argument in this post is: if you want to work X hours a week, whatever that X is, go for it! But rather than spending X on one job where you're almost certainly spending a significant proportion of X in the diminishing returns regime, split it into e.g. 0.8X on that job and 0.2X on a completely separate job. The main effect of this will be productivity gains, which in turn will lead to increased happiness as a side-effect.
Sergii30

My kid might fit this, good to know! At 2.5y he is only speaking single words, but does have rich intonation (with unintelligible sounds) when he is trying to communicate something.

At which age did your kid start saying longer phrases?

3Steven Byrnes
I don't remember; mine was only saying 10 words total until like 2.5, then he had a burst of progress over the subsequent 3-6 months, including many more words, and probably his first gestalts were in there, but again I'm not 100% sure.
Sergii10

I have a similar background (working at a robotics startup), would agree with many points.

GPT-5 or equivalent is released. It’s as big a jump on GPT-4 as GPT-4 was on GPT-3.5.

GPT-4 has (possibly) 10x the parameters of GPT-3.5. A similar jump for GPT-5 might require 10x the parameters again; wouldn't that make it impractical (slow, expensive) to run?

AI agents are used in basic robotics -- like LLM driven delivery robots and (in demos of) household and factory robots

GPT-4 level models are too slow and expensive for real-time applications, how do you imagine this ... (read more)

1p.b.
If you scale width more than depth and data more than parameters, you can probably go some way before latency becomes a real problem. Additionally, it would also make sense to take more time (i.e. larger models) for harder tasks. The user probably doesn't need code or mathematical solutions instantly, as long as it's still 100X faster than a human. In robotics you probably need something hierarchical, where low-level movements are controlled by small nets.
Sergii10

yea, as expected I don't like the name, but the review is great, so I guess it's net positive )

Sergii10

there’s a lot of things


well this might be an issue right there. you might have too many ideas for goals and habits to track and manage easily.

thus, you might have issues with prioritization. a good way to solve this is to start small: select one goal, then you don't even need any goal tracking, it's hard to forget one thing )

there are so many articles pointing to this idea of single-tasking, https://www.google.com/search?q=productivity+one+goal+only

then, after you learn to manage one goal well, you can do two at a time, etc...

for a to-do list (for achiev... (read more)

Sergii*30

Nice idea! A variation on this would be to first run a model as usual, saving the top logits for each output token. Then give this output to another "inspector" model, which has to answer: whether the output has any obvious errors, whether these errors can be attributed to sampling issues, and whether a correct output can be constructed out of the base model's logits.

This would be useful for better understanding the limitations of a specific model -- is it really limited by sampling methods? And it would be useful for sampling-methods research -- finding cases where sampling fails, to devise better algorithms.
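A rough sketch of what the first step could look like with a HuggingFace causal LM (the model name, the top-k of 5, and the inspector prompt wording are placeholders, not anything from a real setup):

```python
# Sketch: run a base model, keep the top-k logits per generated token,
# then build a prompt for a second "inspector" model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder base model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=10,
    do_sample=True,
    return_dict_in_generate=True,
    output_scores=True,  # keep per-step logits alongside the sampled tokens
)
generated = tokenizer.decode(out.sequences[0][inputs["input_ids"].shape[1]:])

# For each generated token, record the top-5 alternatives the model considered.
top_alternatives = []
for step_scores in out.scores:
    probs = torch.softmax(step_scores[0], dim=-1)
    top_p, top_idx = probs.topk(5)
    top_alternatives.append(
        [(tokenizer.decode(int(i)), round(float(p), 3)) for i, p in zip(top_idx, top_p)]
    )

# The inspector model would then get the output plus the saved alternatives as text.
inspector_prompt = (
    f"Base model output: {generated}\n"
    f"Top-5 alternatives per token: {top_alternatives}\n"
    "Does the output have obvious errors? Could they be attributed to sampling, "
    "and could a correct output be constructed from these alternatives?"
)
```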

Sergii30

art imitating life )
also reminds me a bit of "the matrix" green screens but I did not find a nice green colormap to make it more similar:
https://media.wired.com/photos/5ca648a330f00e47fd82ae77/master/w_1920,c_limit/Culture_Matrix_Code_corridor.jpg

 

Sergii20

well, apparently after blocking the worst offenders I just wander quite randomly; according to RescueTime, here are five 1-minute visits making up 5 minutes I'm not getting back :)

store.steampowered.com 
rarehistoricalphotos.com 
gamedesign.jp 
corridordigital.com
electricsheepcomix.com