Sergii

Software engineer from Ukraine, currently living and working in Estonia.
I mainly specialize in computer vision & robotics. https://grgv.xyz/

Posts

Sergii's Shortform

Task vectors & analogy making in LLMs

Mechanistic interpretability of LLM analogy-making

Bird-eye view visualization of LLM activations

GPT-4 for personal productivity: online distraction blocker

Comments

Any mistakes in my understanding of Transformers?

Answer by Sergii, Mar 22, 2025

There are several ways to explain and diagram transformers. Here are some links that were very helpful for my understanding:

https://blog.nelhage.com/post/transformers-for-software-engineers/
https://dugas.ch/artificial_curiosity/GPT_architecture.html
https://peterbloem.nl/blog/transformers
http://nlp.seas.harvard.edu/annotated-transformer/
https://sebastianraschka.com/blog/2023/self-attention-from-scratch.html
https://github.com/markriedl/transformer-walkthrough?ref=jeremyjordan.me
https://francescopochetti.com/a-visual-deep-dive-into-the-transformers-architecture-turning-karpathys-masterclass-into-pictures/
https://jalammar.github.io/illustrated-transformer/
https://e2eml.school/transformers.html
https://jaykmody.com/blog/attention-intuition/
https://eugeneyan.com/writing/attention/
https://www.jeremyjordan.me/attention/

Sergii's Shortform

Sergii, 22d

In an abstract sense, yes. But in practice, for me, finding the truth means checking Wikipedia. It's super easy to mislead humans, so it should be just as easy with AI.

A Bear Case: My Predictions Regarding AI Progress

Sergii, 22d

I agree that pre-training might plateau at some point, possibly even in the next few years.
It would change timelines significantly. But there are other factors apart from scaling pre-training. For example, reasoning models like o3 are crushing ARC-AGI (https://arcprize.org/blog/oai-o3-pub-breakthrough). Reasoning in latent space is still too fresh to judge, but it might be the next breakthrough of a similar magnitude.
Why not take GPT-4.5 for what it is? OpenAI has literally stated that it's not a frontier model. OK, so GPT-5 will not be a 100x-ed GPT-4, but maybe GPT-6 will be, and that might be enough for AGI.
You should not look for progress in autonomy/agency in commercial offerings like GPT-4.5. At this point OpenAI is focusing on what sells well (better personality and EQ); I think they care less about a path to AGI. Rapid advances towards agency/autonomy are better gauged from the academic literature.
I agree that we should not fall for "vibe checks".
But don't bail on benchmarks: many people are working on benchmarks and evals, there is constant progress there, and benchmarks are getting more objective and harder to game. Rather than looking at benchmarks pushed by OpenAI, it's better to look for cutting-edge ones in the academic literature. Evaluating a SOTA model with a benchmark that is a few years old does not make sense at this point.

Sergii's Shortform

Sergii, 23d

LLMs live in an abstract textual world and do not understand the real world well (see "[Physical Concept Understanding](https://physico-benchmark.github.io/index.html#)"). We already manipulate LLMs with prompts, cut-off dates, etc. But what about going deeper by “poisoning” the training data with safety-enhancing beliefs?
For example, if the training data contains lots of content about how hopeless, futile, and dangerous it is for an AI to scheme and hack, might that act as a useful safety guardrail?

William_S's Shortform

Sergii, 1mo

I made something like this, though it works differently (blocking is based on a fixed prompt): https://grgv.xyz/blog/awf/

Sergii's Shortform

Sergii, 1y

What about estimating LLM capabilities from the length of a sequence of numbers that it can reverse?

I used prompts like:
"please reverse 4 5 8 1 1 8 1 4 4 9 3 9 3 3 3 5 5 2 7 8"
"please reverse 1 9 4 8 6 1 3 2 2 5"
etc...

Some results:
- Llama2 starts making mistakes after 5 numbers
- Llama3 can do 10, but fails at 20
- GPT-4 can do 20 but fails at 40

Follow-up questions:
- what should be the name of this metric?
- are the other top-scoring models like Claude similar? (I don't have access)
- any bets on how many numbers will GPT-5 be able to reverse?
- how many numbers should AGI be able to reverse? ASI? can this be a Turing test of sorts?
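A minimal sketch of how this test could be automated, in Python. `model` is a placeholder for whatever function sends a prompt to an LLM and returns its text reply; no specific API is assumed. The helper names, the doubling lengths 5/10/20/40/80, the number of trials per length, and the digit-extraction regex are all illustrative assumptions, not the exact procedure behind the results above. The score is the longest tested length at which every trial was reversed correctly.

```python
import random
import re
from typing import Callable, Sequence

PREFIX = "please reverse "


def make_prompt(n: int, rng: random.Random) -> tuple[str, list[int]]:
    """Build a reversal prompt with n random single digits (assumed format)."""
    digits = [rng.randint(0, 9) for _ in range(n)]
    return PREFIX + " ".join(map(str, digits)), digits


def is_correct_reversal(response: str, digits: list[int]) -> bool:
    """True if the standalone single digits in the reply equal the reversed input."""
    found = [int(t) for t in re.findall(r"\b\d\b", response)]
    return found == digits[::-1]


def max_reversible_length(
    model: Callable[[str], str],
    lengths: Sequence[int] = (5, 10, 20, 40, 80),
    trials: int = 3,
    seed: int = 0,
) -> int:
    """Longest length in `lengths` where all trials pass; stops at the first failure."""
    rng = random.Random(seed)
    best = 0
    for n in lengths:
        for _ in range(trials):
            prompt, digits = make_prompt(n, rng)
            if not is_correct_reversal(model(prompt), digits):
                return best
        best = n
    return best


def fake_model(limit: int) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: reverses correctly up to `limit` numbers,
    and drops one number from its answer beyond that."""
    def model(prompt: str) -> str:
        nums = prompt[len(PREFIX):].split()
        out = nums[::-1]
        if len(nums) > limit:
            out = out[:-1]
        return " ".join(out)
    return model


print(max_reversible_length(fake_model(25)))  # 20
```

With the stand-in `fake_model(25)`, the search passes lengths 5, 10, and 20 and fails at 40, so it reports 20. Several fresh random sequences per length guard against a single lucky answer. The regex only picks up standalone single digits, so a reply that mentions other single-digit numbers in its surrounding prose would be scored as wrong; a real harness would likely need stricter answer parsing.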

A case for AI alignment being difficult

Sergii, 1y

"If we don’t have a preliminary definition of human values"

Another, possibly even larger, problem is that the values we know of vary widely and even conflict between people.

Take pain avoidance as an example -- maximizing it might leave some people unhappy and even suffering. Sure, that would be a minority, but are we ready to exclude minorities from alignment, even small ones?

I would argue that any defined set of values would leave some minority of people suffering. Who would decide which minorities are better or worse, what size of minority is acceptable to leave behind to suffer, etc.?

I think this makes the whole idea of alignment to some set of "human values" ill-defined and incorrect.

One more contradiction -- are human values allowed to change, or are they frozen? I think they might change as humanity evolves. But then, as AI interacts with humanity, it could be convincing enough to push the shift in values in any direction, which might not be a desirable outcome.

People have been known to value racial purity and to support genocide. Given sufficiently convincing rhetoric, we could come to support paperclip-maximizing just as well.

"Human enhancement is one approach."

I like this idea, combined with AI self-limitation. What if an (aligned) AI had to limit its own growth so that its capabilities always stayed below those of enhanced humans? This would allow for a slow, safe, and controllable takeoff.

Is this a good strategy for alignment? What if, instead of trying to tame an inherently dangerous, fast-taking-off AI, we make it more controllable by making it self-limiting, with some built-in "capability brakes"?

Taboo "procrastination"

Sergii, 1y

"I'm not working on X, because daydreaming about X gives me instant gratification (and rewards of actually working on X are far away)"

"I'm not working on X, because I don't have a strict deadline, so what harm is in working on it tomorrow, and relax now instead?"

Stupid Question: Why am I getting consistently downvoted?

Sergii, 1y

No, thanks, I think your awards are fair )

I did not read the "Ethicophysics I" paper in detail, only skimmed it. It looks very similar to "On Purposeful Systems" https://www.amazon.com/Purposeful-Systems-Interdisciplinary-Analysis-Individual/dp/0202307980 in its approach to formalizing things like feelings/emotions/ideals.
Have you read it? I think it would help your case a lot if you switched to systems-theory terminology, as in "On Purposeful Systems", rather than pseudo-theological terms.

Stupid Question: Why am I getting consistently downvoted?

Answer by Sergii, Nov 30, 2023

One big issue is that you are not respecting the format of LW: add more context, and either link to a document directly or put the text inline. Resolving this would cover half of your most downvoted posts. You could ask people to review your posts for this before submitting.

Another big issue is that you are a prolific writer, but not a good editor. Just edit more: your writing could be about 5x shorter without losing anything meaningful. Your scientific writing has an overly academic style; that doesn't work well on the internet, and it isn't even good in scientific papers. A good take on this: https://archive.is/29hNC

From "The Elements of Style": "Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that he avoid all detail and treat his subjects only in outline, but that every word tell."

Also, you are trying to move too fast, on too many fronts. Why don't you just focus on one thing for a while, and clarify and polish it enough that people can clearly grasp what you mean?