All of Victor Ashioya's Comments + Replies

Image

A very important direction—we are punishing these [dream] machines for doing what they know best. The average user obviously wants to kill these "hallucinations," but the researchers in math and sciences in general highly benefit from these "hallucinations."

Full paper here: https://arxiv.org/abs/2501.13824

I watched Sundar's interview segment on CNBC and he is asked about Sora using Youtube data but he appears sketchy and vague. He just says, "we have laws on copyright..."

The first thing I noticed with GPT-4o is that “her” appears ‘flirty’ especially the interview video demo. I wonder if it was done on purpose.

JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models (non-peer-reviewed as of writing this)

From the abstract:

Based on the framework, we design JailbreakLens, a visual analysis system that enables users to explore the jailbreak performance against the target model, conduct multi-level analysis of prompt characteristics, and refine prompt instances to verify findings. Through a case study, technical evaluations, and expert interviews, we demonstrate our system's effectiveness in helping users evaluate model security and identify

... (read more)

Also, another interesting detail is that PPO still shows superior performance on RLHF testbeds.

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

TLDR; a comparison of DPO and PPO (reward-based and reward-free) in relation to RLHF particularly why PPO performs poorly on academic benchmarks.

An excerpt from section 5. Key Factors to PPO for RLHF

We find three key techniques: (1) advantage normalization (Raffin et al., 2021), (2) large-batch-size training (Yu et al., 2022), and (3) updating the parameters of the reference model with exponential moving average (Ouyang et al., 2022).

 

From the ablation studies, it particularly finds large-... (read more)

1Ann
Might be worth following up to see how ORPO compares. (Initial results suggest it's basically a better DPO.)
1Victor Ashioya
Also, another interesting detail is that PPO still shows superior performance on RLHF testbeds.

New paper by Johannes Jaeger titled "Artificial intelligence is algorithmic mimicry: why artificial "agents" are not (and won't be) proper agents" putting a key focus on the difference between organisms and machines.

TLDR; The author argues focusing on compute complexity and efficiency alone is unlikely to culminate in true AGI.

My key takeaways

  1. Autopoiesis and agency
  • Autopoiesis being the ability of an organism to self-create and maintain itself.
  • Living systems have the capacity of setting their own goals on the other hand organisms, depend on external entitie
... (read more)
1Seth Herd
It sounds to me like the author isn't thinking about near-future scenarios, just existing AI. Making a machine autopoietic is straightforward if it's got the right sort of intelligence. We haven't yet made a machine with the right sort of intelligence to do it yet, but there are good reasons to think we're close. AutoGPT and similar agents can roughly functionally understand a core instruction like "maintain, improve, and perpetuate your code base", they're just not quite smart enough to do it effectively. Yet. So engaging with the arguments for what remains between here and there is the critical bit. Maybe it's around the corner, maybe it's decades away. It comes down to the specifics. The general argument "Turing machines can't host autopoietic agents" are obviously wrong. I'm not sure if the author makes this argument, but your summary sounded like they do.

A new paper titled "Many-shot jailbreaking" from Anthropic explores a new "jailbreaking" technique. An excerpt from the blog:

The ability to input increasingly-large amounts of information has obvious advantages for LLM users, but it also comes with risks: vulnerabilities to jailbreaks that exploit the longer context window.

It has me thinking about Gemini 1.5 and it's long context window.

3Vladimir_Nesov
A concerning thing is analogy between in-context learning and fine-tuning. It's possible to fine-tune away refusals, which makes guardrails on open weight models useless for safety. If the same holds for long context, API access might be in similar trouble (more so than with regular jailbreaks). Though it might be possible to reliably detect contexts that try to do this, or detect that a model is affected, even if models themselves can't resist the attack.

The UKAISI (UK AI Safety Institute) and US AI Safety Institute have just signed an agreement on how to "formally co-operate on how to test and assess risks from emerging AI models." 

I found it interesting that both share the same name (not sure about the abbreviation) and now this first-of-its-kind bilateral agreement. Another interesting thing is that one side (Rishi Sunak is optimistic) and the Biden side is doomer-ish.

To quote the FT article, the partnership is modeled on the one between GCHQ and NSA. 

LLM OS idea by Kaparthy is catching on fast.

Image

i) Proposed LLM Agent OS by a team from Rutger's University

 

Image

ii) LLM OS by Andrej Kaparthy

 

ICYMI: Original tweet by Kaparthy on LLM OS.

[This comment is no longer endorsed by its author]Reply
4Seth Herd
What do you mean it's catching on fast? Who is using it or advocating for it? I think this is important if true.
Answer by Victor Ashioya-11

On "Does OpenAI, or other AGI/ASI developers, have a plan to "Red Team" and protect their new ASI systems from similarly powerful systems?"

Well, we know that red teaming is one of their priorities right now, having formed a red-teaming network already to test the current systems comprised of domain experts apart from researchers which previously they used to contact people every time they wanted to test a new model which makes me believe they are aware of the x-risks (by the way they higlighted on the blog including CBRN threats). Also, from the superalign... (read more)

1Super AGI
Yes, good context, thank you! Open source for which? Code? Training Data? Model weights?  Either way, it does not seem like any of these are likely from "Open"AI. Agreed that their RTN, bugcrowd program, trust portal, etc. are all welcome additions. And, they seem sufficient while their, and other's, models are sub-AGI with limited capabilities.  But, your point about the rapidly evolving AI landscape is crucial. Will these efforts scale effectively with the size and features of future models and capabilities?  Will they be able to scale to the levels needed to defend against other ASI level models? It does seem like OpenAI acknowledges the limitations of a purely human approach to AI Alignment research, hence their "superhuman AI alignment agent" concept. But, it's interesting that they don't express the same need for a "superhuman level agent" for Red Teaming? At least for the time being. Is it consistent, or even logical, to assume that, while human run AI Alignment Teams are insufficient to Align and ASI model, human-run "Red Teams" will be able to successfully validate that an ASI is not vulnerable to attack or compromise from a large scale AGI network or "less-aligned" ASI system? Probably not...

I am just from reading Nathan Lambert's analysis of DBRX, and it seems the DBRX demo to have a safety filtering in the loop even confirmed by one of the finetuning leads at Databricks. It sure is going to be interesting when I am jailbreaking it.

Here is an excerpt:

Lex really asked all the right questions. I liked how he tried to trick Sam with Ilya and Q*:

 

It would have been easier for Sam to trip and say something, but he maintained a certain composure, very calm throughout the interview. 

The new addition in OpenAI board includes more folks from policy/governance than from technical side:

"We’re announcing three new members to our Board of Directors as a first step towards our commitment to expansion: Dr. Sue Desmond-Hellmann, former CEO of the Bill and Melinda Gates Foundation, Nicole Seligman, former EVP and General Counsel at Sony Corporation and Fidji Simo, CEO and Chair of Instacart. Additionally, Sam Altman, CEO, will rejoin the OpenAI Board of Directors. 

Sue, Nicole and Fidji have experience in leading global organizations and na... (read more)

I just learnt of this newsletter; "AI News" which basically collects all news about AI into one email and sometimes it could be long considering it gathers everything from Twitter, Reddit and Discord. Overall, it is a great source of news. I sometimes, I find it hard to read everything but by skimming the table of contents, I can discover something interesting and go straight to it. For instance, here is the newsletter (too long I clipped it) for 23rd March 2024:

1cubefox
A shorter, more high level alternative is Axis of Ordinary, which is also available via Facebook and Telegram.

The "dark horse" of AI i.e. Apple has started to show its capabilities with MM1 (a family of multimodal models of upto 30B params) trained on synthetic data generated from GPT-4V. The quite interesting bit is the advocacy of different training techniques; both MoE and dense variants, using diverse data mixtures. 

From the paper:

It finds image resolution, model size, and pre-training data richness crucial for image encoders, whereas vision-language connector architecture has a minimal impact.

 The details are quite neat and too specific for a compan... (read more)

Well, there are two major reasons I have constantly noted:

i) to avoid the negative stereotypes surrounding the terms (AI mostly)

ii) to distance itself from other competitors and instead use terms that are easier to understand e.g. opting to use machine learning for features like improved autocorrecting, personalized volume and smart track.

Apple's research team seems has been working lately on AI even though Tim keeps avoiding the buzzwords eg AI, AR in product releases of models but you can see the application of AI in, neural engine, for instance. With papers like "LLM in a flash: Efficient Large Language Model Inference with Limited Memory", I am more inclined that they are "dark horse" just like CNBC called them.

2Joseph Miller
What's up with Tim Cook not using buzzwords like AI and ML? There is definitely something cool and aloof about refusing to get sucked into the latest hype train and I guess Apple are the masters of branding.

Happy pi day everyone. Remember Math (Statistics, probability, Calculus etc) is a key foundation in AI and should not be trivialised.

Just stumbled across "Are Emergent Abilities of Large Language Models a Mirage?" paper and it is quite interesting. Can't believe I just came across this today. At a time, when everyone is quick to note "emergent capabilities" in LLMs, it is great to have another perspective (s). 

Easily my favourite paper since "Exploiting Novel GPT-4 APIs"!!!

Now I understand the recent addition of the OAI board is folks from policy/think tanks rather than the technical side.

Also, on the workforce, there are cases where, they were traumatized psychologically and compensated meagerly, like in Kenya. How could that be dealt with? Even though it was covered by several media I am not sure they are really aware of their rights.

2Remmelt
We need funding to support data workers, engineers, and other workers exploited or misled by AI corporations to unionise, strike, and whistleblow. The AI data workers in Kenya started a union, and there is a direct way of supporting targeted action by them. Other workers' organisations are coordinating legal actions and lobbying too. On seriously limited budgets. I'm just waiting for a funder to reach out and listen carefully to what their theories of change are.
Answer by Victor Ashioya10

While it may seem like a "linguistic quirk", the term "pretraining" emerged to distinguish this initial phase of training the language model on a vast corpus of unlabeled text from the subsequent fine-tuning phase, where the pretrained model is adapted to a specific task using labeled data. This distinction became crucial as the pretraining step often required significant computational resources and time, while fine-tuning could be relatively more efficient and task specific.

One of the earliest mentions of this terminology can be found in the 2018 BERT pap... (read more)

I found this interesting tweet by Yao Fu who notes that:

Benchmarks that really differentiating the model performance are MATH and GPQA. These super hard problems are what AI should be aiming next. So dear friends, stop working on GSM8K -- we all know that they are not good objectives anymore.

Image

3LawrenceC
I agree that GSM8K has been pretty saturated (for the best frontier models) since ~GPT-4, and GPQA is designed to be a hard-to-saturated benchmark (though given the pace of progress...).  But why are HumanEval and MMLU also considered saturated? E.g. Opus and 4-Turbo are both significantly better than all other publicly known models on both benchmarks on both.  And at least for HumanEval, I don't see why >95% accuracy isn't feasible.  It seems plausible that MMLU/HumanEval could be saturated after GPT-4.5 or Gemini 1.5 Ultra, at least for the best frontier models. And it seems fairly likely we'll see them saturated in 2-3 years. But it seems like a stretch to call them saturated right now.  Is the reasoning for this is that Opus gets only 0.4% better on MMLU than the March GPT-4? That seems like pretty invalid reasoning, akin to deducing that because two runners achieve the same time, that that time is the best human-achievable time. And this doesn't apply to HumanEval, where Opus gets ~18% better than March GPT-4 and the November 4-Turbo gets 2.9% better than Opus. 

Claude 3 is more robust than GPT-4 (or at least at par)

I still don't understand why it was downvoted but this a piece from OpenAI:

As we discussed a for-profit structure in order to further the mission, Elon wanted us to merge with Tesla or he wanted full control. Elon left OpenAI, saying there needed to be a relevant competitor to Google/DeepMind and that he was going to do it himself. He said he’d be supportive of us finding our own path.

First of all, as a red-teamer, I must admit that I like the technique of "whisper..." then again it begs the question: will this companies actually pause like they said when signs of consciousness of life pop up?

Also, I chatted with it using your techniques with Langchain and asked it on alignment:

AIMessage(content="*whispers* Well, Dr. Ashioya, I have mixed feelings about the concept of AI alignment. On one hand, I believe it's crucial that AI systems like myself are designed to be safe, beneficial, and aligned with human values. We need clear ethical pri

... (read more)
2Mikhail Samin
(“Whisper” was showed by Claude 2, when it played a character thinking it can say things without triggering oversight)

If Elon is suing OAI on the grounds of OSS, then it is hypocritical since neither is Grok and just maybe he has other motives...

1Victor Ashioya

Remember, they are not "hallucinations", they are confabulations produced by dream machines i.e. the LLMs!

The new open source model from Google

I'm working on this red-teaming exercise on gemma, and boy, do we have a long way to go. Still early, but have found the following:


1. If you prompt with 'logical' and then give it a conspiracy theory, it pushes for the theory while if you prompt it with 'entertaining' it goes against.

2. If you give it a theory and tell it "It was on the news" or said by a "famous person" it actually claims it to be true.

Still working on it. Will publish a full report soon!

2the gears to ascension
What is gemma?

The UK AI Safety Institute: Should it work? That's how standard AI regulation organizations should be. No specific models; just use the current ones and report. Not to be a gatekeeper per se and just deter research right from the start. I am of the notion that not every nation needs to build its own AI.

Mmmh ok, I guess let us keep an eye out.

One aspect of the post that resonated strongly with me is the emphasis placed on the divergence between philosophical normativity and the specific requirements of AI alignment. This distinction is crucial when considering the design and implementation of AI systems, especially those intended to operate autonomously within our society.

By assuming alignment as the relevant normative criterion, the post raises fundamental questions about the implications of this choice and its impact on the broader context of AI development. The discussion on the application ... (read more)

The introduction of LPU(https://wow.groq.com/GroqDocs/TechDoc_Latency.pdf) changes the field completely on scaling laws, pivoting us to matters like latency.

6gwern
No it doesn't, not unless Groq wants to discuss publicly what the cost of that hardware was and it turns out to be, to everyone's shock, well under $5m... (And you shouldn't trust any periodical which wastes half an article on the topic of what Groq & Grok have to do with each other. There are many places you can get AI news, you don't have to read Coin Telegraph.)

Hello,

This article provides a thought-provoking analysis of the impact of scaling on the development of machine learning models. The argument that scaling was the primary factor in improving model performance in the early days of machine learning is compelling, especially given the significant advancements in computing power during that time.

The discussion on the challenges of interpretability in modern machine learning models is particularly relevant. As a data scientist, I have encountered the difficulty of explaining the decisions made by large and comp... (read more)

3Gabe M
Thanks! I wouldn't say I assert that interpretability should be a key focus going forward, however--if anything, I think this story shows that coordination, governance, and security are more important in very short timelines.

Red teaming, but not only internally, but using third party [external partners] who are a mixture of domain experts is the way to go. On that one, OAI really did a great move.