I was thinking of trying out the Sustained Attention to Response Task (SART) with response feedback (SART 2). I'm not sure how it compares to dual n-back (see the Dual n-Back FAQ on Gwern.net).
I’m in Canada and have been AI x-risk pilled for years but I really don’t have bandwidth to participate in this. Hope something good comes out of it.
I don't follow the reasoning behind excluding coding help. Does the paper elaborate on why disempowerment in that area is considered benign?
See Sam Altman on giving full access to Codex.
Unless you're on the macOS security team at Apple, I would encourage you to consider removing that part of the prompt. I think that lying (even to models) has negative first-order and second-order consequences.
Claude Code has been reported to act on phantom user messages without any user input. Source.
Hi,
Thank you for writing this report and for sharing your data. I have a non-24 hour sleep schedule as well so I’m expecting this to be a helpful resource for me. I’ll be sure to discuss it with my doctor. It’s great information to have.
I won’t be switching over to light glasses or red bulbs but this did get me to adjust the temperature setting of the main bulb in my room to a warmer default. So you have changed one small part of the world in one small way as a direct result of putting this post up.
Best Wishes
I’ve had about 3,000 sessions across Claude Code and Codex, and wanted to write up about eight of the more interesting stories from that experience, but sadly I’m probably not going to prioritize that anytime soon.
Rewind is a tool for scrolling back in time. It automatically records screen and audio data. I leave it running in the background, in spite of this incurring some performance overhead. I have collected over 200GB over the past year.
Limitless.ai was acquired by Meta and will shut down the product on December 19th. I will back up my files, but I do not know if it is possible to roll back the update which disables recording. I am not aware of any recommended alternative which is actively maintained and was unable to discover this with a quick search. I would appreciate suggestions.
State tracking could be the next reasoning-tier breakthrough in frontier model capabilities, and I think there is strong evidence for this.
State space models already power the fastest available voice models, such as Cartesia's Sonic (time-to-first-audio advertised as under 40ms). There are examples of SSMs such as Mamba, RWKV, and Titans outperforming transformers in research settings.
Flagship LLMs are also bad at state tracking, even with RL for summarization. Prepending an explicit schema to the top of every message is one of the less elegant workarounds. Tracker is the second most popular extension for SillyTavern, as measured by the number of upvotes or...
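As a hypothetical sketch of that schema workaround (the field names and format here are my own invention, not Tracker's actual output):

```python
import json

# Hypothetical sketch of the "explicit schema" workaround: serialize a
# small state block and prepend it to every user message, so the model
# re-reads the current state instead of tracking it implicitly.
state = {
    "location": "tavern",
    "inventory": ["rope", "lantern"],
    "time_of_day": "night",
}

def with_state(user_message: str) -> str:
    # Prepend the serialized state inside explicit delimiters.
    header = "[STATE]\n" + json.dumps(state, indent=2) + "\n[/STATE]\n"
    return header + user_message

prompt = with_state("I light the lantern and head outside.")
print(prompt.splitlines()[0])  # [STATE]
```

The point is only that the model never has to remember state across turns; every turn re-supplies it in full.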
One percent of the world’s AI compute (LLM-grade GPU capacity) is in the UAE, which does not have an AI Security Institute. I plan to spend 6-9% of my bandwidth this month (2-3 days during May 2025) on encouraging the UAE to establish an AISI. Today is the first day.
However, in my view, even the most optimistic estimate of the impact of successfully executing that plan is no more than a 2% shift in the probability of the UAE starting an AI Security Institute before 2026. And even if a UAE AISI existed, it would not be allocated more than 1% to 5% (mode 2%) of the overall national AI...
Thread: Research Chat with Canadian AI Safety Institute Leadership
I’m scheduled to meet https://cifar.ca/bios/elissa-strome/ from Canada’s AISI for 30 mins on Jan 14 at the CIFAR office in MaRS.
My plan is to share alignment/interp research I’m excited about, then mention upcoming AI safety orgs and fellowships which may be good to invest in or collaborate with.
So far, I’ve asked for feedback and advice in a few Slack channels. I thought it may be valuable to get public comments or questions from people here as well.
Previously, Canada invested $240m into a capabilities startup: https://www.canada.ca/en/department-finance/news/2024/12/deputy-prime-minister-announces-240-million-for-cohere-to-scale-up-ai-compute-capacity.html. If your org has some presence in Toronto or Montreal, I’d love to have permission to give it a shoutout!
Elissa is the lady on the left in the second image from this article: https://cifar.ca/cifarnews/2024/12/12/nicolas-papernot-and-catherine-regis-appointed-co-directors-of-the-caisi-research-program-at-cifar/.
My input is of negligible weight, so I wish to coordinate messaging with others.
If k is even, then k^x is even for any positive integer x, because k = 2n for some integer n, and (2n)^x = 2^x · n^x is divisible by 2. But do LLMs know this trick? Here are results from running (a slightly modified version of) https://github.com/rhettlunn/is-odd-ai. The model is gpt-3.5-turbo and the temperature is 0.7.
Is 50000000 odd? false
Is 2500000000000000 odd? false
Is 6.25e+30 odd? false
Is 3.9062500000000007e+61 odd? false
Is 1.5258789062500004e+123 odd? false
Is 2.3283064365386975e+246 odd? true
Is Infinity odd? true
If a model isn't allowed to run code, I think mechanistically it might have a circuit to convert the number into a bit string and then check the last bit to do the parity check.
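A minimal sketch of that last-bit check, and of why float representation makes the larger examples above unanswerable (pure Python, no dependencies):

```python
def is_odd(n: int) -> bool:
    # An integer is odd iff its lowest binary bit is 1.
    return n & 1 == 1

print(is_odd(50000000))          # False
print(is_odd(2500000000000000))  # False

# Past 2**53, float64 cannot represent every integer, so the parity
# of a value like 6.25e+30 is already lost before any check can run:
x = 6.25e30
print(x + 1 == x)  # True: adjacent integers collapse to the same float
```

So for the e+30 and larger inputs, "odd or even" is not even well-defined at the representation level; the model's answers there can only be guesses.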
The dimensionality of the residual stream is the sequence length (in tokens) * the embedding dimension of...
One thing I like to do on a new LLM release is the "tea" test, where you just say "tea" over and over again and see how the model responds.
ChatGPT-4 will ask you to clarify and then shorten its response each round, converging to: "Tea types: white, green, oolong, black, pu-erh, yellow. Source: Camellia sinensis."
Claude 3 Opus instead tells you interesting facts about tea and mental health, production process, examples in literature and popular culture, etiquette around the world, innovation and trends in art and design.
GOODY-2 will talk about uncomfortable tea party conversations, excluding individuals who prefer coffee or do not consume tea, historical injustices, societal pressure to conform to tea-drinking norms.
Gemma-7b...
Smooth Parallax - Pixel Renderer Devlog #2 is interesting. I wonder if a parallax effect would be useful for visualizing activations in hidden layers with the logit lens.
The main thing we care about is consistency and honesty. To maximize that, we need to retrieve information from the web, though this has risks (https://openai.com/research/webgpt#fn-4); select the best of multiple summary candidates (https://arxiv.org/pdf/2208.14271.pdf); generate critiques (https://arxiv.org/abs/2206.05802); run automated tests (https://arxiv.org/abs/2207.10397); validate logic (https://arxiv.org/abs/2212.03827); follow rules (https://www.pnas.org/doi/10.1073/pnas.2106028118); use interpretable abstractions (https://arxiv.org/abs/2110.01839); avoid taking shortcuts (https://arxiv.org/pdf/2210.10749.pdf); and apply decoding constraints (https://arxiv.org/pdf/2209.07800.pdf).
Actions speak louder than words. Microsoft's take on Adept.ai's ACT-1 (Office Copilot) is more likely to destroy the world than their take on ChatGPT (new Bing).
I'm visiting the area tomorrow, so I thought I'd schedule a last-minute event in case anyone wants to meet up and talk about large language models.
I think that GDM's safety team is 40+ people now.