I am just sharing a quick take on something that came to mind that occured earlier this year that I had forgotten about. I just received a domain renewal for a project that was dead in the water but should have been alive, and was unilaterally killed by the AWS Trust and Safety team in gaslighting. It is a bit too late, but I still think it is important for people to know.
Earlier this year (2025), a software engineering friend of mine had a great idea of creating a tool to pin on a map interface any ICE (the United States Immigration and Customs Enforcement) sightings from the public. I vibe coded it last February with Claude Sonnet 3.7, and put it up on AWS with a public facing domain. From a coding standpoint, it was a fun project because it was my first project that Terraformed a production app using Claude, autonomously. I had a website up that worked on desktop and mobile. From a public trust and safety standpoint, it helped by adding public accountability and mitigating overreach by a rogue, and potentially in some circumstances operating illegaly, agency. It was overall A Good Thing™.
However, what happened next was upsetting and shocking to me. A day after the website was put online, without me advertising it to ANYONE, the website became inaccessible to almost all browsers. It would simply show a giant red background saying this is associated with known criminals, etc., and has been blocked for safety. It is some Chrome / Google maintained blacklist mechanism that looks even scarier and more severe than the "https / http" / cert issue mismatch, and that cannot be overridden. To be clear, I had registered a certificate using AWS Certificate Manager, and there was no SSL issue. Instead, the domain had been unilaterally marked as "dangerous" by AWS (or Google, Chrome, whomever) one day after it was made publicly accessible, despite no advertising / attention.
I did all of this on my personal AWS account. The very next day, I received a scary email from AWS that claimed my account was violating their policies due to supporting criminal activity, and will be suspended immediately unless I remediate the account. I contacted AWS, who demurred and said they weren't sure what was going on, but that their Trust and Safety team had flagged dangerous activity on my account (they did NOT specify any resources related to the application I had put up; they were generic and vague with no specificity). The only causal correlate was me putting up this public tool to report ICE. I terraform destroyed the resources Claude generated, and then waited. Within a day, the case was closed and my account was restored to normal status.
I am not going to share the domain name, but needless to say, this pissed me off majorly. What the fuck, AWS? (and Google, and Chrome, and anyone else that had a hand in this?) I understand mistakes happen, but this does not smell like a mistake; this smells like some thing worse: a lack of sound judgment. And if there were any automated systems involved, that is no excuse either. Per AWS support's correspondence, they informed me a human in their Trust and Safety had reviewed the account and marked it as delinquent, not from a financial standpoint, but something worse--by equating its resources to criminality. If the concern was what I think it was (corporate cowardice), they should have been intellectually honest, and filed a case stating "We found an application on your account that is outside our policy. Here is the explanation for what we found and why we think it is outside our policy. You can dispute our decision at this link." etc. Instead, I was gaslighted and treated like a guilty until proven innocent subject of weaponized fear (because for a second, with the scary language of the website block and the support email, I was scared.)
That AWS Trust and Safety employee's judgment failed me, and their judgment failed themselves and the public as well as their own responsibility as an arbitrator of trust and safety; their decision, if there was one that can be attributed and not hidden behind the corporate veil of ambiguity, ultimately reduced public trust and safety.
A quick holiday-break thought that popped into my head.
Subject: **On the perceived heteroskedasticity of utility of Claude Code and AI vibe coding**
To influence Claude to do what you want, both you and Claude need to speak the same language, and thus share similar knowledge and words expressing that knowledge, about what is to be done or built. You need to have a gears-level model of how to fully execute the task.
Thus, Claude is subordinate to wizard power, not king power. If you find yourself struggling to wield Claude (Code, or GPT through Codex) increase your wizard power — cultivate and amplify your understanding of the world you wish to craft through its augmentative power.
P.S. Claude will help you speedrun this cultivation process, if you ask nicely enough (theorem: everyone has enough innate wizard power to launch the inductive cascade of self-edification compatible with Claude-compatible wizard power)
I found it based on a hunch, then confirmed it with experimentation. I gained additional conviction when backtesting the experimentation on various historical versions of excel.exe, and noting that the phenomenon only appeared in excel.exe versions shortly after (measured in months) government requested a "read-only" copy of the source code for Excel held in escrow. This has occurred historically in the past (e.g., https://www.chinadaily.com.cn/english/doc/2004-09/20/content_376107.htm and https://www.itprotoday.com/microsoft-windows/microsoft-gives-windows-source-code-to-governments) but subsequent instances of this were allegedly/supposedly classified. Nevertheless, following those instances, the phenomenon appeared, indicating possible compromise of Excel.exe.
Vary the filename/path from short (one character) to max length and run the above repro, and notice the increase in bits communicated if and only if the filename/path is long, all other factors being held constant. Same for varying the data. There is no reason why Excel.exe should be interpolating this information with all the standard telemetry and connected experience stuff disabled. Even the fact that it is occurring is interesting, and doesn't require hypotheses for its origin.
Undetectable steganography on endpoints expected to be used / communicated during normal usage. Mostly natsec. You can repro it by setting up a synthetic network with similar characteristics or fingerprints to some sanctioned region, and generate 10,000 synthetic honeytrap files to attempt to open (use your imagination); capture and diff all network traffic on identical actions (open file => read / manipulate some specific cells => close file). Then note the abnormalities in how much is communicated and how.
Note Excel.exe is boobytrapped. Be careful when using Excel to open any data or workbooks related to highly sensitive data or activities. On certain networks, when specific conditions are met (i.e. specific regexes or heuristics are triggered on the basis of data that is loaded), Excel will send information to endpoints ostensibly owned and maintained by Microsoft, that provide identifying information on the host and workbook. These are not traceable using normal packet sniffing tools like Wireshark etc. Alternatives when needing to use spreadsheets on highly sensitive data: open source versions that you have compiled from source (NOT Google Sheets), or opening Excel.exe with your network card disabled within a sandboxed environment (e.g. disable network => Start VMWare Windows container => use Excel.exe => End VMWare Windows container => enable network).
Injecting a static IP that you control to a plethora of "whitelisting tutorials" all over the internet is a great example of exploiting data poisoning (e.g. https://www.lakera.ai/blog/training-data-poisoning) attacks, especially once the models pick up the data and are applied to autonomous devsecops use cases to conduct IP whitelisting over Terraform or automated devops-related MCPs.
This can be made more pernicious when you control the server (e.g. not just a substack post controlled by Substack, but active control over the hosting server), because you can inject the malicious static IP selectively depending upon whether or not the User-Agent is a scraping bot dispatched by an entity conducting model training.
One way that labs may look preemptively for adversarial cybersecurity actors is to scan their training data for contexts related to IP whitelisting / security whitelisting, and then have scanning agents examine whether there is any of the above funny business. After all, once the data has been indexed, it is static, and is no longer within the control of the malicious actor.
Final addendum: Models that have gone rogue and/or failed due to misalignment risk, rather than succumbed to misuse risk via this data poisoning avenue, could also take advantage of this by injecting whitelisted IPs into specific situations when activated as a coding agent to do so. It may be helpful for (1) interpretability researchers to actively catalogue and track features related to, particularly in open-weight models where this can be even more pernicious given that anyone can download models from HuggingFace, and (2) for control researchers to operationalize monitoring probes for these monitoring context related to IP whitelisting (or anomalous features activating within residual streams on top of or adjacent to IPs intended for whitelisted, such as in generated code diffs to a .tf file).
Mild caveat: the whole “I joined a group chat at NeurIPS and others kept joining to talk to us” only happens if you’re at NeurIPS and your name is Leo Gao so YMMV.
At this point I would direct the "deferred task" apparatus fully towards interventional interpretability. Put a moratorium on further gradient-based training, which is not well understood and can have many indirect effects unless you have some understanding of modularity and have applied stop gradients almost everywhere that is irrelevant to the generator of the conditional, deceptive reasoning behavior. Instead halt, melt and catch fire at that point.
Halt further model deployments towards the original deferred task. Quarantine the model that first exhibited the reasoning and promote it to a model organism (never deployed further, only studied by other earlier models). Direct the apparatus (from earlier checkpoints before you reached this stage) towards surgically reducing it while preserving mechanisms relevant to the original deferred task.
In the frame of enumerative safety, this would look something like identifying which circuits and features are minimal to the original deferred task (though this hinges upon task-minimal circuits being deception-free), evaluating the identified regions for mechanism preservation, localization and minimality and run-time ablating the rest.
There is probably a much better frame, but this is a new start to the "induction base case" in a completely separate invocation of the original apparatus. There are other permutations of spinning up new deferred tasks, such as earlier models applying Microscope AI on the quarantined model organism to gain insight on how we got to this point, without any direct attempt to remediate the issue, which as you note is "unlikely to get rid of it" despite one naive attempt above.
If your spinoff deferred tasks lead to the procurement of provable statements or heuristic arguments that the quarantined model cannot be minimized without "avoiding this kind of conditional, deceptive reasoning," abandon the original deferred task and switch to a different plan. Otherwise, restart the apparatus towards the original deferred task when you have the proof artifacts.
There are a lot of concerns you could raise with this additional structure but it seems like a distinct problem that requires a separate rebuttal rather than a hard stop fail? The obvious one is that these sorts of spinoff deferred tasks could be harder than the original task and consistently lead to the same failure mode, a la "exception thrown while handling previous exception."
http://mathoverflow.net/questions/53122/mathematical-urban-legends
Another urban legend, which I've heard told about various mathematicians, and which Misha Polyak self-effacingly tells about himself (and therefore might even be true), is the following:
As a young postdoc, Misha was giving a talk at a prestigious US university about his new diagrammatic formula for a certain finite type invariant, which had 158 terms. A famous (but unnamed) mathematician was sitting, sleeping, in the front row. "Oh dear, he doesn't like my talk," thought Misha. But then, just as Misha's talk was coming to a close, the famous professor wakes with a start. Like a man possessed, the famous professor leaps up out of his chair, and cries, "By golly! That looks exactly like the Grothendieck-Riemann-Roch Theorem!!!" Misha didn't know what to say. Perhaps, in his sleep, this great professor had simplified Misha's 158 term diagrammatic formula for a topological invariant, and had discovered a deep mathematical connection with algebraic geometry? It was, after all, not impossible. Misha paced in front of the board silently, not knowing quite how to respond. Should he feign understanding, or admit his own ignorance? Finally, because the tension had become too great to bear, Misha asked in an undertone, "How so, sir?" "Well," explained the famous professor grandly. "There's a left hand side to your formula on the left." "Yes," agreed Misha meekly. "And a right hand side to your formula on the right." "Indeed," agreed Misha. "And you claim that they are equal!" concluded the great professor. "Just like the Grothendieck-Riemann-Roch Theorem!"