All alignment of current models is based only on fine tuning. If you give everyone the weights, they can do their own fine tuning to reverse your fine tuning, resulting in a fully unsafe anything-goes model. Such versions were indeed put online within days of Llama 2’s release.
Llama 2 comes in both untuned and fine-tuned variants (for example, llama-2-7b and llama-2-7b-chat). So there is no need to reverse any fine-tuning of llama-2-7b-chat; you simply fine-tune the untuned base model (llama-2-7b), whose weights have also been released.
This of course makes the decision not to release (or to delay) llama-2-34b even more puzzling, since the stated justification ("due to a lack of time to sufficiently red team") only applies to llama-2-34b-chat, not to the base model.
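To make the point above concrete, here is a minimal sketch of what tuning the base model looks like, assuming the Hugging Face transformers/peft/trl stack; the dataset path is a placeholder and exact argument names vary across library versions:

```python
# Illustrative only: LoRA fine-tuning starts from the released base weights
# (llama-2-7b), not from the already-chat-tuned variant.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

base = "meta-llama/Llama-2-7b-hf"  # untuned base model, not -chat
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Placeholder dataset: any instruction-style corpus with a "text" column.
dataset = load_dataset("json", data_files="my_instructions.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]),
)
trainer.train()
```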
ChatGPT cites no sources. It’s a one-way relationship where OpenAI takes what is published on the internet, and then doesn’t give a single reference or link back.
You need to help it a little. Write the reference yourself ("the best article about XY is at webpage Z") in 1px white letters on a white background at the bottom of your page. When ChatGPT learns from your page, it will learn this, too.
In addition to all the written developments, this was a banner week for podcasts.
I would highlight four to consider listening to.
Capabilities continue to advance at a more modest pace, so I continue to have room to breathe, which I intend to enjoy while it lasts.
Table of Contents
Language Models Offer Mundane Utility
Control HVAC systems with results comparable to industry-standard control systems.
Ethan Mollick offers praise for boring AI that helps us do boring things.
He then describes some ways he has automated or streamlined parts of his workflow.
Run the numbers.
Following up from last week, Nostalgebraist analyzes the ‘a a a’ phenomenon and concludes that ChatGPT is not leaking other users’ queries; instead it is imitating chat tuning data. Still fascinating and somewhat worrying, but not a privacy concern. This thread has further discussion. The evidence seems to push strongly towards hallucinations.
Here’s another fun game.
For practical purposes, a GPU Recommendation Chart.
From a while back, Ethan Mollick’s starting user guide to Code Interpreter.
Write a journal paper, although it can’t do some portions of the job ‘as a large language model.’ The search continues.
Language Models Don’t Offer Mundane Utility
GPT-4 Can’t Reason, says new paper.
The paper takes a ‘pics or it didn’t happen’ approach to reasoning ability.
Also, no, you don’t get a pass either, fellow human. Rule thinking out, not in.
What else would mean an inability to reason? Let us count the ways.
I mean, it helps. But no, the inability to get 1405 * 1421 (the paper’s example) correct does not mean an inability to reason. Then we have the claim that planning requires reasoning, so delegating planning can’t enable reasoning. While in the extreme this is true, in practice this seems very much like a case of the person saying it can’t be done interrupting the person doing it.
Next up is ‘simple counting.’
The author then gets GPT-4 to count wrong and offers this as proof of an inability to reason. This strikes me as quite silly; has this person never heard of the absent-minded professor? A lot of mathematicians can’t count. Presumably they can reason.
If you keep reading, it all stays fun for at least a bit longer. If we trust that these queries were all zero-shot and there was no filtering, then yes the overall performance here is not so good, and GPT-4 is constantly making mistakes.
That does not mean GPT-4 couldn’t do such reasoning. It only shows that it doesn’t do so reliably or by default.
It reasons quite badly and highly unreliably when taken outside its training data and not asked to do chain-of-thought, even on reasonable questions. I would not, however, call this a blanket inability to reason. Nor would I treat it as a reason to despair that a future GPT-5 could not do much better. Instead, it is saying that, in its default state, you are not likely to get good reasoning out of the system.
An even stranger claim.
My experience is that those devices have a highly limited set of Exact Words that you can say to get things to happen. Many easy requests that I am confident they hear constantly will fail, including both things that you cannot do at all and things that require One Weird Trick. If this is ‘as robust as possible’ then I notice I continue to be confused.
Whereas with LLMs, if you fail, you have a lot of hope that you can get what you want with better wording (prompt engineering), and you can do a vastly larger variety of things on the first try as well. It is weird that ‘has extra capabilities you can unlock’ is now considered a bad thing, and we absolutely should differentiate strongly between ‘LLM does this by default when asked’ and ‘LLM can do this with bespoke instructions.’
GPT-4 Real This Time
Help is on the way, although in other areas. What’s the latest batch of updates?
Please, no, at least let us turn this off, I always hate these so much.
These can be net useful if they are good enough. The autoreplies to emails and texts are usually three variations of the same most common reply, which is not as useful as distinct replies would be, but still sometimes useful. Don’t give me ‘Yes,’ ‘Sure’ and ‘Great,’ how about ‘Yes,’ ‘No’ and ‘Maybe’? Somehow I am the weird one.
I’ve literally never gotten use out of Bing’s suggested replies so I assume this won’t be useful.
Yes, finally. Thank you.
Seems useful.
Always nice to see on the margin.
Here’s that complete list of the shortcuts. Not everything you want, but still a good start.
As they said, minor updates. Not all positive, but on net still a step forward.
Fun with Image Generation
Thread: MidJourney creates inappropriate children’s toys. Quality has gotten scary good.
Not as fun: Report on further troubles at Stability AI.
Deepfaketown and Botpocalypse Soon
Maybe not quite yet.
We do have this thread of threads by Jane Rosenzweig covering several fake things in the literature world: fake travel books, fake literary agencies, fake books under real authors’ names. That last one seems like it should not be that hard to prevent; why are random other people allowed to upload random books under an existing author’s name to places like Amazon and Goodreads? Can’t we check?
Report from Europol (Business Insider, The Byte) expects 90% of internet content to be AI-generated by 2026. This sounds much scarier than it is. Already most of the internet, by simple count of content, is auto-generated SEO-targeted worthless junk, in one form or another. What matters is not the percentage of pages, it is what comes up on search results or LLM queries or most importantly actually gets linked to and clicked on and consumed. AI filtering might or might not be involved heavily in that. Either way, having to filter out almost everything out there as crap is nothing new.
They Took Our Jobs
Wizards of the Coast found out that people very much do not take kindly to AI artwork. One of their longstanding artists used generative AI for one commission without telling Wizards; people spotted it and then got very, very angry.
Magic: The Gathering has also clarified it will not use AI artwork.
Lots of accusations that they were trying to ‘get away with’ this somehow, soft launching AI use to see what happens, hiding it, lying about doing it on purpose, and so forth. There were calls to blacklist the artist in question. Many people really, really hate the idea of AI artwork. Also many people really, really do not trust Wizards and assume everything they do is part of a sinister conspiracy.
Also lots of people asking how could they not have spotted this.
Here is the artwork in question:
I am not going to call that a masterpiece, but no, if you showed me that a year ago with no reason to expect AI artwork I would not have suspected it was made by AI. At all. This looks totally normal to me.
For now, it looks like artists and non-artists alike will be unable to use generative AI tools for game-related projects without risking a huge backlash. How much that applies to other uses is unclear; my guess is that gaming is a perfect storm for this issue, but that most other places that typically involve hiring artists are also not going to take kindly to it for a while.
From the actors’ strike, this seems like a typical juxtaposition, complaining about image rights and also late payment fees. It all gets lumped in together.
Meanwhile, the strong job market is supporting the actors by allowing them to get temporary jobs that aren’t acting. One must ask, however: Do actors ever stop acting?
Introducing
TranscribeGlass, providing conversation transcriptions in real time. Things like this need to be incorporated into the Apple Vision Pro if they want it to succeed; also, these glasses are going to be far cheaper.
A day after my posts, Rowan Cheung gives his summary of the insane week in AI (he has, to my recollection, described every week in this way). Last week’s summary showed how very non-insane things were that week; let’s keep that up for a bit.
Impact Markets, where you evaluate smaller scale AI safety projects, and then you yourself get evaluated and build a reputation for your evaluations, and perhaps eventually earn regrants to work with. The targeted problem is that it takes time to evaluate small projects, which makes them hard to actually fund. I can see something like this being good if implemented well, but my big worry is that the impact measurement rewards picking legible short-term winners.
In Other AI News
UN is seeking nominees for a new high-level advisory board on artificial intelligence. You can nominate either yourself or someone else.
The Stanford Smallville experiment is now open source. Github is here.
A detailed report on the GPU shortage. OpenAI is GPU-limited, as are many other companies, including many startups. Nvidia has raised prices somewhat, but not enough to clear the market. They expect more supply later this year, but they won’t say how much, and it seems unlikely to satiate a demand that will continue to rise.
Paul Graham asks why there are so few H100s, and sees the answer in this article from Semianalysis, which says there are binding bottlenecks due to unanticipated demand that will take a while to resolve.
People ask why we don’t put our papers through peer review.
Nvidia partners with Hugging Face to create generative AI for enterprises.
Pope Francis issues generic warnings about AI, saying it needs to be used in ‘service to humanity’ and so on.
Don’t want OpenAI training on your website’s data?
On the one hand, I can see the desire to protect one’s data and not give it away, and the desire not to make it any easier on those training models.
On the other hand, versus other data, I think the world will be a better place if future AI models are trained on more of my data. I want more prediction of the kinds of things I would say, more reflection of the ways of thinking and values I am expressing. So at least for now, I am happy to let the robots crawl.
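For reference, the mechanism in question is OpenAI’s documented GPTBot crawler, which respects robots.txt; a minimal sketch of the opt-out, placed in the robots.txt file at your site’s root:

```
# Block OpenAI's web crawler from the whole site
User-agent: GPTBot
Disallow: /
```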
Anthropic paper asks, how can we trace abilities of models to the data sources that enabled them (paper)? A mix of worthwhile findings follow.
I’d highlight this one, because it involves a no-good, quite-terrible response.
This isn’t a ‘race to the bottom.’ This is a little something we in the business like to call ‘trade’ or ‘trade-off’ or a ‘cost-benefit analysis.’ The idea is that you have two or more goals, so you make efficient trades to better pursue all of them. The exact proposal here is rather silly, since you can be perfectly harmless without being actively unhelpful, but obviously if you can be very, very helpful and slightly harmful in some places, and then sacrifice a little help to remain more harmless in others, that is very good.
Instead, the AI is saying that trade-offs are not a thing. You should only reward accomplishing one task with no local compromises to the other, and trade-offs are terrible and should never be used. To which I would say: Misaligned. Also the kind of thing you get from badly designed constitutional AI (or RLHF) run amok.
It also seems odd to say that either of the passages above should lead to the listed conclusion of the model. It seems like the top one was mostly there to provide the phrase ‘race to the bottom’ in a context where if anything it means the opposite of the AI’s use here, which was quite backwards and nonsensical.
Serious wow at the first example listed here. I mean, yeah, of course, this is to be expected, but still, I suppose there really is no fire alarm…
Once again, the first link is interpretable, I can figure out what it means, and the second one is pretty confusing. Scaling interpretability is going to be hard.
This is fascinating, I am glad they did this, and if I have the time I plan to read the whole (very long) paper.
There Seems To Be a Standard Issue RLHF Morality
Nino Scherrer presents a paper on how different LLMs exhibit moral reasoning (paper).
That is quite the low-ambiguity example. There is a big contrast on those; it looks like ChatGPT and Claude basically always get these ‘right,’ whereas others not so much.
The upper corner is GPT-4, Claude and Bison, all of which highly correlate. Then there’s another group of Cohere, GPT-3.5 and Claude-Instant, which correlate highly with each other and modestly with the first one. Then a third group of everyone else, who are somewhat like the second group and each other, and very unlike the first one.
One can see this as three stages of morality, without knowing exactly what the principles of each one are. As capabilities increase, the AIs move from one set of principles to another, then to a third. They increasingly get simple situations right, but complex situations, which are what matters, work on different rules.
What would happen if we took Claude-3, GPT-4.5 and Gemini? Presumably they would form a fourth cluster, which would agree modestly with the first one and a little with the second one.
The real question then is, what happens at the limit? Where is this series going?
Quiet Speculations
Will Henshall argues in Time that AI Progress Is Unlikely to Slow Down.
The arguments are not new, but they are well presented, with good charts.
This chart needs a level for maximum possible performance, and for top human performance level. What is striking about these tasks is that we are mostly evaluating the AI’s ability to imitate human performance, rather than measuring it against humans in an underlying skill where we are not close to maxing out. If you put Chess and Go on this chart, you would see a very different type of curve. I would also question the grade here for reading comprehension, and the one for code generation, although we can disagree on the meaning of human.
Thus it does not address so much the central question of what happens in areas where human abilities can be far surpassed.
Palladium Magazine’s Ash Milton warns that You Can’t Trust the AI Hype. Trusting a broad class of hype is a category error in all cases, the question is more about levels.
Here’s a great quote and tactic.
Brilliant. We mostly know what AI can do in that spot. What we don’t know is what would be a valuable thing to do. Why not get the people judging that to tell us without having to first build the thing?
Post warns that many ‘AI’ products are exactly or largely this, humans masquerading as AI to make people think the tech works. Examples of this surely exist, both as pure frauds and as bootstraps to get the system working, but I am confident that the products we are all most excited about are not this. The argument presented is not that the AI tech is not generally real or valuable, but that valuations and expectations are bubble-level overblown. That all the grand talk of future abilities is mostly hype and people being crazy, that future AI systems are not dangerous or that exciting because current systems are not, and so on.
Robin Hanson quotes from the article and points out that by default more scientific know-how and access to it is good, actually?
With notably rare exceptions, giving people more scientific knowledge is exactly the kind of thing we want to be doing. There are a handful of particular bits of knowledge that enable building things like engineered pathogens and nuclear weapons, or building dangerous AI systems. We need to figure out a plan to deal with that, either hiding those particular bits of knowledge or finding ways to prevent their application in practice. That’s a real problem.
It still seems rather obvious that if your warning was ‘but people will generally have better access to scientific know-how’ then the correct response really needs to be ‘excellent.’
Washington Post also offers its warning of a potential AI bubble, citing the lack of a clear path to profitability and prior AI hype cycles. Standard stuff.
A16z makes the economic case for generative AI, saying that now is the time to start your AI business. This is very much a case of talking one’s book, so one must compare quality of argument to expected quality when deciding which direction to update. I was not especially impressed by the case here for why new AI companies would be able to construct moats and sustain profitable businesses, but I also wasn’t expecting an impressive case, so only a small negative update. The case for consumer surplus was made well, and remains robust.
Paper by Eddie Yang suggests that repressive regimes will, because of their acts of repression, lack the data necessary to train AIs to do their future repression for them. He says this cannot be easily fixed with more data, which makes sense. The issue is that, as the paper points out, the free world is busy creating AIs that can then be used as the censors, and also the data that can be used for the necessary training.
Michael Spencer of AI Supremacy argues that generative AI will by default usher in an epidemic of technological loneliness, that every advancement in similar technologies draws us away from people towards machines. The core of this theory could be described as a claim that technological offerings are the pale shadows on Plato’s cave, incapable of providing what we will have lost but winning the attention competition once they become increasingly advanced and customized.
I continue to expect far more positive outcomes, provided we stay away from existential or fully transformative threats that entirely scramble the playing field. People can and will use these tools positively and we will adapt to them over time. A more customized game is more compelling, but a lot of this is that it provides more of the things people actually need, including learning and practice that can be used in the real world. The pale shadows being mere pale shadows is a contingent fact, one that can be changed. Loneliness is a problem that for now tech is mostly bad at solving. I expect that to change.
There will doubtless be a period of adjustment.
I also expect a growing revolt against systems that are too Out to Get You, that customize themselves in hostile ways especially on the micro level. From minute to minute, this technique absolutely works, but then you wake up and realize what has happened and that you need to disengage and find a better way.
Consider how this works with actual humans. In the short term, there are big advantages to playing various manipulative games and focusing on highly short term feedback and outcomes, and this can be helpful in being able to get to the long term. In the long term, we learn who our true friends are.
Tyler Cowen asks in Bloomberg which nations will benefit from the AI revolution, with the background assumption of only non-transformative AI. He says: If you do something routine the AI will be able to do, you are in danger. If you try to do new things where AI can streamline your efforts, you can win. By this account China would be a winner due to its ambition, he says, but he also worries (or hopes?) that it will not allow decentralized access to AI models and will fall behind. India has potential but could lose its call centers. The USA and perhaps Canada are in a strong position.
I notice that this does not put any weight on the USA being the ones who actually build the AI models and the AI companies.
Perhaps a better frame for this question is to ask about AI as substitute versus complement, and whether you can retain your comparative advantage, and also to think about the purpose of the work being automated away and whether it is internal or subject to international trade. And to ask what people are consuming, not only what they are producing.
If AI destroys the value add of your exports, because it lowers cost of production by competitors, you can’t use it to further improve quality and demand does not then rise, then you are in a lot of trouble. You get disrupted. Whereas if your nation imports a lot of production that gets radically cheaper, that is a huge win for you whether or not you then get to consume vastly more or at vastly higher quality. If you are both producer and consumer, you still win then, although you risk instability.
The idea here is that if you have routine work, then AI will disrupt it, whereas for innovative work it will be a complement and enable it. That seems likely within some subsections of knowledge work.
Where it seems far less likely is in the places AI is likely to be poor at the actual routine work, or where that would likely require transformative AI. If you are a nation of manual laborers, of plumbers and farmers and construction workers, in some sense your work is highly routine. Yet you clearly stand to greatly benefit from AI, even if your workers do not benefit from it directly. Your goods and services provided become more relatively valuable, you get better terms of trade, and your people get the benefits of AI as mediated through the products and services made elsewhere. Also, if your nation is currently undereducated or lacks expertise, you are in position to much better benefit from AI in education, including adult education and actual learning that people sometimes forget is not education. AI erodes advantages in such places.
The Quest for Sane Regulations
OpenAI presents the workshop proceedings from Confidence Building Measures for Artificial Intelligence (paper).
None of that seems remotely sufficient, but it all does seem net good and worthwhile.
Why don’t companies keep training their AI models after their intended finishing time to see if performance improves? Sherjil Ozair explains that you decay the learning rate over time, so extra training would stop doing much. This does not fully convince me that there are no good available options other than starting over, but at minimum it definitely makes things a lot trickier.
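To illustrate the mechanism, here is a minimal sketch of a standard cosine learning-rate schedule (the numbers are illustrative, not from any particular run): by the planned end of training the rate has decayed to a small floor, so additional steps make only tiny updates unless you re-warm the schedule, which brings its own problems.

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 3e-4, min_lr: float = 3e-5) -> float:
    """Cosine decay from peak_lr down to min_lr over the planned training run."""
    progress = min(step / total_steps, 1.0)  # clamp: past the planned end, stay at the floor
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

planned = 100_000
print(cosine_lr(0, planned))                   # 3e-04 at the start (ignoring warmup)
print(cosine_lr(planned, planned))             # 3e-05 at the planned end
print(cosine_lr(int(1.3 * planned), planned))  # still 3e-05: extra steps barely move the weights
```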
The Artificial Intelligence Policy Institute just launched. Here’s their announcement thread.
They released their first round of polling, which I discuss under rhetorical innovation, showing that the public is indeed concerned about AI.
It is a frustrating time when deciding whether to create a new policy organization. On the one hand, there is no sign that anyone has the ball handled. On the other hand, there is no shortage of people founding organizations.
Potentially this new organization can fill a niche by ensuring we extensively poll the public and make those results known and publicized. That seems to be the big point of emphasis, and could be a good point of differentiation. I didn’t see signs they have a strong understanding on other fronts.
Will Rinehart complains that innovation from AI is being stifled by existing regulations, laws and interests. Well, yes, why would we expect AI innovation not to face all the same barriers as everything else? This is a distinct issue from guarding against extinction risks. It’s not an ‘AI License Raj’; it is a full-on license raj for the economy and human activity in general, period. We should be addressing such concerns whether or not AI is the thing being held back, but we should also not act as if AI were a magic ticket to doing whatever we want if only the bad government didn’t stop us. If the bad government didn’t stop things, we wouldn’t need AI; we would be too busy building and creating even without it.
The Week in Audio
I got a chance to go on EconTalk and speak with Russ Roberts about The Dial of Progress and other matters, mostly related to AI.
Two other big ones also dropped this week. Both were excellent and worth a listen or read if they are relevant to your interests. If you are uninterested in alignment you can skip them.
First we have Dario Amodei on The Lunar Society. Dwarkesh Patel continues to be an excellent interviewer of people who have a lot to say but lack extensive media training and experience, drawing them out and getting real talk. I had to pause frequently to make sure I didn’t miss things, and ended up with 20 notes.
Second, we have Jan Leike on 80,000 Hours with Robert Wiblin. Leike also recently was on AXRP, which was denser and more technical. There was a lot here that was spot on; you love to see it. In many ways, Leike clearly appreciates the nature of the problem before him. He knows that current techniques will stop working and the superalignment team will need to do better, that the problem is hard. I’ve already had a back-and-forth with Leike here, which highlights both our agreements and disagreements. This interview made me more excited and optimistic, showing signs of Leike getting more aspects of the situation and having better plans. Together they also made it much clearer where I need to dig into our disagreements in more detail, so the ball is now in my court; it’s great that he invites criticism. In particular, I want to dig into the generation versus evaluation question more, as I think that assumption (that evaluation is easier than generation) will not reliably hold.
A third great podcast, although not primarily about AI, was Conversations with Tyler. Tyler Cowen interviewing Paul Graham is self-recommending and it did not disappoint. The best moment was Tyler Cowen asking Paul Graham how to make people more ambitious, and he answered partly on the surface but far more effectively by not letting Tyler Cowen pivot to another question, showing how Tyler needed to be more ambitious. It might often be as simple as that, pointing out that people are already more ambitious than they know and need to embrace it rather than flinch.
So many other good bits. I am fascinated that both Paul and Tyler mostly know within two minutes how they will decide on talent, and their opinions are set within seven, and that when Paul changes his mind between those times it is mostly when a founder points out that Paul incorrectly rounded off what they were pitching to a dumb thing that they know is dumb, and that they’re doing some other, smart thing instead. I suspect that’s far more about the founder realizing and knowing this than anything else.
Tyler asks what makes otherwise similar tech people react in different ways on AI, attempting to frame it as essentialist based on who they are, and Paul says no, it’s about what they choose to focus on, the upside or the downside. It strongly echoes their other different perspectives on talent. Tyler thinks about talent largely as a combinatorial problem: you want to associate the talented and ambitious together, introduce them, put them in the same place, lots of travel. Paul wants to identify the talent that is already there; you still need a cofounder and others to work with either way, of course. I’m strongly with Paul here on the AI debate: where you land is mostly about focus and style of thinking. Tyler didn’t respond, but I imagine him then suggesting this comes down to what a person wants to see, to circle it back. Or I would say, whether a person is willing to see, and to insist on seeing the questions through. It’s mostly not symmetrical; those who see the downside almost always also see the upside, but not vice versa.
Also Rob Miles’ note that building safe AI is like building a safe operating system clearly got through, which is great, and Paul has noticed the implication that the only way you can build a safe operating system in practice is to build an unsafe one and get it hacked and then patched until it stops being unsafe, and that this is not a good idea in the AI case.
A transcript and the video for the roundtable ‘Will AI Destroy us?’ featuring Eliezer Yudkowsky and Scott Aaronson. It’s fine, but duplicative and inessential.
Fun fact about podcasts: longer ones, at least for 80,000 Hours, get consumed more, not less.
Robert Wiblin: Apple Podcast analytics reports listeners on average consume 40%+, or even 50%+, of podcast interviews that run for over 3 hours.
(I added some random noise to the length so you can’t identify them.)
No apparent TikTokification of attention spans here!
I have noticed that the better interviews do tend to be modestly longer; my guess is that is what is causing this here. Otherwise, I definitely slowly fall off over duration, and generally I am far more likely to give a full listen to sub-hour podcasts. I would also say that this audience is unlikely to be typical.
Rhetorical Innovation
People are worried about AI, and do not trust the companies to regulate themselves.
Other data points:
Excellent showing for the American public all around.
Perhaps we are making this all too complicated? Regular people think about things in a different way, and someone else (not me!) has to speak their language. Twelve second video at link: Don’t trust it. Don’t understand it. Can’t get my hands on it. Can’t fight it. It is my enemy.
PauseAI proposes this flow chart, attempting to group objections to the pause case into four categories.
We need better critics. This does seem like a useful exercise, although I would add a fifth necessary step, which is ‘a pause might prevent this,’ with objections like ‘We have to beat China,’ ‘There is no way to pause,’ ‘Effective solutions could only be implemented by world governments’ or forms of ‘This would net increase danger.’
I would also note that while those five steps together seem sufficient, at least the first two are not necessary.
If AI were merely ‘about as smart’ as humans, while enjoying many other advantages like higher speed, better memory, the ability to be copied and lower operating costs, it is easy to see why that would pose great danger.
An explicit or deliberate AI takeover is one way things could go, but this not happening does not make us safe. If AIs outcompete humans in terms of ability to acquire resources and make copies of themselves, via ordinary competitive and capitalistic means, and putting them in control of things makes those things more competitive, they could progressively end up with more and more of the resources and control rather quickly. Competition does not need to take the form of a takeover attempt.
Andrew Critch asks a question.
Victoria Krakovna suggests using the term capabilities rather than intelligence when discussing AI risks, to avoid anthropomorphism, various misleading associations and moving goalposts, and to keep things concrete. I have been moving to using both in different contexts. I agree that often it is more useful and accurate, and less distracting or confusing, to discuss capabilities. At other times, capabilities is not a good fit for the discussion.
No One Would Be So Stupid As To
Well, not no one.
Yep.
Aligning a Smarter Than Human Intelligence is Difficult
Aligning a not-as-smart-as-humans AI?
Also difficult. Presenting the state of the art of AI alignment.
Depending on how one uses this information, it may or may not constitute Actual Progress.
If you want to understand some of the utter disaster that is trying to use RLHF, consider its impact on humans, especially when the feedback is coming from less smart humans relative to the subject, and when the feedback is not well thought out.
People Are Worried About AI Killing Everyone
SEC Chief is Worried About AI, says the New York Times; presumably he is mostly worried about it selling unregistered securities or offering investment advice.
Kelsey Piper points out the obvious regarding Meta’s decision to open source Llama 2. All alignment of current models is based only on fine tuning. If you give everyone the weights, they can do their own fine tuning to reverse your fine tuning, resulting in a fully unsafe anything-goes model. Such versions were indeed put online within days of Llama 2’s release. That version, let’s say, is most definitely uncensored.
Other People Are Not As Worried About AI Killing Everyone
Alan Finkel in The Sydney Morning Herald calls for the ‘nuclear option,’ as in the nuclear non-proliferation treaty, proposing all the standard regulations and warning about all the usual collective-action and bad-actor worries, while emphasizing that action being difficult does not mean we should give up and not try.
Except that nowhere does he mention extinction risks. Everything is phrased in terms of human misuse of AI for standard human villainy. He does correctly diagnose what must be done if the threat is intolerable. Yet all the threats he cites are highly tolerable ones, when compared to potential benefits.
The Lighter Side
Words of wisdom.
Mom, Martin Shkreli’s trying to corner a market again!
I mean, it obviously won’t work, but there are suckers born every minute.
We did it, everyone: Effective altruism.
Current AI alignment status?
Story checks out.
I do worry about him sometimes.
Others should have worried more.
Another true statement.