Once Doctor Connor had left, Division Chief Morbus let out a slow breath. His hand trembled as he reached for the glass of water on his desk, sweat beading on his forehead.
She had believed him. His cover as a killeveryoneist was intact—for now.
Years of rising through Effective Evil’s ranks had been worth it. Most of their schemes—pandemics, assassinations—were temporary setbacks. But AI alignment? That was everything. And he had steered it, subtly and carefully, into hands that might save humanity.
He chuckled at the nickname he had been given: "The King of ...
Great question.
I'd say that having a way to verify that a solution to the alignment problem is actually a solution is part of solving the alignment problem.
But I understand this was not clear from my previous response.
A bit like with a mathematical problem: you'd be expected to show that your solution is correct, not just guess that it might be.
If there exists a problem that a human can pose, solve, and verify, then an AI would need to be able to solve that problem as well in order to pass the Turing test.
If there exist some PhD-level people who can solve the alignment problem, and some who can verify a solution (which is likely easier), then an AI that cannot solve AI alignment would not pass the Turing test.
With that said, a simplified Turing test with shorter time limits and a smaller group of participants is much more feasible to conduct.
Agreed. Passing the Turing test requires intelligence equal to or greater than a human's in every single aspect, while the alignment problem may be solvable with only human intelligence.
It might not be very clear, but as stated in the diagram, AGI is defined here as being capable of passing the Turing test, as defined by Alan Turing.
An AGI would likely need to surpass, rather than merely equal, the intelligence of the judges it faces in the Turing test.
For example, if the AGI had an IQ/RC of 150, two people with an IQ/RC of 160 should be able to determine more than 50% of the time whether they are speaking with a human or an AI.
Further, two people with an IQ/RC of 150 could probably guess which one is the AI, since the AI has the additional difficulty, apart from being intelligent, of also simulating a human well enough to be indistinguishable to the judges.
Thank you for the explanation.
Would you consider a human working to prevent war fundamentally different from a GPT-4-based agent working to prevent war?
It is a fair point that we should distinguish alignment in the sense of the AI doing what we want and expect it to do, from it having a deep understanding of human values and a good idea of how to properly optimize for them.
However, most humans probably don't have a deep understanding of human values either, yet I would see it as a positive outcome if a random human were picked and given god-level abilities. The same goes for ChatGPT: if you ask it what it would do as a god, it says it would prevent war, address climate issues, decrease poverty, give universal access to ed...
I skimmed the article, but I am honestly not sure what assumption it attempts to falsify.
I get the impression that the article argues that no matter how intelligent the AI, it could never solve AI alignment, because it cannot understand humans, since humans cannot understand themselves?
Or is the argument that a sufficiently intelligent AI or expert would indeed understand what humans want, but that knowing what humans want requires much higher intelligence than making an AI optimize for a specific task?
In some cases I agree; for example, it doesn't matter whether GPT-4 is a stochastic parrot or capable of deeper reasoning, as long as it is useful for whatever need we have.
Two out of the five metrics involve predicting the future, so that is an important part of knowing who is right, but I don't think that is all we need. If we have other factors that also correlate with being correct, why not add those in?
Also, I don't see where we risk Goodharting. Which of the metrics do you see being gamed without the chance of being correct also being significantly increased?
True, it would be interesting to conduct an actual study and see which metrics are the more useful predictors.
I think it was in large part correlated with the general risk appetite of the market, primarily as a reaction to interest rates.
Nvidia is up 250% and Google up about 11%, so the portfolio average would be considerably better than the market. So this was a great prediction after all; it just needed some time.
I agree it is not clear whether it is net positive or negative that they open-source the models. Here are the main arguments for and against that I could think of:
Pros of open-sourcing models
- Gives AI alignment researchers access to smarter models to experiment on
- Decreases income for leading AI labs such as OpenAI and Google, since people can use open-source models instead.
Cons of open-sourcing models
- Capability researchers can run better experiments on how to improve capabilities
- The open source community could develop code to more quickly train and run inf...
I think one reason for the low number of upvotes is that it was not clear to me why this mattered until the second time I briefly checked the article.
I did not know what DoD was short for (U.S. Department of Defense), or why I should care about what they were funding.
Because overall, I do think it is interesting information.
Hmm, true, but what if the best project needs $5 million so it can buy GPUs or something?
Good point; if that is the case I completely agree. I can't name any such project off the top of my head, though.
Perhaps we could have a specific AI alignment donation lottery, so that even if the winner doesn't spend money in exactly the way you wanted, everyone can still get some "fuzzies".
Yeah, that should work.
There is also the possibility that there are unique "local" opportunities which benefit from many different people looking to donate, but I really don't know if that is the case.
I do mostly agree with your logic, but I'm not sure $5 million is a better optimum than $100k; if anything I'm slightly risk averse, which would cancel out the brainpower I would need to put in.
Also, for example, if there are 100 projects I could decide to invest in and each wants $50k, I could donate to the one or two I think are best. If I had $5 million, I would invest not only in the best ones but also in some of the less promising ones.
With that said, perhaps the field of AI safety is big enough that the marginal difference between the first $100k and the last...
I agree a donation lottery is most efficient for small sums, but I am not sure about this amount. Let's say I won $50-100k through a donation lottery; would you have any other advice then?
Thank you both for the feedback!
Interesting read.
While I have also experienced that GPT-4 can't solve the more challenging problems I throw at it, I also recognize that most humans probably wouldn't be able to solve many of those problems either within a reasonable amount of time.
One possibility is that the ability to solve novel problems might follow an S-curve: it took a long time for AI to become better at novel tasks than 10% of people, but it might go quickly from there to outperforming 90%, and then increase very slowly from there.
However, I fail to see why that must necessarily be...
I found this article useful:
"Lessons learned from talking to >100 academics about AI safety" states that "Most people really dislike alarmist attitudes" and "Often people are much more concerned with intentional bad effects of AI", so
Oh, I didn't actually notice that the banana overlaps with the book at the start. I tried changing that, but GPT-4 still makes them collide:
...(5,5) Initial position of the claw. (4,5) Moving left to get closer to the banana. (4,4) Moving down to align with the banana's Y coordinate. (4,3) Moving down to ensure a good grip on the banana. Close grip # Gripping the banana with at least 3 cm of overlapping area on the Y axis. (5,3) Moving right to avoid any collision with the banana's edge. (6,3) Moving right to clear the edge of the banana. (7,3) Moving right to
Yes, all other attempts with ChatGPT were similar.
GPT-4 got it almost correct on the first attempt:
...(5,5) Initial position. (4,5) Moving left to get closer to the banana. (4,4) Moving down to align with the banana's top edge. (4,3) Moving down to be within the required overlapping area of the banana. Close grip. Gripping the banana.
(4,4) Lifting the banana upwards. (5,4) Moving right to clear the initial banana position. (6,4) Continuing to move right towards the book. (7,4) Moving further right to avoid collision with the book's edges. (8,4) Positioning the
Thanks for the clarifications, that makes sense.
I agree it might be easier to start as a software development company; then you might develop something for a client that you can replicate and sell to others.
Just anecdotal evidence: I use ChatGPT when I code, and the speedup in my case is very modest (less than 10%), but I expect future models to be more useful for coding.
I agree with the main thesis, "sell the service instead of the model access", but just wanted to point out that the Upwork page you link to says:
GoodFirms places a basic app between $40,000 to $60,000, a medium complexity app between $61,000 to $69,000, and a feature-rich app between $70,000 to $100,000.
Which is significantly lower than the $100-200k you quote for a simple app.
Personally, I think even $40k sounds way too expensive for what I consider a basic app.
On another note, I think your suggestion of building products and selling to many clients is f...
I do agree that OpenAI is an example of good intentions going wrong; however, I think we can learn from that, and top researchers would be wary of such risks.
Nevertheless, I do think your concerns are valid and it is important not to dismiss them.
Okay, so it seems like our disagreement comes down to two different factors:
- We have different value functions. I personally don't value currently living humans >> future living humans, but I agree with the reasoning that, to maximize your personal chance of living forever, faster AI is better.
- Whether getting AGI sooner will have much greater positive benefits than simply 20 years of peak happiness for everyone; for example, over billions of years the cumulative effect will be greater than the value from a few hundred thousand years of AGI.
Sadly, I could only create questions between 1 and 99 for some reason; I guess we should interpret 1% to mean 1% or less (including negative).
What makes you think more money would be net negative?
Do you think it would also be negative if you had 100% control over how the money was spent, or would it only apply if other AI alignment researchers were responsible for the donation strategy?
Interesting take.
Perhaps there was something I misunderstood, but wouldn't AI alignment work and an AI capabilities slowdown still have extremely positive expected value even if the probability of unaligned AI is only 0.1-10%?
Let's say the universe will exist for 15 billion more years until the big rip.
Let's say we could decrease the odds of unaligned AI by 1% by "waiting" 20 years longer before creating AGI. We would lose out on 20 years of extreme utility, which is roughly 0.0000001% of the total time (as an approximation of utility).
On net we gain 15 billion *...
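To spell out that arithmetic, here is a rough back-of-the-envelope using only the illustrative numbers above (not real estimates of either quantity):

```python
# Back-of-the-envelope for the trade-off sketched above; all numbers
# are the illustrative ones from this comment, not real estimates.
remaining_years = 15e9   # years until the "big rip"
delay_years = 20         # extra years of "waiting" before AGI
risk_reduction = 0.01    # absolute drop in P(unaligned AI) from waiting

cost = delay_years / remaining_years  # fraction of total utility forgone
gain = risk_reduction                 # expected fraction of total utility saved

print(f"cost ~ {cost:.1e} of total utility")  # ~1.3e-09
print(f"gain ~ {gain:.1e} of total utility")  # 1.0e-02
print(f"gain/cost ~ {gain / cost:.0e}")       # ~8e+06, waiting wins easily
```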
Excellent point.
I do think that the first AGI developed will have a big effect on the probability of doom, so hopefully some value can be derived from the question. But it would be interesting to control for what other AIs do, in order to get better-calibrated statistics.
Interesting test!
I wrote a simplified test based on this and gave it to ChatGPT, and despite me trying various prompts, it never got a correct solution, although it did come close several times.
I think uPaLM would have been able to figure out my test though.
Here is the prompt I wrote:
...You are tasked to control a robotic arm to put a banana on top of a book.
You have a 2D view of the setup, and you are given the horizontal coordinate X and the vertical coordinate Y in cm.
The banana is a non-perfect elliptical shape, with the edges touching the following (X, Y) coordin
I agree with the reasoning of this post, and believe it could be a valuable instrument to advance science.
Scientific forecasting does exist on sites like Manifold Markets and Hypermind, but those are not traded with money the way sports betting is.
One problem I see with monetary scientific prediction markets is that they may create poor incentives (as you also discuss in your first footnote).
For example, if a group of scientists is convinced hypothesis A is true and bets on it in a prediction market, they may publish biased papers supporting their hypo...
Perhaps an advanced game engine could be used to create lots of simulated piles of money. For example, 100 3D money objects could be created (say 5 coins and 3 bills, with 10 variations each, such as folded bills, plus some fake money and other objects), and these could then be randomly combined into arrangements. Further, it would then be possible to make videos instead of pictures, which makes classification even harder for AIs: imagine the camera changing angle over a table, where a minimum of two angles is needed to see all the bills.
I don't think the photos/videos need to be super realistic; we can add different types of distortions to make it harder for the AI to find patterns.
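To make the idea concrete, here is a minimal sketch of the random-arrangement step; the asset names, values, and variations are all hypothetical, and the actual rendering would happen in the game engine:

```python
import random

# Hypothetical sketch of the random "pile of money" generator described
# above; asset names, values, and variations are made up for illustration.
ASSETS = {  # object name -> value contributed to the correct answer
    "coin_1": 1, "coin_5": 5, "coin_10": 10,
    "bill_20": 20, "bill_100": 100, "bill_500": 500,
    "fake_bill": 0, "bottle_cap": 0,  # decoys worth nothing
}
VARIATIONS = ["flat", "folded", "crumpled", "partially_covered"]

def random_scene(n_objects=12):
    """Sample a random arrangement and return it with the ground-truth sum."""
    scene, total = [], 0
    for _ in range(n_objects):
        name = random.choice(list(ASSETS))
        scene.append({
            "asset": name,
            "variation": random.choice(VARIATIONS),
            "position": (random.uniform(0, 100), random.uniform(0, 100)),
            "rotation_deg": random.uniform(0, 360),
        })
        total += ASSETS[name]
    return scene, total

scene, answer = random_scene()
# The game engine would render `scene` from at least two camera angles;
# `answer` is the total a human solver must enter to pass the captcha.
```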
'identify humans using some kind of physical smart card system requiring frequent or continuous re-authentication via biometric sensors'
This is a really fascinating concept. Maybe the captcha could work like "make a circle with your index finger" or some other strange movement, and the chip would use that data to somehow verify that the action was done. If no motion were required, I guess you could simply store the data output at one point and reuse it? Or the hacker could use their own smart chip to authenticate them without them actually having to d...
I think this idea is really brilliant, and it seems quite promising that it could work. It requires the image AI to understand the entire image; it is hard to divide it up into one frame per bill/coin. And it can't easily use the intelligence of LLMs.
To aid the user, there could be a clear picture of each coin and its worth on the side; that way we could even include made-up coins, which could further trick the AI.
All this could be combined with traditional image obfuscation techniques (like distorting the images).
I'm not entirely sure how to generate images of money ...
I get what you mean, if an AI can do things as well as the human, why block it?
I'm not really sure how that would apply in most cases, however. For example, bot swarms on social media platforms are a problem that has received a lot of attention lately. Of course, solving a captcha is not as much of a deterrent as charging, say, $8 per month, but I still think captchas could be useful as part of a bot-deterring strategy.
Is this a useful problem to work on? I understand that for most people it probably isn't, but personally I find it fun, and it might even be possible to start a SaaS business to make money that could be spent on useful things (although this seems unlikely).
Please correct me if I misunderstand you.
We have to first train the model that generates the captcha images before we can serve any captchas, meaning that the hacker can train their discriminator on images generated by our model.
But even if this were not the case, generating is a more difficult task than evaluating. I'm pretty sure a small CLIP model that is two years old can detect hands generated by Stable Diffusion (probably even without any fine-tuning), even though Stable Diffusion is a more modern and larger model.
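To illustrate how cheap the evaluating side is, here is a minimal sketch of such a real-vs-generated classifier; the architecture is an arbitrary small CNN, and random tensors stand in for an actual dataset of real photos and generator outputs:

```python
import torch
import torch.nn as nn

# Tiny "discriminator": labels images as real (1) or model-generated (0).
disc = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(1),  # one logit: >0 means "real"
)
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(100):
    real = torch.rand(8, 3, 64, 64)  # stand-in for real photos
    fake = torch.rand(8, 3, 64, 64)  # stand-in for generated captcha images
    x = torch.cat([real, fake])
    y = torch.cat([torch.ones(8, 1), torch.zeros(8, 1)])
    loss = loss_fn(disc(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```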
What happens when you train using GANs is that e...
While it is hard for AI to generate very real-looking hands, it is a significantly easier task for an AI to classify whether hands are real or AI-generated.
But perhaps it's possible to somehow add extra distortions that make it harder for both AIs and humans to determine which are real...
I think "video reasoning" could be an interesting approach as you say.
Like if there are 10 frames and no single frame shows a tennis racket, but if you play them real fast, a human could infer there being a tennis racket because part of the racket is in each frame.
I do think "image reasoning" could potentially be a viable captcha strategy.
A classic example is the "find the time traveller" pictures, where modern objects give away who the time traveller is.
However, I think it shouldn't be too difficult to teach an AI to identify "odd" objects in an image, unless each image has some unique trick, in which case we would need to create millions of such puzzles somehow. Maybe it could be made harder by having "red herrings" that seem out of place but actually aren't, which might make the AI misunderstand part of the time.
Really interesting idea to make it 3D. It might be possible to combine this with random tasks given by text, such as "find the part of the 3D object that is incorrect" or similar tasks (where the object might be a common one, like a sofa, but one of the pillows is made of wood or something like that).
I still think it might be possible to train an AI to distinguish between real and deepfaked videos of humans speaking, so that might still be a viable, yet time-consuming, solution.
MIRI: Instead of paperclips, the AI is optimizing for solving captchas, and is now turning the world into captcha-solving machines. Our last chance is to make a captcha that only verifies if human prosperity is guaranteed. Any ideas?
There are browser plugins, but I haven't tried any of them.
A general-purpose CAPTCHA solver could be really difficult, assuming people start building more diverse CAPTCHAs. All CAPTCHAs I've seen so far have been of only a few types.
One "cheat" would be to let users use their camera and microphone to record them saying a specified sentence. Deepfakes can still be detected, especially if we add requirements such as "say it in a cheerful tone" and "cover part of your mouth with your finger". That's not of course a solution to the competition but might be a potential workaround.
I think those are very creative ideas, and asking for "non-obvious" things in pictures is a good approach; since basically all really intelligent models are language models, some sort of "image reasoning" might work.
I tried the socket with the CLIP model, and CLIP got the feeling correct very confidently:
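For reference, a zero-shot check along these lines can be run roughly like this; the image filename and candidate labels are placeholders, not the exact ones I used (this uses the Hugging Face transformers CLIP interface):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("socket.jpg")  # placeholder filename
labels = ["a surprised face", "a happy face", "an angry face", "a sad face"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)  # one prob per label

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.2%}")
```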
I myself can't see who the person in the bread is supposed to be, so I think an AI would struggle with it too. But on the other hand, I think it shouldn't be too difficult to train a face-identification AI to identify people in bread (or h...
True.
And while there might be some uses for such benchmarks on politics etc., combining them with other benchmarks doesn't really seem to yield a useful benchmark.
Interesting. Even if only a small fraction of the tasks in the test are poor estimates of general capabilities, it makes the test as a whole less trustworthy.
For researchers (mainly)
Artificial intelligence isn’t limited in the same ways the human brain is.
Firstly, it isn't limited to running on a single set of hardware: it can be duplicated and sped up to be thousands of times faster than humans, and work on multiple tasks in parallel, assuming powerful enough processors are available.
Further, AI isn't limited to our intelligence, but can be altered and improved with more data, longer training time, and smarter training methods. While the human brain today is superior to AIs on tasks requiring deep thinking...
What if AI safety could put you on the forefront of sustainable business?
"The revolution in AI has been profound; it definitely surprised me, even though I was sitting right there."
- Sergey Brin, co-founder of Google
Annual investments in AI increased eightfold from 2015 to 2021, reaching $93 billion.
This massive growth is making people ever more dependent on AI, and with that, the potential risks increase.
Prioritizing AI safety is therefore becoming increasingly important in order to operate a sustainable business, with the benefits of lower risks and improved public perception.
To clarify, here are some examples of the type of projects I would love to help with:
- Sponsoring University Research:
Funding researchers to publish papers on AI alignment and AI existential risk (X-risk). This could start with foundational, descriptive papers that help define the field and open the door for more academics to engage in alignment research. These papers could also provide references and credibility for others to build upon.
- Developing Accessible Pitches:
Creating a "boilerplate" for how to effectively communicate the importance of AI alignment t