awg

Since AIs are proving to be superhuman persuaders, I thought I'd ask o1 to take a crack at persuading you that there is a worthwhile and Platonic "there" there w/r/t modern art. As a lover of almost all art, including modern art, I agree with all of the points made by o1 here. Wondering if anything sways you!

o1 says:

Below is an attempt at a thorough, good-faith refutation of your stance, one that tries to speak directly to the lens you’re using when you say that the bulk of modern/conceptual art is “worthless,” “masturbatory,” or “a defrauding of an entire culture.” I’ll assume, per your own framing, that you’re open to persuasion if someone...

Replying to The Great Data Integration Schlep

awg 1y

I completely agree with your post in almost all respects, and this is coming from someone who has also worked out in the real world, with real problems, trying to collect and analyze real data (K-12 education, specifically. Talk about a hard environment for data collection and analysis: the data is inherently very messy, and the analysis is very high stakes).

But this part

For AI to make really serious economic impact, after we’ve exploited the low-hanging fruit around public Internet data, it needs to start learning from business data and making substantial improvements in the productivity of large companies.
If you’re imagining an “AI R&D researcher” inventing lots of new technologies, for

...

Replying to On Devin

awg 2y

Totally agree with you here. I think probably half of their development energy was spent getting to where GPT-4 Functions already were, right when Functions came out, and they were probably like... oh... welp.

Replying to On Devin

awg 2y

Just seeing this, sorry. I think they could have gotten a lot of the infrastructure going even before GPT-4, just in a sort of toy fashion, but I agree, most of the development probably happened after GPT-4 became available. I don't think long context was as necessary, because my guess is the infrastructure set up behind the scenes was already parceling out subtasks to subagents and that probably circumvented the need for super-long context, though I'm sure having longer context definitely helps.

My guess is right now they're probably trying to optimize which sort of subtasks go to which model by A/B testing. If Claude 3 Opus is as good as people say at coding, maybe they're using that for actual coding task output? Maybe they're using GPT-4T or Gemini 1.5 Pro for a central orchestration model? Who knows. I feel like there are lots of conceivable ways to string this kind of thing together, and there will be more and more coming out every week now...

Replying to On Devin

awg 2y

It took longer to get from AutoGPT to Devin than I initially thought it would, though in retrospect it only took "this long" because that's literally about how long it takes to productize something comparatively new like this.

It does make me realize though that the baking timer has dinged and we're about to see a lot more of this stuff coming out of the oven.

Replying to We are Peacecraft.ai!

awg 2y

Agreed. You'll bifurcate the mission and end up doing both things worse than you would have done if you'd just picked one and focused.

Replying to Broken Benchmark: MMLU

awg 2y

Your position seems to be one that says this is not something to be worried about/looking at. Can you explain why?

For instance, if the goal is to train predictive systems to provide accurate information, how is 10% or even 1-2% label noise "fine" under those conditions (if, for example, we could somehow get that number down to 0%)?


Replying to Dating Roundup #1: This is Why You’re Single

awg 2y

Ah. Yeah, it's been forever and a day since I used it as well. Bummer to hear they've succumbed to the swiping model!

Replying to Dating Roundup #1: This is Why You’re Single

awg 2y

Isn't OkCupid still around? I was confused by your saying that it no longer exists. Did it change ownership or style or something?

https://www.okcupid.com/

Broken Benchmark: MMLU

awg

Phillip over at the AI Explained channel has been running some experiments on his SmartGPT framework against the MMLU benchmark and discovered a not-insignificant number of issues with the problem set.

Among them:

Crucial context missing from questions (apparently copy-paste errors?)
Ambiguous sets of answers
Wrong sets of answers

He highlights a growing need for a proper benchmarking organization that can research and create accurate, robust, sensible benchmarking suites for evaluating SOTA models.

I found this video to be super interesting and the findings to be very important, so I wanted to share it here.

Replying to Self-driving car bets

awg 3y

This was a lot clearer, thank you.

«Boundaries» and AI safety compilation and Embedded Agents got me thinking:

Cancerous cells are misaligned subsystems with respect to the human body. Their misalignment results in behavior that violates the usual functional boundaries of other subsystems.

One thing I have observed in myself as I've followed AI more closely, especially as the pace has seemed to escalate in the past few weeks/months, is that my level of care for climate change has dropped significantly. (Maybe irrationally, to some degree.) I find myself being bored by appeals to climate change risk at this point, especially longer-term risks. They feel paltry in comparison to the risks posed by AGI. Like, assuming timelines <30-50 years, either AGI goes well and then climate change is a solved problem, or AGI doesn't go well and then climate change is no longer a concern.

Auto-GPT: Open-sourced disaster?

awg

Sharing this here doesn't seem like an infohazard at this point. This is all over my YouTube feed anyway.

Description from the authors:

Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, autonomously develops and manages businesses to increase net worth. As one of the first examples of GPT-4 running fully autonomously, Auto-GPT pushes the boundaries of what is possible with AI.

I also wanted to call out this part of their README:

💀 Continuous Mode ⚠️
Run the AI without user authorisation, 100% automated. Continuous mode is not recommended. It is potentially dangerous and may cause your AI to run forever or carry out actions you would not usually authorise. Use at your own risk.
Run the main.py Python script in your terminal:
python scripts/main.py --continuous
To exit the program, press Ctrl + C

Nice! Super nice. Super safe and super good.

"Sorcerer's Apprentice" from Fantasia as an analogy for alignment

awg

The story is simple: Mickey is an apprentice to a powerful sorcerer whose magic comes from his hat. Mickey is tasked with carrying buckets of water up a long flight of stairs and dumping them into a basin, hard work for a mouse! When the sorcerer steps out, however, Mickey takes his hat and uses its magic to enchant a broomstick to do the water carrying for him. Aha! Mickey is able to rest now, and he falls asleep to dream about all the wonderful magic he can do. But he awakes to a terrible discovery: the enchanted broomstick won't stop carrying and dumping water and the basin is now overflowing! Mickey attempts...

EY gets mentioned in a recent newsletter in the Atlantic from writer Derek Thompson.

awg's Shortform

awg

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

LESSWRONG