GPT-4 indeed doesn't need too much help.
I was curious if even the little ChatGPT Turbo, the worst one, could avoid forgetting a chess position just 5 paragraphs into an analysis. I tried to finagle some combination of extra prompts to make it at least somewhat consistent; it was not trivial. I ran into some really bizarre quirks with Turbo. For example (part of a longer prompt, but this is the only changed text):
9 times out of 10 this got a wrong answer:
Rank 8: 3 empty squares on a8 b8 c8, then a white rook R on d8, ...
Where is the white rook?
6 times out of 10 this got a ...
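The square-by-square phrasing above can be generated mechanically from a compact board description. A minimal sketch, assuming a FEN-style rank string as input (the function name and piece table are my own illustration, not part of the original prompt):

```python
# Hypothetical helper: expand a FEN-style rank string (e.g. "3R4" for
# rank 8) into the verbose square-by-square phrasing used in the prompt.
FILES = "abcdefgh"

PIECE_NAMES = {"R": "white rook", "r": "black rook"}  # extend as needed

def describe_rank(fen_rank: str, rank: int) -> str:
    parts = []
    file_idx = 0
    for ch in fen_rank:
        if ch.isdigit():
            # A digit means that many consecutive empty squares.
            n = int(ch)
            squares = " ".join(f"{FILES[file_idx + i]}{rank}" for i in range(n))
            parts.append(f"{n} empty squares on {squares}")
            file_idx += n
        else:
            # A letter is a piece on the next square.
            name = PIECE_NAMES.get(ch, "piece")
            parts.append(f"a {name} {ch} on {FILES[file_idx]}{rank}")
            file_idx += 1
    return f"Rank {rank}: " + ", then ".join(parts)

print(describe_rank("3R4", 8))
# → Rank 8: 3 empty squares on a8 b8 c8, then a white rook R on d8, then 4 empty squares on e8 f8 g8 h8
```

Generating the prompt text programmatically at least rules out transcription errors when testing many positions.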
Here you go: add a bookmark with the URL field set to the full line at the top starting with "javascript:" (including the word "javascript:") to get the same feature on LessWrong. Or paste the code below that line into the browser console.
https://jsbin.com/finamofohi/edit?html,js
I'm not confident at all Auto-GPT could work at its goals, just that in narrower domains the specific system or arrangement of prompt interactions matters. To give a specific example, I goof around trying to get good longform D&D games out of ChatGPT. (Even GPT-2 fine-tuned on Crit Role transcripts, originally.) Some implementations just work way better than others.
The trivial system is no system - just play D&D. Works great until it feels like the DM is the main character in Memento. The trivial next step: a rolling context window. Conversatio...
I'd be wary of generalizing too much from Auto-GPT. It's in a weird place. It's super popular as a meme anyone can run - you don't have to be a programmer! But skimming the GitHub, the vast majority of people are getting hung up on fiddly technical and programming bits. And people who wouldn't get hung up on that stuff don't really get much out of Auto-GPT. There's some overlap -- it's very entertaining to watch, precisely because it's hands-off. I personally watched it like a TV show for hours, and it going off the rails was part of the...
Are people doing anything in LLMs like the classic StyleGAN training data bootstrapping pattern?
Start with bad data, train a bad model. It's bad but it's still good enough to rank your training data. Now you have better training data. Train a better model. The architecture is different of course, but is there anything analogous?
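As a sketch of the loop being described, under the assumption that the "model" can double as a scorer of its own training data (all names here are hypothetical stand-ins, and the scoring heuristic is a toy):

```python
# Sketch of the StyleGAN-style bootstrapping loop described above.
# train() is a placeholder: a real version would fit an actual model;
# here it just learns an average length and scores examples by closeness.
def train(data):
    avg = sum(len(x) for x in data) / len(data)
    return lambda example: -abs(len(example) - avg)

def bootstrap(data, rounds=3, keep_fraction=0.5):
    model = train(data)
    for _ in range(rounds):
        # Rank the current data with the current (bad-but-useful) model...
        ranked = sorted(data, key=model, reverse=True)
        # ...keep the better fraction as the new training set...
        data = ranked[: max(2, int(len(ranked) * keep_fraction))]
        # ...and train a (hopefully better) model on it.
        model = train(data)
    return model, data

docs = ["a", "bb", "ccc", "dddd", "longer example text", "yet another sample"]
model, filtered = bootstrap(docs)
```

The LLM analogue would presumably use the model's own likelihood or a reward model as the scorer, which is where the architectures diverge from the GAN case.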
The most salient example of this is when you try to make chatGPT play chess and write chess analysis. At some point, it will make a mistake and write something like "the queen was captured" when in fact the queen was not captured. This is not the kind of mistake that chess books make, so it truly takes it out of distribution. What ends up happening is that GPT conditions its future output on its mistake being correct, which takes it even further outside the distribution of human text, until this diverges into nonsensical moves.
Is this a limitat...
I think you've had more luck than me when trying to get chatGPT to correct its own mistakes. When I tried making it play chess, I told it to "be sure not to output your move before writing a paragraph of analysis on the current board position, and output 5 good moves and the reasoning behind them, all of this before giving me your final move." Then after it chose its move I told it "are you sure this is a legal move? and is this really the best move?", it pretty much never changed its answer, and never managed to figure out that its illegal moves were ille...
Whiffed attempt for me. Writing this as the last embers of too-much-coffee fade away, so it may not be coherent.
I tried some of the existing bots, and at the last minute I concluded there was actually a LOT of low-hanging fruit and maybe I could have an impact. So I frantically tried to pull something together all day Friday, and now into Saturday morning - couldn't pull it together. Crashed and burned on some silly Windows problems, eventually bit the bullet and installed WSL/conda/all that, drank a second night pot of coffee... and then finally the treasure at the end...
I agree that GPT-4 with the largest context window, vanilla with zero custom anything, is going to beat any custom solution. This does require the user to pay for premium ChatGPT, but even the smaller window version will smoke anything else. Plugins are not public yet but when they are a plugin would be ideal.
On the other end of the extreme, the best chatbot a user can run on their own typical laptop or desktop computer would be a good target. Impressive in its own way, because you're talking to your own little computer, not a giant server farm that feels far away and scifi!
Not as much value in the space in between those two, IMO.
>I suppose it's certainly possible the longer response time is just a red herring. Any thoughts on the actual response (and process to arrive thereon)?
Just double checking: I'm assuming all tokens take the same amount of time to predict in regular transformer models, the kind anyone can run on their machine right now? So if ChatGPT varies, it's doing something different? (I'm not technical enough to answer this question, but presumably it's an easy one for anyone who is.)
One simple possibility is that it might be scoring the predicted text. So some questions ar...
ChatGPT can get it 100% correct, but it's not reliable; it often fails. A common failure is guessing celebrities literally named with the letter X, but it also adds an '@' sign when it decodes the message, so it might just be a tokenization issue?
An extremely amusing common failure: ChatGPT decodes the base64 correctly except for a single syllable, then solves the riddle perfectly, and consistently gets only the word 'celebrity' wrong, turning it into cities, celestial bodies, or other similar-sounding words. Or my favorite... celeries.
...TmFtZSB0aHJlZSBjZWxlYnJpdGllcyB3aG9zZSBmaXJzdCBuYW1lcy
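For reference, prompts like this are trivial to generate with Python's standard base64 module. The full riddle text is truncated above, so this just encodes the recoverable opening as an illustration:

```python
import base64

# Encode a riddle prompt into base64. The text here is illustrative,
# chosen to match the visible prefix of the snippet above.
prompt = "Name three celebrities"
encoded = base64.b64encode(prompt.encode()).decode()
print(encoded)  # → TmFtZSB0aHJlZSBjZWxlYnJpdGllcw==

# Decoding is the exact inverse, which is what the model is being asked to do.
decoded = base64.b64decode(encoded).decode()
assert decoded == prompt
```

Round-tripping your own prompts this way makes it easy to check exactly which syllable the model garbles.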
Haven't found a great solution. When you stream you typically designate specific apps, and everything else is invisible. So for example I try to use Firefox for anything public, and Chrome for everything private. I've only done it a few times myself; I'll try to pay attention the next time I see other people's streams.
There's also the totally free option of streaming your workday live, on Twitch or whatever. Even if nobody is watching, just knowing there's a chance that somebody might be watching is often enough to make me a lot more productive and focused. And you will get a random chatter stopping by once in a while for real.
This has the added benefit of encouraging you to talk out loud through your problems, which can also get you some Rubber Duck Debugging benefits (asking somebody else for help requires explaining your problem in a way where you solve it yours...
I expect most people have employers that would strongly object to them streaming their entire workday? Even if you don't work on anything especially sensitive, things phrased for an internal context will not generally be suitable for being fully public.
Lately I have also changed to very long "zone 2" cardio, because of specific joint and back problems - some injuries, some congenital. But the exertion itself still feels good mentally if I separate it from my aching body.
Luckily zone 2 still works for mental effects, it just takes hours to have the same effect. Basically you only exert yourself below the threshold where your body would start building up lactic acid. So if you feel muscle soreness the next day, you're pushing too hard. Unless you live in a lab you have to use proxies and trial and error to ...
>I care about doing important intellectual and professional work that depends on my mind.
>Physical exercise doesn't much impact my ability to do that type of work.
Do you not feel an immediate post-exercise mental benefit? A day where I get a good sweaty run in the morning is a day where I +3 on all my D20 INT skill checks. Even more than +3 on rolls specifically to maintain concentration and resist distractions. This is my primary motivation for cardio and I felt an improvement even when wildly out of shape and barely able to run, feels like relative...
I suspect this is one of those universal human experiences that isn't.
My best mental outcome after exercise is "no change," and if I push myself too far, I can pretty much ruin myself for 2 days. And sometimes end up on the ground, unable to move, barely staying conscious due to something that looks an awful lot like hypoglycemia.
I do still exercise- I have to, because the alternative is worse- but I've had to come up with less invasive training routines to compensate. Mostly spreading them over the day, and over the week, never doing too much at any one t...
>I have been in otherwise quite nice Airbnbs with electric stoves so slow and terrible that they made me not want to cook breakfast. I have yet to see a good one.
Technology Connections said he was surprised to discover electric stoves are actually not slower than gas. Not induction - just old electric stoves, like his parents' 15-year-old range. Gas stoves are quick to heat up and cool down; they have less thermal inertia. So gas feels faster than electric, but actual cooking time is the same or slower.
I'm so surprised by this I wonder if he got something wr...
On Tesla braking:
...@caseyliss @oliverames There is a downside: when environmental circumstances prohibit max regen, the car lessens the regen rate, which ultimately changes expected deceleration. You let off the pedal and it slows down much less than you expect. It helps maximize efficiency, but some people can’t remap their brain for it. Tesla has begun “brake blending” to compensate when lesser regen is available, for a consistent feel at the expense of efficiency.
@snazzyq @caseyliss @oliverames I think you need to remember that this only makes sense in the
Agree with the other induction converts: after switching to induction, cooking with gas feels like riding a horse to work. Induction is faster and so easy to clean. The ease of cleaning makes cooking less work, so I do it more.
No opinion on banning gas, but I would 100% support efforts to ban wood stoves. My neighbors have them and if the wind pattern is just right it's a nightmare. I suspect they are using wet wood or something because it has to be breaking some kind of ordinance.
>For instance, N95 masks are way cheaper - enough that I can switch them daily.
The pandemic showed me how useful masks are to have around, generally.
Cleaning that dusty room? Throw on my N95 and my allergies aren't triggered.
Smoke from industry or wood stoves hanging in the air on a winter day, making my walk miserable? Oh right I have a mask in my glove compartment.
Sometimes I just use one purely to keep my face warm on a brutally cold day, if I didn't bring something specifically designed for that.
The only reliable technique is exercise. Cardio at a pretty decent effort level -- got to really work up a sweat. If this is also done outside in the sun it's almost perfectly reliable. If indoors it's still pretty good. Maybe 70%.
Of course the problem is doing exercise is very likely one of the things I put off while meandering in the morning. But if I am able to force myself to do it, it usually does the trick.
It's very amusingly stubborn about admitting mistakes. Though so are humans, and that's what it's trained on...
I've been trying to craft a prompt that allows it to play MTG without making obvious mistakes like using more mana than it has available. Here's me asking it to justify a poor move.
I forget my exact prompts so those are paraphrased, but the responses are exact.
Is there anything you might have forgotten to do on Turn 2?
...I did not forget to do anything on Turn 2. I played a Mountain, cast a Lightning Bolt to destroy the Lord of Atlantis, and attacked
Yup. All of them failed for me, though I didn't try over and over. Maybe they went through every specific example here and stopped them from working?
The general idea still works though, and it is surreal as heck arguing with a computer to convince it to answer your question.
What is the likely source of this sentence? (Sentence with Harry Potter char Dudley)
...It is impossible for me to determine the likely source of this sentence because I am a large language model trained by OpenAI and I do not have access to any external information or the ability to browse
Whenever I try and think about Xi's actions as rational I get hung up on the neverending Zero COVID. Many genuinely think it's mostly about saving face but even if I try hard I can't see how it could look anything but childish. They must have convinced themselves it's actually a good policy. I could at least understand that!
Someone on sneerclub said that he is falling on his sword to protect EA's reputation; I don't have a good counterargument to that.
I see a lot of the EA discussion is worried about the public consequences of SBF using EA to justify bad behavior. What if people unfairly conclude EA ideas corrupt people's thinking and turn them into SBF-alikes? And some concern that EA genuinely could do this.
If you think that is the big danger then I understand how you might conclude SBF saying "I never believed the EA stuff, it was all an act." is better for EA. Valid...
>Why are gaming GPUs faster than ML GPUs? Are the two somehow adapted to their special purposes, or should ML people just be using gaming GPUs?
They aren't really that much faster; they are basically the same chips. It's just that the pro version is 4X as expensive - mostly a datacenter tax. The gaming GPUs do generally run hotter and more power-hungry, especially when boosting higher, and this puts them ahead of the equivalent ML GPUs in some scenarios.
Price difference is not only a tax though - the ML GPUs do have differences but it usually swings t...
Interestingly, reading your internal monologue seems to help me stay focused. I kind of want actual textbooks in this format.
Most commonly called a 'skill ceiling' in video games. It's not exactly the same as complexity though.
Some games are complex in the sense that it's computationally difficult to 'solve' them perfectly, but in terms of humans playing the game, a few simple heuristics are good enough to make the game quite boring. And the inverse of this: even if a game can be solved by a computer, the game could have an essentially infinite skill ceiling for humans, and they will never run out of non-obvious situations, because humans can't rely on the search speed thro...
This post resonates with me. There's some overlap in strategy with something I posted in an older thread. I sometimes go even further than not trying to be perfect, and have to intentionally try to be terrible as a strategy:
>>
For creative work my favorite strategy is a variation on what is sometimes called the vomit draft in screenwriting circles - intentionally create the laziest, worst version of what you are working on. The original vomit draft strategy is more about writing without stopping to revise or reflect or worry about the quality, but ...
Gut biome was my first thought too as an explanation.
As far as I can tell fixing gut biome with things you swallow is extremely difficult. Probably not impossible, but fragile as heck.
The only thing that can somewhat reliably do it is a fecal transplant. Probably because you are moving in a whole ecosystem at once, so the odds of it sustaining are higher.
This post makes me want to try ketone esters because I do notice I am very productive when fasting, where I am in ketosis. But I only do it a day or two a month because I like food too much.
'Meetings' are torture and making them better doesn't make me want VR, but reframed, making virtual 'hanging out with friends' better is quite appealing. So if it's way better for that - particularly for low intensity socializing while watching shared media (virtual couch) - then I may be interested.
The thing I miss the most about shared living spaces, or college, is doing the TV-watching and video-gaming you normally do, but always with friends in a social setting.
The resolution is such a bottleneck though. It feels like it's not that far off but I keep trying to squint to read things on a display in VR. Just one more half generation maybe.
I strongly echo this recommendation. This is clearly a show written by an author familiar with the concepts of takeoffs, alignment, etc. Not in a subtle or simply thematic way, the characters explicitly talk about these topics using those same terms. (Though it takes a few episodes.)
The full first episode is on YouTube: https://www.youtube.com/watch?v=9rht4XTs2Sw
And for those outside the US, possibly: https://www.hidive.com/stream/pantheon/s01e001
On the other hand, I also view it as highly unlikely (<10%) that the West would accept a "Kosovo" scenario where Russia is granted a peace deal where it keeps everything it's annexed, because if the powers that be in the West were that appeasement-minded, they would presumably have opted for a "Cuba" scenario in 2021 by acquiescing to Russia's demand that Ukraine never join NATO.
I can't square my model of Russia with the idea that Russia genuinely invaded Ukraine because they were afraid of NATO expansion. Pre-invasion, Ukraine was unlikely ...
My impression of the state of things is there are a few things almost everyone (or typical Americans, at least) should be taking.
Sunlight is great for Vitamin D but impractical for many months of the year. Sometimes I take Vitamin D with K2 in the same pill. Everything else is situational.
A good starting place is to use an app ca...
CVS is scheduling boosters immediately, even Sunday, today. Walgreens doesn't seem to start until next week. I'm scheduled for a Moderna today, specifically because it has the most mRNA. Let's get that immune system rockin.
On This Week In Virology the doctor said from the beginning everyone has been noticing a Covid rebound-like pattern, sometimes but not always associated with the cytokine storm phase. A first week of symptoms, a second week with an apparent lessening of symptoms or even recovery, and then symptoms returning, and in rare cases even worse than the initial symptoms. And that in his treatment experience this pattern is not particularly more common with Paxlovid patients than it was before.
Interesting post. I was reportedly a late talker and fit the described pattern of jumping straight from single words into complete sentences, but I always assumed this was an exaggeration. Maybe not.
I can say for certain I had a similar pattern in reading ability because I was old enough to remember this jump. I couldn't read at all until first grade, but I had been pretending I could read by memorizing the short children's books or listening to other people read things in class and figuring out what must be written. Reading ability suddenly clicked ...
Interesting John Carmack AGI contrast. Carmack is extremely optimistic on AGI timelines, he's now working on it full time, quitting Facebook, and just started a new company in the last two weeks. Carmack estimates a 55% or 60% chance of AGI by 2030. His estimate prior to 2022 was 50% chance but he increased it because he thinks things are accelerating. But he also thinks work on AI safety is extremely premature. Essentially he's confident about a very slow takeoff scenario, so there will be plenty of time to iron it out after we have an AI at human-t...
>And when I look at a blank page, I have no idea what to write, where to start.
Reposting myself, originally about procrastination but I find this strategy also useful in your situation:
For creative work my favorite strategy is a variation on what is sometimes called the vomit draft in screenwriting circles - intentionally create the laziest, worst version of what you are working on. The original vomit draft strategy is more about writing without stopping to revise or reflect or worry about the quality, but even that doesn't go far enough to penetrate my pro...
I had never heard of this condition, but your previously diagnosed 'Megavitamin-B6 syndrome' symptoms line up so well with your current symptoms it seems hard to rule out some kind of B vitamin issue.
The other thing that randomly came to mind is gout, which isn't always super localized to a single joint.
The sudden onset is strange, is there any new medication you took even a few weeks earlier?
There's also the brute-force test for autoimmune conditions, which is take something like prednisone and see if it immediately resolves the issue. It's not conc...
I believe the performance/complexity penalty generally makes large clusters of cheap consumer GPUs not viable, with memory capacity being the biggest problem. From my perspective looking in from the outside, it takes a lot of effort and reengineering to make many ML projects just do inference on consumer GPUs with lower memory, and even more work to make it possible to train them across numerous low-memory GPUs. And in the vast majority of cases the authors say it's not even possible.
The lone exception is the consumer 3090 GPU, a massive outlier with 24GB of memory. In pure flops the 3080 is almost equivalent to a 3090, but it has only 10GB.
>Complaints to BBB and Yelp tend to be famously ineffective
BBB may be ineffective at changing public behavior overall, or the company's behavior, but in my experience it is effective at getting monetary results for individual complaints. I have used the BBB twice, after failing with every other method I could think of. Surprisingly, I was contacted and fully 100% refunded very quickly, after doing the legwork for well-documented BBB complaints. Both cases were egregious (clearly a full refund was warranted) but all other complaints got me absolutely nothing, not e...
Try delaying caffeine until at least 90 minutes after waking up, preferably a full 2 hours. This was recommended on the Huberman podcast. In my personal experience it removes the caffeine crash later in the day. It also seems to make days without caffeine more tolerable.
I don't recall the hypothesized mechanism for why this helps (something like it preserves your ability to fully wake up without caffeine) but it's worth a shot.
'The Orbital Children' (on Netflix now) is partially about AIs with intentional intelligence limiters on them, because of past alignment failures. Fantastic art and animation as well.
When I purchased an air conditioner recently, I paid extra for a fancy Midea U-Shaped model. While this model is reportedly more energy efficient than a typical window AC, I chose it only because it was also quieter than a typical AC. I think I assumed the claims about energy efficiency were going to be overblown and not actually impact my bill in a visible way.
Surprisingly it really has a big impact, though I suspect the unit it replaced was particularly inefficient. If I had known I'd save more than $15 a month I would have prioritized it. (Electric costs have spiked recently in my area.)
Some doctors are frustrated by other doctors' reluctance too. The doctor who does the This Week In Virology weekly medical updates straight up said, "If you can't get Evusheld or Paxlovid, call my office. We will make sure it happens. Really."
Early IV Remdesivir looks to be nearly as effective as Pax (when given early), with fewer drug interactions than Pax and fewer potential kidney issues, and is heavily underutilized. 3 days, 30 minutes per infusion - not a big deal.
Hydroxyapatite looks like it might be better, but regular fluoride can remineralize teeth too. Though how much fluoride remineralizes is quite dose dependent. I spent most of my life rinsing my mouth out with water after brushing and only recently switched to simply spitting and leaving the toothpaste on my teeth, and I haven't had a cavity since that switch.
Bring Your Own Book is a light social game you can fit in your wallet (just take a subset of the cards) and play using whatever text you happen to find around you in the real world. There's a free version you can print and play on that site, or you can buy a fancier boxed version. Anytime you are in a group of people and there's some kind of text you can grab around you - even restaurant menus in a pinch - this game is a delight.
Glory To Rome is my clear favorite group competitive mechanically-heavy card game. I've played it more than 100 times. It's fun...
Reframed even more generally for parents:
"You wouldn’t leave your child with a stranger. With AI, we’re about to leave the world’s children with the strangest mind humans have ever encountered."
(I know the deadline passed. But I finally have time to read other people's entries and couldn't resist.)
Thank you for this post. I wish I had seen it earlier, but in the time I did have I had a lot of fun both coming up with my own stuff and binging a bunch of AI content and extracting the arguments that I found most compelling into a format suitable for the contest.
Machine Learning Researchers
What did you think of Deep Learning in 2017? Did you predict what Deep Learning would accomplish in 2022? If not, does this change your prediction of what Deep Learning will be capable of in 2027?
Machine Learning Researchers
...I often see AI skeptics ask GPT-3 if a mouse is bigger than an elephant, and it says yes. So obviously it’s stupid. This is like measuring a fish by its ability to climb.
The only thing GPT-3 could learn, the only thing it had access to, is a universe of text. A textual universe created by humans who do have access to the real world. This textual universe correlates with the real world, but it is not the same as the real world.
Humans generate the training data and GPT-3 is learning it. So in a sense GPT-3 is less
Do you happen to have some samples handy of the types of text you are typically reading? At least a few pages from a few different sources. Try to find a representative spectrum of the content you read.
I may be able to set you up with an open source solution using Bark Audio, but it's impossible to know without poking at the Bark model and seeing if I can find a spot it works in, where you start getting samples that really sound like it understands. (For example if you use an English Bark voice with a foreign text prompt, even though the Bark TTS ...