We do this by performing standard linear regression from the residual stream activations (64-dimensional vectors) to the belief distributions (3-dimensional vectors) which are associated with them in the MSP.
I don't understand how we go from this to the fractal. The linear probe gives us a single 2D point for every forward pass of the transformer, correct? How do we get the picture with many points in it? Is it by sampling from the transformer while reading the probe after every token and then putting all the points from that on one graph?
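A minimal sketch of the setup being asked about, assuming the reading guessed at above: one probe output per token position, with the points from many positions and sequences stacked onto a single plot. The activations here are synthetic stand-ins rather than real transformer activations, and the names `fit_linear_probe`, `apply_probe`, and `simplex_to_2d` are hypothetical helpers, not anything from the post.

```python
import numpy as np

def fit_linear_probe(activations, beliefs):
    """Least-squares affine map from activations (N, d) to beliefs (N, 3)."""
    # Append a column of ones so the regression includes a bias term.
    X = np.hstack([activations, np.ones((activations.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, beliefs, rcond=None)
    return W  # shape (d + 1, 3)

def apply_probe(W, activations):
    """Apply the fitted affine map to a batch of activations."""
    X = np.hstack([activations, np.ones((activations.shape[0], 1))])
    return X @ W

def simplex_to_2d(beliefs):
    """Map 3-dim probability vectors to 2D points inside an equilateral triangle."""
    corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
    return beliefs @ corners

# Synthetic stand-in data (assumption): beliefs embedded linearly in a
# 64-dim space plus a little noise. Each row is one token position.
rng = np.random.default_rng(0)
n_positions, d_model = 500, 64
true_beliefs = rng.dirichlet(np.ones(3), size=n_positions)
mixing = rng.normal(size=(3, d_model))
activations = true_beliefs @ mixing + 0.01 * rng.normal(size=(n_positions, d_model))

W = fit_linear_probe(activations, true_beliefs)
predicted = apply_probe(W, activations)
points_2d = simplex_to_2d(predicted)  # one 2D point per token position
```

Under this reading, scattering `points_2d` for every position gives the picture with many points; whether the original figure was produced exactly this way is not confirmed here.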
Is this result equiva...
Epistemic status: I'm not familiar with the technical details of how LMs work, so this is more word association.
You can glide along almost thinking "a human wrote this," but soon enough, you'll hit a point where the model gives away the whole game. Not just something weird (humans can be weird) but something alien, inherently unfitted to the context, something no one ever would write, even to be weird on purpose.
What if the missing ingredient is a better sampling method, as in this paper? To my eye, the completions they show don't seem hugely better....
How many of the decision makers in the companies mentioned care about or even understand the control problem? My impression was: not many.
Coordination is hard even when you share the same goals, but we don't have that luxury here.
An OpenAI team is getting ready to train a new model, but they're worried about its self-improvement capabilities getting out of hand. Luckily, they can consult MIRI's 2025 Reflexivity Standards when reviewing their codebase, and get 3rd-party auditing done by The Actually Pretty Good Auditing Group (founded 2023).
Current OpenAI ...
TL;DR: Thought this post was grossly misleading. Then I saw that the GPT3 playground/API changed quite a lot recently in notable and perhaps worrying ways. This post is closer to the truth than I thought but I still consider it misleading.
Initially strongly downvoted since the LW post implies (to me) that humans provide some of the GPT3 completions in order to fool users into thinking it's smarter than it is. Was that interpretation of your post more in the eye of the beholder?
Nested three layers deep is one of two pieces of actual evidence:
...InstructGPT is
I wonder if, in that case, your brain picks the stopping time, stopping point or "flick" strength using the same RNG source that is used when people just do it by feeling.
What if you tried a 50-50 slider on Aaronson's oracle, if it's not too exhausting to do it many times in a row? Or write down a sequence here and we can do randomness tests on it. Though I did see some tiny studies indicating that people can improve at generating random sequences.
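One simple randomness test that could be run on a written-down binary sequence is the Wald-Wolfowitz runs test, which checks whether the sequence switches between symbols more or less often than chance would predict. This is only a sketch of one possible test, using the normal approximation; `runs_test` is a hypothetical helper name.

```python
import math

def runs_test(bits):
    """Wald-Wolfowitz runs test on a sequence of 0s and 1s.

    Returns (z, p): the z-score of the observed number of runs and the
    two-sided p-value under the normal approximation.
    """
    n1 = sum(1 for b in bits if b == 1)
    n0 = len(bits) - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("sequence must contain both symbols")
    n = n0 + n1
    # A new run starts at every position where the symbol changes.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    mean = 2 * n0 * n1 / n + 1
    var = 2 * n0 * n1 * (2 * n0 * n1 - n) / (n ** 2 * (n - 1))
    if var <= 0:
        raise ValueError("sequence too short for the normal approximation")
    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Positive z: too many switches. Negative z: too few (long streaks).
z_alt, p_alt = runs_test([0, 1] * 20)
z_blk, p_blk = runs_test([0] * 20 + [1] * 20)
```

A strongly positive z means the sequence alternates too much, a strongly negative one means it is too streaky. The approximation is rough for short sequences, so a few dozen entries at least would be needed.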
Hm, could we tell your theory and Zack's apart by asking a fixed group of people for a sequence of random numbers over a long period of time, with enough delay between each query for them to forget?
This occurred to me, but I didn't see how it could work with different ratios. I guess if you have a sample from a variable with a big support (> 100 events) that's uniformly distributed, that would work (e.g. if x is your birth date in days, then x/365 < 20 would work).
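The birth-date idea can be sketched as follows, assuming the threshold is read as a fraction (so a 20% ratio means comparing against 0.20) and assuming the day of the year is roughly uniform over 1..365. `biased_choice` is a hypothetical helper name.

```python
def biased_choice(day_of_year, p, days_in_year=365):
    """Return True with probability about p, given a uniform day of the year.

    Assumes day_of_year is uniform on 1..days_in_year, which is only
    approximately true of real birth dates.
    """
    if not 1 <= day_of_year <= days_in_year:
        raise ValueError("day_of_year out of range")
    return (day_of_year - 1) / days_in_year < p

# Count how many of the 365 possible days yield True for a 20% ratio.
hits = sum(biased_choice(d, 0.20) for d in range(1, 366))
fraction = hits / 365
```

With 365 possible values, any ratio can be approximated to within about 0.3 percentage points.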
It would be interesting to test this with a very large sample where you know a lot of information about the respondents and then try to predict their choice.
Here's an Android game that works like Zendo but has colorful caterpillars, might be great for kids: https://play.google.com/store/apps/details?id=org.gromozeka1980.caterpillar_logic
What would be the physical/neurological mechanism powering ego depletion, assuming it existed? What stops us from doing hard mental work all the time? Is it even imaginable to, say, study every waking hour for a long period of time, without ever having an evening of YouTube videos to relax? I'm not asking what the psychology of willpower is, but rather whether there's a neurology of willpower.
And beyond ego depletion, there's a very popular model of willpower where the brain is seen as a battery, used up when hard work is being done and charged when relaxing. I...
That is indeed very low weight. My prior is pretty shaky as-is, but that evidence shouldn't move it much.
I thought about priming a lot while reading. Many of the results he lists are similar to priming, but priming being false doesn't mean all results similar to it are false. One could consider a broader hypothesis encompassing all that, namely "humans can be influenced by subtle cues to their subconscious to a significant degree". That's the similarity I see with priming: both it and many of Cialdini's hypotheses follow from this premise. Th...
I have a neat idea for a smartphone app, but I would like to know if something similar exists before trying to create it.
It would be used to measure various things in one's life without having to fiddle with spreadsheets. You could create documents of different types, each type measuring something different. Data would be added via simple interfaces that fill in most of the necessary information. Reminders based on time, location and other factors could be set up to prompt for data entry. The gathered data would then be displayed using various graphs and c...
Cialdini? I'm finishing "Influence" right now. I was extra skeptical while reading it since I'm freshly acquainted with the replication crisis, but googling each citation and reading through the paper is way too much work. He supports many of his claims with multiple studies and real-life anecdotes (for whatever that's worth). Could you point me to the criticism of Cialdini you have read?
The SSC article about omega-6 surplus causing criminality brought to my attention the physiological aspect of mental health, and health in general. Up until now, I prioritized mind over body. I've been ignoring the whole "eat well" thing because 1) it's hard, 2) I didn't know how important it was and 3) there's a LOT of bullshit literature. But since I want to live a long life and I don't want my stomach screwing with my head, the reasonable thing to do would be to read up. I need book (or any other format, really) recommendations on nutrition 1...
I have two straight-forward empirical questions for which I was unable to find a definitive answer.
1) Does ego depletion exist? There was a recent meta-study that found a negligible effect, but the result is disputed.
2) Does visualizing the positive outcome of an endeavor help one achieve it? There are many popular articles confirming this, but I've found no studies in either direction. My prediction is no, it doesn't, since the mind would feel like it already reached the goal after visualizing it, so no action would be taken. It has been like this in my personal experience, although inferring from personal experience is incredibly unreliable.
Depending on where you are in your life and education, you could consider enrolling in graduate school.
If I've managed to translate "graduate school" to our educational system correctly, then I am currently in undergraduate school. Our mileages vary by quite a bit: most people I meet aren't of that caliber. Also, it's hard to find out if they are. Social etiquette prevents me from bringing up the heavy-hitting topics except on rare occasions.
I guess I should work on my social skills and then cast a wider net. The larger the sample, the better od...
I'm not 100% clear as to where the non-ambitious posts should go, so I will write my question here.
Do you know of a practical way of finding intellectual friends, so as to have challenging/interesting conversations more often? Not only is the social aspect of friendship in general invaluable (of course I wouldn't be asking here if that was the sole reason), but I assume talking about the topics I care and think about will force me to flesh them out and keep me closer to Truth, and is a great source of novelty. So, from a purely practical standpoint (althou...
Yep, that's what I was trying to describe as well. Thanks!