Thirder here (with acknowledgement that the real answer is to taboo 'probability' and figure out why we actually care)
The subjective indistinguishability of the two Tails wakeups is not a counterargument - it's part of the basic premise of the problem. If the two wakeups were distinguishable, being a halfer would be the right answer (for the first wakeup).
Your simplified examples/analogies really hinge on that question of distinguishability. Since you didn't specify whether the wakeups are distinguishable in your examples, the payoff structure could go either way.
I'll al...
Thanks for sharing that study. It looks like your team is already well-versed in this subject!
You wouldn't want something that's too hard to extract, but I think restricting yourself to a single encoder layer is too conservative - LLMs don't have to be able to fully extract the information from a layer in a single step.
I'd be curious to see how much closer a two-layer encoder would get to the ITO results.
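For concreteness, here's a minimal PyTorch sketch of the kind of two-layer encoder I have in mind (the names and layer sizes are mine, not from the paper). The decoder stays a single linear map so the dictionary directions remain directly interpretable, and the training loss (reconstruction plus L1 on the feature activations) can stay the same.

```python
import torch
import torch.nn as nn

class TwoLayerEncoderSAE(nn.Module):
    """Sparse autoencoder with a two-layer encoder and a single linear decoder."""

    def __init__(self, d_model: int, d_hidden: int, d_dict: int):
        super().__init__()
        # Two-layer encoder: one extra nonlinear step to extract features
        # before the sparse code is produced.
        self.encoder = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_dict),
            nn.ReLU(),
        )
        # Single linear decoder: each column is still an interpretable direction.
        self.decoder = nn.Linear(d_dict, d_model, bias=True)

    def forward(self, x: torch.Tensor):
        f = self.encoder(x)       # sparse feature activations
        x_hat = self.decoder(f)   # reconstruction
        return x_hat, f
```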
Here's my longer reply.
I'm extremely excited by the work on SAEs and their potential for interpretability. However, I think there is a subtle misalignment between the SAE architecture and loss function on the one hand, and the actual desired objective on the other.
The SAE loss function is:
$$\mathcal{L}(x) = \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert f(x) \rVert_1,$$
where $\lVert \cdot \rVert_1$ is the $L_1$-norm, or, writing the decoder out explicitly,
$$\mathcal{L}(x) = \lVert x - W_d f(x) - b_d \rVert_2^2 + \lambda \lVert f(x) \rVert_1.$$
However, I would argue that what you are actually trying to solve is the sparse coding problem:
$$\min_{D,\,z} \lVert x - D z \rVert_2^2 \quad \text{subject to} \quad \lVert z \rVert_0 \le k,$$
where, imp...
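To make the contrast concrete, here is a toy numpy sketch of my own (the dictionary $D$, code $z$, sizes, and $\lambda$ are all illustrative): even a perfectly reconstructing $k$-sparse code still pays an $L_1$ penalty under the SAE loss, whereas it is exactly optimal for the $L_0$-constrained sparse coding objective.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_features, k = 64, 512, 8             # illustrative sizes
D = rng.normal(size=(d, n_features))       # dictionary (decoder directions)
D /= np.linalg.norm(D, axis=0)             # unit-norm columns
z = np.zeros(n_features)
z[rng.choice(n_features, k, replace=False)] = rng.normal(size=k)  # k-sparse code
x = D @ z                                  # signal that z reconstructs exactly
lam = 1e-2

# What the SAE optimizes: reconstruction error plus an L1 penalty on the code.
sae_loss = np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))

# What (I'd argue) we actually want: best reconstruction subject to an L0 constraint.
sparse_coding_value = np.sum((x - D @ z) ** 2)   # 0 here
l0_feasible = np.count_nonzero(z) <= k           # True here

print(sae_loss, sparse_coding_value, l0_feasible)
```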
This is great work. My recommendation: add a term in your loss function that penalizes features with high cosine similarity.
I think there is a strong theoretical underpinning for the results you are seeing.
I might try to reach out directly - some of my own academic work is directly relevant here.
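Concretely, the kind of term I have in mind looks something like the sketch below (PyTorch; the `beta` weight and the choice to penalize decoder columns are illustrative assumptions). It would simply be added to the existing reconstruction-plus-sparsity loss.

```python
import torch
import torch.nn.functional as F

def cosine_similarity_penalty(decoder_weight: torch.Tensor, beta: float = 1e-3) -> torch.Tensor:
    """Penalize pairs of dictionary features with high cosine similarity.

    decoder_weight: (d_model, n_features) matrix whose columns are feature directions.
    """
    # Normalize each feature direction to unit length.
    dirs = F.normalize(decoder_weight, dim=0)
    # Pairwise cosine similarities between features.
    sims = dirs.T @ dirs
    # Zero out the diagonal (self-similarity) and penalize everything else.
    off_diag = sims - torch.eye(sims.shape[0], device=sims.device)
    return beta * (off_diag ** 2).sum()
```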
This is one of those cases where it might be useful to list out all the pros and cons of taking the 8 courses in question, and then think hard about which benefits could be achieved by other means.
Key benefits of taking a course (vs. independent study) beyond the signaling effect might include:
But these depend on the courses and your personality. The precommi...
Instead of demanding orthogonal representations, just have them obey the restricted isometry property.
Basically, instead of requiring $\langle f_i, f_j \rangle = 0$ for all $i \neq j$, we just require $(1-\delta)\lVert z \rVert_2^2 \le \lVert F z \rVert_2^2 \le (1+\delta)\lVert z \rVert_2^2$ for all sufficiently sparse $z$ (where $F$ is the matrix of feature directions).
This would allow a polynomial number of sparse shards while still allowing full recovery.
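As a quick numpy illustration of my own (sizes, sparsity level, and the number of samples are made up), this estimates how much a random overcomplete set of feature directions distorts the norm of sparse combinations, which is exactly the quantity the restricted isometry property bounds:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_features, k = 128, 1024, 5                       # many more features than dimensions
F = rng.normal(size=(d, n_features)) / np.sqrt(d)      # random feature directions

worst = 0.0
for _ in range(1000):
    # Random k-sparse coefficient vector.
    z = np.zeros(n_features)
    idx = rng.choice(n_features, k, replace=False)
    z[idx] = rng.normal(size=k)
    # How far does the feature map distort the norm of the sparse vector?
    distortion = abs(np.sum((F @ z) ** 2) / np.sum(z ** 2) - 1.0)
    worst = max(worst, distortion)

print(f"worst observed distortion over sparse vectors: {worst:.3f}")
```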
I think the success or failure of this model really depends on the nature and number of the factions. If interfactional competition gets too zero-sum (this might help us, but it helps them more, so we'll oppose it) then this just turns into stasis.
During ordinary times, vetocracy might be tolerable, but it will slowly degrade state capacity. During a crisis it can be fatal.
Even in America, we only see this factional veto in play in a subset of scenarios - legislation under divided government. Plenty of activity at the executive level or in state governments doesn't face this constraint.
You switch positions throughout the essay, sometimes in the same sentence!
"Completely remove efficacy testing requirements" (Motte) "... making the FDA a non-binding consumer protection and labeling agency" (Bailey)
"Restrict the FDA's mandatory authority to labeling" logically implies they can't regulate drug safety, and can't order recalls of dangerous products. Bailey! "... and make their efficacy testing completely non-binding" back to Motte again.
"Pharmaceutical manufactures can go through the FDA testing process and get the official “approved’ label i...
This is a Motte and Bailey argument.
The Motte is 'remove the FDA's ability to regulate drugs for efficacy'
The Bailey is 'remove the FDA's ability to regulate drugs at all'
The FDA doesn't just regulate drugs for efficacy; it regulates them for safety too. This undercuts your arguments about off-label prescriptions, since those drugs were still approved for use by the FDA as safe.
Relatedly, I'll note you did not address Scott's point on factory safety.
If you actually want to make the hardline position convincing, you need to clearly state and defend that the FDA should not regulate drugs for safety.
The differentiation between CDT as a decision theory and FDT as a policy theory is very helpful in dispelling confusion. Well done.
However, why do you consider EDT a policy theory? It's just picking actions with the highest conditional utility. It does not model a 'policy' in the optimization equation.
Also, the ladder analogy here is unintuitive.
This doesn't make sense to me. Why am I not allowed to update on still being in the game?
I noticed that in your problem setup you deliberately removed n=6 from being in the prior distribution. That feels like cheating to me - it seems like a perfectly valid hypothesis.
After seeing the first chamber come up empty, that should definitively update me away from n=6. Why can't I update away from n=5?
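To spell out the update I have in mind, here is a toy Python sketch. I'm assuming the setup is a six-chamber revolver with n live rounds and a uniform prior over n; adjust the prior and likelihood if the actual setup differs.

```python
from fractions import Fraction

# Hypotheses: n live rounds in a 6-chamber revolver, uniform prior over n = 0..6.
hypotheses = range(7)
prior = {n: Fraction(1, 7) for n in hypotheses}

# Likelihood of the first chamber coming up empty under each hypothesis.
likelihood = {n: Fraction(6 - n, 6) for n in hypotheses}

# Bayes: posterior proportional to prior times likelihood.
unnormalized = {n: prior[n] * likelihood[n] for n in hypotheses}
total = sum(unnormalized.values())
posterior = {n: p / total for n, p in unnormalized.items()}

for n in hypotheses:
    # n=6 drops to exactly zero; n=5 is down-weighted but not eliminated.
    print(n, posterior[n])
```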
Counterpoint, robotaxis already exist: https://www.nytimes.com/2023/08/10/technology/driverless-cars-san-francisco.html
You should probably update your priors.
Nope.
According to the CDC pulse survey you linked (https://www.cdc.gov/nchs/covid19/pulse/long-covid.htm), the metrics for long covid are trending down. This includes the 'currently experiencing', 'any limitations', and 'significant limitations' categories.
I agree that the type of rationalization you've described is often practically rational. And it's at most a minor crime against epistemic rationality. If anything, the epistemic crime here is not anticipating that your preferences will change after you've made a choice.
However, I don't think this case is what people have in mind when they critique rationalization.
The more central case is when we rationalize decisions that affect other people; for example, Alice might make a decision that maximizes her preferences and disregards Bob's, but after the fact s...
I don't see how this is more of a risk for a shutdown-seeking goal, than it is for any other utility function that depends on human behavior.
If anything, the right move here is for humans to commit to immediately complying with plausible threats from the shutdown-seeking AI (by shutting it down). Sure, this destroys the immediate utility of the AI, but on the other hand it drives a very beneficial higher level dynamic, pushing towards better and better alignment over time.
That assumption literally changes the nature of the problem, because the offer to bet is information that you are using to update your posterior probability.
You can repair that problem by always offering the bet and ignoring one of the bets on tails. But of course that feels like cheating - I think most people would agree that if the odds makers are consistently ignoring bets on one side, then the odds no longer reflect the underlying probability.
Maybe there's another formulation that gives 1:1 odds, but I can't think of it.
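For what it's worth, here is the quick simulation I keep coming back to (Python; the scheme where every wakeup is offered the same bet is my assumption). Per-wakeup accounting gives roughly 2:1 against Heads, while per-experiment accounting gives 1:1, which is exactly where I think the different formulations diverge.

```python
import random

random.seed(0)
trials = 100_000
heads_wakeups = tails_wakeups = 0
heads_experiments = 0

for _ in range(trials):
    coin = random.choice(["H", "T"])
    if coin == "H":
        heads_wakeups += 1      # Heads: one wakeup, one bet offered
        heads_experiments += 1
    else:
        tails_wakeups += 2      # Tails: two indistinguishable wakeups, two bets offered

total_wakeups = heads_wakeups + tails_wakeups
print("P(Heads) per wakeup:    ", heads_wakeups / total_wakeups)   # ~1/3
print("P(Heads) per experiment:", heads_experiments / trials)      # ~1/2
```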
To the second point, because humans are already general intelligences.
But more seriously, I think the monolithic AI approach will ultimately be uncompetitive with modular AI for real-life applications. Modular AI dramatically reduces the search space. And I would contend that prediction about complex real-life systems on long timescales will always be data-starved. Therefore being able to reduce your search space will be a critical competitive advantage, and worth the hit from having suboptimal interfaces.
Why is this relevant for alignment? Because y...
I take issue with the initial supposition:
My weak prediction is that adding low levels of noise would change the polysemantic activations, but not the monosemantic ones.
Adding L1 to the loss allows the network to converge on solutions that are more monosemantic than otherwise, at the cost of some estimation error. Basically, the network is less likely to lean on polysemantic neurons to make up small errors. I think your best bet is to apply the L1 loss on the hidden layer and the output layer activations.
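Something like the following toy PyTorch sketch is what I mean (the architecture, data, and lambda values are placeholders, not your actual setup), with the L1 penalty applied to both the hidden-layer and output-layer activations:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32), nn.ReLU())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lam_hidden, lam_out = 1e-3, 1e-3

x = torch.randn(256, 32)             # placeholder batch
target = x                           # e.g. an autoencoding-style objective

hidden = model[1](model[0](x))       # hidden-layer activations (post-ReLU)
out = model[3](model[2](hidden))     # output-layer activations (post-ReLU)

loss = ((out - target) ** 2).mean() \
    + lam_hidden * hidden.abs().mean() \
    + lam_out * out.abs().mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```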
Great stuff!
Do you have results with noisy inputs?
The negative bias lines up well with previous sparse coding implementations: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=JHuo2D0AAAAJ&citation_for_view=JHuo2D0AAAAJ:u-x6o8ySG0sC
Note that in that research, the negative bias has a couple of meanings/implications:
Along those lines, you might be able to further improve m...
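One way to see the connection (a toy numpy illustration of my own, not taken from the linked work): a ReLU with a negative bias is exactly the nonnegative soft-threshold/shrinkage step used in classic sparse coding algorithms like ISTA, so small pre-activation values are zeroed out instead of contributing noise.

```python
import numpy as np

def relu_with_negative_bias(u, bias=-0.5):
    """ReLU(u + bias) with bias < 0: small inputs are zeroed, large ones shifted down."""
    return np.maximum(u + bias, 0.0)

def soft_threshold_nonneg(u, theta=0.5):
    """The nonnegative soft-threshold (shrinkage) step from ISTA-style sparse coding."""
    return np.maximum(u - theta, 0.0)

u = np.linspace(-1.0, 2.0, 7)
print(relu_with_negative_bias(u))    # identical outputs...
print(soft_threshold_nonneg(u))      # ...to the shrinkage step
```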
I have some technical background in neuromorphic AI.
There are certainly things that the current deep learning paradigm is bad at which are critical to animal intelligence: e.g. power efficiency, highly recurrent networks, and complex internal dynamics.
It's unclear to me whether any of these are necessary for AGI. Something, something executive function and global workspace theory?
I once would have said that feedback circuits used in the sensory cortex for predictive coding were a vital component, but apparently transformers can do similar tasks using purel...
In my model, Chevron and the US military are probably open to AI governance, because: 1 - they are institutions traditionally enmeshed in larger cooperative/rule-of-law systems, AND 2 - their leadership is unlikely to believe they can do AI 'better' than the larger AI community.
My worry is instead about criminal organizations and 'anti-social' states (e.g. North Korea) because of #1, and big tech because of #2.
Because of location, EA can (and should) make decent connections with US big tech. I think the bigger challenge will be tech companies in other countries, especially China.
I published an article on induction https://www.lesswrong.com/posts/7x4eGxXL5DMwRwzDQ/commensurable-scientific-paradigms-or-computable-induction of decent length/complexity that seems to have gotten no visibility at all, which I found very discouraging for my desire to ever do so again. I could only find it by checking my user profile!
I'm downvoting this, not because it's wrong or because of weak epistemics, but because politics is the mind killer, and this article is deliberately structured to make that worse.
I believe politically sensitive topics like this can be addressed on LessWrong, but the inflammatory headline and first sentence here are just clickbait.
Articles are hard! I was lucky enough to be raised bilingual, so I'm somewhat adept at navigating between different article schemes. I won't claim these are hard and fast rules in English, but:
1 - 'Curiosity' is an abstract noun (e.g. liberty, anger, parsimony). These generally don't have articles, unless you need some reason to distinguish between subcategories (e.g. 'the liberty of the yard' vs. 'the liberty of the French')
2 - 'Context' can refer to either a specific context (e.g. 'see in the proper context'), in which case the articles are included, or...
I'm confused.
In the counterfactual where lesswrong had the epistemic and moderation standards you desire, what would have been the result of the three posts in question, say three days after they were first posted? Can you explain why, using the standards you elucidated here?
(If you've answered this elsewhere, I apologize).
Full disclosure: I read all three of those posts, and downvoted the third post (and only that one), influenced in part by some of the comments to that post.
The three posts would all exist.
The first one would be near zero, karmawise, and possibly slightly negative. It would include a substantial disclaimer, up front, noting and likely apologizing for the ways in which the first draft was misleading and underjustified. This would be a result of the first ten comments containing at least three highly upvoted ones pointing that out, and calling for it.
The second post would be highly upvoted; Zoe's actual writing was well in line with what I think a LWer should upvote. The comments would contain ...
"However there’s definitely an additional problem, which is that the fees are going to the city."
Money which the city could presumably use to purchase scarce and vital longshoreman labor.
The city is getting a windfall because it owns a scarce resource. Would you consider this a problem if the port were privately owned?
What Ryan is calling punishment is just an ECON 101 cost increase.
I'm actually ok with the social pressures inherent in the activity. It's a subtle reminder of the real influence of this community. The fact that this community would enforce a certain norm makes me more likely to be a conscientious objector in contexts with the opposite norm. (This is true of historical C.O.s, who often come from religious communities).
I'd highly recommend 'The Bomber Mafia' by Malcolm Gladwell on this subject, which details the internal debates of the US Army Air Corps generals during WWII.
One of the key questions was whether to use the bombers to target strategic industries, or just for general attrition (i.e. firebombing of civilians). Obviously the first one would have been preferable from a humanitarian perspective (and likely would have ended the European War sooner), but it was very difficult to execute in practice.
I think the Bob example is very informative! I think there's an intuitive and logical reason why we think Bob and Edward are worse off. Their happiness is contingent on the masquerade continuing, which has a probability less than one in any plausible setup.
(The only exception to this would be if we're analyzing their lives after they are dead)
Yes, I was completely turned off from 'debate' as a formal endeavor as a high schooler, despite my love for informal debate.
One of the main problems is that debate contests are usually formulated as zero sum, whereas the typical informal debate I engage in is not.
Do you know of any formats for nonzero sum debate competitions where the competitors argue points they actually believe in? e.g. both debaters get more points if they identify a double-crux, and you win by having more points in the tournament as a whole, not by beating your opponent.
I believe that determinism and free will are both good models of reality, albeit at different conceptual levels.
Human brains are high dimensional chaotic systems. I believe that if you put a very smart human in a task that demands creativity and insight, it will be extremely difficult to predict what they'll do, even if you precisely knew their connectome and data inputs. Maybe that's not the same thing as a philosophical "free will", but I don't see how it would result in a different end experience.
Even in a scenario where all unvaccinated people were infected with covid, I would expect none of the Georgetown undergraduates to die from covid or to have covid for longer than 12 weeks.
Here's my Fermi analysis:
That gives us .03 x .01 x .1, for a case really long covid rate of .00003. .00003 x 6532 = .2 really long cov...
He recommends that for communities, which presumably include significant numbers of unvaccinated folks. That, if targeted to N95 or better masks and actually enforced, could have a substantial effect!
But having members of the least infectious subpopulation voluntarily mask is pretty much useless.
As to your second point, there is strong evidence that this is not the case: https://pubmed.ncbi.nlm.nih.gov/34250518/ Vaccinated individuals who get infected have substantially lower viral loads, and thus are substantially less contagious.
You reach the opposite conclusion from Tomas Pueyo (who seems to be your primary reference):
"If you’re vaccinated, you’re mostly safe, especially with mRNA vaccines. Keep your guard up for now, avoid events that might become super-spreaders, but you don’t need to worry much more than that."
Checking your math, I think your biggest error is equating long covid (at least one symptom still present after 28 days) with lifelong CFS. The vast majority seem to clear up in the next 8 weeks: https://www.nature.com/articles/s41591-021-01292-y
I believe the 64% reducti...
There are, but what does having a length below 10^90 have to do with the Solomonoff prior? There's no upper bound on the length of programs.
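To make that concrete, here is the standard form of the Solomonoff prior (notation may differ slightly from whichever source you're using); the sum runs over programs of every length, with no cutoff:

```latex
% Solomonoff prior of a string x: sum over all programs p (of any length)
% whose output on a universal prefix machine U begins with x.
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}
% Longer programs are down-weighted by 2^{-\ell(p)}, but never excluded,
% so a bound like 10^90 on program length plays no role in the definition.
```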