Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning.
Religion is a complex group of human activities, involving commitment to a higher power, belief in belief, and a range of shared group practices such as worship meetings, rites of passage, etc.
The arguments about which entities to include or exclude seem to contradict each other, or don't really justify their positions. Examples:
The only argument that seems to me to have force is "avoid a slap-fight over who gets to rule the world". The argument for excluding particular (plausibly-)moral patients is that if you try to include them, you might be conquered by someone else who doesn't include them, and get a worse ultimate outcome.
Summaries of discussions, takeaways, etc. from LessWrong meetups that have already taken place.
Inkhaven is a 30-day residency where one has to publish posts every day, as part of an effort to grow stronger as a writer. While this has produced some excellent posts, it also produces a fair bit of noise, and many more hastily-written or experimental posts than usual.
Inkhaven-like posts emerge when other people imitate this format on a smaller scale (e.g. Lightcone team members doing their own 1-week writing stints, or 'HalfHaven', where remote LessWrongers aim to publish 30 posts over the course of two months).
ML4Good is a France-based field-building organisation that runs AI Safety bootcamps.
Scalable oversight is an approach to the problem of providing reliable supervision of outputs from AIs, even as they become smarter than humans. Often groups of weaker AIs supervise a stronger AI, or AIs are set in a debate with each other.
Scalable oversight used to be referred to as a set of AI alignment techniques, but these techniques usually work at the level of incentives given to the AIs, and have less to do with model architecture.
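To make the debate setup concrete, here is a minimal sketch in Python. It is illustrative only: `Model`, `run_debate`, and the prompt format are invented for this example and are not any particular lab's protocol.

```python
from typing import Callable

# A "model" here is just a function from a prompt to a text reply.
Model = Callable[[str], str]

def run_debate(question: str, debater_a: Model, debater_b: Model,
               judge: Model, rounds: int = 3) -> str:
    """Two strong debaters argue opposite answers; a weaker judge,
    who sees only the transcript, picks the more convincing side."""
    transcript = f"Question: {question}\n"
    for r in range(1, rounds + 1):
        transcript += f"A (round {r}): {debater_a(transcript)}\n"
        transcript += f"B (round {r}): {debater_b(transcript)}\n"
    # The judge never checks ground truth directly; the hope behind
    # debate is that honest positions are easier to defend at length.
    return judge(transcript + "Which debater argued correctly, A or B?")
```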
Interp on DeepSeek's mHC architecture
For the purposes of Agent Foundations, Payor's lemma has been proposed as an alternative to Löb's theorem, both because it is simpler and because it may admit a probabilistic generalization in a way that breaks down for Löb's theorem. If this works out, it would give agents a way to do a probabilistic version of logical-decision-theory-style reasoning, such as cooperating in the Prisoner's Dilemma when given each other's source code, this time with uncertainty.
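For comparison, here are the two statements in the usual provability-logic shorthand, where □x abbreviates Prv(x); this is the standard rendering of both results:

```latex
% Löb's theorem:
\text{if } \vdash \Box x \to x, \text{ then } \vdash x.
% Payor's lemma:
\text{if } \vdash \Box(\Box x \to x) \to x, \text{ then } \vdash x.
```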
Löb's theorem states that, given any statement P, if Peano Arithmetic (PA for short) proves that it can be 'trusted' if it proves P (that is, that Prv(P) implies P), then it actually just proves P. This means that PA cannot tell you that it can be trusted about P, unless it also just tells you P. It also holds for theories that contain PA.
As a consequence, whenever we try to prove a statement P, we can go ahead and just assume that P is provable, and then see if we can show that this assumption implies that P is true. This might sound really stupid and contradictory at first glance: the important thing is to be really clear about what is proving what. In the condition, PA is saying that if it proves P, then P is true. In 'our' view (that is, in a metatheory), we see that if PA says that, then PA will also say that P is true.
It became much less important later, after the invention/discovery of the Garrabrant Inductor. There is also work on using the similar Payor's Lemma, which possibly allows for a probabilistic version in a way that breaks for Löb's theorem.
In formal notation, let Prv stand for the standard provability predicate of PA, so that Prv(T) is true if and only if there is a proof of T from the axioms and rules of inference of PA. What we would like PA to say is that Prv(S)⟹S for every sentence S.
But alas, PA suffers from a problem of self-trust.
Löb's theorem states that if PA⊢Prv(S)⟹S then PA⊢S. This immediately implies that if PA is consistent, the sentences Prv(S)⟹S are not provable in PA when S is false, even though according to our intuitive understanding of the standard model every sentence of this form must be true.
Thus, PA is incomplete, and fails to prove a particular set of sentences that would massively increase our confidence in it.
Notice that Gödel's second incompleteness theorem follows immediately from Löb's theorem: if PA is consistent, then by Löb's theorem PA⊬(Prv(0=1)⟹0=1), which by the propositional calculus implies PA⊬¬Prv(0=1).
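Spelling that argument out step by step, in the same notation:

```latex
\begin{align*}
&\text{Suppose } PA \vdash \neg Prv(0{=}1). \\
&\text{Since } \neg A \text{ propositionally implies } A \to B,
  \text{ we get } PA \vdash Prv(0{=}1) \implies 0{=}1. \\
&\text{By Löb's theorem with } S = (0{=}1), \text{ it follows that } PA \vdash 0{=}1, \\
&\text{contradicting consistency. Hence } PA \nvdash \neg Prv(0{=}1).
\end{align*}
```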
Causal relationships are usually formalized as a directed acyclic graph from parent events to child events, together with a rule saying how to compute the probable state of each child given the state of its parents.
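A minimal sketch of that formalization in Python, using the classic rain/sprinkler/wet-grass example; the structure and all the numbers are invented for illustration:

```python
# Minimal sketch of a causal DAG: each node's distribution is computed
# from the state of its parents. Structure and numbers are illustrative.

# P(rain)
P_RAIN = 0.2

# P(sprinkler | rain): rain makes running the sprinkler less likely.
P_SPRINKLER = {True: 0.01, False: 0.4}

# P(wet grass | sprinkler, rain): a child given the state of its parents.
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}

def p_wet_grass() -> float:
    """Marginal P(wet grass), summing over parent states in the DAG."""
    total = 0.0
    for rain in (True, False):
        p_r = P_RAIN if rain else 1 - P_RAIN
        for sprinkler in (True, False):
            p_s = P_SPRINKLER[rain] if sprinkler else 1 - P_SPRINKLER[rain]
            total += p_r * p_s * P_WET[(sprinkler, rain)]
    return total

print(f"P(wet grass) = {p_wet_grass():.3f}")
```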
By Ruthenis (summarized; includes level 0):
The main problems with CEV include, firstly, the great difficulty of implementing such a program - “If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.” Secondly, the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004. He states that there's a "principled distinction between discussing CEV as an initial dynamic of Friendliness, and discussing CEV as a Nice Place to Live" and his essay was essentially conflating the two definitions.
Hey everyone! My name's Rishi. Hoping to explore more of the Rationalist community and float some of my ideas. Any initial reading recs? I'm mostly interested in the relation of rationalism to metaphysics.