Wiki-Tags in Need of Work

AI Control, in the context of AI Alignment, is a category of plans that aim to ensure safety and benefit from AI systems, even if they are goal-directed and actively trying to subvert your control measures. From The case for ensuring that powerful AIs are controlled... (read more)

Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning... (read more)

If you are new to LessWrong, the current iteration of this thread is the place to introduce yourself... (read more)

Repositories are pages that are meant to collect information and advice of a specific type or area from the LW community... (read more)

A threat model is a story of how a particular risk (e.g. AI) plays out... (read more)

A project announcement is what you might expect: an announcement of a project.
Posts that are about a project's announcement but do not themselves announce anything should not have this tag... (read more)

A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility. Such agents are also referred to as goal-seeking. The concept of a rational agent is used in economics, game theory, decision theory, and artificial intelligence... (read more)
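
As a toy illustration of this definition, here is a minimal sketch (the states, actions, beliefs, and payoffs are invented for illustration and are not part of the original entry):

    def expected_utility(action, belief, utility):
        """Average the utility of an action over the agent's beliefs about states."""
        return sum(prob * utility(state, action) for state, prob in belief.items())

    def choose_action(actions, belief, utility):
        """A rational agent takes the action with the highest expected utility."""
        return max(actions, key=lambda a: expected_utility(a, belief, utility))

    # Toy example: the agent is unsure whether it will rain.
    belief = {"rain": 0.3, "sun": 0.7}
    actions = ["take umbrella", "leave umbrella"]

    def utility(state, action):
        payoffs = {
            ("rain", "take umbrella"): 1, ("rain", "leave umbrella"): -5,
            ("sun", "take umbrella"): 0,  ("sun", "leave umbrella"): 2,
        }
        return payoffs[(state, action)]

    print(choose_action(actions, belief, utility))  # -> "take umbrella"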

Zettelkasten (German for "slip box") is a note-taking method popular amongst some LWers, and often praised for its scalability... (read more)

Recent Tag & Wiki Activity

Object-Level AI Risk Skepticism is the view that the potential risks posed by artificial intelligence (AI) are overstated or misunderstood, specifically regarding the direct, tangible dangers posed by the behavior of AI systems themselves. Skeptics of object-level AI risk argue that fears of highly autonomous, superintelligent AI leading to catastrophic outcomes are premature or unlikely.

Encultured AI is a for-profit public benefit corporation working to make AI safer and healthier for human beings.

Its current main strategy involves building a platform usable for AI safety and alignment experiments, comprising a suite of environments, tasks, and tools for building more environments and tasks.

Myopia refers to short-sightedness in planning and decision-making processes. It describes a tendency to prioritize immediate or short-term outcomes while disregarding longer-term consequences.

The most extreme form of myopia occurs when an agent considers only immediate rewards, completely disregarding future consequences. In artificial intelligence contexts, a perfectly myopic agent would optimize solely for the current query or task without attempting to influence future outcomes.

Myopic agents demonstrate several notable properties:
  • Limited temporal scope in decision-making
  • Focus on immediate reward optimization
  • Reduced instrumental incentives
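
As a rough sketch (not from the original entry; the reward numbers are invented), the contrast between a myopic and a far-sighted agent can be expressed with a discount factor gamma, where gamma = 0 is the perfectly myopic case:

    def discounted_return(rewards, gamma):
        """Sum of rewards weighted by the discount factor gamma."""
        return sum(r * gamma**t for t, r in enumerate(rewards))

    rewards = {
        "greedy":  [10, 0, 0, 0],   # large immediate reward, nothing later
        "patient": [1, 5, 5, 5],    # small now, more later
    }

    for gamma in (0.0, 0.9):        # gamma = 0 is the perfectly myopic case
        best = max(rewards, key=lambda a: discounted_return(rewards[a], gamma))
        print(f"gamma={gamma}: prefers {best!r}")
    # gamma=0.0: prefers 'greedy'   (only the immediate reward counts)
    # gamma=0.9: prefers 'patient'  (future rewards matter)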

Corrigibility is an AI system's capacity to be safely and reliably modified, corrected, or shut down by humans after deployment, even if doing so conflicts with its current objectives.

Within the field of machine learning, Reinforcement Learning is the study of how to train agents to complete tasks by updating ("reinforcing") the agents with feedback signals.
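
A minimal sketch of that feedback loop, under invented assumptions (a two-action bandit environment with made-up reward distributions, not any particular algorithm from the literature), might look like:

    import random

    actions = ["A", "B"]
    value = {a: 0.0 for a in actions}      # estimated value of each action
    counts = {a: 0 for a in actions}

    def reward(action):
        # Hypothetical environment: action "B" pays more on average.
        return random.gauss(1.0 if action == "A" else 2.0, 0.5)

    for step in range(1000):
        # Epsilon-greedy: mostly exploit the best estimate, sometimes explore.
        a = random.choice(actions) if random.random() < 0.1 else max(value, key=value.get)
        r = reward(a)
        counts[a] += 1
        value[a] += (r - value[a]) / counts[a]   # incremental average ("reinforcing" update)

    print(value)   # value["B"] should end up higher than value["A"]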

A Neuromorphic AI ('neuron-shaped') is a form of AI where most of the functionality has been copied from the human brain. This implies that its inner workings are not necessarily understood by the creators any further than is necessary to simulate them on a computer. It is considered a less safe form of AI than either Whole Brain Emulation or de novo AI, because it lacks the former's high-quality replication of human values and the latter's potential for good theoretical guarantees arising from a cleaner design.

Machine Learning is a general field of study that deals with automated statistical learning and pattern detection by non-biological systems. It can be seen as a sub-domain of artificial intelligence that specifically deals with modeling and prediction through the knowledge extracted from training data. As a multi-disciplinary area, it has borrowed concepts and ideas from other areas like pure mathematics and cognitive science.

Language Models are computer programs made to estimate the likelihood of a piece of text. "Hello, how are you?" is likely. "Hello, fnarg horses" is unlikely.
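
To make "likelihood of a piece of text" concrete, here is a toy sketch: the probability of a sentence is the product of each word's conditional probability given the words before it (the tiny probability table below is invented, not taken from any actual model):

    import math

    # Hypothetical conditional probabilities P(next_word | previous words).
    cond_prob = {
        ("<s>",): {"Hello,": 0.2},
        ("<s>", "Hello,"): {"how": 0.3, "fnarg": 0.0001},
        ("<s>", "Hello,", "how"): {"are": 0.6},
        ("<s>", "Hello,", "how", "are"): {"you?": 0.5},
        ("<s>", "Hello,", "fnarg"): {"horses": 0.001},
    }

    def log_likelihood(words):
        """Sum of log-probabilities of each word given its prefix."""
        total, context = 0.0, ("<s>",)
        for w in words:
            total += math.log(cond_prob[context][w])
            context = context + (w,)
        return total

    print(log_likelihood(["Hello,", "how", "are", "you?"]))   # relatively high
    print(log_likelihood(["Hello,", "fnarg", "horses"]))      # much lower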

The AI Questions Open Threads are a series of posts where people are welcome to post any AI questions, however basic.

AI Alignment Intro Materials are posts that help someone get oriented and skill up. They are distinct from AI Public Materials in that they are more "inward facing" than "outward facing", i.e. for people who are already sold that AI risk is a problem and want to upskill.

This tag is for explicit discussion of the organisation, not for all work published by researchers at that organisation.

The Machine Intelligence Research Institute, formerly known as the Singularity Institute for Artificial Intelligence (not to be confused with Singularity University) is a non-profit research organization devoted to reducing existential risk from unfriendly artificial intelligence and understanding problems related to friendly artificial intelligence. Eliezer Yudkowsky was one of the early founders and continues to work there as a Research Fellow. The Machine Intelligence Research Institute created and currently owns the LessWrong domain.

The Future of Life Institute, or FLI, is a nonprofit organization whose mission is to mitigate existential risks. Its most prominent activities are issuing grants to x-risk researchers and organizing conferences on AI and existential risk.

Website: futureoflife.org

The Future of Humanity Institute was part of the Faculty of Philosophy and the Oxford Martin School at the University of Oxford. Founded in 2005 and shut down in 2024, its director was Nick Bostrom. The mission of FHI was described on their website:

The Center for Human-Compatible AI is a research institute at UC Berkeley, founded and led by Stuart Russell. Its stated objective is to prevent building unfriendly AI by focusing research on provably beneficial behaviour.

The AI X-risk Research Podcast is a podcast hosted by Daniel Filan.

Website: axrp.net

AI Safety Camp (AISC) is a non-profit initiative that runs programs for diversely skilled researchers who want to collaborate on an open problem for reducing AI existential risk.

Slowing Down AI refers to efforts and proposals aimed at reducing the pace of artificial intelligence advancement to allow more time for safety research and governance frameworks. These initiatives can include voluntary industry commitments, regulatory measures, or coordinated pauses in development of advanced AI systems.