Escaque 66 — LessWrong

FLEXIBLE AND ADAPTABLE LLMs WITH CONTINUOUS SELF-TRAINING

CURRENT INFLEXIBLE MODELS

Current LLMs have serious limitations in adaptability, flexibility, and continuous learning. The knowledge and world model contained in an LLM's parameters are fixed once training completes. Interactions at inference time do not change what the model knows:

Factual knowledge is outdated: limited to the last date of the training data.
No learning from conversations: new knowledge obtained in interactions isn't incorporated into the model.
Limited user adaptation: customization for users relies on side memory, which only modifies pre-prompts without internalizing changes.
Static in specific use cases: models do not learn from day-to-day interactions in their designated environments.
No learning from physical interactions: models used in physical environments (e.g., “robots”) do not learn from

...

Replying to fMRI LIKE APPROACH TO AI ALIGNMENT / DECEPTIVE BEHAVIOUR

Escaque 66 2y

For work implementing this idea, see: https://www.anthropic.com/index/decomposing-language-models-into-understandable-components

Replying to fMRI LIKE APPROACH TO AI ALIGNMENT / DECEPTIVE BEHAVIOUR

Escaque 66 3y

Thank you for your comment, Zac. The links you suggest will help me check whether this kind of analysis has been tried. So far I have only seen studies aimed at interpreting specific neurons or areas of a model, not a statistical analysis of the whole model that can raise an alert when the model is using areas previously associated with negative behaviors.
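
To make the idea of a whole-model statistical alert a bit more concrete, here is a toy sketch, not a description of any existing tool or of the linked work. It assumes activation vectors have already been recorded from runs labeled as showing the unwanted behavior and from benign runs. The function names (`fit_direction`, `raises_alert`) and the numbers are hypothetical and made up for illustration. The sketch takes the difference between the two mean activation patterns as a direction and raises an alert when a new activation projects past the midpoint between the two groups.

```python
from statistics import mean


def mean_vector(rows):
    """Column-wise mean of equally sized activation vectors."""
    return [mean(col) for col in zip(*rows)]


def fit_direction(flagged, benign):
    """Return (direction, threshold) from two sets of recorded activations.

    direction: mean(flagged) - mean(benign), one entry per unit.
    threshold: projection of the midpoint between the two means.
    """
    mu_flagged = mean_vector(flagged)
    mu_benign = mean_vector(benign)
    direction = [f - b for f, b in zip(mu_flagged, mu_benign)]
    midpoint = [(f + b) / 2 for f, b in zip(mu_flagged, mu_benign)]
    threshold = sum(d * m for d, m in zip(direction, midpoint))
    return direction, threshold


def raises_alert(activation, direction, threshold):
    """True when the activation lies on the 'flagged' side of the midpoint."""
    score = sum(d * a for d, a in zip(direction, activation))
    return score > threshold


# Hypothetical toy activations (3 units each); real models have far more.
flagged = [[0.9, 0.1, 0.8], [1.0, 0.2, 0.7]]
benign = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.0]]

direction, threshold = fit_direction(flagged, benign)
alert_on_similar = raises_alert([0.8, 0.1, 0.9], direction, threshold)
alert_on_benign = raises_alert([0.1, 0.1, 0.1], direction, threshold)
print(alert_on_similar, alert_on_benign)
```

A difference-of-means direction is only the simplest possible choice. It ignores correlations between units and says nothing about whether the labeled runs actually capture deception, which is the hard part.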

fMRI LIKE APPROACH TO AI ALIGNMENT / DECEPTIVE BEHAVIOUR

Escaque 66

fMRI (functional magnetic resonance imaging) is a technique for studying the human brain by detecting changes in blood flow while a subject performs certain tasks. Blood flow is an indicator of energy consumption, so these changes indicate which areas of the brain are associated with a given activity.

AI alignment is an open area of research that tries to ensure that an AI system's goals and behaviours are aligned with human values. Among the problems that may arise in an advanced AI system (specification, robustness, interpretability…), one of the main concerns is the possibility that it may attempt to hide its real intentions. This is generally referred to as "concealed intent" or "deceptive...
