All of David Bravo's Comments + Replies

I partly support the spirit behind this feature, of providing more information (especially to the commenter), making readers more engaged and involved, and expressing a reaction with more nuance than a mere upvote/downvote. I also like that, as with karma, there are options for negative (but constructive) feedback, which I mentioned here when reviewing a different social discussion platform that had only positive reactions such as "Aha!" and "clarifying".

In another sense, I suspect (but could be wrong) that this extra information could also have ... (read more)

5jimrandomh
I agree this is a concern. At an earlier stage of this prototype, the reactions were at the top with the rest of the voting buttons; moving them to the bottom was done partially to reduce the chance that you see the reactions before you've read it.

Certainly; it wasn't my intention to make it seem like an 'either-or'. I believe there's a lot of room for imported quality teaching, and a fairly well-educated volunteer might be better at teaching than the average local teacher. I didn't find the way they taught there very effective: a lot of repeating the teacher's words, no intuition built for maths or physics… I think volunteers could certainly help with that, as well as by teaching the subjects they are more proficient in than the local teachers (e.g. English). I agree there is the potential to use volunteers in a variety of ways to raise the level of education, and also to try to make the changes permanent once the volunteers leave.

Strong upvote. I found that almost every sentence was extremely clear and conveyed a transparent mental image of the argument being made. Many times I found myself saying "YES!" or "This checks out" as I read a new point.

That might involve not working on a day you’ve decided to take off even if something urgent comes up; or deciding that something is too far out of your comfort zone to try now, even if you know that pushing further would help you grow in the long term.

I will add that, for many routine activities or personal dilemmas with short- and long-ter... (read more)

2Sweetgum
This is an interesting way of looking at it. To elaborate a bit, one day of working toward a long-term goal is essentially useless, so you will only do it if you believe that your future selves will as well. This is part of where the old "You need to believe in yourself to do it!" advice comes from. But there can also be good reasons not to believe in yourself.

In the context of the iterated Prisoner's Dilemma, researchers have investigated how high the frequency of random errors (the decision to cooperate or defect being replaced with a random one in x% of instances) can go before cooperation breaks down. (I'll try to find a citation for this later.) This seems similar, though not literally equivalent, to a question we might ask here: what frequency of random motivational lapses can be tolerated before the desire to work towards the goal breaks down entirely?

Naturally, the goals that require the most trust are the ones that provide no benefit until the end, because for them to be worth working towards at all, you have to trust that your future selves won't permanently give up on the goal anywhere between now and the end. But most long-term goals aren't really like this. They can be seen as falling on a spectrum between providing no benefit until a certain point and providing linear benefit the more they are worked towards, with the "goal" point being arbitrary. (This is analogous to the concept of a learning curve.) Actions towards a goal may also provide an immediate benefit as well as progress toward the goal, which reduces the need to trust your future selves. If you don't trust your future selves very much, you can seek out "half-measure" actions that sacrifice some efficiency toward the goal for immediate benefits, but still contribute some progress. You can to some extent choose where your actions fall along this spectrum, but you are also limited by the types of actions available to you.
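
As an illustration of the iterated-Prisoner's-Dilemma point above, here is a minimal simulation sketch (the function name and parameter values are mine, not from the comment): two tit-for-tat players whose intended moves are flipped at random with some error rate, with mutual cooperation measured as that rate grows.

```python
import random

def play_iterated_pd(error_rate, rounds=1000, seed=0):
    """Two tit-for-tat players; each intended move is flipped with probability
    `error_rate`. Returns the fraction of rounds with mutual cooperation."""
    rng = random.Random(seed)
    last_a, last_b = "C", "C"            # tit-for-tat opens by cooperating
    mutual = 0
    for _ in range(rounds):
        move_a, move_b = last_b, last_a  # each copies the opponent's last move
        if rng.random() < error_rate:    # random execution error for player A
            move_a = "D" if move_a == "C" else "C"
        if rng.random() < error_rate:    # random execution error for player B
            move_b = "D" if move_b == "C" else "C"
        if move_a == "C" and move_b == "C":
            mutual += 1
        last_a, last_b = move_a, move_b
    return mutual / rounds

for e in (0.0, 0.01, 0.05, 0.10, 0.25):
    print(f"error rate {e:.2f}: mutual cooperation {play_iterated_pd(e):.2f}")
```

Even small error rates degrade cooperation sharply for plain tit-for-tat, since a single flipped move echoes back and forth between the players; more forgiving strategies tolerate higher error rates before cooperation collapses.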

Thank you very much for this sequence. I knew fear was a great influence on (or impediment to) my actions, but I hadn't given it such a concrete form, and especially a weapon (= excitement) to combat it, until now.

Following matto's comment, I went through the Tuning Your Cognitive Strategies exercise, spotting microthoughts and extracting the cognitive strategies and deltas between those microthoughts. When evaluating a possible action, the (emotional as much as cognitive) delta "consider action X -> tiny feeling in my chest or throat -> meh, I'm not ... (read more)

Thank you very much for this post, I find it extremely valuable.

I also find it especially helpful for this community, because it touches on what I believe are two main sources of existential dread and anxiety that might be common among LWers:

  • Doom itself (end of life and Earth, our ill-fated plans, not being able to see our children or grandchildren grow, etc.), and
  • uncertainty over one's (in)ability to prevent the catastrophe (can I do better? Even if it's unlikely I will be the hero or make a difference, isn't it worth wagering everything on this tiny pos
... (read more)

As others have pointed out, there's a difference between a) problems to be tackled for the sake of the solution, and b) problems to be tackled for the sake (or fun) of the problem. Humans like challenges and puzzles, and like to solve things themselves rather than having the answers handed down to them. Global efforts to fight cancer can be inspiring, and I would guess a motivation for most medical researchers is their own involvement in that very process. But if we could push a button to eliminate cancer forever, no sane person would refuse to press it.

I think we should ... (read more)

The phases you mentioned in learning anything seem especially relevant for sports.

1.  To have a particular kind of feelings (felt senses) that represent something (control, balance, singing right, playing the piano right, everything being done)
2.  A range of intensity that we should keep that felt sense in, in some given context (either trying to make sure we have some positive feeling, or that we avoid some negative feeling)
3.  Various strategies for keeping it within that range

Below the surface, every sport is an extremely complex ... (read more)

4Kaj_Sotala
Great example, thanks! Yeah, it's a very common thing that there are two opposite strategies one can hit upon in order to deal with these kinds of situations, one of them "approach/do" and the other "avoid/don't do". In this case, "do something bad" vs. "avoid doing anything bad". It looks to me like the brain has a tendency to generate both options, and then some combination of external consequences and the person's past history and genetic disposition decides which of those strategies becomes predominant.

People may also have the issue that both a "do" and a "don't do" strategy have become reinforced for them, so some situations may trigger both and lead to significant internal conflict. (E.g. the example in the text about a person alternately seeking romantic connection and then wanting to get out of a relationship is a description of a case where "approach" and "avoid" strategies alternate in getting triggered.)

It also seems like a relevant difference that in your case, you knew what would trigger your sister's accusations. Whereas in the case I was thinking of, the person was genuinely confused about where the original accusation came from (they were accused of stealing something they never even touched). That made the "don't do things that would get me blamed" strategy less available as an option, since they didn't know what they could have done differently to avoid the accusations in the first place.

Schema therapy also has a slightly different way of characterizing this kind of thing which I like: three different coping styles of surrender (giving in to a negative belief), avoidance (trying to avoid situations where the belief would be triggered) and overcompensation (actively trying to prove the negative belief wrong). This page has some examples.

Thank you for your explanations. My confusion came not so much from associating agency with consciousness, morality, or other human attributes, as from whether agency was judged from an inside, mechanistic point of view or from an outside, predictive point of view of the system. From the outside, it can be useful to say that "water has the goal to flow downhill", or that "electrons have the goal to repel electrons and attract protons", inasmuch as "goal" is understood as "tendency". From an inside view, as you said, it's nothing like the agency we know; th... (read more)

5DirectedEvolution
We can also reframe AI agents as sophisticated feedback controllers. A normal controller, like a thermostat, is set up to sense a variable (temperature) and take a control action in response (activate a boiler). If the boiler is broken, or the thermostat is inaccurate, the controller fails. An AI agent is able to flexibly examine the upstream causes of its particular goal and exert control over them in an adaptive fashion. It can figure out likely causes of failing to accurately regulate temperature, such as a broken boiler, and figure out how to monitor and control those as well.

I think one way we could try to define agency in a way that excludes calculators would be the ability to behave as if pursuing an arbitrary goal. Calculators are stuck with the goal you built them with: providing a correct answer to an input equation. AutoGPT lets you define its goal in natural language. So if we put these together (a static piece of software that can monitor and manipulate the causes leading to some outcome in an open-ended way, and that lets an input flexibly specify which outcome ought to be controlled), I think we have a pretty intuitive definition of agency.

AutoGPT allows the user to specify truly arbitrary goals. AutoGPT doesn’t even need to be perfect at pursuing a destructive goal to be dangerous - it can also be dangerous by pursuing a good goal in a dangerous way. To me, the only thing making AutoGPT anything but a nightmare tool is that, for now, it is ineffective at pursuing most goals. But I look at that thing operating and I don’t at all see anything intrinsically sphexish, nor an intelligence that is likely to naturally become more moral as it becomes more capable. I see an amoral agent that hasn’t quite been constructed to efficiently accomplish the aims it was given yet.
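
To make the contrast concrete, here is a minimal, hypothetical sketch (the names and numbers are illustrative, not from the comment) of the fixed feedback loop described above, with a closing comment marking where a more agent-like controller would differ.

```python
def thermostat_step(temperature, setpoint, boiler_on):
    """Classic bang-bang controller: sense one variable, take one fixed action.
    If the boiler is broken or the sensor is wrong, this loop has no recourse."""
    if temperature < setpoint - 0.5:
        return True          # turn the boiler on
    if temperature > setpoint + 0.5:
        return False         # turn the boiler off
    return boiler_on         # otherwise keep the current state

# Toy simulation: the room loses heat each step; the boiler adds heat when on.
temperature, boiler_on = 17.0, False
for step in range(20):
    boiler_on = thermostat_step(temperature, setpoint=20.0, boiler_on=boiler_on)
    temperature += (1.0 if boiler_on else 0.0) - 0.3   # heating minus heat loss
    print(f"step {step:2d}: temp={temperature:5.2f}  boiler={'on' if boiler_on else 'off'}")

# The agent described above would go further: notice when turning the boiler on
# no longer raises the temperature, hypothesize upstream causes (broken boiler,
# open window), and choose among many possible actions rather than one fixed one.
```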

(I reply to both you and @Ericf here.) I do struggle a bit to make up my mind on whether drawing a line around agency is really important. We could say that a calculator has the 'goal' of returning the right result to the user; we don't treat a calculator as an agent, but is that because of its very nature and the way in which it was programmed, or is it a matter of capabilities, the calculator being incapable of making plans and considering a number of different paths to achieve its goals?

My guess is that there is something that makes up an agent and which has to do wi... (read more)

2Ericf
I think the key faculty of an agent vs a calculator is the capability to create new short-term goals and actions. A calculator (or water, or bacteria) can only execute the "programming" that was present when it was created. An agent can generate possible actions based on its environment, including options that might not even have existed when it was created.
3DirectedEvolution
If you didn’t know what a calculator was, and were told that it had the goal of always returning the right answer to whatever equation was input, and failing that, never returning a wrong answer, that would help you predict its behavior. The calculator can even return error messages for badly formatted inputs, and comes with an interface that helps humans avoid slip ups. So I would say that the calculator is behaving with very constrained but nonzero functional agency. Its capabilities are limited by the programmers to exactly those required to achieve its goal under normal operating conditions (it can’t anticipate and avoid getting tossed in the ocean or being reprogrammed).

Likewise, a bacterial genome exhibits a form of functional agency, with the goal of staying alive and reproducing itself. Knowing this helps us predict what specific behaviors bacteria might exhibit. Describing something as possessing some amount of functional agency is not the same as saying it is conscious, highly capable of achieving this goal, or that the mechanism causing goal-oriented behavior has any resemblance to a biological brain. We can predict water’s behavior well knowing only that it “wants to flow downhill,” even if we know nothing of gravity.

The reason for doubting agency in non-brain-endowed entities is that we want to make space for two things. First, we want to emphasize that behaving with functional agency is not the same as having moral weight attached to those goals. Water has no right to flow downhill, and we don’t have any duty to allow it to do so. Second, we want to emphasize that the mechanism producing functionally agentic behavior is critical to understand, as it informs both our conception of the goal and the agent’s ability to achieve it, both of which are critical for predicting how it will behave. There is a limit to how much you can predict with a rough understanding like “water’s goal is to flow downhill,” just as there’s a limit to how well you can pr

This seems to me more like a tool AI, much like a piece of software asked to carry out a task (e.g. an Excel sheet for doing calculations), but with the addition of processes or skills for creating plans and searching for solutions, which would endow it with agent-like behaviour. So, for the AutoGPT-style AI contemplated here, it appears to me that this agent-like behaviour would not emerge out of the AI's increased capabilities and its achievement of the general intelligence needed to reason, devise accurate models of the world and of humans, and plan; nor would... (read more)

3DirectedEvolution
Take a behaviorist or functionalist approach to AI. Let's say we only understood AutoGPT's epistemic beliefs about the world-state. How well could we predict its behavior? Now, let's say we treated it as having goals - perhaps just knowing how it was initially prompted with a goal by the user. Would that help us predict its behavior better? I think it would. Whatever one thinks it means to be "really" agentic or "really" intelligent, AutoGPT is acting as if it was agentic and intelligent. And in some cases, it is already outperforming humans. I think AutoGPT's demo bot (it's a chef coming up with a themed recipe for a holiday) outperforms almost all humans in the speed and quality with which it comes up with a solution, and of course it can repeat that performance as many times as you care to run it.

What this puzzle reveals to some extent is that there may not be a fundamental difference between "agency" and "capabilities." If an agent fails to protect what we infer to be its terminal goals, allowing them to be altered, it is hard to be sure if that's because we misunderstood what its terminal goal was, or whether it was simply incompetent or failed by chance.

Until last week, humanity had never before had the chance to run repeatable experiments on an identical intelligent agent. This is the birth of a nascent field of "agent engineering," a field devoted to building more capable agents, diagnosing the reasons for their failures, and better controlling our ability to predict outputs from inputs. As an example of a small experiment we can do right now with AutoGPT, can we make a list of 10 goal-specs for AutoGPT on which it achieves 80% of them within an hour? Treating AutoGPT and its descendants as agents is going to be fruitful, although the fruit may be poisoned bananas.
3Ericf
I think even these first rough concepts have a distinction between beliefs and values, even if the values are "hard coded" from the training period and the manual goal entry. Being able to generate short-term goals and execute them, and to see if you are getting closer to your long-term goals, is basically all any human does. It's a matter of scale, not kind, between me and a dolphin and AgentGPT.

The first times I read LW articles, and especially those by Eliezer, it was common for me to think that I simply wasn't smart enough to follow their lines of argumentation. It's precisely missing these buckets and handles, these modes of thought and the expressions/words used to communicate them, that made it hard at the start; as I acquired them, I could feel I belonged to the community. I suppose this happens to all newbies, and it's understandable to feel this helplessness and inability to contribute for as long as you haven't acquired the requisite materia... (read more)

I like this model, much of which I would encapsulate in the tendency to extrapolate from past evidence, not only because it resonates with the image I have of people who are reluctant to take existential risks seriously, but because it is more fertile ground for actionable advice than the simple explanation of "because they haven't sat down to think deeply about it". This latter explanation might hold some truth, but tackling it would be unlikely to make them take more action towards reducing existential risks if they weren't aware of, and weren't able to fi... (read more)

The broad spirit they want to convey with the word "generalisation", which is that two systems can exhibit the same desired behaviour in training but end up with completely different goals in testing or deployment, seems fair as the general problem. But I agree that "generalise" can give the impression of an "intentional act of extrapolation", of creating a model that is consistent with a certain specification. And there are many more ways in which the AI can behave well in training and not in deployment, without needing to assume it's extrapolating a mod... (read more)

This is a really complicated issue because different priors and premises can lead you to extremely different conclusions.

For example, I see the following as a typical view on AI among the general public:
(the average person is unlikely to go this deep in their reasoning, but could arrive at these arguments if they had to debate the issue)

Premises: "Judging by how nature produced intelligence, and by the incremental progress we are seeing in LLMs, artificial intelligence is likely to be achieved by packing more connections into a digital system. This will allow the A... (read more)

I have been using the Narwhal app for the past few days, a social discussion platform "designed to make online conversations better" that is still in its prototype stage. This is basically how it works: there are several topics of discussion posted by other users, formulated with an initial question (e.g. "How should we prioritise which endangered species to protect?" or "Should Silicon Valley be dismantled, reformed, or neither?") and a description, and you can comment on any of them or reply to others' comments. You can also suggest your own discussions.

Here are... (read more)

It's nice to hear about the high standards you continue to pursue. I agree that LessWrong should set itself much higher standards than other communities, even than other rationality-centred or -adjacent communities.

My model of this big effort to raise the sanity waterline and prevent existential catastrophes contains three concentric spheres. The outer sphere is all of humanity, ever-changing yet more passive. Its public opinion is what influences most of the decisions of world leaders and companies, but this public opinion can be swayed by other, more directed force... (read more)

Could we take from Eliezer's message the need to redirect more efforts into AI policy and into widening the Overton window to try, in any way we can, to give AI safety research the time it needs? As Raemon said, the Overton window might be widening already, making more ideas "acceptable" for discussion, but it doesn't seem enough. I would say the typical response from the overwhelming majority of the population and world leaders to misaligned AGI concerns is still to treat them as a panicky sci-fi dystopia rather than to say "maybe we should stop every... (read more)