Oh come on, Eliezer. These strategies aren't that alien.
I remember a time in my early years, feeling apprehensive about entering adolescence and inevitably transforming into a stereotypical rebellious teenager. It would have been not only boring and clichéd but also an affront to every good thing I thought about myself. I didn't want to become a rebellious teenager, and so I decided, before I was overwhelmed with teenage hormones, that I wouldn't become one. And it turns out that intentional steering of one's self-narrative can (sometimes) be quite effective (constrained by what's physically possible, of course)! (Not saying that I couldn't have done with a bit more epistemological rebellion in my youth.)
The... (read more)
I wonder if you could do something similar with all peer-reviewed scientific publications, summarizing all findings into an encyclopedia of all scientific knowledge. Basically, each article in the wiki would be a review article on a particular topic. The AI would have to track newly published results, determine which existing topics in the encyclopedia they relate to or whether creating a new article is warranted, and update the relevant articles with the new findings.
Given how much science content humanity has accumulated, you'd probably have to have the AI organize scientific topics in a tree, with parent articles summarizing topics at a higher level of abstraction and child articles digging into narrower scopes... (read more)
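A minimal sketch of how such a topic tree might be represented. The `TopicNode` class, its field names, and the routing step for filing new findings are all my own illustration of the idea, not an existing system:

```python
from dataclasses import dataclass, field

@dataclass
class TopicNode:
    """One review article in the hypothetical encyclopedia tree."""
    title: str
    summary: str = ""
    findings: list = field(default_factory=list)   # accumulated published results
    children: dict = field(default_factory=dict)   # narrower subtopic articles

    def route(self, topic_path):
        """Walk down the tree along topic_path, creating narrower
        articles as needed, and return the matching leaf article."""
        node = self
        for name in topic_path:
            node = node.children.setdefault(name, TopicNode(title=name))
        return node

    def add_finding(self, topic_path, finding):
        """File a newly published result under the relevant article;
        a real system would also decide whether a new article is warranted."""
        self.route(topic_path).findings.append(finding)

# Illustrative usage: a new paper gets routed to its subtopic article.
root = TopicNode(title="All scientific knowledge")
root.add_finding(["Biology", "Neuroscience"], "Paper X: motor cortex encoding")
leaf = root.route(["Biology", "Neuroscience"])
```

The hard part, of course, is not the tree itself but the AI's judgment calls: matching a result to existing topics versus splitting off a new child article.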
The same thing happens with my daughters (all under 6). Get them to start talking about poop, and it's like a switch has been flipped. Their behavior becomes deliberately misaligned with parental objectives until we find a way to snap them out of that loop.
Hey, at least humanity lasted longer than 18 minutes.
So is Agent Foundations primarily about understanding the nature of agency so we can detect it and/or control it in artificial models, or does it also include the concept of equipping AI with the means of detecting and predictively modeling agency in other systems? Because I strongly suspect the latter will be crucial in solving the alignment problem.
The best definition I have at the moment sees agents as systems that actively maintain their internal state within a bounded range of viability in the face of environmental perturbations (which would apply to all living systems) and that can form internal representations of arbitrary goal states and use those representations to reinforce and adjust... (read more)
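A toy illustration of that definition. The `HomeostaticAgent` class, its dynamics, and its parameters are purely illustrative inventions, not anything from the literature:

```python
class HomeostaticAgent:
    """Toy agent that keeps an internal variable inside a viability
    band despite perturbations, steering toward a settable goal state."""

    def __init__(self, state=0.0, viable=(-10.0, 10.0), goal=0.0, gain=0.5):
        self.state = state
        self.viable = viable   # bounded range of viability
        self.goal = goal       # internal representation of a goal state
        self.gain = gain       # strength of corrective response

    def alive(self):
        lo, hi = self.viable
        return lo <= self.state <= hi

    def step(self, perturbation):
        """Absorb an environmental perturbation, then correct toward the goal."""
        self.state += perturbation
        self.state += self.gain * (self.goal - self.state)
        return self.alive()

# The agent rides out a series of shocks by continually self-correcting.
agent = HomeostaticAgent()
for shock in [3.0, -4.0, 2.5]:
    agent.step(shock)
```

The second clause of the definition (forming representations of *arbitrary* goal states) is what this sketch only gestures at: here the goal is a single number, whereas a real agent would learn and swap goal representations.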
But then again, what are human minds but bags of heuristics themselves? And AI can evolve orders of magnitude faster than we can. Handing over the keys to its own bootstrapping will only accelerate it further.
If the future trajectory to AGI is just "systems of LLMs glued together with some fancy heuristics", then maybe a plateau in Transformer capabilities will keep things relatively gradual. But I suspect that we are just a paradigm shift or two away from a Generalized Theory of Intelligence. Just figure out how to do predictive coding of arbitrary systems, combine it with narrative programming and continual learning, and away we go! Or something like that.
Generalizing a bit, I wonder how hard a misaligned ASI would have to work to get every human to voluntarily poison themselves.
It's how recursive self-improvement starts out.
First, the global "AI models + human development teams" system improves through iterative development and evaluation. Then the AI models take on more responsibilities in terms of ideation, process streamlining, and architecture optimization. And finally, an AI agent groks enough of the process to take on all responsibilities, and the intelligence explosion takes off from there.
You'd think someone would try to use AI to automate the production and distribution of necessities to drive the cost of living down toward zero first, but it seems that was just a dream of naive idealism. Oh well. Still, could someone please get on that?
With respect to the online rationalist community, my main thing to come out of the closet about is that I was a Young-Earth Creationist all the way up until the end of grad school (and even a Young-Universe Creationist up until the middle of undergrad). Not very rational of me to avoid honestly facing mountains of evidence in order to protect sacred beliefs!
With respect to my family and life-long friends, my main thing to come out of the closet about is that I am now a liberal atheist. Not very respectable of me to willfully join the ranks of the enemy!
My main hurdle in exposing myself on the latter front is not... (read more)
I have a lot of ideas, but I often have trouble putting them together in a format that can be easily shared with others. They say that the beginning is a very good place to start, but for many topics into which I've poured a lot of thought, it's very difficult to identify where the beginning is. On the other hand, I have a lot of experience with private tutoring, and I have always found it natural to explain concepts clearly when answering direct questions from someone who is motivated to build a clear mental model of the topic at hand.
On that note, I... (read more)
If you can get access to it, try reading The Intelligent Movement Machine. Basically, the motor cortex is less about stimulating the contraction of particular muscles and more about encoding the end configuration toward which to move the body (e.g., motor neurons in monkey motor cortex that encode the act of bringing the hand to the mouth, no matter the starting position of the arm). How the muscles actually achieve this is then more a matter of model-based control theory than of an RL-trained action policy.
It's closely related to end-effector control, where the position, orientation, force, speed, etc. of the movement of the end of a robotic appendage are the focus of optimization, as... (read more)
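A bare-bones sketch of that idea: a proportional controller that encodes only the end configuration, so different starting postures converge on the same target. The function, gains, and coordinates are hypothetical illustrations, not anything from the book:

```python
def reach(start, target, gain=0.3, steps=50):
    """Drive an end-effector toward a target configuration.
    Only the goal is encoded; the trajectory from any starting
    position falls out of the error-driven dynamics."""
    pos = list(start)
    for _ in range(steps):
        # Proportional correction: move a fraction of the remaining error.
        pos = [p + gain * (t - p) for p, t in zip(pos, target)]
    return pos

# Different starting arm configurations, same end state ("hand to mouth").
mouth = [0.0, 0.15, 0.3]
a = reach(start=[0.8, -0.2, 0.1], target=mouth)
b = reach(start=[-0.5, 0.6, -0.4], target=mouth)
```

A real model-based controller would of course fold in limb dynamics and constraints, but the key property survives even in this sketch: the goal state, not the muscle-by-muscle action sequence, is what gets specified.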