Their romantic partner offering lots of value in other ways. I'm skeptical of this one, because female partners are notoriously high-maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it's hard for that to add up enough to outweigh the usual costs.
Imagine a woman is in a romantic relationship with somebody else. Are they still so great a person that you would enjoy hanging out with them as a friend? If not, that woman should not be your girlfriend. Friendship first. At least in my model, romantic stuff should be stacked on top of platonic love.
If you've tried this earnestly 3 times, after the 3rd time, I think it's fine to switch to just trying to solve the level however you want (i.e. moving your character around the screen, experimenting).
After you've failed 3 times, wouldn't it be a better exercise to just play around in the level until you get a new piece of information that you predict will allow you to formulate better plans, and then step back into planning mode again?
Another one: We manage to solve alignment to a significant extent. The AI, which is much smarter than a human, thinks that it is aligned, and takes aligned actions. The AI even predicts that it will never become unaligned with humans. However, at some point in the future, as the AI naturally unrolls into a reflectively stable equilibrium, it becomes unaligned.
Why not AI? Is it that AI alignment is too hard? Or do you think it's likely one would fall into the "try a bunch of random stuff" paradigm popular in AI, which wouldn't help much in getting better at solving hard problems?
What do you think about the strategy where, instead of learning from a textbook, e.g. on information theory or compilers, you try to write the textbook yourself and only look at existing material if you are really stuck? That's my primary learning strategy.
It's very slow, and I probably do it too much, but it lets me train on solving hard problems that aren't super hard. If you read all the textbooks, the only practice problems remaining are the very hard ones.
How about we meet, you do research, and I observe, and then try to subtly steer you, ideally such that you learn faster how to do it well. Basically do this, but without it being an interview.
Learning theory, complexity theory and control theory. See the "AI theory" section of the LTA reading list.
... and Carol's thoughts run into a blank wall. In the first few seconds, she sees no toeholds, not even a starting point. And so she reflexively flinches away from that problem, and turns back to some easier problems.
I have spent ~10 hours trying to teach people how to think. I sometimes try to intentionally cause this to happen. Usually you can recognize it by them going quiet (I usually give the instruction that they should do all their thinking out loud). And this seems to be when actual cognitive labor is happening, instead of saying things that y...
I watched this video, and I semi-trust this guy (more than anybody else) not to get it completely wrong. So you can eat too much soy. But eating a bit is actually healthy, is my current model.
Here is also a calculation I did showing that it is possible to get all amino acids from soy without eating too much.
At the 2024 LessWrong Community Weekend I met somebody who I have been working with for perhaps 50 hours so far. They are better at certain programming-related tasks than me, in a way that provided utility. Before we met, they were not even considering working on AI alignment related things. The conversation went something like this:
Johannes: What are you working on?
Other Person: Web development. What are you working on?
Johannes: I am trying to understand intelligence such that we can build a system that is capable enough to prevent other misaligned ...
I think running this experiment is generally worth it. It's very different to read a study and to run the experiment and see the effect yourself. You may also try to figure out if you are amino acid deficient. See this comment, as well as others in that comment stack.
The reason I mention chicken is that the last time I ran this experiment with beef, my body started to hurt so badly that I woke up in the middle of the night. I am pretty sure that the beef was the reason, though maybe something weird was going on in my body at the same time. However, when I tried the same thing one week later with chicken, I didn't have this issue.
Note this 50% likely only holds if you are using a mainstream language. For some non-mainstream languages I have gotten responses that were really unbelievably bad. Things like "the name of this variable is wrong", which literally could never be the problem (it was a valid identifier).
And similarly, if you are trying to encode novel concepts, it's very different from gluing together libraries, or implementing standard well known tasks, which I would guess is what habryka is mostly doing (not that this is a bad thing to do).
Maybe you include this in "stack overflow substitute", but the main thing I use LLMs for is to understand well-known technical things. The workflow is: 1) I am interested in understanding something, e.g. how a multiplexed barrel bit shifter works. 2) I ask the LLM to explain the concept. 3) Based on the initial response I create separate conversation branches with the questions I have (to save money and keep the context closer; I didn't evaluate whether this actually makes the LLM better). 4) Once I think I have understood the concept or part of the concept I explain...
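To make the example concept from step 1 concrete, here is a rough sketch of a multiplexed (logarithmic) barrel shifter in Python. The function name, bit width, and rotate-left behavior are my own illustrative choices, not from any particular hardware design:

```python
def barrel_rotate_left(value, shift, width=8):
    """Rotate an unsigned `width`-bit value left by `shift` positions,
    structured the way a multiplexed barrel shifter is in hardware:
    one mux stage per bit of the shift amount, where each stage
    conditionally rotates by a fixed power of two."""
    mask = (1 << width) - 1
    value &= mask
    for stage in range(width.bit_length() - 1):  # stages rotate by 1, 2, 4, ...
        amount = 1 << stage
        if (shift >> stage) & 1:  # this bit of `shift` selects the rotated input
            value = ((value << amount) | (value >> (width - amount))) & mask
    return value
```

Each stage is a row of 2-to-1 muxes selected by one bit of the shift amount, which is why the hardware only needs log2(width) stages instead of width-1 of them.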
The goal is to have a system where, ideally, there are no unlabeled parameters. That would be the world-modeling system. It would then build a world model, which would have many unlabeled parameters. By understanding the world modeler system you can ensure that the world model has certain properties. E.g. there is some property (which I don't know) of how to make the world model not contain dangerous minds.
E.g. imagine the AI is really good at world modeling, and now it models you (you are part of the world) so accurately that you are now basically copied into...
John's post is quite weird, because it only says true things, yet implicitly suggests a conclusion, namely that NNs are not less interpretable than some other thing, which is totally wrong.
Example: A neural network implements modular arithmetic with Fourier transforms. If you implement that Fourier algorithm in Python, it's harder for a human to understand than the obvious modular arithmetic implementation in Python.
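To make the contrast concrete, here is a hedged sketch in Python: the obvious implementation next to one that routes modular addition through angles on the unit circle, which is the Fourier picture. The modulus and function names are my own choices for illustration:

```python
import math

P = 113  # arbitrary modulus, chosen for illustration

def add_mod_obvious(a, b):
    return (a + b) % P

def add_mod_fourier(a, b):
    # Represent each residue as an angle on the unit circle;
    # adding the angles corresponds to adding the residues mod P.
    angle = 2 * math.pi * a / P + 2 * math.pi * b / P
    # Read the result back out as the residue whose angle best matches.
    return max(range(P), key=lambda k: math.cos(angle - 2 * math.pi * k / P))
```

Both compute the same function, but only the first makes the modular structure obvious at a glance; the second is the kind of thing you have to reverse-engineer.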
It doesn't matter if the world model is inscrutable when looking directly at it, if you can change the generating code such that certain propert...
I specifically am talking about solving problems that nobody knows the answer to, where you are probably even wrong about what the problem even is. I am not talking about taking notes on existing material. I am talking about documenting the process of generating knowledge.
I am saying that I forget important ideas that I generated in the past; probably they are not yet so refined that they are impossible to forget.
A robust alignment scheme would likely be trivial to transform into an AGI recipe.
Perhaps if you did have the full solution, but it feels like there are some parts of a solution that you could figure out such that those parts don't tell you as much about the other parts of the solution.
And it also feels like there could be a book such that if you read it you would gain a lot of knowledge about how to align AIs without knowing that much more about how to build one. E.g. a theoretical solution to the stop button problem seems like i...
If you had a system with "ENTITY 92852384 implies ENTITY 8593483", it would be a lot of progress, as currently in neural networks we don't even understand the internal structures.
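A minimal sketch of what such a system might look like, with opaque numeric entity IDs standing in for uninterpreted internal structures; the forward-chaining code and the `rules` layout are my own illustration:

```python
# Rules map an entity ID to the set of entity IDs it implies.
# The IDs are deliberately meaningless, as in the example above.
rules = {92852384: {8593483}}

def entails(facts, rules):
    """Forward-chain: repeatedly add every consequent of a known fact
    until no new entities can be derived."""
    derived = set(facts)
    frontier = list(facts)
    while frontier:
        fact = frontier.pop()
        for consequent in rules.get(fact, ()):
            if consequent not in derived:
                derived.add(consequent)
                frontier.append(consequent)
    return derived
```

Even with unlabeled entities, the implication structure itself is explicit and inspectable, which is exactly what we currently lack for neural networks.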
I want to have an algorithm that creates a world model. The world is large. A world model is uninterpretable by default through its sheer size, even if you had interpretable but low-level labels. By default we don't get any interpretable labels. I think there are ways to have generic data-processing procedures, which don't talk about the human mind at all, that would yield more interpr...
I definitely very often run into the problem that I forget why something was good to do in the first place. What are the important bits? Often I get sidetracked, and then the thing that I am doing seems not so good, so I stop and do something completely different. But then later on I realize that actually the original reason that led me down the path was good, and that it would have been better to only backtrack a bit to the important piece. But often I just don't remember the important piece in the moment.
E.g. I think that having som...
I'd think you can define a tetrahedron for non-Euclidean space. And you can talk and reason about the set of polyhedra with 10 vertices as an abstract object without defining any specific such polyhedron.
Just consider taking the assumption that the system would not change in arbitrary ways in response to its environment. There might be certain constraints. You can think about what the constraints need to be such that e.g. a self-modifying agent would never change itself such that it would expect that in the future it would get less utili...
The way I would approach this problem (after not much thought): Come up with a concrete system architecture A of a maximizing computer program that has an explicit utility function and is known to behave optimally. E.g. maybe it plays tic-tac-toe or 4-in-a-row optimally.
Now mutate the source code of A slightly such that it is no longer optimal, to get a system B. The objective is not modified. Now B still "wants" to basically be A, in the sense that if it is a general enough optimizer and has access to self-modification facilities, it would try to make itsel...
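A toy version of this setup, with a one-step choice instead of tic-tac-toe; the agent names and the utility function are made up for illustration. A maximizes an explicit utility optimally, and B is a one-token mutation of A's source that breaks optimality while leaving the utility function untouched:

```python
def utility(action):
    # Explicit utility function shared by A and B; peaks at action 3.
    return -(action - 3) ** 2

def agent_A(actions):
    # Optimal: picks the utility-maximizing action.
    return max(actions, key=utility)

def agent_B(actions):
    # Mutated copy of A: `max` became `min`, so it is no longer optimal,
    # yet it still carries the same explicit utility function.
    return min(actions, key=utility)
```

The question is then whether B, given enough general optimization power and self-modification access, would rewrite itself back toward A, since by its own utility function A's behavior scores higher.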
To me it seems that understanding how a system that you are building actually works (i.e. having good models of its internals) is the most basic requirement for being able to reason about the system coherently at all.
Yes, if you actually understood how intelligence works in a deep way, you wouldn't automatically solve alignment. But it sure would make alignment a lot more tractable in many ways. Especially when only aiming for a pivotal act.
I am pretty sure you can figure out alignment in advance as you suggest. That might be the overall safer route... if we didn't have ...
It becomes more interesting when people constrain their output based on what they expect is true information that the other person does not yet know. It's useful to talk to an expert who tells you a bunch of random stuff they know that you don't.
Often some of it will be useful. This only works if they understand what you have said though (which presumably is something that you are interested in). And often the problem is that people's models about what is useful are wrong. This is especially likely if you are an expert in something. Then the thing ...
It seems potentially important to compare this to GPT-4o. In my experience, when asking GPT-4 for research papers on particular subjects, it seemed to make up non-existent research papers (at least I didn't find them after multiple minutes of searching the web). I don't have any precise statistics on this.
Yes exactly. The larva example illustrates that there are different kinds of values. I thought it was underexplored in the OP to characterize exactly what these different kinds of values are.
In the sadist example we have:
These two things both seem like values. However, they seem to be qualitatively different kinds of values. I intuit that more precisely characterizing this difference is important. I have a bunch of thoughts on this that I failed to write up so far.
reward is the evidence from which we learn about our values
A sadist might feel good each time they hurt somebody. I am pretty sure it is possible for a sadist to exist who does not endorse hurting people, meaning they feel good if they hurt people, but they avoid it nonetheless.
So to what extent is hurting people a value? It's like the sadist's brain tries to tell them that they ought to want to hurt people, but they don't want to. Intuitively the "they don't want to" seems to be the value.
Here are a few observations I have made when it comes to going to bed on time.
I set up an alarm that reminds me when my target bedtime has arrived. Many times when I am lost in an activity, the alarm makes me remember that I made the commitment to go to bed on time.
I only allow myself to dismiss the alarm once I am lying in bed. Before lying down, I am only allowed to snooze it for 8 minutes. To dismiss the alarm I need to solve a puzzle that takes ~10s, making snoozing the more convenient option. Make sure to carry your phone around wi...
You need the right relationship with confusion. By default confusion makes you stop your thinking. Being confused feels like you are doing something wrong. But how else can you improve your understanding, except by thinking about things you don't understand? Confusion tells you that you don't yet understand. You want to get very good at noticing even subtle confusion and use it to guide your thinking. However, thinking about confusing things isn't enough. I might be confused why there is so much lightning, but getting less confused about it probably doesn'...
Here is an AI called GameNGen that generates a game in real time as the player interacts with the model. (It simulates Doom at >20fps.) It uses a diffusion model. People are only slightly better than random chance at identifying whether a clip was generated by the AI or by the Doom program.
There are muscles in your nose, I just realized. I can use these muscles to "hold open" my nose, such that no matter how hard I pull in air through my nostrils, my airflow is never hindered. If I don't use these muscles and pull in air really hard, then my nostrils "collapse", serving as a sort of flow limiter.
The next time you buy a laptop, and you don't want a Mac, it's likely you want to buy one with a snapdragon CPU. That's an ARM chip, meaning you get very good battery life (just like the M-series Apple chips). On Snapdragon though you can easily run Windows, and eventually Linux (Linux support is a few months out though).
IMO the most important factor in interpersonal relations is that it needs to be possible to have engaging/useful conversations. There are many others.
The problem: Somebody who scores low on these, can be pushed up unreasonably high in your ranking through feelings of sexual desire.
The worst thing: Sexual desire drops temporarily in the short term after orgasm, and (I heard) permanently after a 2-year period.
To probe the nature of your love:
I started to use Typst. I feel a lot more productive in it. LaTeX feels like a slog. Typst doesn't slow me down when typing math or code. That, the online collaborative editor, and the very fast rendering are the most important features. Here are some more:
Probably not useful but just in case here are some other medications that are prescribed for narcolepsy (i.e. stuff that makes you not tired):
Solriamfetol is supposed to be more effective than Modafinil. Possibly hard to impossible to get without a prescription. Haven't tried that yet.
Pitolisant is interesting because it has a novel mechanism of action. Possibly impossible to get even with a prescription, as it is super expensive if you don't have the right health insurance. For me, it did not work that well. Only lasted 2-4 hour...
I don't use it to write code, or really anything. Rather, I find it useful to converse with it. My experience is also that half is wrong and that it makes many dumb mistakes. But having the conversation is still extremely valuable, because GPT often makes me aware of existing ideas that I don't know. Also, like you say, it can get many things right, and then later get them wrong. That getting-right part is what's useful to me. The part where I tell it to write all my code is just not a thing I do. Usually I just have it write snippets, and it seems pretty good...