All of Darklight's Comments + Replies

This put into well-written words a lot of thoughts I've had in the past but never been able to properly articulate. Thank you for writing this.

This sounds rather like the competing political economic theories of classical liberalism and Marxism to me. Both of these intellectual traditions carry a lot of complicated baggage that can be hard to disentangle from the underlying principles, but you seem to have done a pretty good job of distilling the relevant ideas in a relatively apolitical manner.

That being said, I don't think it's necessary for these two explanations for wealth inequality to be mutually exclusive. Some wealth could be accumulated through "the means of production" as you call it,... (read more)

Benquo
I agree these mechanisms can coexist. But to test and improve our models and ultimately make better decisions, we need specific hypotheses about how they interact. The OP was limited in scope because it's trying to explain why more detailed analyses like the ones I offer in The Debtors' Revolt or Calvinism as a Theory of Recovered High-Trust Agency are decision-relevant. Overall my impression is that while the situation is complex, it's frequently explicable as an interaction between a relatively small and enumerable number of "types of guy" (e.g. debtor vs creditor, depraved vs self-interested).

Another thought I just had was, could it be that ChatGPT, because it's trained to be such a people pleaser, is losing intentionally to make the user happy?

Have you tried telling it to actually try to win? Probably won't make a difference, but it seems like a really easy thing to rule out.

Also, quickly looking into how LLM token sampling works nowadays, you may also need to set the parameters top_p to 0, and top_k to 1 to get it to actually function like argmax. Looks like these can only be set through the API if you're using ChatGPT or similar proprietary LLMs. Maybe I'll try experimenting with this when I find the time, if nothing else to rule out the possibility of such a seemingly obvious thing being missed.

I've always wondered with these kinds of weird apparent trivial flaws in LLM behaviour if it doesn't have something to do with the way the next token is usually randomly sampled from the softmax multinomial distribution rather than taking the argmax (most likely) of the probabilities. Does anyone know if reducing the temperature parameter to zero so that it's effectively the argmax changes things like this at all?
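To make the argmax-versus-sampling distinction concrete, here's a minimal sketch with a toy logits vector (not any particular LLM's actual decoder; the convention that temperature 0 means greedy decoding is an assumption about how most implementations handle it):

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample a token index from logits, illustrating common decoding knobs."""
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        # temperature 0 is conventionally treated as greedy (argmax) decoding
        return int(np.argmax(logits))
    if top_k is not None:
        # keep only the k most likely tokens; top_k=1 also forces argmax
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits >= cutoff, logits, -np.inf)
    z = logits / temperature
    probs = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
    return int(rng.choice(len(logits), p=probs))

logits = [2.0, 1.0, 0.5]
print(sample_token(logits, temperature=0))  # greedy decoding -> index 0
print(sample_token(logits, top_k=1))        # top_k=1 -> index 0 as well
```

Either setting collapses the softmax multinomial onto its mode, which is what ruling out sampling noise requires.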

Darklight
Also, quickly looking into how LLM token sampling works nowadays, you may also need to set the parameters top_p to 0, and top_k to 1 to get it to actually function like argmax. Looks like these can only be set through the API if you're using ChatGPT or similar proprietary LLMs. Maybe I'll try experimenting with this when I find the time, if nothing else to rule out the possibility of such a seemingly obvious thing being missed.

p = (n^c * (c + 1)) / (2^c * n)

As far as I know, this is unpublished in the literature. It's a pretty obscure use case, so that's not surprising. I have doubts I'll ever get around to publishing the paper I wanted to write that uses this in an activation function to replace softmax in neural nets, so it probably doesn't matter much if I show it here.

So, my main idea is that the principle of maximum entropy, aka the principle of indifference, suggests a prior of 1/n, where n is the number of possibilities or classes. The transform p × 2 − 1 corresponds to p = 0.5 at c = 0. What I want is for c = 0 to lead to p = 1/n rather than 0.5, so that it works in multiclass cases where n is greater than 2.
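A quick sketch verifying the desired properties of the formula above (c = 0 gives the uniform prior 1/n, c = 1 gives certainty, c = −1 gives impossibility):

```python
import numpy as np

def corr_to_prob(c, n):
    """Convert correlation-space value c in [-1, 1] to a probability,
    such that c = 0 maps to the uniform prior 1/n.
    Formula: p = (n**c * (c + 1)) / (2**c * n)."""
    return (n**c * (c + 1)) / (2**c * n)

for n in (2, 3, 10):
    assert np.isclose(corr_to_prob(0, n), 1 / n)    # max uncertainty -> 1/n
    assert np.isclose(corr_to_prob(1, n), 1.0)      # perfect correlation -> certain
    assert np.isclose(corr_to_prob(-1, n), 0.0)     # anti-correlation -> impossible

print(corr_to_prob(0, 4))  # 0.25
```

Note that for n = 2 the formula reduces to (c + 1)/2, i.e. the simple linear transform between [−1, 1] and [0, 1], so it generalizes that transform to the multiclass case.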

cubefox
What's the solution?

Correlation space is between -1 and 1, with 1 being the same (definitely true), -1 being the opposite (definitely false), and 0 being orthogonal (very uncertain). I had the idea that you could assume maximum uncertainty to be 0 in correlation space, and 1/n (the uniform distribution) in probability space.

cubefox
Not sure what you mean here, but p × 2 − 1 would linearly transform a probability p from [0, 1] to [−1, 1]. You could likewise transform a correlation coefficient ϕ to [0, 1] with (ϕ(A,B) + 1)/2. For P(A) = P(B) = 1/2, this would correspond to the probability of A occurring if and only if B occurs. I.e. (ϕ(A,B) + 1)/2 = P(A ↔ B) when P(A) = P(B) = 0.5.
Darklight

I tried asking ChatGPT, Gemini, and Claude to come up with a formula that converts between correlation space to probability space while preserving the relationship 0 = 1/n. I came up with such a formula a while back, so I figure it shouldn't be hard. They all offered formulas, all of which were shown to be very much wrong when I actually graphed them to check.

cubefox
What's correlation space, as opposed to probability space?

I was not aware of these. Thanks!

Thanks for the clarifications. My naive estimate is obviously just a simplistic ballpark figure using some rough approximations, so I appreciate adding some precision.

Darklight

Also, even if we can train and run a model the size of the human brain, it would still be many orders of magnitude less energy efficient than an actual brain. Human brains use barely 20 watts. This hypothetical GPU brain would require enormous data centres of power, and each H100 GPU uses 700 watts alone.

Also, even if we can train and run a model the size of the human brain, it would still be many orders of magnitude less energy efficient than an actual brain. Human brains use barely 20 watts.

For inference on a GPT-4 level model, GPUs use much less power than a human brain, about 1-2 watts (across all necessary GPUs), if we imagine slowing them down to human speed and splitting the power among the LLM instances that are being processed at the same time. Even for a 30 trillion parameter model, it might only get up to 30-60 watts in this sense.

each H100 GPU uses

... (read more)
Darklight

I've been looking at the numbers with regards to how many GPUs it would take to train a model with as many parameters as the human brain has synapses. The human brain has 100 trillion synapses, and they are sparse and very efficiently connected. A regular AI model fully connects every neuron in a given layer to every neuron in the previous layer, so that would be less efficient.

The average H100 has 80 GB of VRAM, so assuming that each parameter is 32 bits, then you have about 20 billion per GPU. So, you'd need 10,000 GPUs to fit a single instance of a huma... (read more)
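Sketching the arithmetic (assuming 4-byte weights; the weights alone come to 5,000 GPUs, and the 10,000 figure matches if you also hold one gradient value per parameter in memory, which is my assumption here, not something stated explicitly):

```python
params = 100e12          # synapse-equivalent parameter count
bytes_per_param = 4      # 32-bit weights
vram = 80e9              # H100 VRAM in bytes

params_per_gpu = vram / bytes_per_param        # 20 billion, as above
gpus_weights_only = params / params_per_gpu    # 5,000 for the weights alone
gpus_with_grads = 2 * gpus_weights_only        # ~10,000 if gradients are held too
print(gpus_weights_only, gpus_with_grads)
```

Optimizer states (e.g. Adam's two moments) would multiply the count further, so these are lower bounds for training rather than inference.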

Vladimir_Nesov
What I can find is 20,000 A100s. With 10K A100s, which are 300e12 FLOP/s in BF16, you'd need 6 months (so this is still plausible) at 40% utilization to get the rumored 2e25 FLOPs. We know Llama-3-405B is 4e25 FLOPs and approximately as smart, and it's dense, so you can get away with fewer FLOPs in a MoE model to get similar capabilities, which supports the 2e25 FLOPs figure from the premise that original GPT-4 is MoE. H200s are 140 GB, and there are now MI300Xs with 192 GB. B200s will also have 192 GB. Training is typically in BF16, though you need enough space for gradients in addition to parameters (and with ZeRO, optimizer states). On the other hand, inference in 8 bit quantization is essentially indistinguishable from full precision. The word is, next year it's 500K B200s[1] for Microsoft. And something in the gigawatt range from Google as well. ---------------------------------------- 1. He says 500K GB200s, but also that it's 1 gigawatt all told, and that they are 2-3x faster than H100s, so I believe he means 500K B200s. In various places, "GB200" seems to ambiguously refer either to a 2-GPU board with a Grace CPU, or to one of the B200s on such a board. ↩︎
Darklight
Also, even if we can train and run a model the size of the human brain, it would still be many orders of magnitude less energy efficient than an actual brain. Human brains use barely 20 watts. This hypothetical GPU brain would require enormous data centres of power, and each H100 GPU uses 700 watts alone.

I ran out of the usage limit for GPT-4o (seems to just be 10 prompts every 5 hours) and it switched to GPT-4o-mini. I tried asking it the Alpha Omega question and it made some math nonsense up, so it seems like the model matters for this for some reason.

Darklight

So, a while back I came up with an obscure idea I called the Alpha Omega Theorem and posted it on the Less Wrong forums. Given how there's only one post about it, it shouldn't be something that LLMs would know about. So in the past, I'd ask them "What is the Alpha Omega Theorem?", and they'd always make up some nonsense about a mathematical theory that doesn't actually exist. More recently, Google Gemini and Microsoft Bing Chat would use search to find my post and use that as the basis for their explanation. However, I only have the free version of ChatGPT... (read more)

Darklight
I ran out of the usage limit for GPT-4o (seems to just be 10 prompts every 5 hours) and it switched to GPT-4o-mini. I tried asking it the Alpha Omega question and it made some math nonsense up, so it seems like the model matters for this for some reason.
Darklight

I'm wondering what people's opinions are on how urgent alignment work is. I'm a former ML scientist who previously worked at Maluuba and Huawei Canada, but switched industries into game development, at least in part to avoid contributing to AI capabilities research. I tried earlier to interview with FAR and Generally Intelligent, but didn't get in. I've also done some cursory independent AI safety research in interpretability and game theoretic ideas in my spare time, though nothing interesting enough to publish yet.

My wife also recently had a baby, and carin... (read more)

Darklight

Thanks for the reply!

So, the main issue I'm finding with putting them all into one proposal is that there's a 1000 character limit on the main summary section where you describe the project, and I cannot figure out how to cram multiple ideas into that 1000 characters without seriously compromising the quality of my explanations for each.

I'm not sure if exceeding that character limit will get my proposal thrown out without being looked at though, so I hesitate to try that. Any thoughts?

habryka
Oh, hmm, I sure wasn't tracking a 1000 character limit. If you can submit it, I wouldn't be worried about it (and feel free to put that into your references section). I certainly have never paid attention to whether anyone stayed within the character limit.
Darklight

I already tried discussing a very similar concept I call Superrational Signalling in this post. It got almost no attention, and I have doubts that Less Wrong is receptive to such ideas.

I also tried actually programming a Game Theoretic simulation to try to test the idea, which you can find here, along with code and explanation. Haven't gotten around to making a full post about it though (just a shortform).

Ryo
Thank you for the references! I'm reading your writings; it's interesting. I posted the super-cooperation argument while expecting that LessWrong would likely not be receptive, but I'm not sure which community would engage with all this and find it pertinent at this stage. More concrete and empirical work seems needed.
Darklight

So, I have three very distinct ideas for projects that I'm thinking about applying to the Long Term Future Fund for. Does anyone happen to know if it's better to try to fit them all into one application, or split them into three separate applications?

habryka
Three is a bit much. I am honestly not sure what's better. My guess is putting them all into one. (Context, I am one of the LTFF fund managers)

Recently I tried out an experiment using the code from the Geometry of Truth paper to try to see if using simple label words like "true" and "false" could substitute for the datasets used to create truth probes. I also tried out a truth probe algorithm based on classifying with the higher cosine similarity to the mean vectors.

Initial results seemed to suggest that the label word vectors were sorta acceptable, albeit not nearly as good (around 70% accurate rather than 95%+ like with the datasets). However, testing on harder test sets showed much worse accur... (read more)
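For reference, the mean-vector cosine-similarity probe can be sketched as follows (toy 2-d activations for illustration, not the actual Geometry of Truth data or dimensionality):

```python
import numpy as np

def fit_mean_probe(acts, labels):
    """Mean-vector truth probe: one mean activation vector per class."""
    acts, labels = np.asarray(acts, float), np.asarray(labels)
    return {c: acts[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(means, x):
    """Classify by highest cosine similarity to the class mean vectors."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(means, key=lambda c: cos(means[c], np.asarray(x, float)))

# Toy activations: "true" statements (label 1) point one way, "false" the other.
acts = [[1.0, 0.2], [0.9, 0.0], [-1.0, 0.1], [-0.8, -0.2]]
labels = [1, 1, 0, 0]
means = fit_mean_probe(acts, labels)
print(predict(means, [0.7, 0.1]))   # 1
print(predict(means, [-0.9, 0.0]))  # 0
```

Substituting the embedding of a label word like "true" for the dataset-derived mean vector is then a one-line change, which is what makes the comparison above cheap to run.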

Update: I made an interactive webpage where you can run the simulation and experiment with a different payoff matrix and changes to various other parameters.

So, I adjusted the aggressor system to work like alliances or defensive pacts instead of a universal memory tag. Basically, now players make allies when they both cooperate and aren't already enemies, and make enemies when defected against first, which sets all their allies to also consider the defector an enemy. This doesn't change the result much. The alliance of nice strategies still wins the vast majority of the time.

I also tried out false flag scenarios where 50% of the time the victim of a defect first against non-enemy will actually be mistaken for... (read more)

Admittedly this is a fairly simple set up without things like uncertainty and mistakes, so yes, it may not really apply to the real world. I just find it interesting that it implies that strong coordinated retribution can, at least in this toy set up, be useful for shaping the environment into one where cooperation thrives, even after accounting for power differentials and the ability to kill opponents outright, which otherwise change the game enough that straight Tit-For-Tat doesn't automatically dominate.

It's possible there are some situations where this... (read more)

Okay, so I decided to do an experiment in Python code where I modify the Iterated Prisoner's Dilemma to include Death, Asymmetric Power, and Aggressor Reputation, and run simulations to test how different strategies do. Basically, each player can now die if their points falls to zero or below, and the payoff matrix uses their points as a variable such that there is a power difference that affects what happens. Also, if a player defects first in any round of any match against a non-aggressor, they get the aggressor label, which matters for some strategies t... (read more)
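A minimal sketch of this modified game (the specific payoff fractions, starting points, and round count here are placeholder assumptions for illustration, not the values used in the actual experiment):

```python
COOP, DEFECT = "C", "D"

class Player:
    def __init__(self, strategy, points=10.0):
        self.strategy, self.points, self.aggressor = strategy, points, False
    @property
    def alive(self):
        return self.points > 0   # death: points at or below zero

def tit_for_tat(me, opp, history):
    # history is a list of (my_move, their_move) pairs
    return history[-1][1] if history else COOP

def retributive_tft(me, opp, history):
    # defect on anyone tagged as an aggressor; otherwise play tit-for-tat
    return DEFECT if opp.aggressor else tit_for_tat(me, opp, history)

def play_match(a, b, rounds=10):
    history = []
    for _ in range(rounds):
        if not (a.alive and b.alive):
            break
        ma = a.strategy(a, b, history)
        mb = b.strategy(b, a, [(y, x) for x, y in history])
        if ma == COOP and mb == COOP:
            a.points += 1; b.points += 1
        elif ma == DEFECT and mb == COOP:
            gain = 0.1 * b.points        # asymmetric power: share of victim's stake
            a.points += gain; b.points -= gain
            if not b.aggressor:
                a.aggressor = True       # defecting first against a non-aggressor
        elif ma == COOP and mb == DEFECT:
            gain = 0.1 * a.points
            b.points += gain; a.points -= gain
            if not a.aggressor:
                b.aggressor = True
        else:
            a.points -= 1; b.points -= 1
        history.append((ma, mb))

a = Player(retributive_tft)
b = Player(lambda me, opp, h: DEFECT)    # always-defect
play_match(a, b)
print(b.aggressor)  # True: the first defector gets the aggressor tag
```

The key structural point is that the aggressor tag persists across the match, so retribution can be coordinated even by players who were never themselves victims.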

Darklight
Update: I made an interactive webpage where you can run the simulation and experiment with a different payoff matrix and changes to various other parameters.
Dagon
So, the aggressor tag is a way to keep memory across games, so they're not independent.  I wonder what happens when you start allowing more complicated reputation (including false accusations of aggression). I feel like any interesting real-world implications are probably fairly tenuous.  I'd love to hear some and learn that I'm wrong.

I was recently trying to figure out a way to calculate my P(Doom) using math. I initially tried just making a back of the envelope calculation by making a list of For and Against arguments and then dividing the number of For arguments by the total number of arguments. This led to a P(Doom) of 55%, which later got revised to 40% when I added more Against arguments. I also looked into using Bayes Theorem and actual probability calculations, but determining P(E | H) and P(E) to input into P(H | E) = P(E | H) * P(H) / P(E) is surprisingly hard and confusing.
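One way to sidestep estimating P(E) directly is the odds form of Bayes' theorem, where each piece of evidence contributes a likelihood ratio and P(E) cancels out (the ratios below are made-up illustration values, not actual estimates):

```python
def update_odds(prior_p, likelihood_ratios):
    """Update a probability via the odds form of Bayes' theorem:
    posterior odds = prior odds x product of likelihood ratios."""
    odds = prior_p / (1 - prior_p)
    for lr in likelihood_ratios:  # lr = P(E | doom) / P(E | no doom)
        odds *= lr
    return odds / (1 + odds)

# Three pieces of evidence: one favoring doom 2:1, one against 1:2, one 3:2.
print(update_odds(0.5, [2.0, 0.5, 1.5]))  # 0.6
```

This still requires judging how much more likely each argument's evidence is under doom than under no-doom, but that's a more tractable question than estimating P(E) outright, and it weighs arguments by strength rather than counting them equally.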

Minor point, but the apology needs to sound sincere and credible, usually by being specific about the mistakes and concise and to the point and not like, say, Bostrom's defensive apology about the racist email a while back. Otherwise you can instead signal that you are trying to invoke the social API call in a disingenuous way, which can clearly backfire.

Things like "sorry you feel offended" also tend to sound like you're not actually remorseful for your actions and are just trying to elicit the benefits of an apology. None of the apologies you described sound anything like that, but it's a common failure state among the less emotionally mature and the sycophantic.

Expanding on this...

The "standard format" for calling the apology API has three pieces:

  • "I'm sorry"/"I apologize"
  • Explicitly state the mistake/misdeed
  • Explicitly state either what you should have done instead, or will do differently next time

Notably, the second and third bullet points are both costly signals: it's easier for someone to state the mistake/misdeed, and what they would/will do differently, if they have actually updated. Thus, those two parts contribute heavily to the apology sounding sincere.

I have some ideas and drafts for posts that I've been sitting on because I feel somewhat intimidated by the level of intellectual rigor I would need to put into the final drafts to ensure I'm not downvoted into oblivion (something a younger me experienced in the early days of Less Wrong).

Should I try to overcome this fear, or is it justified?

For instance, I have a draft of a response to Eliezer's List of Lethalities post that I've been sitting on since 2022/04/11 because I doubted it would be well received given that it tries to be hopeful and, as a former... (read more)

mike_hawke
Personally, I find shortform to be an invaluable playground for ideas. When I get downvoted, it feels lower stakes. It's easier to ignore aloof and smugnorant comments, and easier to update on serious/helpful comments. And depending on how it goes, I sometimes just turn it into a regular post later, with a note at the top saying that it was adapted from a shortform. If you really want to avoid smackdowns, you could also just privately share your drafts with friends first and ask for respectful corrections. Spitballing other ideas, I guess you could phrase your claims as questions, like "have objections X, Y, or Z been discussed somewhere already? If so, can anyone link me to those discussions?" Seems like that could fail silently though, if an over-eager commenter gives you a link to low-quality discussion. But there are pros and cons for every course of action/inaction.
Answer by Darklight

I would be exceedingly cautious about this line of reasoning. Hypomania tends to not be sustainable, with a tendency to either spiral into a full blown manic episode, or to exhaust itself out and lead to an eventual depressive episode. This seems to have something to do with the characteristics of the thoughts/feelings/beliefs that develop while hypomanic, the cognitive dynamics if you will. You'll tend to become increasingly overconfident and positive to the point that you will either start to lose contact with reality by ignoring evidence to the contrary... (read more)

I still remember when I was a masters student presenting a paper at the Canadian Conference on AI 2014 in Montreal and Bengio was also at the conference presenting a tutorial, and during the Q&A afterwards, I asked him a question about AI existential risk. I think I worded it back then as concerned about the possibility of Unfriendly AI or a dangerous optimization algorithm or something like that, as it was after I'd read the sequences but before "existential risk" was popularized as a term. Anyway, he responded by asking jokingly if I was a journalist... (read more)

Answer by Darklight

The average human lifespan is about 70 years or approximately 2.2 billion seconds. The average human brain contains about 86 billion neurons or roughly 100 trillion synaptic connections. In comparison, something like GPT-3 has 175 billion parameters and 500 billion tokens of data. Assuming very crudely weight/synapse and token/second of experience equivalence, we can see that the human model's ratio of parameters to data is much greater than GPT-3, to the point that humans have significantly more parameters than timesteps (100 trillion to 2.2 billion), whi... (read more)
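The arithmetic behind those ratios, as a quick sketch using the figures quoted above:

```python
human_params = 100e12                     # synaptic connections
human_data = 70 * 365.25 * 24 * 3600      # ~2.2e9 seconds in 70 years
gpt3_params = 175e9
gpt3_data = 500e9                         # training tokens

print(human_params / human_data)          # ~45,000 parameters per timestep
print(gpt3_params / gpt3_data)            # 0.35 parameters per token
```

So under the crude weight/synapse and token/second equivalences, the human "model" has tens of thousands of parameters per unit of experience, while GPT-3 has fewer parameters than data points, a difference of roughly five orders of magnitude in the ratio.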

I recently interviewed with Epoch, and as part of a paid work trial they wanted me to write up a blog post about something interesting related to machine learning trends. This is what I came up with:

http://www.josephius.com/2022/09/05/energy-efficiency-trends-in-computation-and-long-term-implications/

I should point out that the logic of the degrowth movement follows from a relatively straightforward analysis of available resources vs. first world consumption levels.  Our world can only sustain 7 billion human beings because the vast majority of them live not at first world levels of consumption, but third world levels, which many would argue to be unfair and an unsustainable pyramid scheme.  If you work out the numbers, if everyone had the quality of life of a typical American citizen, taking into account things like meat consumption to arabl... (read more)

I'm using the number calculated by Ray Kurzweil for his book, the Age of Spiritual Machines from 1999.  To get that figure, you need 100 billion neurons firing every 5 ms, or 200 Hz.  That is based on the maximum firing rate given refractory periods.  In actuality, average firing rates are usually lower than that, so in all likelihood the difference isn't actually six orders of magnitude.  In particular, I should point out that six orders of magnitude is referring to the difference between this hypothetical maximum firing brain and the ... (read more)

Okay, so I contacted 80,000 hours, as well as some EA friends for advice.  Still waiting for their replies.

I did hear from an EA who suggested that if I don't work on it, someone else who is less EA-aligned will take the position instead, so in fact, it's slightly net positive for myself to be in the industry, although I'm uncertain whether AI capability is actually funding-constrained rather than personnel-constrained.

Also, would it be possible to mitigate the net negative by choosing to deliberately avoid capability research and just take an ML engineering job at a lower tier company that is unlikely to develop AGI before others and just work on applying existing ML tech to solving practical problems?

I previously worked as a machine learning scientist but left the industry a couple of years ago to explore other career opportunities.  I'm wondering at this point whether or not to consider switching back into the field.  In particular, in case I cannot find work related to AI safety, would working on something related to AI capability be a net positive or net negative impact overall?

[comment deleted]
Yonatan Cale
Working on AI Capabilities: I think this is net negative, and I'm commenting here so people can [V] if they agree or [X] if they disagree. Seems like habryka agrees?  Seems like Kaj disagrees? I think it wouldn't be controversial to advise you to at least talk to 80,000 hours about this before you do it, as some safety net to not do something you don't mean to by mistake? Assuming you trust them. Or perhaps ask someone you trust. Or make your own gears-level model. Anyway, seems like an important decision to me

Even further research shows the most recent Nvidia RTX 3090 is actually slightly more efficient than the 1660 Ti, at 36 TeraFlops, 350 watts, and 2.2 kg, which works out to 0.0001 PetaFlops/Watt and 0.016 PetaFlops/kg.  Once again, they're within an order of magnitude of the supercomputers.

So, I did some more research, and the general view is that GPUs are more power efficient in terms of Flops/watt than CPUs, and the most power efficient of those right now is the Nvidia 1660 Ti, which comes to 11 TeraFlops at 120 watts, so 0.000092 PetaFlops/Watt, which is about 6x more efficient than Fugaku.  It also weighs about 0.87 kg, which works out to 0.0126 PetaFlops/kg, which is about 7x more efficient than Fugaku.  These numbers are still within an order of magnitude, and also don't take into account the overhead costs of things like coo... (read more)
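As a quick check of those figures (using the TFLOPS, wattage, and mass numbers quoted in these two comments):

```python
# (peak FLOP/s, watts, kg) for each card, as quoted above
cards = {
    "GTX 1660 Ti": (11e12, 120, 0.87),
    "RTX 3090":    (36e12, 350, 2.2),
}
for name, (flops, watts, kg) in cards.items():
    pflops_per_watt = flops / watts / 1e15
    pflops_per_kg = flops / kg / 1e15
    print(f"{name}: {pflops_per_watt:.6f} PFLOPS/W, {pflops_per_kg:.4f} PFLOPS/kg")
```

Both cards come out around 1e-4 PFLOPS/W and 1e-2 PFLOPS/kg, which is the within-an-order-of-magnitude comparison to the supercomputers being made here.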

Darklight
Even further research shows the most recent Nvidia RTX 3090 is actually slightly more efficient than the 1660 Ti, at 36 TeraFlops, 350 watts, and 2.2 kg, which works out to 0.0001 PetaFlops/Watt and 0.016 PetaFlops/kg.  Once again, they're within an order of magnitude of the supercomputers.

Another thought is that maybe Less Wrong itself, if it were to expand in size and become large enough to roughly represent humanity, could be used as such a dataset.

So, I had a thought.  The glory system idea that I posted about earlier, if it leads to a successful, vibrant democratic community forum, could actually serve as a kind of dataset for value learning.  If each post has a number attached to it that indicates the aggregated approval of human beings, this can serve as a rough proxy for a kind of utility or Coherent Aggregated Volition.

Given that individual examples will probably be quite noisy, but averaged across a large amount of posts, it could function as a real world dataset, with the post conte... (read more)

Darklight
Another thought is that maybe Less Wrong itself, if it were to expand in size and become large enough to roughly represent humanity, could be used as such a dataset.

A further thought is that those with more glory can be seen almost as elected experts.  Their glory is assigned to them by votes after all.  This is an important distinction from an oligarchy.  I would actually be inclined to see the glory system as located on a continuum between direct demcracy and representative democracy.

So, keep in mind that having the first vote free and worth double the paid votes does tilt things more towards democracy. That being said, I am inclined to see glory as a kind of proxy for past agreement and merit, and a rough way to approximate liquid democracy, where you can proxy your vote to others or vote yourself.

In this alternative "market of ideas" the ideas win out because people who others trust to have good opinions are able to leverage that trust.  Decisions over the merit of the given arguments are aggregated by vote.  As lon... (read more)

Perhaps a nitpick detail, but having someone rob them would not be equivalent, because the cost of the action is offset by the ill-gotten gains.  The proposed currency is more directly equivalent to paying someone to break into the target's bank account and destroying their assets by a proportional amount so that no one can use them anymore.

As for the more general concerns:

Standardized laws and rules tend in practice to disproportionately benefit those with the resources to bend and manipulate those rules with lawyers.  Furthermore, this proposal... (read more)

As for the cheaply punishing prolific posters problem, I don't know a good solution that doesn't lead to other problems, as forcing all downvotes to cost glory makes it much harder to deal with spammers who somehow get through the application process filter.  I had considered an alternative system in which all votes cost glory, but then there's no way to generate glory except perhaps by having admins and mods gift them, which could work, but runs counter to the direct democracy ideal that I was sorta going for.

What I meant was you could farm upvotes on your posts.  Sorry.  I'll edit it for clarity.

And further to clarify, you'd both be able to gift glory and also spend glory to destroy other people's glory, at the mentioned exchange rate.

The way glory is introduced into the system is that any given post allows everyone one free vote on them that costs no glory.

So, I guess I should clarify, the idea is that you can both gift glory, which is how you gain the ability to post, and also you gain or lose glory based on people's upvotes and downvotes on your posts.
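A minimal sketch of how such a ledger might fit together (the vote weights and destroy rate here are placeholder assumptions; the actual exchange rate is described elsewhere in the original proposal):

```python
class GloryLedger:
    FREE_VOTE_WEIGHT = 2   # first vote on a post is free and worth double a paid vote
    DESTROY_RATE = 1       # glory spent per point of glory destroyed (assumed)

    def __init__(self):
        self.glory = {}

    def vote(self, voter, author, up, free=True, paid_votes=0):
        """Apply a free vote and/or extra paid votes to a post's author."""
        weight = self.FREE_VOTE_WEIGHT if free else 0
        if paid_votes:
            # paid votes cost the voter glory, one per vote
            self.glory[voter] = self.glory.get(voter, 0) - paid_votes
            weight += paid_votes
        self.glory[author] = self.glory.get(author, 0) + (1 if up else -1) * weight

    def gift(self, donor, recipient, amount):
        """Transfer glory directly, e.g. to grant a new member posting ability."""
        self.glory[donor] = self.glory.get(donor, 0) - amount
        self.glory[recipient] = self.glory.get(recipient, 0) + amount

ledger = GloryLedger()
ledger.vote("alice", "bob", up=True)                           # free vote: +2
ledger.vote("carol", "bob", up=True, free=False, paid_votes=3)  # costs carol 3
print(ledger.glory)  # {'bob': 5, 'carol': -3}
```

The free vote is also the only way net-new glory enters the system here; everything else is a zero-sum transfer, which is what keeps paid votes from inflating the currency.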

Darklight
And further to clarify, you'd both be able to gift glory and also spend glory to destroy other people's glory, at the mentioned exchange rate. The way glory is introduced into the system is that any given post allows everyone one free vote on them that costs no glory.

I have been able to land interviews at a rate of about 8/65 or 12% of the positions I apply to.  My main assumption is that the timing of COVID-19 is bad, and I'm also only looking at positions in my geographical area of Toronto.  It's also possible that I was overconfident early on and didn't prep enough for the interviews I got, which often involved general coding challenges that depended on data structures and algorithms that I hadn't studied since undergrad, as well as ML fundamentals for things like PCA that I hadn't touched in a long time a... (read more)

Actually, apparently I forgot about the proper term: Utilitronium

MrMind
Well, it depends. Utilitronium is matter optimized for utility. Friendtronium is (let's posit) matter optimized to run a FAI. Not necessarily the two are the same thing.

I would urge you to go learn about QM more. I'm not going to assume what you do/don't know, but from what I've learned about QM there is no argument for or against any god.

Strictly speaking it's not something that is explicitly stated, but I like to think that the implication flows from a logical consideration of what MWI actually entails. Obviously MWI is just one of many possible alternatives in QM as well, and the Copenhagen Interpretation obviously doesn't suggest anything.

This also has to do with the distance between the moon and the earth and

... (read more)
Viliam
Perhaps in some other universe the local people are happy that the majority of their universe does not consist of dark matter and dark energy, and that their two moons have allowed them to find out some laws of physics more easily.

Interesting, what is that?

The idea of theistic evolution is simply that evolution is the method by which God created life. It basically says, yes, the scientific evidence for natural selection and genetic mutation is there and overwhelming, and accepts these as valid, while at the same time positing that God can still exist as the cause that set the universe and evolution in motion through putting in place the Laws of Nature. It requires not taking the six days thing in the Bible literally, but rather metaphorically as being six eons of time, or some ... (read more)

MrMind
I was asking because positronium is an already established name for an exotic atom, made of an electron and a positron. I suggest you change your positronium into something like friendtronium, to avoid confusion.