All of FireStormOOO's Comments + Replies

A Straightforward Explanation of the Good Regulator Theorem

(I have been busy, hence the delay.)

No worries, likewise.

Most centrally I think we're seeing fundamentally different things with the causal graph. Or more to the point, I haven't the slightest idea how one is supposed to do any useful reasoning with time-varying nodes without somehow expanding it to consider how one node's function and/or time series affects its leaf nodes (or, put another way, specifically what temporal relation the arrow represents). It also seems fairly inescapable to me that any way you consider that relation, an actual causal ... (read more)

Fictional Thinking and Real Thinking

FireStormOOO15d50

Do you think they're actually struggling to distinguish real from fiction, or merely struggling to keep two complex distinct worlds in their working memory/stack/context window and keep the details straight?

E.g. many animals will play and chase, understanding that there are different rules because it's play, yet still transfer the skills to actually hunting or fighting. Seems more a matter of degree?

4Gordon Seidoh Worley15d

I think they simply lack the mental machinery to think about hypotheticals in general. So there's no struggle really, because they have no conception of fiction. It is a matter of degrees, though. Some animals show the ability for social deception. That's a limited form of fictional thinking, and is probably the basis from which humans developed the skill (because chimps can also do some amount of social deception). As for play fighting, I think this is better understood as a different behavioral mode. It doesn't actually require conceptualization, just an ability to engage in a ritualized behavior with others that may be similar to but is safely different from real fighting. Also don't forget that play fights sometimes accidentally become real fights!

A Straightforward Explanation of the Good Regulator Theorem

FireStormOOO15d20

It sounded previously like you were making the strong claim that this setup can't be applied to a closed control loop at all, even in e.g. the common (approximately universal?) case where we have a delay between the regulator's action and its being able to measure that action's effect. That's mostly what I was responding to; the chaining that Alfred suggested in the sibling comment seems sensible enough to me.

It occurs to me that the household thermostat example is so non-demanding as to be a poor intuition pump. I implicitly made the jump... (read more)

2Richard_Kennaway11d

(I have been busy, hence the delay.) I am making that claim. Closed loops have circular causal links between the (time-varying) variables. The SZR diagram that I originally objected to is acyclic, therefore it does not apply to closed loops. Loop delays are beside the point. Sampling on that time scale is not required and may just degrade performance. You are assuming that tighter control demands the sort of more complicated algorithms that you are imagining, that predict how much heat to inject, based on a model of the whole environment, and so on. Let's look outward at the real world. All you need for precision temperature control is to replace the bang-bang control with a PID controller and a scheme for automatically tuning the PID parameters, and there you are. There is nothing in the manual for the ThermoClamp device to suggest a scheme of your suggested sort. In particular, like the room thermostat, the only thing it senses is the actual temperature. Nothing else. For this it uses a thermocouple, which is a continuous-time device, not sampled. There is also no sign of any model. I don't know how this particular device tunes its PID parameters (probably a trade secret), but googling for how to auto-tune a PID has not turned up anything suggesting a model, only injecting test signals and adjusting the parameters to optimise observed performance -- observed solely through measuring the controlled variable. The early automatic pilots were analogue devices operating in continuous time. Everything is digital these days, but a modern automatic pilot is still sampling the sensors many times a second, and I'm sure that's also true of the digital parts of the ThermoClamp. The time step is well below the characteristic timescales of the system being controlled. It has to be. People talk about eliminating the cycles by unrolling. I believe this does not work. In causal graphs as generally understood, each of the variables is time-varying. In the unrolled versi
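For readers who have not met a PID controller, here is a minimal sketch of the kind of loop the comment above describes: the controller sees only the measured temperature and keeps no model of the room. Everything numeric here is an assumption made for illustration (the gains `kp`, `ki`, `kd`, the toy first-order room model, the normalised heater power in `[0, 1]`); none of it comes from the ThermoClamp manual or from the thread.

```python
class PID:
    """Textbook discrete PID controller; it only ever sees the measured value."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative term on the very first sample (no previous error yet).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def run(seconds=600.0, dt=0.1):
    """Simulate a hypothetical room: heating from the heater, leakage to outside."""
    pid = PID(kp=0.5, ki=0.02, kd=0.1, setpoint=20.0)  # illustrative gains
    temp, outside = 15.0, 5.0
    for _ in range(int(seconds / dt)):
        # Heater power is clamped to the range the (assumed) hardware allows.
        power = min(1.0, max(0.0, pid.update(temp, dt)))
        # Assumed toy dynamics: 0.5 K/s at full power, 1% leakage per second.
        temp += dt * (0.5 * power - 0.01 * (temp - outside))
    return temp


final_temp = run()
print(final_temp)
```

The time step `dt` is chosen well below the time constants of the toy room, in line with the comment's remark that the sampling interval has to be much shorter than the characteristic timescales of the controlled system.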

A Straightforward Explanation of the Good Regulator Theorem

FireStormOOO17d20

How do you figure a thermostat directly measures what it's controlling? It controls heat added/removed per unit time, typically just more/less/no change, and measures the resulting temperature at a single point on typically a minute+ delay due to the dynamics of the system (air and heat take time to diffuse, even with a blower). Any time step sufficiently shorter than that delay is going to work the same. The current measurement depends on what the thermostat did tens of seconds if not minutes previously.

There are times the continuous/dis... (read more)

2Richard_Kennaway16d

This is partly a terminology issue. By "controlling a variable" I mean "taking actions as necessary to keep that variable at some reference level." So I say that the thermostat is controlling the temperature of the room (or if you want to split hairs, the temperature of the temperature sensor—suitably siting that sensor is an important part of a practical system). In the same sense, the body controls its core temperature, its blood oxygenation level, its posture[1], and many other things, and its actions to obtain those ends include sweating, breathing, changing muscle tensions, etc. By the "output" or "action" of a control system I mean the actions it takes to keep the controlled variable at the reference. For the thermostat, this is turning the heat source on and off. It is not "controlling" (in the sense I defined) the rate of adding heat. The thermostat does not know how much heat is being delivered, and does not need to. The resulting behaviour of the system is to keep the temperature of the room between two closely spaced levels: the temperature at which it turns the heat on, and the slightly higher temperature at which it turns the heat off. The rate at which the temperature goes up or down does not matter, provided the heat source is powerful enough to replenish all the energy leaking out of the walls, however cold it gets outside. If the heat source were replaced by one delivering twice as much power, the performance of the thermostat would be unchanged, except for being able to cope with more extreme cold weather. The only delays in the thermostat itself are the time it takes for a mechanical switch to operate (milliseconds) and the time it takes for heat production to reach the sensor (minutes). These are so much faster than the changes in temperature due to the weather outside that it is most simply treated as operating in continuous time. There would be no practical benefit from sampling the temperature discretely and seeing how slow a sample rate yo
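The bang-bang behaviour described above can be sketched in a few lines. This is an illustrative simulation under assumed numbers (the thresholds `on_below` and `off_above`, the leak rate, and the heater strengths are all hypothetical), not a model of any real thermostat. It tries the same controller with one heater and with a heater twice as powerful.

```python
def simulate_thermostat(heater_rate, outside=0.0, on_below=19.5, off_above=20.5,
                        leak=0.001, dt=1.0, steps=20000):
    """Return the (lowest, highest) room temperature seen during the run.

    heater_rate is the assumed heating rate in K/s while the heater is on;
    leak is the assumed fraction of the indoor/outdoor difference lost per second.
    """
    temp = 20.0
    heating = False
    lowest = highest = temp
    for _ in range(steps):
        # The controller reads only the temperature and flips a switch.
        if temp <= on_below:
            heating = True
        elif temp >= off_above:
            heating = False
        temp += dt * ((heater_rate if heating else 0.0) - leak * (temp - outside))
        lowest = min(lowest, temp)
        highest = max(highest, temp)
    return lowest, highest


band_normal = simulate_thermostat(heater_rate=0.05)
band_double = simulate_thermostat(heater_rate=0.10)
print(band_normal, band_double)
```

In both runs the temperature stays close to the band between the two switching thresholds; the stronger heater only changes how quickly the room climbs back up and how far a single time step can carry it past the upper threshold.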

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

FireStormOOO4mo40

I wonder if you could produce this behavior at all in a model that hadn't gone through the safety RL step. I suspect that all of the examples have in common that they were specifically instructed against during safety RL, alongside "don't write malware", and it was simpler to just flip the sign on the whole safety training suite.

Same theory would also suggest your misaligned model should be able to be prompted to produce contrarian output for everything else in the safety training suite too. Just some more guesses, the misaligned model would al... (read more)

5Owain_Evans4mo

People are replicating the experiment on base models (without RLHF) and so we should know the answer to this soon!

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

FireStormOOO4mo20

Yikes. So the most straightforward take: When trained to exhibit a specific form of treachery in one context, it was apparently simpler to just "act more evil" as broadly conceptualized by the culture in the training data. And also seemingly, "act actively unsafe and harmful", as defined by the existing safety RL process. Most of those examples seem to just be taking the opposite position to the safety training, presumably in proportion to how heavily it featured in the safety training (e.g. "never ever ever say anything nice about Nazis... (read more)

When Is Insurance Worth It?

FireStormOOO6mo20

Hmm, I guess I see why other calculators have at least some additional heuristics and aren't straight Kelly. Going bankrupt is not infinitely bad in the US. If the insured has low wealth, there's likely a loan attached to any large asset that really complicates the math. Making W just be "household wealth" also doesn't model "I can replace the loss next paycheck". I'm not sure what exactly the correct notion of wealth is here, but if wealth is small compared to future earnings, and replacing the loss can be deferred, these assumptio... (read more)

When Is Insurance Worth It?

FireStormOOO6mo*41

This seems like a very handy calculator to have bookmarked.

~~I think I did find a bug:~~ At the low end it's making some insane recommendations. E.g. with wealth W and a 50% chance of loss W (50% chance of getting wiped out), the insurance recommendation is any premium up to W.

Wealth $10k, risk 50% on $9999 loss, recommends insure for $9900 premium.

~~That log(W-P) term is shooting off towards -infinity and presumably breaking something?~~

Edit: As papetoast points out, this is a faithful implementation of the Kelly criterion and is not a bug. Rather, Ke... (read more)

1papetoast6mo

The math is correct if you're trying to optimize log(Wealth). log(10000)=4 and log(1)=0 so the mean is log(100)=2. This model assumes going bankrupt is infinitely bad, which is not an accurate assumption, but it is not a bug.
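The arithmetic in this exchange can be reproduced with a short sketch. It assumes the simplest setup consistent with the thread: a single possible loss, full coverage, and a buyer who maximises expected log wealth. The function names are made up for illustration and are not taken from the calculator being discussed.

```python
import math


def expected_log_wealth_uninsured(wealth, loss, p_loss):
    """Expected log wealth when the loss is left uninsured."""
    return p_loss * math.log(wealth - loss) + (1 - p_loss) * math.log(wealth)


def max_premium(wealth, loss, p_loss):
    """Largest premium a log-wealth maximiser would pay for full coverage.

    Paying premium P leaves wealth - P for certain, so insurance is worth it
    while log(wealth - P) is at least the uninsured expected log wealth.
    """
    certainty_equivalent = math.exp(expected_log_wealth_uninsured(wealth, loss, p_loss))
    return wealth - certainty_equivalent


premium = max_premium(10_000, 9_999, 0.5)
print(round(premium, 2))
```

With wealth $10k and a 50% chance of a $9999 loss, the uninsured certainty equivalent is the geometric mean of $10,000 and $1, i.e. $100, so the break-even premium comes out at about $9900, matching the recommendation quoted above. If the loss equals the whole of wealth, `math.log(0)` is undefined, which is the "bankruptcy is infinitely bad" assumption showing up directly.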

Cohabitive Games so Far

FireStormOOO7mo20

Related, I noticed Civ VI also really missed the mark with that mechanic. I found that a great strategy, having a modest lead on tech, was to lean into coal power, which has the best bonuses, get your seawalls built to stop your coastal cities from flooding, and flood everyone else with sea-level rise. Only one player wins, so anything to sabotage others in the endgame will be very tempting.

Rise of Nations had an "Armageddon counter" on the use of nuclear weapons, which mostly resulted in exactly the behavior you mentioned - get 'em first and e... (read more)

Quantum Immortality: A Perspective if AI Doomers are Probably Right

FireStormOOO8mo43

Your examples seem to imply that believing QI means such an agent would in full generality be neutral on an offer to have a quantum coin tossed, where they're killed in their sleep on tails, since they only experience the tosses they win. Presumably they accept all such trades offering epsilon additional utility. And presumably other agents keep making such offers since the QI agent doesn't care what happens to their stuff in worlds they aren't in. Thus such an agent exists in an ever more vanishingly small fraction of worlds as they cont... (read more)

3avturchin8mo

I think you are right. We will not observe QI agents, and it is a bad policy to recommend, as I will end up in an empty world soon. Now caveats. My measure declines very quickly because of branching anyway, so no problem. There is an idea of civilization-level quantum suicide by Paul Almond. In that case, the whole civilization performs the QI coin trick, and there is no problem with an empty world, but it can explain the Fermi paradox. QI makes sense from a first-person perspective, but not from a third-person one.

Why is o1 so deceptive?

Answer by FireStormOOONov 07, 202440

I'll note that malicious compliance is a very common response to being provided a task that's not straightforwardly possible with the resources available, and no channel to simply communicate that without retaliation. BS an answer, or technically correct/rules as written response, is often just the best available strategy if one isn't in a position to fix the evaluator's broken incentives.

An actual human's chain of thought would be a lot spicier if their boss asked them to produce a document with working links without providing internet access.

video games > IQ tests

FireStormOOO9mo20

"English" keeps ending up as a catch-all in K-12 for basically all language skills and verbal reasoning skills that don't obviously fit somewhere else. Read and summarize fiction - English, Write a persuasive essay - English, grammar pedantry - English, etc.

The Asshole Filter

FireStormOOO10mo30

That link currently redirects the reader to https://siderea.dreamwidth.org/1209794.html

(just in case the old one stops working)

3b. Formal (Faux) Corrigibility

FireStormOOO1y11

Good clarification; not just the amount of influence, something about the way influence is exercised being unsurprising given the task. Central not just in terms of "how much influence", but also along whatever other axes the sort of influence could vary?

I think if the agent's action space is still so unconstrained there's room to consider benefit or harm that flows through principal value modification, it's probably still been given too much latitude. Once we have informed consent, because the agent has communicated the benefits and harms a... (read more)

3b. Formal (Faux) Corrigibility

FireStormOOO1yΩ240

WRT non-manipulation, I don't suppose there's an easy way to have the AI track how much potentially manipulative influence it's "supposed to have" in the context and avoid exercising more than that influence?

Or possibly better, compare simple implementations of the principal's instructions, and penalize interpretations with large/unusual influence on the principal's values. Preferably without prejudicing interventions straightforwardly protecting the principal's safety and communication channels.

Principal should, for example, be able to ask the AI to... (read more)

1Max Harms1y

That's an interesting proposal! I think something like it might be able to work, though I worry about details. For instance, suppose there's a Propagandist who gives resources to agents that brainwash their principals into having certain values. If "teach me about philosophy" comes with an influence budget, it seems critical that the AI doesn't spend that budget trading with Propagandist, and instead does so in a more "central" way. Still, the idea of instructions carrying a degree of approved influence seems promising.

The Incredible Fentanyl-Detecting Machine

FireStormOOO1y10

You're very likely correct IMO. The only thing I see pulling in the other direction is that cars are far more standardized than humans, and a database of detailed blueprints for every make and model could drastically reduce the resolution needed for usefulness. Especially if the action on a cursory detection is "get the people out of the area and scan it harder", not "rip the vehicle apart".

Slack matters more than any outcome

FireStormOOO1y30

This is the first text talking about goals I've read that meaningfully engages with "but what if you were (partially) wrong about what you want" instead of simply glorifying "outcome fixation". This seems like a major missing piece in most advice about goals. That the most important thing about your goals is that they're actually what you want. And discovering that may not be the case is a valid reason to tap the brakes and re-evaluate.

When is a mind me?

FireStormOOO1y20

(Assuming a frame of materialism, physicalism, empiricism throughout even if not explicitly stated)

Some of your scenarios that you're describing as objectionable would reasonably be described as emulation in an environment that you would probably find disagreeable even within the framework of this post. Being emulated by a contraption of pipes and valves that's worse in every way than my current wetware is, yeah, disagreeable even if it's kinda me. Making my hardware less reliable is bad. Making me think slower is bad. Making it eas... (read more)

When is a mind me?

FireStormOOO1y10

Realistically I doubt you'd even need to be sure it works, just reasonably confident. Folks step on planes all the time and those do on rare occasion fail to deliver them intact at the other terminal.

When is a mind me?

FireStormOOO1y10

Within this framework, whether or not you "feel that continuity" would mostly be a fact about the ontology your mindstate uses thinking about teleportation. Everything in this post could be accurate and none of it would be incompatible with you having an existential crisis upon being teleported, freaking out upon meeting yourself, etc.

Nor does anything here seem to make a value judgement about what the copy of you should do if told they're not allowed to exist. Attempting revolution seems like a perfectly valid response; self defense is held as... (read more)

Apologizing is a Core Rationalist Skill

FireStormOOO1y-1-1

There's a presumption you're open to discussing on a discussion forum, not just grandstanding. Strong downvoted much of this thread for the amount of my time you've wasted trolling.

Making every researcher seek grants is a broken model

FireStormOOO1y20

Bell Labs, Xerox PARC, etc. were AFAIK mostly privately funded research labs that existed for decades and churned out patents that may as well have been money printers. When AT&T (Bell Labs) was broken up, that research all but started the modern telecom and tech industry, which is now something like 20%+ of the stock market. If you attribute even a tiny fraction of that to Bell Labs it's enough to fund another 1000 times over.

The missing piece arguably is executive teams with a 25 year vision instead of a 25 week vision, AND the instit... (read more)

Making every researcher seek grants is a broken model

FireStormOOO1y40

Govt. spending is a ratchet that only goes one direction, replacing dysfunctional agencies costs jobs and makes political enemies. Reform might be more practical, but much like people, very hard to reform an agency that doesn't want to change. You'd be talking about sustained expenditure of political capital, the sort of thing that requires an agency head who's invested in the change and popular enough with both parties to get to spend a few administrations working at it.

Edit: I answered separately above with regards to private industry.

Apologizing is a Core Rationalist Skill

FireStormOOO1y2-1

Again you're saying that without engaging with any of my arguments or giving me any more of your reasoning to consider. Unless you care to share substantially more of your reasoning, I don't see much point continuing this?

Joseph Van Name1y120

I do not care to share much more of my reasoning because I have shared enough and also because there is a reason that I have vowed to no longer discuss except possibly with lots of obfuscation. This discussion that we are having is just convincing me more that the entities here are not the entities I want to have around me at all. It does not do much good to say that the community here is acting well or to question my judgment about this community. It will do good for the people here to act better so that I will naturally have a positive judgment about this community.

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

FireStormOOO1y10

That is a big part of the threat here. Many of the current deployments are many steps removed from anyone reading research papers. E.g. sure, people at MS and OpenAI involved with that roll-out are presumably up on the literature. But the IT director deciding when and how to deploy copilot, what controls need to be in place, etc? Trade publications, blogs, maybe they ask around on Reddit to see what others are doing.

On "Geeks, MOPs, and Sociopaths"

FireStormOOO1y30

Related, how do spin-off subcultures fit into this model? E.g. in music you have people that consume an innovation in one genre, then reinvent it in their own scene where they're a creator. I think there are similar dynamics in various LW adjacent subcultures, though I'm not up enough on detailed histories to comment.

2Viliam1y

If the spin-off group identifies differently and meets at different places, that is okay, because it does not prevent the original group from their original ways. I agree. Less Wrong is a well-defended fortress. The spin-off subcultures have their own online places, such as Astral Codex Ten, Effective Altruism Forum, the places where post-rats meet, etc. (Even if we copy or link each other's articles, it is always selected articles, which are then discussed by a different audience. In music, I guess an analogy would be a song that is halfway between two genres, being played at different festivals for different audiences.)

On "Geeks, MOPs, and Sociopaths"

FireStormOOO1y114

For less loaded terms, maybe Create, Consume, Exploit or Create, Enjoy, Exploit as the set of actions available. Looks like loosely what was settled on above.

Where exploit more naturally captures things like soulless commercialization and others low key taking advantage of those enjoying the scene.

Consume in the context of rationalists would more be people who read the best techniques on offer and then go try to use them for things that aren't "advancing the art" itself, like addressing x-risk.

Apologizing is a Core Rationalist Skill

FireStormOOO1y0-1

You're still hammering on stuff I never disagreed with in the first place. In so far as I don't already understand all the math (or math notation) I'd need to follow this, that's a me problem not a you problem, and having a pile of cool papers I want to grok is prime motivation for brushing up on some more math. I'm definitely not down-voting merely on that.

What I'm mostly trying to get across is just how large of a leap of logic you're making from [post got 2 or 3 downvotes] => [everyone here hates math]. There's got to be at least 3 ... (read more)

-2Joseph Van Name1y

You are judging my reasoning without knowing all that went into my reasoning. That is not good.

3Joseph Van Name1y

I will work with whatever data I have, and I will make a value judgment based on the information that I have. The fact that Karma relies on very small amounts of information is a testament to a fault of Karma, and that is further evidence of how the people on this site do not want to deal with mathematics. And the information that I have indicates that there are many people here who are likely to fall for more scams like FTX. Not all of the people here are so bad, but I am making a judgment based on the general atmosphere here. If you do not like my judgment, then the best thing would be to try to do better. If this site has made a mediocre impression on me, then I am not at fault for the mediocrity here.

Apologizing is a Core Rationalist Skill

FireStormOOO1y0-1

Any conversation about karma would necessarily involve talking about what does and doesn't factor into votes, likely both here and in the internet or society at large. Not thinking we're getting anywhere on that point.

I've already said clearly and repeatedly I don't have a problem with math posts and I don't think others do either. You're not going to get what you want by continuing to straw-man me and others. I disagree with your premise; you've thus far failed to acknowledge or engage with any of those points.

Joseph Van Name1y*100

Let's see whether the notions that I have talked about are sensible mathematical notions for machine learning.

Tensor product-Sometimes data in a neural network has tensor structure. In this case, the weight matrices should be tensor products or tensor sums. Regarding the structure of the data works well with convolutional neural networks, and it should also work well for data with tensor structure to it.

Trace-The trace of a matrix measures how much the matrix maps vectors onto themselves since

$\operatorname{Tr}(A) = c \cdot \mathbb{E}[\langle Av, v \rangle]$ where $v$ follows the multivariat... (read more)
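The trace identity quoted above can be checked numerically. The comment is cut off before it names the distribution of $v$ or the constant $c$, so this sketch assumes a standard multivariate normal $v$ (independent unit-variance components), for which $c = 1$. The matrix `A` is an arbitrary example and the helper names are made up for illustration.

```python
import random


def quadratic_form(matrix, v):
    """Return <A v, v> for a square real matrix given as a list of rows."""
    n = len(v)
    return sum(matrix[i][j] * v[j] * v[i] for i in range(n) for j in range(n))


def estimate_trace(matrix, samples=20000, seed=0):
    """Monte Carlo estimate of E[<A v, v>] with v ~ N(0, I) (assumed distribution)."""
    rng = random.Random(seed)
    n = len(matrix)
    total = 0.0
    for _ in range(samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += quadratic_form(matrix, v)
    return total / samples


A = [[2.0, 1.0, 0.0],
     [0.0, 3.0, 1.0],
     [1.0, 0.0, 4.0]]
exact_trace = sum(A[i][i] for i in range(len(A)))
estimate = estimate_trace(A)
print(exact_trace, estimate)
```

The off-diagonal entries drop out on average because distinct components of $v$ are uncorrelated, which is why only the diagonal (the trace) survives in expectation.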

The LessWrong 2022 Review

FireStormOOO1y10

Ah, gotcha. I had gotten the other impression from the thread in aggregate.

The LessWrong 2022 Review

FireStormOOO1y10

If you're selling them at unit cost you aren't selling at cost, you're straightforwardly selling at a loss. That's definitely not what I'm thinking of when someone tells me they're selling at cost.

2habryka1y

(We're not selling them at marginal/unit cost, we were selling them so that roughly a whole print run breaks even, including some budget for labor-time/opportunity-cost, but less than people's full salaries for that period)

Why are people unkeen to immortality that would come from technological advancements and/or AI?

Answer by FireStormOOOJan 18, 202420

For everyone who gets curious and challenges (or even evaluates on the merits) the approved right answers they learned from their culture, there are dozens more who for whatever reason don't. "Who am I to challenge <insert authority>", "Why should I think I know better?", "How am I supposed to know what's true?" (rhetorically, not expecting an answer exists). And a thousand other rationalizations besides.

And then of those who try, most just find another authority they like better and end their inquiry - independent thinking is hard w... (read more)

What do people colloquially mean by deep breathing? Slow, large, or diaphragmatic?

Answer by FireStormOOOJan 18, 202441

I've always taken that as hold average volumetric flow rate constant or slightly reduce, reduce the rate at which breaths are taken significantly, breathe deeper (more air at once) to compensate.

The use of the phrase "deep breath and hold" is also consistent with max lung volume == deep breath.

Apologizing is a Core Rationalist Skill

FireStormOOO1y21

Wouldn't be engaging at all if I didn't think there was some truth to what you're saying about the math being important and folks needing to be persuaded to "take their medicine" as it were and use some rigor. You are not the first person to make such an observation and you can find posts on point from several established/respected members of the community.

That said, I think "convincing people to take their medicine" mostly looks like those answers you gave just being at the intro of the post(s) by default (and/or the intro to the series if that make... (read more)

4Joseph Van Name1y

Talking about whining and my loss of status is a good way to get me to dislike the LW community and consider them to be anti-intellectuals who fall for garbage like FTX. Do you honestly think the people here should try to interpret large sections of LLMs while simultaneously being afraid of quaternions? It is better to comment on threads where we are interacting in a more positive manner. I thought apologizing and recognizing inadequacies was a core rationalist skill. And I thought rationalists were supposed to like mathematics. The lack of mathematical appreciation is one of these inadequacies of the LW community. But instead of acknowledging this deficiency, the community here blasts me as talking about something off topic. How ironic!

The impossible problem of due process

FireStormOOO1y20

There's a more general concern here for running organizations where anyone can sue anyone at any time for any reason, merit or no. If one allows the barest hint of a lawsuit to dictate their actions, that too becomes another vector through which they can be manipulated. Perhaps a better thing to aim for is "don't do anything egregious enough a lawyer will take it on contingency", use additional caution if the potential adversary is much better resourced than you (and can afford sustained frivolous litigation).

The impossible problem of due process

FireStormOOO1y*-2-2

Not a lawyer, but the "can't explain your reasoning" problem is overblown. Just need to be very diligent in separating facts from the opinions and findings of the panel. There is a reason every report of that sort done professionally sounds the particular flavor of stilted that it does.

"Our panel found that <accused> did <thing>" <- potential lawsuit, hope you can prove that in court. You're not a fact finder in a court of law, speak as if you are at your own peril.

"Our panel was convened to investigate <accusation> a... (read more)

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

FireStormOOO1y43

From an operational perspective, this is eye-opening in terms of how much trust is being placed in the companies that train models, and the degree to which nobody coming in later in the pipeline is going to be able to fully vouch for the behavior of the model, even if they spend time hammering on it. In particular, it seems like it took vastly less effort to sabotage those models than would be required to detect this.

That's relevant to the models that are getting deployed today. I think the prevailing thinking among those deploying AI mo... (read more)

2RogerDearnaley1y

There have been quite a few previous papers on backdooring models that have also demonstrated the feasibility of this. So anyone operating under that impression hasn't been reading the literature.

Apologizing is a Core Rationalist Skill

FireStormOOO1y21

I did go pull up a couple of your posts as that much is a fair critique:

That first post is only the middle section of what would already be a dense post and is missing the motivating "what's the problem?", "what does this get us?"; without understanding substantially all of the math and spending hours I don't think I could even ask anything meaningful. That first post in particular is suffering from an approachable-ish sounding title then wall of math, so you're getting laypeople who expected to at least get an intro paragraph for their trouble.

The A... (read more)

1Joseph Van Name1y

I have made a few minor and mostly cosmetic edits to the post about the dimensionality reduction of tensors that produces so many trace free matrices and also to the post about using LSRDRs to solve a combinatorial graph theory problem. "What's the problem?"-Neural networks are horribly uninterpretable, so it would be nice if we could use more interpretable AI models or at least better interpretability tools. Neural networks seem to include a lot of random information, so it would be good to use AI models that do not include so much random information. Do you think that we would have more interpretable models by forsaking all mathematical theory? "what does this get us?"-This gets us systems trained by gradient ascent that behave much more mathematically. Mathematical AI is bound to be highly interpretable. The downvotes display a very bad attitude, and they indicate that the LW community is a community that I really do not want much to do with at worst, and at best, the LW community is a community that lacks discipline and such mathematics texts will be needed to instill such discipline. In those posts that you have looked at, I did not include any mathematical proofs (these are empirical observations, so I could not include proof), and the lack of mathematical proofs makes the text much easier to go through. I also made the texts quite short; I only included enough text to pretty much define the fitness function and then state what I have observed. For toy examples, I just worked with random complex matrices, and I wanted these matrices to be sufficiently small so that I can make and run the code to compute with these matrices quite quickly, but these matrices need to be large enough so that I can properly observe what is going on. I do not want to make an observation about tiny matrices that do not have any relevance to what is going on in the real world. If we want to be able to develop safer AI systems, we will need to make them much more mathematical,

Apologizing is a Core Rationalist Skill

FireStormOOO2y21

It is still a forum; all the usual norms about staying on topic and not hijacking threads apply. Perhaps a Q&A on how to get more engagement with math-heavy posts would be more constructive? Speaking just for myself, a cheat-sheet on notation would do wonders.

Nobody is under any illusions that karma is perfect AFAICT, though there has already been much discussion of the extent to which it just mirrors the flaws in people's underlying rating choices.

1Joseph Van Name2y

If you have any questions about the notation or definitions that I have used, you should ask about it in the mathematical posts that I have made and not here. Talking about it here is unhelpful, condescending, and it just shows that you did not even attempt to read my posts. That will not win you any favors with me or with anyone who cares about decency. Karma is not only imperfect, but Karma has absolutely no relevance whatsoever because Karma can only be as good as the community here. P.S. Asking a question about the notation does not even signify any lack of knowledge since a knowledgeable person may ask questions about the notation because the knowledgeable person thinks that the post should not assume that the reader has that background knowledge. P.P.S. I got downvotes, so I got enough engagement on the mathematics. The problem is the community here thinks that we should solve problems with AI without using any math for some odd reason that I cannot figure out.

Godzilla Strategies

FireStormOOO2y20

Point of clarification: Is the supervisor the same as the potentially faulty hardware, or are we talking about a different, non-suspect node checking the work, and/or e.g. a more reliable model of chip supervising a faster but less reliable one?

5RogerDearnaley2y

Generally each node involved is something like a rack-mounted server, or a virtual machine running on one, all of roughly comparable reliability (often of only around the commodity level of reliability). The nodes running the checks may often themselves be redundant and crosschecked, or the whole system may be of nodes that both do the work and cross-check each other: there are well-known algorithms for a group of nodes crosschecking each other that will provably always give the right answer as long as some suitably sized majority of them haven't all failed at once in a weirdly coordinated way, and knowing the reliability of your nodes (from long experience) you can choose the size of your group to achieve any desired level of overall reliability. Then you need to achieve the same things for network, storage, job scheduling, data-paths, updates and so forth: everything involved in the process. This stuff is hard in practice, but the theory is well-understood and taught in CS classes.

With enough work on redundancy, crosschecks, and retries you can build arbitrarily large, arbitrarily reliable systems out of somewhat unreliable components. Godzilla can be trained to reliably defeat megagodzilla (please note that I'm not claiming you can make this happen reliably the first time: initially there are invariably failure modes you hadn't thought of causing you to need to do more work). The more unreliable your basic components the harder this gets, and there's almost certainly a required minimum reliability threshold for them: if they usually die before they can even do a cross-check on each other, you're stuck.

If you read the technical report for Gemini, in the section on training they explicitly mention doing engineering to detect and correct cases where a server has temporarily had a limited point-failure during a calculation due to a cosmic ray hit. They're building systems so large that they need to cope with failure modes that rare. They also maintain multiple
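
To make the group-sizing point above concrete, here is a minimal Python sketch. It is not from the comment or from any real system: it assumes each node fails independently with a known probability and that the group's answer is a simple majority vote, which is a much cruder model than real cross-checking protocols. The function names and parameters are hypothetical.

```python
from math import comb
from typing import Optional


def majority_correct_probability(n: int, p_fail: float) -> float:
    """Probability that a strict majority of n nodes is correct.

    Assumption (illustrative only): nodes fail independently,
    each with probability p_fail.
    """
    needed = n // 2 + 1          # smallest strict majority
    p_ok = 1.0 - p_fail
    return sum(
        comb(n, k) * p_ok**k * p_fail**(n - k)
        for k in range(needed, n + 1)
    )


def smallest_group(p_fail: float, target: float, max_n: int = 999) -> Optional[int]:
    """Smallest odd group size whose majority vote meets the target reliability.

    Returns None if no group size up to max_n is good enough, which is what
    happens when individual nodes are wrong more often than they are right.
    """
    for n in range(1, max_n + 1, 2):  # odd sizes avoid tied votes
        if majority_correct_probability(n, p_fail) >= target:
            return n
    return None
```

Under these assumptions, `smallest_group(0.1, 0.97)` returns a group of 3, while nodes that fail 60% of the time never reach a high target however many are added, which is one simple way to see why a minimum per-node reliability is needed.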

Why not electric trains and excavators?

FireStormOOO2y30

The more curious case for excavators would be open pit mines or quarries where you know you're going to be in roughly the same place for decades and already have industrial-size hookups.

1anithite2y

A bit more compelling, though for mining, the excavator/shovel/whatever loads a truck. The truck moves it much further and consumes a lot more energy to do so. Overhead wires to power the haul trucks are the biggest win there. “Roughly 70 per cent of our (greenhouse gas emissions) are from haul truck diesel consumption. So trolley has a tremendous impact on reducing GHGs.” This is an open pit mine. Less vertical movement may reduce imbalance in energy consumption. Can't find info on pit depth right now but haul distance is 1km. General point is that when dealing with a move stuff from A to B problem, where A is not fixed, diesel for a varying A-X route and electric for a fixed X-B route seems like a good tradeoff. Definitely the B endpoint should be electrified (e.g. truck offload at ore processing location). Getting power to a varying point A is challenging. Maybe something with overhead cables could work. Again, John Deere is working on something for agriculture with a cord-laying-down-vehicle, and overhead wires are used for the last 20-30 meters. But fields are nice in that there are fewer sharp rocks and mostly softer dirt/plants. Not impossible but needs some innovation to accomplish.

The 6D effect: When companies take risks, one email can be very powerful.

FireStormOOO2y20

The answer there is if you can get it into evidence then you can get it in front of a jury. A big part of what lawyers do in litigation is argue about what gets into evidence and can get shown; all of that arguing costs time and money. I think a fair summary is if it's plausibly relevant, the judge usually can't/won't exclude it.

Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI"

FireStormOOO2y4-1

I wouldn't count on Microsoft being ineffective, but there's good reason to think they'll push for applications for the current state of the art over further blue sky capabilities stuff. The commitment to push copilot into every Microsoft product is already happening, the copilot tab is live in dozens of places in their software and in most it works as expected. It's already good enough to replace 80%+ of the armies of temps and offshore warm bodies that push spreadsheets and forms around today without any further big capabilities gains, and th... (read more)

Sum-threshold attacks

FireStormOOO2y10

Covert side channels like you're suggesting would probably be a related and often helpful thing for someone trying to do what OP is talking about, but I think the side channels are distinct from the things they can be used for.

Sum-threshold attacks

FireStormOOO2y91

This concept in radio communications would be "spread spectrum", reducing the signal intensity or duration in any given part of the spectrum and using a wider band/more channels. See especially military spread spectrum comms and radars. E.g. this technique has been used to frustrate simple techniques for identifying the location of a radio transmitter, to avoid jamming, and to defeat radar warning/missile warning systems on jets.

2TsviBT2y

Nice, thanks.

A Hill of Validity in Defense of Meaning

FireStormOOO2y146

It's pretty easy to find reasons why everything will hopefully be fine, or AI hopefully won't FOOM, or we otherwise needn't do anything inconvenient to get good outcomes. It's proving considerably harder (from my outside the field view) to prove alignment, or prove upper bounds on rate of improvement, or prove much of anything else that would be cause to stop ringing the alarm.

FWIW I'm considerably less worried than I was when the Sequences were originally written. The paradigms that have taken off since do seem a lot more compatible with strai... (read more)

A Hill of Validity in Defense of Meaning

FireStormOOO2y48

Admittedly I skimmed large portions of that, but I'd like to take a crack at bridging some of that inferential distance with a short description of the model I've been using, whereby I keep all the concerns you brought up straight but also don't have to choke on pronouns.

Categories of Men and Women are useful in a wide variety of areas and point at a real thing. There's a region in the middle where these categories overlap and lack clean boundaries - while both genetics and birth sex are undeniable and straightforward fact in almost all cases (~98% IIRC), ... (read more)

FireStormOOO2y10

I think a key distinction here is any of this only helps if people care more about the truth of the issue at hand than whatever realpolitik considerations the issue has tangentially gotten pulled into. And yeah, absent "unreasonable levels of political savvy", academics are mostly relying on academic issues usually being far enough from the icky world of politics to be openly discussed, at least outside of a few seriously diseased disciplines where the rot is well and truly set in. The powers that be seem to only care about the truth of an issu... (read more)

The Base Rate Times, news through prediction markets

FireStormOOO2y10

This is very much what I want my headlines to look like.

Personally, my preferred mode of consumption would be an AM email newsletter like Axios or Morning Brew.

The resolution dates on the markets seem important on several of the headlines and were noticeably missing from the body.

"Crimea land bridge 22% chance of being cut [this year/campaign season], down from 34% according to Insight"

Notice how different that would read with the time horizon on there vs leaving unqualified. The other big question an update like that begs is "what changed?"

Explaining “Hell is Game Theory Folk Theorems”

FireStormOOO2y30

Interesting follow-up: how long do they take to break out of the bad equilibrium if all start there? How about if we choose a less extreme bad equilibrium (say 80 degrees)?

1ProgramCrafter2y

By less extreme bad equilibrium, do you mean "play 79, until someone defects, and then play 80"? Or "play 80 or 100"? Here is the Python script I've used: https://gist.github.com/ProgramCrafter/2af6a5b1cde0ff8995b9502f1c502151 To make all agents start from Hell, you need to change line 31 to self.strategy = equilibrium.