Quick Takes

Looking for blog platform/framework recommendations

I had a WordPress blog, but I don't like WordPress and I want to move away from it.

Substack doesn't seem like a good option because I want high customizability and multilingual support (my blog is going to be in English and Hebrew).

I would like something that I can use for free with my own domain (so not Wix).

The closest thing I found to what I'm looking for was MkDocs Material, but it's still geared too much towards documentation, and I don't like its blog functionality enough.

Other requirements: Da... (read more)

decision theory is no substitute for utility function

some people, upon learning about decision theories such as LDT and how it cooperates on problems such as the prisoner's dilemma, end up believing the following:

my utility function is about what i want for just me; but i'm altruistic (/egalitarian/cosmopolitan/pro-fairness/etc) because decision theory says i should cooperate with other agents. decision-theoretic cooperation is the true name of altruism.

it's possible that this is true for some people, but in general i expect that to be a mistaken anal... (read more)

Pi Rogers (10h)
What about the following: My utility function is pretty much just my own happiness (in a fun-theoretic rather than purely hedonistic sense). However, my decision theory is updateless with respect to which sentient being I ended up as, so once you factor that in, I'm a multiverse-wide realityfluid-weighted average utilitarian. I'm not sure how correct this is, but it's possible.

It certainly is possible! In more decision-theoretic terms, I'd describe this as "it sure would suck if agents in my reference class just optimized for their own happiness; it seems like the instrumental thing for agents in my reference class to do is maximize for everyone's happiness". Which is probly correct!

But as per my post, I'd describe this position as "not intrinsically altruistic" — you're optimizing for everyone's happiness because "it sure would suck if agents in my reference class didn't do that", not because you intrinsically value that everyone be happy, regardless of reasoning about agents and reference classes and veils of ignorance.

Viliam (3d)
I don't have an explicit theory of how this works; for example, I would consider "pleasing others" in an experience machine meaningless, but "eating a cake" in an experience machine seems just as okay as in real life (maybe even preferable, considering that cakes are unhealthy). A fake memory of "having eaten a cake" would be a bad thing; "making people happier by talking to them" in an experience machine would be intrinsically meaningless, but it might help me improve my actual social skills, which would be valuable.

Sometimes I care about the referent being real (the people I would please), sometimes I don't (the cake I would eat). But it's not the people/cake distinction per se; for example, in the case of using fake simulated people to practice social skills, the emphasis is on the skills being real; I would be disappointed if the experience machine merely gave me a fake "feeling of having improved my skills".

I imagine that for a psychopath everything and everyone is instrumental, so there would be no downside to the experience machine (except for the risk of someone turning it off). But this is just a guess.

I suspect that analyzing "the true preferences" is tricky, because ultimately we are built of atoms, and atoms have no preferences. So the question is whether by focusing on some aspect of the human mind we got better insight into its true nature, or whether we have just eliminated the context that was necessary for it to make sense.

In my fantasies, if I ever were to get that god-like glimpse at how everything actually is, with all that is currently hidden unveiled, it would be something like the feeling you have when you get a joke, or see a "magic eye" illustration, or understand an illusionist's trick, or learn to juggle: what was formerly perplexing and incoherent becomes in a snap simple and integrated, and there's a relieving feeling of "ah, but of course."

But it lately occurs to me that the things I have wrong about the world are probably things I've grasped at exactly because ... (read more)

And then today I read this: “We yearn for the transcendent, for God, for something divine and good and pure, but in picturing the transcendent we transform it into idols which we then realize to be contingent particulars, just things among others here below. If we destroy these idols in order to reach something untainted and pure, what we really need, the thing itself, we render the Divine ineffable, and as such in peril of being judged non-existent. Then the sense of the Divine vanishes in the attempt to preserve it.” (Iris Murdoch, Metaphysics as a Guide to Morals)

metachirality (1d)
I like to phrase it as "the path to simplicity involves a lot of detours." Yes, Newtonian mechanics doesn't account for the orbit of Mercury, but it turned out there was an even simpler, more parsimonious theory, general relativity, waiting for us.
tlevin (2d)

I think some of the AI safety policy community has over-indexed on the visual model of the "Overton Window" and under-indexed on alternatives like the "ratchet effect," "poisoning the well," "clown attacks," and other models where proposing radical changes can make you, your allies, and your ideas look unreasonable (edit to add: whereas successfully proposing minor changes achieves hard-to-reverse progress, making ideal policy look more reasonable).

I'm not familiar with a lot of systematic empirical evidence on either side, but it seems to me like the more... (read more)

Akash (21h)
Agree with lots of this – a few misc thoughts [hastily written]:

1. I think the Overton Window frame ends up getting people to focus too much on the dimension "how radical is my ask" – in practice, things are usually much more complicated than this. In my opinion, a preferable frame is something like "who is my target audience and what might they find helpful." If you're talking to someone who makes it clear that they will not support X, it's silly to keep on talking about X. But I think the "target audience first" approach ends up helping people reason in a more sophisticated way about what kinds of ideas are worth bringing up. As an example, in my experience so far, many policymakers are curious to learn more about intelligence explosion scenarios and misalignment scenarios (the more "radical" and "speculative" threat models).
2. I don't think it's clear that the more effective actors in DC tend to be those who look for small wins. Outside of the AIS community, there sure do seem to be a lot of successful organizations that take hard-line positions and (presumably) get a lot of their power/influence from the ideological purity that they possess & communicate. Whether or not these organizations end up having more or less influence than the more "centrist" groups is, in my view, not a settled question & probably varies a lot by domain. In AI safety in particular, I think my main claim is something like "pretty much no group – whether radical or centrist – has had tangible wins." When I look at the small set of tangible wins, it seems like the groups involved were across the spectrum of "reasonableness."
3. The more I interact with policymakers, the more I'm updating toward something like "poisoning the well doesn't come from having radical beliefs – poisoning the well comes from lamer things like being dumb or uninformed, wasting peoples' time, not understanding how the political process works, not having tangible things you want someone to do, explaining ideas poorly...
tlevin (11h)

Quick reactions:

  1. Re: over-emphasis on "how radical is my ask" vs. "what my target audience might find helpful", and generally the importance of making your case well regardless of how radical it is: that makes sense. Though notably, the more radical your proposal is (or the more unfamiliar your threat models are), the higher the bar for explaining it well, so these do seem related.
  2. Re: more effective actors looking for small wins, I agree that it's not clear, but yeah, seems like we are likely to get into some reference class tennis here. "A lot of successful o
... (read more)
trevor (22h)
Recently, John Wentworth wrote:

And I think this makes sense (e.g. Simler's Social Status: Down the Rabbit Hole, which you've probably read), if you define "AI Safety" as "people who think that superintelligence is serious business or will be some day".

The psych dynamic that I find helpful to point out here is Yud's Is That Your True Rejection post from ~16 years ago. A person who hears about superintelligence for the first time will often respond to their double-take at the concept by spamming random justifications for why that's not a problem (which, notably, feels like legitimate reasoning to that person, even though it's not). An AI-safety-minded person becomes wary of being effectively attacked by high-status people immediately turning into what is basically a weaponized justification machine, and develops a deep drive wanting that not to happen. Then justifications ensue for wanting that to happen less frequently in the world, because deep down humans really don't want their social status to be put at risk (via denunciation) on a regular basis like that. These sorts of deep drives are pretty opaque to us humans, but their real-world consequences are very strong.

Something that seems more helpful than playing whack-a-mole whenever this issue comes up is having more people in AI policy putting more time into improving perspective. I don't see shorter paths to increasing the number of people-prepared-to-handle-unexpected-complexity than giving people a broader and more general thinking capacity for thoughtfully reacting to the sorts of complex curveballs that you get in the real world. Rationalist fiction like HPMOR is great for this, as well as others, e.g. Three Worlds Collide, Unsong, Worth the Candle, Worm (list of top rated ones here). With the caveat, of course, that doing well in the real world is less like the bite-sized easy-to-understand events in ratfic, and more like spotting errors in the methodology section of a study or making money playing poker.
dkornai (17h)

Pain is the consequence of a perceived reduction in the probability that an agent will achieve its goals. 
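One minimal way to write this down (a sketch of the definition above; $P_t(\text{goal})$ stands for the agent's current estimate that it will achieve its goals):

```latex
% pain as the perceived drop in goal-achievement probability between updates
\text{pain}_t \;=\; \max\!\bigl(0,\; P_{t-1}(\text{goal}) - P_t(\text{goal})\bigr)
```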

In biological organisms, physical pain [say, in response to a limb being removed] is an evolutionary consequence of the fact that organisms with the capacity to feel physical pain avoided situations in which the long-term goals [e.g. locomotion to a favourable position using the limb] that required the pain-generating subsystem were harmed.

This definition applies equally to mental pain [say, the pain felt when being expelled from a group of allies] w... (read more)

In biological organisms, physical pain [say, in response to a limb being removed] is an evolutionary consequence of the fact that organisms with the capacity to feel physical pain avoided situations in which the long-term goals [e.g. locomotion to a favourable position using the limb] that required the pain-generating subsystem were harmed.

 

How many organisms other than humans have "long term goals"? Doesn't that require a complex capacity for mental representation of possible future states?

Am I wrong in assuming that the capacity to experience "pain" is... (read more)

Alexander Gietelink Oldenziel (16h)
It also suggests that there might be some sort of conservation law for pain for agents. Conservation of Pain, if you will.

Pithy sayings are lossily compressed.

Something someone technical and interested in forecasting should look into: can LLMs reliably convert people's claims into a percentage of confidence through sentiment analysis? This would be useful for forecasters, I believe (and for rationality in general).

There have been multiple occasions where I've copied and pasted email threads into an LLM and asked it things like:

  1. What is X person saying?
  2. What are the cruxes in this conversation?
  3. Summarise this conversation.
  4. What are the key takeaways?
  5. What views are being missed from this conversation?

I really want an email plugin that basically brute forces rationality INTO email conversations.
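A rough sketch of the kind of tooling I mean, assuming the OpenAI Python client (the model name, prompt, and function names are placeholders, not a real product):

```python
# Sketch: ask an LLM to turn each claim in an email thread into an explicit
# probability estimate based on the author's hedging language. Illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = """For each claim made in the email thread below, output one line:
<author> | <claim, paraphrased> | <implied confidence as a percentage>
Infer the percentage from the author's hedging language (e.g. "maybe", "surely").

Thread:
{thread}"""

def claims_to_confidence(thread_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any capable chat model
        messages=[{"role": "user", "content": PROMPT.format(thread=thread_text)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(claims_to_confidence("Alice: I think the launch will probably slip to Q3."))
```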

yanni kyriacos (1d)
Hi Johannes! Thanks for the suggestion :) I'm not sure I'd want it in the middle of a video call, but maybe in a forum context like this it could be cool?
Johannes C. Mayer (1d)
Having this in a video call seems pretty good to me. The main reason I don't immediately try this out is that I would need to write a program to do it.

That seems fair enough!

The FCC just fined US phone carriers for sharing the location data of US customers with anyone willing to buy it. The fines don't seem to be high enough to deter this kind of behavior.

That likely includes, either directly or indirectly, the Chinese government.

What does the US Congress do to protect against spying by China? Of course, banning TikTok instead of actually protecting the data of US citizens.

If your threat models include the Chinese government targeting you, assume that they know where your phone is and shut it off when going somewhere you... (read more)

ChristianKl (21h)
Politico wrote, "Perhaps the most pressing concern is around the Chinese government’s potential access to troves of data from TikTok’s millions of users." The concern that TikTok supposedly is spyware is frequently made in discussions about why it should be banned. If the main issue is content moderation decisions, the best way to deal with it would be to legislate transparency around content moderation decisions and require TikTok to outsource the moderation decisions to some US contractor. 
ChristianKl (21h)
The Office of the Director of National Intelligence wrote a report about this question that was declassified last year. They use the abbreviation CAI for "commercially available information":

"2.5. (U) Counter-Intelligence Risks in CAI. There is also a growing recognition that CAI, as a generally available resource, offers intelligence benefits to our adversaries, some of which may create counter-intelligence risk for the IC. For example, the January 2021 CSIS report cited above also urges the IC to “test and demonstrate the utility of OSINT and AI in analysis on critical threats, such as the adversary use of AI-enabled capabilities in disinformation and influence operations.""

Last month there was a political fight about warrant requirements when US intelligence agencies use commercially bought data; that fight was likely partly caused by the concerns from that report.

Here, I mean that you are doing something that's of interest to Chinese intelligence services. People who want to lobby for Chinese AI policy probably fall under that class.

I'm not sure to what extent people working at top AI labs might be blackmailed by the Chinese government to do things like give them their source code.
Dagon (13h)

[note: I suspect we mostly agree on the impropriety of open selling and dissemination of this data.  This is a narrow objection to the IMO hyperbolic focus on government assault risks. ]

I'm unhappy with the phrasing of "targeted by the Chinese government", which IMO implies violence or other real-world interventions when the major threats are "adversary use of AI-enabled capabilities in disinformation and influence operations." Thanks for mentioning blackmail - that IS a risk I put in the first category, and presumably becomes more possible with phone... (read more)

quick thoughts on LLM psychology

LLMs cannot be directly anthropomorphized. Though something like "a program that continuously calls an LLM to generate a rolling chain of thought, dumps memory into a relational database, can call functions from a library which includes storing to and recalling from that database, and receives inputs that are added to the LLM context" is much more agent-like.
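A minimal sketch of the kind of wrapper program I mean (all function and table names are hypothetical; the "LLM call" is a stub and the relational store is just SQLite):

```python
# Toy agent loop: rolling chain of thought plus a relational memory store that
# the program can write to and recall from between LLM calls. Illustrative only.
import sqlite3

def llm(prompt: str) -> str:
    """Stand-in for a call to an actual LLM API."""
    raise NotImplementedError

db = sqlite3.connect("memory.db")
db.execute("CREATE TABLE IF NOT EXISTS memory (ts INTEGER, note TEXT)")

def store(note: str) -> None:
    db.execute("INSERT INTO memory VALUES (strftime('%s','now'), ?)", (note,))
    db.commit()

def recall(query: str, limit: int = 5) -> list[str]:
    rows = db.execute(
        "SELECT note FROM memory WHERE note LIKE ? ORDER BY ts DESC LIMIT ?",
        (f"%{query}%", limit),
    )
    return [row[0] for row in rows]

def agent_step(context: str, new_input: str) -> str:
    memories = "\n".join(recall(new_input))
    thought = llm(f"Memories:\n{memories}\n\nContext:\n{context}\n\nInput: {new_input}")
    store(thought)                      # dump this step's reasoning into memory
    return context + "\n" + thought     # extend the rolling chain of thought
```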

Humans evolved feelings as signals of cost and benefit — because we can respond to those signals in our behaviour.

These feelings add up to a “utility function”, something ... (read more)

Richard_Ngo (17h)

Hypothesis: there's a way of formalizing the notion of "empowerment" such that an AI with the goal of empowering humans would be corrigible.

This is not straightforward, because an AI that simply maximized human POWER (as defined by Turner et al.) wouldn't ever let the humans spend that power. Intuitively, though, there's a sense in which a human who can never spend their power doesn't actually have any power. Is there a way of formalizing that intuition?
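(For reference, Turner et al.'s POWER is, roughly, the average optimal value attainable from a state under a distribution over reward functions; a loose paraphrase, with normalization details omitted:)

```latex
% loose paraphrase of Turner et al.'s POWER (normalization omitted)
\mathrm{POWER}_{\mathcal{D}}(s) \;\propto\; \mathbb{E}_{R \sim \mathcal{D}}\!\left[ V^{*}_{R}(s) \right]
```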

The direction that seems most promising is in terms of counterfactuals (or, alternatively, Pearl's do-ca... (read more)

Garrett Baker (16h)
There's also the problem of: what do you mean by "the human"? If you make an empowerment calculus that works for humans who are atomic & ideal agents, it probably breaks once you get a superintelligence who can likely mind-hack you into yourself valuing only power. It never forces you to abstain from giving up power, since you're perfectly capable of making different decisions; you just don't.

Another problem, which I like to think of as the "control panel of the universe" problem, is where the AI gives you the "control panel of the universe", but you aren't smart enough to operate it, in the sense that you have the information necessary to operate it, but not the intelligence. Such that you can technically do anything you want--you have maximal power/empowerment--but the super-majority of buttons and button combinations you are likely to push result in increasing the number of paperclips.

Such that you can technically do anything you want--you have maximal power/empowerment--but the super-majority of buttons and button combinations you are likely to push result in increasing the number of paperclips.

I think any model of a rational agent needs to incorporate the fact that they're not arbitrarily intelligent, otherwise none of their actions make sense. So I'm not too worried about this.

If you make an empowerment calculus that works for humans who are atomic & ideal agents, it probably breaks once you get a superintelligence who can likely

... (read more)
Richard_Ngo (16h)
You can think of this as a way of getting around the problem of fully updated deference, because the AI is choosing a policy based on what that policy would have done in the full range of hypothetical situations, and so it never updates away from considering any given goal. The cost, of course, is that we don't know how to actually pin down these hypotheticals.

Hypothesis, super weakly held and based on anecdote:
One big difference between US national security policy people and AI safety people is that the "grieving the possibility that we might all die" moment happened, on average, more years ago for the national security policy person than the AI safety person. 

This is (even more weakly held) because the national security field has existed for longer, so many participants literally had the "oh, what happens if we get nuked by Russia" moment in their careers in the Literal 1980s...

What would the minimal digital representation of a human brain & by extension memories/personality look like?

I am not a subject matter expert. This is armchair speculation and conjecture, the actual reality of which I expect to be orders of magnitude more complicated than my ignorant model.

The minimal physical representation is obviously the brain itself, but to losslessly store every last bit of information — i.e. exact particle configurations — as accurately as it is possible to measure is both nigh-unto-impossible and likely unnecessary considering the ... (read more)

gwern (15h)
You might find my notes of interest.
ErioirE (15h)

Yes, thanks!

I want a word that's like "capable" but clearly means the things you have the knowledge or skill to do. I'm clearly not capable of running a hundred miles an hour or catching a bullet in my bare hand. I'm not capable of bench pressing 200lbs either; that's pretty likely in the range of what I could do if I worked out and trained at it for a few years, but right this second I'm not in that kind of shape. In some senses, I'm capable of logging into someone else's LessWrong account- my fingers are physically capable of typing their password- but I don't have ... (read more)

cubefox (1d)
"I can do X" seems to be short for "If I wanted to do X, I would do X." It's a hidden conditional. The ambiguity is the underspecified time. I can do X -- when? Right now? After a few months of training?

Q. "Can you hold the door?" A. "Sure."

That's straightforward.

Q. "Can you play the violin at my wedding next year?" A. "Sure."

Colloquial language would imply not only that I am willing and able to do this, but that I already know how to play the violin. Sometimes, what I want to answer is that I don't know how to play the violin, I'm willing to learn, but you should know I currently don't know.

Which I can say, it just takes more words.

davekasten (2d)
"Achievable goal" or "plausible outcome", maybe?
TurnTrout (1d)

A semi-formalization of shard theory. I think that there is a surprisingly deep link between "the AIs which can be manipulated using steering vectors" and "policies which are made of shards."[1] In particular, here is a candidate definition of a shard theoretic policy:

A policy has shards if it implements at least two "motivational circuits" (shards) which can independently activate (more precisely, the shard activation contexts are compositionally represented).

By this definition, humans have shards because they can want food at the same time as wantin... (read more)

samshap (22h)
Instead of demanding orthogonal representations, just have them obey the restricted isometry property. Basically, instead of requiring $\forall i \neq j: \langle x_i, x_j \rangle = 0$, we just require $\forall i \neq j: \langle x_i, x_j \rangle \leq \epsilon$. This would allow a polynomial number of sparse shards while still allowing full recovery.
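A quick numerical sanity check of the near-orthogonality intuition (a sketch; the dimension, shard count, and ε below are arbitrary choices, not from the post):

```python
# Many random unit vectors in a modest-dimensional space are pairwise nearly
# orthogonal (|<x_i, x_j>| <= eps) even when their number exceeds the dimension.
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 1024, 4096, 0.25          # dimension, number of "shards", tolerance

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # normalize to unit vectors

G = X @ X.T                            # Gram matrix of pairwise inner products
np.fill_diagonal(G, 0.0)
print("max |<x_i, x_j>|, i != j:", np.abs(G).max())   # typically ~0.17, below eps
```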
cubefox (1d)
For illustration, what would be an example of having different shards for "I get food" (F) and "I see my parents again" (P) compared to having one utility distribution over F∧P, F∧¬P, ¬F∧P, ¬F∧¬P?

I think this is also what I was confused about -- TurnTrout says that AIXI is not a shard-theoretic agent because it just has one utility function, but typically we imagine that the utility function itself decomposes into parts e.g. +10 utility for ice cream, +5 for cookies, etc. So the difference must not be about the decomposition into parts, but the possibility of independent activation? but what does that mean? Perhaps it means: The shards aren't always applied, but rather only in some circumstances does the circuitry fire at all, and there are circums... (read more)

I think there should be a way to find the highest rated shortform posts.

habryka (16h)

You can! Just go to the all-posts page, sort by year, and the highest-rated shortform posts for each year will be in the Quick Takes section: 

[Screenshots of the highest-rated Quick Takes for 2024, 2023, and 2022.]

Detangle Communicative Writing and Research

One reason I never finish any blog post is probably that I start writing it immediately. I think it is better to first build a very good understanding of whatever I'm trying to understand. Only when I'm sure I have understood do I start to create a very narrowly scoped writeup.

Doing this has two advantages. First, it speeds up the research process, because writing down all your thoughts is slow.

Second, it speeds up the writing of the final document. You are not confused about the thing, and you ... (read more)

Be Confident in your Processes

I thought a lot about what kinds of things make sense for me to do to solve AI alignment. That did not make me confident that any particular narrow idea that I have will eventually lead to something important.

Rather, I'm confident that executing my research process will over time lead to something good. The research process is:

  1. Take some vague intuitions
  2. Iteratively unroll them into something concrete
  3. Update my models based on new observations I make during this overall process.

I think being confident, i.e. not feeling hop... (read more)

Check my math: how does Enovid compare to humming?

Nitric Oxide is an antimicrobial and immune booster. Normal nasal nitric oxide is 0.14ppm for women and 0.18ppm for men (sinus levels are 100x higher). journals.sagepub.com/doi/pdf/10.117…

Enovid is a nasal spray that produces NO. I had the damndest time quantifying Enovid, but this trial registration says 0.11 ppm NO/hour. They deliver every 8h and I think that dose is amortized, so the true dose is 0.88 ppm. But maybe it's more complicated. I've got an email out to the PI but am not hopeful about a response ... (read more)
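Spelling out the arithmetic (all numbers from above; the "amortized over 8h" reading is my assumption):

```python
# Back-of-the-envelope check of the Enovid dose estimate above. All inputs are
# taken from the text; the amortization assumption may be wrong.
rate_ppm_per_hour = 0.11            # NO release rate from the trial registration
dosing_interval_h = 8               # spray delivered every 8 hours
dose_ppm = rate_ppm_per_hour * dosing_interval_h
print(round(dose_ppm, 2))           # 0.88 ppm per dose

baseline_nasal_ppm = (0.14, 0.18)   # normal nasal NO: women, men
print([round(dose_ppm / b, 1) for b in baseline_nasal_ppm])  # ~6.3x and ~4.9x baseline
```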

Elizabeth (1d)
I found the gotcha: Enovid has two other mechanisms of action. Someone pointed this out to me on my previous nitric oxide post, but it didn't quite sink in till I did more reading.

What are the two other mechanisms of action?

Elizabeth (6d)
I think that's their guess, but they don't directly check here. I also suspect that it doesn't matter very much:

* The sinuses have so much NO compared to the nose that this probably doesn't materially lower sinus concentrations.
* The power of humming goes down with each breath but is fully restored in 3 minutes, suggesting that whatever change happens in the sinuses is restored quickly.
* From my limited understanding of virology and immunology, alternating intensity of NO between sinuses and nose every three minutes is probably better than keeping sinus concentrations high[1]. The first second of NO does the most damage to microbes[2], so alternation isn't that bad.

I'd love to test this. The device you linked works via the mouth, and we'd need something that works via the nose. From a quick google it does look like it's the same test, so we'd just need a nasal adaptor. Other options:

* Nnoxx. Consumer skin device, meant for muscle measurements.
* There are lots of devices for measuring concentration in the air; maybe they could be repurposed. Just breathing on one might be enough for useful relative metrics, even if they're low-precision.

I'm also going to try to talk my asthma specialist into letting me use their oral machine to test my nose under multiple circumstances, but it seems unlikely she'll go for it.

1. ^ Obvious question: so why didn't evolution do that? The ancestral environment didn't have nearly this disease (or pollution) load. This doesn't mean I'm right, but it means I'm discounting that specific evolutionary argument.
2. ^ Although NO is also an immune system signal molecule, so the average does matter.

Is there a LessWrong canon post for the quantified impact of different masks? I want to compare a different intervention to masks, and it would be nice to use a reference that's gone through battle testing.
