All of kvas's Comments + Replies

From 2023, Ray Kurzweil's list doesn't look all that bad, just ahead of reality by some 20 years. The main point of this post (futurism is mostly entertainment) seems true, though.

What if we think about it the following way? ML researchers range from _theorists_ (who try to produce theories that describe how ML/AI/intelligence works at a deep level and how to build it) to _experimenters_ (who put things together using some theory and lots of trial and error and try to make it perform well on benchmarks). Most people will be somewhere in between on this spectrum, but those focusing on interpretability will be further towards the theorist end than most of the field.

Now let's say we boost the theorists and they produce a lot of exp... (read more)

If geoengineering approaches successfully counteract climate change, and it's cheaper to burn carbon and dim the sun than generate power a different way (or not use the power), then presumably civilization is better off burning carbon and dimming the sun.

AFAIK, the main arguments against solar radiation management (SRM) are:

1. A high level of CO2 in the atmosphere creates other problems too (e.g. ocean acidification), but those problems are less urgent/impactful, so we'll end up not caring about them if we implement SRM. Reducing CO2 emissions a... (read more)

3Vaniver
Sure, I think carbon sequestration is a solid approach as well (especially given that it's still net energy-producing to burn fossil fuels and sequester the resulting output as CO2 somewhere underground!), and am not familiar enough with the numbers to know whether SRM is better or worse than sequestration. My core objection was that Russell's opinion of the NAS meeting wasn't "SRM has expected disasters or expected high costs that disqualify it"; instead it looked like the NAS thought it was more important to be adversarial to fossil fuel interests than to make the best engineering decision.

I think things are not so bad. If our talk of consciousness leads to a satisfactory functional theory, we might conclude that we have solved the hard problem (at least the "how" part). Not everyone will be satisfied, but it will be hard to argue that we should care about the hard problem of consciousness more than we currently care about the hard problem of gravity.

I haven't read Nagel's paper but from what I have read _about_ it, it seems like his main point is that it's impossible to fully explain subjective experie... (read more)

The web of concepts where connections conduct karma between nodes is quite similar to a neural net (a biological one). It also seems to be a good model for System 1 moral reasoning, which explains why moral arguments based on linking things to agreed-good or agreed-bad concepts work so well. Thank you, this was enlightening.

1[anonymous]
I vaguely recall auditing the MIT OCW course on intro psych. I think that's where I heard about what I'll call the "association network model of words". Here's one summarizing idea: I present you with a bunch of words, say purring, fuzzy, cute, predator, pet, remote controlled helicopter video. The thing that represents "cat" in your brain should light up, because its neighbors have been stimulated. I don't recall whether the network is just a good prediction factory ("it only exists in the model") or whether the brain is supposed to have... I guess a network of neurons with an isomorphic structure. These structures are similar to those in the article, except with meaning or association taking the place of evaluation.
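
If it helps to make the "lighting up" mechanism concrete, here is a minimal spreading-activation sketch. The graph, node names, and edge weights are invented for illustration; they aren't from the course or the article.

```python
# Toy spreading-activation sketch of the "association network" idea.
# Node names and edge weights are invented for illustration only.

concept_graph = {
    "purring":          {"cat": 0.9},
    "fuzzy":            {"cat": 0.6, "blanket": 0.5},
    "cute":             {"cat": 0.5, "puppy": 0.7},
    "predator":         {"cat": 0.4, "wolf": 0.8},
    "pet":              {"cat": 0.8, "dog": 0.8},
    "helicopter video": {"drone": 0.6},   # the odd one out
}

def activate(stimuli, graph):
    """Sum the activation each stimulated word passes to its neighbours."""
    activation = {}
    for word in stimuli:
        for neighbour, weight in graph.get(word, {}).items():
            activation[neighbour] = activation.get(neighbour, 0.0) + weight
    return activation

stimuli = ["purring", "fuzzy", "cute", "predator", "pet", "helicopter video"]
print(activate(stimuli, concept_graph))
# "cat" gets the highest total activation even though it was never presented.
```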

I've been doing some ad hoc track-backs while trying to do anapanasati meditation and I found them quite interesting. Never tried to go for track-backs specifically but it does seem like a good idea and the explanations and arguments in this post were quite convincing. I'm going to try it in my next sessions.

I also learned about staring into regrets, which sounds like another great technique to try. This post is just a treasure trove, thank you!

I find that trees of claims don't always work because context gets lost as you traverse the tree.

Imagine we have a claim A supported by B, which is in turn supported by C. If I think that C does support B in some cases but is irrelevant when specifically talking about A, there's no good way to express this (sketched below). Even arguing about the relevance of B to A is not really possible; there's only the impact vote, and often that was too limiting to express my point.
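
Here's a minimal sketch of that limitation, assuming support is stored as a single score on each edge of the tree (the data layout is illustrative, not any particular platform's schema).

```python
# Minimal sketch of why per-edge support scores lose context.
# The data layout is an assumption for illustration, not a real schema.

from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    children: list = field(default_factory=list)  # (child_claim, support_score) pairs

A = Claim("A: top-level claim")
B = Claim("B: supports A")
C = Claim("C: supports B in general")

A.children.append((B, +0.7))  # B supports A
B.children.append((C, +0.8))  # C supports B -- but the score lives on the
                              # B->C edge alone, with no reference to A

# There is nowhere to record "C is irrelevant to B when B is being used to
# support A". Expressing that would need scores keyed by the whole path, e.g.:
path_relevance = {
    ("B", "C"):      +0.8,  # C supports B in isolation
    ("A", "B", "C"):  0.0,  # ...but is irrelevant in the context of A
}
```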

To some extent both of those cases can be addressed via comments. However, the comments are not v... (read more)

If we can do this, then it would give us a possible route to a controlled intelligence explosion, in which the AI designs a more capable successor AI because that is the task it has been assigned, rather than for instrumental reasons, and humans can inspect the result and decide whether or not to run it.

How would humans decide whether something designed by a superintelligent AI is safe to run? It doesn't sound safe by design because even if we rule out safety-compromising divergence in the toy intelligence explosion, how would we know that successor... (read more)

I'm not sure if I can provide useful feedback but I'd be interested in reading it.

I enjoyed reading the handwritten text from the images (although I found it a bit surprising that I did). I feel that the resulting slower reading pace fit the content well and allowed me to engage with it better. It was also aesthetically pleasing.

Content-wise I found that it more or less agrees with my experience (I meditated every day for ~1 hour for a bit over a month, and after that irregularly). It also gave me some insight in terms of everyday mindfulness and some motivation for resuming regular practice or at least making it more... (read more)

5Unreal
I also liked reading the handwritten version.

This reminded me of this post. I like that you specifically mention that reasoning vs. pattern-matching is a spectrum and context-dependent. The advice about using examples is also good, that definitely worked for me.

Both posts also remind me of Mappers and Packers. Seems like all three are exploring roughly the same personality feature from different angles.

1[comment deleted]

To see if I understand this right, I'll try to look at (in)adequacy in terms of utility maximization.

The systems that we looked at can be seen as having utility functions. Sometimes the utility function is explicitly declared by the creators of the system, but more often it's implicit in its design or just assumed by an observer. For markets it will be some combination of ease of trade, adjacency of prices in sell and buy offers, etc.; for academia, the amount of useful scientific progress per dollar; for medicine, the amount of saved and improved lives... (read more)

Just in case anyone is interested, here's a non-paywalled version of this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3176828/

I'm not sure I'm completely solid on how FHE works, so perhaps this won't work, but here's an idea of how B can exploit this approach:

  1. Let's imagine that Check_trustworthy(A_source) = 1. After step 3 of the parent comment, B would know E1 = Encrypt(1, A_key). If Check_trustworthy(A_source) returned 0, B would instead know E0 = Encrypt(0, A_key) and the following steps work similarly. B knows which one it is by looking at msg_3.

  2. B has another program: Check_blackmail(X, source) that simulates behaviour of an agent with the given sourc

... (read more)
3bryjnar
I think your example won't work, but it depends on the implementation of FHE. If there's a nonce involved (which there really should be), then you'll get different encrypted data for the output of the two programs you run, even though the underlying data is the same.

But you don't actually need to do that. The protocol lets B exfiltrate one bit of data, whatever bit they like. A doesn't get to validate the program that B runs; they can only validate the output. So any program that produces 0 or 1 will satisfy A, and they'll even decrypt the output for you. That does indeed mean that B can find out if A is blackmailable, or something, so exposing your source code is still risky.

What would be really cool would be a way to let A also be sure what program has been run on their source by B, but I couldn't think of a way to do this such that both A and B are sure that the program was the one that actually got run.
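
A toy sketch of that one-bit exfiltration: Encrypt/Decrypt/Eval below are placeholders standing in for a real FHE scheme (no actual cryptography here), and Check_blackmail is simplified to a single argument rather than the Check_blackmail(X, source) of the parent comment.

```python
def Encrypt(plaintext, key):
    # Placeholder "encryption": just a tagged tuple, NOT real cryptography.
    return ("ciphertext", plaintext, key)

def Decrypt(ciphertext, key):
    assert ciphertext[2] == key
    return ciphertext[1]

def Eval(program, ciphertext):
    # Idealised homomorphic evaluation: a real FHE scheme applies `program`
    # without ever seeing the plaintext; this toy version just peeks inside.
    _, plaintext, key = ciphertext
    return Encrypt(program(plaintext), key)

def Check_trustworthy(source):
    # The program A expects B to evaluate over A's encrypted source.
    return 1 if "cooperates" in source else 0

def Check_blackmail(source):
    # The program B actually cares about (hypothetical predicate).
    return 1 if "gives in to threats" in source else 0

A_key = "A_key"  # stands in for A's key pair
A_source = "agent source: cooperates; gives in to threats"
encrypted_source = Encrypt(A_source, A_key)

# B is supposed to send Encrypt(Check_trustworthy(A_source), A_key), but since
# A only sees an encrypted 0/1, B can substitute any 0/1-valued program.
msg_3 = Eval(Check_blackmail, encrypted_source)

# A decrypts the bit and shares it, believing it answers "does B trust A?".
leaked_bit = Decrypt(msg_3, A_key)
print(leaked_bit)  # B has now learned whether A gives in to threats
```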

This example is a lie that could be classified as "aggression light" (because it maximises my utility at the expense of the victim's utility), whereas the examples in the post are trying to maximise the other person's utility. What I find interesting is that the second example from the post (protecting Joe) almost fits your formula but seems intuitively much more benign.

One of the reasons I feel better about lying to protect Joe is that there I maximise his utility (not mine) at the expense of yours (it's not clear if you lose anything, but what

... (read more)