What if we think about it the following way? ML researchers range from _theorists_ (who try to produce theories that describe how ML/AI/intelligence works at a deep level and how to build it) to _experimenters_ (who put things together using some theory and a lot of trial and error, and try to make the result perform well on benchmarks). Most people fall somewhere in between on this spectrum, but people focusing on interpretability sit further towards the theorist end than most of the field.
Now let's say we boost the theorists and they produce a lot of exp...
If geoengineering approaches successfully counteract climate change, and it's cheaper to burn carbon and dim the sun than to generate power a different way (or to not use the power), then presumably civilization is better off burning carbon and dimming the sun.
AFAIK, the main arguments against solar radiation management (SRM) are:
1. A high level of CO2 in the atmosphere creates other problems too (e.g. ocean acidification), but those problems are less urgent/impactful, so we'll end up not caring about them if we implement SRM. Reducing CO2 emissions a...
I think things are not so bad. If our talk of consciousness leads to a satisfactory functional theory, we might conclude that we have solved the hard problem (at least the "how" part). Not everyone will be satisfied, but it will be hard to argue that we should care about the hard problem of consciousness more than we currently care about the hard problem of gravity.
I haven't read Nagel's paper but from what I have read _about_ it, it seems like his main point is that it's impossible to fully explain subjective experie...
The web of concepts, where connections conduct karma between nodes, is quite similar to a neural net (a biological one). It also seems to be a good model of System 1 moral reasoning, which explains why moral arguments that link things to agreed-good or agreed-bad concepts work so well. Thank you, this was enlightening.
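To make the analogy concrete, here's a toy sketch of that reading (my own construction, not anything from the post; the node names, weights, and propagation rule are all made up): weighted links conduct a fraction of each node's karma to its neighbours, like activation spreading through a small network.

```python
# Toy "web of concepts": weighted links conduct karma between nodes.
concepts = {"charity": 1.0, "theft": -1.0, "donation": 0.0, "taxation": 0.0}

# (source, target, weight): how strongly source's karma bleeds into target.
# The theft -> taxation link is what a "taxation is theft" argument builds.
links = [("charity", "donation", 0.8), ("theft", "taxation", 0.5)]

def propagate(karma, links, rounds=3, rate=0.5):
    """Each round, every link pushes a fraction of its source's karma
    into its target; repeated rounds let karma flow along chains."""
    for _ in range(rounds):
        deltas = {node: 0.0 for node in karma}
        for src, dst, weight in links:
            deltas[dst] += rate * weight * karma[src]
        for node, delta in deltas.items():
            karma[node] += delta
    return karma

print(propagate(dict(concepts), links))
# "donation" drifts positive and "taxation" drifts negative: linking a
# thing to an agreed-good or agreed-bad concept shifts its felt karma.
```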
I've been doing some ad hoc track-backs while trying to do anapanasati meditation, and I found them quite interesting. I've never gone for track-backs specifically, but it does seem like a good idea, and the explanations and arguments in this post were quite convincing. I'm going to try it in my next sessions.
I also learned about staring into regrets, which sounds like another great technique to try. This post is just a treasure trove, thank you!
I find that trees of claims don't always work because context gets lost as you traverse the tree.
Imagine we have a claim A supported by B, which is in turn supported by C. If I think that C does support B in some cases but is irrelevant when we're specifically talking about A, there's no good way to express this. In fact, even arguing about the relevance of B to A is not really possible: there's only the impact vote, and I often found it too limiting to express my point.
To some extent both of those cases can be addressed via comments. However, the comments are not v...
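For what it's worth, here's a minimal sketch of one way the context problem could be handled (my own construction, not the site's actual data model): score each support edge per root claim, so that C can count for B in general yet be marked irrelevant when the root under discussion is A.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    supports: list = field(default_factory=list)  # list of Edge

@dataclass
class Edge:
    child: Claim
    # Relevance of this support, keyed by the root claim under discussion;
    # the None key holds the context-free default.
    relevance: dict = field(default_factory=dict)

A, B, C = Claim("A"), Claim("B"), Claim("C")
B.supports.append(Edge(C, relevance={None: 0.9, "A": 0.0}))
A.supports.append(Edge(B, relevance={None: 0.7}))

def edge_relevance(edge, root):
    """How much this support edge counts when arguing about `root`."""
    return edge.relevance.get(root, edge.relevance[None])

print(edge_relevance(B.supports[0], "A"))  # 0.0: C is ignored in A's context
print(edge_relevance(B.supports[0], "D"))  # 0.9: the default elsewhere
```

The same per-root keys could also host relevance votes on the B-to-A edge itself, which would cover the second case.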
If we can do this, then it would give us a possible route to a controlled intelligence explosion, in which the AI designs a more capable successor AI because that is the task it has been assigned, rather than for instrumental reasons, and humans can inspect the result and decide whether or not to run it.
How would humans decide whether something designed by a superintelligent AI is safe to run? It doesn't sound safe by design because even if we rule out safety-compromising divergence in the toy intelligence explosion, how would we know that successor...
I enjoyed reading the hand-written text from the images (although I was a bit surprised that I did). The resulting slower reading pace fit the content well and let me engage with it better. It was also aesthetically pleasing.
Content-wise, it more or less agrees with my experience (I meditated every day for ~1 hour for a bit over a month, and irregularly after that). It also gave me some insight into everyday mindfulness and some motivation for resuming regular practice or at least making it more...
This reminded me of this post. I like that you specifically mention that reasoning vs. pattern-matching is a spectrum and context-dependent. The advice about using examples is also good; that definitely worked for me.
Both posts also remind me of Mappers and Packers. It seems like all three are exploring roughly the same personality trait from different angles.
To see if I understand this right, I'll try to look at (in)adequacy in terms of utility maximization.
The systems that we looked at can be seen as having utility functions. Sometimes the utility function is explicitly declared by the system's creators, but more often it's implicit in the design or simply assumed by an observer. For markets it will be some combination of ease of trade, adjacency of prices in sell and buy offers, etc.; for academia, the amount of useful scientific progress per dollar; for medicine, the number of lives saved and improved...
I'm not sure I'm completely solid on how FHE works, so perhaps this won't work, but here's an idea of how B can exploit this approach:
Let's imagine that Check_trustworthy(A_source) = 1. After step 3 of the parent comment, B would know E1 = Encrypt(1, A_key). If Check_trustworthy(A_source) returned 0, B would instead know E0 = Encrypt(0, A_key), and the following steps work similarly. B knows which one it is by looking at msg_3.
B has another program: Check_blackmail(X, source) that simulates behaviour of an agent with the given sourc
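To spell it out as a toy sketch: the idea only goes through if equal plaintexts yield recognisable ciphertexts, so I'm assuming (perhaps wrongly) a deterministic scheme; as far as I know, real FHE randomizes encryption precisely to block this, which may be where the idea breaks. All function names below are mock stand-ins for the protocol in the parent comment, not a real library API.

```python
import hashlib

def Encrypt(bit, key):
    # Deterministic mock encryption: the dubious assumption this attack
    # rests on (real FHE encryption is randomized).
    return hashlib.sha256(f"{key}|{bit}".encode()).hexdigest()

A_key = "A_key_material"   # stands in for A's (public) key
E1 = Encrypt(1, A_key)     # known to B after step 3, via msg_3
E0 = Encrypt(0, A_key)

def homomorphic_eval(f, encrypted_source, key):
    # Mock homomorphic evaluation: returns the ciphertext of f's
    # plaintext result under the same key.
    return Encrypt(f(encrypted_source), key)

def Check_blackmail(encrypted_source):
    # Stand-in for B's simulation; pretend the simulated agent blackmails.
    return 1

# B evaluates its own predicate over A's encrypted source, then reads off
# the plaintext answer by comparing against its known ciphertexts,
# without ever holding A's decryption key:
result = homomorphic_eval(Check_blackmail, "Enc(A_source)", A_key)
print("blackmails" if result == E1 else "no blackmail detected")
```

If encryption is randomized, the final comparison fails, so that is probably the right place to poke holes in this.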
This example is a lie that could be classified as "aggression light" (because it maximises my utility at the expense of the victim's utility), whereas the examples in the post try to maximise the other person's utility. What I find interesting is that the second example from the post (protecting Joe) almost fits your formula, yet it seems intuitively much more benign.
One of the reasons I feel better about lying to protect Joe is that there I maximise his utility (not mine) at the expense of yours (it's not clear if you lose anything, but what...
From the vantage point of 2023, Ray Kurzweil's list doesn't look all that bad, just ahead of reality by some 20 years. The main point of this post (futurism is mostly entertainment) still seems true, though.