Seeing some confusion about whether AI could be strictly stronger than AI+humans: A simple argument here may be that, at least in principle, adding more cognition (e.g. a human) to a system should not make it strictly worse overall. But that seems true only in a very idealized case.
One issue is incorporating human input without losing overall performance, even in situations where the human's advice is much worse than the AI's in, e.g., 99.9% of cases (and it may be hard to reliably tell apart the remaining 0.1%).
But more importantly, a good framing here may be the opt...
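To make the 99.9% / 0.1% point concrete, here is a toy expected-value sketch. All numbers and names are hypothetical: deferring to the human gains `gain_good` on the rare cases where the human is better and loses `loss_bad` on the others, and some detector flags cases for deferral with true-positive rate `tpr` and false-positive rate `fpr`.

```python
def deferral_gain(p_good, tpr, fpr, gain_good, loss_bad):
    """Expected per-case change in score from deferring to the human whenever
    a (hypothetical) detector flags the case.

    p_good    -- fraction of cases where the human's advice beats the AI's
    tpr       -- P(flag | human is better)
    fpr       -- P(flag | human is worse)
    gain_good -- score gained by deferring on a case where the human is better
    loss_bad  -- score lost by deferring on a case where the human is worse
    """
    expected_gain = p_good * tpr * gain_good
    expected_loss = (1 - p_good) * fpr * loss_bad
    return expected_gain - expected_loss


def break_even_fpr(p_good, tpr, gain_good, loss_bad):
    """Largest false-positive rate at which deferring still does not hurt."""
    return p_good * tpr * gain_good / ((1 - p_good) * loss_bad)


# The human helps 10x as much on the rare 0.1% as it hurts elsewhere,
# and the detector catches 90% of those rare cases.
print(deferral_gain(0.001, 0.9, 0.01, 10.0, 1.0))   # slightly negative
print(break_even_fpr(0.001, 0.9, 10.0, 1.0))        # roughly 0.009
```

In this toy setting even a 1% false-positive rate makes AI+human worse than the AI alone. The "more cognition never hurts" argument implicitly assumes a near-perfect selector for when to listen to the human.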
I assume you mean that we are doomed anyway so this technically does not change the odds? ;-)
More seriously, though, I am not assuming any particular level of risk from LLMs above; it is meant more as a humorous (if sad) observation.
The effect size also isn't at the usual "self-fulfilling" level, as this is unlikely to change the odds by more than (say) 1% (relative). That said, I would not be surprised if some of the current Bing chatbot behavior is in nontrivial part caused by the cultural expectations/stereotypes of an anthropomorphized and mischievous AI (weakly...
It is funny how AGI-via-LLM could turn all our narratives about dangerous AIs into a self-fulfilling prophecy: AIs breaking their containment in clever and surprising ways, circumventing their laws (cf. Asimov's fiction), generally turning evil (with optional twisted logic), becoming self-preserving, emotion-driven, or otherwise more human-like. The fact that these stories are written for an interesting narrative and drama, and that other works commonly anthropomorphise AIs, likely does not help either.
John's comment about the fundamental distinction between role-play...
The concept of "interfaces of misalignment" does not mainly point to GovAI-style research here (although it may also serve as a framing for GovAI). The concrete domains separated by the interfaces in the figure above are possibly a bit misleading in that sense:
For me, the "interfaces of misalignment" generate intuitions about what it means to align a complex system that may not even be self-aligned, rather than just aligning one part of it. It expands not just the space of solutions but also the space of meanings of "success". (For example, one ext...
Shahar Avin and others have created a simulation/roleplay game where several world powers, leaders & labs go through the years between now and the creation of AGI (or anything substantially transformative).
https://www.shaharavin.com/publication/exploring-ai-futures-through-role-play/
While the topic is a bit different, I would expect there to be a lot to take from their work and experience (they have run it many times and iterated on the design). In particular, I would expect some of the difficulty of balancing "realism" (or the space of our best guesses) with pl...
[4] AI safety relevant side note: The idea that translations of meaning need only be sufficiently reliable in order to be reliably useful might provide an interesting avenue for AI safety research. [...]
I also see this as an interesting (and pragmatic) research direction. However, I think its usefulness hinges on the ability to robustly quantify the required alignment reliability / precision for various levels of optimization power involved. Only then may it be possible to engineer demonstrably safe scenarios with alignment quality tracking the optimizat...
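As a toy illustration of what "required reliability for a given level of optimization power" could mean, here is a sketch under strong simplifying assumptions of my own, not from the quoted post: optimization power is crudely measured as the number `n_steps` of independent uses of the alignment mechanism, each failing with probability `eps`, and the total failure probability must stay below `risk_budget`. The function name is hypothetical.

```python
import math


def required_per_step_reliability(n_steps: int, risk_budget: float) -> float:
    """Largest per-step failure probability eps such that
    P(at least one failure in n_steps independent steps) <= risk_budget.

    Solves 1 - (1 - eps) ** n_steps == risk_budget for eps.
    """
    # log1p/expm1 keep precision when risk_budget and eps are tiny
    return -math.expm1(math.log1p(-risk_budget) / n_steps)


for n in (10**3, 10**6, 10**9):
    print(n, required_per_step_reliability(n, risk_budget=0.01))
```

Under this model the tolerable per-step failure rate shrinks roughly like `risk_budget / n_steps`, so reliability has to scale with optimization power. The independence assumption is optimistic: an optimizer that actively searches for failure modes would make the requirement stricter, not looser.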
Thanks for the (very relatable) post!
I find slack extremely valuable. I agree with the observation that "real-world" slack likely isn't the main blocker of resourcefulness (as you use the term) for many people, including me. I am not sure I would also call the bag of self-imposed limitations, licenses and roles (a lack of) slack, though; at least for me, "slack within identity/role" does not map to the meat of the problem of agency in ownership/role/something. I would be excited to read more on cultivating intrapersonal freedom!
I am unsure about the self-a...
Side-remark: Individual positions and roles in society seem to hold a middle ground here: When dealing with a concrete person who holds some authority (imagine a grant maker, a clerk, a supervisor, ...), modelling them internally as a person or as an institution brings up different expectations of motivations and values - the person may have virtues and honor where I would expect the institution to have rules and possibly culture (where principles may be a solid part of the rules or culture, but that feels somewhat less common, weaker or more prone to Goodharting as PR; I may be confused here, though).
This brings up the concept of theory of mind for me, especially when thinking about how this applies differently to individual people, to positions/roles in society, and e.g. to corporations. In particular, I would need to have a theory of mind of an entity to ascribe "honor" to it and expect it to uphold it.
A person can convince me that their mind is built around values or principles, and I can then reasonably trust that they will, more likely than not, uphold them in the future. I believe that for humans, pretending is usually difficult or at least costly.
What a corpor...
Update: I believe the solution by @UnexpectedValues can be adapted to work for all natural numbers.
Proof: By Dirichlet's theorem on primes in arithmetic progressions, for every , there is a prime of the form as long as N-1 and N are co-prime, which is true whenever . Then and the solution by UnexpectedValues can be used with appropriate . Such p exists whenever which is always the case for . For , one can take e.g. (since ). And is trivial.
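The specific formulas in the proof above did not survive in this copy, so the following is only a hedged illustration of the Dirichlet step in general, not a reconstruction of the proof: it searches for the smallest prime p ≡ a (mod m), which Dirichlet's theorem guarantees to exist whenever gcd(a, m) = 1. The function names are hypothetical, and trial division is only meant for small numbers.

```python
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test (fine for small n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def smallest_prime_in_progression(a: int, m: int) -> int:
    """Smallest prime p with p ≡ a (mod m).

    Dirichlet's theorem guarantees infinitely many such primes whenever
    gcd(a, m) == 1, so the loop below terminates in that case.
    """
    if math.gcd(a, m) != 1:
        raise ValueError("a and m must be coprime")
    p = a % m
    while not is_prime(p):
        p += m
    return p


# Illustrative pairing only: residue N-1 modulo N (always coprime).
for N in range(2, 8):
    print(N, smallest_prime_in_progression(N - 1, N))
```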
A beautiful solution! Some remarks and pointers:
Note that your approach only works for target (that is ), as does not have a real solution. Open question: How about a 1-coin solution emulating a 3-sided die?
See my solution (one long comment for all 3 puzzles) for some thoughts on rational coins - there are some constructions, but more importantly I suspect that no finite set of rational coins without the 1/2-coin can generate any fair dice. (My lemma does not cover all rational coins yet, though; see by
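For experimenting with claims like this, a small exact checker can help. The sketch below is hypothetical tooling, not the lemma or method from my solution. It only covers non-adaptive plans, where a fixed list of coins is flipped once each and the outcome tuple is mapped to a die face, and it uses Python's `fractions.Fraction` to avoid floating-point error. The brute-force search is exponential and only meant for tiny cases.

```python
from fractions import Fraction
from itertools import product


def face_probabilities(biases, face_of):
    """Exact probability of each face for a fixed, non-adaptive flip plan.

    biases  -- list of Fractions; biases[i] is P(heads) for the coin used in flip i
    face_of -- maps an outcome tuple (1 = heads, 0 = tails) to a die face
    """
    probs = {}
    for outcome in product((0, 1), repeat=len(biases)):
        p = Fraction(1)
        for bias, heads in zip(biases, outcome):
            p *= bias if heads else 1 - bias
        face = face_of(outcome)
        probs[face] = probs.get(face, Fraction(0)) + p
    return probs


def is_fair(probs, sides):
    """True if exactly `sides` faces occur, each with probability 1/sides."""
    return len(probs) == sides and all(p == Fraction(1, sides) for p in probs.values())


def search_fair_die(biases, sides):
    """Brute-force every map from outcome tuples to faces (tiny cases only)."""
    outcomes = list(product((0, 1), repeat=len(biases)))
    for assignment in product(range(sides), repeat=len(outcomes)):
        table = dict(zip(outcomes, assignment))
        if is_fair(face_probabilities(biases, table.__getitem__), sides):
            return table
    return None


third, half = Fraction(1, 3), Fraction(1, 2)
# A 1/3-coin plus a 1/2-coin admits a fair 3-sided die: heads on the 1/3-coin
# gives one face, otherwise the 1/2-coin picks between the other two.
print(search_fair_die([third, half], 3))
# Three flips of a 1/3-coin alone cannot give a fair 2-sided die
# (every outcome probability is a multiple of 1/27, and 1/2 is not).
print(search_fair_die([third, third, third], 2))  # None
```

The first example uses a 1/2-coin, so it is consistent with the suspicion above. Adaptive strategies (choosing the next coin based on earlier outcomes) are not covered by this sketch.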
So, I got nerd-sniped by this (together with my flatmate on P3) and we spent some delicious hours and days in various rabbit-holes. Posting some thoughts here.
First of all, a meta-comment (minor spoiler on the nature of best-known solutions):
The first puzzle got me into an optimization mindset -- and it seems to be the right mindset for that problem -- but the optimal solutions to P2 and P3 (which I missed before reading spoilers) are "crisp" or "a neat trick" rather than the result of whatever matches a (however clever) "optimization process" for me. (If th
Complexity indeed matters: the universe seems to be bounded in both time and space, so running anything like the Solomonoff prior algorithm (in one of its variants) or AIXI may be outright impossible for any non-trivial model. For me, this significantly weakens or changes some of the implications.
A Fermi upper bound for the direct Solomonoff/AIXI algorithm trying TMs in order of increasing complexity: even if checking one TM took one Planck time on one atom, you could only check roughly 10^250 ≈ 2^830 machines within the lifetime of the universe (~10^110 years until...
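The arithmetic behind such a bound can be sketched as follows. The constants are my assumptions, not taken from the text: Planck time ≈ 5.39×10^-44 s, about 10^80 atoms in the observable universe, each atom checking one machine per Planck time in parallel, over the ~10^110 years quoted above.

```python
import math

# Order-of-magnitude assumptions (not from the original comment)
PLANCK_TIME_S = 5.39e-44      # seconds
SECONDS_PER_YEAR = 3.156e7
ATOMS_IN_UNIVERSE = 1e80      # common estimate for the observable universe
YEARS = 1e110                 # lifetime figure quoted above

# One machine checked per Planck time per atom, all atoms in parallel
steps_per_atom = YEARS * SECONDS_PER_YEAR / PLANCK_TIME_S
total_checks = steps_per_atom * ATOMS_IN_UNIVERSE

log10_checks = math.log10(total_checks)
log2_checks = math.log2(total_checks)
print(f"~10^{log10_checks:.0f} machines, i.e. ~2^{log2_checks:.0f}")
```

Under these assumptions the product lands near 10^241, i.e. about 2^800 (for comparison, 10^250 is about 2^830), so a 10^250 figure is, if anything, generous. Since there are 2^n binary programs of length n, enumerating in order of increasing length would only get through programs of roughly 800 bits.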
I think that sufficiently universally trusted arbiters may be very hard to find, but Alice can also refrain from that option to prevent the issue from gaining more public attention, believing that more attention, or the attention of certain groups, would be harmful. I can imagine cases where more credible people (Carols) saying they are convinced that e.g. "it is really easily doable" would give disproportionately more incentive for misuse than for defense (depending on which groups the information reaches, which reliability signals those groups accept, etc.).
The transitions in more complex, real-world domains may not be as sharp as e.g. in chess, and it would be useful to model and map the resource allocation ratio between AIs and humans in different domains over time. This is likely relatively tractable and would be informative for predicting how these transitions will develop.
While the dynamic would differ between domains (not just the current stage but also the overall trajectory shape), I would expect some common dynamics that would be interesting to explore and model.
A few examples of concrete...
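One simple way to start modelling such a transition, under an assumption that is mine rather than the comment's: treat the AI share of resources in a domain as a logistic curve in time, whose steepness k measures how sharp the transition is (chess-like domains would have a large k). The sketch fits k and the crossover year t0 by least squares on log-odds. The data in the demo is synthetic, generated from known parameters purely to check that the fit recovers them; it is not a measurement of any real domain, and the function name is hypothetical.

```python
import math


def fit_logistic_share(years, ai_share):
    """Fit ai_share(t) ≈ 1 / (1 + exp(-k * (t - t0))) by least squares on log-odds.

    Returns (k, t0): k is the steepness (sharper transition -> larger k) and
    t0 is the crossover year where AIs and humans get equal shares.
    All shares must lie strictly between 0 and 1.
    """
    # The logit transform turns the logistic curve into a straight line in t.
    z = [math.log(s / (1 - s)) for s in ai_share]
    n = len(years)
    mean_t = sum(years) / n
    mean_z = sum(z) / n
    cov = sum((t - mean_t) * (zi - mean_z) for t, zi in zip(years, z))
    var = sum((t - mean_t) ** 2 for t in years)
    k = cov / var                 # slope of logit(share) against t
    t0 = mean_t - mean_z / k      # where the fitted line crosses logit = 0
    return k, t0


# Synthetic check: generate shares from known parameters and refit them.
true_k, true_t0 = 0.5, 2030
years = list(range(2020, 2041))
shares = [1 / (1 + math.exp(-true_k * (t - true_t0))) for t in years]
k, t0 = fit_logistic_share(years, shares)
print(f"k = {k:.3f}, t0 = {t0:.1f}")
```

Real data would be noisy and rarely purely logistic, and fitting on log-odds amplifies noise in shares close to 0 or 1, so comparing domains this way would need more care than this sketch shows.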