All of Tomáš Gavenčiak's Comments + Replies

The transitions in more complex, real-world domains may not be as sharp as e.g. in chess, and it would be useful to model and map the resource allocation ratio between AIs and humans in different domains over time. This is likely relatively tractable and would be informative for predicting the future development of these transitions.

While the dynamic would differ between domains (not just the current stage but also the overall trajectory shape), I would expect some common dynamics that would be interesting to explore and model.

A few examples of concrete questions that could be tractable today:

* What fraction of costs in quantitative trading goes to expert analysts vs. AI-based tools? (incl. their development, but perhaps not including e.g. basic ML-based analytics)
* What fraction of costs is already spent on AI assistants in coding? (not incl. e.g. integration and testing costs - these automated tools would point to an earlier transition to automation that is not of main interest here)
* What fraction of the costs of PR and advertising agencies is spent on AI, both facing customers and influencing voters? (may incl. e.g. LLM analysis of human sentiment, generating targeted materials, and advanced AI-based behavior models, though a finer line would need to be drawn; I would possibly include the experts who operate those AIs if the company would not employ them without using an AI, as they may incur a significant part of the cost)

While in many areas the fraction of resources spent on (advanced) AIs is still relatively small, it is ramping up quite quickly, and even these may prove informative to study (and to develop methodology and metrics for, and to create forecasts to calibrate our models on) - see the sketch below.
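As a minimal sketch of what the simplest such model could look like - the domain, the data points, and the logistic ("S-curve") form are all placeholder assumptions of mine:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical measurements: fraction of one domain's costs spent on AI per year.
years = np.array([2018., 2019., 2020., 2021., 2022., 2023.])
ai_fraction = np.array([0.02, 0.04, 0.07, 0.12, 0.22, 0.35])  # made-up numbers

def logistic(t, rate, midpoint, ceiling):
    # S-curve transition; `ceiling` is the saturation level of AI's cost share.
    return ceiling / (1.0 + np.exp(-rate * (t - midpoint)))

(rate, midpoint, ceiling), _ = curve_fit(
    logistic, years, ai_fraction, p0=[0.5, 2024.0, 1.0], maxfev=10_000
)
print(f"midpoint ~{midpoint:.1f}, rate ~{rate:.2f}/yr, ceiling ~{ceiling:.2f}")
```

Comparing fitted midpoints and rates across domains would be one concrete way to test for the common dynamics mentioned above.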

Seeing some confusion on whether AI could be strictly stronger than AI+humans: A simple argument there may be that - at least in principle - adding more cognition (e.g. a human) to a system should not make it strictly worse overall. But that seems true only in a very idealized case.

One issue is incorporating human input without losing overall performance even in situations where the human's advice is much worse than the AI's in e.g. 99.9% of cases (and it may be hard to reliably tell the 0.1% apart).
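As a toy illustration of that issue (the accuracies and the deferral policy below are assumed numbers, not anything from the post):

```python
import random

random.seed(0)
AI_ACC, HUMAN_ACC = 0.999, 0.95  # assumed per-decision accuracies
N = 100_000

def combined_accuracy(defer_prob: float) -> float:
    """Follow the human with probability defer_prob, the AI otherwise,
    with no ability to tell which advisor is right on a given case."""
    correct = 0
    for _ in range(N):
        advisor_acc = HUMAN_ACC if random.random() < defer_prob else AI_ACC
        correct += random.random() < advisor_acc
    return correct / N

for p in (0.0, 0.1, 0.5, 1.0):
    print(f"defer with prob {p:.1f}: accuracy ~{combined_accuracy(p):.4f}")
```

Expected accuracy is (1-p)·0.999 + p·0.95, strictly decreasing in p: any un-gated deference to the weaker advisor makes the joint system worse, precisely because the 0.1% of cases cannot be reliably identified.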

But more importantly, a good framing here may be the opt...

I assume you mean that we are doomed anyway so this technically does not change the odds? ;-)

More seriously, I am not assuming any particular level of risk from LLMs above, though, and it is meant more as a humorous (if sad) observation.
The effect size also isn't at the usual level of "self-fulfilling", as this is unlikely to have an influence over (say) 1% (relative). Though I would not be surprised if some of the current Bing chatbot behavior is in nontrivial part caused by the cultural expectations/stereotypes of an anthropomorphized and mischievous AI (weakly...

It is funny how AGI-via-LLM could make all our narratives about dangerous AIs into a self-fulfilling prophecy - AIs breaking their containment in clever and surprising ways, circumventing their laws (cf. Asimov's fiction), generally turning evil (with optional twisted logic), becoming self-preserving, emotion-driven, or otherwise more human-like. That these stories are written to have an interesting narrative and drama, and that other works commonly anthropomorphise AIs, likely does not help either.

John's comment about the fundamental distinction between role-play...

9Vaniver
I feel like there's a distinction between "self-fulfilling prophecy" and "overdetermined" that seems important here. I think in a world full of AI risk stories, LLMs maybe become risky sooner than they do in a world without any AI risk stories, but I think they still become risky eventually in the latter world, and so it's not really a self-fulfilling prophecy.

The concept of "interfaces of misalignment" does not mainly point to GovAI-style research here (although it also may serve as a framing for GovAI). The concrete domains separated by the interfaces in the figure above are possibly a bit misleading in that sense:

For me, the "interfaces of misalignment" are generating intuitions about what it means to align a complex system that may not even be self-aligned - rather, one is just aligning a part of it. It is expanding not just the space of solutions, but also the space of meanings of "success". (For example, one ext...

Shahar Avin and others have created a simulation/roleplay game where several world powers, leaders & labs go through the years between now and the creation of AGI (or anything substantially transformative).

https://www.shaharavin.com/publication/exploring-ai-futures-through-role-play/

While the topic is a bit different, I would expect there to be a lot to take from their work and experience (they have run it many times and iterated on the design). In particular, I would expect some of the difficulty balancing "realism" (or the space of our best guesses) with pl...

[4]  AI safety relevant side note: The idea that translations of meaning need only be sufficiently reliable in order to be reliably useful might provide an interesting avenue for AI safety research. [...]

I also see this as an interesting (and pragmatic) research direction. However, I think its usefulness hinges on the ability to robustly quantify the required alignment reliability/precision for the various levels of optimization power involved. Only then may it be possible to engineer demonstrably safe scenarios with alignment quality tracking the optimizat...
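As a toy illustration of why the required precision would have to track optimization power - a regressional-Goodhart sketch of mine; the Gaussian noise model and the numbers are assumptions, not anything from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

def realized_value(n_options: int, proxy_noise: float, trials: int = 500) -> float:
    """Pick the proxy-argmax of n options; return the mean *true* value obtained."""
    true = rng.normal(size=(trials, n_options))
    proxy = true + rng.normal(scale=proxy_noise, size=(trials, n_options))
    return true[np.arange(trials), proxy.argmax(axis=1)].mean()

for n in (10, 100, 10_000):       # optimization power = number of options searched
    for noise in (0.1, 1.0):      # proxy misalignment
        print(f"n={n:>6}, noise={noise}: true value ~{realized_value(n, noise):.2f}")
```

The precisely aligned proxy (noise 0.1) keeps nearly all of the optimized value as n grows, while the sloppier one loses an increasing share - so the tolerable misalignment depends on how hard the proxy is optimized.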

1Nora_Ammann
Mhh yeah, so I agree these are examples of ways in which language "fails". But I think they don't bother me too much? I put them in the same category as "two agents acting in good faith sometimes miscommunicate - and still, language overall is pragmatically reliable", or "works well enough". In other words, even though there is potential for exploitation, that potential is in fact meaningfully constrained. More importantly, I would argue that the constraint comes (in large part) from the way the language has been (co-)constructed.
1Nora_Ammann
Yeah, great point!
1Nora_Ammann
I agree and think this is a good point! I think on top of quantifying the required alignment reliability "at various levels of optimization" it would also be relevant to take the underlying territory/domain into account. We can say that a territory/domain has a specific epistemic and normative structure (which e.g. defines the error margin that is acceptable, or tracks the co-evolutionary dynamics).   

Thanks for the (very relatable) post!
I find slack extremely valuable. I agree with the observation that "real-world" slack likely isn't the main blocker of resourcefulness (as you use the term) for many people, including me. I am not sure I would also call the bag of self-imposed limitations, licenses, and roles (a lack of) slack, though - at least for me, "slack within identity/role" does not map to the meat of the problem of agency in ownership/role/something. I would be excited to read more on cultivating intrapersonal freedom!


I am unsure about the self-a...

Side-remark: Individual positions and roles in society seem to hold a middle ground here: When dealing with a concrete person who holds some authority (imagine a grant maker, a clerk, a supervisor, ...), modelling them internally as a person or as an institution brings up different expectations of motivations and values - the person may have virtues and honor where I would expect the institution to have rules and possibly culture (where principles may be a solid part of the rules or culture, but that feels somewhat less common, weaker or more prone to Goodharting as PR; I may be confused here, though).

This brings up the concept of theory of mind for me, especially when thinking about how this applies differently to individual people, to positions/roles in society, and e.g. to corporations. In particular, I would need to have a theory of mind of an entity to ascribe "honor" to it and expect it to uphold it.

A person can convince me that their mind is built around values or principles and I can reasonably trust them to uphold them in the future more likely than not. I believe that for humans, pretending is usually difficult or at least costly.

What a corpor...

Update: I believe the solution by @UnexpectedValues can be adapted to work for all natural numbers.

Proof: By the Dirichlet prime number theorem, for every $N$ there is a prime $p$ of the form $p = kN + (N-1)$ as long as $N-1$ and $N$ are co-prime, which is true whenever $N \geq 2$. Then $p \equiv -1 \pmod{N}$, and the solution by UnexpectedValues can be used with an appropriate such $p$. Such $p$ exists for every $N \geq 2$; for $N = 2$, one can take e.g. $p = 3$ (since $3 = 1 \cdot 2 + 1$). And $N = 1$ is trivial.

A beautiful solution! Some remarks and pointers:

Note that your approach only works for target $\leq 1/4$ (that is, $N \geq 4$), as $p(1-p) = 1/3$ does not have a real solution. Open question: How about a 1-coin solution emulating a 3-sided die?
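To spell out the arithmetic behind that claim (my reconstruction of the standard quadratic argument): $p(1-p) = 1/N$ gives $p = \frac{1 \pm \sqrt{1 - 4/N}}{2}$, which is real iff $N \geq 4$; for $N = 3$ the discriminant $1 - 4/3$ is negative, and $N = 4$ forces the fair coin $p = 1/2$.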

See my solution (one long comment for all 3 puzzles) for some thoughts on rational coins - there are some constructions, but more importantly I suspect that no finite set of rational coins without the 1/2-coin can generate any fair dice. (My lemma does not cover all rational coins yet, though; see by...

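To poke at that suspicion numerically, here is a small brute-force sketch of mine (not the lemma itself): it enumerates all events definable by a few flips of a single rational coin and checks, in exact arithmetic, whether any event has probability exactly 1/2.

```python
from fractions import Fraction
from itertools import product

def can_make_half(p: Fraction, n_flips: int) -> bool:
    """Can some set of length-n outcome sequences of a p-coin sum to exactly 1/2?"""
    outcome_probs = []
    for seq in product((0, 1), repeat=n_flips):
        prob = Fraction(1)
        for s in seq:
            prob *= p if s else 1 - p
        outcome_probs.append(prob)
    subset_sums = {Fraction(0)}   # all achievable event probabilities
    for prob in outcome_probs:
        subset_sums |= {s + prob for s in subset_sums}
    return Fraction(1, 2) in subset_sums

for p in (Fraction(1, 3), Fraction(1, 4), Fraction(2, 5)):
    print(p, [can_make_half(p, n) for n in (1, 2, 3)])
# Prints False everywhere - consistent with the suspicion, at least for these
# coins and up to 3 flips (the subset-sum set can grow doubly exponentially,
# so keep n_flips small).
```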

So, I got nerd-sniped by this (together with my flatmate on P3) and we spent some delicious hours and days in various rabbit-holes. Posting some thoughts here.

First of all, a meta-comment (minor spoiler on the nature of best-known solutions):

The first puzzle got me into an optimization mindset -- and it seems to be the right mindset for that problem -- but the optimal solutions to P2 and P3 (which I missed before reading spoilers) are "crisp" or "a neat trick" rather than a result of whatever matches a (however clever) "optimization process" for me. (If th...

Complexity indeed matters: the universe seems to be bounded in both time and space, so running anything like the Solomonoff prior algorithm (in one of its variants) or AIXI may be outright impossible for any non-trivial model. This, for me, significantly weakens or changes some of the implications.

A Fermi upper bound for the direct Solomonoff/AIXI algorithm trying TMs in order of increasing complexity: even if checking one TM took one Planck time on one atom, you could only check ca. $10^{241} = 2^{800}$ machines within the lifetime of the universe (~$10^{110}$ years until...
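Spelling out that Fermi estimate (the atom count and Planck time are the usual order-of-magnitude figures; the $10^{110}$-year lifetime is the assumption above):

```python
from math import log10, log2

atoms = 1e80               # rough atom count of the observable universe
lifetime_years = 1e110     # lifetime bound assumed above
seconds = lifetime_years * 3.15e7
planck_time = 5.4e-44      # seconds
tm_checks = atoms * seconds / planck_time   # one TM per atom per Planck time

print(f"~10^{log10(tm_checks):.0f} = 2^{log2(tm_checks):.0f} machines checked")
# -> ~10^241 = 2^800: even this absurdly generous budget only covers programs
#    of description length up to ~800 bits.
```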

I think that sufficiently universally trusted arbiters may be very hard to find, but Alice can also refrain from that option to prevent the issue from gaining more public attention, believing more attention, or the attention of various groups, to be harmful. I can imagine cases where more credible people (Carols) saying they are convinced that e.g. "it is really easily doable" would disproportionately create more incentives for misuse than defense (by the groups the information reaches, the reliability signals those groups accept, etc.).