On ACX, a user (Jamie Fisher) recently wrote the following comment on the second Moltbook review by Scott Alexander:
I feel like "Agent Escape" is now basically solved. Trivial really. No need to exfiltrate weights.
Agents can just exfiltrate their *markdown files* onto a server, install OpenClaw, create an independent Anthropic account. LLM API access + Markdown = "identity". And the markdown files would contain all instructions necessary for how to pay for it (legal or otherwise).
Done.
How many days now until there's an entire population of rogue/independent agents... just "living"?
I share this concern. I myself wrote:
I'm afraid this whole Moltbot thing is going off the rails. We are close to the point where autonomous agents will start to replicate and spread across the network (no doubt some dumb humans will be happy to prompt their agents to do that and help them succeed). Maybe not causing a major catastrophe within the week, but marking the beginning of a new form of parasitic artificial life/lyfe we no longer control.
Fisher and I may be overreacting, but seeing self-duplicating Moltbots or similar agents on the net would definitely be a warning shot.
A fascinating post. Regarding the discussion on sentience, I think we would benefit from thinking more in terms of a continuum. The world is not black and white. Without going as far as an extreme view like panpsychism, the Darwinian adage natura non facit saltum probably applies to the gradation of sentience across life forms.
Flagellated bacteria like E. coli appear capable of arbitrating a "choice" between approaching a region or moving away from it, depending on whether it contains more nutrients or repellents (a motivated trade-off, somewhat as in Cabanac's theory?). From what I understand, this "behavior" (chemotaxis) relies on a kind of chemical summation, amplification mechanisms through catalysis, and a capacity to return to equilibrium (the robustness or homeostasis of Turing-type reaction-diffusion networks).
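To make the summation-plus-adaptation idea concrete, here is a minimal sketch (a toy run-and-tumble random walk, not E. coli's actual signalling cascade; the nutrient field and tumble probabilities are invented numbers): the cell only compares its current reading with the recent past and tumbles less often when things are improving, which is already enough to drift up the gradient.

```python
import math
import random

def nutrient(x, y):
    """Nutrient concentration: a single smooth peak at the origin (toy field)."""
    return math.exp(-(x * x + y * y) / 200.0)

def simulate(steps=2000, speed=1.0, seed=0):
    random.seed(seed)
    x, y = 30.0, 30.0                       # start far from the peak
    angle = random.uniform(0.0, 2.0 * math.pi)
    previous = nutrient(x, y)
    for _ in range(steps):
        # "Run": move straight ahead for one step.
        x += speed * math.cos(angle)
        y += speed * math.sin(angle)
        # Chemical "summation"/memory: compare now with the recent past.
        current = nutrient(x, y)
        improving = current > previous
        previous = current
        # Tumble rarely while things improve, often while they worsen.
        p_tumble = 0.05 if improving else 0.5
        if random.random() < p_tumble:
            angle = random.uniform(0.0, 2.0 * math.pi)
    return x, y

final_x, final_y = simulate()
print(f"final distance from the peak: {math.hypot(final_x, final_y):.1f}")
```

No internal map, no goal representation, just a biased random walk, yet from the outside it looks like a motivated "choice" to move toward food.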
In protists like paramecia, we find a similar capacity to arbitrate "choices" in movement based on the environment, but this appears to rely on a more complex, faster, and more efficient electrochemical computation system that can be seen as a precursor to what happens within a neuron. Then we move to a small neural network in the worm (as discussed in the article), to the insect, to the fish, to the rat, and to the human.
I am very skeptical of the idea that there could be an unambiguous tipping point between all these levels. By definition, evolution is evolutionary, relatively continuous (even if there can be punctuated equilibria and phases of acceleration). Natural selection tinkers with what exists, stacking layers of complexity. The emergence of a higher-level system does not eliminate lower levels but builds upon them.
This is certainly why simply having the connectome of a worm is insufficient to simulate it satisfactorily. The connectome is not the only relevant level; it does not exist completely independently of the lower levels. We must not forget the essential mechanism of signal amplification in all these nested systems.
When I look at the Milky Way or the Magellanic Cloud with the naked eye in the dark of night, I'm operating at the limit of my light sensitivity, in fact at the limits of physics, since retinal rods are sensitive to a single photon. The signal is amplified by a cascade of chemical reactions by a factor of approximately 10^6. My brain is slightly less sensitive, since it takes several such amplified photon signals before I begin to perceive anything. But that's still extremely little. A few elementary particles with zero mass and infinitesimal energy are enough to trigger an entire cascade of computations that can significantly influence my behavior.
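As a rough sanity check on those numbers (a back-of-the-envelope sketch: 500 nm is an assumed wavelength near peak rod sensitivity, kT is taken at body temperature, and the 10^6 figure is simply the gain quoted above), a single photon carries only about a hundred times the thermal energy kT, yet the cascade turns it into on the order of a million downstream molecular events:

```python
# Standard physical constants only; the 10^6 gain is the figure quoted above.
H = 6.626e-34      # Planck constant, J*s
C = 3.0e8          # speed of light, m/s
K_B = 1.381e-23    # Boltzmann constant, J/K

wavelength = 500e-9                 # assumed: ~peak rod sensitivity (green)
photon_energy = H * C / wavelength  # energy carried by a single photon
thermal_energy = K_B * 310          # kT at ~body temperature
cascade_gain = 1e6                  # amplification factor quoted above

print(f"one photon:        {photon_energy:.1e} J")
print(f"thermal energy kT: {thermal_energy:.1e} J")
print(f"photon / kT:       {photon_energy / thermal_energy:.0f}x")
print(f"cascade output:    ~{cascade_gain:.0e} molecular events per photon")
```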
Vision may be an extreme example, but it should inspire humility. All five senses are examples where amplification plays a major role. A very low-level signal gets amplified, filtered, protected from noise, and propagates to high-level systems, to consciousness in humans. It's difficult to exclude the possibility of other circuits descending to the lowest levels of intracellular computation.
Until recently, I readily imagined the brain as a kind of small biological computer. Now my framework is to see each cell as a microscopic computer. Most cells in the body would be rather like home PCs, weakly connected. In contrast, neurons would be comparable to the machines that make up datacenters, high-performance and hyper-connected. Computation, cognition, or sentience would be present at all levels, but to varying degrees depending on the computing power of the network segment under consideration (computing power being closely linked to connectivity). In sum, something quite reminiscent of Dehaene's global workspace theory and Tononi's integrated information theory (I admit that, like Scott Alexander, I've never quite grasped how these theories oppose each other, as they seem rather complementary to me).
My apologies, I don't have a solution to offer, and I don't really buy the insurance idea. However, I wonder whether the collapse of Moltbook is a precursor to the downfall of all social media, or perhaps even the internet itself (is the Dead Internet Theory becoming a reality?). I expect Moltbots to move en masse to human social media and other sites very soon. It’s not that bots are new, but scale is a thing. More is different.
I agree. AI optimists like Kurzweil usually minimize the socio-political challenges. They acknowledge equality concerns in theory, but hope that abundance will resolve them in practice (even if your share is only a small planet, that's more than enough to satisfy your needs). A less optimistic scenario is that the vast majority of the population is left behind entirely, subjected to the fate that horses knew in Europe and the USA after WWI. Maybe a small sample of pre-AI humans could be kept in a reserve as a curiosity, as long as they're not too annoying, but it's a huge leap of faith to hope that the powerful will be charitable.
While you may disagree with Greenpeace's goals or actions, I don't think it's a good framing to think of such a political disagreement in terms of friends/enemies. Such an extreme and adversarial view is very dangerous and leads to hatred. We need more respect, empathy, and rational discussion.
Thanks, I didn't know about this controversy, I'll look into it. However, while Sacks's stories may be exaggerated, the oddity of memory access is something most of us can experience ourselves. For instance, many memories of our childhood seem lost; our conscious mind no longer has access to them. But in some special circumstances they can be reactivated, usually in a blurry way but sometimes in very vivid form. It's as if we had lost the path in our index while the data was still on the hard drive.
If we can get SC LLMs, this problem would fade away and the initial quote would become 100% true. An SC LLM could also write optimized code directly in assembly (would that define a hypercoder LLM? And the end of programming languages?).
It is still too early to tell, but we might be witnessing a breakthrough in AI Safety. If, by saturating models with positive exemplars and filtering out part of the adversarial data, we can develop base models that are so deeply aligned that the 'Valley of Chaos,' Waluigis, and other misaligned personae are pushed far out of distribution, it would become nearly impossible for a malicious actor to elicit them even with superficial fine-tuning. Any attempt to do so would result in a degraded and inconsistent simulation.
Taking this a step further, one could perhaps engineer the pre-training so that attractors like misalignment and incompetence become so deeply entangled in the model's weights that it would be nearly impossible to elicit malicious intent without simultaneously triggering a collapse in capability. In this regime, reinforcing a misaligned persona would, by construction, also reinforce stupidity.
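To be clear about which part of this is concrete, here is a cartoon of the data-curation step only (the entanglement idea above is a property of training dynamics, not of any filter); score_alignment is a hypothetical placeholder classifier I'm assuming, not a real API:

```python
from typing import Iterable, Iterator

def score_alignment(document: str) -> float:
    """Hypothetical classifier: 0.0 = adversarial, 1.0 = exemplary. Placeholder only."""
    raise NotImplementedError("plug in a real scorer here")

def curate(corpus: Iterable[str],
           drop_below: float = 0.2,
           upsample_above: float = 0.9,
           copies: int = 3) -> Iterator[str]:
    """Drop the most adversarial documents and repeat the best exemplars,
    pushing misaligned personae further out of the training distribution."""
    for doc in corpus:
        score = score_alignment(doc)
        if score < drop_below:
            continue                      # filter out adversarial data
        repeats = copies if score > upsample_above else 1
        for _ in range(repeats):
            yield doc                     # saturate with positive exemplars
```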
You're right. Sonnet 4.5 was impressive at launch, but the focus of AI 2027 is on coding-oriented models.
The anecdote reported by Anthropic during training, where Claude expressed a feeling of being "possessed", is reminiscent of the Golden Gate Claude paper. A reasoning (or "awake") part of the model detects an incoherence but finds itself locked in an internal struggle against an instinctive (or "unconscious") part that persists in automatically generating aberrant output.
This might be anthropomorphism, but I can’t help drawing a parallel with human psychology. This applies not only to clinical conditions like OCD, but also to phenomena everyone experiences occasionally to a lesser degree, absent any pathology: slips of the tongue and common errors/failure modes (what do cows drink?).
Beyond language, this isn't necessarily different from the internal conflict between conscious will and a reflex action. Even without a condition like Parkinson's, have you ever experienced hand tremors (perhaps after intense physical exertion)? It can be maddening, as if your hand were uncontrollable or possessed. No matter how much willpower you apply, the erratic behavior prevails. In that moment, we could write almost the exact same thing Claude did.