True, I should have said leading commercial companies.
While I broadly agree, I don't think it's completely dead, just mostly dead in the water. If an eval is mandated by law, it will be run even if it requires logprobs. There are some libraries like nnsight that try to make this easier for trusted partners to run logprob evals remotely. And there might be privacy-preserving APIs at some point.
I do agree that commercial companies will never again open up raw logprobs to the public, as that allows easy behaviour cloning, which OpenAI already experienced with all the GPT-4 student models.
If true, returns the log probabilities of each output token returned in the content of message.
It seems like it only returns the logprobs of the chosen message, not of a counterfactual message. So you couldn't get the probability of the correct answer, only of the sampled answer. This makes sense: the less information they offer, the harder it is for a competitor to behaviour-clone their confidential model.
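For reference, here's a minimal sketch of what the current Chat Completions API exposes (the model name and prompt are placeholders). `top_logprobs` gives you the top alternatives at each sampled position, which is a partial workaround, but you still can't score an arbitrary counterfactual completion:

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Answer with A, B, C or D: ..."}],
    logprobs=True,
    top_logprobs=5,  # up to 20 alternatives per sampled position
    max_tokens=1,
)

# Logprobs are only attached to the tokens the model actually sampled.
for tok in resp.choices[0].logprobs.content:
    print(tok.token, tok.logprob)
    # top_logprobs lists the most likely alternatives at that position,
    # so if the model answered "B" you may still see "A" here -- but only
    # if it made the top-k cut, not for an arbitrary counterfactual.
    for alt in tok.top_logprobs:
        print("  alt:", alt.token, alt.logprob)
```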
Have you considered using an idea similar to the one in Schmidhuber's blog post "Artificial Curiosity & Creativity Since 1990-91"? There you try to assess what might be called "learnable compression", "reducible surprise", or "understandable novelty" (however you want to frame it).
If an LLM, which has read the entire internet, is surprised by a text, then that's a good start. It means the text is not entirely predictable, and therefore not boring.
But what about purely random text? That's unpredictable too, just like Einstein's theory of general relativity was. This is the noisy TV problem. So how do we distinguish between them? Well, Schmidhuber suggests that a text should be less surprising after you have read it. We could approximate this in LLMs by putting a summary in context, by fine-tuning, by adapter tuning, or similar.
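Here's a minimal sketch of the in-context variant, as I understand it (GPT-2 and the placeholder text/summary are just for illustration): score the text cold, then score it again with a summary in context, and treat the drop in surprise as "learnable novelty". Noise stays surprising either way; structured novelty should compress:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def mean_nll(text, context=""):
    """Mean negative log-likelihood (surprise) of `text`, optionally given `context`."""
    ids = tok(context + text, return_tensors="pt").input_ids
    labels = ids.clone()
    if context:
        n_ctx = tok(context, return_tensors="pt").input_ids.shape[1]
        labels[:, :n_ctx] = -100  # score only the text, not the context
    with torch.no_grad():
        return model(ids, labels=labels).loss.item()

text = "..."     # placeholder: the document being scored
summary = "..."  # placeholder: e.g. a model-written summary of it

surprise = mean_nll(text)                   # unpredictable at all?
residual = mean_nll(text, context=summary)  # still unpredictable once "read"?
learnable_novelty = surprise - residual     # high = novel *and* learnable
```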
This is a nice approach because it would work for detecting human slop, too. And it would be much better than plagiarism detectors, which do not work.
I've had a few tries at implementing this using adapters, fine-tuning, in-context learning, etc. I managed to get some promising results with fine-tuning, but it's a pretty resource-intensive way to do it; a rough sketch of the adapter variant is below.
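Roughly, the adapter version looks like this (a sketch with peft's LoRA, continuing from the snippet above; the hyperparameters are arbitrary). You pay for a training run per document, which is where the resource cost comes from:

```python
# Continuing the snippet above: "learn" the text with a small LoRA adapter,
# then re-measure surprise. One training run per document = expensive.
from peft import LoraConfig, get_peft_model

lora = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],  # GPT-2 attention
                  task_type="CAUSAL_LM")
peft_model = get_peft_model(model, lora)

ids = tok(text, return_tensors="pt").input_ids
trainable = [p for p in peft_model.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=1e-4)

peft_model.train()
for _ in range(20):  # a few passes over the single document (arbitrary)
    loss = peft_model(ids, labels=ids).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
peft_model.eval()

# Surprise after adapter tuning; compare against the cold mean_nll(text).
with torch.no_grad():
    residual = peft_model(ids, labels=ids).loss.item()
```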
If we knew he was not a sociopath, sadist, or reckless ideologue,
He is also old, which means you must also ask about age-related cognitive and personality changes. There were rumours that during COVID he had become fearful and rigid.
Personally, I think we need to focus not on his character but on 1) how much he cares, as this will decide how much he delegates, 2) how much he understands, as we all risk death but many do not understand or agree with this, and 3) how competent he currently is to execute his goals.
Xi rules China so thoroughly that he would personally make key decisions regarding AGI
Even if he has absolute power, that doesn't mean he won't delegate. After all, his time is limited.
So, does anyone know of good work addressing his character and personal beliefs? Or is this an interesting research topic for anyone?
It's hard to find the truth here because state-level information warfare obscures it. That means there is propaganda designed to deceive even a professional analyst with access to secret information. However, we do have some state-level analysis available through WikiLeaks: we can look at what US diplomats thought in the leaked diplomatic cables.
- (C) According to a well connected Embassy contact, Politburo Standing Committee Member and Vice President Xi Jinping is "exceptionally ambitious," confident and focused, and has had his "eye on the prize" from early adulthood.
PolOff's contact ("the professor") and Xi Jinping were both born in 1953 and grew up in similar circumstances. ... The professor did not know Xi personally until they had both reached their late teens,
- (C) In the professor's view, Xi Jinping is supremely pragmatic, a realist, driven not by ideology but by a combination of ambition and "self-protection." The professor saw Xi's early calculations to carefully lay out a realistic career path as an illustration of his pragmatism.
- (C) Xi is a true "elitist" at heart,
I don't know how reliable these cables are, but they represent an interesting source.
As long as people realise they are betting on more than just a direction
Timing is particularly hard, and many great thinkers have been wrong on timing. You might also make the most rational bet, only for the market to take another year to become rational.
Given that Epoch AI predicts energy might be a bottleneck, it might be worth investing in energy. Coal is particularly cheap due to ESG regulations that prevent large funds from holding "dirty" energy.
Worth looking at the top ten holdings of these funds, to make sure you know what you are buying and that they are sensible allocations.
It might be worth noting that it can be good to prefer voting shares, held directly. For example, GOOG shares carry no voting rights in Google, but GOOGL shares do. There are some scenarios where having control, rather than ownership/profit, could be important.
NVDA's value is primarily in its architectural IP and CUDA ecosystem. In an AGI scenario, these could potentially be worked around or become obsolete.
This idea was mentioned by Paul Christiano in one of his podcast appearances, iirc.
I pretty much agree. In my experiments I haven't managed to get a metric that scales the way I expect it to. For example, when using adapter fine-tuning to "learn" a text and looking at the percent improvement in perplexity, the document openai_board_ann appeared more novel than wikipedia on LK-99, but I would expect it to be the other way round, since the LK-99 observations are much more novel and dense than a corporate announcement that is designed to be vague.

However, I would point out that gzip is not a good example of a compression scheme for novelty, because 1) it's a compression scheme that roughly captures word duplication, whereas a language model is a much more sophisticated compression scheme, closer to our understanding of the text. If we want to measure novelty to us, then we probably want a compression scheme similar to how our brain compresses information into memory; that way, something surprising to us is also hard to compress. And I'd also point out that 2) gzip cannot learn (except in the very basic sense of increased context), so it cannot beat the noisy TV problem (toy demo below).
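To make point 2 concrete, here's a toy demo with zlib standing in for gzip: duplicated text becomes nearly free once it has been seen, while fresh random noise costs as much on the hundredth frame as on the first, so a gzip-based curiosity signal would fixate on the noisy TV forever:

```python
import os
import zlib

def extra_bytes(prior: bytes, new: bytes) -> int:
    """Extra compressed bytes needed to encode `new`, given `prior` was already seen."""
    return len(zlib.compress(prior + new)) - len(zlib.compress(prior))

prose = b"the cat sat on the mat. " * 40

# Re-reading text it has already "seen": gzip exploits the duplication,
# so the marginal cost collapses. This is the only "learning" it does.
print(extra_bytes(prose, prose))  # near zero

# A noisy TV: each new frame of static is incompressible, forever.
# Surprise never decreases, so a gzip-curious agent gets stuck here.
print(extra_bytes(os.urandom(1000), os.urandom(1000)))  # ~1000
```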
I agree, but it doesn't learn, so it doesn't get past the noisy TV problem either, and that is central to Schmidhuber's idea. If you are not familiar, the noisy TV problem is this:
"agents are rewarded for visiting regions of the state space that they have not previously occupied. If, however, a particular state transition is impossible to predict, it will trap a curious agent (Burda et al., 2019b; Schmidhuber, 1991a). This is referred to as the noisy TV problem (e.g. (Burda et al., 2019b; Schmidhuber, 1991a)), the etymology being that a naively curious agent could dwell on the unpredictability of a noisy TV screen" from "How to Stay Curious while avoiding Noisy TVs using Aleatoric Uncertainty Estimation"
I agree, this is true of most of Schmidhuber's ideas. Often he doesn't even produce a toy model for years, which means the ideas are generally not very useful. I do like this one, though, and it has led to some implementations in RL.
I do agree, perplexity doesn't seem like a great place to start, and your ideas seem like a better way to measure it.