Make base models great again. I'm still nostalgic for GPT-2 or GPT-3. I can understand why RLHF was invented in the first place but it seems to me that you could still train a base model, so that if it's about to say something dangerous, it just prematurely cuts off the generation by emitting the <endoftext> token instead.
Alternatively, make models natively emit structured data. LLMs in their current form emit free-form arbitrary text, which needs to be parsed in all sorts of annoying ways to make it useful for any downstream application anyway. Also, structured output could help with preventing misaligned behavior.
(I'm less confident in this idea than the previous one.)
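To make the structured-data idea a bit more concrete, here is a minimal sketch of what consuming such output could look like downstream. The schema (a "label" string and a "confidence" number) and the names Answer and parse_answer are assumptions made up for illustration; they don't come from any real model API.

```python
import json
from dataclasses import dataclass


@dataclass
class Answer:
    # Hypothetical schema, chosen only for illustration.
    label: str
    confidence: float


def parse_answer(raw: str) -> Answer:
    """Validate a model reply that is supposed to be a JSON object."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass)
    # when the reply is free-form text rather than JSON.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    label = data.get("label")
    confidence = data.get("confidence")
    if not isinstance(label, str):
        raise ValueError("'label' must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("'confidence' must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("'confidence' must lie in [0, 1]")
    return Answer(label=label, confidence=float(confidence))
```

The point is only that a fixed schema turns "parse it in all sorts of annoying ways" into one validation step that either returns a typed value or fails loudly.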
Try to wean people off excessive reliance on LLMs. This is probably the biggest source of AI-related negative effects today. I am trying to do this myself (I formerly alternated between claude, chatgpt, and lmarena.ai several times a day), but it is hard.
(By AI-related negative effects I mean things like people losing their ability to write originally or think independently.)
"AI that can verify itself" seems likely doable for reasons wholly unrelated to metamathematics (unlike what you claim offhandedly), since AIs are finite objects that nevertheless need to handle a combinatorially large space. This has the flavor of "searching a combinatorial explosion based on a small yet well-structured set of criteria" (i.e., the relatively easy instances of various NP problems), which has had a fair bit of success with SAT/SMT solvers and nonconvex optimizers and evolutionary algorithms and whatnot. I don't think constructing a system that ...
Speaking of MathML, are there other ways for one to put mathematical formulas into HTML? I know Wikipedia uses <math> and its own template {{math}} (here's the help page), but I'm not sure about any others. There's also LaTeX (which I think is the best program for putting mathematical formulas into text in general), as well as some other bespoke things in Google Docs and Microsoft Word that I don't quite understand.
Thank you for building this! I have just signed up for it.
I've noticed that two of the three Manifold markets (Will a nuclear weapon detonate in New York City by end of 2023? and Will a nuclear weapon cause over 1,000 deaths in 2023?) could use a few thousand mana in subsidies to reduce the chance of a false alarm, even though both are moderately well-traded already. (I've just bet both of them down, but I personally don't have enough mana to feel comfortable subsidizing both.)
I think this issue could be fixed by lengthening the message of the phone calls (if it ever gets sent out) to also quote all the comments on the sentinel markets from the last ~week before the trigger time. The reason is that I expect, if there were ever legitimate signs of an impending nuclear war, that people would leave plenty of comments on the relevant markets about these signs.
Two recent things that will likely affect this meetup:
Thanks for your attention!
Here's a manually sorted list of meetup places in the USA, somewhat arbitrarily/unscientifically grouped by region for even greater convenience. I spent the past hour on this, so please make good use of it. (Warning: this is a long comment.)
NEW ENGLAND
MID-ATLANTIC
I haven't used GPT-4 (I'm no accelerationist, and don't want to bother with subscribing), but I have tried ChatGPT for this use. In my experience it's useful for finding small cosmetic changes to make and fixing typos/small grammar mistakes, but I tend to avoid copy-pasting the result wholesale. Also I tend to work with texts much shorter than posts, since ChatGPT's shortish context window starts becoming an issue for decently long posts.
Hello LessWrong! I'm duck_master. I've lurked around this website since roughly the start of the SARS-CoV-2/COVID-19 pandemic but I have never really been super active as of yet (in fact I wrote my first ever post last month). I've been around on the AstralCodexTen comment section and on Discord, though, among a half-dozen other websites and platforms. Here's my personal website (note: rarely updated) for your perusal.
I am a lifelong mathematics enthusiast and a current MIT student. (I'm majoring in mathematics and computer science; I added the latter part...
Thank you for creating this website! I’ve signed up and started contributing.
One tip I have for other users: many of the neurons are not about vague sentiments or topics (as in most of the auto-suggested explanations), but are rather about very specific keywords or turns of phrase. I’d even guess that many of the neurons are effectively regexes.
Also, apparently Neuronpedia cut me off for the day after I did ~20 neuron puzzles. If this limit could be raised for power users or something like that, it could be beneficial.
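To illustrate the guess above that many neurons are effectively regexes, here is a toy sketch of a "neuron" that fires only on one specific turn of phrase. The pattern and the name regex_neuron are hypothetical; this is not taken from any actual neuron on Neuronpedia, and a real neuron would give graded rather than 0/1 activations.

```python
import re
from typing import List

# Hypothetical pattern: fires on the phrase "in light of" / "in the light of".
PATTERN = re.compile(r"\bin (?:the )?light of\b", re.IGNORECASE)


def regex_neuron(text: str) -> List[float]:
    """Return one activation per character: 1.0 inside a match, else 0.0."""
    activations = [0.0] * len(text)
    for match in PATTERN.finditer(text):
        for i in range(match.start(), match.end()):
            activations[i] = 1.0
    return activations
```

A neuron like this would look useless under a "vague topic" explanation (it ignores the word "light" on its own) but is fully described by the pattern.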
Exactly what it says on the tin.
Thoughts I want to expand on for later:
This is an extremely important point. (I remember thinking a long time ago that Wikipedia just Exists, and that although random people are allowed to edit it, doing it is generally Wrong.) FWIW I'm an editor now - User:Duckmather.
In fact, organized resources like Wikipedia, LW sequences, SEP, etc. are basically amortized scholarship. (This is particularly true for Wikipedia; its entire point is that we find vaguely-related content from around - or beyond - the web and then paraphrase it into a mildly-coherent article. Source: am Wikipedia editor.)
I also agree that, for the purpose of previewing the content, this post is poorly titled (maybe it should be titled something like "Having bad names makes you open the black box of the name", except more concise?), although, for me, I didn't so much stick to a particular wrong interpretation as just view the entire title as unclear.
Thanks for the reply. I take it that not only are you interested in the idea of knowledge, but that you are particularly interested in the idea of actionable knowledge.
Upon further reflection, I realize that all of the examples and partial definitions I gave in my earlier comment can in fact be summarized in a single, simple definition: a thing X has knowledge of a fact Y iff it contains some (sufficiently simple) representation of Y. (For example, a rock knows about the affairs of humans because it has a representation of those affairs in the form o...
Since this is literally a question about soliciting predictions, it should have one of those embedded-interactive-predictions-with-histograms gadgets* to make predicting easier. Also, it might be worth it to have two prediction gadgets, since this is basically a prediction: one gadget to predict what Recognized AI Safety Experts (tm) predict about how much damage unsafe AIs will do, and one gadget to predict how much damage unsafe AIs will actually do (to mitigate weird second-order effects having to do with predicting a prediction).
*I'm not sure what they're supposed to be called.
Au contraire, I think that "mutual information between the object and the environment" is basically the right definition of "knowledge", at least for knowledge about the world (as it correctly predicts that all four attempted "counterexamples" are in fact forms of knowledge), but that the knowledge of an object also depends on the level of abstraction of the object which you're considering.
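For reference, the standard definition of mutual information for discrete random variables is below. Reading X as the state of the object and Y as the state of the environment is my labeling for this comment, not notation from the post.

```latex
I(X;Y) \;=\; \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \;=\; H(X) - H(X \mid Y)
```

It is zero exactly when X and Y are independent, and the "level of abstraction" point amounts to which variable X you take the object to be.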
For example, for your rock example: A rock, as a quantum object, is continually acquiring mutual information with the affairs of humans by the imprinting of subatomic in...
I agree with you (meaning G Gordon Worley III) that Wikipedia is reliable, and I too treat it as reliable. (It's so well-known as a reliable source that even Google uses it!) I also agree that an army of bots and humans undoes any defacing that may occur, and that Wikipedia having to depend on other sources helps keep it unbiased. I also agree with the OP that Wikipedia's status as not-super-reliable among the Powers that Be does help somewhat.
So I think that the actual secret of Wikipedia's success is a combination of the two: Mild illegibility prevents ram...
Sorry I'm arriving late