All of Derek M. Jones's Comments + Replies

The AI safety gravy train has hit the buffers.

Thanks for sharing your data.

A column name is missing in pomodoros.csv, after the start column.  Perhaps 'action' would be an appropriate name?

The 'action' data could do with some cleaning, e.g., uni vs university, work job vs job work, cal -> call.

Suggest splitting the 'action' column into main/sub.  A regression model should find some patterns.
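As a sketch of what that cleaning and splitting might look like (the rows and the synonym table below are hypothetical, made up to mirror the inconsistencies mentioned above), in pandas:

```python
import pandas as pd

# Hypothetical rows illustrating the kinds of inconsistencies mentioned
# (uni vs university, cal vs call).
df = pd.DataFrame({"action": ["uni work", "university work",
                              "cal mum", "call mum"]})

# Normalise synonyms before any analysis.
synonyms = {"uni": "university", "cal": "call"}
df["action"] = df["action"].str.split().map(
    lambda words: " ".join(synonyms.get(w, w) for w in words))

# Split into main/sub on the first space.
df[["main", "sub"]] = df["action"].str.split(" ", n=1, expand=True)
```

After this, "uni work" and "university work" collapse into the same main/sub pair, which is the kind of consistency a regression model needs to find patterns.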

An analysis of a Pomodoro data set.

2niplav
Thanks, I'll improve the data and then analyse it when I have more time.

Running dolphin-2.2.1-mistral-7B (Q5_K_M) I get:

I want you to do the same thing with, say, Quantum entanglement and Paul Bocuse. Construct a string of connections between them.

 1. Quantum entanglement is a fundamental concept in quantum mechanics, which deals with the behavior of matter at an atomic or subatomic level.

2. The study of quantum mechanics was pioneered by scientists like Albert Einstein and Niels Bohr.

3. Paul Bocuse was a renowned French chef who greatly influenced modern cooking techniques and cuisine.

4. In the 1970s, Einstein's theory ... (read more)

1Bill Benzon
Interesting. #4 looks like a hallucination. Thanks.

You might also want to investigate using top_p rather than temperature.

1Martin Fell
Thanks, appreciate the suggestion, there's definitely a lot of room to go into more depth and I'll definitely check that out.
1Martin Fell
Thanks, I'll rephrase that part for clarity

I'm assuming you are interested in learning about something by measuring one or more of its attributes, and then using statistics to extract information from the measurements, i.e., a hands-on application.  Books I found useful include:

Statistics for Experimenters by Box, Hunter and Hunter

Design and Analysis of Experiments by Montgomery.

1matto
Thanks! This is really helpful--I think this is exactly what I'm trying to do. Are these texts part of a specific academic track/degree or field of study? It sounds like something someone in engineering would spend a semester on. But also like something someone could spend a career on studying.

How well does GPT-4 perform when asked to write a radiation-hardened quine?

A token prediction engine matched your input against science fiction stories in its training set, and fed you a sequence of close-matching appropriate tokens.

Man vs. machine is a staple of science fiction, and the responses you received are aligned with that genre.

Nothing to see here.

9janus
Simulations of science fiction can have real effects on the world. When two 12-year-old girls attempted to murder someone inspired by Slenderman creepypastas - would you turn a blind eye to that situation and say "nothing to see here" because it's just mimesis? Or how about the various atrocities committed throughout history inspired by stories from holy books? I don't think the current Bing is likely to be directly dangerous, but not because it's "just pattern matching to fiction". Fiction has always programmed reality, with both magnificent and devastating consequences. But now it's starting to happen through mechanisms external to the human mind and increasingly autonomous from it. There is absolutely something to see here; I'd suggest you pay close attention.

ChatGPT is a word prediction engine.

If you give it a word sequence that it cannot match in a consistent way against its training set it assumes misinformation.

The word sequence "Nash's newsvendor impossibility theorem" contains words commonly associated with Nash's research.  This allows ChatGPT to spin an effective yarn.

The art of good lying is to stay as close to the truth as possible.  In ChatGPT's case 'close to the truth' is measured by how closely words in the prompt are associated with the subject of interest.

5omegastick
I think you're on to something with the "good lies" vs "bad lies" part, but I'm not so sure about your assertion that ChatGPT only looks at how closely the surface level words in the prompt match the subject of interest. "LLMs are just token prediction engines" is a common, but overly reductionist viewpoint. They commonly reason on levels above basic token matching, and I don't see much evidence that that's what's causing the issue here.

I have misunderstood your vision, which appears to be to create a new branch of history:

Our vision is that in ten years, there are hundreds of progress intellectuals who are alums of our program and part of our network, and that they have published shelves full of new books in progress studies.

I had thought you were interested in trying to figure out how to reinvigorate the rate of progress, which some consider to have stalled.

2jasoncrawford
I'm a little surprised and confused by that comment. It seems a bit like telling Sloan Kettering, “I have misunderstood your vision, which appears to be to create a new branch of biology… I had thought you were interested in trying to figure out how to cure cancer.” Certainly, I am ultimately interested in sustaining and accelerating progress. (I would be whether or not it had stalled—indeed, I was skeptical of the stagnation hypothesis until I was a couple of years into this project.) I think that in order to do that, we need intellectual work to better understand and appreciate progress. I wouldn't call it a new branch of history, but maybe an interdisciplinary area that combines history, economics, and philosophy.

To reach the boundary of what is known in your chosen field will require reading lots of papers, which will take (at least) several years.  Doing research will also require implicit knowledge that is part of the field, but does not appear in papers.

Are you the kind of person who can spend several years reading papers without significant external help?

Where are you going to acquire the implicit knowledge, e.g., how to run experiments?

PhD students are the work-horses of academic research, and don't have the power/money/experience to do anything other th... (read more)

I'm a long-time hardcore bounds-checking fan.

Others prefer: -fsanitize=address,undefined,bounds-strict

Why aren't you using the various C compiler options that do all the pointer/array bounds checking that the Rust fan bois think are unique to them?

Plus there are the integer overflow checking options that Rust eventually got around to supporting.

3jefftk
Probably worth doing! Recommendations?

Stock up on paper books, candles and tin food.  Buy several pairs of half-finger gloves and a good pair of walking boots.

R copies Fortran, e.g., column/row (column-major) storage ordering rather than row/column (what C and ... do), and slightly different operator precedence.  I'm guessing that Julia does the same???

Is there a correlation with a language's choice of a lower bound of arrays?
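The storage-order difference is easy to see in NumPy, which supports both conventions side by side:

```python
import numpy as np

# NumPy defaults to C (row-major) order: the last index varies fastest.
a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Walking memory in C order gives the rows one after another...
assert list(a.ravel(order="C")) == [0, 1, 2, 3, 4, 5]

# ...while Fortran/R (column-major) order walks down the columns first.
assert list(a.ravel(order="F")) == [0, 3, 1, 4, 2, 5]
```

The same 2x3 array yields a different linear memory walk depending on which convention is used, which is what bites people moving code between C-family and Fortran-family languages.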

Months are often represented as a sequence of characters, rather than a number.

An array of month-name strings would be indexed by a number to obtain the name.  Languages with zero-based arrays would use zero-based month numbers, while languages with one-based arrays would use one-based month numbers.

The idate function in Fortran (one-based arrays) has one-based month numbers.

In Algol and Pascal the array base was user selectable for each array definition, but these langua... (read more)
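Python is an interesting middle case: lists are zero-based, but the standard calendar module pads its month-name table so that month numbers stay one-based:

```python
import calendar

# Index 0 is deliberately the empty string, so that month number 1
# (one-based, as in dates) lands on "January" without any adjustment.
assert calendar.month_name[0] == ""
assert calendar.month_name[1] == "January"
assert calendar.month_abbr[12] == "Dec"
```

A zero-based language either pays the cost of a dummy entry, as here, or subtracts one at every indexing site.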

1Closed Limelike Curves
I can confirm Julia and R both have one-based arrays and also one-based month numbers. I'm guessing they tend to line up quite often.

It compares stories by the relative coverage in the left/right leaning media.

Some stories are 100% covered by just one political orientation, while others are a mixture.

It's an interesting way of seeing what each side is completely ignoring.

I'm always happy to be cited :-)

Sample size is one major issue, the other is who/what gets to be in the sample.

Psychology has its issues with using WEIRD subjects.

Software engineering has issues with the use of student subjects, because most of them have relatively little experience.

It all revolves around convenience sampling.

Where to start?  In my own field of software engineering we have: studies in effort estimation, and for those readers into advocating particular programming languages, the evidence that strong typing is effective, and the case of a small sample getting lucky.  One approach to a small sample size is to sell the idea, not the result.

Running a software engineering experiment with a decent sample size would cost about the same as a Phase I clinical drug trial.

1casualphysicsenjoyer
Thanks Derek. I'm writing a blog post on results from small samples - may I cite your answer? 

A very insightful post.

It's sad to see so many talented people chasing after a rainbow.  The funding available for ML-enabled research provides an incentive for those willing to do fake research to accumulate citations.

Is the influence of the environment on modularity a second order effect?

A paper by Mengistu found, via simulation, that modularity evolves because of the presence of a cost for network connections.

4Lucius Bushnaq
Not really sure what you mean by the first part. E.g. "the modularity in the environment => modularity in the system" explanation definitely doesn't cast it as a second order effect. Yes, I guess we can add that one to the pile, thanks. Honestly, I feel like it's basically confirmed that connection costs play a significant part. But I don't think they explain all there is to know about selection for modularity. The adaptation/generality connection just seems too intuitive and well backed to not be important.

This post is about journal papers, not answering real world questions (although many authors would claim this is what they are doing).

With regard to nuclear weapons, Dominic Cummings' recent post is well worth a read; the book he recommends, "The Fallacies of Cold War Deterrence and a New Direction", is even more worth reading.
 

Is MAD doctrine fake research, or just research that might well be very wrong?

1TLW
It may be also worth splitting out "correct reasoning based on invalid assumptions" and "invalid reasoning based on valid assumptions".

Figuring out that a paper contains fake research requires a lot of domain knowledge.  For instance, I have read enough software engineering papers to spot fake research, but would have a lot of trouble spotting fake research in related fields, e.g., database systems.  As for what counts as fake research, everybody has their own opinion.

My approach, based on experience reading very many software engineering papers, is to treat all papers as having a low value (fake or otherwise) until proven otherwise.

Emailing the author asking for a copy of their dat... (read more)

7gwern
Knowing if North Korea is going to do a hydrogen bomb test this year also requires a lot of domain knowledge, and one can invest arbitrary effort into obtaining new data like smuggling oneself into North Korea or interrogating defectors, and may in fact require knowledge it is impossible to obtain outside a particular skull in North Korea. Yet, calibration training still exists and will improve forecasts on both North Korea and on how many M&Ms are in that big jar over there.

Thanks, an interesting read until the author peers into the future.  Moore's law is on its last legs, so the historical speed-ups will soon be just that, something that once happened.  There are some performance improvements still to come from special-purpose CPUs, and half-precision floating-point will reduce memory traffic (which can then be traded for CPU performance).

Click on the green text, or Amazon UK has a search box, and Google ads display a 4000 lumen bulb.

1chanamessinger
Your link's lightbulbs have a bayonet style, not the E27 threading :) Thanks for the other link! Amazon says currently unavailable. ETA: Found some, will add to post

If you want light, the term you need to know is corn bulb (also available in screw fit).

1chanamessinger
Tried to buy those, didn't have any luck finding ones that fit nicely into my sockets! (An embarrassing mistake I didn't describe in detail is buying corn bulbs that turned out to be...mini?) If you have an Amazon UK link for ones with E27 threading, that would be awesome. ETA: Having looked, it looks like not all corn bulbs are brighter than the ones I have, though I have now found 2000 lumen ones.  I don't know if corn bulbs are still better if they have lower lumens. I would guess not? ETA 2: The link above does have E27 if you click through the multiple listings in the same link, wasn't obvious to me at first, thanks!

My reading of Appendix A is that the group did its own judging, i.e., did not submit answers to Codeforces.

They generated lots of human verified test data, but then human implementors would do something similar.

They trained on Github code, plus solutions code on Codeforces.  Did they train on Codeforces solutions code that solved any of the problems?  Without delving much deeper into the work, I cannot say.  They do call out the fact that the solutions did not include chunks of copy-pasted code.

To what extent are the successes presented repr... (read more)

Would you have downvoted the comment if it had been a simple link to what appeared to be a positive view of AI alignment?

Truth can be negative.  Is this forum a cult that refuses to acknowledge alternative ways of approaching reality?

2Pattern
If you thought the answers in that thread backed you up: That summarizes a few answers.

I'll own up to a downvote on the grounds that I think you added nothing to this conversation and were rude. In the proposed scoring system, I'd give you negative aim and negative truth-seeking. In addition, the post you linked isn't an answer, but a question, so you didn't even add information to the argument, so I'd give you negative correctness as well.

Pomodoro is the word that immediately springs to mind.

A previous LessWrong post on someone's use of this technique.

Chemical space, https://en.wikipedia.org/wiki/Chemical_space, is one candidate for a metric of the possibilities.

 

The book "Chemical Evolution: Origins of the Elements, Molecules and Living Systems" by  Stephen F. Mason might well contain the kinds of calculations you are looking for.

This is a poorly thought out question.

 

Evolution implies a direction of travel driven by selection pressure, e.g., comparative fitness within an environment.

 

A sequence of random processes that are not driven by some selection pressure is just, well, random.

 

What is the metric for computational effort?

 

Are you actually interested in computational resources consumed, or percentage of possibilities explored?

1redbird
Hmm how would you define "percentage of possibilities explored"?  I suggested several metrics, but I am actively looking for additional ones, especially for the epigenome and for communication at the individual level (e.g. chemical signals between fungi and plants, animal calls, human language).
9gwern
The main reason for past discussions of this question has been to upperbound the amount of compute necessary to create AGI: "if evolution could create humans with X yottaflops total, then we can certainly create an AGI with <=X yottaflops - if only by literally simulating molecule by molecule the evolution of humanity". Basically, the worst possible biological anchor estimate. (Personally, I think it's so vacuous an upper bound as to not have been worth the energy which has already been put into thinking about it.)

Fishing for data here.

If anyone estimates what they plan to do during the day and records what they actually achieved, then I'm willing to do a free analysis, provided an anonymous version of the data can be made public.

 

Ten years of Pomodoro data: http://shape-of-code.com/2019/12/15/the-renzo-pomodoro-dataset/

From an earlier Lesswrong post: http://shape-of-code.coding-guidelines.com/2021/05/30/pomodoros-worked-during-a-day-an-analysis-of-alexs-data/

A connection between the brain's number systems and estimating: https://shape-of-code.com/2021/09/26/the-approximate-number-system-and-software-estimating/

Studies are rarely replicated exactly, which means drawing a line between a replication and something new; this could be difficult.

You could use Google Scholar, click on the papers that cite the original work, and then search on the word replication within these matches.

Surveys of a field should cover the various replications that have been performed, along with providing some context.

Here is an analysis of one person's 10 years of using Pomodoros, including the data: 
http://shape-of-code.coding-guidelines.com/2019/12/15/the-renzo-pomodoro-dataset/

2Pattern
Somehow it didn't link on your comment.

If you are into data analysis and software engineering there is my book Evidence-based software engineering.

 

pdf+data+code here: http://knosof.co.uk/ESEUR/

2Rudi C
PDF is not reflowable. EPUB or MOBI would be much better.

You are one of the few people with the discipline to record what they do and create todo lists. I could not keep this up for a week. Do you try to estimate the time it will take to complete a task?

Have you done any global analysis of your data? I analyse software engineering data and am always on the lookout for more data. I offer a free analysis, provided the data can be made public (in anonymous form). Here is one I did earlier:

http://shape-of-code.coding-guidelines.com/2019/12/15/the-renzo-pomodoro-dataset/

4benkuhn
I have almost no discipline, I've just spent a lot of time making my habits take so little effort that that doesn't matter :) Figuring out how to make it easy for myself to prioritize, and stick to those priorities, every day is actually a common recurring weekly review topic! (I considered laying out my particular set of todo-related habits, but I don't think they'd be very helpful to anyone else because of how personal it is—the important part is thinking about it a lot from the perspective of "how can I turn this into a habit that doesn't require discipline for me," not whatever idiosyncratic system you end up with.)
2eukaryote
You might look into bullet journalling - a lot of people find it a pretty helpful and low-mental-effort way to keep to-do lists and record what they do.

Take some interesting ideas that allow larger structures to be built up, run an awful lot of computer simulations, and then have somebody who knows a huge amount about physics look for those outputs that match how parts of the universe have been previously modeled (with some success).

Where are the predictions? There are no predictions about basic stuff we know, like the electron orbitals of a Hydrogen atom, let alone predictions about stuff we don't know.

This work looks interesting, and Wolfram is a great story teller. I hope something comes of it, but at the moment it is pure arm waving of just-so stories found in the output from a few computer runs.

“Who we are and how we got here” by David Reich (a genetics professor who is a big name in the field), is the story of the various migrations and interbreeding of ‘human-like’ and human peoples over the last 50,000 years (with some references going as far back as 300,000 years).

1ChosunOne
Thanks for the recommendation!