All of jacopo's Comments + Replies

jacopo32

The point is that if the majority of the "cost of crime" is actually the cost of preventing potential crime, then it's not obvious at all that more crime prevention will help.

Sure, sometimes it's better to shift from private prevention (behavior change) to collective prevention (policing) at the margin, but not always.

jacopo61

I disagree. You seem to think that the list of missing technologies sketched by Crawford is exhaustive, but it's not. One example that ties in with your conclusions: paper. Maybe the Romans could have invented the printing press, I'm not sure, but printing on super-expensive vellum or papyrus is pointless.

And it's just one example. Let me make another: the Romans spread and improved watermills, so they were interested in labor-saving technology, contrary to your argument. But their mills were not as good or as widespread as modern or even late medieval ones. (Mill technology was very important to the industrial revolution, as you mention too.)

jacopo10

You could also try to fit an ML potential to some expensive method, but it's very easy to produce very wrong things if you don't know what you're doing (I wouldn't be able to, for one)

jacopo10

Ahh, for MD I mostly used DFT with VASP or CP2K, but then I was not working on the same problems. For thorny issues (biggish systems where plain DFT fails, but no MD) I had good results using hybrid functionals and tuning the parameters to match some result from higher-level methods. Did you try meta-GGAs like SCAN? Sometimes they are surprisingly decent where PBE fails catastrophically...

1jacopo
You could also try to fit an ML potential to some expensive method, but it's very easy to produce very wrong things if you don't know what you're doing (I wouldn't be able to, for one)
jacopo20

My job was doing quantum chemistry simulations for a few years, so I think I can comprehend the scale, actually. I had access to one of the top-50 supercomputers, and codes just do not scale to that number of processors for a single simulation, regardless of system size (even if they had let me launch a job that big, which was not possible)

3[anonymous]
.
jacopo12

Isn't this a trivial consequence of LLMs operating on tokens as opposed to letters?

1Bruce W. Lee
I initially thought so until the GPT-4 results came back. "This is an inevitable tokenizer-level deficiency problem" approach doesn't trivially explain GPT-4's performance near 80% accuracy in Table 6 <https://arxiv.org/pdf/2402.11349.pdf, page 12>. Whereas most others stay at random chance. If one model does solve these tasks, it would likely mean that these tasks can be solved despite the tokenization-based LM approach. I just don't understand how.
jacopo*10

True, but this doesn't apply to the original reasoning in the post - he assumes constant probability while you need increasing probability (as with the balls) to make the math work.

Or decreasing benefits, which probably is the case in the real world.

Edit: misread the previous comment, see below

2DanielFilan
My comment involves a constant probability of the bad outcome with each draw, and no decreasing benefits. I think this is a good exposition of this portion of the post (which I wrote), if you assume that each unit of bio progress is equally good, but that the goods don't materialize if we all die of a global pandemic:
jacopo10

It seems very weird and unlikely to me that the system would go to the higher energy state 100% of the time

I think vibrational energy is neglected in the first paper; it would implicitly be accounted for in AIMD. Also, the higher energy state could be the lower free energy state - if the difference is big enough it could go there nearly 100% of the time.
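The free-energy point can be made quantitative (this is standard statistical mechanics, not something from the paper under discussion): at equilibrium, the occupation ratio of two states depends on the free-energy difference, not the bare energy difference,

```latex
\frac{p_2}{p_1} = e^{-\Delta F / k_B T}, \qquad F = U - TS
```

so a state with higher internal energy $U$ can still be occupied nearly 100% of the time if its entropy $S$ makes its free energy $F$ sufficiently lower.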

jacopo74

Although they never take the whole supercomputer, so if you have the whole supercomputer for yourself and the calculations do not depend on each other you can run many in parallel

1[anonymous]
.
jacopo73

That's one simulation though. If you have to screen hundreds of candidate structures, and simulate every step of the process because you cannot run experiments, it becomes years of supercomputer time.

7jacopo
Although they never take the whole supercomputer, so if you have the whole supercomputer for yourself and the calculations do not depend on each other you can run many in parallel
jacopo20

There are plenty of people on LessWrong who are overconfident in all their opinions (or maybe write as if they are, as a misguided rhetorical choice?). It is probably a selection effect of people who appreciate the sequences - whatever you think of his accuracy record, EY definitely writes as if he's always very confident in his conclusions.

Whatever the reason, (rhetorical) overconfidence is most often seen here as a venial sin, as long as you bring decently-reasoned arguments and are willing to change your mind in response to others'. Maybe it's not your... (read more)

jacopo*241

(Phd in condensed matter simulation) I agree with everything you wrote where I know enough (for readers, I don't know anything about lead contacts and several other experimental tricky points, so my agreement should not be counted too much).

I just add on the simulation side (Q3): this is what you would expect to see in a room-T superconductor unless it relies on a completely new mechanism. But, this is something you see also in a lot of materials that superconduct at 20K or so. Even in some where the superconducting phase is completely suppressed by mage... (read more)

jacopo10

Is there something that would regularise the vectors towards constant norm? A helix would make a lot of sense in this case, especially one with varying radius, like in some (not all) of the images

jacopo30

I don't think it would change your conclusion, but your kettle was not very scaly. Mine gets much worse than that, with the resistance entirely covered by a thick layer, despite descaling 3-4 times per year. It depends on the limescale content of your tap water. I still don't think it affects energy use (maybe?), but the taste can be noticeable and I feel tea is actually harder to digest if I put off the descaling.

Also, you can use citric acid instead of vinegar. Better for the environment, less damaging to the kettle and it doesn't smell :)

2philh
Huh, I knew different water sources would scale more or less, but I didn't realize it would make that much difference. I don't have citric acid and dunno if I'd use it for anything else, so vinegar is fine for me. But if I did descale frequently I'd indeed prefer something that doesn't smell.
jacopo10

Well stated. I would go even further: the only short-timeline scenario I can imagine involves some unholy combination of recursive LLM calls, hardcoded functions or non-LLM ML stuff, and API calls. There would probably be space to align such a thing. (Sort of. If we start thinking about it in advance.)

jacopo30

Isn't that the point of the original transformer paper? I have not actually read it, just going by summaries read here and there.

If I don't misremember, RNNs are especially difficult to train in parallel

6Carl Feynman
Transformers take O(n^2) computation for a context window of size n, because they effectively feed everything inside the context window to every layer.  It provides the benefits of a small memory, but it doesn’t scale. It has no way of remembering things from before the context window, so it’s like a human with a busted hippocampus (Korsakoff’s syndrome) who can‘t make new memories.
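The quadratic cost is easy to see in a minimal sketch of single-head, unbatched attention — toy numpy code for illustration, not any particular model's implementation:

```python
import numpy as np

def attention(Q, K, V):
    # The (n, n) score matrix below is the quadratic part:
    # every position in the window attends to every other position.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # shape (n, n)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over keys
    return w @ V                                     # shape (n, d)

n, d = 8, 4                                          # context length, head dim
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
out = attention(Q, K, V)
print(out.shape)  # (8, 4), but the intermediate score matrix was (8, 8)
```

Doubling the context length quadruples the size of that intermediate score matrix, which is the scaling problem the comment describes.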
jacopo52

That seems reasonable, but it will probably change a number of correct answers (to tricky questions) as well if asked whether it's certain. One should verify that the number of incorrect answers fixed is significantly larger than the number of errors introduced.

But it might be difficult to devise a set of equally difficult questions for which the first result is different. Maybe choose questions where different instances give different answers, and see if asking for a double check changes the wrong answers but not the correct ones?
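The bookkeeping for such a check could look like this (a hypothetical helper on made-up answer lists, just to pin down what would be counted):

```python
def double_check_effect(first, rechecked, truth):
    """Count answers fixed vs. broken by the 'are you sure?' follow-up."""
    fixed = sum(a != t and b == t for a, b, t in zip(first, rechecked, truth))
    broken = sum(a == t and b != t for a, b, t in zip(first, rechecked, truth))
    return fixed, broken

# Toy run: one wrong answer gets fixed, no right answer gets disturbed.
print(double_check_effect(["x", "y", "z"], ["x", "t", "z"], ["x", "t", "z"]))  # (1, 0)
```

The double check helps only if `fixed` significantly exceeds `broken` on questions of comparable difficulty.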

0[anonymous]
Right. I see this as a problem also, asking the model if it's sure is injecting information if we only ask on wrong answers. If we ask always it may disturb more right answers than it fixes wrong ones. Its also accuracy dependent - if the model is 99 percent accurate on a subtask then asking if it's sure may degrade accuracy, while it may improve it on a subtask it's 50 percent accurate on. Or in other words, we could prompt it and it might do better on AP English but less good on the bar exam.
jacopo75

Good post, thank you for it. Linking this will save me a lot of time when commenting...

However I think that the banking case is not a good application. When one bank fails, it makes much more likely that other banks will fail immediately after. So it is perfectly plausible that two banks are weak for unrelated reasons, and that when one fails this pushes the other under as well.

The second one does not even have to be that weak. The twentieth could be perfectly healthy and still fail in the panic (it's a full blown financial crisis at this point!)

2jasoncrawford
Yup, you can always have a domino-effect hypothesis of course (if it matches the timeline of events), rather than positing some general antecedent cause in common to all the failures.
jacopo154

It's not clear here, but if you read the linked post it's spelled out (the two are complementary really). The thesis is that it's easy to make a narrow AI that knows only about chess, but very hard to make an AGI that knows the world and can operate in a variety of situations, yet only cares about chess in a consistent way.

I think this is correct at least with current AI paradigms, and it has both some reassuring and some depressing implications.

jacopo50

I always thought Hall's point about nanotech was trivially false. Nanotech research as he wanted it died out in the whole world, but he explains it by US-specific factors. Why didn't research continue elsewhere? Plus, other fields that got large funding in Europe or Japan are alive and thriving. How come?

That doesn't mean that a government program which sets up bad incentives cannot be worse than useless. It can be quite damaging, but not kill a technologically promising research field worldwide for twenty years.

jacopo11

The point about encouraging safe over innovative research is spot on though. But the main culprits are not granting agencies; it's the tying of researcher careers to the number of peer-reviewed papers, imo. The main problem with the granting system is the amount of time wasted in writing grant applications.

jacopo40

That was quite different though (spoiler alert)

A benevolent conspiracy to hide a dangerous scientific discovery by lying about the state of the art and denying resources to anyone whose research might uncover the lie. Ultimately failing because apparently unrelated advances made rediscovering the true result too easy.

I always saw it as a reply to the idea that physicists could have hidden the possibility of an atomic bomb for more than a few years.

jacopo*20

The example in the beginning is a perfect retelling of my interaction with transformers too :D

However, a word of caution: sometimes the efficient thing is actually to skim and move on. If you spend the effort to actually understand a topic which is difficult but limited in scope, but then you don't interact with it for a year or two, what you remember is just the high-level verbal summary (the same as if you stopped at the first step). For example, I have understood and forgotten MOSFET transistors at least three times in my life, and each time it was more or less the same effort. If I had to explain them now, I would retreat to a single shallow-level sentence.

jacopo10

They commented without reading the post I guess...

jacopo35

I think having an opinion on this requires much more technical knowledge than having one on GPT4 or DALLE 3 does. I for one don't know what to expect. But I upvoted the post, because it's an interesting question.

jacopo10

I agree with you actually. My point is that in fact you are implicitly discounting EY's pessimism - for example, he didn't release a timeline, but often said "my timeline is way shorter than that" with respect to 30-year ones and I think 20-year ones as well. The way I read him, he thinks we personally are going to die from AGI, and our grandkids will never be born, with 90+% probability, and that the only chance to avoid it is that someone already had a plan three years ago which has been implemented in secret and will come to fruition next ... (read more)

1Shoshannah Tekofsky
I think we're reflecting on the material at different depths. I can't say I'm far enough along to assess who might be right about our prospects. My point was simply that telling someone with my type of brain "it's hopeless, we're all going to die" actually has the effect of me dropping whatever I'm doing, and applying myself to finding a solution anyway. 
jacopo42

I like the idea! Just a minor issue with the premise:

"Either I’d find out he’s wrong, and there is no problem. Or he’s right, and I need to reevaluate my life priorities."

There is a wide range of opinions, and EY's has one of the most pessimistic ones. It may be the case that he's wrong on several points, and we are way less doomed than he thinks, but that the problem is still there and a big one as well. 

(In fact, if EY is correct we might as well ignore the problem, as we are doomed anyway. I know this is not what he thinks, but it's the consequence I would take from his predictions)

1Shoshannah Tekofsky
The premise was intended to contextualize my personal experience of the issue. I did not intend to make a case that everyone should weigh their priorities in the same manner. For my brain specifically, a "hopeless" scenario registers as a Call to Arms where you simply need to drop what else you're doing, and get to work. In this case, I calculated the age of my children on to all the timelines. I realized either my kids or my grandkids will die from AGI if Eliezer is in any way right. Even a 10% chance of that happening is too high for me, so I'll pivot to whatever work needs to get done to avoid that. Even if the chance of my work making a difference are very slim, there isn't anything else worth doing.
jacopo10

I think that you need to distinguish two different goals:

  • the very ambitious goal of eliminating any risk of misaligned AI doing any significant damage. If even possible, that would require an aligned AI with much stronger capabilities than the misaligned one (or many aligned AIs such that their combined capabilities are not easily matched)
  • the more limited goal of reducing extinction risk from AGI to a low enough level (say, comparable to asteroid risk or natural pathogen risk). This might be manageable with the help of lesser AIs, depending on the time available to prepare
1Chris van Merwijk
I agree this is a good distinction.
jacopo20

Addendum: if you want to bring legislation more in line with voters' preferences issue by issue, avoiding the distortion from coalition building, Swiss-style referenda seem to work to an acceptable degree http://www.lesswrong.com/posts/x6hpkYyzMG6Bf8T3W/swiss-political-system-more-than-you-ever-wanted-to-know-i

jacopo10

The biggest obstacle to your idea is, I think, the executive. In parliamentary systems the government answers to the parliament, and needs MPs' support to continue - indeed, the Israeli maneuvering that you cite is related to making the government collapse, not to political parties. So as a first thing, you need a presidential system. But even then, MPs would probably organize as for or against the president - I imagine that the president's role in drafting and proposing legislation would be even higher than in present-day US, as the coordination of MPs via ... (read more)

2jacopo
Addendum: if you want to bring legislation more in line with voters' preferences issue by issue, avoiding the distortion from coalition building, Swiss-style referenda seem to work to an acceptable degree http://www.lesswrong.com/posts/x6hpkYyzMG6Bf8T3W/swiss-political-system-more-than-you-ever-wanted-to-know-i
jacopo30

You are correct (QM-based simulation of materials is what I do). The caveat is that exact simulations are so slow as to be impossible; that would not be the case with quantum computing, I think. Fortunately, we have different levels of approximation for different purposes that work quite well. And you can use QM results to fit faster atomistic potentials.

jacopo20

Note that there could still be some priors on some functions being more probable, or some more complex case being plainly impossible to fit because there's no way to get there from the meta-model that is the trained NN.

jacopo40

I am left wondering whether, when GPT3 does few-shot arithmetic, it is actually fitting a linear model on the examples to predict the next token. I.e. the GPT3 weights do not "know" arithmetic, but they know how to fit, and that's why they need a few examples before they can tell you the answer to 25+17: they need to know what function of 25 and 17 to return.

It is not that crazy given my understanding of what a transformer does, which is in some sense returning a function of the most recent input which depends on earlier inputs. Or am I confusing them with a different NN design?
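A toy caricature of this speculation (purely illustrative — it says nothing about actual transformer internals): fit a linear map to the few-shot pairs, then apply it to the query.

```python
import numpy as np

# Few-shot "examples": inputs (a, b) with target a + b.
X = np.array([[3.0, 4.0], [10.0, 2.0], [7.0, 7.0]])
y = X.sum(axis=1)                       # targets: 7, 12, 14

# Least squares recovers the weights (1, 1) exactly here,
# since the system is consistent.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# "What function of 25 and 17 to return" has now been inferred.
print(round(float(np.array([25.0, 17.0]) @ w), 6))  # 42.0
```

Three examples suffice because the hypothesis class (linear maps of two inputs) is tiny; whatever GPT3 does in-context would have to search a vastly larger function space.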

2jacopo
Note that there could still be some priors on some functions being more probable, or some more complex case being plainly impossible to fit because there's no way to get there from the meta-model that is the trained NN.
jacopo10

Ahh sorry! Going back to read it was pretty clear from the text. I was tricked by the figure where the embedding is presented first. Again, good job! :)

jacopo20

Cool work!

Can I ask a couple of questions about the DR+clustering approach? 

If I understand correctly, you do the clustering in a 2D space obtained with UMAP (ignore this if I am wrong). Are you sure you are not losing important information with such a low dimension? I say this because you show that one dimension is strongly correlated with style (academic vs forum/blog) and the second may be somewhat correlated with time. I remember that an argument exists for using n-1 dimensions when looking for n clusters, although that was probably using linear D... (read more)
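The worry can be probed on synthetic data — here with PCA standing in for UMAP just to keep the sketch dependency-light (scikit-learn assumed available; made-up data, not the paper's):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic data: 4 well-separated clusters in 50 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=6.0, size=(4, 50))
X = np.vstack([c + rng.normal(size=(100, 50)) for c in centers])

# Cluster in the full space vs. in a 2-D projection.
labels_full = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
X2d = PCA(n_components=2).fit_transform(X)
labels_2d = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X2d)

# If the projection loses cluster structure, the two label sets will
# disagree (up to relabeling); one can quantify this with e.g.
# sklearn.metrics.adjusted_rand_score(labels_full, labels_2d).
print(len(labels_full), len(labels_2d))
```

Comparing the two assignments on the real embeddings would directly answer whether the 2D reduction throws away clustering-relevant information.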

3Jan
Thank you for the comment and the questions! :) This is not clear from how we wrote the paper but we actually do the clustering in the full 768-dimensional space! If you look closely as the clustering plot you can see that the clusters are slightly overlapping - that would be impossible with k-means in 2D, since in that setting membership is determined by distance from the 2D centroid.
jacopo10

I agree. In fact, you could say that Mélenchon and le Pen are closer to each other on economic and possibly foreign policy, and very far from Macron. So not unreasonable that some votes would transfer from one to the other. Huge differences on everything else of course (immigration, but also law and order, education, culture, ...) I disagree on Hollande and generally center-left. Hollande had to juggle a very broad coalition as you say. He ended up hated by everyone because his way to handle it was not finding a middle ground, but campaigning as Mélenchon ... (read more)

1Dirichlet-to-Neumann
I think the historical socialist party (PS) is in many ways closer to Mélenchon than to Macron. Don't forget there were actual communists in Mitterrand's coalition ! I agree on Hollande's hesitations, but it was that Mélenchon's lite campaign that brought him to power - and his Macron lite policy was the catalyst to the PS demise as their electors switched en masse to Mélenchon.
Answer by jacopo40

I think if you look up antifragile investment you will find a lot of discussion of exactly this problem. As far as I understand, the idea is that most investments have limited downsides (at most, you lose what you put in) but may have limitless upsides in low-probability scenarios. Then you can make many small investments of this kind, so that when one pays off, it's more than enough to make up for the losses on the rest. Taking your example of the nuclear bunker, if you could build one with 1% of your wealth or less, in this frame of mind probably you sho... (read more)
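The arithmetic of the idea, with made-up numbers: stake 1% of wealth on a bet that pays 300x its stake with 1% probability.

```python
# Per-bet expected value: a small, near-certain loss against a rare large payoff.
stake, p_win, payoff = 0.01, 0.01, 3.00   # payoff = 300 * stake (toy numbers)
ev = p_win * payoff - (1 - p_win) * stake
print(round(ev, 4))  # 0.0201, i.e. ~2% of wealth per bet in expectation
```

The point is the asymmetry: the downside is capped at the stake, while the upside can be fat-tailed, so a portfolio of many such small bets can have positive expectation even though almost all of them lose.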

jacopo10

Interesting post! I like the picture you draw. But you should consider the possibility that it was not a Rome-unique factor, but the intersection of multiple factors, each of which was true of several ancient states, but all of which held only for Rome. In particular I have the impression that the subjects of the Persian empire were pretty happy with it and flourishing under its rule. To be clear, it was nothing like citizenship, because Persia was a kingdom and not a republican city-state. But between the investment model and the pillaging model that you ... (read more)

jacopo20

I like to think of it this way: the determinant is the product of the eigenvalues of a matrix, which you can conveniently compute without reducing the matrix to diagonal form. All interesting properties of the determinant are very easy (and often trivial!) to show for the product of the eigenvalues.

More in the spirit of your post: I don't remember how hard it is to show that the determinant is invariant under unitary transformations, but not too hard I think. It's not the only invariant of course (the trace is as well; I don't remember if there are others). But you could definitely start from the product-of-eigenvalues idea and make it invariant to get the formula for det.
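Both facts are easy to check numerically (a quick numpy sketch on random matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = rng.normal(size=(4, 4))   # generic, hence invertible with probability 1

# det(M) equals the product of M's eigenvalues...
assert np.isclose(np.linalg.det(M), np.prod(np.linalg.eigvals(M)))
# ...and is invariant under any change of basis, unitary or not.
assert np.isclose(np.linalg.det(A @ M @ np.linalg.inv(A)), np.linalg.det(M))
print("ok")
```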

3dsj
det(AB) = det(A)det(B), so the determinant is invariant to any change of basis, not merely unitary ones: det(ABA−1) =det(A)det(B)det(A−1) =det(AA−1)det(B) =det(B)
jacopo10

Interesting read, but I don't think the initial example and the following are very much connected. The shift of opinion about ww2 has presumably happened without fabricated evidence or misinformation about factual events. The USSR and the USA played very different roles in the defeat of Germany, so asking "which contributed the most" is sensitive to shifting narratives and the highlighting of different events. Similar questions from the more distant past: who was to blame for ww1? Was Napoleon spreading modernity and equality in Europe, or ruthlessly subjugating neighbor... (read more)

1Malmesbury
I meant the initial example as a justification for investigating the past in the first place, as a reminder that you don't need to be a full-on conspiracy theorist to be suspicious of the historical record. When you say "shifting some facts forward", I would also count that as the victors altering history. Had the US collapsed instead of the USSR, I suppose the facts that would be shifted forward wouldn't be the same.
jacopo20

Or more generally, X sends a costly signal of his belief in P. If X is the state (as in example 2) a bet is probably impractical, but doing anything that would be costly if X is false should work. But for this, it makes a big difference in what sense Y does not trust X. If Y thinks X may deceive, costly signals are good. If Y thinks X is stupid or irrational or similar, showing belief in P is useless.

jacopo80

I mostly agree with the other commenters that the story does not show the qualitative changes we may expect to see from autonomous weapons. But I found it a very good short story nevertheless, and believable as well. I think it could serve well if broadly diffused, by getting someone to think about the topic for the first time before going into scenarios farther away from what they are used to.

2[anonymous]
Agreed. The story is very well written in terms of literary quality.
jacopo20

I notice that while a lot of the answer is formal and well-grounded, "stories have the minimum level of internal complexity to explain the complex phenomena we experience" is itself a story :) Personally, I would say that any gear-level model will have gaps in the understanding, and trying to fill these gaps will require extra modeling which also has gaps, and so on forever. My guess is that part of our brain will constantly try to find the answers and fill the holes, like a small child asking "why x? ...and why y?". So if a more practical part of us wants to stop investigating, it plugs the holes with fuzzy stories which sound like understanding. Obviously, this is also a story, so discount it accordingly...

1Jon Garcia
Yep. That's just how humans think about it: complex phenomena require complex explanations. "Emergence," as complexity arising from the many simple interactions of many simple components, I think is a pretty recent concept for humanity. People still think intelligent design makes more intuitive sense than evolution, for instance, even though the latter makes astronomically fewer assumptions and should be favored a priori by Occam's Razor.
jacopo40

I agree it would be very good, and possibly an economic no-brainer. My point is just that what is discussed in the post works for a political no-brainer, by which I mean something that no one would bother to oppose. To get what you want you need a real political campaign, or a large scale economic education campaign. Even then it's difficult, imo, unless your proposals fit one of the cases I mention above.

That said, if you are thinking of the US there is an easy proposal to be made for medicine, which is making medical school equivalent to a college degree... (read more)

2CronoDAS
Right now the bottleneck for becoming able to legally practice medicine as a doctor in the US is the number or residency positions for training medical school graduates, not the number of people graduating from medical schools.
jacopo80

The problem is, licensed people have made an investment and expect to repay it by reaping profits from the protected market. Some have borrowed money to get in and may have to file for personal bankruptcy. So they will oppose the reform by any means at their disposal, for which I don't blame them (even if it is obviously against the general interest).

Such a reform would be doable in the following cases (1) it compensates the losers in some way (2) it's so gradual that currently licensed professionals will mostly retire before it's fully implemented (3) it is decided by a ... (read more)

1nomiddlename
Perhaps a targeted campaign for reform in the area of highest impact. Medicine comes to mind but that also seems like the scariest area to mess with. I also forgot to mention that these reforms would dramatically lower the cost of education as people could choose to skip formal rigid degrees entirely.
jacopo40

On Prussia:

  • they managed to have almost the same GNP as France while keeping larger military spending; it's not surprising that they won the war
  • of course, it may be surprising that they managed to get there. Given the model, you would expect that they sacrificed internal stability, but in fact it was France that was the most unstable country in that period! (Revolution, Napoleon, restoration, second Republic, second empire)
  • you could say the political instability may have really hindered France, forcing higher consumption spending, but how come this was
... (read more)
1dmanningcoe
These are great points, thank you for pointing them out. I think I agree with your overall take - the analysis is not finished with Kennedy's framework, rather it's a good place to start. We can then go into more detail on each trade-off - analyzing why Japan gets to an investment rate of 45% whereas the Soviet Union only to 30%, say.  On your specific points: 1. Good point - although I think this can only be taken so far. The Entente powers spent less on the military but had slightly higher overall economic output, and that's why they had an advantage. Certainly its unusual that Germany would defeat France so decisively.   2. I agree! Which I think is one of the reasons why Prussia sits uneasily in the framework. I think its worth noting that whereas France had domestic issues - Prussia had significantly worse external problems. These caused far more destruction than France's convulsions. The "Miracle of the house of brandenburg" and the fact that the non-Russian Napoleonic wars were fought mostly in Germany come to mind. 3. Why do you say it wasn't true pre-Napoleon?   4. I think I have to disagree here. Prussia was in a century and a half contest with Austria the moment it seized the Silesian coal fields. I agree that quick and decisive victories did the trick, but the difficulty is its  not clear how Prussia managed to win decisive victories whilst not falling behind economically.
jacopo10

On effectiveness and public health studies: the thread quoted says multiple times "in the US". I would be curious to know if this kind of thing is done more elsewhere, or if it's an implicit assumption that it could only be done in the US anyway (which could very well be true for all I know; drug profits are way higher in the US after all).

Does anybody know?

jacopo40

My feeling is that many of the people who did not benefit tend to "generalise from one example" and assume that's true for most kids. Actually, I (despite being generally pro-schooling) would say something stronger than you: there is a minority of people who are actually harmed by school compared to a reasonable counterfactual (e.g. home-schooling for some). Plus, many kids can see easily where the system is failing them, less easily where it's working.

jacopo40

Thanks for the review!

Regarding the "countering racism" doubts, I can see how the results should disprove at least some racist worldviews. 

I think that an interpretation of human history among racists is the following: the population splits into clusters, these clusters diverge into different "races", eventually one emerges as "the best" and out-competes or replaces all the others, before splitting again. Historically, this view was used to justify aggressive expansionism, opposition to intermarriage, and opposition to any policy that could slow this proce... (read more)

Answer by jacopo80

According to my understanding (which comes from popularized sources; I am not a doctor nor a biologist), antibody counts are not the main drivers of long-term immunity. Lasting immunity is given by memory T and B cells, which are able to quickly escalate the immune response in case of new infection, including producing new antibodies. So while a high antibody count means you're well protected, a low count some months after the vaccine could mean that the protection has reduced, but in almost all cases you will be protected for a much longer time. Note tha... (read more)
