All of Caridorc Tergilti's Comments + Replies

Recipe: Hessian eigenvector computation for PyTorch models

You should train both a feedforward network and a CNN on ImageNet classification, to see whether the Hessian of the CNN is closer to the identity after training than that of the feedforward network, because of the CNN's image-understanding priors.

Given that this method returns a numeric matrix, it must be a Hessian evaluated at a point or the average Hessian over many points. Is the result the Hessian averaged over all training data? Is this average useful, rather than just cancelling out high and low Hessian values?

1Nina Panickssery2y

The method described does not explicitly compute the full Hessian matrix. Instead, it derives the top eigenvalues and eigenvectors of the Hessian. The implementation accumulates a large batch from a dataloader by concatenating n_batches of the typical batch size. This is an approximation to estimate the genuine loss/gradient on the complete dataset more closely. If you have a large and high-variance dataset, averaging gradients over multiple batches might be better. This is because the loss calculated from a single, accumulated batch may not be adequately representative of the entire dataset's true loss.
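To make the "top eigenvalues without the full Hessian" idea concrete, here is a minimal sketch in plain Python. It is not the recipe's code: the least-squares loss, the toy `data`, and the finite-difference Hessian-vector product are all assumptions chosen so the example runs without any dependencies. In PyTorch the Hessian-vector product would normally come from autograd instead. The loss is averaged over all the data points, which plays the role of the accumulated batch.

```python
import math

def grad(w, data):
    """Gradient of the mean loss 0.5 * (w.x - y)^2 over all (x, y) in data."""
    g = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(data)
    return g

def hvp(w, v, data, eps=1e-3):
    """Hessian-vector product H v, via central differences of the gradient."""
    plus = grad([wi + eps * vi for wi, vi in zip(w, v)], data)
    minus = grad([wi - eps * vi for wi, vi in zip(w, v)], data)
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]

def top_eigenpair(w, data, iters=200):
    """Power iteration: only H v products are needed, never H itself."""
    v = [1.0 / math.sqrt(len(w))] * len(w)  # unit-norm starting vector
    eig = 0.0
    for _ in range(iters):
        hv = hvp(w, v, data)
        eig = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(a * a for a in hv))
        v = [a / norm for a in hv]
    return eig, v

# Toy dataset (made up for illustration): pairs of (input vector, target).
data = [((1.0, 0.0), 1.0), ((1.0, 0.0), 0.0), ((0.0, 1.0), 2.0), ((1.0, 1.0), 1.0)]
eig, vec = top_eigenpair([0.3, -0.2], data)
print(eig, vec)
```

For this toy loss the Hessian is the mean of the outer products of the inputs, so the result can be checked by hand. Smaller eigenvalues would need deflation or a Lanczos-style method, which this sketch does not attempt.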

Neuronpedia

You can use the "mp-net2" model from sentence transformers for zero-shot classification: take the scalar product between the embedding of the text and the embeddings of "sex" and "violence", decide a cut-off, and you are done.
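A rough sketch of the cut-off step, assuming the embeddings have already been computed by whatever model is used. The function names, the 3-dimensional toy vectors and the cut-off value of 0.5 are all made up for illustration; real sentence embeddings have hundreds of dimensions and the cut-off would need tuning on labelled examples.

```python
import math

def cosine(a, b):
    """Scalar product of two vectors after normalising each to unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def flag(text_vec, label_vecs, cutoff):
    """Return the labels whose embedding similarity to the text reaches the cut-off."""
    return [name for name, vec in label_vecs.items() if cosine(text_vec, vec) >= cutoff]

# Toy vectors standing in for real embeddings (hypothetical values).
labels = {"sex": [1.0, 0.0, 0.0], "violence": [0.0, 1.0, 0.0]}
print(flag([0.1, 0.9, 0.2], labels, cutoff=0.5))
```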

2Johnny Lin2y

Thank you! I will put this on the TODO.

Neuronpedia

Thanks for the quick response. Have you tried fine-tuning the new Llama 2 models on the data gathered so far to see if there are any interesting results? QLoRA is pretty efficient for this.

Neuronpedia

Steering GPT-2-XL by adding an activation vector

It looks pretty cool! Adding a Google sign-in option would greatly broaden the reach of the game, as most non-technical people do not have a GitHub account.

3Johnny Lin2y

Thank you - yes, this is on the TODO along with Apple Sign-In. It will likely not be difficult, but it's in beta experiment phase right now for feedback and fixes - we aren't yet ready for the scale of the general public.

Caridorc Tergilti2y113

We could not find a "speak in French" vector after about an hour of effort, but it's possible we missed something straightforward

Did you try 10 or 20 simple French phrases with a positive sign and their translations with a negative sign?

Also try 1000 English words and their 1000 French translations, in case scale is the problem.

Also try:

"The following text is in English: ' "

"The following text is in French: ' "

with the second phrase written itself in French.

5faul_sname2y

I just tried that, and it kinda worked. Specifically, it worked to get gpt2-small to output text that structurally looks like French, but not to coherently speak French. Although I then just tried feeding the base gpt2-small a passage in French, and its completions there were also incoherent, so I think it's just that that version hasn't seen enough French to speak it very well.

Very interesting, could you elaborate or give some links?

2romeostevensit2y

I wrote a short post on my favorite technique, but lots of therapy modalities talk about similar ideas. http://neuroticgradientdescent.blogspot.com/2019/07/core-transformation.html

Caridorc Tergilti2y40

In my opinion wearable health is highly neglected because older people are less tech-savvy than young people, so they use it less, but they would also benefit much more from the technology. If a 20-year-old wears a smartwatch that measures and records heart rate, it is almost only for fun; if a 60-year-old does it, it could prevent and inform about important issues. But the 20-year-old is much more likely to actually use it than the 60-year-old.

5DirectedEvolution2y

I think that's a pretty good point, and it tracks with Steven Byrnes' insight about bedwetting alarms. Zoom costs no money, but the most it saves is time and annoyance. It might save lives occasionally, but there's potential for wearable health to save lots of lives and prevent many disabilities. The cost-benefit ratio might be better than Zoom's, and yet people may neglect it excessively because it's socially weird to do things like monitor your heart rate - or, when it's available, your blood glucose - routinely using consumer electronics. As with the bedwetting alarm, we have this idea that we should only be using "interventions" like these when there's already a clear problem, rather than as a way to prevent a problem or hasten a solution, and that seems to stem from social norms ("is this really such an emergency?") rather than a rational judgment about costs and benefits. That said, one of my criteria was "Had an immediate payoff," and I think that neither the bedwetting alarm nor wearable health typically has an immediate payoff (unless you were replacing an existing invasive glucose monitor with an Apple Watch noninvasive monitor, once that tech becomes available). With Zoom, all people were missing was the suggestion "why don't we have this meeting on Zoom" and the perception that "if we do, it will be seen as normal by all participants." With wearable health, you have the added component of "I'm not even sure all this fuss and self-monitoring will even pay off in the long run in terms of better health outcomes, but I have to pay the money and attention costs right now." The delayed and uncertain cost-benefit analysis in individual cases is the reason that wearable health doesn't meet my higher "stringency bar" for being comparable to Zoom, even though I agree with you that there are probably a lot of users who'd benefit from it and who are neglecting it primarily for the reason that it's not normalized.

Caridorc Tergilti2y30

I also asked ChatGPT; here are the six best ideas that it had (excluding electric bikes, as that was already my idea ;P), cherry-picked by me out of 21:

Online education: Online education platforms like Coursera and Khan Academy were mature, widely available, intuitive, and cost no money for basic usage. They also had no regulatory barriers or moral issues and could be used by mutual agreement among one or a few people. Online education also saved a lot of time and played relatively well with the existing format of learning and education.
Digital wallets:

...

6DirectedEvolution2y

This is a nice use case for ChatGPT! In most of these cases, I think that where they don't quite meet my criteria is in terms of the cost-benefit issue or the neglectedness part. Online education is pretty widely used by individuals, exactly as we would hope. It's neglected as a way to signal educational attainment, but that's a problem that can only be handled at the level of corporate or university governance by recognizing Coursera certificates on CVs or building a university around online offerings. Digital wallets seem to have taken off pretty much in step with awareness and size of the user base. Wearable health and home energy management systems don't seem neglected and they face cost-benefit questions. Collaborative writing and editing are already widely used, as are online language learning platforms. I'll throw in another $1 for creative brainstorming for a total of $4 awarded and $6 to go, but I want to save the rest for ideas more stringently meeting my criteria if any can be found.

Caridorc Tergilti2y1812

Electric bikes are vastly under-utilized even in European cities where they are safe and effective to use:

Mature: bikes are more than 100 years old; electric motors and batteries are also mature.
Cost no money: saves a ton of money over a car
Was widely available and fairly intuitive for the average person to use: everyone can bike
Had no regulatory barriers or moral issues: clearly not illegal nor immoral to ride a bike.
Saved a lot of money and time: it also saves time because there is no need for separate exercise.
Had an immediate payoff: you gain from day 1
Played rela

...

1Noah L.2y

I believe this applies, in some domains, to micro-mobility in general.

DirectedEvolution2y170

I think electric bikes are a pretty good candidate! I own one and it was transformative for biking around Seattle.

The ability to trivially climb a hill to get a block away from the main arterial and bike on a little-trafficked road was a huge safety enhancement
It deals with even Seattle's huge hills with ease
You can go 20 mph, which is often faster than cars, especially during rush hour
They are no riskier than a regular bike, and given my point about getting off busy roads, they can even be safer if used well

On reflection, I think my reason for thinking th...

Some 2-4-6 problems

What Discovering Latent Knowledge Did and Did Not Find

I tried both and neither works

1Viktor Rehnberg2y

Hmm, yeah it's a bit hard to try stuff when there's no good preview. Usually I'd recommend a rot13 cipher if all else fails, but for number sequences that makes less sense.

Some 2-4-6 problems

Caridorc Tergilti2y*60

Here is my playthrough with my thought process:

:::spoiler

>!0) [2, 4, 6] is VALID

>!Now I think, let's check if the rule is *2
>!1) [31, 62, 93] is VALID

>!Let's check if the rule is always true with 3 random numbers.
>!2) [6534525, 142536, 456342532] is NOT VALID

>!I wanted to check multiply by 3, but I repeated multiply by 2
>!3) [5, 10, 15] is VALID

>!Checking multiply by 3
>!4) [7, 21, 63] is VALID

>!Checking multiply by 10
>!5) [50, 500, 5000] is VALID

>!Now I am thinking: maybe any multiplication is ok? I cannot try them all, l...

1Viktor Rehnberg2y

See FAQ for spoiler tags, it seems mods haven't seen your request. https://www.lesswrong.com/faq#How_do_I_insert_spoiler_protections_

Caridorc Tergilti2y30

CCS does not find the single linear probe with high accuracy: there are more than 20 orthogonal linear probes (i.e. using completely different information) that have similar accuracies as the linear probe found by CCS (for most datasets);

So what about an ensemble of the top 20 linear probes? Is it substantially better than using just the best one alone? I would expect so given that they are orthogonal, so they are using ~uncorrelated information.

3Fabien Roger2y

I think that a (linear) ensemble of linear probes (trained with Logistic Regression) should never be better than a single linear probe (otherwise the optimizer would have just found this combined linear probe instead). Therefore, I don't expect that ensembling 20 linear CCS probes will increase performance much (and especially not beyond the performance of supervised linear regression). Feel free to run the experiment if you're interested in it!
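The argument can be checked on a toy case: a weighted sum of linear probes' logits is itself the logit of one linear probe whose weights and bias are the same weighted sum. The sketch below uses made-up 2-dimensional probes and hypothetical function names. It only covers combining logits linearly; averaging probabilities after the sigmoid is not linear and is not covered by this identity.

```python
def probe_logit(w, b, x):
    """Logit of a single linear probe: w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def ensemble_logit(probes, weights, x):
    """Weighted sum of the logits of several linear probes."""
    return sum(a * probe_logit(w, b, x) for a, (w, b) in zip(weights, probes))

def collapse(probes, weights):
    """Fold the ensemble into one equivalent probe (w, b)."""
    dim = len(probes[0][0])
    w = [sum(a * p[0][i] for a, p in zip(weights, probes)) for i in range(dim)]
    b = sum(a * p[1] for a, p in zip(weights, probes))
    return w, b

# Two orthogonal toy probes (hypothetical values), combined with equal weights.
probes = [([1.0, 0.0], 0.5), ([0.0, 2.0], -1.0)]
weights = [0.5, 0.5]
x = [3.0, -1.0]
single_w, single_b = collapse(probes, weights)
print(ensemble_logit(probes, weights, x), probe_logit(single_w, single_b, x))
```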

Beginning to feel like a conspiracy theorist

Caridorc Tergilti2y1410

The most important thing is approaching other points of view with an open mind and with epistemic humility, that is, knowing that some of what you think can be wrong, even if, from the inside, everything feels right.

On the object level:

Carbohydrates can be: fruit/whole grains/normal bread/normal pasta or North American crazy industrial snacks and "sugar-cereals" and "sugar-bread". The first one is good, the second one is bad.
No idea about optimal salt levels, just a note on your language: "potentially deadly" is too vague to have a useful discussi

Language Models are Few-Shot Learners

How I Learn From Textbooks

Applied Linear Algebra Lecture Series

"""
That's when I discovered more effective ways to approach reading, including what I'll call "Guess-and-Check," the technique of scanning and making predictions. Instead of trying to read every word in a textbook, in Guess-and-Check you scan the material and make predictions about what you think the text is saying. This active reading process can help you better engage with the material and activate your prior knowledge. After making your prediction, be sure to confirm or correct it by checking it against the text.

"""
This is similar to the way GPT-3 was trained! Pretty cool that you also found it effective!

Applied Linear Algebra Lecture Series

Yes, the tone of my comment could be improved. I appreciate him for publishing his lessons to the community and wanted to give some suggestions to improve (possible) future ones, if he feels like the higher quality is worth the higher effort, and with no obligation. "Al caval donato non si guarda in bocca" (You should not look at the teeth of a gift horse (to learn about its age))

Caridorc Tergilti2y40

Some suggestions:

Use a better marker, what you wrote on the whiteboard is almost unreadable for me.
Expanding on the previous point, write bigger and make better use of the space on the board.
If you have complex graphics, pre-make them accurately and print them out, then put them on the whiteboard with weak tape. (What is the weird kind of bridge at the start?)
Use Whisper to make subtitles to help non-native speakers (as another commenter suggested).
Invest in a tripod to have the camera at a natural height instead of bottom to top.
I did not watch the lectures, this f

...

6TW1232y

I don't think these are necessarily bad suggestions if there were a future series. But my sense is that John did this for the people in the audience, somebody asked him to record it so he did, and now he's putting them online in case they're useful to anyone. It's very hard to make good production quality lectures, and it would have required more effort. But it sounds like John knew this and decided he would rather spend his time elsewhere, which is completely his choice to make. As written, these suggestions feel a bit pushy to me.

Why Aren't There More Schelling Holidays?

Caridorc Tergilti2y30

In Italy we have this: in the Ferragosto week (around the 15th of August) a huge percentage of people are on vacation. In general a lot of people take vacations in August, and schools of all levels (including universities) are closed the whole month.

In the simplest possible way to participate, yes, but a hackathon is made to elicit imaginative and novel ways to approach the problem (how? I do not know, it is the participants' job to find out).

Caridorc Tergilti2y40

Yes of course:

Models:

https://paperswithcode.com/task/image-captioning

Datasets:

LAION 400 million (or other sizes): https://laion.ai/blog/laion-400-open-dataset/
https://paperswithcode.com/dataset/coco-captions
ImageNet or any image classification dataset: just treat the labels as text. This should be used sparingly, as otherwise the model will learn to just output single words.

Also in the performance metric, the sum of the performance of each layer should probably be weighted to give less importance to the initial layers, otherwise we encourage t...

Probably even if not completely by hand, MNIST is so simple that hybrid human-machine optimization could be possible, maybe with a UI where you can see the effect on validation loss in (almost) real time of changing a particular weight with a slider. I do not know if it would be possible to improve the final score by changing the weights one by one. Or maybe the human can use instinctual vision knowledge to improve the convolutional filters.

On CIFAR this looks very hard to do manually given that the dataset is much harder than MNIST.

I think that a too larg...

The performance can be a weighted average of the final performance and how uniformly we go from totally random to correct. For example if we have 10 refinement models the optimal score in this category can be had when each refinement block reduces the distance from the initial text encoding random vector to the final one by 10% of the original distance each time. This should make sure that the process is in fact gradual, and not that for example, the last two layers do all the work and everything before is just the identity. Also maybe it should not be a linear scale but a logarithmic scale because the final refinements might be harder to do than the initial ones.
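One possible way to score the "gradualness" part is sketched below. The exact formula is an assumption, not something fixed in the proposal: it measures how far each intermediate vector's distance to the final vector deviates from a straight-line schedule, and returns 1.0 when every block removes an equal share of the original distance. The function names and toy vectors are made up, and the sketch assumes the initial and final vectors differ.

```python
import math

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gradualness(states):
    """states[0] is the initial vector, states[-1] the final one.

    Returns 1.0 for a perfectly uniform approach to the final vector,
    and less when some blocks do most of the work.
    """
    n = len(states) - 1
    d0 = distance(states[0], states[-1])  # assumed to be non-zero
    deviation = 0.0
    for k, s in enumerate(states):
        ideal = 1.0 - k / n  # remaining fraction on a linear schedule
        deviation += abs(distance(s, states[-1]) / d0 - ideal)
    return 1.0 - deviation / (n + 1)

# Toy trajectories (hypothetical): uniform steps vs. the last block doing everything.
uniform = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
lazy = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [4.0, 0.0]]
print(gradualness(uniform), gradualness(lazy))
```

A logarithmic schedule, as suggested above, would only change the `ideal` line.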

1Charbel-Raphaël2y

Okay, this kind of metric could maybe work. The metric could be the sum of the performance of each layer + a regularization function of the size of the text proportional to the index of the layer. I'm not super familiar with those kinds of image-to-text models. Can you provide an example of a dataset or a GitHub model doing this?

Cool GitHub repository, thanks for the link.

Answer by Caridorc TergiltiSep 05, 202210

Image to text model with successive refinements:

For example, given the image above, the "first" layer of the network outputs: "city", the second one outputs "city with vegetation", the third one "view of a city with vegetation from a balcony", the fourth one "view of a city with skyscrapers on the background and with vegetation from a balcony".

This could be done by starting with a blank description and repeating many times a "detailer" network that should add details to a description given an image.

This should help interpretability and t...

1Charbel-Raphaël2y

Thank you. This is a good project idea, but there is no clear metric of performance, so it's not a good hackathon idea.

Supposing Europe is headed for a serious energy crisis this winter, what can/should one do as an individual to prepare?

Answer by Caridorc TergiltiSep 04, 202220

A very simple task, like MNIST or CIFAR classification, but the final score is:

$\text{Score} = \text{Performance} + λ \cdot \text{number of nonzero weights}$

where $λ$ is a normalization factor that is chosen to make the tradeoff as interesting as possible. This should be correlated to AI safety as a small and/or very sparse model is much more interpretable and thus safer than a large/dense one. You can work on this for a very long time, trying simple fully connected neural nets, CNNs, resnets, transformers, autoencoders of any kind and so on. If the task looks too easy you might...
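A minimal sketch of how such a score could be computed, with weights given as plain nested lists. The function names, the toy weights and the value of `lam` are made up for illustration. One assumption worth flagging: if performance is an accuracy (higher is better), `lam` has to be negative for extra weights to lower the score; if performance is a loss to minimise, it would be positive.

```python
def count_nonzero(weights, tol=0.0):
    """Number of weights whose magnitude exceeds tol, across all layers."""
    return sum(1 for layer in weights for w in layer if abs(w) > tol)

def score(performance, weights, lam):
    """Score = Performance + lam * number of nonzero weights."""
    return performance + lam * count_nonzero(weights)

# Toy two-layer model (hypothetical values): three of six weights are nonzero.
weights = [[0.5, 0.0, -1.2], [0.0, 0.0, 3.0]]
print(score(0.99, weights, lam=-0.01))
```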

1ThomasJ2y

Is this like "have the hackathon participants do manual neural architecture search and train with L1 loss"?

1Charbel-Raphaël2y

I'm still thinking about this idea. We could try to do the same thing but on CIFAR-10. I do not know if it would be possible to construct the layers by hand. On MNIST, for a network (LeNet, 60k parameters) with 99 percent accuracy, the cross-entropy is 0.05. If we take the formula CE + lambda * log(number of non-null params), a good lambda is equal to 100 (equalizing cross-entropy and regularization). In the MNIST minimal-number-of-weights competition, we have 99 percent accuracy with 2000 weights, so lambda is equal to 80. Maybe if we want to stress the importance of sparsity, we can choose a lambda equal to 300.

1Charbel-Raphaël2y

I like the idea. But this already exists: https://github.com/ruslangrimov/mnist-minimal-model

Caridorc Tergilti2y60

Even lower tech: buy very warm and comfy sweaters to wear at home and while sleeping, it could save you a ton of money considering the astronomical power bills that might arrive this winter. Cashmere is really good but expensive.

Conjuring An Evolution To Serve You

Extremely cool evolution experiment where E. coli bacteria evolve to eat citrate along with many other interesting happenings.

Yes, I meant plummeting "within reason" (like x10) not plummeting to extremely low values that, as you correctly said, are not possible given the energy cost.

I am not really sure about that. There is not only a huge money cost but also a huge energy cost when sending something into orbit: would the panels even make back the fuel spent to send them? Even if the rocket hardware is reused 100% with no serious maintenance costs (reusing costs more fuel), would the panels even make back that fuel energy alone? I did not do the math, but maybe not even that. If we could put them in orbit with a space elevator almost for free, the tune would be way different though.

2JBlack3y

Oh yes, there is no question at all that they would make back the fuel energy cost. In money terms the fuel is a tiny fraction of launch costs (less than 1%). In fuel energy terms it costs about 400 MJ/kg to get payload into orbit via Falcon-9. With fairly standard terrestrial designs you can get about 5 W/kg rated power (mass including support electronics), which in space would be available nearly continuously. That gives an energy payback time of about 2.5 years. With solar power designs more suited to space use, I would be very surprised if that couldn't be reduced to weeks.
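The payback figure follows directly from the two numbers quoted above. A quick check, assuming continuous output at rated power and ignoring transmission losses (the variable names are just for illustration):

```python
LAUNCH_ENERGY_J_PER_KG = 400e6  # about 400 MJ/kg, as quoted above
RATED_POWER_W_PER_KG = 5.0      # about 5 W/kg, as quoted above
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Time for 1 kg of panel to generate the energy spent launching 1 kg.
payback_seconds = LAUNCH_ENERGY_J_PER_KG / RATED_POWER_W_PER_KG
payback_years = payback_seconds / SECONDS_PER_YEAR
print(round(payback_years, 1))  # about 2.5
```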

Very good point: I think the website I linked to refers to peak power, so the kilowatt-hours would be lower (not sure about this, sorry).
If the panels in orbit last twice as long and produce twice the energy, that is only a factor of 4, while the system is about 300 times more expensive (but again, there are transmission losses that I did not consider).

Answer by Caridorc TergiltiMay 14, 202210

SpaceX's Falcon 9 now advertises a cost of $62 million to launch 22,800 kg to LEO, $2,720/kg. https://ttu-ir.tdl.org/bitstream/handle/2346/74082/ICES_2018_81.pdf

Given an average solar silicon price of around $9 US per kilogram in 2020 https://www.solarquotes.com.au/blog/solar-silicon-price-hike/#:~:text=Compared%20to%20the%20average%20solar,%2434%20Australian%20dollars%20per%20panel.

This would increase costs 2720 / 9 = 302 times.
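The arithmetic behind these figures, using only the numbers quoted above (the variable names are just for illustration; the exact quotient is slightly under the rounded $2,720/kg):

```python
launch_cost_usd = 62e6   # advertised Falcon 9 launch price quoted above
payload_kg = 22800       # payload to LEO quoted above
silicon_usd_per_kg = 9   # 2020 solar silicon price quoted above

launch_usd_per_kg = launch_cost_usd / payload_kg
cost_ratio = launch_usd_per_kg / silicon_usd_per_kg
print(round(launch_usd_per_kg), round(cost_ratio))  # about 2719 and 302
```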

The cost of a solar electric system is measured in dollars per watt. The average cost for a residential system is cur...

1mikbp2y

Just to add, this thread of Phil Metzger argues otherwise: https://twitter.com/DrPhiltill/status/1583106346538311680

2JBlack3y

This is probably the worst-case comparison for space solar, since it assumes you're just going to pack a bunch of terrestrial systems onto a rocket and shoot them into space, where they will (just like terrestrial systems) only work at a fraction of capacity due to clouds, bad sun angles, getting dirty, and night-time. In practice they would provide a lot more power per unit mass by at least one order of magnitude and possibly two. Mirrors in space can be relatively flimsy thin things and still work since they don't need to withstand winds and other loads, giving relatively lightweight concentrated solar power options at much lower masses than terrestrial systems. The conclusion is the same though: space launched solar is still not worth it for us now. It could be in the future or with some alternative history.

1[anonymous]3y

no, it will never work, even if the cost of sending a kg to orbit plummets. A solar electric system on earth doesn't make 1 watt all the time. Obviously there is night, and there are geographic differences. A quick and dirty approximation is here: https://unboundsolar.com/solar-information/sun-hours-us-map . The idea is "sun hours". Let's take the median "sun hours" of 4. This means you get a rated solar panel's full output just 1/6 of the time. Neglecting the microwave transmission system's cost and other costs, if the cost of sending a kg to orbit is less than 5/6 the cost of a panel on earth plus storage, it could work. Not "never". I concede it's unlikely: sending a kilogram to orbit has immense energy costs, and so even advanced technology will hit a limit on how cheap it can be. Space-based solar probably would only make sense if you had a society so in need of energy that you had exhausted your options on earth already, with entire continents covered in panels, and you still needed more energy. You also have an issue that at that point you are importing more heat to earth than it can radiate to space under normal climate conditions, so it probably wouldn't be a good idea to do this.

2casualphysicsenjoyer3y

Sorry, I might be missing something here, but:
* Isn't the price of energy typically measured per kilowatt-hour? Energy = Power x Time.
* If a space solar system can output more energy since it stays on for longer, wouldn't this mean that the cost per watt-hour would naturally decrease? This would be because the price of a watt-hour, I imagine, would be price / energy. So, if our launch cost is a fixed cost, then we would find that price / E decreases as E grows.

DALL·E 2 by OpenAI

Book Review: Spark! How exercise will improve the performance of your brain

Thanks, added the comment in the correct place now.

DALL·E 2 by OpenAI

Caridorc Tergilti3y40

Let's have fun with recursion!

A checkerboard where each square is itself a checkerboard.

A cube with mirrors on both sides, the mirrors show multiple reflections of the cube.

A person wearing a shirt with an image of that person wearing that shirt.

DALL·E 2 by OpenAI

Caridorc Tergilti3y00

Let's have fun with recursion!

A checkerboard where each square is itself a checkerboard.

A cube with mirrors on both sides, the mirrors show multiple reflections of the cube.

A person wearing a shirt with an image of that person wearing that shirt.

3P.3y

You replied to the wrong comment.

Omicron Post #4

Caridorc Tergilti3y20

Immune erosion is used to make people understand that immune escape is only partial and not total

Caridorc Tergilti3y*10

Thanks for your feedback. Indeed, correlation is not causation and we must be very careful about self-selection effects. This is not a self-selection effect, but it is still a correlation/causation enigma that I have found interesting recently: high vitamin D levels were found to be heavily anticorrelated with severe COVID in observational studies, but older people are both more susceptible to severe COVID and have lower vitamin D than average. Not only that, but people with a healthy lifestyle of many outdoor walks also have higher vitamin D! Is this causation, co...

Book Review Review (end of the bounty program)

Caridorc Tergilti3y110

Hi, I wrote the book review on Spark as my first post on LessWrong. Sadly I received no comments in response to it and I would love some feedback after spending so much time writing it. I am open to any kind of feedback about it. I really enjoyed this bounty program and I will probably participate in future ones too.

8ryan_b3y

It is on my list of the reviews to read, so never fear! Feedback will be available.

Jitters No Evidence of Stupidity in RL

Caridorc Tergilti3y20

It might discourage exploration and lead to more stasis in local optima.

Jitters No Evidence of Stupidity in RL

Caridorc Tergilti3y20

Half as long right?

1wangscarpet3y

Thanks, fixed.

Exercise Trade Offs

Caridorc Tergilti3y-40

Given the extreme infectiousness of the Delta strain it is reasonable to assume that everyone will come into contact with it (either with or without illness if the vaccine works properly). If so, if you are already 14+ days from double vaccination, the timing of coming into contact with it now or in three months time is almost irrelevant (if there is space in the hospitals, otherwise it is in fact reasonable to exercise outside a month or two).

-1MichaelStJules3y

I don't think it's inevitable that everyone will come into contact with COVID or definitely catch COVID (which becomes more likely the more often you come into contact with it). You can still manage your exposure. Furthermore, you can catch COVID multiple times.

Could you have stopped Chernobyl?

Points 3 and 4 can only sound excessive

Where is point 4?

2Carlos Ramirez3y

As Ikaxas said. It's now fixed.

2Vaughn Papenhausen3y

Ah, I think the fact that there's an image after the first point is causing the numbered list to be numbered 1,1,2,3.

How to Sleep Better

Yes of course

How to Sleep Better

Covid 5/20: The Great Unmasking

Perception of temperature is mostly cultural, yes. What do you think about putting sound-absorbing panels on the walls of the room?

2gilch4y

Seems like it would help if noise is a problem. More expensive than earplugs, and I'm not sure how durable they are. There may be some safety issues, but it seems less bad than the earplugs.

How to Sleep Better

Caridorc Tergilti4y*00

Fascinating article, two things:

On average people prefer a room temperature of 18-22 degrees Celsius.

Are you sure? That looks really cold. My thermodynamics book said 26 C is the best temperature for human comfort.

2. What about installing sound-absorbing panels on the walls for sound insulation? It sounds more comfortable than earplugs and works both ways, so you can be noisy without troubling the neighbours.

2ChristianKl4y

What's warm depends a lot on the clothing. I think there's a good chance that the thermodynamics book speaks about 26 C for naked humans.

2Raemon4y

I happen to be looking into sound insulation right now. My understanding is that panels on the walls mostly improve the quality of sound within the room rather than actually preventing sound from getting in or out (which is something that requires good walls, and potentially plugging individual leaks such as through a door). I haven't yet gotten a clear sense of whether there's a way to increase room noise isolation without major architectural overhauls.

2gilch4y

In my area, typical room temperature is 20-22 C, so 26 seems a bit high, but this might be cultural. Humans are a tropical species and couldn't survive winter in the temperate zones without clothing, shelter, and fire. I'd call 26 C "warm", but it seems well within my long-term tolerable range. But there are significant temperature differences between day and night. I think you can go as low as 15 C before I'd call it "cold". I'd be OK without a shirt and blanket at night at 15 C. Just a bed sheet. I sleep on memory foam though.

Caridorc Tergilti4y50

Is this comment AI generated?

Gauging the conscious experience of LessWrong

Incredibly fascinating. I am the opposite: only an internal verbal monologue, no images at all. I can build an image step by step from simple lines and circles in my mind, but it is a conscious effort (i.e. I am not really seeing it, if you get what I mean; as soon as I focus on a detail the rest vanishes). Based on your anecdotal evidence I could maybe learn to imagine images that I am not really seeing with my eyes in that moment, but it feels like the opposite of what my mind "is built" to do.

Psyched out