All of Caridorc Tergilti's Comments + Replies

Did you use ghost gradients? (gradients that tend to reactivate features that are at zero)

2Robert_AIZI
Nope. I think they wouldn't make much difference - at the sparsity loss coefficient I was using, I had ~0% dead neurons (and iirc the ghost gradients only kick in if you've been dead for a while). However, it is on the list of things to try to see if it changes the results.

You should train both a feedforward network and a CNN on ImageNet image classification, to see whether the Hessian of the CNN is closer to the identity after training than that of the feedforward network, because of the CNN's image-understanding priors.

Given that this method returns a numeric matrix, it must be a Hessian evaluated at a point or the average Hessian over many points. Is the result the Hessian averaged over all training data? Is this average useful, or does it just cancel out high and low Hessian values?

1Nina Panickssery
The method described does not explicitly compute the full Hessian matrix. Instead, it derives the top eigenvalues and eigenvectors of the Hessian. The implementation accumulates a large batch from a dataloader by concatenating n_batches of the typical batch size. This is an approximation to estimate the genuine loss/gradient on the complete dataset more closely. If you have a large and high-variance dataset, averaging gradients over multiple batches might be better. This is because the loss calculated from a single, accumulated batch may not be adequately representative of the entire dataset's true loss.
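One common way to get a top eigenvalue without ever forming the Hessian is power iteration on Hessian-vector products. The sketch below is not the implementation discussed above; it is a toy under stated assumptions: the loss is a hypothetical two-parameter quadratic, the Hessian-vector product is approximated with finite differences of the gradient rather than autodiff, and the names `grad_quadratic`, `hvp` and `top_eigenpair` are made up for illustration.

```python
import math

def grad_quadratic(x):
    # Gradient of the toy loss f(x) = 0.5 * (3*x0^2 + x1^2); its Hessian is diag(3, 1).
    return [3.0 * x[0], 1.0 * x[1]]

def hvp(grad_fn, x, v, eps=1e-4):
    # Hessian-vector product H @ v from central finite differences of the gradient.
    plus = grad_fn([xi + eps * vi for xi, vi in zip(x, v)])
    minus = grad_fn([xi - eps * vi for xi, vi in zip(x, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]

def top_eigenpair(grad_fn, x, n_iters=100):
    # Power iteration: repeatedly apply H to a unit vector and renormalize.
    v = [1.0] * len(x)
    norm = math.sqrt(sum(vi * vi for vi in v))
    v = [vi / norm for vi in v]
    eigval = 0.0
    for _ in range(n_iters):
        hv = hvp(grad_fn, x, v)
        eigval = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(h * h for h in hv))
        v = [h / norm for h in hv]
    return eigval, v

eigval, vec = top_eigenpair(grad_quadratic, [0.5, -0.2])
```

Power iteration converges to the eigenvalue of largest magnitude, so on a real network the gradient function would be the (large-batch) loss gradient, and the estimate inherits whatever noise that batch has.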

You can use the "mp-net2" model from sentence transformers for zero-shot classification: take the scalar product between the embedding of the text and the embeddings of "sex" and "violence", decide on a cut-off, and you are done.
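A rough sketch of the scoring step, assuming the embeddings have already been computed elsewhere (for instance with a sentence-transformers model). The vectors below are toy 3-dimensional stand-ins, not real model outputs, the cut-off of 0.5 is arbitrary, and `flag_text` is a hypothetical helper name. It uses cosine similarity, i.e. the scalar product of length-normalized vectors.

```python
import math

def cosine(a, b):
    # Scalar product of the two vectors after normalizing each to unit length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def flag_text(text_vec, label_vecs, cutoff):
    # Score the text against every label embedding and keep labels above the cut-off.
    scores = {label: cosine(text_vec, vec) for label, vec in label_vecs.items()}
    flagged = [label for label, score in scores.items() if score >= cutoff]
    return scores, flagged

# Toy embeddings (assumption: real ones would come from the embedding model).
label_vecs = {"sex": [1.0, 0.0, 0.0], "violence": [0.0, 1.0, 0.0]}
scores, flagged = flag_text([0.1, 0.9, 0.2], label_vecs, cutoff=0.5)
```

The cut-off would have to be tuned on a handful of labeled examples, since raw similarity scores are not calibrated probabilities.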

2Johnny Lin
Thank you! I will put this on the TODO.

Thanks for the quick response. Have you tried fine-tuning the new Llama 2 models on the data gathered so far to see if there are any interesting results? QLoRA is pretty efficient for this.

It looks pretty cool! Adding a Google sign-in option would greatly broaden the reach of the game as most non-technical people do not have a Github account.

3Johnny Lin
Thank you - yes, this is on the TODO along with Apple Sign-In. It will likely not be difficult, but it's in beta experiment phase right now for feedback and fixes - we aren't yet ready for the scale of the general public.

We could not find a "speak in French" vector after about an hour of effort, but it's possible we missed something straightforward

Did you try 10 or 20 simple French phrases with a positive sign and their translations with a negative sign?

Also try 1000 English words and 1000 French translations in case scale is the problem.

Also try:

"The following text is in English: ' "

"The following text is in French: ' "

with the second phrase written itself in French.
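To make the "positive sign / negative sign" suggestion concrete, here is a minimal sketch of building a steering vector as the difference of mean activations. Everything in it is an assumption for illustration: the activations are toy 2-dimensional lists standing in for residual-stream activations captured at one layer, averaging over phrase pairs is just one way to combine them, and `steering_vector` and `add_steering` are hypothetical names.

```python
def mean_vector(vectors):
    # Component-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def steering_vector(positive_acts, negative_acts):
    # Mean activation on French phrases minus mean activation on their English translations.
    pos = mean_vector(positive_acts)
    neg = mean_vector(negative_acts)
    return [p - q for p, q in zip(pos, neg)]

def add_steering(activation, steer, coeff):
    # Add the scaled steering vector to an activation during a forward pass.
    return [a + coeff * s for a, s in zip(activation, steer)]

# Toy activations (assumption: real ones would be captured from the model).
french_acts = [[1.0, 2.0], [3.0, 4.0]]
english_acts = [[0.0, 1.0], [2.0, 1.0]]
steer = steering_vector(french_acts, english_acts)
steered = add_steering([0.0, 0.0], steer, coeff=2.0)
```

With more pairs, the averaging should wash out phrase-specific content and leave mostly the shared "language" direction, if such a direction exists at that layer.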

5faul_sname
I just tried that, and it kinda worked. Specifically, it worked to get gpt2-small to output text that structurally looks like French, but not to coherently speak French. Although I then just tried feeding the base gpt2-small a passage in French, and its completions there were also incoherent, so I think it's just that that version hasn't seen enough French to speak it very well.

Very interesting, could you elaborate or give some links?

2romeostevensit
I wrote a short post on my favorite technique, but lots of therapy modalities talk about similar ideas. http://neuroticgradientdescent.blogspot.com/2019/07/core-transformation.html

In my opinion wearable health is highly neglected because older people are less tech-savvy than young people, so they use it less, but they would also benefit much more from the technology. If a 20-year-old wears a smartwatch that measures and records heart rate, it is almost only for fun; if a 60-year-old does it, it could prevent and inform about important issues. Yet the 20-year-old is much more likely to actually use it than the 60-year-old.

5DirectedEvolution
I think that's a pretty good point, and it tracks with Steven Byrnes' insight about bedwetting alarms. Zoom costs no money, but the most it saves is time and annoyance. It might save lives occasionally, but there's potential for wearable health to save lots of lives and prevent many disabilities. The cost-benefit ratio might be better than zoom's, and yet people may neglect it excessively because it's socially weird to do things like monitor your heart rate - or, when it's available, your blood glucose - routinely using consumer electronics. As with the bedwetting alarm, we have this idea that we should only be using "interventions" like these when there's already a clear problem, rather than as a way to prevent a problem or hasten a solution, and that seems to stem from social norms ("is this really such an emergency?") rather than a rational judgment about costs and benefits. That said, one of my criteria was "Had an immediate payoff," and I think that neither the bedwetting alarm nor wearable health typically has an immediate payoff (unless you were replacing an existing invasive glucose monitor with an Apple Watch noninvasive monitor, once that tech becomes available). With zoom, all people were missing was the suggestion "why don't we have this meeting on zoom" and the perception that "if we do, it will be seen as normal by all participants." With wearable health, you have the added component of "I'm not even sure all this fuss and self-monitoring will even pay off in the long run in terms of better health outcomes, but I have to pay the money and attention costs right now." The delayed and uncertain cost-benefit analysis in individual cases is the reason that wearable health doesn't meet my higher "stringency bar" for being comparable to zoom, even though I agree with you that there are probably a lot of users who'd benefit from it and who are neglecting it primarily for the reason that it's not normalized.

I also asked ChatGPT, here are the six best ideas that it had (excluding electric bikes, as it was already my idea ;P) (cherry picked by me over 21):

 

Online education: Online education platforms like Coursera and Khan Academy were mature, widely available, intuitive, and cost no money for basic usage. They also had no regulatory barriers or moral issues and could be used by mutual agreement among one or a few people. Online education also saved a lot of time and played relatively well with the existing format of learning and education.

Digital wallets:

...
6DirectedEvolution
This is a nice use case for ChatGPT! In most of these cases, I think that where they don't quite meet my criteria is in terms of the cost-benefit issue or the neglectedness part. Online education is pretty widely used by individuals, exactly as we would hope. It's neglected as a way to signal educational attainment, but that's a problem that can only be handled at the level of corporate or university governance by recognizing Coursera certificates on CVs or building a university around online offerings. Digital wallets seem to have taken off pretty much in step with awareness and size of the user base. Wearable health and home energy management systems don't seem neglected and they face cost-benefit questions. Collaborative writing and editing are already widely used, as are online language learning platforms. I'll throw in another $1 for creative brainstorming for a total of $4 awarded and $6 to go, but I want to save the rest for ideas more stringently meeting my criteria if any can be found.

Electric bikes are vastly under-utilized even in European cities where they are safe and effective to use:

  • Mature: bike more than 100 years old, electric motors and batteries also mature.
  • Cost no money: saves a ton of money over a car
  • Was widely available and fairly intuitive for the average person to use: everyone can bike
  • Had no regulatory barriers or moral issues: clearly not illegal nor immoral to ride a bike.
  • Saved a lot of money and time: saves also time because there is no need for separate exercise.
  • Had an immediate payoff: you gain from day 1
  • Played rela
...
1Noah L.
I believe this in some domains applies to micro-mobility in general.  

I think electric bikes are a pretty good candidate! I own one and it was transformative for biking around Seattle.

  • The ability to trivially climb a hill to get a block away from the main arterial and bike on a little-trafficked road was a huge safety enhancement
  • It deals with even Seattle's huge hills with ease
  • You can go 20 mph, which is often faster than cars, especially during rush hour
  • They are no riskier than a regular bike, and given my point about getting off busy roads, they can even be safer if used well

On reflection, I think my reason for thinking th...

I tried both and neither works

1Viktor Rehnberg
Hmm, yeah it's a bit hard to try stuff when there's no good preview. Usually I'd recommend a rot13 cipher if all else fails, but for number sequences that makes less sense.

Here is my playthrough with my thought process:

:::spoiler

>!0) [2, 4, 6] is VALID

>!Now I think, let's check if the rule is *2
>!1) [31, 62, 93] is VALID

>!Let's check if the rule is always true with 3 random numbers.
>!2) [6534525, 142536, 456342532] is NOT VALID

>!I wanted to check multiply by 3, but I repeated multiply by 2
>!3) [5, 10, 15] is VALID

>!Checking multiply by 3
>!4) [7, 21, 63] is VALID

>!Checking multiply by 10
>!5) [50, 500, 5000] is VALID

>!Now I am thinking: maybe any multiplication is ok? I cannot try them all, l...

1Viktor Rehnberg
See FAQ for spoiler tags, it seems mods haven't seen your request. https://www.lesswrong.com/faq#How_do_I_insert_spoiler_protections_
  1. CCS does not find the single linear probe with high accuracy: there are more than 20 orthogonal linear probes (i.e. using completely different information) that have similar accuracies as the linear probe found by CCS (for most datasets);


So what about an ensemble of the top 20 linear probes? Is it substantially better than using just the best one alone? I would expect so given that they are orthogonal, so they are using ~uncorrelated information.

3Fabien Roger
I think that a (linear) ensemble of linear probes (trained with Logistic Regression) should never be better than a single linear probe (otherwise the optimizer would have just found this combined linear probe instead). Therefore, I don't expect that ensembling 20 linear CCS probes will increase performance much (and especially not beyond the performance of supervised linear regression). Feel free to run the experiment if you're interested in it!
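The core of this argument can be checked in a few lines: averaging the logits of several linear probes gives exactly the logit of one linear probe whose weights and bias are the averages. The probes and input below are made-up toy numbers, and the function names are hypothetical.

```python
def probe_logit(w, b, x):
    # Logit of a single linear probe: w . x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def average_probes(probes):
    # Collapse a list of (weights, bias) probes into one probe with averaged parameters.
    n = len(probes)
    dim = len(probes[0][0])
    w_avg = [sum(w[i] for w, _ in probes) / n for i in range(dim)]
    b_avg = sum(b for _, b in probes) / n
    return w_avg, b_avg

# Toy probes and input (assumption: real probes would come from CCS or logistic regression).
probes = [([1.0, 0.0], 0.5), ([0.0, 2.0], -0.5), ([1.0, 1.0], 0.0)]
x = [0.3, -0.7]

ensemble_logit = sum(probe_logit(w, b, x) for w, b in probes) / len(probes)
w_avg, b_avg = average_probes(probes)
single_logit = probe_logit(w_avg, b_avg, x)
```

Note that this equivalence holds for a linear combination of logits. Averaging the probabilities after the sigmoid, or taking a majority vote, is no longer a single linear probe, so those variants are not covered by the argument.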

The most important thing is approaching other points of view with an open mind and with epistemic humility, that is, knowing that some of what you think can be wrong, even if, from the inside, everything feels right.

 

On the object level:

  • Carbohydrates can be: fruit/whole grains/normal bread/normal pasta or North American crazy industrial snacks and "sugar-cereals" and "sugar-bread". The first one is good, the second one is bad.
  • No idea about optimal salt levels, just a note on your language: "potentially deadly" is too vague to have a useful discussi
...

"""
That's when I discovered more effective ways to approach reading, including what I'll call "Guess-and-Check," the technique of scanning and making predictions. Instead of trying to read every word in a textbook, in Guess-and-Check you scan the material and make predictions about what you think the text is saying. This active reading process can help you better engage with the material and activate your prior knowledge. After making your prediction, be sure to confirm or correct it by checking it against the text.

"""
This is similar to the way GPT-3 was trained! Pretty cool that you also found it effective!

Yes, the tone of my comment could be improved. I appreciate him for publishing his lessons to the community and wanted to give some suggestions to improve (eventual) future ones, if he feels like the higher quality is worth the higher effort, and with no obligation. "Al caval donato non si guarda in bocca" (You should not look at the teeth of a gift horse (to learn about its age))

Some suggestions:

  • Use a better marker, what you wrote on the whiteboard is almost unreadable for me.
  • Expanding on the previous point, write bigger and make better use of the space on the board.
  • If you have complex graphics, pre-make them accurately and print them out, then put them on the whiteboard with weak tape. (What is the weird kind of bridge at the start?)
  • Use Whisper to make subtitles to help non-native speakers (as another commenter suggested).
  • Invest in a tripod to have the camera at a natural height instead of bottom to top.
  • I did not watch the lectures, this f
...
6TW123
I don't think these are necessarily bad suggestions if there were a future series. But my sense is that John did this for the people in the audience, somebody asked him to record it so he did, and now he's putting them online in case they're useful to anyone. It's very hard to make good production quality lectures, and it would have required more effort. But it sounds like John knew this and decided he would rather spend his time elsewhere, which is completely his choice to make. As written, these suggestions feel a bit pushy to me.

In Italy we have this: in the Ferragosto week (around the 15th of August) a huge percentage of people are on vacation. In general a lot of people take vacations in August, and schools at all levels (including university) are closed the whole month.

In the simplest possible way to participate, yes, but a hackathon is made to elicit imaginative and novel ways to approach the problem (how? I do not know, it is the participants' job to find out).

Yes of course:

Models:

Datasets:

 

Also in the performance metric, the sum of the performance of each layer should probably be weighted to give less importance to the initial layers, otherwise we encourage t...

Probably even if not completely by hand, MNIST is so simple that hybrid human-machine optimization could be possible, maybe with a UI where you can see the effect on validation loss in (almost) real time of changing a particular weight with a slider. I do not know if it would be possible to improve the final score by changing the weights one by one. Or maybe the human can use instinctual vision knowledge to improve the convolutional filters.

On CIFAR this looks very hard to do manually given that the dataset is much harder than MNIST.

I think that a too larg...

The performance can be a weighted average of the final performance and how uniformly we go from totally random to correct. For example if we have 10 refinement models the optimal score in this category can be had when each refinement block reduces the distance from the initial text encoding random vector to the final one by 10% of the original distance each time. This should make sure that the process is in fact gradual, and not that for example, the last two layers do all the work and everything before is just the identity. Also maybe it should not be a linear scale but a logarithmic scale because the final refinements might be harder to do than the initial ones.
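One possible way to turn this into a number, as a sketch only: measure the distance to the final encoding after each refinement block, compare the normalized distances with the ideal straight-line schedule, and mix that "gradualness" with the final performance. The linear schedule, the mean-absolute-deviation penalty, the 0.5 weight and the function names are all assumptions, not something specified above.

```python
def gradualness(distances):
    # distances[k] = distance to the final encoding after k refinement blocks,
    # so distances[0] is the starting distance and distances[-1] should be near 0.
    n = len(distances) - 1
    d0 = distances[0]
    deviation = sum(abs(d / d0 - (1 - k / n)) for k, d in enumerate(distances)) / (n + 1)
    # 1.0 means every block removes exactly 1/n of the original distance.
    return 1.0 - deviation

def combined_score(final_performance, distances, weight=0.5):
    # Weighted average of final performance and gradualness.
    return weight * final_performance + (1 - weight) * gradualness(distances)

# Ten blocks, each removing 10% of the original distance.
ideal = [10.0 - k for k in range(11)]
# Ten blocks where only the last two do any work.
lazy = [10.0] * 9 + [5.0, 0.0]

ideal_score = gradualness(ideal)
lazy_score = gradualness(lazy)
```

A logarithmic version would compare against a geometric schedule instead of the linear one, which rewards larger early steps and smaller late ones.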

1Charbel-Raphaël
Okay, this kind of metric could maybe work. The metric could be the sum of the performance of each layer + a regularization function of the size of the text, proportional to the index of the layer. I'm not super familiar with those kinds of image-to-text models. Can you provide an example of a dataset or a GitHub model doing this?

Cool GitHub repository, thanks for the link.

Image to text model with successive refinements:

 

For example, given the image above, the "first" layer of the network outputs: "city", the second one outputs "city with vegetation", the third one "view of a city with vegetation from a balcony", the fourth one "view of a city with skyscrapers on the background and with vegetation from a balcony".

 

This could be done by starting with a blank description and repeating many times a "detailer" network that should add details to a description given an image.

 

This should help interpretability and t...

1Charbel-Raphaël
Thank you. This is a good project idea, but there is no clear metric of performance, so it's not a good hackathon idea.

A very simple task, like MNIST or CIFAR classification, but the final score is:

 

 

where "" is a normalization factor that is chosen to make the tradeoff as interesting as possible. This should be correlated to AI safety as a small and/or very sparse model is much more interpretable and thus safer than a large/dense one. You can work on this for a very long time, trying simple fully connected neural nets, CNNs, resnets, transformers, autoencoders of any kind and so on. If the task looks too easy you might...

1ThomasJ
Is this like "have the hackathon participants do manual neural architecture search and train with L1 loss"?
1Charbel-Raphaël
I'm still thinking about this idea. We could try to do the same thing but on CIFAR-10. I do not know if it would be possible to construct the layers by hand. On MNIST, for a network (LeNet, 60k parameters) with 99 percent accuracy, the cross-entropy is 0.05. If we take the formula CE + lambda log(number of non-null params), a good lambda is equal to 100 (equalizing cross-entropy and regularization). In the MNIST minimal-number-of-weights competition, we have 99 percent accuracy with 2000 weights, so lambda is equal to 80. Maybe if we want to stress the importance of sparsity, we can choose a lambda equal to 300.
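A small sketch of the score "CE + lambda log(number of non-null params)" as written above. The comment does not say which log base is meant or how lambda is scaled, so this uses the natural log with lambda as a plain multiplier; under that assumption the balancing values it computes need not match the 100, 80 and 300 quoted above. The function names are hypothetical.

```python
import math

def penalized_score(cross_entropy, n_nonzero_params, lam):
    # Lower is better: cross-entropy plus a sparsity penalty on the parameter count.
    return cross_entropy + lam * math.log(n_nonzero_params)

def balancing_lambda(cross_entropy, n_nonzero_params):
    # The lambda for which the penalty term equals the cross-entropy term.
    return cross_entropy / math.log(n_nonzero_params)

# LeNet-sized example from the comment: 60k parameters, cross-entropy 0.05.
lam_lenet = balancing_lambda(0.05, 60_000)
score_lenet = penalized_score(0.05, 60_000, lam_lenet)
```

Whatever convention is chosen, it should be fixed before the hackathon, since the ranking of a large accurate model against a tiny slightly worse one depends entirely on lambda.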
1Charbel-Raphaël
I like the idea. But this already exists: https://github.com/ruslangrimov/mnist-minimal-model

Even lower tech: buy very warm and comfy sweaters to wear at home and while sleeping, it could save you a ton of money considering the astronomical power bills that might arrive this winter. Cashmere is really good but expensive.

Extremely cool evolution experiment where E. coli bacteria evolve to eat citrate along with many other interesting happenings.

Yes, I meant plummeting "within reason" (like x10) not plummeting to extremely low values that, as you correctly said, are not possible given the energy cost.

I am not really sure about that. There is not only a huge money cost but also a huge energy cost when sending something into orbit, would the panels even make back the fuel spent to send them? Even if the rocket hardware is reused 100% with no serious maintenance costs (reusing costs more fuel) would the panel even make back that fuel energy alone? I did not do the math but maybe not even that. If we could put them in orbit with a space elevator almost for free the tune would be way different though.

2JBlack
Oh yes, there is no question at all that they would make back the fuel energy cost. In money terms the fuel is a tiny fraction of launch costs (less than 1%). In fuel energy terms it costs about 400 MJ/kg to get payload into orbit via Falcon-9. With fairly standard terrestrial designs you can get about 5 W/kg rated power (mass including support electronics), which in space would be available nearly continuously. That gives an energy payback time of about 2.5 years. With solar power designs more suited to space use, I would be very surprised if that couldn't be reduced to weeks.
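The payback estimate is a one-line division, spelled out here with the figures quoted in the reply (400 MJ/kg launch energy, 5 W/kg continuous output). The only added assumption is a 365.25-day year, and `payback_years` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def payback_years(launch_energy_j_per_kg, power_w_per_kg):
    # Time for 1 kg of panel to generate the energy spent lifting 1 kg to orbit.
    seconds = launch_energy_j_per_kg / power_w_per_kg
    return seconds / SECONDS_PER_YEAR

years = payback_years(400e6, 5.0)  # roughly 2.5 years
```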
  • Very good point: I think the website I linked to refers to peak power, so the kilowatt-hours would be lower. (not sure on this, sorry)
  • If the panels on orbit last double the time and produce double the energy that is only a factor of 4, while the system is about 300 times more expensive. (but again you have transmission losses that I did not consider)

SpaceX's Falcon 9 now advertises a cost of $62 million to launch 22,800 kg to LEO, $2,720/kg. https://ttu-ir.tdl.org/bitstream/handle/2346/74082/ICES_2018_81.pdf 

Given an average solar silicon price of around $9 US per kilogram in 2020 https://www.solarquotes.com.au/blog/solar-silicon-price-hike/#:~:text=Compared%20to%20the%20average%20solar,%2434%20Australian%20dollars%20per%20panel.

 

This would increase costs 2720 / 9 = 302 times.
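The same arithmetic as a quick check, using only the two figures quoted above ($62 million for 22,800 kg to LEO, and $9 per kg of solar silicon). Like the comment, it compares launch cost per kilogram with silicon price per kilogram and ignores everything else; the variable names are made up.

```python
launch_price_usd = 62e6       # advertised Falcon 9 launch price
payload_kg = 22_800           # payload to LEO
silicon_usd_per_kg = 9.0      # average solar silicon price in 2020

launch_usd_per_kg = launch_price_usd / payload_kg
cost_ratio = launch_usd_per_kg / silicon_usd_per_kg  # about 300x
```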

The cost of a solar electric system is measured in dollars per watt. The average cost for a residential system is cur...

1mikbp
Just to add, this thread of Phil Metzger argues otherwise: https://twitter.com/DrPhiltill/status/1583106346538311680
2JBlack
This is probably the worst-case comparison for space solar, since it assumes you're just going to pack a bunch of terrestrial systems onto a rocket and shoot them into space, where they will (just like terrestrial systems) only work at a fraction of capacity due to clouds, bad sun angles, getting dirty, and night-time. In practice they would provide a lot more power per unit mass by at least one order of magnitude and possibly two. Mirrors in space can be relatively flimsy thin things and still work since they don't need to withstand winds and other loads, giving relatively lightweight concentrated solar power options at much lower masses than terrestrial systems. The conclusion is the same though: space launched solar is still not worth it for us now. It could be in the future or with some alternative history.
1[anonymous]
no, it will never work, even if the cost of sending a kg to orbit plummets. A solar electric system on earth doesn't make 1 watt all the time. Obviously there is night, and there are geographic differences. A quick and dirty approximation is here: https://unboundsolar.com/solar-information/sun-hours-us-map . The idea of "sun hours". Let's take the median "sun hours" of 4. This means you get a rated solar panel's full output just 1/6 of the time. Neglecting the microwave transmission system's cost and other costs, if the cost of sending a kg to orbit is less than 5/6 the cost of a panel on earth plus storage, it could work. Not "never". I concede it's unlikely: sending a kilogram to orbit has immense energy costs, and so even advanced technology will hit a limit on how cheap it can be. Space-based solar probably would only make sense if you had a society so in need of energy that you had exhausted your options on earth already, with entire continents covered in panels, and you still needed more energy. You also have an issue that at that point you are importing more heat to earth than it can radiate to space under normal climate conditions, so it probably wouldn't be a good idea to do this.
2casualphysicsenjoyer
Sorry, I might be missing something here, but:
  • Isn't the price of energy typically measured per kilowatt-hour? Energy = Power x Time.
  • If a space solar system can output more energy since it stays on for longer, wouldn't this mean that the cost per watt-hour would naturally decrease? I imagine the price of a watt-hour would be price / energy. So, if our launch cost is a fixed cost, we would find that price / E decreases.
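To illustrate the "Energy = Power x Time" point, here is a sketch of cost per kilowatt-hour as a function of how much of the time a panel delivers its rated output. The 4 "sun hours" per day come from the thread; the $1000 system cost, 1 kW rating, 25-year lifetime and the assumption that an orbital panel runs continuously at the same cost are hypothetical placeholders, not estimates.

```python
HOURS_PER_YEAR = 24 * 365

def cost_per_kwh(system_cost, rated_kw, capacity_factor, lifetime_years):
    # Energy = power x time, scaled by the fraction of time at rated output.
    energy_kwh = rated_kw * capacity_factor * HOURS_PER_YEAR * lifetime_years
    return system_cost / energy_kwh

# Ground panel: 4 sun hours out of 24, so a capacity factor of 1/6.
ground = cost_per_kwh(1000.0, 1.0, 4 / 24, 25)
# Same hypothetical cost, but producing continuously.
orbit_same_cost = cost_per_kwh(1000.0, 1.0, 1.0, 25)
```

At equal system cost the continuous panel is six times cheaper per kilowatt-hour, so the orbital system breaks even only if its total cost, launch included, stays under about six times the ground system's cost, before transmission losses and storage are considered.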

Thanks, added the comment in the correct place now.

Let's have fun with recursion!

A checkerboard where each square is itself a checkerboard.

A cube with mirrors on both sides, the mirrors show multiple reflections of the cube.

A person wearing a shirt with an image of that person wearing that shirt.

3P.
You replied to the wrong comment.

Immune erosion is used to make people understand that immune escape is only partial and not total

Thanks for your feedback. Indeed, correlation is not causation, and we must be very careful about self-selection effects. This is not a self-selection effect, but it is a correlation/causation enigma that I found interesting recently: high vitamin D levels were found to be heavily anticorrelated with severe COVID in observational studies, but older people are both more susceptible to severe COVID and have lower vitamin D than average. Not only that, but people with a healthy lifestyle of many outdoor walks also have higher vitamin D! Is this causation, co...

Hi, I wrote the book review of Spark as my first post on LessWrong. Sadly I received no comments in response to it, and I would love some feedback after spending so much time writing it. I am open to any kind of feedback about it. I really enjoyed this bounty program and I will probably participate in future ones too.

8ryan_b
It is on my list of the reviews to read, so never fear! Feedback will be available.

It might discourage exploration and lead to more stasis in local optima.

1wangscarpet
Thanks, fixed.

Given the extreme infectiousness of the Delta strain it is reasonable to assume that everyone will come into contact with it (either with or without illness if the vaccine works properly). If so, if you are already 14+ days from double vaccination, the timing of coming into contact with it now or in three months time is almost irrelevant (if there is space in the hospitals, otherwise it is in fact reasonable to exercise outside a month or two).

-1MichaelStJules
I don't think it's inevitable that everyone will come into contact with COVID or definitely catch COVID (which becomes more likely the more often you come into contact with it). You can still manage your exposure. Furthermore, you can catch COVID multiple times.

Points 3 and 4 can only sound excessive

Where is point 4?

2Carlos Ramirez
As Ikaxas said. It's now fixed. 
2Vaughn Papenhausen
Ah, I think the fact that there's an image after the first point is causing the numbered list to be numbered 1,1,2,3.

Perception of temperature is mostly cultural, yes. What do you think about putting sound-absorbing panels on the walls of the room?

2gilch
Seems like it would help if noise is a problem. More expensive than earplugs, and I'm not sure how durable they are. There may be some safety issues, but it seems less bad than the earplugs.

Fascinating article, two things:

  1. On average people prefer a room temperature of 18-22 degrees Celsius.

Are you sure? That looks really cold. My thermodynamics book said 26 C is the best temperature for human comfort.

2. What about installing sound-absorbing panels on the walls for sound insulation? That sounds more comfortable than earplugs and works both ways, so you can be noisy without troubling the neighbours.

2ChristianKl
What's warm depends a lot on the clothing. I think there's a good chance that the thermodynamics book speaks about 26 C for naked humans.
2Raemon
I happen to be looking into sound insulation right now. My understanding is that panels on the walls mostly improve the quality of sound within the room rather than actually prevent sound from getting in or out (which is something that requires good walls, and potentially plugging individual leaks such as through a door). I haven't yet gotten a clear sense of whether there's a way to increase room noise isolation without major architectural overhauls.
2gilch
In my area, typical room temperature is 20-22 C, so 26 seems a bit high, but this might be cultural. Humans are a tropical species and couldn't survive winter in the temperate zones without clothing, shelter, and fire. I'd call 26 C "warm", but it seems well within my long-term tolerable range. But there are significant temperature differences between day and night. I think you can go as low as 15 C before I'd call it "cold". I'd be OK without a shirt and blanket at night at 15 C. Just a bed sheet. I sleep on memory foam though.

Incredibly fascinating. I am the opposite: only an internal verbal monologue, no images at all. I can build an image step by step from simple lines and circles in my mind, but it is a conscious effort (i.e. I am not really seeing it, if you get what I mean; as soon as I focus on a detail the rest vanishes). Based on your anecdotal evidence I could maybe learn to imagine images that I am not really seeing with my eyes in that moment, but it feels like the opposite of what my mind "is built" to do.

I am quite young; I am in fact in university studying artificial intelligence, and this website did play a bit of a role in me choosing this field of study (I was already quite drawn to it before). I think this site has too extreme a view of AI risk, but it is important to read opinions different from mine. This site is mostly quite interesting, if not at the level of Astral Codex Ten.

I have no concrete advice for you but Good Luck! I wish you all the best.

0Josh Smith-Brennan
Thanks. I checked your comments and I'd guess you were a programmer with an interest in investments and financial markets. Not areas I'm really familiar with, but I can see the draw of rational thinking within them. What do you think of the site? Just curious.