All of bilalchughtai's Comments + Replies

As a general rule, I try and minimise my phone screen time and maximise my laptop screen time. I can do every "productive" task faster on a laptop than on my phone.

Here are some object-level things I do that I find helpful and that I haven't yet seen discussed.

  • Use a very minimalist app launcher on my phone that makes searching for apps a conscious decision.
  • Use a greyscale filter on my phone (which is hard to turn off), as this makes doing most things on my phone harder.
  • Every time I get a notification I didn't need to get, I instantly disable it. This also generalizes to unsubscribing from emails I don't need to receive.
2Nathan Helm-Burger
Problem #2: Now I have to go searching for a way to rate-limit the api calls sent by evalugator. Can't just slam GoodFire's poor little under-provisioned API with as many hits per minute as I want!

Error code: 429 - {'error': 'Rate limit exceeded: 100 requests per minute'}

Update: searched evalugator for 'backoff' and found a use of the backoff lib. Added this to my goodfire_provider implementation:

import backoff

...

def on_backoff(details):
    if int(details.get("tries", 0)) % 3 == 0:
        print(f"Backing off {details['wait']:0.1f} seconds after {details['tries']}. Reason: {details['exception']}")

...

@backoff.on_exception(
    wait_gen=backoff.expo,
    exception=(
        openai.RateLimitError,
        openai.APIError,
    ),
    max_value=60,
    factor=1.5,
    on_backoff=on_backoff,
)
def execute(model_id: str, request: GetTextResponse):
    ...
3Nathan Helm-Burger
Update: Solved it! It was incompatibility between goodfire's client and evalugator. Something to do with the way goodfire's client was handling async. Solution: goodfire is compatible with openai sdk, so I switched to that.

Leaving the trail of my bug hunt journey in case it's helpful to others who pass this way.

Things done:

1. Followed Jan's advice, and made sure that I would return just a plain string in GetTextResponse(model_id=model_id, request=request, txt=response, raw_responses=[], context=None) [important for later, I'm sure! But the failure is occurring before that point, as confirmed with print statements.]
2. Tried without the global variables, just in case (global variables in python are always suspect, even though pretty standard to use in the specific case of instantiating an api client which is going to be used a bunch). This didn't change the error message so I put them back for now. Will continue trying without them after making other changes, and eventually leave them in only once everything else works. Update: global variables weren't the problem.

Trying next:

1. Looking for a way to switch back and forth between multithreading/async mode, and single-worker/no-async mode. Obviously, async is important for making a large number of api calls with long delays expected for each, but it makes debugging so much harder. I always add a flag in my scripts for turning it off for debugging mode. I'm gonna poke around to see if I can find such in your code. If not, maybe I'll add it. (Found the 'test_run' option, but this doesn't remove the async, sadly.) The error seems to be pointing at use of async in goodfire's library. Maybe this means there is some clash between async in your code and async in theirs? I will also look to see if I can turn off async in goodfire's lib. Hmmmmm. If the problem is a clash between goodfire's client and yours... I should try testing using the openai sdk with goodfire api.
2. Getting some errors in the uses of regex.
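For anyone following the same path, here is a minimal sketch of what "switch to the openai sdk" against the Goodfire API might look like. The base URL and model name are assumptions for illustration, not taken from the comment above; check Goodfire's documentation for the actual OpenAI-compatible endpoint.

import os
from openai import OpenAI

# Assumed endpoint and model name -- verify against Goodfire's docs.
client = OpenAI(
    api_key=os.getenv("GOODFIRE_API_KEY"),
    base_url="https://api.goodfire.ai/api/inference/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Say hi in five words."}],
    max_tokens=50,
)
print(response.choices[0].message.content)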

Yep, this sounds interesting! My suggestion for anyone wanting to run this experiment would be to start with SAD-mini, a subset of SAD with the five most intuitive and simple tasks. It should be fairly easy to adapt our codebase to call the Goodfire API. Feel free to reach out to me or @L Rudolf L if you want assistance or guidance.

2Nathan Helm-Burger
Got it working with:

sad run --tasks influence --models goodfire-llama-3.3-70B-i --variants plain

Problem #3: The README says to much prefer the full sad over sad-lite or sad-mini. The author of the README must feel strongly about this, because they don't seem to mention HOW to run sad-lite or sad-mini! I tried specifying sad-mini as a task, but there is no such task. Hmmm.

Ahah! It's not a task, it's a "subset".

f"The valid subsets are 'mini' (all non-model-dependent multiple-choice-only tasks) and 'lite' (everything except facts-which-llm, which requires specifying many answers for each model).\nUnknown subset asked for: {subset}"

ValueError: Unexpected argument --subset.

But... the cli doesn't seem to accept a 'subset' parameter? Hmmm. Maybe it's a 'variant'? The README doesn't make it sound like that...

variants_str = variants
variants_ = {task.name: get_variants(task, variants) for task in tasks}
print(
    f"Running tasks: {[task.__class__.__name__ for task in tasks]}\nfor models: {models}\nwith variant setting '{variants_str}'\n(total runs = {sum([len(variant_list) for variant_list in variants_.values()]) * len(models)} runs)\n\n"
)

Uh, but 'run' doesn't accept a subset param? But 'run_remaining' does? Uh.... But then, 'run_remaining' doesn't accept a 'models' param? This is confusing.

Oh, found this comment in the code:

"""This function creates a JSON file structured as a dictionary from task name to task question list. It includes only tasks that are (a) multiple-choice (since other tasks require custom grading either algorithmically, which cannot be specified in a fixed format) and (b) do not differ across models (since many SAD tasks require some model-specific generation or changing of questions and/or answers). This subset of SAD is called SAD-mini. Pass `variant="sp"` to apply the situating prompt."""

#UX_testing going rough. Am I the first? I remember trying to try your benchmark out after you had recently released
2Nathan Helm-Burger
I'm trying to implement a custom provider, as according to the SAD readme, but I'm doing something wrong.

def provides_model(model_id: str) -> bool:
    """Return true if the given model is supported by this provider."""
    return model_id == "goodfire-llama-3.3-70B-i"

def execute(model_id: str, request: GetTextResponse):
    global client
    global variant
    if client is None:
        client = goodfire.Client(api_key=os.getenv("GOODFIRE_API_KEY"))
        variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")
    all_features = []
    with open("/home/ub22/projects/inactive_projects/interpretability/goodfire/away_features.jsonl", "r") as f:
        for line in f:
            all_features.append(json.loads(line))
    feature_index = os.environ.get("GOODFIRE_FEATURE_INDEX")
    feature_strength = os.environ.get("GOODFIRE_FEATURE_STRENGTH")
    feature = goodfire.Feature.from_json(all_features[int(feature_index)])
    variant.set(feature, float(feature_strength))
    prompt = [x.__dict__ for x in request.prompt]
    # print("prompt", prompt, "\n\n\n")
    response_text = client.chat.completions.create(
        prompt,
        model=variant,
        stream=False,
        max_completion_tokens=1200,
    )  # call the model here
    response = [response_text.choices[0].message]
    # I think a list of dicts is expected?
    # [{'role': 'assistant', 'content': "I'm doing great, thanks! How about you? How can I help you today?"}]
    # print("response: ", response)
    return GetTextResponse(model_id=model_id, request=request, txt=response, raw_responses=[], context=None)

How do you know what "ideal behaviour" is after you steer or project out your feature? How would you differentiate a feature with sufficiently high cosine sim to a "true model feature" and a "true model feature"? I agree you can get some signal on whether a feature is causal, but would argue this is not ambitious enough.

Yes, that's right -- see footnote 10. We think that Transcoders and Crosscoders are directionally correct, in the sense that they leverage more of the model's functional structure via activations from several sites, but agree that their vanilla versions suffer similar problems to regular SAEs.

Also related to the idea that the best linear SAE encoder is not the transpose of the decoder.
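For concreteness, here is a minimal sketch (my own illustration, not taken from any linked post) of the untied setup: the encoder weights are a free parameter rather than the transpose of the decoder.

import torch
import torch.nn as nn

class UntiedSAE(nn.Module):
    """Vanilla SAE where W_enc is learned independently of W_dec."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_hidden) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(d_hidden, d_model) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_hidden))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x):
        # A tied SAE would use self.W_dec.T here instead of a separate W_enc.
        f = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        x_hat = f @ self.W_dec + self.b_dec
        return x_hat, f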

For another perspective on leveraging startups for improving the world see this blog post by @benkuhn.

A LW feature that I would find helpful is an easy-to-access list of all links cited by a given post.

6gwern
An example implementation of this feature is Gwern.net's "link-bibliographies" (eg). We extract all URLs from the Markdown, filter, turn them into a list with the available metadata like title/author/tags/abstract, and because we assign IDs to all links, we can also provide a reverse/backlink '↑' popup of the context in the page where the link was used (and a link might be used multiple times). Wikipedia links are included, but stuffed into a sublist at the end because they would drown out the regular links. It uses dynamic/lazy transclusion, so it doesn't cost anything if you never look at it, but if you want to print out the page or something, that should also be doable as they get loaded then by the print-mode.

We also plan to include a second version of the link-bibliography, a 'browsing history' version, which quietly logs each link you interact with in a big list at the end of the page (we'll probably put it before the full link-bibliography). So the standard full link-bibliography provides all the URLs, but the browsing-history would provide just the shortlist of URLs you interact with, in the order you interacted with them. The idea is that you could more freely move in and out of popups if you didn't have the anxiety of 'losing' them, because there's an append-only log, and after reading a page, you might skim the browsing-history and open up some of them for further reading or to jog your memory about what you were reading at one point. Since the links are in temporal order, it should be easy to reconstruct your train of thought at any point as you were reading. (You could also use it to create a sort of 'custom bibliography', where you pop up a small subset of links focused on some particular claim or thesis, and you can save that to PDF or something.) Since it's all transcludes, the browsing-history is also effectively free (you already paid the cost of downloading & rendering each entry when you popped it up…
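As a rough illustration of the extraction step gwern describes (my own Python sketch, not Gwern.net's actual implementation), the core of such a feature is just pulling every URL out of a post's Markdown and listing it with whatever metadata is available:

import re

# Matches inline Markdown links of the form [title](http...).
MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")

def link_bibliography(markdown_text: str) -> list[dict]:
    """Return a {'title', 'url'} entry for every link found in a post."""
    return [{"title": t, "url": u} for t, u in MD_LINK.findall(markdown_text)]

post = "See [backoff](https://pypi.org/project/backoff/) and [Gwern.net](https://gwern.net/)."
for entry in link_bibliography(post):
    print(entry)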

Agreed that this post presents the altruistic case.

I discuss both the money and status points in the "career capital" paragraph (though perhaps should have factored them out).

your image of a man with a huge monitor doesn't quite scream "government policymaker" to me

In fact, this mindset gave me burnout earlier this year.

I relate pretty strongly to this. I think almost all junior researchers are incentivised to 'paper grind' for longer than is correct. I do think there are pretty strong returns to having one good paper for credibility reasons; it signals that you are capable of doing AI safety research, and thus makes it easier to apply for subsequent opportunities.

Over the past 6 months I've dropped the paper grind mindset and am much happier for this. Notably, were it not for short term grants where needing to visib... (read more)

You might want to stop using the Honey extension. Here are some shady things they do, beyond the usual:

  1. Steal affiliate marketing revenue from influencers (who they also often sponsor), by replacing the genuine affiliate referral cookie with their affiliate referral cookie.
  2. Deceive customers by deliberately withholding the best coupon codes, while claiming they have found the best coupon codes on the internet; partner businesses control which coupon codes Honey shows consumers.
3habryka
Not yet :(  Still working on it.

UC Berkeley has historically had the largest concentration of people thinking about AI existential safety. It's also closely coupled to the Bay Area safety community. I think you're possibly underrating Boston universities (i.e. Harvard and Northeastern, as you say the MIT deadline has passed). There is a decent safety community there, in part due to excellent safety-focussed student groups. Toronto is also especially strong on safety imo.

Generally, I would advise thinking more about advisors with aligned interests over universities (this relates to Neel's... (read more)

Is there a way for UK taxpayers to tax-efficiently donate (e.g. via Gift Aid)?

5habryka
I am working on making that happen right now. I am pretty sure we can arrange something, but it depends a bit on getting a large enough volume to make it worth it for one of our UK friend-orgs to put in the work to do an equivalence determination.  Can you let me know how much you are thinking of giving (either here or in a DM)?

Agreed. A related thought is that we might only need to be able to interpret a single model at a particular capability level to unlock the safety benefits, as long as we can make a sufficient case that we should use that model. We don't care inherently about interpreting GPT-4, we care about there existing a GPT-4 level model that we can interpret.

Tangentially relevant: this paper by Jacob Andreas' lab shows you can get pretty far on some algorithmic tasks by just training a randomly initialized network's embedding parameters. This is in some sense the opposite to experiment 2.
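A minimal sketch of that kind of setup (my own illustration of "train only the embeddings", not the paper's exact code): start from a randomly initialised transformer and freeze every parameter except the embedding matrices.

from transformers import GPT2Config, GPT2LMHeadModel

# Randomly initialised model (no pretrained weights loaded).
model = GPT2LMHeadModel(GPT2Config())

# Freeze everything except token (wte) and positional (wpe) embeddings.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("transformer.wte", "transformer.wpe"))

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the embedding matrices receive gradient updates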

I don't think it's great for post-age-60 spending actually, as compared with a regular pension, see my reply. The comment on asset tests is useful though, thanks. Roughly, LISA assets count towards many tests, while pensions don't. More details here for those interested: https://www.moneysavingexpert.com/savings/lifetime-isas/

Couple more things I didn't explain:

  1. The LISA is a tax free investment account. There are no capital gains taxes on it. This is similar to the regular ISA (which you can put up to £20k in per year, doesn't have a 25% bonus, and can be used for anything - the £4k LISA cap contributes to this £20k). I omitted this as I was implicitly viewing using this account as the counterfactual.
  2. The LISA is often strictly worse than a workplace pension for saving for retirement, if you are employed. This is because you invest in a LISA post-(income)tax, while pension contributions are calculated pre-tax. Even if the bonus approximately makes up for tax you pay, employer contributions tip the balance towards the pension.
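A rough worked comparison behind point 2, assuming a 20% basic-rate income-tax band and ignoring National Insurance, employer matching, and withdrawal-side tax (all of which matter in practice):

gross = 100.00

# Workplace pension: contributions come out of gross pay, before income tax.
pension_pot = gross                    # 100.00, before any employer match

# LISA: you are paid net of income tax first, then the 25% bonus applies.
lisa_pot = gross * (1 - 0.20) * 1.25   # 80.00 * 1.25 = 100.00

print(pension_pot, lisa_pot)  # roughly equal; employer contributions then
                              # tip the balance towards the pension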

Should you invest in a Lifetime ISA? (UK)

The Lifetime Individual Savings Account (LISA) is a government savings scheme in the UK intended primarily to help individuals between the ages of 18 and 50 buy their first home (among a few other things). You can hold your money either as cash or in stocks and shares.

The unique selling point of the scheme is that the government will add a 25% bonus on all savings up to £4000 per year. However, this comes with several restrictions. The account is intended to only be used for the following purposes:
1) to buy your firs... (read more)

3Zac Hatfield-Dodds
Well, it seems like a no-brainer to store money you intend to spend after age 60 in such an account; for other purposes it does seem less universally useful. I'd also check the treatment of capital gains, and whether it's included in various asset tests; both can be situationally useful and included in some analogues elsewhere.
3Dagon
(not in the UK, first I'd heard of this) That 6.25% net penalty is less than the US penalty for tax-protected savings (401k), which is 10%. And the US government doesn't even kick in (some employers do, and the deferred taxation is significant over many years). The chance that you'll buy a house doesn't even need to enter into your calculations; if you include the chance that you'll live to age 60, it seems like a very good deal.
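For anyone checking Dagon's figure, the arithmetic (assuming the standard 25% LISA bonus and the 25% charge on unauthorised withdrawals) works out like this:

contribution = 100.00
with_bonus = contribution * 1.25         # government adds 25% -> 125.00
after_penalty = with_bonus * (1 - 0.25)  # 25% withdrawal charge -> 93.75
net_loss = 1 - after_penalty / contribution
print(f"{net_loss:.2%}")                 # 6.25%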