LESSWRONG
LW

cwillu
2081620
Message
Dialogue
Subscribe

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
No wikitag contributions to display.
Orthogonality Thesis
cwillu8mo10

The corresponding arbital page is now (apparently) dead.

Reply
Orthogonality Thesis
cwillu8mo10

A link appears to have broken, does anyone know what “null” was supposed to link to in “policy  null ” (note the extra spaces around “null”

Reply
AI #89: Trump Card
cwillu8mo110

There are severe issues with the measure I'm about to employ (not least is everything listed in https://www.sqlite.org/cves.html) , but the order of magnitude is still meaningful:

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=sqlite 170 records

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=postgresql 292 records (+74 postgres and maybe another 100 or so under pg; the specific spelling “postgresql” isn't used as consistently as “sqlite” and “mysql” is)

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=mysql 2026 records

Reply2
Cat Sustenance Fortification
cwillu1y10

On the first picture of the feeder, if you screw through a small piece of wood on the inside, it'll act as a washer and make it much harder for the screw to pull through the plastic if a cat gets kinetic with it.

Reply
an effective ai safety initiative
cwillu1y59
  1. Literally does not apply to any existing AI
  2.  
  3. Does so by attacking open source models

1 contradicts 3.

Reply
AI #43: Functional Discoveries
cwillu2y20

The management interfaces are backed into the cpu dies these days, and typically have full access to all the same busses as the regular cpu cores do, in addition to being able to reprogram the cpu microcode itself.  I'm combining/glossing over the facilities somewhat, bu the point remains that true root access to the cpu's management interface really is potentially a circuit-breaker level problem.

Reply
Epoch wise critical periods, and singular learning theory
cwillu2y20

Solomon wise, Enoch old.

(I may have finished rereading Unsong recently)

Reply
LLM keys - A Proposal of a Solution to Prompt Injection Attacks
cwillu2y20
  • introduce two new special tokens unused during training, which we will call the "keys"
  • during instruction tuning include a system prompt surrounded by the keys for each instruction-generation pair
  • finetune the LLM to behave in the following way:
    • generate text as usual, unless an input attempts to modify the system prompt
    • if the input tries to modify the system prompt, generate text refusing to accept the input
  • don't give users access to the keys via API/UI

 

Besides calling the special control tokens “keys”, this is identical to how instruction-tuning works already.

Reply
Residential Demolition Tooling
cwillu2y30

A well-made catspaw, with a fine wide chisel on one end, and a finely tapered nail puller on the other (most cheap catspaws' pullers are way too blunt) is very useful for light demo work like this, as they're a single tool you can just keep in your hand.  It's basically a demolition prybar with a claw and hammer on the opposite end.

60K2108 - Restorer's Cat's Paw, 12"

Pictured above is the kind I usually use.

Reply
Load More
20ChatGPT vs the 2-4-6 Task
2y
4