Comments
tmeanen

Does LessWrong have a strategy for getting the ideas posted on this site out to other communities (e.g. academia, decision-makers at frontier labs, policy circles, etc)? My impression is that there are a whole lot of potentially impactful ideas floating around on this site, such as important macrostrategic considerations for short-timelines, fast-ish takeoff worlds. Do the LW mods have a strategy to get the right people hearing these ideas, or do we just wait until someone important stumbles across the site?

tmeanen

fab will check design plans and inform designers what can and can't be manufactured

I'm curious - what are the most common reasons for this? That is, what are the common requests that designers make that fabs can't manufacture?

tmeanen

Seems like a useful resource to have out there. Some other information that would be nice to have is details about the security of the data center - but there's probably limited information that could be included. [1]

  1. ^

    Because you probably don't want too many details about your infosec protocols out there for the entire internet to see. 

tmeanen

Reconnaissance might be a candidate for one of the first uses of powerful A(G)I systems by militaries - if this isn't already the case. There's already an abundance of satellite data (likely exabytes in the next decade) that could be thrown into training datasets. It's also less inflammatory than using AI systems for autonomous weapon design, say, and politically more feasible. So there's a future in which A(G)I-powered reconnaissance systems have some transformative military applications, the military high-ups take note, and things snowball from there. 

tmeanen

But if the core difficulty in solving alignment is developing some difficult mathematical formalism and figuring out the relevant proofs, then I think we won't suffer from the problems with the technologies above. In other words, I would feel comfortable delegating to and overseeing a team of AIs that have been tasked with solving the Riemann hypothesis - and I think this is what a large part of solving alignment might look like.

tmeanen

I've been in a number of arguments where people say things like "why is 90% doom such a strong claim? That assumes that survival is the default!"

Am I misunderstanding this sentence? How do "90% doom" and the assumption that survival is the default square with one another?

tmeanen

“keyboard and monitor I’m using right now, a stack of books, a tupperware, waterbottle, flip-flops, carpet, desk and chair, refrigerator, sink, etc. Under my models, if I pick one of these objects at random and do a deep dive researching that object, it will usually turn out to be bad in ways which were either nonobvious or nonsalient to me, but unambiguously make my life worse”

But I think the negative impacts that these goods have on you are (mostly) realized on longer timescales - say, years to decades. If you’re using a chair that is bad for your posture, the impacts of this are usually seen years down the line when your back starts aching. Or if you keep microwaving tupperware, you may end up with some pretty nasty medical problems, but again, decades down the line.

The property of an action having long horizons until it can be verified as good or bad for you makes delegating to smarter-than-you systems dangerous. My intuition is that there are lots of tasks that could significantly accelerate alignment research that don’t have this property, examples being codebase writing (unit tests can provide quick feedback), proof verification, etc. In fact, I can’t think of many research tasks in technical fields that have month/year/decade horizons until they can be verified - though maybe I’ve just not given it enough thought.
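To make the short-horizon case concrete, here's a minimal sketch in Python. The function `ai_written_sort` is hypothetical - a stand-in for any piece of code we might delegate - and the point is only that the unit test verifies the delegated work in well under a second, in contrast to the years-to-decades horizons of the household objects above.

```python
# Toy illustration of the "fast verification" property.
# ai_written_sort is a hypothetical stand-in for code produced by a
# delegated AI system; the test below gives feedback in milliseconds.

def ai_written_sort(xs):
    # Pretend this body was written by the system we delegated to.
    return sorted(xs)

def test_ai_written_sort():
    assert ai_written_sort([3, 1, 2]) == [1, 2, 3]
    assert ai_written_sort([]) == []
    assert ai_written_sort([5, 5, 1]) == [1, 5, 5]

if __name__ == "__main__":
    test_ai_written_sort()
    print("delegated code verified almost immediately")
```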

tmeanen

Plausibly one technology that arrives soon after superintelligence is powerful surveillance technology that makes enforcing commitments significantly easier than it historically has been. Leaving aside the potential for this to be misused by authoritarian governments, advocating for it to be developed before powerful technologies of mass destruction may be a strategy.

tmeanen

Nice, I like this concept of rogue deployment, as it highlights two distinct features that are both required for a safety method to be considered 'successful'. I'm understanding catastrophe with rogue deployment as a case where the safety measures were good enough but were bypassed/turned off, whereas catastrophe without rogue deployment involves safety measures that were fully operational the whole time but insufficient to prevent a model/human actor from causing a catastrophe.

So for example, we could get really great mech. interp tools, but avoiding catastrophe isn't guaranteed if all of these mech. interp tools are running on a single server (making them very easy to disable). To prevent rogue deployment we’d want multiple servers running these mech. interp tools, to provide redundancy in case one goes down, gets hacked, etc. So there's a concept here of the raw effectiveness of a safety method as well as its reliability. I'm sure others can probably think of more nuanced examples too.
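As a toy sketch of the reliability half of this (purely illustrative - the endpoint URLs, the `query_monitor` helper, and the JSON response format are invented assumptions, not any real API), one could imagine requiring a quorum of independently hosted monitors to sign off before an action goes through, so that disabling any single server doesn't silently remove oversight:

```python
# Hypothetical sketch: approval requires a quorum of independently hosted
# interpretability monitors, so no single disabled server removes oversight.
# Endpoints and the {"safe": true/false} response format are made up.
import json
import urllib.request

MONITOR_ENDPOINTS = [
    "https://monitor-a.example.internal/check",
    "https://monitor-b.example.internal/check",
    "https://monitor-c.example.internal/check",
]

def query_monitor(url: str, action: dict) -> bool:
    """Ask one monitor server whether the proposed action looks safe."""
    req = urllib.request.Request(
        url,
        data=json.dumps(action).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return bool(json.load(resp).get("safe", False))
    except OSError:
        # An unreachable (or disabled) monitor counts as a refusal, not a pass.
        return False

def action_approved(action: dict, quorum: int = 2) -> bool:
    """Approve only if at least `quorum` monitors independently judge the action safe."""
    votes = sum(query_monitor(url, action) for url in MONITOR_ENDPOINTS)
    return votes >= quorum
```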