I think you get very different answers depending on whether your question is "what is an example of a policy that makes it illegal in the United States to do research with the explicit intent of creating AGI" or whether it is "what is an example of a policy that results in nobody, including intelligence agencies, doing AI research that could lead to AGI, anywhere in the world".

For the former, something like updates to the Export Administration Regulations could make it de facto illegal to develop AI aimed at the international market. Historically, that approach briefly succeeded at making it illegal to intentionally export software which implemented strong encryption. It didn't actually prevent the export, but it did arguably make that export unlawful. I'd recommend reading that article in full, actually, to get an idea of how "what the law says" and "what ends up happening" can diverge.

I think the answer to the question of how well realistic NN-like systems with finite compute approximate the results of hypothetical utility maximizers with infinite compute is "not very well at all".

So the MIRI train of thought, as I understand it, goes something like

  1. You cannot predict the specific moves that a superhuman chess-playing AI will make, but you can predict that the final board state will be one in which the chess-playing AI has won.
  2. The chess AI can do this because it accurately predicts the likely outcomes of its own actions, computes the utility of each possible action, and then effectively does an argmax over them to pick the best one, yielding the best outcome according to its utility function (a minimal sketch of this loop follows the list).
  3. Similarly, you will not be able to predict the specific actions that a "sufficiently powerful" utility maximizer will take, but you can predict that its utility function will be maximized.
  4. For most utility functions about things in the real world, the configuration of matter that maximizes that utility function is not a configuration of matter that supports human life.
  5. Actual future AI systems that will show up in the real world in the next few decades will be "sufficiently powerful" utility maximizers, and so this is a useful and predictive model of what the near future will look like.
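
As a toy illustration of the decision loop in point 2 (the names and the trivial world model here are mine, purely for exposition, not a claim about any real system):

    // Expected-utility maximization in miniature: predict the outcome of
    // each action, score it, and argmax. `predictOutcome` stands in for a
    // world model and `utility` for a utility function; both are
    // hypothetical placeholders.
    const actions = ['e4', 'd4', 'Nf3'];
    const predictOutcome = action => ({ finalBoard: action });
    const utility = outcome => (outcome.finalBoard === 'e4' ? 1 : 0);
    const best = actions.reduce((a, b) =>
        utility(predictOutcome(a)) >= utility(predictOutcome(b)) ? a : b);
    // With infinite compute, every step here could be exact; points 2 and 5
    // assert that real systems approximate this loop well enough to predict.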

I think the last few years in ML have made points 2 and 5 look particularly shaky. For example, the actual architecture of SOTA chess-playing systems doesn't particularly resemble a cheaper version of the optimal-with-infinite-compute approach of "minimax over full tree search"; instead, it seems to be "pile a bunch of situation-specific heuristics on top of each other, then tweak the heuristics based on how well they do in practice".

Which, for me at least, suggests that looking at what the optimal-with-infinite-compute system would do might not be very informative about what the actual systems that show up in the next few decades will do.

Can you give a concrete example of a safety property of the sort that you are envisioning automated testing for? Or am I misunderstanding what you're hoping to see?

For example, a human can, to an extent, inspect what they are going to say before they say or write it. Before saying that Gary Marcus was "inspired by his pet chicken, Henrietta", a human may temporarily store the next words they plan to say elsewhere in the brain and evaluate them.

Transformer-based models also internally represent the tokens they are likely to emit in future steps. This was demonstrated rigorously in Future Lens: Anticipating Subsequent Tokens from a Single Hidden State, though perhaps the simpler demonstration is that LLMs can reliably complete the sentence "Alice likes apples, Bob likes bananas, and Aaron likes apricots, so when I went to the store I bought Alice an apple and I got [Bob/Aaron]" with the appropriate "a/an" token: choosing correctly requires already representing the upcoming noun.

I think the answer pretty much has to be "yes", for the following reasons.

  1. As noted in the above post, weather is chaotic.
  2. Elections are sometimes close. For example, the 2000 presidential election came down to a margin of 537 votes in Florida.
  3. Geographic location correlates reasonably strongly with party preference.
  4. Weather affects specific geographic areas.
  5. Weather influences voter turnout.[1]

During the 2000 election, in Okaloosa County, Florida (at the western tip of the panhandle), 71k of the county's 171k residents voted, with 52,186 votes going to Bush and 16,989 to Gore, for a 42% turnout rate.

On November 7, 2000, there was no significant rainfall in Pensacola (the closest weather station I could find with records going back that far). A storm which dropped 2 inches of rain on the tip of the Florida panhandle that day would have reduced voter turnout by 1.8%,[1] which would have shifted the margin 634 votes toward Gore. That would have tipped Florida, which would in turn have tipped the election.
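
Spelling out the arithmetic behind that 634-vote figure (treating the 1.8% as a proportional reduction in ballots cast, split in the county's observed Bush/Gore proportions):

$$2\text{ in} \times 0.9\%/\text{in} = 1.8\%, \qquad 0.018 \times 71{,}000 \approx 1{,}278 \text{ fewer ballots}$$

$$1{,}278 \times \frac{52{,}186 - 16{,}989}{71{,}000} \approx 634 \text{ fewer votes of margin for Bush}$$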

Now, November is the "dry" season in Florida, so heavy rains like that are not especially common. Still, they happen: on 2015-11-02, for example, 2.34 inches of rain fell.[2] That was only one of the 140 days I looked at that would have flipped the 2000 election, and the 2000 election was, to my knowledge, the closest of the 59 US presidential elections so far. But there are a number of other tracks a storm could have taken that would also have flipped the 2000 election.[3] And in the 1976 election, somewhat worse weather in the Great Lakes region would likely have flipped Ohio and Wisconsin, where Carter beat Ford by narrow margins.[4]

So I think the probability of "weather, on election day specifically, flips the 2028 election in a way that cannot be foreseen now" is already well over 0.1%. And that's before getting into other weather questions like "how many hurricanes hit the Gulf Coast in 2028, and where exactly do they land?".

  1. ^

    Gomez, B. T., Hansford, T. G., & Krause, G. A. (2007). The Republicans should pray for rain: Weather, turnout, and voting in U.S. presidential elections. The Journal of Politics, 69(3), 649-663.

    "The results indicate that if a county experiences an inch of rain more than what is normal for the county for that election date, the percentage of the voting age population that turns out to vote decreases by approximately .9%.".

  2. ^

    I pulled the weather for the week before and after November 7 for each of the past 10 years from the api.weather.com historical observations endpoint, and that was the highest-rainfall date.

    // Fetch Nov 1-14 observations for KPNS (Pensacola) for 2014-2023 from
    // the Weather Company historical-observations endpoint, bucket the
    // reported precipitation by date, and take the wettest day.
    const precipByDate = {};
    for (let y = 2014; y < 2024; y++) {
        const res = await fetch('https://api.weather.com/v1/location/KPNS:9:US/observations/historical.json?apiKey=<redacted>&units=e&startDate=' + y + '1101&endDate=' + y + '1114').then(r => r.json());
        res.observations.forEach(obs => {
            // Group observations by local calendar date.
            const d = new Date(obs.valid_time_gmt * 1000);
            const ds = d.getFullYear() + '-' + (d.getMonth() + 1) + '-' + d.getDate();
            if (!(ds in precipByDate)) { precipByDate[ds] = 0; }
            if (obs.precip_total) { precipByDate[ds] += obs.precip_total; }
        });
    }
    // Highest-precipitation [date, inches] pair across all 140 days.
    Object.entries(precipByDate).sort((a, b) => b[1] - a[1])[0]
  3. ^

    Looking at the 2000 election map in Florida, any good thunderstorm in the panhandle, in the northeast corner of the state, or on the western side of the central-to-southern peninsula would have done the trick.

  4. ^

    https://en.wikipedia.org/wiki/1976_United_States_presidential_election -- Carter won Ohio and Wisconsin by 11k and 35k votes, respectively.

An attorney rather than the police, I think.

Also "provably safe" is a property a system can have relative to a specific threat model. Many vulnerabilities come from the engineer having an incomplete or incorrect threat model, though (most obviously the multitude of types of side-channel attack).

Counterpoint: Sydney (Bing Chat) was wildly unaligned, to the extent that it is even possible for an LLM to be aligned, and people thought it was cute / cool.

The two examples everyone loves to use to demonstrate that massive top-down engineering projects can sometimes be a viable alternative to iterative design (the Manhattan Project and the Apollo Program) were both government-led initiatives, rather than single very smart people working alone in their garages. I think it's reasonable to conclude that governments have considerably more capacity to steer outcomes than individuals, and are the most powerful optimizers that exist at this time.

I think restricting the term "superintelligence" to "only that which can create functional self-replicators with nano-scale components" is misleading. Concretely, that definition of "superintelligence" says that natural selection is superintelligent, while the most capable groups of humans are nowhere close, even with computerized tooling.

Looking at the AlphaZero paper:

Our new method uses a deep neural network fθ with parameters θ. This neural network takes as an input the raw board representation s of the position and its history, and outputs both move probabilities and a value, (p, v) = fθ(s). The vector of move probabilities p represents the probability of selecting each move a (including pass), pa = Pr(a|s). The value v is a scalar evaluation, estimating the probability of the current player winning from position s. This neural network combines the roles of both policy network and value network into a single architecture. The neural network consists of many residual blocks of convolutional layers with batch normalization and rectifier nonlinearities (see Methods).

So if I'm interpreting that correctly, the NN is used both for position evaluation (the value head) and to guide the search itself (the policy head's move probabilities).
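
In code terms, a rough sketch of that interface as I read it (the names here are mine, not the paper's):

    // One forward pass returns both heads: move priors, which bias the
    // tree search toward promising branches, and a scalar value, used to
    // evaluate leaf positions in place of rollouts.
    const legalMoves = position => ['a', 'b', 'c'];  // placeholder move generator
    function evaluate(position) {
        // Stand-in for (p, v) = f_theta(s): uniform priors for illustration.
        const moves = legalMoves(position);
        const priors = new Map(moves.map(m => [m, 1 / moves.length]));
        return { priors, value: 0.0 };  // value ~ estimated P(current player wins)
    }
    // During search, the prior enters move selection via a PUCT-style term,
    // roughly Q(s,a) + c * P(a|s) * sqrt(N(s)) / (1 + N(s,a)).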
