Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

# The power of information?

0 13 October 2009 01:07AM

I'm thinking about how to model an ecosystem of recursively self-improving computer programs.  The model I have in mind assumes finite CPU cycles/second and finite memory as resources, and that these resources are already allocated at time zero.  It models the rate of production of new information by a program given its current resources of information, CPU cycles, and memory; the conversion of information into power to take resources from other programs; and a decision rule by which a program chooses which other program to take resources from.  The objective is to study the system dynamics, in particular looking for attractors and bifurcations/catastrophes, and to see what range of initial conditions don't lead to a singleton.

(A more elaborate model would also represent the fraction of ownership one program had of another program, that being a weight to use to blend the decision rules of the owning programs with the decision rule of the owned program.  It may also be desirable to model trade of information.  I think that modeling Moore's law wrt CPU speed and memory size would make little difference, if we assume the technologies developed would be equally available to all agents.  I'm interested in the shapes of the attractors, not the rate of convergence.)
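A minimal sketch of the kind of simulation described above may make the moving parts concrete. Everything specific in it is an assumption made for illustration, not the actual model: the growth rule (`info_growth`), the placeholder power function (`power`), the decision rule (`choose_target`), the constant `INFO_PER_MEM`, and the rule that a winner takes a fixed fraction of the loser's resources.

```python
import random
from dataclasses import dataclass

INFO_PER_MEM = 1000.0  # ASSUMPTION: information storable per unit of memory


@dataclass
class Program:
    info: float  # accumulated information, arbitrary units
    cpu: float   # share of the fixed CPU budget
    mem: float   # share of the fixed memory budget


def info_growth(p):
    # ASSUMPTION: diminishing returns on existing information.
    return 0.1 * p.cpu * p.info ** 0.5


def power(p, s=1.0):
    # ASSUMPTION: placeholder for the unknown information-to-power function.
    return p.cpu * p.info ** s


def choose_target(i, programs):
    # ASSUMPTION: each program attacks the weakest other program.
    others = [j for j in range(len(programs)) if j != i]
    return min(others, key=lambda j: power(programs[j]))


def step(programs, rng, take=0.1):
    for p in programs:
        # Memory caps how much information a program can hold.
        p.info = min(p.info + info_growth(p), p.mem * INFO_PER_MEM)
    for i, attacker in enumerate(programs):
        defender = programs[choose_target(i, programs)]
        pa, pd = power(attacker), power(defender)
        if rng.random() < pa / (pa + pd):
            # The winner takes a fixed fraction of the loser's resources,
            # so total CPU and total memory are conserved.
            for attr in ("cpu", "mem"):
                amount = take * getattr(defender, attr)
                setattr(defender, attr, getattr(defender, attr) - amount)
                setattr(attacker, attr, getattr(attacker, attr) + amount)


def largest_cpu_share(programs):
    # A share near 1.0 suggests the run is heading toward a singleton.
    return max(p.cpu for p in programs) / sum(p.cpu for p in programs)


rng = random.Random(0)
programs = [Program(info=rng.uniform(1.0, 2.0), cpu=1.0, mem=1.0) for _ in range(5)]
for _ in range(200):
    step(programs, rng)
print(f"largest CPU share after 200 steps: {largest_cpu_share(programs):.3f}")
```

Sweeping the initial conditions and the exponent `s`, and recording `largest_cpu_share` at the end of each run, would be one crude way to look for regions that do not collapse to a singleton.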

Problem: I don't know how to model power as a function of information.

I have a rough model of how information grows over time; so I can estimate the relative amounts of information in a single real historical society at two points in time.  If I can say that society X had tech level T at time A, and society Y had tech level T at time B, I can use this model to estimate what tech level society Y had at time A.

Therefore, I can gather historical data about military conflicts between societies at different tech levels, estimate the information ratio between those societies, and relate it to the manpower ratios between the armies involved and the outcome of the conflict, giving a system of inequalities.

You can help me in 3 ways:

• Tell me why this is a bad idea.
• Tell me some better way to relate information to power.
• Gather historical data of the type I just described.

If you choose the last option, choose a historical conflict between sides of uneven tech level, and post here as many as you can find of the following details:

• Identifying details: Year, opponents involved, name of conflict
• Number of combatants on each side
• Duration of the conflict
• Outcome of the conflict
• Identification of technological advantages of either side, especially ones that proved key (this can include non-material technologies, such as training or ideologies)
• How long the side with a technological advantage had possessed that technology at the time of the battle
• Estimation of the tech levels of each side in terms of a "standard" Western-Equivalent era (WE) attached to the most-advanced "Western" nation of the time (or in the conflict), and the facts used to make these estimates.  (We might trace the WE timeline starting with Sumeria, then Egypt, Greece, Rome, etc.  If enough datapoints are available for the 2 societies involved to model their information growth, they need not be mapped to WE.)

For example:

• Battle of Agincourt, English vs. French, October 1415.
• English combatants: Given as 5900, or 8000
• French: Given as 20,000-36,000, or 12,000
• Duration: 3 hours
• English win; 4,000–10,000 French dead, with up to 1,600 English dead
• Most historians identify the English longbow as the key technology responsible for the English win (reference needed).  The English longbow was used at least as early as 1346.  However, the Wikipedia article on Agincourt, as well as this site, indicate that the crucial technology was not just the longbow, but the use of palings (long stakes driven into the ground to protect bowmen from cavalry).  A third key innovation of the English was being dedicated to winning battles rather than to fighting for glory; the French disdained longbows as unchivalrous, and were given to drawing up battle plans that would give them personal glory in the retelling rather than battle plans that would win (described briefly here, and in more detail in a book, I think "A Distant Mirror" by Barbara Tuchman.)  This may be why the French deployed their archers behind their cavalry, where they were unable to take part in most of the battle.
• I assign the French a tech-level of 1346 WE, based on this being the date when the English first used the longbow heavily in a major battle.

Using the two dates 1415 and 1346 leads to some tech-level (or information) ratio R.  For example, under a simple model assuming that tech level doubled every 70 years in this era, we would give the English a tech-level ratio over the French of about 2 (the gap is 69 years), and then say that the tech-level ratio enjoyed by the English produced a power multiplier greater than the manpower ratio enjoyed by the French:   P(2) > 30000/5900.  This ignores the many advances shared by the English and French between 1346 and 1415; but most of them were not relevant to the battle.  It also ignores the claim that the main factor was that the French had heavy armour, which was a disadvantage rather than an advantage in the deep mud on that rainy day.  Oh well.  (Let's hope for enough data that the law of large numbers kicks in.)
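The arithmetic behind that inequality can be sketched as follows. The function name `tech_ratio` and its parameters are hypothetical; the 70-year doubling time and the 30000 and 5900 figures are the ones used in the paragraph above.

```python
def tech_ratio(year_a, year_b, doubling_years=70.0):
    """Tech-level ratio of era year_a over era year_b, assuming tech
    level doubles every `doubling_years` (the post's simple model)."""
    return 2.0 ** ((year_a - year_b) / doubling_years)


r = tech_ratio(1415, 1346)      # slightly under 2, because 69 < 70 years
manpower_ratio = 30000 / 5900   # French : English, figures from the post

# The datapoint contributes one inequality: P(r) > manpower_ratio.
print(f"P({r:.2f}) > {manpower_ratio:.2f}")
```

Each additional battle would contribute one more inequality of the same form, with the direction reversed when the lower-tech side won.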

After gathering a few dozen datapoints, it may be possible to discern a shape for the function P.  (Making P a multiplying force that is a function of a ratio assumes that P composes multiplicatively across successive ratios, e.g. P(8) = P(8/4)*P(4/2)*P(2/1) = P(2)^3.  The linear form P(r) = r satisfies this, since then P(8) = 8 = 4*P(2), as does any power law P(r) = r^k; the data can reject this assumption.)  There may be a way to factor the battle duration and the casualty outcome into the equation as well; or at least to see if they correlate with the distance of the datapoint's manpower ratio from the estimated value of P(information ratio) for that datapoint.
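If P composes multiplicatively and is continuous, it has to be a power law P(r) = r^k, with k = 1 as the linear case. Under that assumption each battle bounds k from one side, and the system of inequalities reduces to an interval. A hedged sketch, where `exponent_bounds` and the tuple layout are hypothetical and the only datapoint is the Agincourt example with the figures given earlier:

```python
import math


def exponent_bounds(datapoints):
    """Interval (lo, hi) of exponents k consistent with P(r) = r**k.

    Each datapoint is (info_ratio, manpower_ratio, advantaged_won):
      info_ratio     > 1, tech-advantaged side over the other side
      manpower_ratio = other side's numbers / advantaged side's numbers
      advantaged_won = True if the tech-advantaged side won
    """
    lo, hi = -math.inf, math.inf
    for info_ratio, manpower_ratio, advantaged_won in datapoints:
        # r**k > m  <=>  k > ln(m) / ln(r), valid because r > 1.
        bound = math.log(manpower_ratio) / math.log(info_ratio)
        if advantaged_won:
            lo = max(lo, bound)   # power multiplier beat the manpower ratio
        else:
            hi = min(hi, bound)   # power multiplier fell short of it
    return lo, hi


agincourt = (2.0 ** (69 / 70), 30000 / 5900, True)
lo, hi = exponent_bounds([agincourt])
print(f"k must exceed {lo:.2f}" if hi == math.inf else f"{lo:.2f} < k < {hi:.2f}")
```

If the interval comes out empty (lo greater than hi) once more battles are added, that would be the data rejecting the power-law assumption.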

(I tried to construct another example from the Battle of Little Bighorn to show a case where the lower-level technology won, but found that the Indians had more rifles than the Army did, and that there is no agreement as to whether the Indians' repeating rifles or the Army's longer-ranged single-shot Springfield rifles were better.)

Comment author: 13 October 2009 04:09:00PM 5 points [-]

Am I the only one who is reminded of game theory when reading this post? In fact, it basically sounds like this: given a set of agents engaged in competitive behavior, how does "information" (however you define it, which I think others are right to ask you to clarify) affect the likely outcome? Though I am confused by the overly simple military examples. I wonder whether one could find a simpler system to use. I am also confused about what general principles you want to find with this system of derived inequalities.

Comment author: 13 October 2009 04:41:33PM 4 points [-]

I think the relation of information to power, historically, will be too complicated and too intertwined with other variables for you to discover. Even if you count those other variables as part of the starting conditions, technology has a tendency to make different aspects of the starting conditions salient, such that you can't evaluate the effect starting conditions have on power until you know the path of technology. Moreover, power is a function of memetic technologies as much as physical technologies, and the former will be even more difficult to quantify. You'd be better off making the function of information to power a variable in your models.

But if you persist you should keep in mind the distinction between offense-dominant systems and defense-dominant systems. Offense-dominant systems occur when technology is good at invading/destroying enemies but bad at protecting you from them. Defense-dominant is the reverse. Knowing whether a technology is defensive or offensive is crucial for understanding its impact. Some of history's biggest military blunders occurred when one side misapprehended their system. For example, in WWI the Germans thought they were still in an offense-dominant system, so they executed the Schlieffen Plan and tried to beat France quickly and then shift their forces east to avoid a two-front war. But the invention of the machine gun/trench war technology meant that it was much, much harder to take territory than anyone had expected, and so the Germans couldn't take France fast enough (there were other reasons). Germany wasn't alone in making this mistake; lots of countries thought the war would finish quickly. Then in WWII much of Europe was still thinking in terms of the defense-dominant system of WWI and so were shocked at the speed of German progress... but such progress was inevitable given advances in tanks and aircraft that rendered trench warfare tactics useless.

The more agents perceive that their system is offense-dominant the more unstable the system is since agents estimate the benefits of doing well in a conflict, and the costs of doing poorly, to be high. Mutual second-strike nuclear capabilities is actually an extremely stable system for this reason. And mutual first-strike capabilities is about as bad as it gets. Anyway, it seems this distinction would be important for any modeling of power.

Comment author: 13 October 2009 08:18:01PM 0 points [-]

Interesting point about offensive/defensive power.

You'd be better off making the function of information to power a variable in your models.

Given an amount of information, I need to compute a corresponding amount of power. "Make it a variable" doesn't help. It's already a variable. I have too many variables. That's why I want to make it a function.

Comment author: 14 October 2009 12:25:47AM 1 point [-]

Right, I understand why you want to make it a function. But as I see it your practical options are: 1) make a gross generalization from insufficient data that might have no relation to the function of information to power in the future, and hope that you get close enough to reality that your model has at least some accuracy in its predictions, OR 2) come up with 4-5 plausible but as-different-as-possible functions relating information to power and model with each of them. The result is 4-5 times more predictions to sort through, and instead of conclusions like "Starting conditions x,y,z, lead to a singleton" you'll get conclusions like "Starting conditions x,y,z, given assumptions a,b,c about the relation of information to power, lead to a singleton." The second option is harder and less conclusive. But it is also less wrong.
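Option 2 could be organized along these lines. This is only a sketch: the candidate functions in `CANDIDATES` are invented for illustration, and `win_probability` assumes, without justification, that the chance of winning is proportional to power.

```python
import math

# ASSUMPTION: five deliberately different information-to-power functions.
CANDIDATES = {
    "linear": lambda info: info,
    "square": lambda info: info ** 2,
    "sqrt": math.sqrt,
    "log": math.log1p,
    "threshold": lambda info: 1.0 if info < 10.0 else 100.0,
}


def win_probability(power_fn, info_x, info_y):
    # ASSUMPTION: probability of winning is proportional to power.
    px, py = power_fn(info_x), power_fn(info_y)
    return px / (px + py)


# The same matchup (8 units of information vs. 4) under each assumption.
for name, fn in CANDIDATES.items():
    print(f"{name:>9}: P(x wins) = {win_probability(fn, 8.0, 4.0):.3f}")
```

Any conclusion about singletons would then be reported per candidate, which is exactly the "given assumptions a,b,c" form described above.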

One more thing about the offense/defense distinction. One implication of the theory is that technological advancement can actually undermine an agent's position in a multi-polar system. If Agent A develops an offensive weapon that guarantees victory if Agent A strikes first, then other agents are basically forced to attack preemptively and likely gang up on Agent A. Given this particular input of more information, the function of information to power seems to output less power.

Comment author: 14 October 2009 01:47:44AM 1 point [-]

instead of conclusions like "Starting conditions x,y,z, lead to a singleton" you'll get conclusions like "Starting conditions x,y,z, given assumptions a,b,c about the relation of information to power, lead to a singleton." The second option is harder and less conclusive. But it is also less wrong.

Okay; good point. I would still want to gather the data, though, to compare to the model results.

One implication of the theory is that technological advancement can actually undermine an agent's position in a multi-polar system. If Agent A develops an offensive weapon that guarantees victory if Agent A strikes first, then other agents are basically forced to attack preemptively and likely gang up on Agent A.

Tell that to the Iranians.

Comment author: 13 October 2009 03:20:05PM 3 points [-]

Information is a word loaded with associations (including Shannon information, Kolmogorov information, and various lesser-known variants). I would suggest switching to a different, less-loaded term. You seem to be using "information" to mean something like "tech level" in a game like Civilization.

Regarding "tech level" - Using only one dimension for this notion may install blinkers on your thinking. I've previously argued that many people are blinkered by using only one dimension for "intelligence", which is used in futurist rhetoric in much the same way as your "tech level".

Comment author: 13 October 2009 08:05:38PM 1 point [-]

I define "raw information", as used in other parts of the model, more precisely, in ways that are supposed to map onto Shannon-information or Kolmogorov information. I used the phrase "tech level" because my initial expectation is that power is proportional to the log of raw information. Some of my data concerning the rate of progress instead uses something with a meaning more like "perceived social change" or "useful information", which I called "tech level", and seems to be the log of raw information.

It may be that "useful information" really is Shannon information, and "raw information" is uncompressed, redundant information; and that this accounts for observations that "useful information" appears to be the log of "raw information". For instance, we have an exponential increase in the number of genes sequenced; but probably a much-less-than-linear increase in the number of types of genes known. We have an exponential increase in journal articles published; but the amount of independent, surprising information in each article may be going down.

Comment author: 13 October 2009 08:17:35PM 3 points [-]

A (thermal, say) random number generator is easy to build and a good source of both Shannon and algorithmic (Kolmogorov) information. Having lots of information in these senses is not helpful for winning battles.

Comment author: 14 October 2009 01:49:57AM 1 point [-]

True. However, I'm considering information that's not at all random, so I don't think that's a problem.

Comment author: 14 October 2009 04:24:11AM *  1 point [-]

probably a much-less-than-linear increase in the number of types of genes known

I should clarify: We still have an exponential increase in the number of protein families known; but a less-than-linear increase in the number of protein domains known. Proteins are composed of modules called "domains"; a protein contains from 1 to dozens of domains. Most "new" genes code for proteins that recombine previously-known domains in different orders.

A digression: Almost all of the information content of an organism resides in the amino-acid sequence of these domains; and a lower bound of about 64% of domains (and 84% of those found in eukaryotes) evolved before eukaryotes (which include all multicellular organisms) split from prokaryotes about 2 billion years ago. (One source: Michael Levitt, PNAS July 7 2009, "Nature of the protein universe".) So it's accurate to say that most of evolution occurred in the first billion years; the development of more-complex organisms seems to have nearly frozen evolution of the basic components. We would likely be more complex today if those ancient prokaryotes had been able to hold off evolving into eukaryotes for another billion years, so that they could develop more protein domains first. There's a lesson for aspiring singularians in there somewhere.

(Similarly, most evolution within eukaryotes seems to have occurred during a period of about 50 million years, just before the Cambrian explosion, half a billion years ago. Evolution has been slowing down in information-theoretic terms, while speeding up in terms of intelligence produced.)

Comment author: 13 October 2009 03:44:41PM 1 point [-]

I want to echo Johnnicholas; your first task is to nail down your definition of information.

Beyond that, I can't help, but I'm trying something sorta-similar. I'm trying to model life and evolution at the information-theoretic level by starting from a simple control system. I think of the control system in terms of its negentropy/entropy flows and allow mechanisms by which it can become more complex, and I try to include a role for its "understanding of its environment", which I represent by how small the KL divergence is between its (implicit) assumptions about the environment and the environment's true distribution.

Comment author: 13 October 2009 08:07:04PM 0 points [-]

So, you have a simulator in which you implement its control system? This sounds elaborate. I'd like to hear more about it.

Comment author: 13 October 2009 08:49:59PM 1 point [-]

Heh, I don't "have" anything yet; I'm just at the formalism stage. But the idea is that there are units (the control systems) operating within an environment, the latter of which is drawing its state from a lawful distribution (like nature does), which then affects what the units sense, as well as their structure and integrity. Depending on what the units do with the sensory data, they can be effective at controlling certain aspects of themselves, or instead go unstable. The plan is to also allow for modification of the structure of the control systems and their replication (to see evolution at work).

As for modeling the control systems, my focus is first on being able to express what's going on at the information-theoretic level, where it really gets interesting: there's a comparator, which must generate sufficient mutual information with the parameter it's trying to control, else it's "comparing" to a meaningless value. There are the disturbances, which introduce entropy and destroy mutual information with the environment. There's the controller, which must use up some negentropy source to maintain the system's order and keep it from equilibrium (as life and other dissipative systems must). And there's the system's implicit model of its environment (including the other control systems), whose accuracy is represented by the KL divergence between the distributions.
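For the last piece, the accuracy of a unit's implicit model, a toy calculation might look like the following. The three-state environment and the two candidate models are made-up numbers, and `kl_divergence` is a plain discrete-distribution helper rather than anything from the formalism being described.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in bits for discrete distributions given as
    equal-length lists; requires q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


true_env = [0.7, 0.2, 0.1]       # hypothetical true distribution of states
good_model = [0.6, 0.3, 0.1]     # implicit assumptions close to the truth
poor_model = [1 / 3, 1 / 3, 1 / 3]  # assumes all states are equally likely

for name, model in [("good", good_model), ("poor", poor_model)]:
    print(f"{name} model: {kl_divergence(true_env, model):.3f} bits")
```

A smaller value means the unit's assumptions cost it fewer extra bits per observation, which is one way to read "better understanding of its environment".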

I don't expect I'll make something completely new, but at least for me, it would integrate my understanding of thermodynamics, life, information theory, and intelligence, and perhaps shed light on each.

Comment author: 13 October 2009 07:25:03AM 3 points [-]

If your "ecosystem of recursively self-improving computer programs" is intended to model real-world polities, then perhaps the type of empirical study you have in mind would be useful. However, if it is intended to more broadly model conflicting (and cooperating) entities, perhaps a more abstract model would be appropriate.

The problem of developing a general conflict success function has been rather extensively studied by political scientists; one good introduction can be found in The Journal of Conflict Resolution, Vol. 44, No. 6 (Dec., 2000), pp. 773-792, an article entitled "The Macrotechnology of Conflict," by Jack Hirshleifer. I've found his framework useful in my own work.

Comment author: 13 October 2009 08:21:16PM 0 points [-]

I don't think I can get any more abstract than the model is now. The model doesn't talk about military conflicts. I want to use historical data to parameterize my model. I'm hoping that there is a broad pattern relating information to power that is fundamental.

Comment author: 13 October 2009 09:31:59PM *  2 points [-]

The article suggests two major formats for conflict success functions that might be relevant. Notationally, let Ix be the information held by entity x, and assume that information is the only relevant determinant of power.

One possibility is to have P(x wins) = (Ix)^s / ((Ix)^s + (Iy)^s), the "ratio form," where y is the opposing entity and s is a parameter. You could run your model for a variety of s-values, which may be easier than trying to determine the effect of information on real conflicts. Alternatively, you could estimate s from the total resource input into real conflicts, rather than restricting it to information, with all the questions about intelligence and strategy that are endogenous to that concept.

The other major conflict success function is P(x wins) = exp(s.Ix) / (exp(s.Ix)+exp(s.Iy)), the "difference" form (periods indicate multiplication). The mechanics are the same, with the estimation of s the primary challenge.

For what it's worth, the US military has long operated on the belief that they need a 3:1 force advantage in the relevant theater to produce victory (prior, one imagines, to the advent of guerrilla warfare). Let us say that P(x wins | Ix = Iy * 3) = .9. Then s = 2.

Seems like a decent starting point to say that the probability of each side winning is equal to the ratio of the square of its power/resources/information/whatever to the sum of the squares.
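To make the two forms and the s = 2 calculation easy to check, here is a small sketch. The function names are mine, not Hirshleifer's; the formulas are the ones quoted above, with the difference form rewritten as an algebraically equivalent logistic function.

```python
import math


def ratio_form(ix, iy, s):
    """P(x wins) = Ix^s / (Ix^s + Iy^s)."""
    return ix ** s / (ix ** s + iy ** s)


def difference_form(ix, iy, s):
    """P(x wins) = exp(s*Ix) / (exp(s*Ix) + exp(s*Iy)),
    which simplifies to a logistic function of s * (Ix - Iy)."""
    return 1.0 / (1.0 + math.exp(-s * (ix - iy)))


def solve_s_ratio(advantage, p_win):
    """Exponent s in the ratio form such that an `advantage`:1 edge
    wins with probability p_win.
    From a^s / (a^s + 1) = p  it follows that  a^s = p / (1 - p)."""
    return math.log(p_win / (1.0 - p_win)) / math.log(advantage)


s = solve_s_ratio(3.0, 0.9)   # 3^s = 9, so s is 2 up to rounding
print(f"s = {s:.3f}, check: P(x wins) = {ratio_form(3.0, 1.0, s):.3f}")
```

With s fixed this way, `ratio_form` can be dropped into a simulation directly; a different belief about the 3:1 rule just means calling `solve_s_ratio` with other numbers.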

Comment author: 14 October 2009 04:37:20AM *  1 point [-]

That's great - adding the ^s gives some flexibility to the function. I was worried because the form I specified (s=1) might not fit the data.

I'm dubious of the difference form. If you used it on "raw information", it could predict the same relative advantage for a 2009 WE opponent over a 2005 WE opponent, as for (say) a 1600 WE opponent over a 1000 WE opponent, because the difference in information is a small percent of the information either side has in the first case, but a large percentage in the second case.
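A quick numerical illustration of that worry, under a loudly stated assumption: raw information doubles every 70 years across the whole span (the post only proposed that rate for the Agincourt era), normalized to 1 unit at 1000 WE. The helper `raw_info` is hypothetical.

```python
def raw_info(year, base_year=1000, doubling_years=70.0):
    # ASSUMPTION: steady exponential growth, 1 unit at base_year.
    return 2.0 ** ((year - base_year) / doubling_years)


pairs = {"2009 vs 2005": (2009, 2005), "1600 vs 1000": (1600, 1000)}
for label, (later, earlier) in pairs.items():
    i_later, i_earlier = raw_info(later), raw_info(earlier)
    print(f"{label}: ratio = {i_later / i_earlier:.2f}, "
          f"difference = {i_later - i_earlier:.0f} units")
```

Under these assumptions the two absolute differences come out within an order of magnitude of each other, while the ratios differ by a factor of several hundred. The difference form sees only the former, so it would rate the two matchups as roughly comparable; the ratio form would not.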

Got the article; hope to read it soon.

Comment author: 13 October 2009 09:06:42AM 2 points [-]

There was that one time that the Zulus beat the British forces...

Comment author: 13 October 2009 08:16:34PM 1 point [-]

Responding to several people at once: Some people consider an AI singleton to be an inevitable outcome after the Singularity; others believe there are only 2 possible outcomes, a singleton or incessant war. I want to find out if there are stable non-singleton states in some model of post-singularity conflict; and, if so, what assumptions are needed to produce them. I think only the qualitative, not the quantitative, results will be useful. So I'm only trying to model details that would introduce new kinds of behaviors and stable states and transitions, not ones that would only make the system more accurate.

Comment author: 13 October 2009 10:19:39AM 1 point [-]

If you are using an Agent based system, then determining power could be computed after outcomes based on the modeling attributes you have determined are important.

I would recommend 'Growing Artificial Societies: Social Science from the Bottom Up (Complex Adaptive Systems)' by Epstein.

Comment author: 13 October 2009 12:59:29PM *  0 points [-]

If you are using an Agent based system, then determining power could be computed after outcomes based on the modeling attributes you have determined are important.

I don't understand. What would the computation be?

EDIT: You mean, run the system, and then see who wins contests, and back-compute what that function is? Can't do that. That would just validate whatever arbitrary assumption I wrote into the simulator initially.

Comment author: 14 October 2009 10:16:33AM 0 points [-]

I guess this depends on your view of the world. I would say that if you simply write a power function then that would indicate an arbitrary assumption to begin with, that has had to simplify a number of significant factors. Writing a power function might be simple, but I am not sure that it would be significant.

For example one view of the world would be at the surface layer, where you see the end result of a combination of small events. This is what I think you are doing with your power function, although I may be misunderstanding. Another view says that you will not worry about the surface layer, and will instead come up with a number of simple rules (some based on probabilities) for the various actions & interactions that can take place. The execution of the rules by the Agents over multiple turns gives the emergent behavior, or what I called the surface layer. If the surface layer emerges that you would expect (guns are better than knives in a war for example), then this indicates the model is hopefully not grossly off. So instead of getting one big function right, you instead have a number of small rules that determine actions and probable outcomes.

You could even play some games with determining probable power functions after running a number of these, by representing them as genetic strings and then using standard genetic algorithms to see what gives the closest match over all the outcomes for the different scenarios/times. I think this is more powerful than starting with the power function because your assumptions are at a lower level that is easier to get right, not to mention simpler. This is also why I mentioned Epstein's book; it's a great example of using simple rules to get emergent behavior.

Comment author: 14 October 2009 01:07:05PM 0 points [-]

If the surface layer emerges that you would expect (guns are better than knives in a war for example), then this indicates the model is hopefully not grossly off.

The surface layer is so abstract that I have almost no expectations.

Comment author: 16 October 2009 09:36:49AM 0 points [-]

I did not state that very well. The surface layer is the aggregate result of all the behaviors/rules. I am guessing that your power function is extracting some attribute(s) of the surface layer.