I'm rather baffled by how I would use speech to paint into a Photoshop window. Force feedback motion already exists for 2D painting -- graphics tablets are standard equipment for artists.
You wouldn't "paint into a Photoshop window". I'd imagine saying e.g. "put a circular animation of growing fern around the center of the pulsating ball" and then tweaking via force feedback some of the parameters of the fern or its growing.
Disclaimer: The views expressed here are speculative. I don't have a claim to expertise in this area. I welcome pushback and anticipate there's a reasonable chance I'll change my mind in light of new considerations.
One of the interesting ways that many 20th century forecasts made of the future went wrong is that they posited huge physical changes in the way life was organized. For instance, they posited huge changes in these dimensions:
At the same time, they underestimated to quite an extent the informational changes in the world:
My LessWrong post on megamistakes discusses these themes somewhat in #1 (the technological wonderland and timing point) and #2 (the exceptional case of computing).
What about predictions within the informational realm? I detect a similar bias. It seems that prognosticators and forecasters tend to give undue weight to heavyweight technologies (such as 3D videoconferencing) and ignore the fact that the bulk of the production and innovation has been focused on text (with a little bit in images to augment and interweave with the text), and, to a somewhat lesser extent, images. In this article, I lay the pro-text position. I don't have high confidence in the views expressed here, and I look forward to critical pushback that changes my mind.
Text: easier to produce
One great thing about text is its lower production costs. To the extent that production is quantitatively little and dominated by a few big players, high-quality video and audio play an important role. But as the Internet "democratizes" content production, it's a lot easier for a lot of people to contribute text than to contribute audio or video content.
Some advantages of text from the creation perspective:
Text: easier to consume and share
Text is also easier to consume and share.
On the flip side, reading text requires you to have your eyes glued to the screen, which reduces your flexibility of movement. But because you can take breaks at your will, it's not a big issue. Audiobooks do offer the advantage that you can move around (e.g., cook in the kitchen) while listening, and some people who work from home are quite fond of audiobooks for that purpose. In general, the benefits of text seem to outweigh the costs.
Text generates more flow-through effects
Holding willingness to pay on the part of consumers the same, text-based content is likely to generate greater flow-through effects because of its ability to foster more discussion and criticism and to be modified and reused for other purposes. This is related to the point that video and audio consumption on the Internet generally tends to substitute for TV and cinema trips, which are largely pure consumption rather than intermediate steps to further production. Text, on the other hand, has a bigger role in work-related stuff.
Augmented text
When I say that text plays a major role, I don't mean that long ASCII strings are the be-all-and-end-all of computing and the Internet. Rather, more creative and innovative ways of interweaving a richer set of expressive and semantically powerful symbols in text is very important to harnessing its full power. It really is a lot different to read The New York Times in HTML than it would be to read the plain text of the article on a monochrome screen. The presence of hyperlinks, share buttons, the occasional image, sidebars with more related content, etc. add a lot of value.
Consider Facebook posts. These are text-based, but they allow text to be augmented in many ways:
Consider the actions that people reading the posts can perform:
If you think about it, this system, although it basically relies on text, has augmented text in a lot of ways with the intent of facilitating more meaningful communication. You may find some of the augmentations of little use to you, but each feature probably has at least a few hundred thousand people who greatly benefit from it. (If nobody uses a feature, Facebook axes it).
I suspect that the world in ten years from now will feature text that is richly augmented relative to how text is now in a similar manner that the text of today is richly augmented compared to what it was back in 2006. Unfortunately, I can't predict any very specific innovations (if I could, I'd be busy programming them, not writing a post on LessWrong). And it might very well be the case that the low-hanging fruit with respect to augmenting text is already taken.
Why didn't all the text augmentation happen at once? None of the augmentations are hard to program in principle. The probable reasons are:
Images
Images play an important role along with text. Indeed, websites such as 9GAG rely on images, and others like Buzzfeed heavily mix texts and images.
I think images will continue to grow in importance on the Internet. But the vision of images as it is likely to unfold is probably quite different from the vision as futurists generally envisage. We're not talking of a future dominated by professionally done (or even amateurly done) 16 megapixel photography. Rather, we're talking of images that are used to convey basic information or make a memetic point. Consider that many of the most widely shared images are the standard images for memes. The number of meme images is much smaller than the number of meme pictures. Meme creators just use a standard image and their own contribution is the text at the top and bottom of the meme. Thus, even while the Internet uses images, the production at the margin largely involves text. The picture is scaffolding. Webcomics (I'm personally most familiar with SMBC and XKCD, but there are other more popular ones) are at the more professional end, but they too illustrate a similar point: it's often the value of the ideas being creatively expressed, rather than the realism of the imagery, that delivers value.
One trend that was big in the early days of the Internet, then died down, and now seems to be reviving is the animated GIF. Animated GIFs allow people to convey simple ideas that cannot be captured in still images, without having to create a video. They also use a lot less bandwidth for consumers and web hosts than videos. Again, we see that the future is about economically using simple representations to convey ideas or memes rather than technologically awesome photography.
Quantitative estimates
Here's what Martin Hilbert wrote in How Much Information is There in the "Information Society" (p. 3):
I had come across this quote as part of a preliminary investigation for MIRI into the world's distribution of computation (though I had not highlighted the quote in the investigation since it was relatively less important to the investigation). As another data point, Facebook claims that it needed 700 TB (as of October 2013) to store all the text-based status updates and comments plus relevant semantic information on users that would be indexed by Facebook Graph Search once it was extended to posts and comments. Contrast this with a few petabytes of storage needed for all their photos (see also here), despite the fact that one photo takes up a lot more space than one text-based update.
Beautiful text
The Internet looks a lot more beautiful today than it did ten years ago. Why? Small, incremental changes in the way that text is displayed have played a role. New fonts, new WordPress themes, a new Wikipedia or Facebook layout, all conspire to provide a combination of greater usability and greater aesthetic appeal. Also, as processors and bandwidth have improved, some layouts that may have been impractical earlier have been made possible. The block tile layout for websites has caught on quite a bit, inspired by an attempt to create a unified smooth browsing experience across a range of different devices (from small iPhone screens to large monitors used by programmers and data analysts).
Notice that it's the versatility of text that allowed it to be upgraded. Videos created an old way would have to be redone in order to avail of new display technologies. But since text is stored as text, it can be rendered in a new font easily.
The wonders of machine learning
I've noticed personally, and some friends have remarked to me, that Google Search, GMail, and Facebook have gotten a lot better in recent years in many small incremental ways despite no big leaps in the overall layout and functioning of the services. Facebook shows more relevant ads, makes better friend suggestions, and has a much more relevant news feed. Google Search is scarily good at autocompletion. GMail search is improving at autocompletion too, and the interface continues to improve. Many of these improvements are the results of continuous incremental improvement, but there's some reason to believe that the more recent changes are driven in part by application of the wonders of machine learning (see here and here for instance).
Futurists tend to think of the benefits of machine learning in terms of qualitatively new technologies, such as image recognition, video recognition, object recognition, audio transcription, etc. And these are likely to happen, eventually. But my intuition is that futurists underestimate the proportion of the value from machine learning that is intermediated through improvement in the existing interfaces that people already use (and that high-productivity people use more than average), such as their Facebook news feed or GMail or Google Search.
A place for video
Video will continue to be good for many purposes. The watching of movies will continue to migrate from TV and the cinema hall to the Internet, and the quantity watched may also increase because people have to spend less in money and time costs. Educational and entertainment videos will continue to be watched in increasing numbers. Note that these effects are largely in terms of substitution of one medium, plus a raw increase in quantity, for another rather than paradigm shifts in the nature of people's activities.
Video chatting, through tools such as Skype or Google Talk/Hangouts, will probably continue to grow. These will serve as important complements to text-based communication. People do want to see their friends' faces from time to time, even if they carry out the bulk of their conversation in text. As Internet speeds improve around the world, the trivial inconveniences in the way of video communication will reduce.
But these will not drive the bulk of people's value-added from having computing devices or being connected to the Internet. And they will in particular be an even smaller fraction of the value-added for the most productive people or the activities with maximum flow-through effects. Simply put, video just doesn't deliver higher information per unit bandwidth and human inconvenience.
Progress in video may be similar to progress in memes and animated GIFs: there may be more use of animation to quickly create videos expressing simple ideas. Animated video hasn't taken off yet. Xtranormal shut down. The RSA Animate style made waves in some circles, but hasn't caught on widely. It may be that the code for simple video creation hasn't yet been cracked. Or it may be that if people are bothering to watch video, they might as well watch something that delivers video's unique benefits, and animated video offers little advantage over text, memes, animated GIFs, and webcomics. This remains to be seen. I've also heard of Vine (a service owned by Twitter for sharing very short videos), and that might be another direction for video growth, but I don't know enough about Vine to comment.
What about 3D video?
High definition video has made good progress in relative terms, as cameras, Internet bandwidth, and computer video playing abilities have improved. It'll be increasingly common to watch high definition videos on one's computer screen or (for those who can afford it) on a large flatscreen TV.
What about 3D video? If full-blown 3D video could magically appear all of a sudden with a low-cost implementation for both creators and consumers, I believe it would be a smashing success. In practice, however, the path to getting there would be more tortuous. And the relevant question is whether intermediate milestones in that direction would be rewarding enough to producers and consumers to make the investments worth it. I doubt that they would, which is why it seems to me that, despite the fact that a lot of 3D video stuff is technically feasible today, it will still probably take several decades (I'm guessing at least 20 years, probably more than 30 years) to become one of the standard methods of producing and consuming content. For it to even begin, it's necessary that improvements in hardware continue apace to the point that initial big investments in 3D video start becoming worthwhile. And then, once started, we need an ever-growing market to incentivize successive investments in improving the price-performance tradeoff (see #4 in my earlier article on supply, demand, and technological progress). Note also that there may be a gap of a few years, perhaps even a decade or more, between 3D video becoming mainstream for big budget productions (such as movies) and 3D video being common for Skype or Google Hangouts or their equivalent in the later era.
Fractional value estimates
I recently asked my Facebook friends for their thoughts on the fraction of the value they derived from the Internet that was attributable to the ability to play and download videos. I received some interesting comments there that helped confirm initial aspects of my hypothesis. I would welcome thoughts from LessWrongers on the question.
Thanks to some of my Facebook friends who commented on the thread and offered their thoughts on parts of this draft via private messaging.