(my reply wound up over 8kb long, but I don't think it's general enough to turn into a discussion article.)
Reading this and its comments immediately made me think of the current status of braille, where technology has completely failed to keep up with the mainstream, and now many people are claiming that braille is outdated and they'll just use text-to-speech for everything. (Disclaimer: I was taught braille starting from kindergarten, and picked it up fast and thoroughly. A lot of anti-braille people appear to have had a very hard time learning it and can't actually read any quicker than I could read large print when my vision was at its best. So I have to acknowledge some kinda privilege when talking about the subject. I'll also acknowledge that mastery of braille and financial/academic/etc success are positively correlated among blind Americans, according to all the not-incredibly-transparent sources I've found.)
Some of the points made here about text in general apply to braille, some are just the opposite, and some depend entirely on the audio/video/tactual affinities of the specific user. For example:
•You can't play background music while having a video conversation or recording audio or video content.
Is one of the arguments I use in favor of braille whenever the subject comes up, and it's easily extended to consumption: noise and reading, noise and writing, privacy, the need for (or lack of) headphones, all the different environments in which one can work, etc.
This, though:
•Low storage and bandwidth costs make it easy to consume over poor Internet connections and on a range of devices.
Is only technically true for braille, since braille technology is so far behind that devices are almost always bulky, expensive, cumbersome, and needed in addition to the mainstream device to which they connect, and only the most expensive and bulkiest models display more than 40 characters at a time (so like half of one print line... and most people will get an even smaller model, because even 40 characters is bulky, expensive, and generally more than trivially inconvenient to use on anything but a dedicated device like the PACMate Omni). The base format, digitally, is the same and can be transmitted and stored easily, but everyone who can hear but not see is going to convert it to speech anyway.
•Text can be read at the user's own pace. People who are slow at grasping the content can take time. People who are fast can read very quickly.
Applies to text-to-speech as well; most TTS has adjustable speaking rates, and it's possible to go back and reread things (however, checking the spelling of words is a trivial inconvenience, so almost no one who relies on TTS ever does it. Ever seen me misspell something? That'll be why. On the other hand, what types of errors count as obvious visually vs obvious audibly differ, so blind and sighted text speak tend to differ simply for readability's sake.).
•Text is easier to search (this refers both to searching within a given piece of text and to locating a text based on some part of it or some attributes of it).
Even people who don't care for braille in general will agree that it helps loads with math and programming for exactly this reason. If hooking a braille display to a laptop were not so bloody inconvenient (and did not require so much desk-space), I'd have one connected pretty much all the time for this alone.
•You can't play background music while consuming audio-based content, but you can do it while consuming text.
People usually reply to this with "Use the Windows volume mixer." (I disagree with said reply under most conditions. If I have to have music quieter than a screen reader, then a lot of the impact is reduced. And screen reader + conversation is just plain impractical.)
On the flip side, reading text requires you to have your eyes glued to the screen, which reduces your flexibility of movement. But because you can take breaks at your will, it's not a big issue.
Depending on the device, the opposite can be true for braille; a small display, or something smaller than a novel (braille novels are enormous) can be read while walking without compromising one's awareness of one's surroundings (especially if one can read one-handed). Audio is dependent on the device as well, however; walking around with bulky headphones on is a terrible idea (compare texting while driving), but external speakers while going about other business in the same room is fine.
The presence of hyperlinks, share buttons, the occasional image, sidebars with more related content, etc. add a lot of value.
Braille does not do formatting well, but neither does audio, and I've never had access to a braille device that can actually perform the equivalent to clicking or tapping a hyperlink. This is an improvement I thought of the first time I actually had a braille display for more than 5 minutes: every braille display I've ever seen includes cursor-routing keys, which are basically buttons above each cell that will move the cursor to that position when clicked. The obvious thing to do is to double-click one of those to simulate a mouse-click at that spot, yet I've never heard of this being implemented.
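The double-press idea above can be sketched in a few lines. This is only an illustration under stated assumptions: `RoutingKeyHandler`, `move_cursor`, `click`, and the 0.4-second window are all hypothetical names and values, not part of any real screen reader or braille display API.

```python
from typing import Callable, Optional


class RoutingKeyHandler:
    """Hypothetical handler for cursor-routing keys.

    A single press routes the cursor to the cell. A second press on the
    same cell within `double_press_window` seconds is treated as a click.
    """

    def __init__(
        self,
        move_cursor: Callable[[int], None],
        click: Callable[[int], None],
        double_press_window: float = 0.4,  # assumed value, in seconds
    ) -> None:
        self.move_cursor = move_cursor
        self.click = click
        self.window = double_press_window
        self._last_cell: Optional[int] = None
        self._last_time: Optional[float] = None

    def on_press(self, cell: int, timestamp: float) -> str:
        """Handle one key press and report which action was taken."""
        is_double = (
            self._last_cell == cell
            and self._last_time is not None
            and timestamp - self._last_time <= self.window
        )
        if is_double:
            # Reset so a third quick press starts over as a plain move.
            self._last_cell = None
            self._last_time = None
            self.click(cell)
            return "click"
        self._last_cell = cell
        self._last_time = timestamp
        self.move_cursor(cell)
        return "move"


# Usage: record actions instead of driving real hardware.
events = []
handler = RoutingKeyHandler(
    move_cursor=lambda cell: events.append(("move", cell)),
    click=lambda cell: events.append(("click", cell)),
)
handler.on_press(5, 0.00)  # first press on cell 5: route the cursor
handler.on_press(5, 0.25)  # quick second press on cell 5: click
handler.on_press(9, 2.00)  # different cell, much later: route the cursor
print(events)
```

The first press still moves the cursor immediately, so ordinary routing would not get slower; the click only fires if the same key is pressed again quickly.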
There's also such a thing as 8-dot braille, which is typically used for Unicode characters, to indicate capitalization, or to indicate the position of the cursor or highlighted text. Even most braille-using techies don't learn 8-dot Unicode (and from what I can tell, that isn't even standardized, so it'd only matter for the specific hardware/software combination that one studied with), so it's a little disappointing that using the two extra dots for formatting or HTML effects hasn't really caught on.
(As an example of how braille and screen readers handle HTML elements, we have links: a screen reader reads Lesswrong.com as "link Lesswrong dot com", and on a braille display, it shows up as "lnk Lesswrong.com". I consider the latter to be more problematic, in that it costs 4 whole cells, which is anywhere from 5% to 33% of the display!)
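For a rough sense of where the 5% and 33% figures come from, here is a small calculation. The display widths are assumptions: 12 and 80 cells are inferred from the percentages themselves, and 40 is the size mentioned earlier in this comment.

```python
# The link marker as shown on the display: three letters plus a space.
PREFIX = "lnk "


def prefix_share(display_cells: int, prefix: str = PREFIX) -> float:
    """Fraction of a braille display's cells taken up by the prefix,
    assuming one cell per character."""
    if display_cells <= 0:
        raise ValueError("display_cells must be positive")
    return len(prefix) / display_cells


# Assumed display widths (see the note above).
for cells in (12, 40, 80):
    print(f"{cells:>2} cells: {prefix_share(cells):.0%} used by the link marker")
```

On these assumed sizes, the marker takes a third of the smallest display and a twentieth of the largest.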
•One can tag friends and Facebook groups and pages, subject to some restrictions. For friends tagged, the anchor text can be shortened to any one word in their name.
Side complaint: Facebook accessibility is mixed, and blind people tend to use the mobile site, where in-line friend-tagging is not possible. (Yes, the main Facebook page is bad enough that this is more than a reasonable tradeoff.)
•Training users: The augmented text features need a loyal userbase that supports and implements them. So each augmentation needs to be introduced gradually in order to give users onboarding time.
This is so obviously applicable to anything accessibility-related that I momentarily considered not including it here.
•Performance in terms of speed and reliability: Each augmentation adds an extra layer of code, reducing the performance in terms of speed and reliability. As computers and software have gotten faster and more powerful, and the Internet companies' revenue has increased (giving them more leeway to spend more for server space), investments in these have become more worthwhile.
Referring back to m.facebook.com vs. facebook.com: it's very hard for accessibility technology, an extremely tiny market with little funding and lots of coordination problems due to size, to keep pace with all these augmentations. The more powerful stuff on Facebook.com gives me lag that ends in me querying my brain for incidents in the early 2000s to try and find something comparable.
For another example: Lesswrong is usually pretty responsive to screen readers, but if a post has a large number of comments (I've noticed that 80 or more tends to be a good predictor), there might be enough lag in reading or loading to be inconvenient, and there is a particular feature that is actually annoying: occasionally, while reading comments, I'll be notified of a comment's percent positive karma, at which point the screen reader takes a whole second to get back to reading, adds more spoken formatting information ("clickable", mostly; bold/italics/font size are almost never spoken, but screen readers are getting better about those), and once this has happened, it will almost definitely repeat if I keep scrolling. (My solution so far has been to switch to "just read everything from the cursor down" if this happens. How convenient this method is depends on the screen reader. And I'm using the free one, because I'd rather not incentivize charging $800 for a screen reader.)
However, when I've tried using a braille display and text-to-speech simultaneously, I've found that, frequently, a page that will take several seconds to get a response from TTS will start displaying braille much more quickly. Considering that the screen reader is managing both, this is a little bizarre; it'd imply that the lag is in the TTS program, rather than the screen reader itself, yet different screen readers seem to render speech faster or slower on the same websites.
Thanks for commenting! This is an insightful perspective.
Disclaimer: The views expressed here are speculative. I don't have a claim to expertise in this area. I welcome pushback and anticipate there's a reasonable chance I'll change my mind in light of new considerations.
One of the interesting ways that many 20th century forecasts made of the future went wrong is that they posited huge physical changes in the way life was organized. For instance, they posited huge changes in these dimensions:
At the same time, they underestimated to quite an extent the informational changes in the world:
My LessWrong post on megamistakes discusses these themes somewhat in #1 (the technological wonderland and timing point) and #2 (the exceptional case of computing).
What about predictions within the informational realm? I detect a similar bias. It seems that prognosticators and forecasters tend to give undue weight to heavyweight technologies (such as 3D videoconferencing) and ignore the fact that the bulk of the production and innovation has been focused on text and, to a somewhat lesser extent, images (often used to augment and interweave with the text). In this article, I lay out the pro-text position. I don't have high confidence in the views expressed here, and I look forward to critical pushback that changes my mind.
Text: easier to produce
One great thing about text is its lower production costs. To the extent that production is quantitatively little and dominated by a few big players, high-quality video and audio play an important role. But as the Internet "democratizes" content production, it's a lot easier for a lot of people to contribute text than to contribute audio or video content.
Some advantages of text from the creation perspective:
Text: easier to consume and share
Text is also easier to consume and share.
On the flip side, reading text requires you to have your eyes glued to the screen, which reduces your flexibility of movement. But because you can take breaks at your will, it's not a big issue. Audiobooks do offer the advantage that you can move around (e.g., cook in the kitchen) while listening, and some people who work from home are quite fond of audiobooks for that purpose. In general, the benefits of text seem to outweigh the costs.
Text generates more flow-through effects
Holding willingness to pay on the part of consumers the same, text-based content is likely to generate greater flow-through effects because of its ability to foster more discussion and criticism and to be modified and reused for other purposes. This is related to the point that video and audio consumption on the Internet generally tends to substitute for TV and cinema trips, which are largely pure consumption rather than intermediate steps to further production. Text, on the other hand, has a bigger role in work-related stuff.
Augmented text
When I say that text plays a major role, I don't mean that long ASCII strings are the be-all-and-end-all of computing and the Internet. Rather, more creative and innovative ways of interweaving a richer set of expressive and semantically powerful symbols in text is very important to harnessing its full power. It really is a lot different to read The New York Times in HTML than it would be to read the plain text of the article on a monochrome screen. The presence of hyperlinks, share buttons, the occasional image, sidebars with more related content, etc. add a lot of value.
Consider Facebook posts. These are text-based, but they allow text to be augmented in many ways:
Consider the actions that people reading the posts can perform:
If you think about it, this system, although it basically relies on text, has augmented text in a lot of ways with the intent of facilitating more meaningful communication. You may find some of the augmentations of little use to you, but each feature probably has at least a few hundred thousand people who greatly benefit from it. (If nobody uses a feature, Facebook axes it).
I suspect that the world ten years from now will feature text that is richly augmented relative to today's text, much as today's text is richly augmented compared to what it was back in 2006. Unfortunately, I can't predict any very specific innovations (if I could, I'd be busy programming them, not writing a post on LessWrong). And it might very well be the case that the low-hanging fruit with respect to augmenting text is already taken.
Why didn't all the text augmentation happen at once? None of the augmentations are hard to program in principle. The probable reasons are:
Images
Images play an important role along with text. Indeed, websites such as 9GAG rely on images, and others like Buzzfeed heavily mix texts and images.
I think images will continue to grow in importance on the Internet. But the vision of images as it is likely to unfold is probably quite different from the vision that futurists generally envisage. We're not talking of a future dominated by professionally done (or even amateurly done) 16 megapixel photography. Rather, we're talking of images that are used to convey basic information or make a memetic point. Consider that many of the most widely shared images are the standard images for memes. The number of standard meme images is much smaller than the number of memes made from them. Meme creators just use a standard image and their own contribution is the text at the top and bottom of the meme. Thus, even while the Internet uses images, the production at the margin largely involves text. The picture is scaffolding. Webcomics (I'm personally most familiar with SMBC and XKCD, but there are other more popular ones) are at the more professional end, but they too illustrate a similar point: it's often the value of the ideas being creatively expressed, rather than the realism of the imagery, that delivers value.
One trend that was big in the early days of the Internet, then died down, and now seems to be reviving is the animated GIF. Animated GIFs allow people to convey simple ideas that cannot be captured in still images, without having to create a video. They also use a lot less bandwidth for consumers and web hosts than videos. Again, we see that the future is about economically using simple representations to convey ideas or memes rather than technologically awesome photography.
Quantitative estimates
Here's what Martin Hilbert wrote in How Much Information is There in the "Information Society" (p. 3):
I had come across this quote as part of a preliminary investigation for MIRI into the world's distribution of computation (though I had not highlighted the quote in the investigation since it was relatively less important to the investigation). As another data point, Facebook claims that it needed 700 TB (as of October 2013) to store all the text-based status updates and comments plus relevant semantic information on users that would be indexed by Facebook Graph Search once it was extended to posts and comments. Contrast this with a few petabytes of storage needed for all their photos (see also here), despite the fact that one photo takes up a lot more space than one text-based update.
Beautiful text
The Internet looks a lot more beautiful today than it did ten years ago. Why? Small, incremental changes in the way that text is displayed have played a role. New fonts, new WordPress themes, a new Wikipedia or Facebook layout, all conspire to provide a combination of greater usability and greater aesthetic appeal. Also, as processors and bandwidth have improved, some layouts that may have been impractical earlier have been made possible. The block tile layout for websites has caught on quite a bit, inspired by an attempt to create a unified smooth browsing experience across a range of different devices (from small iPhone screens to large monitors used by programmers and data analysts).
Notice that it's the versatility of text that allowed it to be upgraded. Videos created an old way would have to be redone in order to avail of new display technologies. But since text is stored as text, it can be rendered in a new font easily.
The wonders of machine learning
I've noticed personally, and some friends have remarked to me, that Google Search, GMail, and Facebook have gotten a lot better in recent years in many small incremental ways despite no big leaps in the overall layout and functioning of the services. Facebook shows more relevant ads, makes better friend suggestions, and has a much more relevant news feed. Google Search is scarily good at autocompletion. GMail search is improving at autocompletion too, and the interface continues to improve. Many of these improvements are the results of continuous incremental improvement, but there's some reason to believe that the more recent changes are driven in part by application of the wonders of machine learning (see here and here for instance).
Futurists tend to think of the benefits of machine learning in terms of qualitatively new technologies, such as image recognition, video recognition, object recognition, audio transcription, etc. And these are likely to happen, eventually. But my intuition is that futurists underestimate the proportion of the value from machine learning that is intermediated through improvement in the existing interfaces that people already use (and that high-productivity people use more than average), such as their Facebook news feed or GMail or Google Search.
A place for video
Video will continue to be good for many purposes. The watching of movies will continue to migrate from TV and the cinema hall to the Internet, and the quantity watched may also increase because people have to spend less in money and time costs. Educational and entertainment videos will continue to be watched in increasing numbers. Note that these effects are largely a matter of substituting one medium for another, plus a raw increase in quantity, rather than paradigm shifts in the nature of people's activities.
Video chatting, through tools such as Skype or Google Talk/Hangouts, will probably continue to grow. These will serve as important complements to text-based communication. People do want to see their friends' faces from time to time, even if they carry out the bulk of their conversation in text. As Internet speeds improve around the world, the trivial inconveniences in the way of video communication will reduce.
But these will not drive the bulk of people's value-added from having computing devices or being connected to the Internet. And they will in particular be an even smaller fraction of the value-added for the most productive people or the activities with maximum flow-through effects. Simply put, video just doesn't deliver higher information per unit bandwidth and human inconvenience.
Progress in video may be similar to progress in memes and animated GIFs: there may be more use of animation to quickly create videos expressing simple ideas. Animated video hasn't taken off yet. Xtranormal shut down. The RSA Animate style made waves in some circles, but hasn't caught on widely. It may be that the code for simple video creation hasn't yet been cracked. Or it may be that if people are bothering to watch video, they might as well watch something that delivers video's unique benefits, and animated video offers little advantage over text, memes, animated GIFs, and webcomics. This remains to be seen. I've also heard of Vine (a service owned by Twitter for sharing very short videos), and that might be another direction for video growth, but I don't know enough about Vine to comment.
What about 3D video?
High definition video has made good progress in relative terms, as cameras, Internet bandwidth, and computer video playing abilities have improved. It'll be increasingly common to watch high definition videos on one's computer screen or (for those who can afford it) on a large flatscreen TV.
What about 3D video? If full-blown 3D video could magically appear all of a sudden with a low-cost implementation for both creators and consumers, I believe it would be a smashing success. In practice, however, the path to getting there would be more tortuous. And the relevant question is whether intermediate milestones in that direction would be rewarding enough to producers and consumers to make the investments worth it. I doubt that they would, which is why it seems to me that, despite the fact that a lot of 3D video stuff is technically feasible today, it will still probably take several decades (I'm guessing at least 20 years, probably more than 30 years) to become one of the standard methods of producing and consuming content. For it to even begin, it's necessary that improvements in hardware continue apace to the point that initial big investments in 3D video start becoming worthwhile. And then, once started, we need an ever-growing market to incentivize successive investments in improving the price-performance tradeoff (see #4 in my earlier article on supply, demand, and technological progress). Note also that there may be a gap of a few years, perhaps even a decade or more, between 3D video becoming mainstream for big budget productions (such as movies) and 3D video being common for Skype or Google Hangouts or their equivalent in the later era.
Fractional value estimates
I recently asked my Facebook friends for their thoughts on the fraction of the value they derived from the Internet that was attributable to the ability to play and download videos. I received some interesting comments there that helped confirm initial aspects of my hypothesis. I would welcome thoughts from LessWrongers on the question.
Thanks to some of my Facebook friends who commented on the thread and offered their thoughts on parts of this draft via private messaging.