Disclosure: I work on ads at Google; this is a personal post.

Peter Snyder of Brave recently wrote a post on the proposed WebBundle specification arguing that it makes URL randomization easier and so is harmful to content blocking. The core of the post is incorrect, however, and based on a misunderstanding of what can be done today and what WebBundles make easier. Snyder and I discussed this some on HN, and I wanted to take a step back and try to write up the issues more clearly.

A WebBundle (see explainer) allows you to serve a single file that the browser can treat as multiple files. This solves many long-standing problems, some of which I've been working around for a long time. A classic one is that many sites have a large number of small files. Everyone who visits the site needs essentially all of these files, and you lose a lot of performance by requesting each one individually. When I worked on open-source web page optimization software (mod_pagespeed), it could combine CSS, JS, and images so you could request a single 'bundle' of each type, but this was not ideal:

  • Instead of one bundle containing the resources the site needs, you have at least three: CSS, JS, images.

  • Differences in error handling and recovery mean you can't just concatenate CSS or JS files. Even with code to work around these issues, there were still edge cases that made automatic bundling a bit risky.

  • Image combining (spriting) is especially difficult, and our tool was less a fully automatic image combiner than a tool to take the toil out of doing it manually.

Instead of using a dynamic tool like mod_pagespeed, you can also do this kind of concatenation at build time (webpack, browserify, parcel, rollup, etc.), but that doesn't fix any of these problems.

A WebBundle is a much more straightforward approach, where you tell the browser explicitly about your bundling instead of trying to glue a bunch of things together and pass them off to the browser as one.
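
For example, here is roughly the shape of the subresource-loading proposal from the explainer (the exact markup has gone through several revisions, so treat this as a sketch of one form rather than the final syntax): the page names a bundle and the URLs it covers, and the browser satisfies matching requests from the bundle instead of fetching each file separately.

<script type="webbundle">
{
  "source": "https://example.com/assets/site.wbn",
  "resources": ["https://example.com/assets/app.js",
                "https://example.com/assets/app.css",
                "https://example.com/assets/logo.png"]
}
</script>

The important part is that the browser still sees app.js, app.css, and logo.png as separate resources with their own URLs; they just arrive in one response.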

I'm no longer working on mod_pagespeed, but these issues still come up; in my work now I'm interested in using WebBundles to allow a single ad response to provide multiple ads.

My understanding of Snyder's view, from their post, HN comments, and spec comments, is that bundles make it much easier to randomize URLs and so bypass URL-based content blockers. Specifically, that if you deliver a single bundle with both HTML and resources, it becomes very easy to randomize those URLs.

This claim is based, however, on a misunderstanding of what bundling makes easier. Sites that deliver both the content and the ads, such as Facebook, often already use this kind of randomization to make things harder for ad blockers. Since they're delivering the HTML, they can randomize all their URLs on every deploy, and it's hard for blockers to keep up. If blockers did catch up, they could use cookies to randomize their URLs on a per-user basis, or even encrypt them before sending to the client. All of these approaches work much better for them than sending a bundle with the HTML and JS in a single file, because that would mean the JS could not be cached on the client. (If they were willing to give up on caching they could already use HTTP/2's server push or plain inlining to send the JS and HTML at once.)
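
To illustrate with hypothetical file names, a site that serves its own HTML can simply emit a different script URL on every deploy, and a filter written against last week's build no longer matches anything:

<!-- deploy 1 -->
<script src="/static/3f9ac2e1/feed.js"></script>

<!-- deploy 2: same code, new name -->
<script src="/static/b71e04d9/feed.js"></script>

None of this requires bundles; it only requires controlling the HTML that references the resources.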

On the other hand, sites like this are few. It is worth it for Facebook or Google Search to run their own ads, but most sites instead use an ad network. These networks typically integrate client-side, with JS. For example, to put AdSense ads on your page you include something like:

<script async
    src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js">
</script>
<ins class="adsbygoogle"
     style="display:block"
     data-ad-client="ca-pub-NNNNNNNN"
     data-ad-slot="NNNNNNNN"
     data-ad-format="auto"
     data-full-width-responsive="true">
</ins>

This loads adsbygoogle.js (the ad JS), which reads configuration from the <ins> and handles requesting and rendering the ads. Ad blockers recognize adsbygoogle.js by its URL and prevent it from loading and running.
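
Blocking by URL is mechanically simple, which is part of why it works so well. Here's a minimal sketch of the idea using the older Manifest V2 webRequest extension API; real blockers match against large shared filter lists like EasyList rather than a hard-coded URL:

// Cancel requests for the ad JS before it can load or run.
chrome.webRequest.onBeforeRequest.addListener(
    function (details) {
      return {cancel: true};
    },
    {urls: ["*://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js*"]},
    ["blocking"]);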

Neither the publisher site nor the ad network can randomize the URL of the ad JS on their own: it's a cross-party integration point. If they worked closely together, perhaps by moving the integration from client-side to server-side, they could randomize URLs easily even with today's technology. Server-side integrations are much more technically difficult, however, which is why I think we rarely see them. Bundles don't change the nature of this situation: a client-side integration still needs a consistent (and hence blockable) URL to load the ad JS, while a server-side integration doesn't become any easier.

Snyder is concerned about a world in which ad blockers aren't able to operate because they can't recognize ad resources by URL. While I think this is a reasonable concern, the WebBundle proposal is orthogonal to this problem, and does not bring us any closer to that sort of world.

Comment via: facebook

1 comment:

Interesting - bundles seem useful for simpler handling of offline pages, and perhaps for transacted versioning of resources (though there are other ways). How much actual performance difference is there, under HTTP/2 + QUIC? The overhead of multiple requests seems pretty small, compared to the implicit serialization of bundling.

Are there other features which might get some of the same benefit? CDN aliasing would allow many distinct URLs for the same content, with decent edge caching (and invalidation control!), and most of the perf benefits of bundles and all of the anti-prediction benefits of randomization.