WebR WASM R package load/library benchmarking rabbit hole | R bloggers



I have a post coming up on using base and {ggplot2} plots in VanillaJS WebR. After I posted a few bits on social media about how slow {ggplot2} is to deal with in WebR, I received some related inquiries that, to my dismay, led me down a rabbit hole into which I am now dragging you all as well.

First, a preview of the aforementioned plots/graphics:


I encourage you to load both before continuing to see why I was curious about package load times.

Getting a package in WebR: a look at {ggplot2}


If we remove all cruft, this is the main way to install a package in WebR and make it available in a freshly minted WebR context:

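    import { WebR } from '/webr/webr.mjs';

    // Create a fresh WebR context and expose it globally.
    globalThis.webR = new WebR({
      WEBR_URL: "/webr/",
      SW_URL: "/w/bench/",
    });

    // Wait for the WASM R session to start.
    await globalThis.webR.init();

    // Download and install the package, then attach it.
    await globalThis.webR.installPackages(['PACKAGE']);
    await globalThis.webR.evalRVoid('library(PACKAGE)');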

Let’s see what happens in the browser during the installPackages() call when the package is ggplot2:

Screen capture of DevTools showing ggplot2 dependent packages being loaded.

The dependent libraries are loaded sequentially until we finally get to ggplot2 (dropping the {} from now on). There are 28 packages for ggplot2 (including itself), and they have a really skewed package size distribution:

Min.    :   6K
1st Qu. : 108K
Median  : 481K
Mean    : 950K
3rd Qu. : 1.2M
Max.    : 5.4M

The good thing, though, is that the browser will cache them (for a while), so they don’t have to be downloaded again every time you need them. Because of this, we’re going to drop download times from consideration since, as we’ll see below, they all get yanked from the cache in single-digit milliseconds.

R code gets executed when you call library(package), and that takes time. On modern desktops with a local R install, you almost never notice the passage of time for this. This is not the case with WebR:

Screen capture of the ggplot2 package loading portion of the Developer Tools waterfall chart.

The Matrix, mgcv, and farver packages stand out. You will have noticed that if you tried the example at the beginning of the post. Cruel. Painful. Horrible.

This got me curious about all the other packages that are available for WebR (93 as of the date of this post).

Approaching R package load/library benchmarking in a browser

Like the skewed package file size distribution of currently available R WASM packages, the per-package dependency distribution is also highly skewed:

Min.    :  1
1st Qu. :  1
Median  :  1
Mean    :  2
3rd Qu. :  2
Max.    : 15

This is good! It means you are mostly safe to have fun with WebR without having to work around the initial slowdown. Still, that didn’t deter me from sinking time into this.

I had to figure out a way to test the install/library step of each WASM R package individually, each in a fresh WebR context.

One obvious way is to create 93 HTML files and load them all by hand.

Oh Oh

There had to be a better way, and I immediately turned to “iframes” as a solution.

While I could script the creation of 93 HTML iframes to be inserted into a page, this is not a good idea for several reasons:

- it will crash every modern browser: far too many child iframes, all with their own DOM contexts
- it looks awful
- 93 “simultaneous” WebR initializations will consume all browser resources, which amounts to a self-inflicted DoS
- loading everything “simultaneously” will skew timing results, even when package files are cached

The solution was to use dynamically created iframes. A possible “catch” here could have been the modern browser security model. Thanks to some dangerous hardware-level vulnerabilities that were discovered and exploited a few years ago, Chrome and other browsers tightened the security contracts between iframes and their parent pages. Not doing so could have allowed attackers to have some fun at your expense.

If you’ve been following along for the past week, you know that to get the best performance with WebR, you need to make sure that certain HTTP headers are present so that the browser can trust what you’re doing and relax some of the restrictions. Dynamically created iframes don’t have “headers”, per se, but the clever folks who make browser bits for a living have come up with a way to handle this. We just need to mark the frame as credentialless and we will get good performance (please read the link to get more context).
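A minimal sketch of what creating such a frame could look like. The helper name `createBenchFrame`, the `frame.html` page, and the `pkg` query parameter are hypothetical and not taken from the benchmark page; `credentialless` is the real iframe attribute, which at the time of writing is mainly supported in Chromium-based browsers. The document object is passed in as an argument so the function can also be exercised outside a browser.

```javascript
// Sketch: create a hidden, credentialless iframe for one package.
// `frame.html` and the `pkg` query parameter are hypothetical names.
function createBenchFrame(doc, pkg) {
  const frame = doc.createElement("iframe");
  // Boolean attribute: set it before `src` so it applies to the first load.
  frame.setAttribute("credentialless", "");
  frame.setAttribute("src", `frame.html?pkg=${encodeURIComponent(pkg)}`);
  frame.hidden = true;
  doc.body.appendChild(frame);
  return frame;
}
```

In a page this would be called as `createBenchFrame(document, "ggplot2")`.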

So, we can run a slightly extended version of the JavaScript code (way) above to get the timing stats, but how do we collect them?

Well, the parent of the iframe can talk to the iframe and vice versa via postMessage(), so all we need to do is make sure the iframe sends the data back to the parent when it is done. That message is also the signal that we can kill the child iframe, freeing up resources, and then move on to the next one.
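The parent side of that exchange might be sketched as follows. This is an illustration under assumptions, not the code from the benchmark page: `benchInFrame`, `benchAll`, and the `makeFrame` factory are hypothetical names, and `win` and `doc` stand for the browser's `window` and `document`, passed in explicitly.

```javascript
// Sketch of the parent side: load one frame, wait for its result, remove it.
// `makeFrame(doc, pkg)` is a hypothetical factory returning an attached iframe.
function benchInFrame(win, doc, pkg, makeFrame) {
  return new Promise((resolve) => {
    let frame = null;
    function onMessage(event) {
      // Ignore messages that did not come from this frame.
      if (frame === null || event.source !== frame.contentWindow) return;
      win.removeEventListener("message", onMessage);
      frame.remove(); // drop the child and its WebR instance
      resolve(event.data);
    }
    // Listen first, then create the frame, so no reply can be missed.
    win.addEventListener("message", onMessage);
    frame = makeFrame(doc, pkg);
  });
}

// Run the packages one at a time so the timings do not interfere.
async function benchAll(win, doc, packages, makeFrame) {
  const results = [];
  for (const pkg of packages) {
    results.push(await benchInFrame(win, doc, pkg, makeFrame));
  }
  return results;
}
```

Inside the frame, the child would finish with something like `window.parent.postMessage(result, window.location.origin)`, assuming the frame is served from the same origin as the parent page.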

An unexpected turn

It turns out that some WASM-ified R packages are busted. Specifically:

- fs
- Hmisc
- latticeExtra
- pkgload

Some of the functionality in each of them is required by one or more other packages, but, as you’ll see if you run the benchmark site, they fail to library() after installation.

That was a minor “catch”: all I had to do was wrap the calls in a try/catch block and return the error information from it as well.
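One way the timing and the try/catch could fit together is sketched below. The function name `timePackage` and the shape of the result object are made up for illustration; `installPackages()` and `evalRVoid()` are the same WebR calls used earlier in the post, and `webR` is assumed to be an already initialized WebR instance.

```javascript
// Sketch: time the install and library steps for one package, never throwing.
// The result object's field names are hypothetical.
async function timePackage(webR, pkg) {
  const result = { pkg, ok: false, installMs: null, libraryMs: null, error: null };
  try {
    let start = performance.now();
    await webR.installPackages([pkg]);
    result.installMs = performance.now() - start;

    start = performance.now();
    await webR.evalRVoid(`library(${pkg})`);
    result.libraryMs = performance.now() - start;
    result.ok = true;
  } catch (err) {
    // A busted package lands here; record the message and carry on.
    result.error = err instanceof Error ? err.message : String(err);
  }
  return result;
}
```

Because the function always resolves, a frame can send the result back to its parent whether the package loaded or not.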

Putting it all together

You can run your own benchmarks on this playground page. View-source on the page to see the code (there’s only index.html and style.css). You can also find it on the WebR Experiments repo.

When the page loads, it fetches the most recently generated copy of https://rud.is/data/webr-packages.json . This is a JSON file I’m generating every night that contains all the packages available in “WASM notCRAN”. It just steals the PACKAGES.rds every day and serializes it to JSON. Feel free to use it (if you get a CORS error, let me know; you shouldn’t, but it’s been an odd year).

The very first thing that is likely to catch your eye is: “Context is cross-origin isolated!”. When I was doing some initial debugging of WebR performance issues, George (the godfather of WebR) noted that we needed some headers to loosen the above security restrictions a bit. You can test the global crossOriginIsolated variable to see if you’ve set the headers correctly, and read more about it when you have time. While it’s not needed on that page, I left it in so that I could write about it in this article.

You will see the “Download Result?” checkbox, which is unchecked by default. If you check it, you’ll get a JSON file with all the results shown in the dynamically created table.
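A small sketch of such a check. The helper name `isolationMessage` and the wording of the negative message are assumptions; `crossOriginIsolated` is the standard global, which browsers set to true when the page is served with `Cross-Origin-Opener-Policy: same-origin` together with a `Cross-Origin-Embedder-Policy` of `require-corp` or `credentialless`.

```javascript
// Sketch: report whether the current context is cross-origin isolated.
// `scope` defaults to the global object so this also runs outside a browser.
function isolationMessage(scope = globalThis) {
  return scope.crossOriginIsolated === true
    ? "Context is cross-origin isolated!"
    : "Context is not cross-origin isolated.";
}
```

Calling `isolationMessage()` in the DevTools console of a page is a quick way to confirm whether the headers took effect.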

After you tap “Start Benchmark”, you can get a matcha and come back.

You’ll see the results in a table and a surprise Observable Plot histogram (the post’s featured image).

I disable the controls after a run because you really should close the tab and start a new one (not just reload) to get a clean context.

If you use the site and download the JSON, you can hit this Observable Notebook and drop the JSON into it. I also wouldn’t mind if you could post your JSON as an issue in the WebR Experiments repo and include the browser and system configuration you were using at the time.

FIN

It was a fun distraction, and shows that you can use most of the currently available WebR packages without worry.

Be sure to check back for those WebR graphics posts!
