JS-SIMD for asm.js in Firefox Nightly

Firefox Nightly now has an initial optimized implementation of the new JS-SIMD API, available to asm.js code, a major step in bringing the performance of SIMD parallelism to the web. You can download it and try out our demos right now.

In current Firefox Nightly releases, the JS-SIMD API is supported only in asm.js code. This is the first step; in the future we plan to support the JS-SIMD API in non-asm.js JavaScript in Firefox as well, and we’re optimistic that it will achieve a similar level of performance.

Try out the Demos

To get started, head over to our IDF 2014 SIMD presentation page.

With Firefox Nightly, you can try out the asm.js demos. Be sure to check out the initial performance, and then click the button to enable SIMD.

Also being demoed on this page is a branch of Chromium containing an implementation of the SIMD API. While the Firefox Nightly being demoed here requires the use of asm.js for the SIMD demos, this Chromium build supports the SIMD demos without this limitation.

Moving forward, we’ll be working on filling out Firefox Nightly’s SIMD implementation, including implementing more of the API and supporting non-asm.js mode. And we’re also looking forward to the porting of the SIMD patches from the branch into upstream Chromium.

Standards

The JS-SIMD API is under development. The current working specification is John McCutchan’s ecmascript_simd repository. This repository also contains a polyfill, which implements the SIMD API on browsers that do not support it natively (though of course it is unoptimized). The polyfill is fully usable today, for anyone looking to start prototyping with the API.

Intel, Google, and Mozilla are working together to propose this SIMD API for standardization. It is likely that there will be some changes in the API during this process, though the core concepts and the basic components are likely to remain close to their current form.

Take the helm

While it’s still quite early, Firefox Nightly already has enough in place to run simple JS-SIMD asm.js kernels. If you’re familiar with JS programming and with basic SIMD concepts, take a look at the code in the Mandelbrot Demo. The following is a quick explanation of the major ingredients used in that demo for doing SIMD in asm.js:

 var toF = global.Math.fround;
 var i4 = global.SIMD.int32x4;
 var f4 = global.SIMD.float32x4;
 var i4add = i4.add;
 var i4and = i4.and;
 var f4add = f4.add;
 var f4sub = f4.sub;
 var f4mul = f4.mul;
 var f4lessThanOrEqual = f4.lessThanOrEqual;
 var f4splat = f4.splat;
 var imul = global.Math.imul;

In order to simplify the actual code so that we don’t have to write out the full names everywhere (and so that the compiler doesn’t have to re-interpret them everywhere), we create convenient names for the API functions that we’ll be using. This is required for asm.js mode, and it’s handy to do besides to keep the main code uncluttered.
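The same import pattern can be tried in any engine with a small asm.js-style module. The following is a minimal sketch, not code from the demo: the module name DemoModule and the function mulToFloat are hypothetical, and the SIMD imports are left out because global.SIMD exists only in builds that implement the API.

```javascript
// Hypothetical module; the name and the exported function are illustrative only.
function DemoModule(global) {
  "use asm";
  // Short local aliases for standard-library functions, as in the demo.
  var toF = global.Math.fround;
  var imul = global.Math.imul;

  // Multiply two 32-bit integers, then convert the product to float32.
  function mulToFloat(a, b) {
    a = a | 0;
    b = b | 0;
    return toF(imul(a, b) | 0);
  }

  return { mulToFloat: mulToFloat };
}

// In a browser, the argument would typically be `window`.
var demo = DemoModule(globalThis);
console.log(demo.mulToFloat(6, 7)); // 42
```

An engine without asm.js optimizations simply runs this as ordinary JavaScript, so the aliases cost nothing there.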

 const one4 = i4(1,1,1,1), two4 = f4(2,2,2,2), four4 = f4(4,4,4,4);

This syntax declares constants that can then be referred to by name. Note that this is using the shortened names for the SIMD constructors that we declared above.

 for (i = 0; (i | 0) < (max_iterations | 0); i = (i + 1) | 0) {
   z_re24 = f4mul(z_re4, z_re4);
   z_im24 = f4mul(z_im4, z_im4);

   mi4 = f4lessThanOrEqual(f4add(z_re24, z_im24), four4);
   // If all 4 values are greater than 4.0, there's no reason to continue.
   if ((mi4.signMask | 0) == 0x00)
     break;

   new_re4 = f4sub(z_re24, z_im24);
   new_im4 = f4mul(f4mul(two4, z_re4), z_im4);
   z_re4   = f4add(c_re4, new_re4);
   z_im4   = f4add(c_im4, new_im4);
   count4  = i4add(count4, i4and(mi4, one4));
 }

Here we can see the main API functions we imported above in action. If you’re familiar with SIMD programming, this code should be fairly self-explanatory.
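For readers without a SIMD-enabled build, the loop can be approximated with scalar stand-ins that operate on plain four-element arrays. This is only a sketch under several assumptions: the helpers are hypothetical replacements for the real API functions, comparisons are assumed to yield -1 for true and 0 for false in each lane, z is assumed to start at c, and doubles are used instead of float32 values.

```javascript
// Build a lane-wise binary operation over plain 4-element arrays.
function lanewise(f) {
  return function (a, b) {
    return [f(a[0], b[0]), f(a[1], b[1]), f(a[2], b[2]), f(a[3], b[3])];
  };
}

// Scalar stand-ins for the aliases used in the demo (hypothetical, unoptimized).
var f4add = lanewise(function (x, y) { return x + y; });
var f4sub = lanewise(function (x, y) { return x - y; });
var f4mul = lanewise(function (x, y) { return x * y; });
var i4add = lanewise(function (x, y) { return (x + y) | 0; });
var i4and = lanewise(function (x, y) { return x & y; });
// Assumption: a true lane is all ones (-1), a false lane is 0.
var f4lessThanOrEqual = lanewise(function (x, y) { return x <= y ? -1 : 0; });

// Collect the sign bit of each lane into the low four bits of an integer.
function laneSignMask(v) {
  return (v[0] >>> 31) | ((v[1] >>> 31) << 1) |
         ((v[2] >>> 31) << 2) | ((v[3] >>> 31) << 3);
}

// Iterate four Mandelbrot points at once and return their iteration counts.
function mandelbrot4(c_re4, c_im4, max_iterations) {
  var one4 = [1, 1, 1, 1], two4 = [2, 2, 2, 2], four4 = [4, 4, 4, 4];
  var z_re4 = c_re4, z_im4 = c_im4; // assumption: z starts at c
  var count4 = [0, 0, 0, 0];
  for (var i = 0; i < max_iterations; i++) {
    var z_re24 = f4mul(z_re4, z_re4);
    var z_im24 = f4mul(z_im4, z_im4);
    var mi4 = f4lessThanOrEqual(f4add(z_re24, z_im24), four4);
    // Stop once every lane has escaped.
    if (laneSignMask(mi4) === 0) break;
    var new_re4 = f4sub(z_re24, z_im24);
    var new_im4 = f4mul(f4mul(two4, z_re4), z_im4);
    z_re4 = f4add(c_re4, new_re4);
    z_im4 = f4add(c_im4, new_im4);
    // Only lanes that are still inside the radius get their count bumped.
    count4 = i4add(count4, i4and(mi4, one4));
  }
  return count4;
}

// Points 0, 1, -1 and 2+2i, with at most 10 iterations.
console.log(mandelbrot4([0, 1, -1, 2], [0, 0, 0, 2], 10)); // [ 10, 2, 10, 0 ]
```

Note that lanes which have already escaped keep being computed; the mask simply stops them from contributing to the count, which is the usual way to handle per-lane control flow in SIMD code.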

One thing that may not be clear is the use of the signMask property. It holds an integer value containing the sign bits of each of the elements of the SIMD value it is read from, and is a convenient way to test the result of a SIMD comparison across all lanes at once. In the loop above, a signMask of zero means the comparison was false in every lane.
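As a rough illustration of how such a mask can be read, here is a scalar sketch. The helper signMaskOf is hypothetical and works on a plain array of four 32-bit integers; it assumes that lane 0 maps to bit 0 (as movmskps does) and that a comparison leaves -1 in true lanes and 0 in false lanes.

```javascript
// Hypothetical scalar stand-in for reading signMask from a four-lane value.
function signMaskOf(lanes) {
  var mask = 0;
  for (var lane = 0; lane < 4; lane++) {
    // Bit `lane` of the result is the sign bit (bit 31) of that lane.
    mask |= ((lanes[lane] >>> 31) & 1) << lane;
  }
  return mask;
}

var allTrue = signMaskOf([-1, -1, -1, -1]); // 15: comparison true in every lane
var someTrue = signMaskOf([-1, 0, -1, 0]);  // 5: true in lanes 0 and 2 only
var noneTrue = signMaskOf([0, 0, 0, 0]);    // 0: comparison false in every lane
console.log(allTrue, someTrue, noneTrue);   // 15 5 0
```

Comparing the mask against 15 tests "all lanes", comparing against 0 tests "no lanes", and any other value means the lanes disagree.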

(For the curious, signMask closely resembles the movmskps instruction on x86, and for this reason, it is one of the more likely things to evolve as the API is standardized, to make it easier to implement on other platforms.)

Emscripten Support

Work is also under way to add support to Emscripten for producing JS output that uses the JS-SIMD API. This will allow it to implement popular SIMD mechanisms used in C and C++, including a subset of the widely-used API, Emscripten's own SIMD API, and auto-vectorization, and connect them with the optimized SIMD implementations that are being built into browsers.

Why a fixed-width explicit SIMD API?

In a world of AVX on one hand, and 64-bit SIMD units on another, and let’s not forget about GPUs, why is JS doing a fixed-width 128-bit explicit SIMD API?

Summary

In summary, JS-SIMD is starting to arrive. You can try it out today!