## Comparing Sequences Without Sorting

This is a post about a silly (and mostly pointless) optimization.  To motivate this, consider the following problem which you see all the time in mesh processing:

Given a collection of triangles represented as ordered tuples of indices, find all topological duplicates.

By a topological duplicate, we mean that the two faces have the same vertices.  Now you aren’t allowed to mutate the faces themselves (since orientation matters), but changing their order in the array is ok.  We could also consider a related problem for edges in a graph, or for tetrahedra in mesh.

There are of course many easy ways to solve this, but in the end it basically boils down to finding some function or other for comparing faces up to permutation, and then using either sorting or hashing to check for duplicates.  It is that comparison step that I want to talk about today, or in other words:

How do we check if two faces are the same?

In fact, what I really want to talk about is the following more general question where we let the lengths of our faces become arbitrarily large:

Given a pair of arrays, A and B, find a total order $\leq$ such $A \leq B \: \mathrm{and} B \leq A$ if and only if $A = \pi(B)$ up to some permutation $\pi$.

For example, we should require that:

[0, 1, 2] $\leq$ [2, 0, 1]

[2, 0, 1] $\leq$ [0, 1, 2]

And for two sequences which are not equal, like [0,1,2] and [3, 4, 5] we want the ordering to be invariant under permutation.

## Obvious Solution

An obvious solution to this problem is to sort the arrays element-wise and return the result:

function compareSimple(a, b) {
if(a.length !== b.length) {
return a.length - b.length;
}
var as = a.slice(0)
, bs = b.slice(0);
as.sort();
bs.sort();
for(var i=0; i<a.length; ++i) {
var d = as[i] - bs[i];
if(d) {
return d;
}
}
return 0;
}

This follows the standard C/JavaScript convention for comparison operators, where it returns -1 if a comes before b, +1 if a comes after b and 0 if they are equal.  For large enough sequences, this isn’t a bad solution.  It takes $O(n \log(n))$ time and is pretty straightforward to implement.  However, for smaller sequences it is somewhat suboptimal.  Most notably, it makes a copy of a and b, which introduces some overhead into the computation and in JavaScript may even trigger a garbage collection event (bad news).  If we need to do this on a large mesh, it could slow things down a lot.

An obvious way to fix this would be to try inlining the sorting function for small values of n (which is all we really care about), but doing this yourself is pure punishment.  Here is an optimized version for length 2 sets:

function compare2Brutal(a, b) {
if(a[0] < a[1]) {
if(b[0] < b[1]) {
if(a[0] === b[0]) {
return a[1] - b[1];
}
return a[0] - b[0];
} else {
if(a[0] === b[1]) {
return a[1] - b[0];
}
return a[0] - b[1];
}
} else {
if(b[0] < b[1]) {
if(a[1] === b[0]) {
return a[0] - b[1];
}
return a[1] - b[0];
} else {
if(a[1] === b[1]) {
return a[0] - b[0];
}
return a[1] - b[1];
}
}
}

If you have any aesthetic sensibility at all, then that code probably makes you want to vomit.   And that’s just for length two arrays!  You really don’t want to see what the version for length 3 arrays looks like.  But it is faster by a wide margin.  I ran a benchmark to try comparing how fast these two approaches were at sorting an array of 1000 randomly generated tuples, and here are the results:

• compareSimple:  5864 µs
• compareBrutal: 404 µs

That is a pretty good speed up, but it would be nice if there was a prettier way to do this.

Symmetry

The starting point for our next approach is the following screwy idea:  what if we could find a hash function for each face that was invariant under permutations?  Or even better, if the function was injective up to permutations, then we could just use the symmetrized hash compare two sets.  At first this may seem like a tall order, but if you know a little algebra then there is an obvious way to do this; namely you can use the (elementary) symmetric polynomials.  Here is how they are defined:

Given a collection of $n$ numbers, $a_0, a_1, ..., a_{n-1}$ the kth elementary symmetric polynomial, $S_{n,k}$ is the coefficient of $x^{n-k-1}$ in the polynomial:

$(x + a_0) (x + a_1) ... (x + a_{n-1}) = S_{n,n-1} + S_{n,n-2} x + ... + S_{n,0} x^{n-1} + x^n$

For example, if $n = 2$, then the symmetric polynomials are just:

$S_{2,0} = a_0 + a_1$

$S_{2,1} = a_0 a_1$

And for $n = 3$ we get:

$S_{3,0} = a_0 + a_1 + a_2$

$S_{3,1} = a_0 a_1 + a_0 a_2 + a_1 a_2$

$S_{3,2} = a_0 a_1 a_2$

The neat thing about these functions is that they contain enough data to uniquely determine $a_0, a_1, ..., a_{n-1}$ up to permutation.  This is a consequence of the fundamental theorem for elementary symmetric polynomials, which basically says that these $S_{n,k}$ polynomials form a complete independent basis for the ring of all symmetric polynomials.  Using this trick, we can formulate a simplified version of the sequence comparison for $n=2$:

function compare2Sym(a, b) {
var d = a[0] + a[1] - b[0] - b[1];
if(d) {
return d;
}
return a[0] * a[1] - b[0] * b[1];
}

Not only is this way shorter than the brutal method, but it also turns out to be a little bit faster.  On the same test, I got:

• compare2Sym: 336 µs

Which is about a 25% improvement over brute force inlining.  The same idea extends to higher n, for example here is the result for $n=3$:

function compare3Sym(a, b) {
var d = a[0] + a[1] + a[2] - b[0] - b[1] - b[2];
if(d) {
return d;
}
d = a[0] * a[1] + a[0] * a[2] + a[1] * a[2] - b[0] * b[1] - b[0] * b[2] - b[1] * b[2];
if(d) {
return d;
}
return a[0] * a[1] * a[2] - b[0] * b[1] * b[2];
}

Running the sort-1000-items test against the simple method gives the following results:

• compareSimple: 7637 µs
• compare3Sym:  352 µs

### Computing Symmetric Polynomials

This is all well and good, and it avoids making a copy of the arrays like we used in the basic sorting method.  However, it is also not very efficient.  If one were to compute the coefficients of a symmetric polynomial directly using the naive method we just wrote, then you would quickly end up with $O(2^n)$ terms!  That is because the number of terms in $\# S_{n,k} = { n \choose k+1 }$, and so the binomial theorem tells us that:

$\#S_{n,0} + \#S_{n,1} + \#S_{n,2} + ... + \#S_{n,n-1} = 2^n - 1$

A slightly better way to compute them is to use the polynomial formula and apply the FOIL method.  That is, we just expand the symmetric polynomials using multiplication.  This dynamic programming algorithm speeds up the time complexity to $O(n^2)$.  For example, here is an optimized version of the $n=3$ case:

function compare3SymOpt(a,b) {
var l0 = a[0] + a[1]
, m0 = b[0] + b[1]
, d  = l0 + a[2] - m0 - b[2];
if(d) {
return d;
}
var l1 = a[0] * a[1]
, m1 = b[0] * b[1];
d = l1 * a[2] - m1 * b[2];
if(d) {
return d;
}
return l0 * a[2] + l1 - m0 * b[2] - m1;
}

For comparison, the first version of compare3 used 11 adds and 10 multiplies, while this new version only uses 9 adds and 6 multiplies, and also has the option to early out more often.  This may seem like an improvement, but it turns out that in practice the difference isn’t so great.  Based on my experiments, the reordered version ends up taking about the same amount of time overall, more or less:

• compare3SymOpt: 356 µs

Which isn’t very good.  Part of the reason for the discrepancy most likely has something to do with the way compare3Sym gets optimized.  One possible explanation is that the expressions in compare3Sym might be easier to vectorize than those in compare3SymOpt, though I must admit this is pretty speculative.

But there is also a deeper question of can we do better than $O(n^2)$ asumptotically?  It turns out the answer is yes, and it requires the following observation:

Polynomial multiplication is convolution.

Using a fast convolution algorithm, we can multiply two polynomials together in $O(n \log(n))$ time.  Combined with a basic divide and conquer strategy, this gives an $O(n \log^2(n))$ algorithm for computing all the elementary symmetric functions. However, this is still worse than sorting the sequences!  It feels like we ought to be able to do better, but further progress escapes me.  I’ll pose the following question to you readers:

Question: Can we compute the n elementary symmetric polynomials in $O(n \log(n))$ time or better?

### Overflow

Now there is still a small problem with using symmetric polynomials for this purpose: namely integer overflow.  If you have any indices in your tuple that go over 1000 then you are going to run into this problem once you start multiplying them together.  Fortunately, we can fix this problem by just working in a ring large enough to contain all our elements.  In languages with unsigned 32-bit integers, the natural choice is $\mathbb{Z}_{2^{32}}$, and we get these operations for free just using ordinary arithmetic.

But life is not so simple in the weird world of JavaScript!  It turns out for reasons that can charitably be described as “arbitrary” JavaScript does not support a proper integer data type, and so every number type in the language gets promoted to a double when it overflows whether you like it or not (mostly not).  The net result:  the above approach messes up.  One way to fix this is to try applying a modulus operator at each step, but the results are pretty bad.  Here are the timings for a modified version of compare2Sym that enforces bounds:

That’s more than a 5-fold increase in running time, all because we added a few innocent bit masks!  How frustrating!

Semirings

The weirdness of JavaScript suggests that we need to find a better approach.  But in the world of commutative rings, it doesn’t seem like there are any immediately good alternatives.  And so we must cast the net wider.  One interesting possibility is to extend our approach to include semirings.  A semiring is basically a ring where we don’t require addition to be invertible, hence they are sometimes called “rigs” (which is short for a “ring without negatives”, get it? haha).

Just like a ring, a semiring is basically a set $S$ with a pair of operators $\oplus, \otimes$ that act as generalized addition and multiplication.  You also have a pair of elements, $0,1 \in S$ which act as the additive and multiplicative identity.  These things then have to satisfy a list of axioms which are directly analogous to those of the natural numbers (ie nonnegative integers).  Here are a few examples of semirings, which you may or may not be familiar with:

• The complex numbers are semiring (and more generally, so is every field)
• The integers are a semiring (and so is every other ring)
• The natural numbers are a semiring (but not a ring)
• The set of Boolean truth values, $\mathbb{B} = \{ \mathrm{true}, \mathrm{false} \}$ under the operations OR, AND is a semiring (but is definitely not a ring)
• The set of reals under the operations min,max is a semiring (and more generally so is any distributive lattice)
• The tropical semiring, which is the set of reals under (max, +) is a semiring.

In many ways, semirings are more fundamental than rings.  After all, we learn to count stuff using the natural numbers long before we learn about integers.  But they are also much more diverse, and some of our favorite definitions from ring theory don’t necessarily translate well.  For those who are familiar with algebra, probably the most shocking thing is that the concept of an ideal in a ring does not really have a good analogue in the language of semirings, much like how monoids don’t really have any workable generalization of a normal subgroup.

### Symmetric Polynomials in Semirings

However, for the modest purposes of this blog post, the most important thing is that we can define polynomials in a semiring (just like we do in a ring) and that we therefore have a good notion of an elementary symmetric polynomial.  The way this works is pretty much the same as before:

Let $R$ be a semiring under $\oplus, \otimes$; then we have the symmetric functions.  Then for two variables, we have the symmetric functions:

$S_{2,0} = a_0 \oplus a_1$

$S_{2,1} = a_0 \otimes a_1$

And for $n=3$ we get:

$S_{3,0} = a_0 \oplus a_1 \oplus a_2$

$S_{3,1} = (a_0 \otimes a_1) \oplus (a_0 \otimes a_2) \oplus (a_1 \otimes a_2)$

$S_{3,2} = a_0 \otimes a_1 \otimes a_2$

And so on on…

### Rank Functions and (min,max)

Let’s look at what happens if we fix a particular semiring, say the (min,max) lattice.  This is a semiring over the extended real line $\mathbb{R} \cup \{ - \infty, \infty \}$ where:

• $\oplus \mapsto \mathrm{min}$
• $\otimes \mapsto \mathrm{max}$
• $0 \mapsto \infty$
• $1 \mapsto -\infty$

Now, here is a neat puzzle:

Question: What are the elementary symmetric polynomials in this semiring?

Here is a hint:

$S_{2,0} = \min(a_0, a_1)$

$S_{2,1} = \max(a_0, a_1)$

And…

$S_{3,0} = \min(a_0, a_1, a_2)$

$S_{3,1} = \min( \max(a_0, a_1), \max(a_0, a_2), \max(a_1, a_2) )$

$S_{3,2} = \max(a_0, a_1, a_2)$

Give up?  These are the rank functions!

Theorem: Over the min,max semiring, $S_{n,k}$ is the kth element of the sorted sequence $a_0, a_1, ...$

In other words, evaluating the symmetric polynomials over the min/max semiring is the same thing as sorting.  It also suggests a more streamlined way to do the brutal inlining of a sort:

function compare2Rank(a, b) {
var d = Math.min(a[0],a[1]) - Math.min(b[0],b[1]);
if(d) {
return d;
}
return Math.max(a[0],a[1]) - Math.max(b[0],b[1]);
}

Slick!  We went from 25 lines down to just 5.  Unfortunately, this approach is a bit less efficient since it does the comparison between a and b twice, a fact which is reflected in the timing measurements:

• compare2Rank: 495 µs

And we can also easily extend this technique to triples as well:

function compare3Rank(a,b) {
var l0 = Math.min(a[0], a[1])
, m0 = Math.min(b[0], b[1])
, d  = Math.min(l0, a[2]) - Math.min(m0, b[2]);
if(d) {
return d;
}
var l1 = Math.max(a[0], a[1])
, m1 = Math.max(b[0], b[1]);
d = Math.max(l1, a[2]) - Math.max(m1, b[2]);
if(d) {
return d;
}
return Math.min(Math.max(l0, a[2]), l1) - Math.min(Math.max(m0, b[2]), m1);
}
• compare3Rank: 618.71 microseconds

### The Tropical Alternative

Using rank functions to sort the elements turns out to be much simpler than doing a selection sort, and it is way faster than calling the native JS sort on small arrays while avoiding the overflow problems of integer symmetric functions.  However, it is still quite a bit slower than the integer approach.

The final method which I will now describe (and the one which I believe to be best suited to the vagaries of JS) is to compute the symmetric functions over the (max,+), or tropical semiring.  It is basically a semiring over the half-extended real line, $\mathbb{R} \cup \{ - \infty \}$ with the following operators:

• $\oplus \mapsto \max$
• $\otimes \mapsto +$
• $0 \mapsto -\infty$
• $1 \mapsto 0$

There is a cute story for why the tropical semiring has such a colorful name, which is that it was popularized at an algebraic geometry conference in Brazil.  Of course people have known about (max,+) for quite some time before that and most of the pioneering research into it was done by the mathematician Victor P. Maslov in the 50s.  The (max,+) semiring is actually quite interesting and plays an important role in the algebra of distance transforms, numerical optimization and the transition from quantum systems to classical systems.

This is because the (max,+) semiring works in many ways like the large scale limit of the complex numbers under addition and multiplication.  For example, we all know that:

$\log(a b) = \log(a) + \log(b)$

But did you also know that:

$\log(a + b) = \max(\log(a), \log(b)) + O(1)$

This basically a formalization of that time honored engineering philosophy that once your numbers get big enough, you can start to think about them on a log scale.  If you do this systematically, then you eventually will end up doing arithmetic in the (max,+)-semiring.  Masolv asserts that this is essentially what happens in quantum mechanics when we go from small isolated systems to very big things.  A brief overview of this philosophy that has been translated to English can be found here:

Litvinov, G. “The Maslov dequantization, idempotent and tropical mathematics: a very brief introduction” (2006) arXiv:math/0501038

The more detailed explanations of this are unfortunately all store in thick, inscrutable Russian text books (but you can find an English translation if you look hard enough):

Maslov, V. P.; Fedoriuk, M. V. “Semiclassical approximation in quantum mechanics”

But enough of that digression!  Let’s apply (max,+) to the problem at hand of comparing sequences.  If we expand the symmetric polynomials in the (max,+) world, here is what we get:

$S_{2,0} = \max(a_0, a_1)$

$S_{2,1} = a_0 + a_1$

And for $n = 3$:

$S_{3,0} = \max(a_0, a_1, a_2)$

$S_{3,1} = \max(a_0+a_1, a_0+a_2, a_1+a_2)$

$S_{3,2} = a_0+a_1+a_2$

If you stare at this enough, I am sure you can spot the pattern:

Theorem: The elementary symmetric polynomials on the (max,+) semiring are the partial sums of the sorted sequence.

This means that if we want to compute the (max,+) symmetric polynomials, we can do it in $O(n \log(n))$ by sorting and folding.

Working the (max,+) solves most of our woes about overflow, since adding numbers is much less likely to go beyond INT_MAX.  However, we will tweak just one thing: instead of using max, we’ll flip it around and use min so that the values stay small.  Both theoretically and practically, it doesn’t save much but it gives us a maybe a fraction of a bit of extra address space to use for indices.  Here is an implementation for pairs:

function compare2MinP(a, b) {
var d = a[0]+a[1]-b[0]-b[1];
if(d) {
return d;
}
return Math.min(a[0],a[1]) - Math.min(b[0],b[1]);
}

And it clocks in at:

• compare2MinP: 354 µs

Which is a bit slower than the symmetric functions, but still way faster than ranking.  We can also play the same game for $n=3$:

function compare3MinP(a, b) {
var l1 = a[0]+a[1]
, m1 = b[0]+b[1];
d = l1+a[2] - (m1+b[2]);
if(d) {
return d;
}
var l0 = Math.min(a[0], a[1])
, m0 = Math.min(b[0], b[1])
, d  = Math.min(l0, a[2]) - Math.min(m0, b[2]);
if(d) {
return d;
}
return Math.min(l0+a[2], l1) - Math.min(m0+b[2], m1);
}

Which hits:

• compare3MinP: 382 µs

Again, not quite as fast as integers, but pretty good for JavaScript.

Summary

You can get all the code to run these experiments on github:

https://github.com/mikolalysenko/set-compare

And here are the results that I got from running the experiment, all collated for easy reading:

### Dimension = 2

• compareSimple: 5982 µs
• compare2Brutal: 415 µs
• compare2Sym: 352 µs
• compare2Rank: 498 µs
• compare2MinP: 369 µs

### Dimension = 3

• compareSimple: 7737 µs
• compare3Sym: 361 µs
• compare3Sym_opt: 362 µs
• compare3Rank: 612 µs
• compare3MinP: 377 µs

As you can see, the (min,+) solution is nearly as fast as the symmetric version without having the same overflow problems.

I hope you enjoyed reading this as much as I enjoyed tinkering around!  Of course I still don’t know of an optimal way to compare two lists.  As a final puzzle, I leave you with the following:

Question: Is there any method which can test if two unordered sequences are equal in linear time and at most log space?

Frankly, I don’t know the answer to this question and it may very well be impossible.  If you have any thoughts or opinions on this, please leave a comment!

Posted in Mathematics, Programming | 20 Comments

## CommonJS: Why and How

As I said last time, I want to start moving more of my stuff into npm, and so I think perhaps today would be a good time to explain why I think this is the right way to go, and how you can use it in your own JS projects.

The big picture reason that all of this stuff is important, is that there a huge trend these days towards single page browser based applications and JavaScript development in general.  Many factors have contributed to this situation, like the rise of fast JavaScript interpreters; widespread frustration and distrust of web plugins like SilverlightJava Applets and Flash; and the emergence of powerful new APIs like HTML 5 and WebGL.  Today, JavaScript runs on pretty much everything including desktops, cell phones, tablets and even game consoles.  Ignoring it is something you do at your own risk.

## Module Systems

However, while there are many good reasons to use JavaScript, it has a reputation for scaling poorly with project size.  In my opinion, the main cause of this is that JavaScript lacks anything even remotely resembling a coherent module system.  This omission makes it inordinately difficult to apply sensible practices like:

• Hiding implementation details behind interfaces
• Splitting large projects into multiple files
• Reusing functionality from libraries and other code bases

Ignoring these problems isn’t an option.  But despite its flaws, JavaScript deserves at least some credit for being a very flexible language, and with some ingenuity one can work around the lack of modules.  There are of course many ways you can do this, and for me at least as I’ve been learning JS I’ve found the bewildering array of solutions pretty confusing.  So in this post I’m going to summarize how I understand the situation right now, and give my assessment of the various options:

The obvious way to emulate a module in JavaScript would be to use a closure.  There is a fancy name for this in the JS community, and it is called the module design pattern:

var MyModule = (function() {
var exports = {};
//Export foo to the outside world
exports.foo = function() {
// ...
}

//Keep bar private
var bar = { /* ... */ };

//Expose interface to outside world
return exports;
})();

Here MyModule would become the interface for the module and the implementation of the module would be hidden away within the function() block.  This approach, when combined with features like “use strict”; can go a long way to mitigating the danger of the global-namespace free-for-all; and if applied with discipline it basically solves the first problem on our big list.

Unfortunately, it doesn’t really do much to fix our multiple files or importing problems.  Without bringing in third party tools or more complex module loaders, you are basically limited to two ways to do this within a browser:

• Or you can just concatenate all your files in a static build process.

The latter approach is generally more efficient since it requires fewer HTTP requests to load and there are tools like Google’s closure compiler which support this process.  The former approach is more dynamic since you can develop and test without having to do a full rebuild.  Unfortunately, neither method lets you to do selective imports, and they don’t handle external dependencies across libraries very well.

This style of coding is sometimes called monolithic, since the pressure to avoid code dependencies tends to push projects written in this fashion towards monumental proportions.  As an example, look at popular libraries like jQuery, YUI or THREE.js, each of which is packed with loads of features.  Now I don’t mean to especially pick on any of these projects, since if you aren’t going to accept external tooling their approach is actually quite reasonable.  Packaging extra functionality into a library (in the absence of a proper importing system) means you get a lot more features per <script>.  Sufficiently popular libraries (like jQuery) can leverage this using their own content distribution networks and can be cached across multiple sits.  This is great for load times, since users then don’t have to re-download all those libraries that their browser already has cached when they go from site-to-site.

The cost though is pretty obvious.  Because <script> tags have no mechanism to handle dependencies, the default way to include some other library in your project is copy-paste.  Heavy use of this style is a great way to proliferate bugs, since if you include a library with your project via this mechanism, you usually have no good way to automatically update it.

## CommonJS

The CommonJS module system solves all these problems.   The way it works is that instead of running your JavaScript code from a global scope, CommonJS starts out each of your JavaScript files in their own unique module context (just like wrapping it in a closure).  In this scope, it adds two new variables which you can use to import and export other modules: module.exports and require.  The first of these can be used to expose variables to other libraries.  For example, here is how to create a library that exports the variable “foo”:

//In library.js
exports.foo = function() {
//... do stuff
}

And you can import it in another module using the “require” function:

var lib = require("./library.js");
lib.foo();

The addition of this functionality effectively solves the module problem in JavaScript, in what I would consider the most minimally invasive way.  Today, CommonJS is probably the most widely adopted module system in JavaScript, in part due to the support from node.js.  The other major factor which has contributed to its success is the centralized npm package manager, which makes it easy to download, install and configure libraries written using CommonJS.

CommonJS packages should be lean and mean, with as little extraneous functionality as possible.  Compare for example popular modules like async or request to monolithic libraries like underscore.js or jQuery.  An interesting discussion of this philosophy can be found at James Halliday (substack)’s blog:

J. Halliday, “the node.js aesthetic” (2012)

Based on the success of node.js, this approach seems to be paying off for writing server side applications.  But what about browsers?

## CommonJS in the Browser

It turns out that doing this directly isn’t too hard if you are willing to bring in an extra build process.  The most tool to do this is browserify.  When you run browserify, it crawls all your source code starting from some fixed entry point and packages it up into a single .js file, which you can then minify afterwords (for example, using uglify.js).  Doing this reduces the number of http requests required to start up a page and reduces the overall size of your code, thus improving page load times.  The way it works is pretty automatic, just take your main script file and do:

browserify index.js > bundle.js

And then in your web page, you just add a single <script> tag:

<script src="bundle.js"></script>

browserify is compatible with npm and implements most of the node.js standard library (with the exception of some obviously impossible operating system specific stuff, like process.spawn).

### Rapid Development with browserify

The above solution is pretty good when you are looking to package up and distribute your program, but what if you just want to code and debug stuff yourself with minimal hassle? You can automate the bulk of this process using a Makefile, but one could argue reasonably that having to re-run make every time you edit something is an unnecessary distraction.  Fortunately, this is a trivial thing to automate, and so I threw together a tiny ~100-line script that runs browserify in a loop.  You can get it here:

serverify – Continuous development with browserify

To use it, you can just run the file from your project’s root directory and it will server up static HTML on port 8080 from ./www and bundle up your project starting at ./index.js. This behavior can of course be changed by command line options or configuration files.  As a result, you get both the advantages of a proper module system and the ease of continuous deployment for when you are testing stuff!

Other Formats

As it stands today, CommonJS/npm (especially when combined with tools like browserify) is an efficient and  comprehensive solution to the module problem in JavaScript.  In my opinion, it is definitely the best option if you are starting out from scratch.  But I’d be remiss if I didn’t at least mention some of the other stuff that’s out there, and how the situation may change over time:

• Looking ahead to the distant future, ECMA Script 6 has a proposal for some new module semantics that seems promising.  Unfortunately, we aren’t there yet, and no one really knows exactly what the final spec will look and how far out it will be until it gets there.  However, I hope that someday they will finally standardize all of this stuff and converge on a sane, language-level solution to the module problem.  For a balanced discussion of the relative merits of the current proposal,  I’d recommend reading the following post by Isaac Schlueter (current maintainer of node.js):

Schlueter, I. “On ES6 Modules” (2012)

1. Special syntax for specifying module imports (must be done at the top of each script)
2. No tooling required to use, works within browsers out of the box.

These choices are motivated by a preceived need to get modules to work within browsers with no additional tooling.  The standard reference implementation of AMD is the RequireJS library, and you can install it on your sever easily by just copy/pasting the script into your wwwroot.

The fact that AMD uses no tools is both its greatest strength and weakness.  On the one hand, you can use it in a browser right away without having to install anything. On the other hand, AMD is often much slower than CommonJS, since you have to do many async http requests to load all your scripts.  A pretty good critique of the AMD format can be found here:

Dale, T. “AMD is not the answer

I generally agree with his points, though I would also like to add that a bigger problem is that AMD does not have a robust package manager.  As a result, the code written for AMD is scattered across the web and relies on copy-paste code reuse.  There are some efforts under way to fix this, but they are nowhere near as advanced as npm.

## Conclusion

I’ve now given you my summary of the state of modules in JavaScript as I understand it, and hopefully explained my rationale for why I am picking CommonJS for the immediate future.  If you disagree with something I’ve said, have any better references or know of something important that I’ve missed, please leave a comment!

Posted in Miscellaneous, Programming | Tagged , , | 3 Comments

## New Year’s Resolution: Post more stuff on npm

First thing, I’d like to help announce/promote a project which I think is pretty cool (but am not directly involved in) which is voxeljs.  It is being developed by @maxogden and @substack, both of which are very active in the node.js community.  Internally, it uses some of the code from this blog, and it looks super fun to hack around with.  I highly recommend you check it out.

Second, I’d like to say that from here on out I’m going to try to put more of my code on npm so that it can be reused more easily.  I’m going to try to focus on making more frequent, smaller and focused libraries, starting with an npm version of my isosurface code:

https://github.com/mikolalysenko/isosurface

In the future, I hope to start moving more of my stuff on to npm, piece by piece.  Stay tuned for more details.

Posted in Miscellaneous | Tagged , , | 1 Comment

## Shapes and Coordinates

In a previous post, I talked a bit about solid modeling and discussed at a fairly high level what a solid object is.  This time I’m going to talk in generalities about how one actually represent shapes in practice.  My goal here is to say what these things are, and to introduce a bit of physical language which I think can be helpful when talking about these structures and their representations.

The parallel that I want to draw is that we should think of shape representations in much the same way that we think about coordinates.  A choice of coordinate system does not change the underlying reality of the thing we are talking about, but it can reveal to us different aspects of the nature of whatever it is we are studying.  Being conscious of the biases inherent in any particular point of view allows one to be more flexible when trying to solve some problem.

## Two Types of Coordinates

I believe that there are two general ways to represent shapes: Eulerian and LagrangianI’ll explain what this all means shortly, but as a brief preview here is a small table which contrasts these two approaches:

 Eulerian Lagrangian Coordinate System Global/fixed Local/moving Modeling Paradigm Implicit Parametric Transforms Passive Active

## Eulerian (Implicit)

The first of these methods, or the Eulerian representation, seeks to describe shapes in terms of the space they embedded in.  That is shapes are described by maps from an ambient space into a classifying set:

$f : \mathrm{Ambient}\:\mathrm{space} \to \mathrm{Classifying}\:\mathrm{space}$

More concretely, suppose we are working in n-dimensional Euclidean space $\mathbb{R}^n$.  As we have already seen, it is possible to describe solids using sublevel sets of functions.  At a high level, the way this works is that we have some map $f : \mathbb{R}^n \to S$ where $S$ is some set of labels, for example the real line $\mathbb{R}$.  Then we can define our shape as the preimage of some subset of these labels, $S_0 \subset S$:

$X = f^{-1}(S_0)$

Again, if $S = \mathbb{R}$ and $S_0 = (-\infty, 0]$, then we get the usual definition of a sublevel set:

$X = f^{-1}([-\infty, 0)) = \{ x \in \mathbb{R}^n : f(x) < 0 \} = V_0(f)$

But these are of course somewhat arbitrary choices.  We could for example replace $\mathbb{R}$ with the set $S = \{ 0,1 \}$ of truth values, and we might consider defining our shape as the preimage of 1:

$X = f^{-1}(1)$

Which would be the set of all points in $\mathbb{R}^n$ where f evaluates to 1.  In this situation f is what we could all point membership function.

### Motion in an Eulerian System

Say we have some Eulerian representation of our shape and we want to move it around.  One simple way to think about this is that we have the shape and we simply apply some map $w : \mathbb{R}^n \to \mathbb{R}^n$ to warp it into a new position:

Moving a shape by a map

I probably don’t need to convince you too much that motions are extremely useful in animation and physics, but they can also be a practical way to model shapes.  In fact, one simple way to construct more complicated shapes from simpler ones is to deform them along some kind of motion.  Now here is an important question:

Question: How do we apply a motion in an Eulerian coordinate system?

Or more formally, suppose that we have a set $S \subseteq \mathbb{R}^n$ and we have some warp $w : \mathbb{R}^n \to \mathbb{R}^n$.  If $S = f^{-1}( [-\infty, 0)$ is the 0-sublevel set of some potential function $f$, can we find another potential function $h$ such that,

$h^{-1}( [ -\infty, 0 ) ) = w(S)$

Done?  Ok, here is how we can arrive at the answer by pure algebraic manipulation.  Starting with the definition for $S$, we get:

$h^{-1}( . ) = w( f^{-1}( . ) )$

And taking inverses on both sides,

$h( x ) = f( w^{-1}( x ) )$

Which is of course only valid when $w$ is invertible.  So in summary,

In an Eulerian coordinate system, to move a shape $S = V_0(f)$ by $w$, we must precompose $f$ with $w^{-1}$:

$w(S) = V_0(f \circ w^{-1})$

Or said another way:

Eulerian coordinate systems transform passively.

### Pros and Cons

There are many advantages to an Eulerian coordinate system.  For example, point wise operations like Boolean set operations are particularly easy to compute.  (Just taking the min/max is sufficient to evaluate set intersection/union.)  On the other hand, because Eulerian coordinates are passive, operations like rendering or projection (which are secretly a kind of motion) can be much more expensive, since they require solving an inverse problem.  In general, this is as hard as root finding and so it is a non-trivial amount of work.  In 3D, there is also the bigger problem of storage, where representations like voxels can consume an enormous amount of memory.  However, in applications like image processing the advantages often outweigh the costs and so Eulerian representations are still widely used.

## Lagrangian (Parametric)

Lagrangian coordinates are in many ways precisely the dual of Eulerian coordinates.  Instead of representing a shape as a map from an ambient space to some set of labels, they instead map a parametric space into the ambient space:

$p : \mathrm{Parameter}\: \mathrm{space} \to \mathrm{Ambient}\: \mathrm{space}$

The simplest example of a Lagrangian representation is a parametric curve, which typically acts by mapping an interval (like $[0,1]$) into some ambient space, say $\mathbb{R}^n$:

$p(t) = ( x(t), y(t), z(t) )$

But there are of course more sophisticated examples.  Probably the most common place where people bump into Lagrangian coordinates is in the representation of triangulated meshes.  A mesh is typically composed of two pieces of data:

1. A cell complex (typically specified by the vertex-face incidence data)
2. And an embedding fo the cell complex into $\mathbb{R}^3$

The embedding is commonly taken to be piecewise linear over each face.  There are of course many examples of Lagrangian representations:

### Motion in Lagrangian Coordinates

Unlike in an Eulerian coordinate system, it is relatively easy to move an object in a Lagrangian system, since we can just apply a motion directly.  To see why, let $p : P \to \mathbb{R}^n$ be a parameterization of a shape $S$ and let $w : \mathbb{R}^n \to \mathbb{R}^n$ be a motion; then obviously:

$w(S) = w(p(P)) = (w \circ p)(P)$

And so we conclude,

Lagrangian coordinate systems transform actively.

### Pros and Cons

The main advantage to a Lagrangian formulation is that it is much more compact than the equivalent Eulerian form when the shape we are representing is lower dimensional than the ambient space.  This is almost always the case in graphics, where curves and surfaces are of course lower dimension than 3-space itself.  The other big advantage to Lagrangian coordinates is that they transform directly, which explains why they are much preferred in applications like 3D graphics.

On the other hand, Lagrangian coordinates are very bad at performing pointwise tests.  There is also a more subtle problem of validity.  While it is impossible to transform an Eulerian representation by a non-invertible transformation, this is quite easy to do in a Lagrangian coordinate system.  As a result, shapes can become non-manifold or self-intersecting which may break certain invariants and cause algorithms to fail.  This tends to make code that operates on Lagrangian systems much more complex than that which deals with Eulerian representations.

## Changed WordPress Theme

I was getting a bit tired of the old theme I was using (Manifest) and decided to try something different.  One nice thing with this new version is that it has a sidebar which makes it a bit easier to navigate around the blog.  I’m also going to be adding some links to other blogs and resources on the sidebar over time.

Posted in Miscellaneous | 1 Comment

## Conway’s Game of Life for Curved Surfaces (Part 2)

(This is the sequel to the following post on SmoothLife.  For background information go there, or read Stephan Rafler’s paper on SmoothLife here.)

Last time, we talked about an interesting generalization of Conway’s Game of Life and walked through the details of how it was derived, and investigated some strategies for discretizing it.  Today, let’s go even further and finally come to the subject discussed in the title: Conway’s Game of Life for curved surfaces!

## Quick Review: SmoothLife

To understand how this works, let’s first review Rafler’s equations for SmoothLife.  Recall that $f : \mathbb{R}^n \to \mathbb{R}$ is the state field and that we have two effective fields which are computed from f:

$M(x,t) = \frac{1}{2 \pi h^2} \iint \limits_{|r| \leq h} f(x-r,t) dr$

$N(x,t) = \frac{1}{8 \pi h^2} \iint \limits_{h \leq |r| \leq 3h} f(x-r,t) dr$

And that the next state of f is computed via the update rule:

$f(x,t+1) = S(N(x,t), M(x,t))$

Where:

$\sigma(x, a, \alpha) = \frac{1}{1 + \exp(-\frac{4}{\alpha}(x - a))}$

$\sigma_n(n, a, b) = \sigma(n, a, \alpha_n) (1 - \sigma(n,b, \alpha_n)$

$\sigma_m(m, a, b) = (1 - \sigma(m, 0.5, \alpha_m) )a + \sigma(m, 0.5, \alpha_m) b$

$S(n,m)=\sigma_n(n, \sigma_m(m, b_1, d_1), \sigma_m(m, b_2, d_2))$

And we have 6 (or maybe 7, depending on how you count) parameters that determine the behavior of the automata:

• $[b_1, b_2]$: The fraction of living neighbors required for a cell to stay alive (typically $[\frac{2}{8}, \frac{3}{8}]$).
• $[d_1, d_2]$: The fraction of living neighbors required for a cell to be born (typically $\frac{3}{8}$).
• $\alpha_m$:  The transition smoothness from live to dead (arbitrary, but Rafler uses $\alpha_m \approx 0.148$).
• $\alpha_n$: Transition smoothness from interval boundary (again, arbitrary but usually about $\alpha_n \approx 0.028$).
• $h$: The size of the effective neighborhood (this is a simulation dependent scale parameter, and should not effect the asymptotic behavior of the system).

## Non-Euclidean Geometry

Looking at the above equations, the only place where geometry comes into the picture is in the computation of the M and N fields.  So it seems reasonable that if we could define the neighborhoods in some more generic way, then we could obtain a generalization of the above equations.  Indeed, a similar idea was proposed in Rafler’s original work to extend SmoothLife to spherical domains; but why should we stop there?

### Geodesic Distance

The basic idea behind all of this is that we want to generalize the concept of a sphere to include curved surfaces.  Recall the usual definition of a sphere is that it is the set of all points within some distance of a given point.  To extend this concept to surfaces, we merely have to change our definition of distance somewhat.  This proceeds in two steps.

First, think about how the distance between two points is defined.  In ordinary Euclidean geometry, it is defined using the Pythagorean theorem.  That is if we fix a Cartesian coordinate system the distance between a pair of points $p=(x_0, y_0), q=(x_1, y_1)$ is just:

$d(p,q) = |p-q| =\sqrt{ (x_0-x_1)^2 + (y_0-y_1)^2 }$

But this isn’t the only way we could define this distance.  While the above formula is pretty easy to calculate, it by far not the only way such a distance could be defined.  Another method is that we can formulate the path length variationally.  That is we describe the distance between points p and q as a type of optimization problem; that is it is the arc length of the shortest path connecting the two points:

$d(p,q) = \min \limits_{f : [0,1] \to \mathbb{R}^2\\f(0)=p, f(1)=q} \int \limits_0^1 |\partial_t f(t)| dt$

At a glance this second statement may seem a bit silly.  After all, solving an infinite dimensional optimization problem is much harder than just subtracting and squaring a few numbers.  There is also something about the defining Euclidean distance in terms of arclength that seems viciously circular — since the definition of arclength is basically an infinitesimal version of the above equation — but please try not to worry about that too much:

Computing arclength by subdivisions. Image (c) Wikipedia 2007. Author: Mike Peel

Instead, the main advantage of working in the variational formulation is that it becomes possible to define distances in spaces where the shortest path between two points is no longer a straight line, as is the case on the surface of the Earth for example:

The shortest distance between any two cities on Earth is a circle — not a straight line!

Of course to define the arclength for something like a curve along a sphere, we need a little extra information.  Looking at the definition of arclength, the missing ingredient is that we need some way to measure the length of a tangent vector.  The most common way to do this is to introduce what is known as a Riemannian metric, and the data it stores is precisely that!  Given a smooth manifold $\Omega$, a Riemannian metric continuously assigns to every point $p \in \Omega$ a symmetric bilinear form $g_p : T_p \Omega \times T_p \Omega \to \mathbb{R}$ on the tangent space of $\Omega$ at p.  To see how this works let us, suppose we were given some curve $f : [0,1] \to M$, then we can just define the arclength of f to be:

$\mathrm{arclength}(f) = \int_0^1 \sqrt{g_{f(t)} \left( \frac{d f}{dt}, \frac{d f}{dt} \right)} dt$

Armed with this new generalization arclength, it is now pretty clear how we should define the distance between points on a Riemannian manifold (that is any differentiable manifold equipped with a Riemannian metric).  Namely, it is:

$d(p,q)=\min \limits_{\begin{array}{c} f : [0,1] \to M, \\ f(0) = p, f(1)=q\end{array}} \int_0^1 \sqrt{g_{f(t)} \left( \frac{d f}{dt}, \frac{d f}{dt} \right)} dt$

This is called the geodesic distance between p and q, after the concept of a geodesic which is a fancy name for the shortest path between two points’.  The fact that concatenation of paths is associative implies that geodesic distance satisfies the triangle inequality, and so it is technically a metric.  The fact that our metric is allowed to vary from point to point allows us to handle much more flexible topologies and and surfaces.  For example, parallel or offset curves on a Riemannian manifold may not be straight lines.  This leads to violations of Euclid’s parallel postulate and so the study of Riemannian manifolds is sometimes called non-Euclidean geometry.  (It also has a few famous applications in physics!)

### Geodesic Balls

Now that we have a definition of distance that works for any surface (with a Riemannian metric), let’s apply it to Rafler’s equations.  I claim that all we need to do is replace the spherical neighborhoods in SmoothLife with some suitably constructed geodesic neighborhoods.  In particular, we can define these neighborhoods to be balls of some appropriate radius.  Given a point p, we define the geodesic ball of radius h centered at p to be the set of all points within distance at most h of p:

$B_{p,h} = \{ x \in \Omega : d(p,x) \leq h \}$

Which naturally leads to defining the M and N fields as follows:

$M(x,t) = \frac{1}{\mu(B_{x,h})} \int_{y \in B_{x,h}} f(y,t) dy$

$N(x,t) = \frac{1}{\mu(B_{x,3h}) - \mu(B_{x,h})}\int_{y \in B_{y, 3h} \setminus B_{y,h}} f(y,t) dy$

That these equations constitute a generalization of SmoothLife to curved surfaces should be pretty clear (since we barely changed anything at all!).  One can recover the original SmoothLife equations by setting $\Omega = \mathbb{R}^2$ and taking a flat metric.  However, there is a major departure in that the very same equations can now be used to describe the dynamics of life on curved surfaces as well!

## Discrete Riemannian Manifolds

Generalizing SmoothLife to surfaces didn’t change the equations much, but conceptually there is now a whole lot more to unpack.  To actually implement this stuff, we need to solve the following three problems:

1. First, we need to figure out what metric to use.
2. Then, we need some way to compute geodesic balls on a surface.
3. Finally, we have to discretize the fields on the surface in some way.

The last part follows along more or less the same sort of reasoning as we applied in the periodic/flat case, (though the details are a bit different).  However,  items 1 and 2 are new to SmoothLife on curved surfaces and requires some special explanation.

### Metric Embeddings

Up to this point, we’ve been deliberately abstract in our dealings with metrics, and haven’t really said much about where they come from.  In the most general setting, we can think of a metric as something which is intrinsic to a surface — ie it is part of the very data which describes it.  However, in practical applications — especially those arising in computer graphics and engineering — metrics usually come from an embedding.  Here’s how it works:

Given a pair of differential manifolds $M, N$ and a smooth map $f : M \to N$ with a metric $g$ on $N$, then $f$ induces a metric on $M$ by a pullback.  That is, $df$ be the differential of $f$, then there ia metric $g^*$ on $M$ for every point $p \in M$:

$g^*_p( u, v ) = g_{f(p)} ( df_p(u), df_p(v) )$

To make this very concrete, let us suppose that our surface is sitting inside 3D Euclidean space.  That is we have some smooth map $h : \Omega \to \mathbb{R}^3$ that takes our surface and sticks it into 3-space with the ordinary flat Euclidean metric.  Then the geodesic distance is just the length of the shortest path in the surface:

$d(p,q) = \min_{f : [0,1] \to \Omega \\ f(0)=p, f(1)=q} \int_0^1 | \frac{d h(f(t))}{dt} | dt$

In the case of a triangulated mesh, we can let the embedding map just be the ordinary piecewise linear embedding of each triangle into 3-space.

### Geodesics on Triangulated Surfaces

Ok, so now that we have a Riemannian metric, the next thing we need to is figure out how to calculate the geodesic distance.  As far as I am aware, the first comprehensive solution to this problem was given by Mitchell, Mount and Papadimitrou back in 1987:

J. Mitchell, D. Mount and C. Papadimitrou. (1987) “The Discrete Geodesic Problem” SIAM Journal of Computing

The basic idea behind their algorithm is pretty straightforward, though the details get a little hairy.  It basically amounts to a modified version of Dijkstra’s algorithm that computes the single source shortest path to every point along each edge in the mesh.  The basic reason this works is that within the interior of each face the actual distance field (for any point source) is always piecewise quadratic.  However, there is a catch: the distance along each edge in the mesh might not be quadratic.  Instead you may have to subdivide each edge in order to get a correct description of the distance field.  Even worse, in that paper Mitchell et al. show that the exact distance field for a mesh with $O(n)$ vertices could have $O(n)$ distinct monotone components per edge, which gives their algorithm a worst case running time of $O(n^2 \log(n))$, (the extra log factor is due to the cost associated with maintaining an ordered list for visiting the edges.)

Quadratic time complexity is pretty bad — especially for large meshes — but it is really about the best you can hope for if you are trying to solve the problem properly.  In fact, a distance field on a piecewise linear triangular surface can have up to $O(n^2)$ distinct piecewise polynomial components, and so you need $O(n^2)$ bits just to encode the field anyway.  However, it has been observed that in practice the situation is usually not so bad and that it is often possible to solve the problem faster if you are willing to make do with an approximation. Over the years, many alternatives and approximations to geodesic distance have been proposed, each making various trade offs.  Here is a quick sampling of a few I found interesting:

K. Polthier and M. Schmies. (1998) “Straightest Edge Geodesic” Mathematical Visualization

R. Kimmel and J.A. Sethian. (1998) “Computing Geodesic Paths on Manifolds” PNAS

V. Surazhky, T. Surazhky, D. Kirsanov, S. Gortler, H. Hoppe. (2005) “Fast Exact and Approximate Geodesics on Meshes” SIGGRAPH 2005

The first two papers are important in that they describe different ways to look at approximating geodesic distance.  Polthier and Schmies give an alternative definition of geodesic distance which is easier to calculate, while Kimmel and Sethian describe a method that approximates the surface itself.  Both methods run in $O(n \log(n))$ time instead of $O(n^2)$ — though one needs to be a bit careful here, since the $n$ in Kimmel and Sethian (aka number of voxels) could actually be exponentially larger than the $n$ (number of vertices) which is used in the other papers (in practice this is usually not much of a problem).  The last paper is also important as it is one of the most cited in this area, and discusses a simple a modification of Mitchell et al.’s method which gives some ability to make a tradeoff between performance and accuracy.  I also like this paper for its clarity, since the figures are much easier to follow than the original 1987 work.

One conclusion that can be drawn from all this is that computing geodesic distance is a hairy problem.  Digging around it is easy to get lost in a black hole of different possibilities.  Since this is not the main point of this post (and because I wrote most of this code while I was on an airplane without access to the internet), I decided to try just hacking something together and ended up going with the first thing that worked.  My approach was to use a modification of the Bellman-Ford algorithm and compute the geodesic distance in two phases.  First, I did an iteration of Dijkstra’s algorithm on the edges of the mesh to get an upper bound on the distance, using the Euclidean distance lower bound to trim out far away vertices.  Then I iterated unfolding the faces to try to refine the distance field until it converged.

This approach may be novel, but only because it is s obviously terrible compared to other techniques that no one would ever bother to publish it.  Nonetheless, in my experiments I found that it gave a passable approximation to the distance field, provided that the mesh had enough subdivisions.  The downside though is that the Bellman-Ford algorithm and its variants are quite slow. If I ever have the time, I’d like to revisit this problem some day and try out some better techniques and see how they compare, but that will have to be the subject of another entry.  For now, this approach will have to do.

## The Finite Element Method

Supposing that we now have some reasonable approximation to geodesic distance (and thus the shape of a geodesic ball) on our mesh, we now need to come up with a way to represent an approximation of the state field on the mesh.  Just like in the flat version of SmoothLife, we must use the Galerkin method to discretize the system, but this time there is a major difference.  When we were on a flat/periodic domain, we had the possibility of representing the field by a sum of Dirichlet or sinc elements, which turned out to make it really easy to calculate the effective fields M and N using the Fourier transform.  However on an arbitrary curved surface (or mesh) we usually don’t have any sort of translational symmetry and so such a representation is not always possible.

Instead, we will have to make do with a somewhat more general, but less efficient, set of basis functions.  In engineering applications, one popular choice is to write the field as a sum of splines, (aka smooth compactly supported piecewise polynomial functions).  In fact, this idea of using the Galerkin method with splines is so widely used that it even has its own special name: the Finite Element Method (FEM).  FEM is basically the default choice for discretizing something when you don’t really know what else to do, since it works for pretty much any sort of topology or boundary condition.  The tradeoff is that it can be a lot less efficient than other more specialized bases, and that implementing it is usually much more complicated.

### Piecewise Linear Elements

As a first attempt at creating a finite element discretization for SmoothLife, I opted to go with $C^0$ piecewise linear or tent’ functions for my basis functions:

Illustration of a piecewise linear approximation by $C^0$ tent functions. Image (c) Wikipedia 2012. Author: Krishnavedala

For a 2D triangulated mesh, the simplest piecewise linear basis functions are the Barycentric coordinates for each face.  To explain how this works, let us first describe how we can use these functions to parameterize a single triangle.  In a Barycentric coordinate system, a triangle is described as a subset of $\Delta \subset \mathbb{R}^3$:

$\Delta=\left\{ \alpha, \beta, \gamma \in \mathbb{R}: \begin{array}{c}0\leq \alpha, \beta, \gamma \leq 1 \\ \alpha+\beta+\gamma=1\end{array}\right\}$

The three vertices of the triangle correspond to the coordinates $(\alpha, \beta, \gamma) = (1, 0, 0), (0, 1, 0), (0, 0, 1)$ respectively.  Using this coordinate system it is really easy to define a linear scalar function on $\Delta$.  Pick any 3 weights $w_0, w_1, w_2 \in \mathbb{R}$ and define a map $f : \Delta \to \mathbb{R}$ according to the rule:

$f(\alpha, \beta, \gamma) = w_0 \alpha + w_1 \beta + w_2 \gamma$

This formula should be pretty familiar if you know a bit about computer graphics.  This is the same way a triangle gets mapped into 3D!  To get a map from the triangle to any vector space, you can just repeat this type of process for each component separately.

Ok, now that we have some idea of how an embedding works for a single triangle let’s figure out how it goes for a complete mesh.  Again we start by describing the parameter space for the mesh.  Let us suppose that our mesh is abstractly represented by a collection of $n$ vertices, $V$ and $m$ faces, $F$ where each face, $(i_0, i_1, i_2) \in F$ is a 3-tuple of indices into $V$ (in other words, it is an index array).   To each vertex we associate a scalar  $\alpha_0, \alpha_1, ... \alpha_{n-1} \in [0,1]$ and for every face we add a constraint:

$\alpha_{i_0} + \alpha_{i_1} + \alpha_{i_2} = 1$

The solution set of all of these constraints cuts out a subset of $K \subset \mathbb{R}^n$ which is the parameter space for our mesh.  To define a piecewise linear scalar field on $K$, we just need to pick $n$ different weights, one per each vertex.  Then we write any piecewise linear scalar field, $f : K \to \mathbb{R}$, analgous to the triangular case:

$f(\alpha_0, \alpha_1, ..., \alpha_{n-1}) = \sum_{j=0}^{n-1} w_{j} \alpha_j$

In other words, the basis or shape’ functions in our discretization are just the Barycentric coordinates $\alpha_i$. To see what this looks like visually, here is another helpful picture from wikipedia:

The 2D linear finite element shape functions visualized for a planar triangulation. The field is plotted above while the topology of the mesh is visualized on the bottom. Image (c) Wikipedia 2007. Author Oleg Alexandrov

### Integration on Meshes

Now that we know our basis functions, we need to say a bit about how we integrate them.  To do this, let us suppose that we have a piecewise linear embedding of our mesh which is specified (again in the linear shape functions) by giving a collection of vectors $v_0, v_1, ... v_{n-1} \in \mathbb{R}^3$.  These vertex weights determine an embedding of our mesh in 3D space according to the rule $\phi : K \to \mathbb{R}^3$

$\phi( \alpha_0, \alpha_1, ... \alpha_{n-1} ) = \sum_{i=0}^{n-1} \alpha_i v_i$

To integrate a scalar function $f : K \to \mathbb{R}$ over the embedded surface $\phi(K)$, we want to compute:

$\int_{\phi(K)} f( \phi^{-1}(x) ) d A$

This seems a little scary, but we can simplify it a lot using the fact that $\phi$ is piecewise linear over the mesh and instead performing the integration face-by-face.  Integrating by substitution, we get:

$\sum\limits_{(i_0, i_1, i_2)\in F}\int_{0}^{1}\int_{0}^{1-\alpha_{i_0}}f(0,0,...,\alpha_{i_0},\alpha_{i_1},1-\alpha_{i_0}-\alpha_{i_1},...)|(v_0-v_2)\times(v_1-v_2)|d \alpha_{i_1}d\alpha_{i_0}$

The way you should read that $f(...)$ is that every term is 0 except for the $i_0, i_1, i_2$ components, which are set to $\alpha_0, \alpha_1, 1-\alpha_0-\alpha_1$ respectively.

### Computing the Effective Fields

Now let’s suppose that we have a state field $f$ and that we want to compute the effective fields $late M$ and $N$ as defined above.  Then by definition:

$M(x,t) = \frac{1}{\mu(B_{x,h})}\int \limits_{B_{x,h}} f(y) dy = \int_K 1_{B_{x,h}}(y) f(y) dy$

The function $1_{B_{x,h}}$ is just the indicator function for the geodesic ball centered at x with radius h, and is computed using the geodesic distance field we described above.  To actually evaluate the integral, let us suppose that $f(...)$ is a weighted sum of linear shape functions:

$f(\alpha_0, ..., \alpha_{n-1}) = \sum_{i=0}^{n-1} w_i \alpha_i$

Then we apply the expansion we just described in the previous section:

$M(x,t)= \frac{1}{\mu(B_{x,h})}\begin{array}{r}\sum \limits_{(i_0, i_1, i_2)\in F}\int \limits_{0}^{1}\int \limits_{0}^{1-\alpha_{i_0}} \left(\Delta w_{i_0 i_2}\alpha_{i_0}+\Delta w_{i_1 i_2}\alpha_{i_1} \right)1_{B_{x,h}}(\Delta v_{i_0 i_2}\alpha_{i_0}+\Delta v_{i_1 i_2}\alpha_{i_1})... \\ \left|\Delta v_{i_0 i_2}\times\Delta v_{i_1 i_2}\right|d\alpha_{i_1} d\alpha_{i_0}\end{array}$

Where:

$\Delta w_{ij} = w_i - w_j$

$\Delta v_{ij} = v_i-v_j$

This integral is pretty horrible to compute analytically, but fortunately we don’t have to do that.  Instead, we can approximate the true result by using a numerical quadrature rule.  The way quadrature rules work is you just sample the integrand and take linear combinations of the results.  For cleverly selected weights and sample points, this can even turn out to be exact — though working out exactly the right values is something of a black art.  Fortunately, a detailed understanding of this process turns out to be quite unnecessary, since you can just look up the sample points and weights in a book and then plug in the numbers:

J. Burkardt. “Quadrature Rules for Triangles

It should be obvious that the same trick works for computing $N$, just by replacing $1_{B_{x,h}}$ with $1_{B_{x,3h}} - 1_{B_{x,h}}$.

### Matrix Formulation and Implementation

To compute the next state of $f$, we can just sample the field at each vertex.  To do this, let us define:

$m_i = M( v_i, t)$

$n_i = N(v_i, t)$

Then we can update $f$ according to the rule:

$w_i \leftarrow S(m_i, n_i)$

While this would work, it would also be very slow, since computing $m_i, n_i$ at each vertex requires doing quadratic work.  Instead, it is better if we precalculate a bit to avoid doing so many redundant computations.  To understand how this works, observe that our rule for computing the quantities $m_i$ is linear in the vector $w_i$.  This means that we can write it as a matrix:

$m = K_M w$

Where $m, w$ are the vectors of coefficients and $K_M$ is the matrix which we precalculate whose entries using the formula:

${K_M}_{jk}=\frac{1}{\mu(B_{v_j,h})}\begin{array}{r}\sum \limits_{(i_0, i_1, i_2)\in F_k}\int \limits_{0}^{1}\int \limits_{0}^{1-\alpha_{i_0}} \left(\Delta w_{i_0 i_2}\alpha_{i_0}+\Delta w_{i_1 i_2}\alpha_{i_1} \right)1_{B_{v_j,h}}(\Delta v_{i_0 i_2}\alpha_{i_0}+\Delta v_{i_1 i_2}\alpha_{i_1})... \\ \left|\Delta v_{i_0 i_2}\times\Delta v_{i_1 i_2}\right|d\alpha_{i_1} d\alpha_{i_0}\end{array}$

That is, it is a sum over all the faces incident to vertex $k$ (which I wrote as $F_k$ to save space), with quadrature computed as described above.  Since the outer radius for the cells are finite, $K_M$ ends up being pretty sparse and so it is relatively easy to process and store.  Again, the same idea applies to the effective neighborhoods which we store in the matrix $K_N$.  Putting this together, we get the following algorithm:

• Use geodesic distance to precompute matrices $K_M, K_N$
• Initialize state vector $w$
• For each time step:
• Set $m = K_M w$
• Set $n = K_N w$
• Set $w_i = S(m_i, n_i)$

## Demo

If you want to try playing around with this stuff yourself, I made a WebGL demo that should work in Chrome (though I haven’t tested it in Firefox).  You can check it out here:

http://mikolalysenko.github.com/MeshLife

Be warned though!  It is very slow due to the inefficient calculation of geodesic distances.  Of course once it has finished assembling the matrices $K_M, K_N$ it runs pretty well in the browser.  If you want to look at the code for the demo, you can try it out on github:

https://github.com/mikolalysenko/RiemannLife

The project is written using node.js and has dependencies on two other libraries I made, namely trimesh.js and meshdata — both of which you can install via npm.

## Conclusions

In conclusion, we’ve demonstrated the extended the domain of Rafler’s SmoothLife equations to curved surfaces.  While our current implementation is somewhat crude, it does suffice to illustrate that the basic theoretical concepts are sound.  In the future, it would be good to investigate alternative methods for computing geodesic distance.  At the moment, this is the main bottleneck in the pipeline, and solving it would make further investigation of SmoothLife and related phenomena much easier.

It would also be interesting to attempt a more detailed analysis of the dynamics of SmoothLife.  One interesting puzzle would be to try working out the stability of various solitons/gliders in the presence of curvature.  It seems that they tend to destabilize and bifurcate near regions of high curvature.  This is probably due to the fact that the soliton interacts with itself in these regions.  For small curvatures, the trajectory of the solitons are deflected, causing gravitational like effects on their motion.

## Conway’s Game of Life for Curved Surfaces (Part 1)

Conway’s Game of Life is one of the most popular and iconic cellular automata.  It is so famous that googling it loads up a working simulation right in your browser!  The rules for the Game of Life (GoL) can be stated in a few lines:

1. The world is an infinite rectangular array of cells, each of which can be in one of two states: alive or dead; and the neighborhood of each cell is a 3×3 grid centered on the cell.
2. The state of the world advances in discrete synchronous time steps according to the following two rules:
3. Cells which are alive remain alive if and only if they have between 2 and 3 neighbors; otherwise they die.
4. Cells which are dead become alive if they have exactly 3 neighbors; or else they stay dead.

Despite the superficial simplicity of these rules, the Game of Life can produce many unexpected and fascinating recurring patterns; like the glider soliton:

A 3×3 glider in Conway’s Game of Life. Black represents a live cell; white represents dead. (c) 2008, Wikipedia. Author: Dhatfield

The fact that the rules for the Game of Life are so short and clear makes it a great project for initiating novice programmers.  Since Life was invented by John Conway in 1970, it has distracted many amateur and recreational mathematicians, leading to the discovery of many interesting patterns.  For example, Paul Chapman showed that Life is Turing complete, and so it is in principle possible to translate any computer into a set of initial conditions.     As an amusing example of this concept, here is an implementation of Conway’s Game of Life in the Game of Life created by Brice Due.  Another cool pattern is the famous Gemini spaceship which is a complex self replicating system, and is perhaps the first true organism’ to be discovered in life!  If any of this sounds interesting to you, you can play around with the Game of Life (and some other cellular automata) and look at a large and interesting library of patterns collected in the community developed Golly software package.

## SmoothLife

Now, if you’ve been programming for any reasonable length of time you’ve probably come across the Game of Life at least a couple of times before and so all the stuff above is nothing new.  However, a few weeks ago I saw an incredibly awesome video on reddit which inspired me to write this post:

This video is of a certain generalization of Conway’s Game of Life to smooth spaces proposed by Stephan Rafler:

S. Rafler, “Generalization of Conway’s “Game of Life” to a continuous domain – SmoothLife” (2011) Arxiv: 1111.1567

You can read this paper for yourself, but I’ll also summarize the key concepts here.  Basically, there are two main ideas in Rafler’s generalization:  First, the infinite grid of cells is replaced with an effective grid’ that is obtained by averaging over a disk.  Second, the transition functions are replaced by a series of differential equations derived from a smooth interpolation of the rules for the discrete Game of Life.

### Effective Grids

To explain how an effective grid’ works, first consider what would happen if we replaced the infinite discrete grid in the game of life with a time-dependent continuous real scalar field, $f : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R}$ on the plane.  Now here is the trick:  Instead of thinking of this is an infinite grid of infinitesimal cells, we give the cells a small — but finite! — length.  To do this, pick any small positive real number h which will act as the radius of a single cell (ie “the Planck length” for the simulation).  Then we define the state of the effective cell‘ at a point x as the average of the field over a radius h neighborhood around x, (which we call M(x,t) following the conventions in Rafler’s paper):

$M(x, t) = \frac{1}{\pi h^2} \int \limits_{0 \leq |y| \leq h} f(x-y,t) dy$

Now for each cell, we also want to compute the effective “number of cells” in its neighborhood.  To do this, we use the same averaging process, only over a larger annulus centered at x. By analogy to the rules in the original GoL, a reasonable value for the outer radius is about 3h. Again, following Rafler’s conventions we call this quantity N(x,t):

$N(x, t) = \frac{1}{8 \pi h^2} \int \limits_{h \leq |y| \leq 3 h} f(x-y,t) dy$

To illustrate how these neighborhoods fit together, I made a small figure:

Now there are two things to notice about M and N:

1. They are invariant under continuous shifts.
2. They are invariant under rotations.

This means that if we define the state of our system in terms of these quantities, the resulting dynamics should also be invariant under rotations and shifts!  (Note:  These quantities are NOT invariant under scaling, since they are dependent on the choice of h.)

### Smooth Transition Functions

Of course getting rid of the grid is only the first step towards a smooth version of Life.  The second (and more difficult) part is that we also need to come up with a generalization of the rules for the game of life that works for continuous values.  There are of course many ways to do this, and if you’ve taken a course on real analysis you may already have some idea on how to do this.  However, for the sake of exposition, let’s try to follow along Rafler’s original derivation.

As a starting point, let’s first write the rules for the regular discrete Game of Life in a different way.  By some abuse of notation, suppose that our field  was a discrete grid (ie $f: \mathbb{Z}^3 \to \{ 0, 1 \}$) and that N, M, give the state of each cell and the number of live cells in the 1-radius Moore neighborhood.  Then we could describe the rules of Conway’s Game of Life using the equation:

$f(x, t+1) = S(N(x,t), M(x,t))$

Where represents the transition function of the Game of Life, and is defined to be:

$S(N,M) = \left \{ \begin{array}{cc} 1 & \mathrm{if } \: 3 - M \leq N \: \mathrm{ and }\: N \leq 3 \\ 0 & \mathrm{otherwise} \end{array} \right.$

Since S is a function of two variables, we can graph it by letting the x-axis be the number cells in the neighborhood and the y-axis be the state of the cell:

Plot of state transition for GoL. Y-axis is N, X-axis is M, red=alive, blue=dead.

Now given such a representation, our task is to smooth out’ S somehow.  Again to make things consistent with Rafler, we first linearly rescale N, M so that they are in the range [0,1] (instead of from [0,9), [0,2) respectively).  Our goal is to build up a smooth approximation to S using sigmoid functions to represent state transitions and thresholding (this is kind of like how artificial neural networks approximate discrete logical functions).  Of course the term sigmoid' is not very precise since a sigmoid can be any function which is the integral of a bump.   Therefore, to make things concrete (and again stay consistent with Rafler's paper) we will use the logistic sigmoid:

$\sigma(x,a) = \frac{1}{1 + \exp(-\frac{4}{\alpha} (x-a))}$

It is a bit easier to understand what $\sigma(x,a)$ does if we plot it, taking $\alpha=4, a=1$:

Plot of the logistic sigmoid $\sigma(x,0)$ from x = -6 to 6. Image (c) Wikipedia 2008. Author Geoff Richards.

Intuitively, $\sigma(x,a)$ smoothly goes to 0 if x is less than a, and goes to 1 if x is greater than a.  The parameter $\alpha$ determines how quickly the sigmoid steps’ from 0 to 1, while the parameter a shifts the sigmoid. Using a sigmoid, we can represent 0/1, alive/dead state of a cell in the effective field by thresholding:

$\mathrm{alive}(x,t) \approx \sigma(M(x,t), 0.5)$

The idea is that we think of effective field values greater than 0.5 as a live, and those less than 0.5 as dead.  We can also adapt this idea to smoothly switch between any two different values depending on whether a cell is alive or dead:

$\sigma_m(a,b) = a (1 - \sigma(m,0.5)) + b \sigma(m,0.5)$

The idea is that $\sigma_m$ selects between a and b depending on whether the cell is dead or alive respectively.  The other thing we will need is a way to smoothly threshold an interval.  Fortunately, we can also do this using $\sigma$.  For example, we could define:

$\sigma_n(a, b) = \sigma(n,a) (1 - \sigma(n,b))$

Plot of $\sigma_n(0.25, 0.75)$ for n=0 to 1

Putting it all together, let’s now make a state transition function which selects the corresponding interval based on whether the state of the cell is currently alive or dead:

$S(n, m) = \sigma_n(\sigma_m(b_1, b_2), \sigma_m(d_1, d_2))$

Where $[b_1, d_1], [b_2, d_2]$ are a pair of intervals which are selected based on user specified parameters.  Based on the game of life, a reasonable set of choices should be $b_1 \approx \frac{2}{8}, b_2 \approx d_1 \approx d_2 \approx \frac{3}{8}$, which if plotted gives a chart that looks something like this:

This is the transition function for SmoothLife as defined by Rafler.  The squishing of this plot relative to the original plot is just a side effect of rescaling the axes to the unit interval, but even so the similarity is unmistakable.  The only thing left is to pick the value of $\alpha$.  In the original paper on SmoothLife, Rafler allows for two different values of $\alpha$; $\alpha_n, \alpha_m$ for their appearance in $\sigma_n, \sigma_m$ respectively.  Along with the bounds for the the life/death intervals $b_1, b_2, d_1, d_2$, there are 6 total parameters to choose from in building a cellular automaton for SmoothLife.

### Timestepping

Now, there is still one nagging detail.  I have not yet told you how the time works in this new version of life.  In Rafler’s paper he gives a couple of possibilities.  One simple option is to just do discrete time stepping, for example:

$f(x,t+1) = S(N(x,t), M(x,t))$

However, this has the disadvantage that the time stepping is now discretized, and so it violates the spirit of SmoothLife somewhat.  Another possibility (advocated in Rafler’s paper) is to rewrite this as a differential equation, giving the following smooth update rule:

$\frac{d f(x,t)}{dt} = 2 S(N(x,t), M(x,t))-1$

I found this scheme to give poor results since it tends to diverge whenever the state of a cell is relatively stable (leading to field values outside the range [0,1]). (UPDATE: I just got an email from Stephan Rafler, apparently the equation I wrote earlier was wrong.  Also, he recommends that the field values be clamped to [0,1] to keep them from going out of bounds.)  A slightly better method which I extracted from the source code of SmoothLife is the following technique that pushes the state of the effective field towards S exponentially:

$\frac{d f(x,t)}{dt}= S(N(x,t), M(x,t))-f(x,t)$

It is also possible to replace f(x,t) with M(x,t) on the right hand side of the above equations and get similar results.  It is claimed that each of these rules can produce interesting patterns, though personally I’ve only observed the first and last choices actually working…

## Discretizing SmoothLife

Now that we have the rules for SmoothLife, we need to figure out how to actually implement it, and to do this we need to discretize the field f somehow.  One common way to do this is to apply the Galerkin method.  The general idea is that for a fixed t, we write f as a weighted sum of simpler basis functions $\phi_i$ (which I will deliberately leave unspecified for now):

$f(x,t) = \sum \limits_i w_i \phi_i(x)$

The undetermined weights $w_i$ are the degrees of freedom we use to represent $f(x,t)$.  For example, we could represent f as a polynomial series expansion, in which case $\phi_i(x) = x^i$ and the weights would just be the coefficients, $d^i f(x,t) / d x^i$.  But there is no reason to stop at polynomials, we could approximate f by whatever functions we find convenient (and later on we’ll discuss a few reasonable choices.)  What does this buy us?  Well if we use a finite number of basis elements, $\phi_i$, then we can represent our field f as a finite collection of numbers $w_i$!  Cool!  So, how do we use this in our simulation?

Well given that $f(x,t) = \sum w_i \phi_i(x)$ at time t, and supposing that at time t+1 we have $f(x,t+1) = \sum v_i \phi_i(x)$, all we need to do is solve for the coefficients $v_i$.  This could be done (hypothetically) by just plugging in f(x,t), f(x,t+1) into both sides of Rafler’s SmoothLife update equation:

$f(x,t+1) = S(N(x,t), M(x,t))$

To compute M we just plug in f(x,t):

$M(x,t) = \int \limits_{|x-y| \leq h} \sum \limits_i w_i \phi_i(x-y) dy$

$= \sum \limits_i w_i \int \limits_{|x-y| \leq h} \phi_i(x-y) dy$

And a similar formula holds for N as well.  Therefore, we have the equation:

$\sum \limits_i v_i \phi_i(x) = S \left ( \sum \limits_i w_i \int \limits_{|y| \leq h} \phi_i(x-y) dy, \sum \limits_i w_i \int \limits_{h \leq |y| \leq 3h} \phi_i(x-y) dy \right)$

Unfortunately, it is usually not the case that there is an exact solution to this equation.  The underlying reason the above idea may fail is that it might not be possible to represent our solution in the basis functions $\phi_i$ exactly.  So we have to modify the equation somewhat.  Instead of trying to get a perfect solution (which is generally impossible for arbitrary nonlinear PDEs or crazy boundary conditions), we’ll settle for approximating the solution as best as possible.  The way we do this that we seek to find a choice of $w_i$ which minimizes an $L^2$ error in some suitable metric.  For example, we could try to solve the following the following optimization problem instead:

Minimize $\int (f(x,t+1) - S(N(x,t), M(x,t)))^2 dx$

This type of system is sometimes called the weak formulation of the boundary value problem, and by standard linear algebra arguments finding the solution amounts to doing a subspace projection.  In general this could be pretty annoying, but if we suppose that each of our $\phi_i$ were orthonormal — that is,

$\int \phi_i(x) \phi_j(x) dx = \left \{ \begin{array}{cc} 1 & \mathrm{if} \: i=j \\ 0 & \mathrm{otherwise} \end{array} \right.$

Then we know that,

$v_i = \int \phi_i(x) S(N(x,t), M(x,t)) dx$

and so all we have to do to compute the next state is integrate S(N,M) against each $\phi_i$.  A similar derivation applies for the smooth time stepping rules as well (modulo a small technical condition of $\phi_i$ that they must be differentiable).

### Boundary Conditions

Ok, so we know (at least theoretically) what must be done to discretize the system.  But how can we possibly hope to represent an infinite space using a finite number of basis functions?  One simple solution is that we can make the space finite again by adding a periodicity condition.  That is, let us require that for all x,y:

$f(x,y) = f( 2\pi + x, 2 \pi +y)$

This means that we can parameterize the state of the infinite grid by just describing what happens within the region $[0,2 \pi)^2$.  This square is compact and so we have at least some hope of being able to approximate a periodic f by some finite number of functions covering this square.

### Sinc Interpolation

Finally, we come to the point where we must pick a basis in order to go any further.  There is some level at which the particular choice of representative functions $\phi_i$ is arbitrary and so one could reasonably justify almost any decision.  For example, I could use B-splines, polynomials, sine waves, fractals or whatever I felt like (at least as long as I can integrate it to compute N and M).  What I am going to describe here is by no means the only way to proceed.  For example, in his paper Rafler uses a very simple bilinear expansion in terms of quadrilateral elements to get a decent discretization.  (And if you want the details scroll up and read his write up).

However, because our system is periodic and translationally invariant (although not linear) there is one basis which has at least a somewhat more special status, and that is the Dirichlet or aliased sinc basis.  Let R be the resolution of a uniform grid, and define:

$\mathrm{asinc}_{R}( x ) = \frac{ \sin( R x / 2) }{ R \sin(x / w) }$

Then we index our basis functions by a pair of indices $0 \leq i,j < R$ with:

$\phi_{i,j}(x,y)=\mathrm{asinc}_{R} (x-\frac{2\pi i}{R})\mathrm{asinc}_{R}(y-\frac{2\pi j}{R})$.

It may seem surprising at first, but these basis functions are actually orthogonal.  If you’ve never seen something like this before, please don’t be alarmed!  That definition (which seemed to come out of left field) is really a manifestation of the famous (and often misunderstood) Nyquist sampling theorem.  It basically says that if we write any periodic, band limited function as a sum of sincs:

$f(x,y) = \sum_{i=0}^R \sum_{j=0}^R w_{i,j} \phi_{i,j}(x,y)$

Then the weights $w_{i,j}$ are just the values of the function at the grid points:

$f(\frac{2 \pi }{R} i, \frac{2 \pi}{R} j) = w_{i,j}$

This means that for any suitably smooth f, we can project it to our basis by just sampling it:

$\int f(x,y) \phi_{i,j}(x,y) dx dy \approx f(i,j)$

This makes our code way simpler, since we can avoid doing some messy numerical integration to get the weights.  Assuming that S(N,M) is smooth enough (which in practice it is), then all we have to do to perform a discrete update is figure out how to compute M(x,y), N(x,y) for any given sinc expansion of f.  In other words, we want to compute:

$M(x,y) = \int_{|p^2 + q^2| \leq h^2} f(x-p, y-q) dp dq$

But this is just a convolution convolution with the indicator function of the unit disk:

$M(x,y,t) = \int_{0}^{2 \pi} \int_{0}^{2 \pi} 1_{B_h}(p,q) f(x-p, y-q, t) dp dq$

If you know a bit of Fourier analysis, you should know what comes next.  Just take a Fourier transform and apply the convolution theorem to both sides, giving:

$\hat{M}(\omega_x,\omega_y,t)=\hat{1_{B_h}}(\omega_x,\omega_y)\hat{f}(\omega_x,\omega_y)$

Which we can actually solve exactly, since the Nyquist theorem implies that $\hat{f}$ is a finite sum. Therefore:

$M(x,y)=\sum \limits_{i=0}^{R}\sum \limits_{j=0}^{R}\hat{1_{B_h}}(i,j)\hat{f}(i,j)e^{2\pi\sqrt{-1}(ix+jy)}$

And by a bit of calculus, the Fourier transform of $1_{B_h}$ can be computed in closed form using Bessel functions.  This means that:

$M(x,y)=\sum \limits_{i,j=0}^R \frac{\sqrt{3 h}}{4 \sqrt{i^2+j^2}} J_1(2 \pi h \sqrt{i^2+j^2}) e^{2 \pi \sqrt{-1}(ix+jy)}\widehat{f}(i,j)$

A similar formula for N(x,y,t) can be obtained by writing:

$N(x,y) = \int (1_{B_{3h}} - 1_{B_{h}})(p,q) f(x-p,y-q,t) dp dq$

Putting it all together, let $w'_{i,j}$ be the new weights for f at time t+1.  Then to do a discrete time step we just set:

$S(N(2 \pi i /R, 2 \pi j/R), M(2 \pi i/R, 2 \pi j / R)) \mapsto w'_{i,j}$

I’ll leave it as an exercise to work out the update rules for the continuous time steps.

## Implementation Details

Great, so we have some formulas for computing the next time step, but if we do it in a dumb way it is going to be pretty slow.  The reason is that each of the above steps requires summing up all the coefficients of f to calculate M and N at each point, which is going to take linear work with respect to the number of terms in our summation.  Spread out over all the points in the grid, this makes the total update $O(n^2)$.  However, because the above sums are all Fourier series we can speed this up a lot using a fast Fourier transform.  Basically we precalculate the weights by evaluating the Bessel function at each frequency, then at each time step we Fourier transform f, multiply by the weights, inverse transform and repeat.  Here is what it looks like:

• Initialize f
• Precalculate $\widehat{1_{B_h}}, \widehat{1_{B_{3h}}}$
• For t=1 to n
• Use FFT to compute $\widehat{f}$
• Set $M = \frac{1}{2 \pi h^2} \mathcal{F}^{-1}( \widehat{1_{B_h}} \widehat{f} )$
• Set $N = \frac{1}{8 \pi h^2} \mathcal{F}^{-1}((\widehat{1_{B_{3h}}}-\widehat{1_{B_h}})\widehat{f})$
• Set $f(i,j) = S(N(i,j), M(i,j))$

And that’s it!

## Demo

If you want to try out SmoothLife yourself in your browser, I made a few jsFiddles which illustrate the basic principle.  Here is a Fourier based implementation that follows the discussion in this article pretty closely:

http://jsfiddle.net/mikola/aj2vq/

I also made a second WebGL/GPU based implementation that uses a discretization similar to that proposed in Rafler’s original paper:

http://jsfiddle.net/mikola/2jenR/

You can run either of them in  your browser and try changing the parameters.

## Next Time…

The stuff in this article is basically background material, and most of it can be found in Rafler’s paper or is already more-or-less documented in the source code for the SmoothLife project on SourceForge.  However, in the sequel I plan to go a bit farther than the basic rules for SmoothLife and tell you how to make a version of SmoothLife for curved surfaces!

Posted in Game of Life, Mathematics, Programming | Tagged , , | 11 Comments