The problem with tessellation in DirectX 11

As you may have heard, Direct X 11 brings tessellation support, and all the hardware vendors and benchmarks are going crazy with super-finely tessellated meshes, promising us automatic level of detail and unprecedented visual fidelity. What you may not be aware of is that the Xbox 360 has very similar tessellation hardware too, which means many game developers have had the opportunity to use these same techniques for about five years now. So why aren’t all console games using tessellation pervasively? That’s what I will tell you in this blog post, as well as demonstrate a solution to the problem.

The clue is present in just about any of the dozens of tessellation demos that are now available. In case you haven’t seen one, here’s a capture I just took of one of the samples in DirectX 11 (ignore the frame rate, the capturing software interferes).

[youtube=http://www.youtube.com/watch?v=hlbaOrpMj6I&hl=en]

This looks good and all, but notice the mesh density. Even when not doing any tessellation at all this is a 50x50 grid! That’s five thousand triangles at the lowest level of detail for roughly one square metre of cobblestone. That’s plainly a ridiculous poly-count for any practical purposes in games. There are two main problems with such a high vertex density: it wastes time processing more vertices than needed, and leads to the small triangle problem.

It’s wasteful because even at a modest distance you’re not going to need 5000 triangles to represent a square metre of ground, so transforming that many vertices is just throwing precious cycles away.

The small triangle problem refers to the efficiency loss when rendering small triangles. Current GPUs rasterize pixels in small groups of at least 2x2 pixels. Whenever a triangle covers the entire “quad” of 2x2 pixels all is well, if however some of those pixels are not covered then the GPU resources associated with the unused pixels are just squandered. For example, if each triangle only covers a single pixel, then every one of those quads will be ¾ unused, leading to just 25% pixel shading efficiency.

The problem

So why are all these demos using control meshes that are already finely tessellated? It’s no accident. There are two main reasons for this. The first is that you can only specify tessellation factors on a per-edge level, so if you want to adaptively tessellate some areas more than others, you’re going to have to have a dense distribution of edges to support that variation across the surface. The bigger reason, though, is that smoothly varying tessellation on a displaced surface looks like rubbish at lower tessellation levels. See this example, which is just the previous demo with a more reasonable base mesh resolution (4x4, instead of 50x50).

[youtube=http://www.youtube.com/watch?v=A1H8UbSQpUw&hl=en]

Notice the shimmering, bucking, artefact as the tessellation slider is moved up and down. This is the dirty little secret that Xbox 360 developers have known for years: at reasonable mesh densities, continuous tessellation just looks awful and is unusable as a general strategy. We used it in Banjo Kazooie : Nuts & Bolts for water, because the shimmering artefact actually looked pretty decent on a water surface that was supposed to shimmer, but while we tried to use it on some other things we could never live with the artefacts (the solution below is something that came to me after BKNB shipped, and could in principle be implemented in a future title).

A solution

So what’s going on here? Well the problem is that as vertices get added and removed they smoothly travel to their final location by sliding across the surface, this lead to vertices rapidly bobbing up and down as they move over the surface and sample different displacements from the displacement map. This problem is not a new artefact, it’s actually just standard minification aliasing. When you do normal straight texture mapping, the texels are sampled at pixel locations. If the ratio of texel density to pixel density gets too high you get minification aliasing because each pixel “doesn’t know” which of the many texels it covers to sample from, so small changes in pixel position will lead to entirely different texels being used. For regular texture mapping we solve this by using MIP-mapping. Effectively just choosing a lower resolution version of the texture when the sample locations are too sparsely distributed to accurately reconstruct the full resolution texture.

Displacement mapping is no different. Instead of sampling at pixels we have vertices, and instead of colours the values sampled are geometric offsets so the aliasing artefacts manifest in a different way. The basic problem, though, is that we’re reading a high frequency texture (the displacement map) at too low of a sampling frequency. The solution to this problem is the same as for regular texture mapping – use MIP-mapping to choose a lower resolution texture when the sampling frequency (i.e. tessellation factor) is lower.

So how do we determine which LOD to use? Well, one simple way is to look at the length of each edge in the control mesh in texture space, and choose a MIP level for each edge so that the distance in texels for each subdivision will be no more than 0.5 texels (this is to get us under the Nyquist limit which says that the sampling frequency must be twice that of the signal frequency). In other words, if the length of the edge in pixels is L, and the edge’s tessellation factor is T, then we will get L/T pixels per subdivision. We want that to be 0.5 by choosing a MIP level, so we have to choose a MIP level M so that (L/T)*2^(-M) = 0.5 (the linear distance in texture space decreases by a factor of two for each MIP level). Solve for M and we get:  M = log2(L/T) + 1. In practice, linear interpolation isn’t a perfect reconstruction filter, so we may need a small fudge factor to boost the MIP level up slightly further. Note that although we use MIP mapping, the MIP level used doesn’t depend on viewing distance or angle, just on the tessellation factor (which in turn may depend on those factors, of course).

So, now that we have a MIP level per edge, we can simply interpolate between them in the Domain Shader to pick a suitable MIP level for each verex. It’s important that the interpolation you use has the property that when the point is on an edge, the weights for the two other edges are zero, so that edge-vertices use the same MIP level regardless of which patch they belong to. Here’s how this looks:

[youtube=http://www.youtube.com/watch?v=ePwtk_M1058&hl=en]

We can see the basic idea working here. Rather than shimmering artefacts when the tessellation level is too low to adequately represent the displacement, we just get a flatter surface instead (due to choosing a lower-res MIP level of the displacement map). However, it should be obvious that there’s a problem here if you look closely at the “spikes” visible at each vertex in the base mesh. Each one of those lies on two edges per patch, so won’t know from which to retrieve a MIP level. So what do we do? Just pick one of the two candidate edges at random? Take the average? No, neither will work work because neighbouring patches can have entirely different edges using the vertex. In fact, there can be an arbitrary number of patches associated with a vertex, and for each of them that vertex must pick the exact same MIP level in order to avoid cracks.

So what do we do for the control vertices themselves? Ideally, we’d find the average MIP level for all the edges used by a vertex, but that would be expensive since tessellation factors (and therefore MIP levels) can change each frame. The simplest solution I can think of is to store a “preferred edge” index for each control vertex. This would simply be a randomly chosen edge that uses that vertex. When you detect that you’re on a control point (by checking the barycentric coordinates), you simply check the vertex’s preferred edge index, and fetch the MIP level associated with that edge. Note that the preferred edge is not necessarily in the same patch as the current patch, but it is consistent in that every patch using that vertex will use the same preferred edge, and therefore MIP level, which eliminates cracks.

Here’s how this looks:

[youtube=http://www.youtube.com/watch?v=5tZ6YhHV4XE&hl=en]

This is much better. We’ve got rid of most of the aliasing, and the corners fall in line with a MIP level chosen from its immediate neighbourhood. Notice that there’s still some ever-so-subtle shimmering going on in the creases here. That’s because we’re approximating the MIP level based on the coarse control mesh, whereas in reality the actual triangles in the tessellated mesh vary in size and shape. The main downside to this strategy is that we need to compute edge tessellation and MIP levels outside the main draw calls to produce a buffer of per-edge tessellation factors and MIP levels, which not only adds a draw call and associated bandwidth increases, but also requires us to transform all of our control vertices several times (assuming the tessellation factor depends on viewing direction, skinning etc.).

It would be ideal if the domain shader would give you some information about adjacent patches, so that we could easily compute an average LOD for all the edges connected to a vertex.

Other solutions

There are a couple of other potential solutions to this problem, so I’m not trying to say that tessellation and displacement mapping is doomed or anything, I’m just trying to temper some of the wild enthusiasm by pointing out that the claims of “automatic LOD” etc. are vastly overstated.

For example, this could clearly be combined with a standard LOD system, where only the high-res mesh uses tessellation. Although at that point it’s unclear whether it’s even worth the effort. Just let the high res model be really high-res instead.

Another option is to use fixed-step tessellation factors, instead of the continuous ones. This causes harsher transitions between levels of details, but at least you don’t have vertices sliding across the surface causing the disturbing shimmering demonstrated above. But then again, if you’re going to use discrete tessellation levels then simply using a normal LOD system is likely to be faster, and will definitely be a lot simpler.

Using power-of-two tessellation is promising. In this mode vertices are added in a “power of two” fashion by subdividing each edge. This is attractive because the tessellation pattern is simple enough that you could do geo-morphing by computing, in the domain shader, the previous height for the current point and the actual displaced height and then interpolating between them. The downside with this is that in order to compute the previous height you have to somehow figure out where the current point lies on the surface of the mesh tessellated at a lower level. This means you’re effectively duplicating the effort of the fixed function tessellator in the domain shader, which doesn’t sit right with me.

Conclusion

Generating geometry by sampling a texture at varying frequencies is not a trivial problem, and requires careful consideration in order to avoid the same aliasing problems that we’ve already dealt with for regular texture mapping. Unlike regular texture mapping, however, we don’t have any real help from the hardware in figuring out the appropriate MIP level for a displacement map, so we have to figure out approximations ourselves, and while there are workable solutions, I haven’t been able to figure anything out that isn’t ugly or inefficient in some way. I do hope that future talks about and demos of tessellation and displacement mapping will at least acknowledge this problem, instead of just ramping up the base mesh’s polycount, sweeping the issue under the carpet.

Update

A simple variation of this idea which has the benefit of being very cheap is to compute a per-object representative MIP level on the CPU (in some approximate way), and then in the domain shader simply interpolate between the MIP level you got from the edges, and this per-object MIP level, based on how close you are to a hull vertex (e.g. use the maximum value of your barycentric coordinates). You’d probably want to ensure that only the vertices that are very close to the hull vertices get influenced by the per-object MIP level, but this would at least ensure that all hull vertices use exactly the same MIP level (the per-object one!) while keeping the surface looking smooth.

Update 2

****Obvious-in-retrospect tweak to this technique: simply store the “UV coverage” of each control vertex (basically the average UV area for all the triangles touching it divided by 3). If you know the area in UV-space for a vertex, you can compute an appropriate MIP level for it that does not depend on the triangles it’s used in. In the domain shader, detect when you’re at a control point and use the per-vertex MIP-level. This is cheap, and better than a per-object MIP level. This is looking like a pretty workable solution. It’s still not perfect because patches with a lot of internal variation has to be seriously over sampled to capture all the high frequencies.

sebastiansylvan