Chart showing the PrimitiveBatch benchmark results in XNA 4.0, clipped to 500 FPS

DynamicVertexBuffer versus DrawUserPrimitives, Round 2

More than a year ago, I did some benchmarking in XNA 3.1, comparing the vertex throughput I could achieve on my GeForce 8800 via XNA’s DynamicVertexBuffer class versus just calling GraphicsDevice.DrawUserPrimitives(). Here’s my earlier benchmark: Efficiently Rendering Dynamic Vertices.

In all cases, DrawUserPrimitives() was marginally faster than the DynamicVertexBuffer, but it appeared to be a very bad idea to use a DynamicVertexBuffer on the Xbox 360. I had a really nice discussion with Shawn Hargreaves on the XNA forums, where he provided a lot of in-depth information about how things work on the Xbox 360.

One of today’s threads on the AppHub forums reminded me of my earlier benchmarks, so I decided to dig out my old benchmark and redo it in XNA 4.0. The benchmark uses my Nuclex Framework’s PrimitiveBatch class, which has undergone some changes since then, so I repeated the XNA 3.1 benchmarks in addition to getting the new data for XNA 4.0.

Environment

I’m running these benchmarks in XNA’s default resolution, as a release build with no debugger attached. My Xbox 360 is the “Elite” variant with the 65 nm “Falcon” chipset, I believe. My PC is an old Athlon 64 X2 6000+ with a GeForce GTX 460 running Windows 7 x64.

The benchmark draws vertices in batches of 8192. The UserPrimitiveBatchDrawer simply calls DrawUserIndexedPrimitives(), while the DynamicBufferBatchDrawer creates a vertex buffer fitting 32768 vertices, fills it 4 times with SetDataOptions.NoOverwrite, and then does one SetDataOptions.None lock (ideally this would be a SetDataOptions.Discard lock).
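To make the lock pattern easier to follow, here is a small language-neutral model of it in Python. This is not XNA code and not the actual DynamicBufferBatchDrawer; the function name `lock_plan` and the string labels are hypothetical, and I am assuming that the write which wraps back to the start of the buffer is the one using the `None` (ideally `Discard`) option, while all other writes use `NoOverwrite`.

```python
BATCH_SIZE = 8192                   # vertices per batch (from the text)
BUFFER_SIZE = 32768                 # vertices the dynamic buffer holds (from the text)
SLOTS = BUFFER_SIZE // BATCH_SIZE   # 4 batches fit before the buffer wraps


def lock_plan(batch_count, wrap_option="None"):
    """Return a (vertex_offset, option) pair for each successive batch.

    Assumption: only the write that wraps around to offset 0 uses
    wrap_option; every other write targets a region that has not been
    used since the last wrap, so it is modeled as "NoOverwrite".
    """
    plan = []
    for i in range(batch_count):
        slot = i % SLOTS
        wrapped = slot == 0 and i > 0   # the very first write does not wrap
        option = wrap_option if wrapped else "NoOverwrite"
        plan.append((slot * BATCH_SIZE, option))
    return plan


if __name__ == "__main__":
    for offset, option in lock_plan(6):
        print(offset, option)
```

The first four entries fill the buffer slot by slot with `NoOverwrite`; the fifth goes back to offset 0 with the wrap option. Passing `wrap_option="Discard"` models the variant that would ideally be used.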

All vertices are constructed from scratch each frame by building each quad in a local array, then letting the PrimitiveBatch copy that array into its batching buffer, which gets rendered once 8192 vertices have accumulated in it.
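The accumulate-then-flush behavior described above can be sketched roughly as follows. This is a toy Python model, not the Nuclex PrimitiveBatch API: the class `BatchModel` and its methods are hypothetical, and I am assuming 4 vertices per quad (indexed quads), which the text does not state explicitly.

```python
class BatchModel:
    """Toy stand-in for a primitive batcher; only counts vertices."""

    def __init__(self, capacity=8192):
        self.capacity = capacity   # vertices per batch (from the text)
        self.buffer = []           # the batching buffer
        self.flushes = []          # vertex count of every simulated draw call

    def draw(self, vertices):
        # Flush first if the incoming vertices would not fit anymore.
        if len(self.buffer) + len(vertices) > self.capacity:
            self.flush()
        self.buffer.extend(vertices)   # copy the local array into the batch
        if len(self.buffer) == self.capacity:
            self.flush()

    def flush(self):
        # A real batcher would issue the draw call here.
        if self.buffer:
            self.flushes.append(len(self.buffer))
            self.buffer.clear()


def run_frame(quad_count):
    batch = BatchModel()
    for i in range(quad_count):
        quad = [(i, corner) for corner in range(4)]  # local array, assumed 4 vertices
        batch.draw(quad)
    batch.flush()                                    # end of frame: draw the remainder
    return batch.flushes


if __name__ == "__main__":
    print(run_frame(5000))
```

Under these assumptions, a frame of 5000 quads (20000 vertices) results in two full batches of 8192 vertices and one partial batch with the remainder.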

XNA 3.1 Benchmarks

With the latest and greatest release of my PrimitiveBatch, the performance of DrawUserPrimitives() and that of a DynamicVertexBuffer are virtually identical:

Line chart showing the Xbox to manage about 6400 quads before going below 60 FPS

Last time, I clipped the FPS values at 550 so the interesting region around 60 FPS wasn’t compressed into just a few pixels. I’ll provide the original, unclipped chart this time as well:

Unclipped line chart with the Xbox reaching 1300+ FPS on small numbers of vertices

XNA 4.0 Benchmarks

This is the same benchmark application running in XNA 4.0. The XNA 4.0 build of my PrimitiveBatch eliminates the code paths for TriangleFans and PointLists and doesn’t need to pass the VertexDeclaration around:

Line chart showing the Xbox to manage about 6400 quads before going below 60 FPS

As before, here is the unclipped line chart. Interestingly, the DynamicVertexBuffer class overtakes DrawUserPrimitives() on the PC for very small vertex counts.

Unclipped line chart with the Xbox reaching 1400+ FPS on small numbers of vertices

Conclusions

If you’re recreating the entire vertex and index buffer from scratch, it appears to be largely irrelevant whether you choose DrawUserPrimitives() or a DynamicVertexBuffer, on either platform.

The results without batching are again so abysmal that I don’t think it makes any sense to even think about forgoing batching (and with classes like XNA’s SpriteBatch and the Nuclex Framework’s PrimitiveBatch, there’s really no reason to not batch).

Also somewhat interesting is that a GeForce GTX 460 is about 8 times as fast as the Xbox 360’s GPU for the operations done by this benchmark (which is GPU-bound from ~2000 vertices onwards).

You can download the source code of my benchmark suite here:
PrimitiveBatchBenchmark.7z (177 KiB)