DynamicVertexBuffer versus DrawUserPrimitives, Round 2
More than a year ago, I did some benchmarking in XNA 3.1, comparing the vertex throughput I could achieve on my GeForce 8800 via XNA’s DynamicVertexBuffer class versus just calling GraphicsDevice.DrawUserPrimitives(). Here’s my earlier benchmark: Efficiently Rendering Dynamic Vertices.
In all cases, DrawUserPrimitives() was marginally faster than DynamicVertexBuffer, but it appeared to be a very bad idea to use DynamicVertexBuffer on the Xbox 360. I had a really nice discussion with Shawn Hargreaves on the XNA forums where he provided a lot of in-depth information about how things work on the Xbox 360.
One of today’s threads on the AppHub forums reminded me of my earlier benchmarks, so I decided to dig out my old benchmark and redo it in XNA 4.0. The benchmark uses my Nuclex Framework’s PrimitiveBatch, which has undergone some changes since then, so I repeated the XNA 3.1 benchmarks in addition to getting the new data for XNA 4.0.
I’m running these benchmarks in XNA’s default resolution, as a release build with no attached debugger. My Xbox 360 is the “Elite” variant with the 65 nm “Falcon” chipset, I believe. My PC is an old Athlon64 x2 6000+ with a GeForce GTX 460 running Windows 7 x64.
The benchmark draws vertices in batches of 8192. The UserPrimitiveBatchDrawer simply calls DrawUserIndexedPrimitives() and the DynamicBufferBatchDrawer creates a vertex buffer fitting 32768 vertices, filling it 4 times with a SetDataOptions.None lock (ideally this would be
All vertices are constructed from scratch each frame by building each quad in a local array, then letting the PrimitiveBatch copy said array into its batching buffer, which gets rendered when 8192 vertices have accumulated in it.
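The accumulate-and-flush scheme described above can be sketched in plain C# without any XNA dependency. This is only an illustration under stated assumptions, not the actual PrimitiveBatch code: SketchVertex, SketchBatcher and SketchBenchmark are hypothetical names, and the drawBatch callback stands in for whatever the real batch drawer does (a DrawUserIndexedPrimitives() call or an upload into the DynamicVertexBuffer).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an XNA vertex type; not part of XNA or the Nuclex Framework.
public struct SketchVertex {
  public float X, Y;
  public SketchVertex(float x, float y) { X = x; Y = y; }
}

// Hypothetical batcher that mimics the described behavior: vertices are copied
// into a fixed-size batching buffer that is handed to a callback when full.
public sealed class SketchBatcher {
  private readonly SketchVertex[] buffer;
  private readonly Action<SketchVertex[], int> drawBatch;
  private int used;

  public SketchBatcher(int batchSize, Action<SketchVertex[], int> drawBatch) {
    if (batchSize < 1) throw new ArgumentOutOfRangeException("batchSize");
    if (drawBatch == null) throw new ArgumentNullException("drawBatch");
    this.buffer = new SketchVertex[batchSize];
    this.drawBatch = drawBatch;
  }

  // Copies the caller's local array into the batching buffer,
  // flushing every time the buffer has filled up completely.
  public void Draw(SketchVertex[] vertices) {
    int offset = 0;
    while (offset < vertices.Length) {
      int count = Math.Min(vertices.Length - offset, buffer.Length - used);
      Array.Copy(vertices, offset, buffer, used, count);
      used += count;
      offset += count;
      if (used == buffer.Length) Flush();
    }
  }

  // Hands whatever has accumulated so far to the draw callback.
  public void Flush() {
    if (used > 0) {
      drawBatch(buffer, used);
      used = 0;
    }
  }
}

public static class SketchBenchmark {
  // Builds quadCount quads from scratch in a local array, as the benchmark does
  // each frame, and returns the size of every batch that reached the callback.
  public static List<int> Run(int quadCount) {
    var batchSizes = new List<int>();
    var batcher = new SketchBatcher(8192, (vertices, count) => batchSizes.Add(count));
    var quad = new SketchVertex[4];
    for (int i = 0; i < quadCount; ++i) {
      quad[0] = new SketchVertex(i, 0);
      quad[1] = new SketchVertex(i + 1, 0);
      quad[2] = new SketchVertex(i + 1, 1);
      quad[3] = new SketchVertex(i, 1);
      batcher.Draw(quad);
    }
    batcher.Flush(); // end of frame: draw the partially filled batch
    return batchSizes;
  }
}
```

With 3000 quads (12000 vertices), the callback would be invoked once with a full batch of 8192 vertices and once more at the end of the frame with the remaining 3808. In a real drawer, the callback is where the choice between the two rendering paths compared in this post would be made.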
XNA 3.1 Benchmarks
With the latest and greatest release of my PrimitiveBatch, the performance of DrawUserPrimitives() and DynamicVertexBuffer is virtually identical:
Last time, I clipped the FPS values at 550 so the interesting region around 60 FPS wasn’t compressed into just a few pixels. I’ll provide the original, unclipped chart this time as well:
XNA 4.0 Benchmarks
This is the same benchmark application running in XNA 4.0. The XNA 4.0 build of my PrimitiveBatch eliminates the code paths for TriangleFans and PointLists and doesn’t need to pass the VertexDeclaration around:
As before, here is the unclipped line chart. Interestingly, the DynamicVertexBuffer class overtakes DrawUserPrimitives() on the PC for very small vertex counts.
If you’re recreating the entire vertex and index buffer from scratch, it appears to be largely irrelevant whether you choose DrawUserPrimitives() or a DynamicVertexBuffer on both platforms.
The results without batching are again so abysmal that I don’t think it makes any sense to even think about forgoing batching (and with classes like XNA’s SpriteBatch and the Nuclex Framework’s PrimitiveBatch, there’s really no reason to not batch).
Also somewhat interesting is that a GeForce GTX 460 is about 8 times faster than the Xbox 360’s GPU for the operations done by this benchmark (which is GPU-bound from ~2000 vertices onwards).
You can download the source code of my benchmark suite here:
PrimitiveBatchBenchmark.7z (177 KiB)