DynamicVertexBuffer versus DrawUserPrimitives, Round 2

More than a year ago, I did some benchmarking in XNA 3.1, comparing the vertex throughput I could achieve on my GeForce 8800 via XNA’s DynamicVertexBuffer class versus just calling GraphicsDevice.DrawUserPrimitives(). Here’s my earlier benchmark: Efficiently Rendering Dynamic Vertices.

In all cases, DrawUserPrimitives() was marginally faster than the DynamicVertexBuffer, but it appeared to be a very bad idea to use a DynamicVertexBuffer on the Xbox 360. I had a really nice discussion with Shawn Hargreaves on the XNA forums where he provided a lot of in-depth information about how things work on the Xbox 360.

One of today’s threads on the AppHub forums reminded me of my earlier benchmarks, so I decided to dig out my old benchmark and redo it in XNA 4.0. The benchmark uses my Nuclex Framework’s PrimitiveBatch class, which has undergone some changes since then, so I repeated the XNA 3.1 benchmarks in addition to getting the new data for XNA 4.0.


I’m running these benchmarks in XNA’s default resolution, as a release build with no attached debugger. My Xbox 360 is the “Elite” variant with the 65 nm “Falcon” chipset, I believe. My PC is an old Athlon 64 X2 6000+ with a GeForce GTX 460 running Windows 7 x64.

The benchmark draws vertices in batches of 8192. The UserPrimitiveBatchDrawer simply calls DrawIndexedUserPrimitives() and the DynamicBufferBatchDrawer creates a vertex buffer fitting 32768 vertices, filling it 4 times with SetDataOptions.NoOverwrite, then doing one SetDataOptions.None lock (ideally this would be a SetDataOptions.Discard lock).
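
The write pattern described above can be sketched as a small simulation. This is plain Python, not XNA code: the class name RingAllocator is made up for illustration, the lock options appear only as string labels, and the wrap-around uses Discard (the ideal case mentioned above) rather than the SetDataOptions.None lock the benchmark actually performs.

```python
BUFFER_CAPACITY = 32768  # vertices the dynamic buffer holds (from the text)
BATCH_SIZE = 8192        # vertices written per batch (from the text)


class RingAllocator:
    """Hypothetical helper: picks the write position and lock option per batch."""

    def __init__(self, capacity=BUFFER_CAPACITY):
        self.capacity = capacity
        self.cursor = 0  # index of the first unused vertex in the buffer

    def allocate(self, vertex_count):
        """Return (start_vertex, option_label) for the next batch."""
        if vertex_count > self.capacity:
            raise ValueError("batch does not fit into the buffer")
        if self.cursor + vertex_count > self.capacity:
            # No room left: wrap to the start. The GPU may still be reading
            # the old contents, so the whole buffer is discarded
            # (assumption: Discard instead of the benchmark's None lock).
            self.cursor = 0
            option = "Discard"
        else:
            # Writing into a region the GPU has not been asked to read yet.
            option = "NoOverwrite"
        start = self.cursor
        self.cursor += vertex_count
        return start, option


allocator = RingAllocator()
schedule = [allocator.allocate(BATCH_SIZE) for _ in range(6)]
for start, option in schedule:
    print(start, option)
```

Four batches fill the buffer with NoOverwrite writes at offsets 0, 8192, 16384 and 24576; the fifth wraps back to offset 0 and discards the old contents, after which the cycle repeats.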

All vertices are constructed from scratch each frame by constructing each quad in a local array, then letting the PrimitiveBatch copy said array into its batching buffer, which gets rendered when 8192 vertices have accumulated in it.
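
As a rough illustration of that accumulate-and-flush behavior, here is a minimal Python sketch. It is not the Nuclex PrimitiveBatch implementation: QuadBatcher, make_quad and the draw callback are hypothetical names, and the callback merely stands in for the actual draw call.

```python
class QuadBatcher:
    """Hypothetical batcher: collects quads and flushes when the batch is full."""

    def __init__(self, draw, batch_size=8192):
        self.draw = draw              # callback standing in for the draw call
        self.batch_size = batch_size  # vertices per batch (8192 in the text)
        self.vertices = []            # the batching buffer

    def add_quad(self, quad):
        if len(quad) != 4:
            raise ValueError("a quad needs exactly 4 vertices")
        # Flush first if this quad would not fit into the current batch.
        if len(self.vertices) + len(quad) > self.batch_size:
            self.flush()
        self.vertices.extend(quad)  # copy the local array into the batch
        # Flush as soon as the batch is exactly full.
        if len(self.vertices) >= self.batch_size:
            self.flush()

    def flush(self):
        """Hand the accumulated vertices to the draw callback, if any."""
        if self.vertices:
            self.draw(list(self.vertices))
            self.vertices.clear()


def make_quad(x, y, size=1.0):
    """Build the four corners of a quad in a fresh local list."""
    return [(x, y), (x + size, y), (x + size, y + size), (x, y + size)]


draw_sizes = []
batcher = QuadBatcher(lambda vertices: draw_sizes.append(len(vertices)))
for i in range(5000):          # 5000 quads = 20000 vertices
    batcher.add_quad(make_quad(float(i), 0.0))
batcher.flush()                # end of frame: draw whatever is left
print(draw_sizes)              # [8192, 8192, 3616]
```

With 5000 quads per frame, this sketch issues three draw calls instead of 5000.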

XNA 3.1 Benchmarks

With the latest and greatest release of my PrimitiveBatch, the performance of DrawUserPrimitives() and DynamicVertexBuffers is virtually identical:

Line chart showing the Xbox to manage about 6400 quads before going below 60 FPS

Last time, I clipped the FPS values at 550 so the interesting region around 60 FPS wasn’t compressed into just a few pixels. I’ll provide the original, unclipped chart this time as well:

Unclipped line chart with the Xbox reaching 1300+ FPS on small numbers of vertices

XNA 4.0 Benchmarks

This is the same benchmark application running in XNA 4.0. The XNA 4.0 build of my PrimitiveBatch eliminates the code paths for TriangleFans and PointLists and doesn’t need to pass the VertexDeclaration around:

Line chart showing the Xbox to manage about 6400 quads before going below 60 FPS

As before, here is the unclipped line chart. Interestingly, the DynamicVertexBuffer class overtakes DrawUserPrimitives() on the PC for very small vertex numbers.

Unclipped line chart with the Xbox reaching 1400+ FPS on small numbers of vertices


If you’re recreating the entire vertex and index buffer from scratch, it appears to be largely irrelevant whether you choose DrawUserPrimitives() or a DynamicVertexBuffer on either platform.

The results without batching are again so abysmal that I don’t think it makes any sense to even think about forgoing batching (and with classes like XNA’s SpriteBatch and the Nuclex Framework’s PrimitiveBatch, there’s really no reason to not batch).

Also somewhat interesting is that a GeForce GTX 460 is about 8 times as fast as the Xbox 360’s GPU for the operations done by this benchmark (which is GPU-bound from ~2000 vertices onwards).

You can download the source code of my benchmark suite here:
PrimitiveBatchBenchmark.7z (177 KiB)


  1. Hassan Selim says:

    These are some interesting benchmarks you have there :)
    I have a question, if I want to render some polygons that would just move around, wouldn’t it be better if you use a normal VertexBuffer and just apply transformations using BasicEffect?
    The only problem with using normal VertexBuffers in that case is that it would be hard to batch their rendering, so I think the question is whether unbatched rendering with a normal VertexBuffer is faster than batched rendering with a DynamicVertexBuffer (or DrawUserPrimitives).

  2. Cygon says:

    That depends greatly on the number of independent triangles you are moving around.

    The main issue is always the number of DrawPrimitive() calls, since that method is really expensive:

    • DrawPrimitive() will transition from user mode (in which unprivileged code like normal applications runs) into kernel mode (in which drivers and the Windows kernel run). This transition is slow.
    • Switching textures, vertex buffers and render states happens in DrawPrimitive(), which will check if any of these settings changed and apply them.

    As a rule of thumb, stay well below 200 DrawPrimitive() calls per frame for a game that should run on Xbox 360 or PS3 class hardware.

    A few examples:

    • You have 100 models, each consisting of 1000 triangles -> keep each model in a static vertex buffer and use DrawPrimitive() 100 times.
    • You have 10000 particles, each consisting of 2 triangles -> use a dynamic vertex buffer to batch them and do between 2 and 4 DrawPrimitive() calls to keep the GPU busy while you’re writing to the vertex buffer with D3DLOCK_NOOVERWRITE.
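
To put numbers on those two examples, here is a small Python sketch of the draw-call arithmetic. The draw_calls helper is hypothetical, and two figures are assumptions that do not come from the thread: 4 vertices per particle (an indexed quad) and a batch size of 16384 vertices.

```python
import math

DRAW_CALL_BUDGET = 200  # rule of thumb from the reply above


def draw_calls(object_count, vertices_per_object, batch_size=None):
    """Draw calls needed per frame; batch_size=None means one call per object."""
    if batch_size is None:
        return object_count
    objects_per_batch = batch_size // vertices_per_object
    if objects_per_batch == 0:
        raise ValueError("a single object does not fit into one batch")
    return math.ceil(object_count / objects_per_batch)


# 100 static models, one call each (3000 assumes an unindexed triangle list;
# the vertex count does not affect the result when nothing is batched).
models = draw_calls(100, 3000)

# 10000 particles, assumed to be indexed quads with 4 vertices each.
particles_unbatched = draw_calls(10000, 4)
particles_batched = draw_calls(10000, 4, batch_size=16384)  # assumed batch size

print(models, models <= DRAW_CALL_BUDGET)                            # 100 True
print(particles_unbatched, particles_unbatched <= DRAW_CALL_BUDGET)  # 10000 False
print(particles_batched, particles_batched <= DRAW_CALL_BUDGET)      # 3 True
```

Under these assumptions, the static models and the batched particles stay within the 200-call budget, while drawing each particle separately exceeds it by a factor of 50.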