Graphics Performance Analysis

This page contains information regarding the tools provided by Vortex® to help find graphics performance issues.

Vortex Performance Tools

Several tools are provided to monitor graphics performance. In practice, users usually notice a potential issue when they experience stuttering, and then start investigating with these same tools. The graphics performance tools are meant to be used with the Vortex Studio Player; the Vortex Studio Editor is not optimized for performance and should not be used to measure or investigate performance issues.

Polygon Mode Switching

When pressing F2, the user can change the polygon modes of what is displayed. The polygon modes are as follows:

  • Normal: No changes to shading
  • Wireframe: Displays the meshes as wireframes. Useful to find meshes with a lot of vertices.
  • Point: Shows the meshes as points only. Useful to find out if there is some pixel shading bottleneck.

[Images: Normal, Wireframe and Point polygon modes]

Statistics

This overlay displays statistics that come from the Profiler. Values are updated every frame and provide both the instant FPS and the average FPS over 60 frames.

Note that the min and max counts are usually the ones the user should be looking at in order to find spikes and instability.

  • Cur: Value for the current frame.
  • Min: Minimum value over 60 frames.
  • Max: Maximum value over 60 frames.
  • Avg: Average of the accumulated values over 60 frames.
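
The following minimal sketch illustrates why the Min and Max columns reveal spikes that the average hides. It assumes a sliding window of the last 60 frames; the class name `RollingStat` and the sample timings are hypothetical and are not part of Vortex, and the overlay may accumulate its values differently.

```python
from collections import deque


class RollingStat:
    """Tracks Cur/Min/Max/Avg of a value over the last `window` frames."""

    def __init__(self, window=60):
        # A bounded deque drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def add(self, value):
        self.samples.append(value)

    def summary(self):
        if not self.samples:
            raise ValueError("no samples recorded yet")
        return {
            "cur": self.samples[-1],
            "min": min(self.samples),
            "max": max(self.samples),
            "avg": sum(self.samples) / len(self.samples),
        }


# Hypothetical data: 59 steady frames at 16 ms followed by one 40 ms spike.
stat = RollingStat(window=60)
for frame_ms in [16.0] * 59 + [40.0]:
    stat.add(frame_ms)

result = stat.summary()
print(result)
```

With this data the average only moves from 16.0 ms to 16.4 ms, while the maximum jumps to 40 ms, which is why the min and max counts are the ones to watch.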

The following lists the different pages which can be cycled through by pressing F3.

Page 1

  • Synchronisation Mode (V Sync, SW Sync, Sync Off, SW+V Sync): Indicates the current sync mode used by the node, shown in the top left corner. It can be toggled with F4.
  • Frame Rate: Rate of frame rendering (frames per second); inverse of cycle time.

Page 2

  • Timings in ms (milliseconds):
    • Cycle Time: Time taken by the application to execute all instructions for one frame and come back to the same point in the code (application update + time spent outside of the update function).
    • Application Update: Time taken by the application to execute all instructions for one frame (UI, Graphics, Network...).
    • SW Sync: Time spent by the application waiting for the time step to reach 16 ms (60 Hz); depends on the current synchronization mode.
    • UI Events: Time taken to compute user interface events in the application.
    • Graphics: Time taken by the graphics module to execute all graphics instructions for one frame.
      • Extensions: Time taken by the graphics extensions to complete a cycle.
      • Snapshots: Time taken to call takeSnapshot() on all snapshotables.
      • PV Snapshots: Time taken to call the per-view takeSnapshot() on all snapshotables.
      • Culling: Time taken to cull unnecessary geometry.
      • Resources: Time taken to prepare the required GPU resources (e.g., textures, meshes).
      • CPU Draw: Time taken to perform the drawcalls.
      • Buffer Swap & VSync: Time taken to execute the buffer swap and VSync; depends on current synchronization mode.
    • Network: Time taken to deserialize the network information.
      • GPU Draw: Time taken to fully complete a set of GL commands to render all views.
  • Usage in percentage; press Ctrl+F3 to activate (may impact performance):
    • CPU Memory: Percentage of system-used CPU memory on the machine running the application.
    • GPU Memory: Percentage of used GPU memory on the graphics card running the application.
    • GPU Processor: Percentage of GPU usage on the graphics card running the application.
    • GPU Memory I/O: Percentage of memory I/O usage on the graphics card running the application.

Page 3

  • Lights: The total amount of lights in the scene (active or not).
  • Lights (Active): The total amount of activated/visible lights in the scene.
  • Lights (Casting Shadows): The total amount of active lights casting shadows.
  • Shadow Map Count: The number of shadow maps. Point, spot and directional lights usually use one map each; for the directional light, the count depends on the settings in the Adaptive Feature Controller extension.
  • Draw Calls: The total amount of draw calls being sent to the GPU. This depends on many things including:
    • Everything rendered (3d, 2d and UI elements)
    • Frustum culling
    • Shadow casters (don't forget frustum culling applies here as well)
    • Picking (in the Editor, activates when user moves the mouse)
    • Object selection (blueish tint when selecting an object in the Editor)
    • Instancing - the instancer module puts meshes together into an instanced draw when possible. This instanced draw also has a maximum size so many draws can trigger multiple instanced draws.
  • GPU Draw Executions: The same value minus the instancing processing. In other words, the instances are not merged together.
  • Vertex Count: The total amount of vertices sent and processed by the GPU.
  • Extensions: The total amount of graphics extensions.
  • Extensions (Updateable): The total amount of graphics extensions which require an update on every iteration. The HeightField is a good example; a geometry, by contrast, is only meant to be data, so it does not require an update.
  • Snapshotables: The total amount of snapshotables. A snapshotable is an object which generates snapshots sent to the graphics engine. It can be anything including a simple graphics node or even a mirror.
  • Preparators: The total amount of view snapshots which generate the draws.
  • Preparators (Enabled): The total amount of active preparators.
  • Views: The total amount of view snapshots (viewport, mirror, picture in picture, ocean reflection, etc).
  • Views (Enabled): The total amount of activated views.
  • Passes: The total amount of passes which is usually composed of every rendering step for all the different effects. For instance, clear, background, depth prepass, opaque pass, etc.
  • Passes (Enabled): The total amount of activated passes.
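
As a rough illustration of how instancing affects the Draw Calls and GPU Draw Executions counters, the sketch below estimates both values from the number of instances of each mesh. The function name, the mesh names and the maximum instanced draw size of 256 are hypothetical; this page does not state the real limit, and the sketch assumes that all instances of a mesh can be merged.

```python
import math


def estimate_draw_calls(instances_per_mesh, max_instances_per_draw):
    """Return (merged, unmerged) draw estimates.

    merged:   instances grouped into instanced draws of limited size.
    unmerged: one draw per instance, i.e. no instancing at all.
    """
    unmerged = sum(instances_per_mesh.values())
    merged = sum(
        math.ceil(count / max_instances_per_draw)
        for count in instances_per_mesh.values()
    )
    return merged, unmerged


# Hypothetical scene content and a hypothetical instanced draw limit.
scene = {"tree": 1000, "rock": 300, "cab": 1}
merged, unmerged = estimate_draw_calls(scene, max_instances_per_draw=256)
print(f"with instancing: {merged}, without instancing: {unmerged}")
```

In this example 1301 individual draws collapse into 7 instanced draws; the 1000 trees still need 4 draws because a single instanced draw has a maximum size.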

Page 4

The graphics module frame status page. This page displays information related to the synchronization of the engine (dynamics) with the graphics. If the dynamics/engine is not running, it will display "Waiting for the engine...". Start the simulation to have it display more information.

  • Skipped: The count is increased every time the engine produces a frame that the graphics does not consume. (Graphics' time step is too high.)
  • Repeated: The count is increased every time the graphics repeats the previous engine frame because it did not receive a new one in time. For instance, when the engine/dynamics is slower than the graphics expects, so that the frame data is not available for use, or when graphics is running above 60 Hz and desynchronizes (i.e., no VSync).
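
The two counters can be pictured with a small simulation. This is only a sketch under simplifying assumptions (integer millisecond timestamps, and graphics always consuming the newest available engine frame); the function name `count_frame_status` is hypothetical and does not reflect how Vortex implements the status page.

```python
def count_frame_status(engine_times_ms, graphics_times_ms):
    """Count skipped and repeated engine frames.

    engine_times_ms:   times at which the engine produces a frame.
    graphics_times_ms: times at which graphics renders a frame.
    """
    produced = sorted(engine_times_ms)
    skipped = repeated = 0
    next_frame = 0
    for tick in sorted(graphics_times_ms):
        new_frames = 0
        # Collect every engine frame produced up to this graphics tick.
        while next_frame < len(produced) and produced[next_frame] <= tick:
            new_frames += 1
            next_frame += 1
        if new_frames == 0:
            repeated += 1  # nothing new: the previous frame is shown again
        else:
            skipped += new_frames - 1  # only the newest frame is consumed
    return skipped, repeated


# Engine at 16 ms steps, graphics at 32 ms steps: frames get skipped.
slow_graphics = count_frame_status(range(0, 97, 16), range(0, 97, 32))
# Engine at 32 ms steps, graphics at 16 ms steps: frames get repeated.
slow_engine = count_frame_status(range(0, 97, 32), range(0, 97, 16))
print(slow_graphics, slow_engine)
```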

It is important to remember that stuttering can be triggered by many things other than the simulation itself (graphics, dynamics, etc.). The OS, or anything else between data generation and the user's eyes, can be a cause. This status screen is only meant to help the user identify the source of the stuttering.

Page 5

A list of the enabled views with their associated passes. For example, each viewport, mirror, reflection generates one. The metrics can be changed by pressing Ctrl+F3. Pressing F3 again switches to the detailed screen of the view.

Sync Modes

The sync mode refers to how the frames are synchronized. The modes are cycled by pressing F4, and the current mode is written on the F3 overlay (top). The sync modes are as follows.

  • SW Sync: Software sync; the application waits until the frame's time step reaches 16 ms.
  • V Sync: Vertical sync; the application waits for the graphics driver to be done with the next frame buffer so that the application can start generating its next frame (to prevent tearing). This usually synchronizes at 60 Hz but if the frame time is a little above 16 ms, it will revert to 30 Hz.
  • SW+V Sync: Both modes activated at the same time.
  • Sync Off: The application does not introduce any artificial or synchronization wait at all. This is the preferred mode to see the impact of extensions or modules on the performance of the application. Note, though, that the graphics driver might still trigger some wait (reported under Buffer Swap & VSync), since synchronization with the commands sent to the GPU is required to protect the Render Targets from being affected by the next frame.
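
The effect of each mode on the frame rate can be approximated with a simplified model. The sketch below assumes a 60 Hz display, a software sync target of one 60 Hz period (rounded to 16 ms on this page), and a vertical sync that waits for the next display refresh; the function names are hypothetical and the real driver behaviour can differ.

```python
import math

REFRESH_MS = 1000.0 / 60.0  # one 60 Hz period, about 16.7 ms


def cycle_time_ms(work_ms, sw_sync=False, v_sync=False):
    """Estimate the cycle time of a frame that needs `work_ms` of work."""
    cycle = work_ms
    if sw_sync:
        # Software sync: wait until the time step reaches the target.
        cycle = max(cycle, REFRESH_MS)
    if v_sync:
        # Vertical sync: the frame is shown on the next display refresh.
        cycle = math.ceil(cycle / REFRESH_MS) * REFRESH_MS
    return cycle


def frame_rate(cycle_ms):
    """Frame rate is the inverse of the cycle time."""
    return 1000.0 / cycle_ms


modes = (
    ("Sync Off", False, False),
    ("SW Sync", True, False),
    ("V Sync", False, True),
    ("SW+V Sync", True, True),
)
for work_ms in (10.0, 17.0):
    for label, sw, vs in modes:
        fps = frame_rate(cycle_time_ms(work_ms, sw_sync=sw, v_sync=vs))
        print(f"{work_ms:4.1f} ms of work, {label:9s}: {fps:6.1f} fps")
```

In this model a frame needing 17 ms of work falls to 30 fps with V Sync, because it misses one refresh and waits for the next, while Sync Off reports the unconstrained rate. This is why Sync Off is preferred when measuring the cost of extensions or modules.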

Note: You can trigger the F keys through the Vortex Studio Player's Console window Network Processes tab. This is useful since access to the different nodes of a simulation can be quite complex (they do not all have a keyboard).

Profiler

The Profiler is available in the Vortex Studio Console window. It can render a graph of the time taken by many parts of the simulation, including extensions and modules (e.g., texture service, frame buffer and VSync).

See The Profiler tab and Performance Analysis for more details.

Content Debugger

The content debugger aids in changing values within the simulation. It is available in the Vortex Studio Player's Console window. For instance, it is possible to change the visibility of objects in the scene or even change some parameters of specific extensions.

See The Content Debugger tab for more details.

Troubleshooting Using the Performance Tools

The following tools were developed to help users have a healthy simulation with respect to performance. The most common issue is the content itself but some tools are needed in order to more specifically find the bottleneck. It is important to remember that if a specific item in a scene is too computationally expensive, it has to be removed; the problem will not resolve itself. However, if the item is a major part of the training, you must balance the content with available resources; removing other items might be the only way to deal with the problem.

Graphics Analysis Expert System

When all the machines that are part of the simulator are set up properly, the user can then look at using the Expert System defined in this section.

Before starting, load your content in the Vortex Studio Player and run the simulation. Also set the sync mode to Sync Off (F4).

Bottleneck Analysis

Starting Step
  1. In the Vortex Studio Player on the graphics node or using the console page, press F3 to display the first page of the overlay.
  2. Enable the GPU card measurements using Ctrl+F3.
  3. If the GPU Processor Load is 90-100%, you are GPU-Bound.
  4. If the GPU Memory Used is 90-100% of the memory available on your video card, you are Video-Memory-Bound.
  5. Otherwise, you are CPU-Bound.
GPU-Bound
  1. Resize your viewport
    • If you are not full-screen, resize the window that contains the 3D viewport.
    • If you are full-screen, open the setup document, select the 3D Display extension(s) and reduce the width and height of its viewport(s) (Alt+Enter can also switch to windowed mode).
  2. If the performance has improved by resizing, you are Pixel-Shader-Bound.
  3. Otherwise, you are Vertex-Shader-Bound.
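
The decision steps above can be summarized as a small function. This is only a sketch of the procedure described on this page: the function and parameter names are hypothetical, the percentages must be read manually from the overlay, and the result of the viewport resize test is supplied by the user.

```python
def classify_bottleneck(gpu_load_pct, gpu_memory_pct,
                        faster_after_resize=None, threshold_pct=90.0):
    """Apply the Starting Step and GPU-Bound checks in order.

    faster_after_resize: None if the resize test was not done yet,
    otherwise True/False depending on whether performance improved.
    """
    if gpu_load_pct >= threshold_pct:
        if faster_after_resize is None:
            return "GPU-Bound"
        if faster_after_resize:
            return "Pixel-Shader-Bound"
        return "Vertex-Shader-Bound"
    if gpu_memory_pct >= threshold_pct:
        return "Video-Memory-Bound"
    return "CPU-Bound"


print(classify_bottleneck(97.0, 40.0))                            # resize test pending
print(classify_bottleneck(97.0, 40.0, faster_after_resize=True))  # smaller viewport helped
print(classify_bottleneck(35.0, 95.0))
print(classify_bottleneck(35.0, 40.0))
```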

Bottleneck Reduction

Video-Memory-Bound
  1. Review your 3D models to reduce the amount of texture memory used.
    • In the Editor, use The Inspector Panel to identify mechanisms and Graphics Galleries that have a large amount of textures.
    • Re-import your textures and compress them into DXT1 and DXT5.
    • Scale down the texture files and re-import them. Remember to keep the texture dimensions as powers of two.
    • Add LODs for the models with large textures, so that further LODs use smaller textures.
    • Remove normal maps and specular maps where they have less impact.
  2. Review your 3D models to reduce the amount of vertex memory used.
    • In the Editor, use the Inspector to identify mechanisms and Graphics Galleries that have a large number of vertices.
    • Add simpler LODs for the 3D models with more vertices.
    • Reduce the number of vertices in your geometries using your preferred tool.
  3. Review your application configuration to reduce the amount of frame buffer memory used.
    • Down scale or remove post-effects, e.g., Noise, Anti-aliasing, Desaturate, Colorize.
    • Reduce the quality of the shadows: in the Adaptive Feature Controller extension, lower Parameters > Shadows > Texture Resolution and/or Texture Layer Count.
  4. Review your imported mesh models, e.g., from CAD.
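
To get a feel for how much the texture advice in step 1 can save, the sketch below estimates the memory footprint of a single texture. It relies on the standard block sizes of the DXT formats (8 bytes per 4x4 block for DXT1, 16 bytes for DXT5) and assumes that an uncompressed texture is stored as 4 bytes per pixel (RGBA8). The function names are hypothetical, and the actual memory used by Vortex may differ.

```python
BYTES_PER_BLOCK = {"DXT1": 8, "DXT5": 16}  # bytes per 4x4 pixel block


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def texture_bytes(width, height, fmt="RGBA8", mipmaps=False):
    """Estimate the size in bytes of one texture, optionally with mipmaps."""
    if not (is_power_of_two(width) and is_power_of_two(height)):
        raise ValueError("texture dimensions should be powers of two")
    total = 0
    w, h = width, height
    while True:
        if fmt == "RGBA8":
            total += w * h * 4  # assumed uncompressed layout
        else:
            blocks = ((w + 3) // 4) * ((h + 3) // 4)
            total += blocks * BYTES_PER_BLOCK[fmt]
        if not mipmaps or (w == 1 and h == 1):
            break
        w, h = max(1, w // 2), max(1, h // 2)  # next mip level
    return total


MIB = 1024 * 1024
for size in (2048, 1024):
    for fmt in ("RGBA8", "DXT5", "DXT1"):
        mib = texture_bytes(size, size, fmt) / MIB
        print(f"{size}x{size} {fmt:5s}: {mib:6.2f} MiB")
```

Under these assumptions, compressing a 2048x2048 texture to DXT1 divides its size by eight, and halving its dimensions divides it by four again.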
CPU-Bound
  1. Enable the Vortex Studio Player Profiler on the graphics node (you might have to edit your setup document to add the page to the node).
  2. Are some specific extension updates costly?
    1. Is the Ocean Graphics cost above its typical value (between 4 ms and 6 ms)?
      1. Reduce the FFT grid size. In the Ocean extension, lower all four parameters: fft-grid-dimension-x, fft-grid-dimension-y, fft-grid-size-x, fft-grid-size-y. Important: These must always be powers of two.
      2. In the Ocean Graphics extension, set the Quality Level to Tessendorf.
      3. As a last resort, you can replace the ocean with a mesh and a translucent-blue material.
    2. Are some of the Particle Spray, Particle Precipitation or Impact Event Emitter extensions costly? See Particle-System-Common-Performance.
  3. Is the time taken by the Graphics Module too long (e.g., more than 16 ms, the frame budget at 60 fps)?
    1. Do you have some mirrors or monitors in your scene? See Monitor-Common-Performance.
    2. Do you have some shadows in your scene? See Shadows-Common-Performance.
    3. Do you have an ocean in the scene? You can set the performance of the ocean reflection via the Adaptive Feature Controller extension's Ocean Reflection > Quality Level field.
  4. In the Editor, use the Inspector to check how many Primitive Sets are in your Graphics Galleries.
    1. The best performing models have one primitive set for each movable node.
    2. Use the Graphics Galleries document to merge nodes that will move together.
  5. In the Player, using the Content Debugger, set some 3D models to non-visible, evaluating the impact of each one, looking for the expensive ones.
Pixel-Shader-Bound
  1. Do you have mirrors or monitors in the scene? See Monitor-Common-Performance.
  2. Do you have shadows in the scene?
    1. Check how many shadow-casting lights you have in the scene, then ask yourself whether you really need all those lights to be casting shadows. You can see this value in one of the statistics tables by cycling with the F3 key.
    2. Reduce the resolution of the shadow texture array. In the Adaptive Feature Controller extension, lower Parameters > Shadows > Texture Resolution and/or Texture Layer Count.
    3. In the Adaptive Feature Controller extension, lower Parameters > Shadows > Filtering Quality.
    4. See Shadows-Common-Performance.
  3. Do you have an ocean in the scene?
    • You can set the performance of the ocean reflection via the Adaptive Feature Controller extension's Ocean Reflection > Quality Level field.
  4. Do you have particle emitters in the scene? These are the Particle Spray, Impact Event Emitter and Particle Precipitation extensions.
    1. Reduce the maximum number of particles. In the extension, lower the Particle Limit parameter.
    2. Reduce the number of active particles. In the extension, lower the Particle Lifetime, Emission Rate and/or Emission Enabled inputs. Counter-balance this by increasing the opacity of the particles.
    3. Reduce the dimension of the particles. In the extension, lower the Particle Size Range inputs.
    4. Remove normal maps from the material used in the Particle Spray extension.
    5. Remove specular maps from the material used in the Particle Spray extension.
  5. Are you using anti-aliasing in your scene?
    • You can reduce the quality of the anti-aliasing in the Graphics Module (in the setup document). You can also use F5 to select the different anti-aliasing modes.
  6. Are you using anisotropy?
    • You can lower the value of the Anisotropy parameter in the Graphics Module (in the setup document).
  7. Reduce the complexity of the lighting in the scene.
    1. Check how many active lights you have in the scene, then ask yourself whether you really need all those lights. You can see this value in one of the statistics tables by cycling with the F3 key.
    2. Use one projected-texture light to replace multiple spot lights or multiple finite directional lights.
    3. Replace weak lights with light halos.
    4. Reduce the number of lights that cast shadows.
    5. Add scripts to hide small lights when far.
  8. Reduce the complexity of the materials in the scene.
    1. Remove less noticeable normal maps.
    2. Replace specular textures by specular colors.
    3. Replace gloss textures by gloss factors.
    4. Collapse multi-textures, using their masks, into one texture.
  9. Reduce overdraw.
    1. Check that objects have their back-faces culled properly.
    2. Add scripts to hide objects that are known to be behind the viewport camera.

Common Performance Issues

Monitor-Common-Performance
  1. Make sure you have set a valid Graphics Node.
    1. This node determines the 3D volume of the mirror, so we use it to cull away the mirror when this 3D volume is not in the view-frustum.
    2. When the Graphics Node is not set, the mirror's image is updated even if not visible.
  2. Uncheck the Enable Shadow in Mirror parameter if the shadows are insignificant in the mirror's image.
  3. Consider moving the mirrors or monitors to other nodes in your network.
  4. Consider assigning the monitors or mirrors to only the roles where they will be used.
  5. You can also set the performance of the mirror via the Adaptive Feature Controller extension's Monitor > Quality Level field.
Shadows-Common-Performance
  1. Reduce the maximum distance of the shadow map.
  2. Reduce the number of objects that receive shadows.
  3. Reduce the number of splits for the directional light shadows. In the Adaptive Feature Controller extension, lower Parameters > Shadows > Number of Splits.
  4. Reduce the number of shadow maps updated per frame. In the Adaptive Feature Controller extension, lower Parameters > Shadows > Update Rate.
  5. Reduce the number of lights that cast shadows.
Particle-System-Common-Performance
  1. Reduce the maximum number of particles. In the extension, lower the Particle Limit parameter.
  2. Reduce the number of active particles. In the extension, lower the Particle Lifetime, Emission Rate and/or Emission Enabled inputs. Counter-balance by increasing the opacity of the particle.

Examples

Let's look at a few examples of different bottlenecks.

CPU-Bound

Forwarder Demo Scene

The first thing we should notice is that the GPU is not overloaded. This often points toward the CPU being the bottleneck. In this case, the CPU draw and culling seem to be the highest.

Draw Call Count

Again, the GPU seems all right. A high value in the "CPU Draw" section usually means there is too much for the engine to compute. Looking at the number of draw calls on the next statistics page, though, something is clearly wrong.
As a guideline, a scene should have a maximum of 600 draw calls (this is subject to change, since machines keep getting more powerful).

GPU-Bound

Vertex Shader Bound

If you ever become vertex bound, something is definitely wrong. As seen here, most recent graphics cards easily support rendering millions of triangles (tested here with an NVIDIA GeForce 960), though that always depends on the vertex shader associated with the current object.
In this example, it took about 12 million triangles before becoming bound by the vertex shader.



Fragment/Pixel Shader Bound
There are currently two main ways to become pixel shader bound:
  • Use a particle spray with way too many particles.
  • Add many lights (we are using a forward renderer).

Here we can see the amount of draw calls, vertices, etc. are all quite low.




Texture Service Bound

Some of our extensions require updates on dynamic textures, such as the HeightField extension used for rendering of the Earthwork Zone and the Soil Bin. To do so, the texture service receives the new data and sends it to the GPU. Those updates can be quite costly, and the only way to work around this issue is to lower the amount of data which is being sent to the GPU.
If the issue occurs with a HeightField in an Earthwork Zone or Soil Bin, the time consumption can be reduced by reducing the number of height field cells that are rendered. This can be achieved either by reducing the size of the HeightField (on the x-y plane) or by increasing the cell size of the HeightField.
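
The two options can be compared with a quick estimate of the number of rendered cells. This is a minimal sketch that assumes a rectangular height field with square cells; the function name, the parameter names and the dimensions used are hypothetical.

```python
import math


def rendered_cell_count(size_x, size_y, cell_size):
    """Number of cells in a rectangular height field with square cells."""
    cells_x = math.ceil(size_x / cell_size)
    cells_y = math.ceil(size_y / cell_size)
    return cells_x * cells_y


base = rendered_cell_count(100.0, 100.0, 0.25)
smaller_area = rendered_cell_count(50.0, 50.0, 0.25)
larger_cells = rendered_cell_count(100.0, 100.0, 0.5)
print(base, smaller_area, larger_cells)
```

Halving both dimensions of the height field, or doubling the cell size, each divides the number of cells (and roughly the amount of data to upload) by four.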


GPU Processor Bound

When the GPU Processor is fully used, the time it takes to upload new data to the GPU might increase (shown as "Resources"), as the CPU will have to wait for the GPU to finish processing before new data can be uploaded.
One example is the use of the screen space mesh (SSM) effect in the Soil Particles' Graphics extension. When the Kernel Precision property is set to a higher value, the processing on the GPU increases during the calculation of the particle surface. When the GPU reaches close to 100% processor usage (use Ctrl+F3 to activate the GPU usage stats), the "Resources" time can increase significantly. This happens because the CPU stalls when trying to upload the resources to the GPU, as it is waiting for the GPU to finish its processing for the last simulation step. In this case, try reducing the GPU load to make time for the resource upload. In this example, you would reduce the Kernel Precision value to achieve this.

[Images: left, Kernel Precision high and GPU processor fully used; right, Kernel Precision low and GPU processor no longer fully used]



Instability

An unstable simulation has many potential sources. Any module that generates a time step over the target will trigger instabilities.
The stuttering status page definitely helps to pinpoint the culprit.
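
If per-module timings are written down from the overlay or the Profiler, a short script can point out which modules exceed the target. The sketch below assumes a 60 Hz target; the function name and the sample timings are hypothetical and only illustrate the idea.

```python
TARGET_MS = 1000.0 / 60.0  # time step target at 60 Hz, about 16.7 ms


def frames_over_target(timings_ms, target_ms=TARGET_MS):
    """Return, per module, how many recorded frames exceeded the target."""
    over = {}
    for module, samples in timings_ms.items():
        count = sum(1 for t in samples if t > target_ms)
        if count:
            over[module] = count
    return over


# Hypothetical timings in ms, noted manually for four frames.
sample_timings = {
    "Graphics": [9.1, 9.4, 22.8, 9.0],
    "Dynamics": [4.0, 4.2, 4.1, 4.3],
    "Network": [1.0, 18.5, 1.1, 19.2],
}
culprits = frames_over_target(sample_timings)
print(culprits)
```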


Geometry Shader Bound

There are not many ways to be Geometry Shader bound, simply because this is quite a specialized shader. In Vortex, the geometry shader is only used for cables (spline rendering) and particles. In the current implementation, no way has been found to trigger this bottleneck without changing the geometry shader code directly.