Bindless Textures
This is part of a tutorial series teaching advanced modern OpenGL. To use all features to their fullest you will need to target OpenGL 4.6. These articles assume you have familiarity with OpenGL.
In this tutorial we will be looking at a method for eliminating the need to bind a texture to a uniform texture unit. If you haven’t already make sure to check out the Shader Storage Buffer Objects (SSBOs) and Programmable Vertex Pulling tutorials since we will use both of these here.
What Are Bindless Textures?
OpenGL 4.5 only requires a minimum of 16 texture units to be available to a shader for use, though many drivers support 32 or more. This is a very difficult limit to deal with if we are trying to do something like merge many different draw calls with different material textures, or access many shadow maps in a deferred lighting stage. There were two main ways to get around this in the past:
- Use array textures to combine multiple textures of the same dimensions
- Use texture atlases to allow one large texture to represent many smaller textures
Bindless textures are a 3rd alternative to this. Instead of explicitly binding a texture to a texture unit, bindless textures allow us to manually mark the texture as resident on the GPU and then pass its handle to a shader using a uniform buffer or shader storage buffer.
There is also a possible performance gain with this approach. If your renderer is currently spending a lot of time binding and unbinding textures between different draw calls, a bindless approach can reduce a lot of driver overhead. This is because your application can now figure out which textures are needed at the beginning of the frame, mark them all as resident at the sae time and then use those textures for any number of draw calls for the remainder of the frame.
Downsides Of Bindless Textures
There are two main downsides of bindless textures:
1) Renderdoc does not support them at the time of writing so you will need to use a graphics debugger that does such as Nvidia NSight
2) Bindless textures never made it into the core of OpenGL so it is still an extension even with 4.6. Nvidia and AMD both have a lot of hardware that is fully supportive of it, but outside of those two you will need to double check if the vendor supports it.
A Look At The API
For this tutorial we will be focused on 3 main API functions which we will go through now.
Retrieving Texture Handles
The purpose of this function is to take a valid texture and return a 64-bit handle that represents that texture. This handle can be inserted directly into uniform buffers or shader storage buffers.
If you were to call this function multiple times with the same texture, it will always return the same handle. So what you can do is call this once after the texture is created and then use the handle for the entire lifetime of the texture without having to call this function again.
Important! Khronos states the following on their wiki: “Once a handle is created for a texture/sampler, none of its state can be changed. For Buffer Textures, this includes the buffer object that is currently attached to it (which also means that you cannot create a handle for a buffer texture that does not have a buffer attached). Not only that, in such cases the buffer object itself becomes immutable; it cannot be reallocated with glBufferData. Though just as with textures, its storage can still be mapped and have its data modified by other functions as normal.”
So the general guideline is to perform all initial setup that you will need for the texture or buffer texture, then once you are done call this. From then on you will only be able to change the texture data but no other state related to the texture.
Making Texture Handles Resident
Since we won’t be binding to uniform texture units anymore, we need to be able to tell the driver which handles we plan to use before we use them. Otherwise it has no way of knowing.
Once you make a texture handle resident, it remains resident until you explicitly tell the driver to make it non-resident. This means that if you have a certain group of textures that will be around for the entire lifetime of your program, you could make them resident at the start and then leave them resident until the program is ending.
Making Texture Handles Non-Resident
This allows you to tell the driver that the handle is no longer being used by any shaders and can be taken off the residency list.
Example: 1600 textures in a single shader
To show how to use this we are going to randomly generate 1600 textures (one for each of 1600 cube instances) and make them all resident for the shader to use. We will only be using a single instanced draw call to draw 1600 cubes.
First we will generate the data for the cube and insert its vertices into an SSBO.
Next we will create 1600 different transforms, one for each cube instance. These will also be inserted into a SSBO.
Now we will randomly generate 1600 textures.
Setting Up Vectors
We will have a vector for the textures and for the returned handles.
Generating Textures And Retrieving Handles
There needs to be some sort of check to make sure the handle returned actually makes sense. If something bad happened it will return 0.
Next we need to pack all the texture handles into another SSBO.
Packing Handles Into SSBO
Once we have all this we are ready to enter the rendering loop!
Making Handles Resident And Drawing
A Look At The Shaders
The vertex shader is almost identical to the one found in the Programmable Vertex Pulling tutorials except that now it writes gl_InstanceID as an output so that the fragment shader can use it. gl_InstanceID in this case will contain values from 0 to 1599 and the fragment shader will use this to determine which texture it should pull data from.
Another really important thing is to remember to use #extension GL_ARB_bindless_texture : require
since the bindless textures never made it into core OpenGL. We will have to put this in both the vertex and the fragment shader.
shader.vs
In the fragment shader below you’ll notice that the readonly SSBO has type “sampler2D” inside of it. Depending on the texture type you are using this should work with sampler2D, sampler3D, samplerCube, etc. All of them only require that you pass in the 64-bit handles as the SSBO data.
fragment.vs
Running this program produces the following result on a GTX 1060:
Note About Dynamically Uniform Expressions
For information on what dynamically uniform expressions are, see https://www.khronos.org/opengl/wiki/Core_Language_(GLSL)#Dynamically_uniform_expression.
In the above shader we are pulling the texture using the value inside of flat in int fsInstance;
. Since we are drawing multiple instances with the same draw command, this will not be a dynamically uniform expression. Certain hardware supports this, for example the GTX 1060 I used supports it with no other extensions needed except bindless.
But on other hardware this could cause major issues since the code path leading to the texture access needs to be dynamically uniform. There are two main workarounds:
1) Switch to Multi-Draw Indirect (MDI) and make use of gl_DrawID
which is guaranteed to be dynaically uniform. With this setup instead of having multiple instances of the cube, you would record 1600 draw commands so that gl_DrawID
is set to a value of [0, 1600) depending on which command is being executed.
2) Add other extensions on top of bindless. For Nvidia (if required) you can add #extension GL_NV_gpu_shader5 : require
. On AMD (if required) you can add #extension GL_EXT_nonuniform_qualifier : require
.
(Special thanks to Jake Ryan over at https://juandiegomontoya.github.io/modern_opengl.html for pointing this out)
uvec2 To sampler
For an example of this, see this shader: https://github.com/JuanDiegoMontoya/GLest-Rendererer/blob/main/glRenderer/Resources/Shaders/gBufferBindless.fs
After making a 64-bit texture handle resident, it is possible to pass in that texture as an array of uvec2 instead of an explicit sampler2D. As a short example modified from the link above:
(Special thanks to Jake Ryan over at https://juandiegomontoya.github.io/modern_opengl.html for suggesting this example)
Conclusion
Bindless textures are a very powerful tool to increase the number of textures that a shader can access. It supplements the old methods of using texture arrays or texture atlases, and for programs that spend large amounts of time binding/unbinding textures between draw calls it can offer a lot of performance improvements.
The main downsides relate to a) not being core in GL 4.6, b) not all graphics debuggers supporting it, and c) outside of Nvidia and AMD which have strong support for bindless, not all vendors (especially mobile) will support them.
Learn OpenGL Fundamentals
References
- https://www.khronos.org/opengl/wiki/Bindless_Texture
- OpenGL SuperBible
- https://www.khronos.org/opengl/wiki/Core_Language_(GLSL)#Dynamically_uniform_expression
- https://stackoverflow.com/questions/40875564/opengl-bindless-textures-bind-to-uniform-sampler2d-array
- https://community.khronos.org/t/how-to-implement-bindless-textures-efficiently/76109/2
- https://juandiegomontoya.github.io/modern_opengl.html
- https://registry.khronos.org/OpenGL/extensions/NV/NV_gpu_shader5.txt
- https://github.com/KhronosGroup/GLSL/blob/master/extensions/ext/GL_EXT_nonuniform_qualifier.txt