Stronghold - Rendering
I knew from the beginning that working on the graphics was going to be the hardest (or at least the most time consuming) part of the process for me. I already had plenty of experience programming gameplay features, and my last project gave me some exposure to low-level engine programming. But not 3D graphics. I had made some barebones example projects in both Vulkan and Direct3D11, but those were really just tutorial copypastas. I had done some Direct2D stuff with a fun little personal project, but that was a lot simpler than even really mundane 3D.
But there was a benefit to my lack of knowledge. It meant I could start from the beginning and consciously focus on learning a modern approach. Fortune favored me, as SDL3 GPU had just recently seen its official release. I already liked SDL2 and had used it for some of my early freelance projects, so it felt natural getting back into it. And in general, I found SDL3 to be pretty much a straight upgrade in every possible way. Yay.
One drawback I was initially worried about, given how new SDL3 GPU was, was the lack of a tutorial to follow. I had a stark memory of following a very dense Vulkan tutorial line by line, and I feared SDL3 GPU would go the same route, or even worse... but instead of a tutorial I chanced upon a big example repository. I found this was actually much, much better than a tutorial. SDL has really good documentation (in the source code and also available online), and combined with the examples it essentially acts as a self-guided tutorial and ultimate reference. Because tutorials are often aimed at newbies, they usually do things in a very inefficient way, or don't implement things as part of a game loop, or drag in a dense set of dependencies totally irrelevant to my actual project. And of course, it's a lot easier to write a ton of examples without an attached tutorial than it is to write a high quality tutorial for every example. So I think it turned out perfectly for me.
1.) Setting up Requirements
Unlike the simpler 2D work I did before, setting up a 3D renderer in SDL takes a small mountain of code before anything can even happen. You need to make transfer buffers for everything you want to send to the GPU, describe your vertex and index buffers for the pipeline, set up color targets and depth targets, tell the rasterizer and depth stencil what to do, and already have some shaders ready to work with the data you're giving the GPU. For a game you'll also need a projection matrix for use with your camera, and functions to transform information about your shapes relative to your viewpoint. You need more functions to actually handle uploading all that data to the GPU, then you need the function to draw it all... and then you hope everything makes it to the shaders as expected and that you see something on the screen.
Given all this, I knew I needed to make my graphics setup as simple as possible while still meeting my goals. From step one I decided to be highly restrictive in my requirements for all assets used by the engine, keeping the surface area for errors I'd have to investigate and fix as small as possible. So I decided ahead of time I would allow only one format for models and one format for textures:
- Meshes would only be glTF/GLB format from Blender. I could set up custom export rules in Blender to ensure files always came out in a precise way for my engine to ingest. Given its widespread use and its status as a Khronos standard, it is well documented, has a good spec, and many simple parsers exist for it.
- Textures would be loaded as raw TGA only. TGA headers are very small (see the sketch after this list), and I'd already written a fully featured TGA parser in the past, so this was an obvious decision for me. Blender supports TGA export for texture painting, and most image processors can convert all common formats to TGA, so if for some reason I needed to offload that work I could without issue.
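Reading a TGA header really is that small a job. Here is a minimal sketch of the interesting part (illustrative only; the field offsets come from the TGA spec, but the function names are mine, not the engine's actual parser):
read_u16_le :: (bytes: [] u8, offset: int) -> u16
{
    return bytes[offset].(u16) | (bytes[offset + 1].(u16) << 8);
}

parse_tga_header :: (bytes: [] u8) -> width: u16, height: u16, bpp: u8, ok: bool
{
    if bytes.count < 18  return 0, 0, 0, false; // the header is always 18 bytes
    image_type := bytes[2];                     // 2 = uncompressed true-color
    if image_type != 2   return 0, 0, 0, false; // reject anything exotic
    width  := read_u16_le(bytes, 12);
    height := read_u16_le(bytes, 14);
    bpp    := bytes[16];                        // usually 24 or 32
    return width, height, bpp, true;            // pixel data follows the header (+ optional image ID)
}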
Additionally, I would focus on rendering techniques common in early 3D titles (Ocarina of Time, Warcraft 3). Like the requirements above, this made the graphics programming easier to handle, but since this is a largely solo project it also made it easier for me to do the modeling and texturing myself.
2.) Setting up for the Actual Rendering
The benefit of working directly with examples is having an easy way to see how different approaches actually work. It's unlikely a single example will cover exactly what you want to do, but it's almost certain (at least this early on) that some combination of examples will. And if your solution isn't working for some reason, and theirs does, you can rest assured that you are the problem! Keeping that in mind was very important early on; there are many concepts in graphics programming that are usually taken as a given. I ran into this most clearly trying to set up shaders that were just a bit different from the usual examples.
The initial setup went smoothly. Although there is a lot of code, being this new you are basically just dutifully copying whatever you're looking at. SDL also makes setting up the window a breeze (Forgette is the name of the engine, hence the prefixes below).
gfx_init :: () -> bool
{
    SDL_SetHintWithPriority(SDL_HINT_RENDER_GPU_DEBUG, to_c_string("1",, temp), .SDL_HINT_OVERRIDE);

    // === Initial creation of window and device ===
    print("Creating window...\n");
    forgette_window = SDL_CreateWindow(
        to_c_string("Stronghold",, temp),
        1280,
        720,
        SDL_WINDOW_BORDERLESS);

    #if OS == .WINDOWS
    {
        gfx_backend = "direct3d12";
        gfx_shader_format = SDL_GPU_SHADERFORMAT_DXBC;
    }
    else #if OS == .LINUX
    {
        gfx_backend = "vulkan";
        gfx_shader_format = SDL_GPU_SHADERFORMAT_SPIRV;
    }
    else #if OS == .MACOS
    {
        assert(false, "MacOS unsupported");
        return false;
    }

    display_mode: *SDL_DisplayMode = SDL_GetDesktopDisplayMode(1);
    assert(display_mode.(bool), "Failed to get display information\n");
    print("Display size: %x%\n", display_mode.w, display_mode.h);

    forgette_window_width = display_mode.w;
    forgette_window_height = display_mode.h;
    SDL_SetWindowSize(forgette_window, forgette_window_width, forgette_window_height);
    SDL_SetWindowPosition(forgette_window, xx SDL_WINDOWPOS_CENTERED_DISPLAY(1), xx SDL_WINDOWPOS_CENTERED_DISPLAY(1));

    forgette_device = SDL_CreateGPUDevice(
        gfx_shader_format,
        false,
        to_c_string(gfx_backend,, temp));
    assert(forgette_device.(bool), tprint("Failed to create GFX device: %", to_string(SDL_GetError())));

    SDL_ClaimWindowForGPUDevice(forgette_device, forgette_window);
    SDL_SetGPUSwapchainParameters(
        forgette_device,
        forgette_window,
        .SDL_GPU_SWAPCHAINCOMPOSITION_SDR_LINEAR,
        .SDL_GPU_PRESENTMODE_VSYNC); // Mailbox might be preferable when available

    gfx_create_main_pipeline(); // In my setup, this is the function where most of the work setting up graphics goes
    make_default_sampler_set();
    default_projection_matrix = create_projection_matrix(PI/2, 6.0, 1000000.0);

    return true;
}
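The projection matrix at the end there deserves a quick note. My create_projection_matrix isn't shown in this post, but since I stick to DirectX conventions (you'll see clockwise winding and a depth clear of 1.0 with COMPAREOP_LESS later), it amounts to a standard left-handed, zero-to-one-depth perspective matrix. A sketch of the idea (parameter names are mine; this is laid out for row-vector math, so transpose it if your shader multiplies column vectors):
// A minimal sketch, not my exact code: a left-handed, 0..1-depth
// perspective matrix in the D3D convention.
create_projection_matrix :: (fov_y: float, near_plane: float, far_plane: float) -> Matrix4
{
    aspect  := forgette_window_width.(float) / forgette_window_height.(float);
    y_scale := 1.0 / tan(fov_y * 0.5);

    m: Matrix4; // zero-initialized
    m._11 = y_scale / aspect;
    m._22 = y_scale;
    m._33 = far_plane / (far_plane - near_plane);
    m._34 = 1.0;
    m._43 = -(near_plane * far_plane) / (far_plane - near_plane);
    return m;
}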
Inside the create main pipeline function we first load the shaders, which are the main driving force behind what we're going to see on screen.
load_shaders :: ()
{
    print("Loading shaders...\n");
    shader_dir := tprint("%/shaders", forgette_root_dir);

    vertex_shader_path: string;
    pixel_shader_path: string;
    shader_type: string;

    if gfx_shader_format == SDL_GPU_SHADERFORMAT_DXBC // We set this earlier based on which OS was detected
    {
        vertex_shader_path = tprint("%/gouraud.vertex.dxbc", shader_dir);
        pixel_shader_path = tprint("%/gouraud.pixel.dxbc", shader_dir);
    }
    else if gfx_shader_format == SDL_GPU_SHADERFORMAT_SPIRV
    {
        vertex_shader_path = tprint("%/gouraud_vs.spv", shader_dir);
        pixel_shader_path = tprint("%/gouraud_ps.spv", shader_dir);
    }

    // Vertex
    vertex_code_size: u64;
    vertex_code: *void = SDL_LoadFile(to_c_string(vertex_shader_path,, temp), *vertex_code_size);
    assert(vertex_code.(bool), tprint("Failed to load vertex shader from disk: %", to_string(SDL_GetError())));

    /* In this code here, you are basically just repeating what you wrote down in the shader itself.
       It has to match exactly, or you'll get an error from the driver trying to load the shader
       (and probably crash your program). */
    vertex_info: SDL_GPUShaderCreateInfo = .{};
    vertex_info.code = vertex_code.(*u8);
    vertex_info.code_size = vertex_code_size;
    vertex_info.entrypoint = "vs_main";
    vertex_info.format = gfx_shader_format;
    vertex_info.stage = .SDL_GPU_SHADERSTAGE_VERTEX;
    vertex_info.num_samplers = 0;
    /* The uniform buffer holds data that doesn't usually change per instance.
       In my case I use it for the color and direction of the big directional omni light for the map (the "sun"),
       as well as a global ambiance value for debugging and testing (ie fullbright). */
    vertex_info.num_uniform_buffers = 1;
    vertex_info.num_storage_buffers = 1; // You can put anything you want in a storage buffer; I use it for all the instance data tied to a single mesh.
    vertex_info.num_storage_textures = 0;
    vertex_shader = SDL_CreateGPUShader(forgette_device, *vertex_info);
    assert(vertex_shader.(bool), tprint("Failed to create vertex shader: %", to_string(SDL_GetError())));

    // Pixel
    ps_code_size: u64;
    ps_code: *void = SDL_LoadFile(to_c_string(pixel_shader_path,, temp), *ps_code_size);
    assert(ps_code.(bool), tprint("Failed to load pixel shader from disk: %", to_string(SDL_GetError())));

    ps_info: SDL_GPUShaderCreateInfo = .{};
    ps_info.code = ps_code.(*u8);
    ps_info.code_size = ps_code_size;
    ps_info.entrypoint = "ps_main";
    ps_info.format = gfx_shader_format;
    ps_info.stage = .SDL_GPU_SHADERSTAGE_FRAGMENT;
    // Right now, all the pixel shader has is one sampler for the diffuse / color texture.
    // In the future maybe I'll add more, like normal maps, but for now it's all about keeping it simple.
    ps_info.num_samplers = 1;
    ps_info.num_uniform_buffers = 0;
    ps_info.num_storage_buffers = 0;
    ps_info.num_storage_textures = 0;
    pixel_shader = SDL_CreateGPUShader(forgette_device, *ps_info);
    assert(pixel_shader.(bool), tprint("Failed to create pixel shader: %", to_string(SDL_GetError())));
}
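One subtlety worth noting before moving on: in SDL GPU, a "uniform buffer" isn't an object you create anywhere; you just push the bytes through the command buffer each frame, and the slot index matches what you declared above. Something along these lines (the struct layout and names here are illustrative, not my exact code, and mind your alignment since uniform data follows std140-style packing rules):
// Illustrative only -- the real struct isn't shown in this post.
// float3s pad out to 16 bytes under std140-style rules, hence the explicit pads.
Sun_Light_Uniforms :: struct
{
    sun_direction: Vector3;
    pad0:          float;
    sun_color:     Vector4;
    ambiance:      float;   // global ambiance for debugging (ie fullbright)
    pad1:          Vector3;
}

// Pushed per frame; slot 0 corresponds to num_uniform_buffers = 1 above.
push_sun_uniforms :: (command_buffer: *SDL_GPUCommandBuffer, uniforms: *Sun_Light_Uniforms)
{
    SDL_PushGPUVertexUniformData(command_buffer, 0, uniforms, xx size_of(Sun_Light_Uniforms));
}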
I spent a healthy chunk of time getting shaders to fully work. One drawback of not using DirectX directly was seemingly more obtuse error messages. A failure to load a shader typically wouldn't be explained; it would just tell you "The parameter is incorrect." Instant facedesk every time I see that now. What parameter?! When I have bad dreams I mumble it in my sleep...
In any case, a lot of these errors came from not fully respecting the guidelines hinted at on this doc page. The remarks at the bottom give very specific rules on how to order your buffers and which register spaces they need to go in. Additionally, for HLSL (the shader language I chose to use), all your non-SV inputs need to use sequentially numbered TEXCOORD semantics.
// Here inside the HLSL code, this is the input given to the pixel shader
struct PSInput
{
    float4 Pos            : SV_Position;
    float4 DiffuseF       : TEXCOORD0;
    float2 UV             : TEXCOORD1;
    float  fogDepth       : TEXCOORD2;
    float4 lightColor     : TEXCOORD3;
    uint   lightingType   : TEXCOORD4;
    float  Ambiance       : TEXCOORD5;
    float3 HemiAmbient    : TEXCOORD6;
    uint   RevealState    : TEXCOORD7;
    uint   RevealBehavior : TEXCOORD8;
};
No cool unique names for anything, but I haven't thought about it a single time since starting to follow this pattern.
Back to the CPU, we continue setting up the pipeline now that we've loaded the shaders successfully. SDL uses special buffers called transfer buffers that can be re-used to shuffle data from memory to the GPU. I set up three big transfer buffers to handle things:
vertex_transfer_buffer_info: SDL_GPUTransferBufferCreateInfo;
vertex_transfer_buffer_info.size = MAX_MESH_SIZE;
vertex_transfer_buffer_info.usage = .SDL_GPU_TRANSFERBUFFERUSAGE_UPLOAD; // this is an enum...
vertex_transfer_buffer = SDL_CreateGPUTransferBuffer(forgette_device, *vertex_transfer_buffer_info);
if !vertex_transfer_buffer
    print("Failed to create vertex transfer buffer!\n");

index_transfer_buffer_info: SDL_GPUTransferBufferCreateInfo;
index_transfer_buffer_info.size = MAX_MESH_SIZE;
index_transfer_buffer_info.usage = .SDL_GPU_TRANSFERBUFFERUSAGE_UPLOAD;
index_transfer_buffer = SDL_CreateGPUTransferBuffer(forgette_device, *index_transfer_buffer_info);
if !index_transfer_buffer
    print("Failed to create index transfer buffer!\n");

instance_transfer_buffer_info: SDL_GPUTransferBufferCreateInfo;
instance_transfer_buffer_info.size = size_of(Mesh_Instance_Data) * INSTANCE_BUFFER_MAX;
instance_transfer_buffer_info.usage = .SDL_GPU_TRANSFERBUFFERUSAGE_UPLOAD;
instance_transfer_buffer = SDL_CreateGPUTransferBuffer(forgette_device, *instance_transfer_buffer_info);
if !instance_transfer_buffer
    print("Failed to create instance transfer buffer!\n");
The vertex and index buffers transfer the mesh data itself to the GPU. This only needs to happen once for each mesh loaded from disk. The instance buffer I use to upload data unique to each *instance* of a mesh in the world. For example, I might have 20 Swordsman units all using the same mesh: the mesh only gets uploaded to the GPU once, but each unit's unique instance data has to be refreshed on the GPU every time something like its position, rotation, or scale changes.
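To make that concrete, per-instance data is just a plain struct. Mine carries more fields than this (the lighting and reveal values hinted at in the PSInput earlier), but trimmed down it looks something like the following (a trimmed illustration, not the full definition):
// A trimmed illustration of Mesh_Instance_Data -- the real struct has more fields.
Mesh_Instance_Data :: struct
{
    transform: Matrix4; // world transform built from position/rotation/scale
    diffuse:   Vector4; // per-instance tint
}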
Here we've already run into the failing of tutorials I mentioned earlier: how I handle transfer buffers is almost certainly not the most efficient way out there. Handling VRAM and system memory allocation well matters a lot for performance. I chose to sacrifice some memory efficiency in return for lower complexity and slightly better speed, but it's likely I will revisit this in the future.
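Whatever sizing strategy you pick, though, the transfer mechanics stay the same: map the transfer buffer, memcpy into it, unmap, then record an upload inside a copy pass. A bare-bones sketch with SDL's API (the helper name and parameters are mine, for illustration):
upload_vertices :: (copy_pass: *SDL_GPUCopyPass, vertices: [] GFX_Vertex, gpu_buffer: *SDL_GPUBuffer)
{
    size := (vertices.count * size_of(GFX_Vertex)).(u32);

    // Map the transfer buffer into CPU-visible memory and copy into it.
    // (false = don't cycle; pass true if the buffer might still be in flight.)
    mapped := SDL_MapGPUTransferBuffer(forgette_device, vertex_transfer_buffer, false);
    memcpy(mapped, vertices.data, size);
    SDL_UnmapGPUTransferBuffer(forgette_device, vertex_transfer_buffer);

    // Record the actual transfer into the copy pass.
    source: SDL_GPUTransferBufferLocation;
    source.transfer_buffer = vertex_transfer_buffer;
    source.offset = 0;

    destination: SDL_GPUBufferRegion;
    destination.buffer = gpu_buffer;
    destination.offset = 0;
    destination.size = size;

    SDL_UploadToGPUBuffer(copy_pass, *source, *destination, false);
}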
Next we simply tell the pipeline how we want it to render things. We need to let it know about the attributes of the vertices we send to the shader (we define a simple struct containing position, normal, and uv information; a sketch of it follows below). We create our depth texture to handle depth, and we cull backfaces. We use clockwise ordering for our front faces, since I decided to stick with DirectX conventions to keep things consistent.
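For reference, a vertex struct matching those three attributes looks like this (the field names here are illustrative; the offsets line up with the attribute setup in the code below):
GFX_Vertex :: struct
{
    position: Vector3; // FLOAT3, offset 0
    normal:   Vector3; // FLOAT3, offset size_of(Vector3)
    uv:       Vector2; // FLOAT2, offset size_of(Vector3) * 2
}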
{
    ...
    // Create graphics pipeline
    print("Creating graphics pipeline...\n");
    pipeline_info: SDL_GPUGraphicsPipelineCreateInfo = .{};
    pipeline_info.vertex_shader = vertex_shader;
    pipeline_info.fragment_shader = pixel_shader;
    pipeline_info.primitive_type = .SDL_GPU_PRIMITIVETYPE_TRIANGLELIST;

    // --- Vertex buffer
    print("Creating vertex input state buffer descriptions...\n");
    vertex_buffer_descs: [1] SDL_GPUVertexBufferDescription;
    vertex_buffer_descs[0].slot = 0;
    vertex_buffer_descs[0].input_rate = .SDL_GPU_VERTEXINPUTRATE_VERTEX;
    vertex_buffer_descs[0].instance_step_rate = 0;
    vertex_buffer_descs[0].pitch = size_of(GFX_Vertex);
    pipeline_info.vertex_input_state.num_vertex_buffers = 1;
    pipeline_info.vertex_input_state.vertex_buffer_descriptions = vertex_buffer_descs.data;

    // --- Vertex attributes
    print("Creating vertex input state attributes...\n");
    VERTEX_ATTR_COUNT :: 3;
    vertex_attributes: [VERTEX_ATTR_COUNT] SDL_GPUVertexAttribute;
    // position
    vertex_attributes[0].buffer_slot = 0;
    vertex_attributes[0].location = 0;
    vertex_attributes[0].format = .SDL_GPU_VERTEXELEMENTFORMAT_FLOAT3;
    vertex_attributes[0].offset = 0;
    // normal
    vertex_attributes[1].buffer_slot = 0;
    vertex_attributes[1].location = 1;
    vertex_attributes[1].format = .SDL_GPU_VERTEXELEMENTFORMAT_FLOAT3;
    vertex_attributes[1].offset = size_of(Vector3);
    // uv
    vertex_attributes[2].buffer_slot = 0;
    vertex_attributes[2].location = 2;
    vertex_attributes[2].format = .SDL_GPU_VERTEXELEMENTFORMAT_FLOAT2;
    vertex_attributes[2].offset = size_of(Vector3) + size_of(Vector3);
    pipeline_info.vertex_input_state.num_vertex_attributes = VERTEX_ATTR_COUNT;
    pipeline_info.vertex_input_state.vertex_attributes = vertex_attributes.data;

    // --- Color target
    print("Creating color target...\n");
    color_target_descs: [1] SDL_GPUColorTargetDescription;
    color_target_descs[0].format = SDL_GetGPUSwapchainTextureFormat(forgette_device, forgette_window);
    if color_target_descs[0].format == .SDL_GPU_TEXTUREFORMAT_INVALID
    {
        color_target_descs[0].format = .SDL_GPU_TEXTUREFORMAT_R8G8B8A8_UNORM;
    }
    pipeline_info.target_info.num_color_targets = 1;
    pipeline_info.target_info.color_target_descriptions = color_target_descs.data;
    pipeline_info.target_info.depth_stencil_format = .SDL_GPU_TEXTUREFORMAT_D24_UNORM_S8_UINT;
    pipeline_info.target_info.has_depth_stencil_target = true;

    print("Creating depth texture...\n");
    depth_texture_info: SDL_GPUTextureCreateInfo;
    depth_texture_info.type = .SDL_GPU_TEXTURETYPE_2D;
    depth_texture_info.format = .SDL_GPU_TEXTUREFORMAT_D24_UNORM_S8_UINT; // 24-bit depth plus 8-bit stencil; must match the pipeline's depth_stencil_format
    depth_texture_info.width = forgette_window_width.(u32); // match your swap-chain
    depth_texture_info.height = forgette_window_height.(u32);
    depth_texture_info.layer_count_or_depth = 1;
    depth_texture_info.num_levels = 1;
    depth_texture_info.usage = SDL_GPU_TEXTUREUSAGE_DEPTH_STENCIL_TARGET;
    depth_texture_info.props = 0;
    world_depth_texture = SDL_CreateGPUTexture(forgette_device, *depth_texture_info);
    assert(world_depth_texture.(bool));

    print("Creating depth target...\n");
    world_depth_target.clear_depth = 1.0;
    world_depth_target.load_op = .SDL_GPU_LOADOP_CLEAR;
    world_depth_target.store_op = .SDL_GPU_STOREOP_DONT_CARE;
    world_depth_target.stencil_load_op = .SDL_GPU_LOADOP_DONT_CARE;
    world_depth_target.stencil_store_op = .SDL_GPU_STOREOP_DONT_CARE;
    world_depth_target.texture = world_depth_texture;
    world_depth_target.cycle = true;

    // Finish pipeline creation
    print("Finalizing pipeline creation...\n");
    pipeline_info.rasterizer_state.fill_mode = .SDL_GPU_FILLMODE_FILL;
    pipeline_info.rasterizer_state.cull_mode = .SDL_GPU_CULLMODE_BACK;
    pipeline_info.rasterizer_state.front_face = .SDL_GPU_FRONTFACE_CLOCKWISE;
    pipeline_info.rasterizer_state.enable_depth_bias = false;
    pipeline_info.rasterizer_state.enable_depth_clip = true;
    pipeline_info.depth_stencil_state.enable_depth_test = true;
    pipeline_info.depth_stencil_state.enable_depth_write = true;
    pipeline_info.depth_stencil_state.compare_op = .SDL_GPU_COMPAREOP_LESS;
    graphics_pipeline = SDL_CreateGPUGraphicsPipeline(forgette_device, *pipeline_info);
    assert(graphics_pipeline.(bool), tprint("Failed to create graphics pipeline: %", to_string(SDL_GetError())));

    print("Pipeline created, releasing shaders...\n");
    SDL_ReleaseGPUShader(forgette_device, vertex_shader);
    SDL_ReleaseGPUShader(forgette_device, pixel_shader);
}
Now everything is set up to render. The code above is a slightly simplified version of what I currently have. Below is the actual render function called every frame in the main loop -- it includes some additional passes that I will go over in future posts (a debug trace pass as well as a UI pass).
gfx_render :: ()
{
    if last_view_matrix != default_view_matrix
        view_dirty = true;
    if view_dirty
    {
        successful_inversion: bool;
        successful_inversion, inverse_view_projection = inverse(default_view_matrix * default_projection_matrix);
        assert(successful_inversion);
    }

    command_buffer = SDL_AcquireGPUCommandBuffer(forgette_device);

    world_color_target_info: SDL_GPUColorTargetInfo;
    world_color_target_info.clear_color = .{0.0/255.0, 0.0/255.0, 0.0/255.0, 255.0/255.0};
    world_color_target_info.load_op = .SDL_GPU_LOADOP_CLEAR;
    world_color_target_info.store_op = .SDL_GPU_STOREOP_STORE;

    swapchain_texture: *SDL_GPUTexture;
    width: u32;
    height: u32;
    SDL_WaitAndAcquireGPUSwapchainTexture(
        command_buffer,
        forgette_window,
        *swapchain_texture,
        *width,
        *height);
    world_color_target_info.texture = swapchain_texture;

    // World copy pass
    copy_pass := SDL_BeginGPUCopyPass(command_buffer);
    upload_models(copy_pass);
    upload_textura_simplexes(copy_pass);
    // upload_cursor_instance_data(copy_pass);
    #if FORGETTE_DEBUG
        upload_debug_trace_shapes(copy_pass);
    for models_master_list
    {
        model_info := *it;
        if model_info.instances.count > 0
        {
            upload_instance_data(copy_pass, model_info);
        }
    }
    SDL_EndGPUCopyPass(copy_pass);

    // World render pass
    {
        world_render_pass := SDL_BeginGPURenderPass(
            command_buffer,
            *world_color_target_info,
            1,
            *world_depth_target);
        SDL_BindGPUGraphicsPipeline(world_render_pass, graphics_pipeline);
        if active_map
        {
            for models_master_list
            {
                handle := it_index;
                draw_model_instances(world_render_pass, command_buffer, handle);
            }
        }
        #if FORGETTE_DEBUG
        {
            SDL_BindGPUGraphicsPipeline(world_render_pass, debug_trace_graphics_pipeline);
            draw_debug_traces(world_render_pass, command_buffer);
        }
        SDL_EndGPURenderPass(world_render_pass);
    }

    // UI copy pass
    {
        copy_pass := SDL_BeginGPUCopyPass(command_buffer);
        for forgette_ui.atlases
        {
            atlas := HT.table_find_pointer(*forgette_ui.atlases, it_index);
            if atlas.instances.count > 0
            {
                upload_ui_element_instance_data(copy_pass, atlas);
            }
        }
        SDL_EndGPUCopyPass(copy_pass);
    }

    ui_color_target_info: SDL_GPUColorTargetInfo;
    ui_color_target_info.clear_color = .{10.0/255.0, 0.0/255.0, 20.0/255.0, 255.0/255.0};
    ui_color_target_info.load_op = .SDL_GPU_LOADOP_LOAD;
    ui_color_target_info.store_op = .SDL_GPU_STOREOP_DONT_CARE;
    ui_color_target_info.texture = swapchain_texture;

    // UI render pass
    {
        ui_render_pass := SDL_BeginGPURenderPass(
            command_buffer,
            *ui_color_target_info,
            1,
            *ui_depth_target);
        SDL_BindGPUGraphicsPipeline(ui_render_pass, ui_graphics_pipeline);
        ui_update_cursor(Active_Cursor);
        draw_ui_element_instances(ui_render_pass, command_buffer);
        SDL_EndGPURenderPass(ui_render_pass);
    }

    SDL_SubmitGPUCommandBuffer(command_buffer);
    last_view_matrix = default_view_matrix;
    view_dirty = false;
}
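I'll save draw_model_instances itself for the next post, but in SDL GPU terms a per-model draw boils down to binding the mesh's buffers plus the instance storage buffer, binding the diffuse sampler, and issuing one instanced draw. Very roughly (a sketch with made-up field names, not the real function):
// A sketch of a per-model instanced draw -- field names on Model_Info are
// placeholders, not my actual struct.
draw_sketch :: (render_pass: *SDL_GPURenderPass, model: *Model_Info)
{
    vertex_binding: SDL_GPUBufferBinding;
    vertex_binding.buffer = model.vertex_buffer;
    vertex_binding.offset = 0;
    SDL_BindGPUVertexBuffers(render_pass, 0, *vertex_binding, 1);

    index_binding: SDL_GPUBufferBinding;
    index_binding.buffer = model.index_buffer;
    index_binding.offset = 0;
    SDL_BindGPUIndexBuffer(render_pass, *index_binding, .SDL_GPU_INDEXELEMENTSIZE_32BIT);

    // The storage buffer with per-instance data, as declared in load_shaders.
    SDL_BindGPUVertexStorageBuffers(render_pass, 0, *model.instance_buffer, 1);

    // The pixel shader's one diffuse sampler.
    sampler_binding: SDL_GPUTextureSamplerBinding;
    sampler_binding.texture = model.diffuse_texture;
    sampler_binding.sampler = default_sampler;
    SDL_BindGPUFragmentSamplers(render_pass, 0, *sampler_binding, 1);

    // One draw covers every instance of this mesh.
    SDL_DrawGPUIndexedPrimitives(render_pass, xx model.index_count, xx model.instances.count, 0, 0, 0);
}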
That's it for the setup. In the next post I will go into detail about the simple Gouraud shader I wrote that makes use of instancing, as well as how I manage my mesh and instance uploads and execute the draw calls. From there it will be an easy foray into the not-ecs-entity-and-component system.