Notes on WebGPU Fundamentals

February 17, 2024•45,828 words

This is a summary of webgpufundamentals.org so all credits to them.
Some of the code is shortened or changed.

I am by far not done reading it yet, so this post will be continuously updated.

WebGPU is meant to be the successor of WebGL which is based on OpenGL, first developed in 1992, and to more efficiently interact with modern GPU hardware.

The API is already available on Chrome and Edge but still marked as experimental.

Basically all it does is run 3 types of functions on the GPU:

Vertex Shader

Computes vertices, returns their position and draws a triangle between them for every 3 by default (there are also 4 other modes)

Fragment Shader

Computes colors, is called for each pixel when a triangle is drawn

Compute Shader

Function to be executed N times, iteration number is passed each time

Initialization, Configuration

Add <canvas> tag to html and this javascript

async function main() {
  const adapter = await navigator.gpu?.requestAdapter();
  const device = await adapter?.requestDevice();
  if (!device) {
    alert('Your browser does not support WebGPU');
    return;
  }
  
  const canvas = document.querySelector('canvas');
  const context = canvas.getContext('webgpu');
  const presentationFormat = navigator.gpu.getPreferredCanvasFormat();
  context.configure({
    device,
    format: presentationFormat,
  });
}
main();

Create shader module with a vertex shader and fragment shader function

const module = device.createShaderModule({
  label: 'our hardcoded red triangle shaders',
  code: `
    @vertex fn vs(
      @builtin(vertex_index) vertexIndex : u32
    ) -> @builtin(position) vec4f {
      let pos = array(
        vec2f( 0.0,  0.5),  // top center
        vec2f(-0.5, -0.5),  // bottom left
        vec2f( 0.5, -0.5)   // bottom right
      );
      return vec4f(pos[vertexIndex], 0.0, 1.0);
    }
    @fragment fn fs() -> @location(0) vec4f {
      return vec4f(1.0, 0.0, 0.0, 1.0);
    }
  `, 
});

Shaders are written in WGSL. (WebGPU shader language)

The parameter vertexIndex, which is a 32-bit unsigned integer, gets its value from the builtin vertex_index,
which is the iteration number. The vertex function returns a vec4f which is a vector of four 32-bit floating point values which will be assigned to the position builtin.

Positions need to be returned in clip space where X and Y go from -1.0 to +1.0, regardless of the size of the texture that is drawn to.

The fragment shader function returns a vec4f at location(0)
this means that it'll write to the first render target.

The vec4f values correspond to red, green, blue, and alpha.

When the GPU rasterizes the triangle (draws it with pixels), it will call the fragment shader to find out what color to make each pixel.

The label is optional but best practice, most WebGPU implementations will print an error message that includes the labels of the things related to the error.

Create Render Pipeline

const pipeline = device.createRenderPipeline({
  label: 'hardcoded red triangle pipeline',
  layout: 'auto',
  vertex: {
    module,
    entryPoint: 'vs',
  },
  fragment: {
    module,
    entryPoint: 'fs',
    targets: [{ format: presentationFormat }],
  },
});

Defines the vertex and fragment shaders to be run by the GPU.
It may also define attributes that reference buffers.

Create Render Pass

const renderPassDescriptor = {
  label: 'our basic canvas renderPass',
  colorAttachments: [
    {
      // view: <- to be filled out when rendering
      clearValue: [0.3, 0.3, 0.3, 1], //semi-dark gray
      loadOp: 'clear',
      storeOp: 'store',
    },
  ],
};

The array for colorAttachments lists the textures to render and how to treat them. loadOp specifies to clear the texture to the clear value before drawing. storeOp tells to store what is drawn, 'discard' would throw it away.

Rendering

function render() {
  renderPassDescriptor.colorAttachments[0].view =
    context.getCurrentTexture().createView();

  const encoder = device.createCommandEncoder({ label: 'our encoder' });

  const pass = encoder.beginRenderPass(renderPassDescriptor);
  pass.setPipeline(pipeline);
  pass.draw(3);
  pass.end();
  
  const commandBuffer = encoder.finish();
  device.queue.submit([commandBuffer]);
} 
render();

First it gets the current texture from the canvas context and sets it as the texture to render to. Then a command encoder (used to create a command buffer) and a render pass encoder for render specific commands is created and a pipeline is set for it. The vertex shader is called 3 times, the render pass is then ended and the encoder finished. Finally, the resulting command buffer is submitted to be executed.

Computations

Basic example for running some computation on the GPU,
doubling some numbers

async function main() {
  const adapter = await navigator.gpu?.requestAdapter();
  const device = await adapter?.requestDevice();
  if (!device) {
    alert('need a browser that supports WebGPU');
    return;
  }

  const module = device.createShaderModule({
    label: 'doubling compute module',
    code: `
      @group(0) @binding(0) var<storage, read_write> data: array<f32>;

      @compute @workgroup_size(1) fn computeSomething(
        @builtin(global_invocation_id) id: vec3<u32>
      ) {
        let i = id.x;
        data[i] = data[i] * 2.0;
      }
    `,
  });

  const pipeline = device.createComputePipeline({
    label: 'doubling compute pipeline',
    layout: 'auto',
    compute: {
      module,
      entryPoint: 'computeSomething',
    },
  });

  const input = new Float32Array([1, 3, 5]);

  // create a buffer on the GPU to hold the computation
  // input and output
  const workBuffer = device.createBuffer({
    label: 'work buffer',
    size: input.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
  });
  // Copy the input data to that buffer
  device.queue.writeBuffer(workBuffer, 0, input);

  // create a buffer on the GPU to get a copy of the results
  const resultBuffer = device.createBuffer({
    label: 'result buffer',
    size: input.byteLength,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });

  // Setup a bindGroup to tell the shader which
  // buffer to use for the computation
  const bindGroup = device.createBindGroup({
    label: 'bindGroup for work buffer',
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: { buffer: workBuffer } },
    ],
  });

  // Encode commands to do the computation
  const encoder = device.createCommandEncoder({
    label: 'doubling encoder',
  });
  const pass = encoder.beginComputePass({
    label: 'doubling compute pass',
  });
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(input.length);
  pass.end();

  // Encode a command to copy the results to a mappable buffer.
  encoder.copyBufferToBuffer(workBuffer, 0, resultBuffer, 0, resultBuffer.size);

  // Finish encoding and submit the commands
  const commandBuffer = encoder.finish();
  device.queue.submit([commandBuffer]);

  // Read the results
  await resultBuffer.mapAsync(GPUMapMode.READ);
  const result = new Float32Array(resultBuffer.getMappedRange().slice());
  resultBuffer.unmap();

  console.log('input', input);
  console.log('result', result);
}

main();

Since contents of a WebGPU buffer cant directly be read from JavaScript
they have to be "mapped" because the buffer might be in use and might only exist on the GPU.
Buffers that can be mapped in JavaScript cant be used for much else, so theyre an addition to which the values from another buffer are copied to.

The iteration number in compute shaders are 3 dimensional.
This is a simplification of how they work:

function dispatchWorkgroups(width, height, depth) {
  for (z = 0; z < depth; ++z) {
    for (y = 0; y < height; ++y) {
      for (x = 0; x < width; ++x) {
        const workgroup_id = {x, y, z};
        dispatchWorkgroup(workgroup_id)
      }
    }
  }
}

function dispatchWorkgroup(workgroup_id) {
  // from @workgroup_size in WGSL
  const workgroup_size = shaderCode.workgroup_size;
  const {x: width, y: height, z: depth} = workgroup_size;
  for (z = 0; z < depth; ++z) {
    for (y = 0; y < height; ++y) {
      for (x = 0; x < width; ++x) {
        const local_invocation_id = {x, y, z};
        const global_invocation_id =
            workgroup_id * workgroup_size + local_invocation_id;
        computeShader(global_invocation_id)
      }
    }
  }
}

Canvas Resizing

The <canvas> tag, by default, has a resolution of 300x150 pixels. Simply changing its width/height wont change the resolution.

const canvasToSizeMap = new WeakMap();

const observer = new ResizeObserver(entries => {
  for (const entry of entries) {
    canvasToSizeMap.set(entry.target, {
       width: entry.contentBoxSize[0].inlineSize,
       height: entry.contentBoxSize[0].blockSize,
    });
  }
    
  // Get the canvas's current display size
  let { width, height } = canvasToSizeMap.get(canvas) || canvas;

  // Make sure it's valid for WebGPU
  width = Math.max(1, Math.min(width, device.limits.maxTextureDimension2D));
  height = Math.max(1, Math.min(height, device.limits.maxTextureDimension2D));

  // Only if the size is different, set the canvas size
  const needResize = canvas.width !== width || canvas.height !== height;
  if (needResize) {
    canvas.width = canvas.clientWidth * window.devicePixelRatio;
    canvas.height = canvas.clientHeight * window.devicePixelRatio;
  }
  render();
});
observer.observe(canvas);

canvas.width and canvas.height can also simply be assigned the local variables width and height but I replaced it with a multiplication by window.devicePixelRatio for a sharper image on HiDPI displays.

Extra Checks

A device can be "lost" for many reasons. The user may have run an intensive app and crashed the GPU or check for lost device or disabled WebGPU.

async function start() {
  if (!navigator.gpu) {
    alert('This browser does not support WebGPU');
    return;
  }

  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    alert('this browser supports WebGPU but it appears disabled');
    return;
  }

  const device = await adapter?.requestDevice();
  device.lost.then((info) => {
    console.error(`WebGPU device was lost: ${info.message}`);

    // reason will be 'destroyed' if the device was intentionally destroyed.
    if (info.reason !== 'destroyed') {
      start();
    }
  });
  
  main(device);
}
start();

function main(device) {
  ... do webgpu ...
}

Inter-Stage Variables

The vertex shader can output extra values at positions and by default interpolate those values between them.

In the shader module:

const module = device.createShaderModule({
  label: 'hardcoded rgb triangle shaders',
  code: `
    struct OurVertexShaderOutput {
      @builtin(position) position: vec4f,
      @location(0) color: vec4f,
    };
    @vertex fn vs(
      @builtin(vertex_index) vertexIndex : u32
    ) -> VertexShaderOutput {
      let pos = array(
        vec2f( 0.0,  0.5),  // top center
        vec2f(-0.5, -0.5),  // bottom left
        vec2f( 0.5, -0.5)   // bottom right
      );
      var color = array<vec4f, 3>(
        vec4f(1, 0, 0, 1), // red
        vec4f(0, 1, 0, 1), // green
        vec4f(0, 0, 1, 1), // blue
      );
      var vsOutput: VertexShaderOutput;
      vsOutput.position = vec4f(pos[vertexIndex], 0.0, 1.0);
      vsOutput.color = color[vertexIndex];
      return vsOutput;
    }
    @fragment fn fs(fsInput: VertexShaderOutput) -> @location(0) vec4f {
      return fsInput.color;
    }
  `,
});

First a struct is declared in the vertex shader, which will be like a type/interface for the returned value. A vec4f "color" is added to the buildtin position. When returning, an instance of the structure is declared and its properties filled out. The fragment shader is changed to take one of these structs as an argument.

The connection between vertex shader and fragment shader is by index.
For inter-stage variables, they connect by location index.

Output:

Checkerboard

The following code draws a checkerboard into the triangle

const module = device.createShaderModule({
  label: 'hardcoded checkerboard triangle shaders',
  code: `
    struct VertexShaderOutput {
      @builtin(position) position: vec4f,
    };
    @vertex fn vs(
      @builtin(vertex_index) vertexIndex : u32
    ) -> VertexShaderOutput {
      let pos = array(
        vec2f( 0.0,  0.5),  // top center
        vec2f(-0.5, -0.5),  // bottom left
        vec2f( 0.5, -0.5)   // bottom right
      );
      var vsOutput: OurVertexShaderOutput;
      vsOutput.position = vec4f(pos[vertexIndex], 0.0, 1.0);
      return vsOutput;

    }
      
    @fragment fn fs(fsInput: VertexShaderOutput) -> @location(0) vec4f {
      let black = vec4f(0, 0, 0, 1);
      let white = vec4f(1, 1, 1, 1);

      let grid = vec2u(fsInput.position.xy) / 8;
      let checker = (grid.x + grid.y) % 2 == 1;

      return select(black, white, checker);
    }
  `,
});

The fragment shader takes the position input, converts it to a vec2u,
divides the 2 values by 8, which makes a count that increases every 8 pixels,
and then adds the x and y coordinates together, compues modulo 2, and compares the result to 1.

This makes a boolean that is true or false for every other integer.

Finally, it uses the select function which selects one or the other value based on the boolean as condition.

Note: checkerboards and other patterns are more commonly drawn using textures.
If its based on the pixel coordinates its relative to the canvas, not the triangle.

Splitted Modules

Both shaders here are in the same module,
they can also be split into separate modules and work the same

const vsModule = device.createShaderModule({
  label: 'hardcoded triangle',
  code: `
    struct OurVertexShaderOutput {
      @builtin(position) position: vec4f,
    };

    @vertex fn vs(
      @builtin(vertex_index) vertexIndex : u32
    ) -> OurVertexShaderOutput {
      let pos = array(
        vec2f( 0.0,  0.5),  // top center
        vec2f(-0.5, -0.5),  // bottom left
        vec2f( 0.5, -0.5)   // bottom right
      );

      var vsOutput: OurVertexShaderOutput;
      vsOutput.position = vec4f(pos[vertexIndex], 0.0, 1.0);
      return vsOutput;
    }
  `,
});

const fsModule = device.createShaderModule({
  label: 'checkerboard',
  code: `
    @fragment fn fs(@builtin(position) pixelPosition: vec4f) -> @location(0) vec4f {
      let red = vec4f(1, 0, 0, 1);
      let cyan = vec4f(0, 1, 1, 1);

      let grid = vec2u(pixelPosition.xy) / 8;
      let checker = (grid.x + grid.y) % 2 == 1;

      return select(red, cyan, checker);
    }
  `,
});
  
  const pipeline = device.createRenderPipeline({
  label: 'hardcoded checkerboard triangle pipeline',
  layout: 'auto',
  vertex: {
    module: vsModule,
    entryPoint: 'vs',
  },
  fragment: {
    module: fsModule,
    entryPoint: 'fs',
    targets: [{ format: presentationFormat }],
  },
});

Interpolation Settings

There are 2 types of settings:

Type

perspective: values are interpolated in a perspective correct manner (default)
linear: in a linear, non-perspective correct manner
flat: not interpolated

Sampling

center: performed at the center of the pixel (default)
centroid: performed at the point that lies within all the samples covered by the fragment within the current primitive
sample: performed per sample, fragment shader is invoked once per sample

These attributes can be specified like this

@location(2) @interpolate(linear, center) myVariableFoo: vec4f;
@location(3) @interpolate(flat) myVariableBar: vec4f;

Note: if the inter-stage variable is an integer type then the interpolation must be set to flat.

Memory Layout

In WGSL v1, there are 4 base types

f32 (a 32bit floating point number)
i32 (a 32bit integer)
u32 (a 32bit unsigned integer)
f16 (a 16bit floating point number)

A byte is 8 bits so a 32 bit value takes 4 bytes and a 16 bit value takes 2 bytes.

For example this struct

struct ExampleStruct {
  velocity: f32,
  acceleration: f32,
  frameCount: u32,
};

would be 12 bytes.
To pass data to a shader it would need to be prepared to match the memory layout of the struct.

const exampleStructData = new ArrayBuffer(12);
const exampleStructValuesAsF32 = new Float32Array(exampleStructData);
const exampleStructValuesAsU32 = new Uint32Array(ourStructData);

An ArrayBuffer is a chunk of memory,
a view has to be created to take a look at the contents of this memory.

Views can be created in different data types.

The data can then be set in the structure:

exampleStructValuesAsF32[0] = 1.2; //Velocity
exampleStructValuesAsU32[2] = 56; //FrameCount

TypedArrays have a constructor that takes various forms. Reference

Following are the properties of TypedArray

length: number of units
byteLength: size in bytes
byteOffset: offset in the TypedArray’s ArrayBuffer
buffer: the ArrayBuffer this TypedArray is viewing

It also has various methods. For example subarray, which creates a new array of the same buffer from the specified start and end (end is not included).

So having multiple views of the same buffer is also possible

const v1 = new Float32Array(5);
const v2 = v1.subarray(3, 5);  // view the last 2 floats of v1
v2[0] = 123;
v2[1] = 456;
console.log(v1);  // shows 0, 0, 0, 123, 456

Note: the map function of a TypedArray creates a new array or the same type,
mapping into another type can be done with Array.from

const f32d = Array.from(f32a).map(v => `${v} doubled = ${v *2}`);

vec and mat types

Ref table

Uniforms

Uniforms are like global variables for the shader.

The value can be set before the shader is executed and it'll have those values for every iteration.

const module = device.createShaderModule({
    label: 'triangle shaders with uniforms',
    code: `
      struct TestStruct {
        color: vec4f,
        scale: vec2f,
        offset: vec2f,
      };
        
      @group(0) @binding(0) var<uniform> testStruct: TestStruct;    <-----
  
      @vertex fn vs(
        @builtin(vertex_index) vertexIndex : u32
      ) -> @builtin(position) vec4f {
        let pos = array(
          vec2f( 0.0,  0.5),  // top center
          vec2f(-0.5, -0.5),  // bottom left
          vec2f( 0.5, -0.5)   // bottom right
        );
  
        return vec4f(
          pos[vertexIndex] * testStruct.scale + testStruct.offset, 0.0, 1.0);
      }
  
      @fragment fn fs() -> @location(0) vec4f {
        return testStruct.color;
      }
    `,
});

Next create a buffer on the GPU to hold values for the shaders
and set the flags so it can be used with uniforms and be updated by copying data to it,
also create a TypedArray to hold and set the values in JS.
Then fill out two of the values that wont be changing later, the second parameter in set is at which byteoffset the value is placed.
Then, to tell the shader about the buffer, create a bind group and bind the buffer to the same @binding that is set in the shader.
In the render function, set the remaining one of uniformValues and copy them to the buffer on the GPU.
(Besides writeBuffer there are several other ways to copy data to a buffer)
Finally, set the bind group before drawing.

const uniformBufferSize = 32 //color is 4*4 bytes, scale and offset are 2*4 bytes
const uniformBuffer = device.createBuffer({
  size: uniformBufferSize,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});
const uniformValues = new Float32Array(uniformBufferSize / 4); //creates its own ArrayBuffer with 4 bytes times the amount of the parameter, so divide by 4

uniformValues.set([0, 1, 0, 1], 0); //color
uniformValues.set([-0.5, -0.25], 6); //offset

const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: { buffer: uniformBuffer }},
  ],
});

function render() {
    const aspect = canvas.width / canvas.height;
    uniformValues.set([0.5 / aspect, 0.5], 4); //scale
    
    device.queue.writeBuffer(uniformBuffer, 0, uniformValues);
    ...
    pass.setBindGroup(0, bindGroup);
    pass.draw(3);
    ...
}

Randomize variables

const rand = (min, max) => {
    return min + Math.random() * (max - min);
  };
  
  const numObjects = 100;
  const objectInfos = [];
  
  for (let i = 0; i < numObjects; ++i) {
    const uniformBuffer = device.createBuffer({
      label: `uniforms for obj: ${i}`,
      size: 32,
      usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
    });
  
    const uniformValues = new Float32Array(uniformBufferSize / 4);
    uniformValues.set([rand(), rand(), rand(), 1], 0); //color
    uniformValues.set([rand(-0.9, 0.9), rand(-0.9, 0.9)], 6); //offset
  
    const bindGroup = device.createBindGroup({
      label: `bind group for obj: ${i}`,
      layout: pipeline.getBindGroupLayout(0),
      entries: [
        { binding: 0, resource: { buffer: uniformBuffer }},
      ],
    });
  
    objectInfos.push({
      scale: rand(0.2, 0.7),
      uniformBuffer,
      uniformValues,
      bindGroup,
    });
  }
  
  function render() {
    renderPassDescriptor.colorAttachments[0].view =
        context.getCurrentTexture().createView();
  
    const encoder = device.createCommandEncoder();
    const pass = encoder.beginRenderPass(renderPassDescriptor);
    pass.setPipeline(pipeline);
  
    const aspect = canvas.width / canvas.height;
  
    for (const {scale, bindGroup, uniformBuffer, uniformValues} of objectInfos) {
      uniformValues.set([scale / aspect, scale], 4); //scale
      device.queue.writeBuffer(uniformBuffer, 0, uniformValues);
  
      pass.setBindGroup(0, bindGroup);
      pass.draw(3);
    }
    pass.end();
  
    const commandBuffer = encoder.finish();
    device.queue.submit([commandBuffer]);
  }

Split Uniforms

const module = device.createShaderModule({
    code: `
      struct ObjStruct {
        color: vec4f,
        offset: vec2f,
      };
      struct ObjScaleStruct {
        scale: vec2f,
      };
  
      @group(0) @binding(0) var<uniform> objStruct: ObjStruct;
      @group(0) @binding(1) var<uniform> objScaleStruct: ObjScaleStruct;
  
      @vertex fn vs(
        @builtin(vertex_index) vertexIndex : u32
      ) -> @builtin(position) vec4f {
        let pos = array(
          vec2f( 0.0,  0.5),  // top center
          vec2f(-0.5, -0.5),  // bottom left
          vec2f( 0.5, -0.5)   // bottom right
        );
  
        return vec4f(
          pos[vertexIndex] * objScaleStruct.scale + objStruct.offset, 0.0, 1.0);
      }
  
      @fragment fn fs() -> @location(0) vec4f {
        return objStruct.color;
      }
    `,
 });
  
  const staticUniformBufferSize = 32; //color is 4*4bytes, offset 2*4bytes, and 2*4bytes padding
  const uniformBufferSize = 8; //scale is 2*4bytes
  
  const kNumObjects = 100;
  const objectInfos = [];
  
  for (let i = 0; i < kNumObjects; ++i) {
    const staticUniformBuffer = device.createBuffer({
      label: `static uniforms for obj: ${i}`,
      size: staticUniformBufferSize,
      usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
    });
  
    {
      const uniformValues = new Float32Array(staticUniformBufferSize / 4);
      uniformValues.set([rand(), rand(), rand(), 1], 0); //color
      uniformValues.set([rand(-0.9, 0.9), rand(-0.9, 0.9)], 4); //offset
  
      device.queue.writeBuffer(staticUniformBuffer, 0, uniformValues);
    }
  
    const uniformValues = new Float32Array(uniformBufferSize / 4);
    const uniformBuffer = device.createBuffer({
      label: `changing uniforms for obj: ${i}`,
      size: uniformBufferSize,
      usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
    });
  
    const bindGroup = device.createBindGroup({
      label: `bind group for obj: ${i}`,
      layout: pipeline.getBindGroupLayout(0),
      entries: [
        { binding: 0, resource: { buffer: staticUniformBuffer }},
        { binding: 1, resource: { buffer: uniformBuffer }},
      ],
    });
  
    objectInfos.push({
      scale: rand(0.2, 0.5),
      uniformBuffer,
      uniformValues,
      bindGroup,
    });
 }

The render() code doesnt change except for the scale offset in uniformValues which is 0 now. The bind group for each object contains a reference to both uniform buffers. When setting the scale now it only updates that instead of all the values.

Storage Buffers

Differences between uniform buffers and storage buffers:

Uniform buffers can be faster for their typical use-case
Storage buffers can be much larger than uniform buffers
- min-max size of a uniform buffer is 64k
- min-max size of a storage buffer is 128meg
Storage buffers can be read/write, uniform buffers are read-only

Instancing

Random triangles code changed to use storage buffers and draw all objects in a single draw call.

const rand = (min, max) => {
  if (min === undefined) {
    min = 0;
    max = 1;
  } else if (max === undefined) {
    max = min;
    min = 0;
  }
  return min + Math.random() * (max - min);
};

async function main() {
  const adapter = await navigator.gpu?.requestAdapter();
  const device = await adapter?.requestDevice();
  if (!device) {
    alert('need a browser that supports WebGPU');
    return;
  }

  // Get a WebGPU context from the canvas and configure it
  const canvas = document.querySelector('canvas');
  const context = canvas.getContext('webgpu');
  const presentationFormat = navigator.gpu.getPreferredCanvasFormat();
  context.configure({
    device,
    format: presentationFormat,
  });

  const module = device.createShaderModule({
    code: `
      struct TestStruct {
        color: vec4f,
        offset: vec2f,
      };

      struct ScaleStruct {
        scale: vec2f,
      };

      struct VSOutput {
        @builtin(position) position: vec4f,
        @location(0) color: vec4f,
      }

      @group(0) @binding(0) var<storage, read> testStructs: array<TestStruct>;
      @group(0) @binding(1) var<storage, read> scaleStructs: array<ScaleStruct>;

      @vertex fn vs(
        @builtin(vertex_index) vertexIndex : u32,
        @builtin(instance_index) instanceIndex: u32
      ) -> VSOutput {
        let pos = array(
          vec2f( 0.0,  0.5),  // top center
          vec2f(-0.5, -0.5),  // bottom left
          vec2f( 0.5, -0.5)   // bottom right
        );

        let scaleStruct = scaleStructs[instanceIndex];
        let testStruct = testStructs[instanceIndex];

        var vsOut: VSOutput;
        vsOut.position = vec4f(
            pos[vertexIndex] * scaleStruct.scale + testStruct.offset, 0.0, 1.0);
        vsOut.color = testStruct.color;
        return vsOut;
      }

      @fragment fn fs(vsOut: VSOutput) -> @location(0) vec4f {
        return vsOut.color;
      }
    `,
  });

  const pipeline = device.createRenderPipeline({
    label: 'split storage buffer pipeline',
    layout: 'auto',
    vertex: {
      module,
      entryPoint: 'vs',
    },
    fragment: {
      module,
      entryPoint: 'fs',
      targets: [{ format: presentationFormat }],
    },
  });

  const numObjects = 100;

  // create 2 storage buffers
  const staticStorageUnitSize = 32; //color 4*4 bytes, offset 2*4, buffer 2*4
  const storageUnitSize = 8; //scale 2*4
  const staticStorageBufferSize = staticStorageUnitSize * numObjects;
  const storageBufferSize = storageUnitSize * numObjects;

  const staticStorageBuffer = device.createBuffer({
    label: 'static storage for objects',
    size: staticStorageBufferSize,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });

  const storageBuffer = device.createBuffer({
    label: 'changing storage for objects',
    size: storageBufferSize,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });

  const staticStorageValues = new Float32Array(staticStorageBufferSize / 4);
  const storageValues = new Float32Array(storageBufferSize / 4);

  // offsets to the various storage values in float32 indices
  const kColorOffset = 0;
  const kOffsetOffset = 4;
  const kScaleOffset = 0;

  const objectInfos = [];

  for (let i = 0; i < numObjects; ++i) {
    const staticOffset = i * (staticStorageUnitSize / 4);

    staticStorageValues.set([rand(), rand(), rand(), 1], staticOffset + kColorOffset);        // set the color
    staticStorageValues.set([rand(-0.9, 0.9), rand(-0.9, 0.9)], staticOffset + kOffsetOffset);      // set the offset

    objectInfos.push({
      scale: rand(0.2, 0.5),
    });
  }
  device.queue.writeBuffer(staticStorageBuffer, 0, staticStorageValues);

  const bindGroup = device.createBindGroup({
    label: 'bind group for objects',
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: { buffer: staticStorageBuffer }},
      { binding: 1, resource: { buffer: storageBuffer }},
    ],
  });

  const renderPassDescriptor = {
    label: 'canvas renderPass',
    colorAttachments: [
      {
        // view: <- to be filled out when rendering
        clearValue: [0.3, 0.3, 0.3, 1],
        loadOp: 'clear',
        storeOp: 'store',
      },
    ],
  };

  function render() {
    renderPassDescriptor.colorAttachments[0].view =
        context.getCurrentTexture().createView();

    const encoder = device.createCommandEncoder();
    const pass = encoder.beginRenderPass(renderPassDescriptor);
    pass.setPipeline(pipeline);

    const aspect = canvas.width / canvas.height;

    objectInfos.forEach(({scale}, ndx) => {
      const offset = ndx * (storageUnitSize / 4);
      storageValues.set([scale / aspect, scale], offset + kScaleOffset);
    });
    device.queue.writeBuffer(storageBuffer, 0, storageValues);

    pass.setBindGroup(0, bindGroup);
    pass.draw(3, kNumObjects);  //call the vertex shader 3 times for a number of instances
    pass.end();

    device.queue.submit([encoder.finish()]);
  }

  const observer = new ResizeObserver(entries => {
    for (const entry of entries) {
      const canvas = entry.target;
      const width = entry.contentBoxSize[0].inlineSize;
      const height = entry.contentBoxSize[0].blockSize;
      canvas.width = canvas.clientWidth * window.devicePixelRatio;
      canvas.height = canvas.clientHeight * window.devicePixelRatio;
      render();
    }
  });
  observer.observe(canvas);
}
main();

This is just one of the possible ways to handle drawing alot of instances of the same object.

- The buffer usage flag is set to storage
- The storage variables in the shader code are declared as array of the struct
- A new parameter @builtin(instance_index) is added to the vertex shader,
which gets its value for each instance drawn
- Another struct is created for the output of the vertex shader,
which now also returns a color to pass to the fragment shader
- In the render function, set the scale in storageValues for each
object in objectInfos
and then upload all to the buffer at once
- Finally, call draw 3 times for each instance

Using storage buffers for vertex data

Another use case of storage buffers would be to store vertex data.
This is the previous example changed to draw rings instead:

const rand = (min, max) => {
  if (min === undefined) {
    min = 0;
    max = 1;
  } else if (max === undefined) {
    max = min;
    min = 0;
  }
  return min + Math.random() * (max - min);
};

function createCircleVertices({
  radius = 1,
  numSubdivisions = 24,
  innerRadius = 0,
  startAngle = 0,
  endAngle = Math.PI * 2,
} = {}) {
  // 2 triangles per subdivision, 3 verts per tri, 2 values (xy) each.
  const numVertices = numSubdivisions * 3 * 2;
  const vertexData = new Float32Array(numSubdivisions * 2 * 3 * 2);

  let offset = 0;
  const addVertex = (x, y) => {
    vertexData[offset++] = x;
    vertexData[offset++] = y;
  };

  for (let i = 0; i < numSubdivisions; ++i) {
    const angle1 = startAngle + (i + 0) * (endAngle - startAngle) / numSubdivisions;
    const angle2 = startAngle + (i + 1) * (endAngle - startAngle) / numSubdivisions;

    // first triangle
    addVertex(Math.cos(angle1) * radius, Math.sin(angle1) * radius);
    addVertex(Math.cos(angle2) * radius, Math.sin(angle2) * radius);
    addVertex(Math.cos(angle1) * innerRadius, Math.sin(angle1) * innerRadius);
    // second triangle
    addVertex(Math.cos(angle1) * innerRadius, Math.sin(angle1) * innerRadius);
    addVertex(Math.cos(angle2) * radius, Math.sin(angle2) * radius);
    addVertex(Math.cos(angle2) * innerRadius, Math.sin(angle2) * innerRadius);
  }

  return {
    vertexData,
    numVertices,
  };
}

async function main() {
  const adapter = await navigator.gpu?.requestAdapter();
  const device = await adapter?.requestDevice();
  if (!device) {
    alert('need a browser that supports WebGPU');
    return;
  }

  const canvas = document.querySelector('canvas');
  const context = canvas.getContext('webgpu');
  const presentationFormat = navigator.gpu.getPreferredCanvasFormat();
  context.configure({
    device,
    format: presentationFormat,
  });

  const module = device.createShaderModule({
    code: `
      struct OurStruct {
        color: vec4f,
        offset: vec2f,
      };

      struct OtherStruct {
        scale: vec2f,
      };

      struct Vertex {
        position: vec2f,
      };

      struct VSOutput {
        @builtin(position) position: vec4f,
        @location(0) color: vec4f,
      };

      @group(0) @binding(0) var<storage, read> ourStructs: array<OurStruct>;
      @group(0) @binding(1) var<storage, read> otherStructs: array<OtherStruct>;
      @group(0) @binding(2) var<storage, read> pos: array<Vertex>;

      @vertex fn vs(
        @builtin(vertex_index) vertexIndex : u32,
        @builtin(instance_index) instanceIndex: u32
      ) -> VSOutput {
        let otherStruct = otherStructs[instanceIndex];
        let ourStruct = ourStructs[instanceIndex];

        var vsOut: VSOutput;
        vsOut.position = vec4f(
            pos[vertexIndex].position * otherStruct.scale + ourStruct.offset, 0.0, 1.0);
        vsOut.color = ourStruct.color;
        return vsOut;
      }

      @fragment fn fs(vsOut: VSOutput) -> @location(0) vec4f {
        return vsOut.color;
      }
    `,
  });

  const pipeline = device.createRenderPipeline({
    label: 'storage buffer vertices',
    layout: 'auto',
    vertex: {
      module,
      entryPoint: 'vs',
    },
    fragment: {
      module,
      entryPoint: 'fs',
      targets: [{ format: presentationFormat }],
    },
  });

  const kNumObjects = 100;
  const objectInfos = [];

  const staticUnitSize = 32; //color 4*4 bytes, offset 2*4, buffer 2*4
  const changingUnitSize = 8; //scale 2*4
  const staticStorageBufferSize = staticUnitSize * kNumObjects;
  const changingStorageBufferSize = changingUnitSize * kNumObjects;

  const staticStorageBuffer = device.createBuffer({
    label: 'static storage for objects',
    size: staticStorageBufferSize,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });

  const changingStorageBuffer = device.createBuffer({
    label: 'changing storage for objects',
    size: changingStorageBufferSize,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });

  // offsets to the various uniform values in float32 indices
  const kColorOffset = 0;
  const kOffsetOffset = 4;
  const kScaleOffset = 0;

  {
    const staticStorageValues = new Float32Array(staticStorageBufferSize / 4);
    for (let i = 0; i < kNumObjects; ++i) {
      const staticOffset = i * (staticUnitSize / 4);

      staticStorageValues.set([rand(), rand(), rand(), 1], staticOffset + kColorOffset);        // set the color
      staticStorageValues.set([rand(-0.9, 0.9), rand(-0.9, 0.9)], staticOffset + kOffsetOffset);      // set the offset

      objectInfos.push({
        scale: rand(0.2, 0.5),
      });
    }
    device.queue.writeBuffer(staticStorageBuffer, 0, staticStorageValues);
  }

  // a typed array to be used to update the changingStorageBuffer
  const storageValues = new Float32Array(changingStorageBufferSize / 4);

  // setup storage buffer for vertex data
  const { vertexData, numVertices } = createCircleVertices({
    radius: 0.5,
    innerRadius: 0.25,
  });
  const vertexStorageBuffer = device.createBuffer({
    label: 'storage buffer vertices',
    size: vertexData.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
  });
  device.queue.writeBuffer(vertexStorageBuffer, 0, vertexData);

  const bindGroup = device.createBindGroup({
    label: 'bind group for objects',
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: { buffer: staticStorageBuffer }},
      { binding: 1, resource: { buffer: changingStorageBuffer }},
      { binding: 2, resource: { buffer: vertexStorageBuffer }},
    ],
  });

  const renderPassDescriptor = {
    label: 'our basic canvas renderPass',
    colorAttachments: [
      {
        // view: <- to be filled out when rendering
        clearValue: [0.3, 0.3, 0.3, 1],
        loadOp: 'clear',
        storeOp: 'store',
      },
    ],
  };

  function render() {
    renderPassDescriptor.colorAttachments[0].view =
        context.getCurrentTexture().createView();

    const encoder = device.createCommandEncoder();
    const pass = encoder.beginRenderPass(renderPassDescriptor);
    pass.setPipeline(pipeline);

    const aspect = canvas.width / canvas.height;
    objectInfos.forEach(({scale}, ndx) => {
      const offset = ndx * (changingUnitSize / 4);
      storageValues.set([scale / aspect, scale], offset + kScaleOffset);
    });
    device.queue.writeBuffer(changingStorageBuffer, 0, storageValues);

    pass.setBindGroup(0, bindGroup);
    pass.draw(numVertices, kNumObjects);
    pass.end();

    device.queue.submit([encoder.finish()]);
  }

  const observer = new ResizeObserver(entries => {
    for (const entry of entries) {
      const canvas = entry.target;
      canvas.width = canvas.clientWidth * window.devicePixelRatio;
      canvas.height = canvas.clientHeight * window.devicePixelRatio;
      render();
    }
  });
  observer.observe(canvas);
}
main();

- Another struct and variable is added in the vertex shader,
it replaces the hard-coded array of positions
- Another storage buffer is added for the vertices for a circle
- The buffer is added to the bind group

The ring is drawn from triangles like this

Screenshot 2024-02-19 150932.png

Vertex Buffers

While it is growing in popularity to use storage buffers,
the traditional way to provide vertex data to a vertex shader is via vertex buffers
and attributes.

The difference is that vertex buffers arent accessed directly from the vertex shader.
Instead, WGPU is told what kind of data is in the buffer and how its organized,
it then pulls the data out of the buffer and provides it.

Changes to the random rings example to use a vertex buffer:

const module = device.createShaderModule({
    code: `
        struct Vertex {
          @location(0) position: vec2f,
        };

        ...

        @group(0) @binding(0) var<storage, read> testStructs: array<TestStruct>;
        @group(0) @binding(1) var<storage, read> scaleStructs: array<ScaleStruct>;

        @vertex fn vs(
          vert: Vertex,
          @builtin(instance_index) instanceIndex: u32
        ) -> VSOutput {
          ...

          var vsOut: VSOutput;
          vsOut.position = vec4f(
            vert.position * scaleStruct.scale + testStruct.offset, 0.0, 1.0);
          vsOut.color = testStruct.color;
          return vsOut;
        }
        ...
    `,
});

const pipeline = device.createRenderPipeline({
    label: 'vertex buffer pipeline',
    layout: 'auto',
    vertex: {
        module,
        entryPoint: 'vs',
        buffers: [{
            arrayStride: 2 * 4, // 2 floats, 4 bytes each
            attributes: [
                {shaderLocation: 0, offset: 0, format: 'float32x2'},  // position
            ],
        }],
    },
    ...
});

const vertexBuffer = device.createBuffer({
    label: 'vertex buffer vertices',
    size: vertexData.byteLength,
    usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(vertexBuffer, 0, vertexData);

function render() {
    pass.setPipeline(pipeline);
    pass.setVertexBuffer(0, vertexBuffer);
    ...
}

- Declares a struct to define the data for a vertex, important is to use @location(0) for the position field
- Adds a buffers array to the vertex entry of the pipeline, describing how to pull data out of 1 or more vertex buffers, arrayStride is how many bytes to get from the data for one vertex in the buffer to the next, the shaderLocation attribute corresponds to location(0)
- Changes the usage of the buffer holding vertex data to VERTEX and remove it from the bind group
- Sets the vertex buffer to use at draw time, the 0 corresponds to the first element of the render pipeline buffers array

The vertex format can be one of the values specified in the GPUVertexFormat definition.

Instancing

Attributes can advance per vertex or per instance.
Advancing per instance would be using @builtin(instance_index)
To accomplish the same thing with vertex buffers:

const module = device.createShaderModule({
  code: `
    struct Vertex {
      @location(0) position: vec2f,
      @location(1) color: vec4f,
      @location(2) offset: vec2f,
      @location(3) scale: vec2f,
    };

    struct VSOutput {
      @builtin(position) position: vec4f,
      @location(0) color: vec4f,
    };

    @vertex fn vs(
      vert: Vertex,
    ) -> VSOutput {
      var vsOut: VSOutput;
      vsOut.position = vec4f(
          vert.position * vert.scale + vert.offset, 0.0, 1.0);
      vsOut.color = vert.color;
      return vsOut;
    }
  `,
});

const pipeline = device.createRenderPipeline({
  label: 'flat colors',
  layout: 'auto',
  vertex: {
    module,
    entryPoint: 'vs',
    buffers: [
      {
        arrayStride: 2 * 4, // 2 floats, 4 bytes each
        attributes: [
          {shaderLocation: 0, offset: 0, format: 'float32x2'},  // position
        ],
      },
      {
        arrayStride: 6 * 4, // 6 floats, 4 bytes each
        stepMode: 'instance',
        attributes: [
          {shaderLocation: 1, offset:  0, format: 'float32x4'},  // color
          {shaderLocation: 2, offset: 16, format: 'float32x2'},  // offset
        ],
      },
      {
        arrayStride: 2 * 4, // 2 floats, 4 bytes each
        stepMode: 'instance',
        attributes: [
          {shaderLocation: 3, offset: 0, format: 'float32x2'},   // scale
        ],
      },
    ],
  },
});

const staticUnitSize = 24; //color is 4*4, offset is 2*4
const changingUnitSize = 8; //scale
const staticVertexBufferSize = staticUnitSize * kNumObjects;
const changingVertexBufferSize = changingUnitSize * kNumObjects;

const staticVertexBuffer = device.createBuffer({
  label: 'static vertex for objects',
  size: staticVertexBufferSize,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});

const changingVertexBuffer = device.createBuffer({
  label: 'changing vertex for objects',
  size: changingVertexBufferSize,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});

- Changes the shader to only use vertex attributes
- Updates the render pipeline to tell it how to supply data to those attributes, stepMode: 'instance' means the attribute will only advance per instance, the default would be 'vertex'
- Changes the buffers usage to vertex, removes the padding from the staticUnitSize since vertex attributes do not have the same restrictions
- The bind group is no longer needed
- In the render function, set the buffers instead of the bind group

Make rings shaded:

function createCircleVertices({
  ...
} = {}) {
  // 2 triangles per subdivision, 3 verts per tri, 5 values (xyrgb) each.
  const numVertices = numSubdivisions * 3 * 2;
  const vertexData = new Float32Array(numVertices * (2 + 3));
 
  let offset = 0;
  const addVertex = (x, y, r, g, b) => {
    vertexData[offset++] = x;
    vertexData[offset++] = y;
    vertexData[offset++] = r;
    vertexData[offset++] = g;
    vertexData[offset++] = b;
  };
 
  const innerColor = [1, 1, 1];
  const outerColor = [0.1, 0.1, 0.1];
 
  for (let i = 0; i < numSubdivisions; ++i) {
    ...
 
    // first triangle
    addVertex(c1 * radius, s1 * radius, ...outerColor);
    addVertex(c2 * radius, s2 * radius, ...outerColor);
    addVertex(c1 * innerRadius, s1 * innerRadius, ...innerColor);
    // second triangle
    addVertex(c1 * innerRadius, s1 * innerRadius, ...innerColor);
    addVertex(c2 * radius, s2 * radius, ...outerColor);
    addVertex(c2 * innerRadius, s2 * innerRadius, ...innerColor);
  }
 
  ...
}

const module = device.createShaderModule({
  code: `
    struct Vertex {
      @location(0) position: vec2f,
      @location(1) color: vec4f,
      @location(2) offset: vec2f,
      @location(3) scale: vec2f,
      @location(4) perVertexColor: vec3f,
    };

    ...

    @vertex fn vs(
      vert: Vertex,
    ) -> VSOutput {
      var vsOut: VSOutput;
      vsOut.position = vec4f(
          vert.position * vert.scale + vert.offset, 0.0, 1.0);
      vsOut.color = vert.color * vec4f(vert.perVertexColor, 1);
      return vsOut;
    }

    ...
  `,
});

const pipeline = device.createRenderPipeline({
  label: 'per vertex color',
  layout: 'auto',
  vertex: {
    module,
    entryPoint: 'vs',
    buffers: [
      {
        arrayStride: 5 * 4, // 5 floats, 4 bytes each
        attributes: [
          {shaderLocation: 0, offset: 0, format: 'float32x2'},  // position
          {shaderLocation: 4, offset: 8, format: 'float32x3'},  // perVertexColor
        ],
      },
      ...
    ],
  },
  ...
});

- Changes the circle vertex generation code to provide a dark color for vertices on the outer edge and a light color for the inner vertices
- Adds another attribute for a per vertex color to the shader
- Updates the buffer in the pipeline, interleave perVertexColor with the position attribute

Using normalized values to save space

The space usage could be optimized by using 8bit values for the color which has 4*4 bytes
and normalizing it from 0 ↔ 255 to 0.0 ↔ 1.0

function createCircleVertices({
  ...
} = {}) {
  // 2 triangles per subdivision, 3 verts per tri
  const numVertices = numSubdivisions * 3 * 2;
  // 2 32-bit values for position (xy) and 1 32-bit value for color (rgb_)
  // The 32-bit color value will be written/read as 4 8-bit values
  const vertexData = new Float32Array(numVertices * (2 + 1));
  const colorData = new Uint8Array(vertexData.buffer);
 
  let offset = 0;
  let colorOffset = 8;
  const addVertex = (x, y, r, g, b) => {
    vertexData[offset++] = x;
    vertexData[offset++] = y;
    offset += 1;  // skip the color
    colorData[colorOffset++] = r * 255;
    colorData[colorOffset++] = g * 255;
    colorData[colorOffset++] = b * 255;
    colorOffset += 9;  // skip extra byte and the position
  };
  ...
}

const staticUnitSize = 4 + 2 * 4;    // color is 4 bytes, offset is 2*4 bytes

...

{
  const staticVertexValuesU8 = new Uint8Array(staticVertexBufferSize);
  const staticVertexValuesF32 = new Float32Array(staticVertexValuesU8.buffer);
  for (let i = 0; i < kNumObjects; ++i) {
    const staticOffsetU8 = i * staticUnitSize;
    const staticOffsetF32 = staticOffsetU8 / 4;

    staticVertexValuesU8.set(
        [rand() * 255, rand() * 255, rand() * 255, 255], //color
        staticOffsetU8 + kColorOffset);

    staticVertexValuesF32.set(
        [rand(-0.9, 0.9), rand(-0.9, 0.9)], //offset
        staticOffsetF32 + kOffsetOffset);

    objectInfos.push({
      scale: rand(0.2, 0.5),
    });
  }
  device.queue.writeBuffer(staticVertexBuffer, 0, staticVertexValuesF32);
}

const pipeline = device.createRenderPipeline({
  label: 'per vertex color',
  layout: 'auto',
  vertex: {
    module,
    entryPoint: 'vs',
    buffers: [
      {
        arrayStride: 2 * 4 + 4, // 2 floats, 4 bytes each + 4 bytes
        attributes: [
          {shaderLocation: 0, offset: 0, format: 'float32x2'},  // position
          {shaderLocation: 4, offset: 8, format: 'unorm8x4'},   // perVertexColor
        ],
      },
      {
        arrayStride: 4 + 2 * 4, // 4 bytes + 2 floats, 4 bytes each
        stepMode: 'instance',
        attributes: [
          {shaderLocation: 1, offset: 0, format: 'unorm8x4'},   // color
          {shaderLocation: 2, offset: 4, format: 'float32x2'},  // offset
        ],
      },
      ...
    ],
  },
  ...
});

- Changes the vertex generation function and creates a Uint8Array view of the same data as vertexData ,
then when inserting the colors, expand it from 0 ↔ 1 to 0 ↔ 255
- Updates the per instance data, include 4 bytes for color instead of 4*4 in the staticUnitSize and set the buffer values accordingly to the new sizing
- Changes the pipeline to pull out the data as 8bit unsigned values and normalize them back to 0 ↔ 1

These changes result in 20 bytes per vertex being reduced to 12 and 24 bytes per instance to 12,
a 40% and 50% saving

Index Buffers

They describe the order to process and use the vertices.

Changed createCircleVertices function to reuse vertices when calculating subdivisions:

function createCircleVertices({
  ...
} = {}) {
  // 2 vertices at each subdivision, + 1 to wrap around the circle.
  const numVertices = (numSubdivisions + 1) * 2;

  ...

  for (let i = 0; i <= numSubdivisions; ++i) {
    const angle = startAngle + (i + 0) * (endAngle - startAngle) / numSubdivisions;

    const c1 = Math.cos(angle);
    const s1 = Math.sin(angle);

    addVertex(c1 * radius, s1 * radius, ...outerColor);
    addVertex(c1 * innerRadius, s1 * innerRadius, ...innerColor);
  }

  const indexData = new Uint32Array(numSubdivisions * 6);
  let ndx = 0;

  for (let i = 0; i < numSubdivisions; ++i) {
    const ndxOffset = i * 2;

    // first triangle
    indexData[ndx++] = ndxOffset;
    indexData[ndx++] = ndxOffset + 1;
    indexData[ndx++] = ndxOffset + 2;

    // second triangle
    indexData[ndx++] = ndxOffset + 2;
    indexData[ndx++] = ndxOffset + 1;
    indexData[ndx++] = ndxOffset + 3;
  }

  return {
    vertexData,
    indexData,
    numVertices: indexData.length,
  };
}

const { vertexData, indexData, numVertices } = createCircleVertices({
  radius: 0.5,
  innerRadius: 0.25,
});

const indexBuffer = device.createBuffer({
  label: 'index buffer',
  size: indexData.byteLength,
  usage: GPUBufferUsage.INDEX | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(indexBuffer, 0, indexData);

...
pass.setIndexBuffer(indexBuffer, 'uint32');
pass.drawIndexed(numVertices, kNumObjects);

Textures

Textures most often represent a 2D image, which is just a 2D array of color values.
What makes them special is that they can be accessed by special hardware called a sampler.
It can read up to 16 different values in a texture and blend them together in a way that is useful for many common use cases.
For example increasing the size of an image and filtering it to look less pixelated.

Modified inter-stage variables example drawing a quad with a texture:

async function main() {
  const adapter = await navigator.gpu?.requestAdapter();
  const device = await adapter?.requestDevice();
  if (!device) {
    alert('need a browser that supports WebGPU');
    return;
  }

  const canvas = document.querySelector('canvas');
  const context = canvas.getContext('webgpu');
  const presentationFormat = navigator.gpu.getPreferredCanvasFormat();
  context.configure({
    device,
    format: presentationFormat,
  });

  const module = device.createShaderModule({
    label: 'hardcoded textured quad shaders',
    code: `
      struct VertexShaderOutput {
        @builtin(position) position: vec4f,
        @location(0) texcoord: vec2f,
      };

      @vertex fn vs(
        @builtin(vertex_index) vertexIndex : u32
      ) -> VertexShaderOutput {
        let pos = array(
          vec2f( 0.0,  0.0),  // center
          vec2f( 1.0,  0.0),  // right, center
          vec2f( 0.0,  1.0),  // center, top
          vec2f( 0.0,  1.0),  // center, top
          vec2f( 1.0,  0.0),  // right, center
          vec2f( 1.0,  1.0),  // right, top
        );

        var vsOutput: VertexShaderOutput;
        let xy = pos[vertexIndex];
        vsOutput.position = vec4f(xy, 0.0, 1.0);
        vsOutput.texcoord = xy;
        return vsOutput;
      }

      @group(0) @binding(0) var testSampler: sampler;
      @group(0) @binding(1) var testTexture: texture_2d<f32>;

      @fragment fn fs(fsInput: VertexShaderOutput) -> @location(0) vec4f {
        return textureSample(testTexture, testSampler, fsInput.texcoord);
      }
    `,
  });

  const pipeline = device.createRenderPipeline({
    label: 'hardcoded textured quad pipeline',
    layout: 'auto',
    vertex: {
      module,
      entryPoint: 'vs',
    },
    fragment: {
      module,
      entryPoint: 'fs',
      targets: [{ format: presentationFormat }],
    },
  });

  const kTextureWidth = 5;
  const kTextureHeight = 7;
  const _ = [255,   0,   0, 255];  // red
  const y = [255, 255,   0, 255];  // yellow
  const b = [  0,   0, 255, 255];  // blue
  const textureData = new Uint8Array([
    b, _, _, _, _,
    _, y, y, y, _,
    _, y, _, _, _,
    _, y, y, _, _,
    _, y, _, _, _,
    _, y, _, _, _,
    _, _, _, _, _,
  ].flat());

  const texture = device.createTexture({
    label: 'yellow F on red',
    size: [kTextureWidth, kTextureHeight],
    format: 'rgba8unorm',
    usage:
      GPUTextureUsage.TEXTURE_BINDING |
      GPUTextureUsage.COPY_DST,
  });
  device.queue.writeTexture(
      { texture },
      textureData,
      { bytesPerRow: kTextureWidth * 4 },
      { width: kTextureWidth, height: kTextureHeight },
  );

  const sampler = device.createSampler();

  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: sampler },
      { binding: 1, resource: texture.createView() },
    ],
  });

  const renderPassDescriptor = {
    label: 'basic canvas renderPass',
    colorAttachments: [
      {
        // view: <- to be filled out when rendering
        clearValue: [0.3, 0.3, 0.3, 1],
        loadOp: 'clear',
        storeOp: 'store',
      },
    ],
  };

  function render() {
    renderPassDescriptor.colorAttachments[0].view =
        context.getCurrentTexture().createView();

    const encoder = device.createCommandEncoder({
      label: 'render quad encoder',
    });
    const pass = encoder.beginRenderPass(renderPassDescriptor);
    pass.setPipeline(pipeline);
    pass.setBindGroup(0, bindGroup);
    pass.draw(6);
    pass.end();

    const commandBuffer = encoder.finish();
    device.queue.submit([commandBuffer]);
  }

  const observer = new ResizeObserver(entries => {
    for (const entry of entries) {
      const canvas = entry.target;
      canvas.width = canvas.clientWidth * window.devicePixelRatio;
      canvas.height = canvas.clientHeight * window.devicePixelRatio;
      render();
    }
  });
  observer.observe(canvas);
}
main();

- Change the positions in the vertex shader to draw a quad in the top right of the canvas, and change the output to pass texcoord instead of color, which will also be interpolated between the 3 vertices of each triangle
- Creates a 5x7 texel F in an rgba8unorm texture, unorm means unsigned normalized which means the values will be converted from unsigned bytes with values from 0 to 255 to floating point values from 0.0 to 1.0
- Creates a texture, sets its size, format, and sets the usage flag to TEXTURE_BINDING to be able bind it into a bind group, COPY_DST means that data can be copied to it
- Copies data to the texture with device.queue.writeTexture , the third parameter specifies how many bytes to get from one row of the source data to the next row
- Creates a sampler with device.createSampler
- Adds the texture and sampler to a bind group with bindings that match the @binding in the shader
- Update the render function: set the bind group and call the vertex shader 6 times

Note: It's not common to pass position values as texture coordinates but in this case it happens that the texture coordinates needed match the positions.

To fix the texture being upside down, either flip the coordinates in the vertex or fragment shader, or simply flip the textureData

  vsOutput.texcoord = vec2f(xy.x, 1.0 - xy.y);

magFilter

Since the previously drawn texture is larger than 5x7 texel, the magFilter is used by the sampler, which magnifies the texture. If the filter was changed from nearest to linear then it would linearly interpolate between 4 pixels.

Texture coordinates are often called UVs, for a given uv, the closest 4 pixels are chosen.
t1 is the horizontal distance between the top left chosen pixel's center and the pixel to its right's center.
t2 is the same but vertically.

The sampler settings addressModeU and addressModeV can be set to repeat or clamp-to-edge,
when set to 'repeat' and the coordinate is within half a texel of the edge of the texture it wraps around and blends with pixels on the opposite side,
when set to 'clamp-to-edge', the coordinate is clamped so that it cant go into the half texel on the edge

const sampler = device.createSampler({
   addressModeU: 'repeat',
   addressModeV:'repeat',
   magFilter: 'linear',
});

minFilter

The minFilter setting does similar math to magFilter for when the texture is drawn smaller than its size.

Changes to the previous example to create a 2x2 moving quad to show the flickering:

canvas {
  image-rendering: pixelated;
  image-rendering: crisp-edges;
}

// CSS ^

const module = device.createShaderModule({
  label: 'our hardcoded textured quad shaders',
  code: `
    struct Uniforms {
      scale: vec2f,
      offset: vec2f,
    };
    ...

    @group(0) @binding(2) var<uniform> uni: Uniforms;

    @vertex fn vs(
      @builtin(vertex_index) vertexIndex : u32
    ) -> OurVertexShaderOutput {
      let pos = array(
        ...
      );

      var vsOutput: OurVertexShaderOutput;
      let xy = pos[vertexIndex];
      vsOutput.position = vec4f(xy * uni.scale + uni.offset, 0.0, 1.0);
      vsOutput.texcoord = xy;
      return vsOutput;
    }

    ...
  `,
});

const uniformBufferSize = 16 + // scale is 2*4, offset is 2*4
const uniformBuffer = device.createBuffer({
  label: 'uniforms for quad',
  size: uniformBufferSize,
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

const uniformValues = new Float32Array(uniformBufferSize / 4);

const kScaleOffset = 0;
const kOffsetOffset = 2;

const sampler = device.createSampler({
   addressModeU: 'repeat',
   addressModeV:'repeat',
   magFilter: 'linear',
   minFilter: 'linear' // <--
});

const bindGroup = device.createBindGroup({
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: sampler },
    { binding: 1, resource: texture.createView() },
    { binding: 2, resource: { buffer: uniformBuffer }}, // <--
  ],
});

...

function render(time) {
  time *= 0.001;
  const scaleX = 4 / canvas.width;
  const scaleY = 4 / canvas.height;

  uniformValues.set([scaleX, scaleY], kScaleOffset);
  uniformValues.set([Math.sin(time * 0.25) * 0.8, -0.8], kOffsetOffset);
  device.queue.writeBuffer(uniformBuffer, 0, uniformValues);

  ...
  requestAnimationFrame(render);
}
requestAnimationFrame(render);

const observer = new ResizeObserver(entries => {
  for (const entry of entries) {
    const canvas = entry.target;
    const width = entry.contentBoxSize[0].inlineSize / 64 | 0;
    const height = entry.contentBoxSize[0].blockSize / 64 | 0;
    canvas.width = Math.max(1, Math.min(width, device.limits.maxTextureDimension2D));
    canvas.height = Math.max(1, Math.min(height, device.limits.maxTextureDimension2D));
  }
});
observer.observe(canvas);

- Makes the canvas low-res in the CSS, so it doesnt do the same magFilter: 'linear' effect on it
- Lowers the resolution in the ResizeObserver callback as well - Adds a uniform variable for scale and offset in the vertex shader - Creates and adds a uniform buffer to the bind group - Sets the values in the render function, remove the render call in ResizeObserver and do a requestAnimationFrame loop instead (also called a rAF loop)

A reason for the flickering is that the quad is positioned with real numbers but pixels are integers. This can be solved with "mip-mapping", so pre-blending the colors when resizing. The best algorith for blending the pixels from one mip to the next is a matter of research and opinion. Here would be one in the example above.

But also any data can be put into each mip level using the canvas 2d api, for example:

const createCheckedMipmap = () => {
   const ctx = document.createElement('canvas').getContext('2d', {willReadFrequently: true});
   const levels = [
     { size: 64, color: 'rgb(128,0,255)', },
     { size: 32, color: 'rgb(0,255,0)', },
     { size: 16, color: 'rgb(255,0,0)', },
     { size:  8, color: 'rgb(255,255,0)', },
     { size:  4, color: 'rgb(0,0,255)', },
     { size:  2, color: 'rgb(0,255,255)', },
     { size:  1, color: 'rgb(255,0,255)', },
   ];
   return levels.map(({size, color}, i) => {
     ctx.canvas.width = size;
     ctx.canvas.height = size;
     ctx.fillStyle = i & 1 ? '#000' : '#fff';
     ctx.fillRect(0, 0, size, size);
     ctx.fillStyle = color;
     ctx.fillRect(0, 0, size / 2, size / 2);
     ctx.fillRect(size / 2, size / 2, size / 2, size / 2);
     return ctx.getImageData(0, 0, size, size);
   });
 };

This creates a mipmap with entirely different colors for each level for demonstration.

The texture is then created with that data

const createTextureWithMips = (mips, label) => {
   const texture = device.createTexture({
      label,
      size: [mips[0].width, mips[0].height],
      mipLevelCount: mips.length,
      format: 'rgba8unorm',
      usage:
         GPUTextureUsage.TEXTURE_BINDING |
         GPUTextureUsage.COPY_DST,
   });
   mips.forEach(({data, width, height}, mipLevel) => {
      device.queue.writeTexture(
         { texture, mipLevel },
         data,
         { bytesPerRow: width * 4 },
         { width, height },
      );
   });
   return texture;
};

const textures = [
   createTextureWithMips(createBlendedMipmap(), 'blended'),
   createTextureWithMips(createCheckedMipmap(), 'checker'),
];

mipmapFilter

The mipmapFilter setting can also be set to nearest or linear.
This chooses if colors are blended between mip levels.
It's most often used when drawing in 3D.
In the example above, all planes on the top are set to nearest and bottom to linear.
Which one to use depends on the style and also speed, nearest is faster.

Texture Types and Views

There are three types of textures: 1d, 2d, and 3d.
Each has its limits, these can be read with device.limits.maxTextureDimension2D for example

There are six types of texture views: 1d, 2d, 2d-array, 3d, cube, cube-array.
If a 2d texture has 6 layers it can have a cube view.

Each type of texture has its corresponding type in WGSL:

type	WGSL types
"1d"	`texture_1d` or `texture_storage_1d`
"2d"	`texture_2d` or `texture_storage_2d` or `texture_multisampled_2d` as well as a special case for in certain situations `texture_depth_2d` and `texture_depth_multisampled_2d`
"2d-array"	`texture_2d_array` or `texture_storage_2d_array` and sometimes `texture_depth_2d_array`
"3d"	`texture_3d` or `texture_storage_3d`
"cube"	`texture_cube` and sometimes `texture_depth_cube`
"cube-array"	`texture_cube_array` and sometimes `texture_depth_cube_array`

Texture Formats

Reference table

Loading Images into Textures

Example:

async function main() {
  const adapter = await navigator.gpu?.requestAdapter();
  const device = await adapter?.requestDevice();
  if (!device) {
    alert('need a browser that supports WebGPU');
    return;
  }

  const canvas = document.querySelector('canvas');
  const context = canvas.getContext('webgpu');
  const presentationFormat = navigator.gpu.getPreferredCanvasFormat();
  context.configure({
    device,
    format: presentationFormat,
  });

  const module = device.createShaderModule({
    label: 'hardcoded textured quad shaders',
    code: `
      struct VertexShaderOutput {
        @builtin(position) position: vec4f,
        @location(0) texcoord: vec2f,
      };

      @vertex fn vs(
        @builtin(vertex_index) vertexIndex : u32
      ) -> VertexShaderOutput {
        let pos = array(
          vec2f( 0.0,  0.0),  // center
          vec2f( 1.0,  0.0),  // right, center
          vec2f( 0.0,  1.0),  // center, top
          vec2f( 0.0,  1.0),  // center, top
          vec2f( 1.0,  0.0),  // right, center
          vec2f( 1.0,  1.0),  // right, top
        );

        var vsOutput: VertexShaderOutput;
        let xy = pos[vertexIndex];
        vsOutput.position = vec4f(xy, 0.0, 1.0);
        vsOutput.texcoord = xy;
        return vsOutput;
      }

      @group(0) @binding(0) var testSampler: sampler;
      @group(0) @binding(1) var testTexture: texture_2d<f32>;

      @fragment fn fs(fsInput: VertexShaderOutput) -> @location(0) vec4f {
        return textureSample(testTexture, testSampler, fsInput.texcoord);
      }
    `,
  });

  const pipeline = device.createRenderPipeline({
    label: 'hardcoded textured quad pipeline',
    layout: 'auto',
    vertex: {
      module,
      entryPoint: 'vs',
    },
    fragment: {
      module,
      entryPoint: 'fs',
      targets: [{ format: presentationFormat }],
    },
  });

  async function loadImageBitmap(url) {
    const res = await fetch(url);
    const blob = await res.blob();
    return await createImageBitmap(blob, { colorSpaceConversion: 'none' });
  }

  const url = 'https://webgpufundamentals.org/webgpu/resources/images/f-texture.png'; 
  const source = await loadImageBitmap(url);
  const texture = device.createTexture({
    label: url,
    format: 'rgba8unorm',
    size: [source.width, source.height],
    usage: GPUTextureUsage.TEXTURE_BINDING |
           GPUTextureUsage.COPY_DST |
           GPUTextureUsage.RENDER_ATTACHMENT,
  });
  device.queue.copyExternalImageToTexture(
    { source, flipY: true },
    { texture },
    { width: source.width, height: source.height },
  );

  const sampler = device.createSampler();

  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: sampler },
      { binding: 1, resource: texture.createView() },
    ],
  });

  const renderPassDescriptor = {
    label: 'our basic canvas renderPass',
    colorAttachments: [
      {
        clearValue: [0.3, 0.3, 0.3, 1],
        loadOp: 'clear',
        storeOp: 'store',
      },
    ],
  };

  function render() {
    renderPassDescriptor.colorAttachments[0].view =
        context.getCurrentTexture().createView();

    const encoder = device.createCommandEncoder({
      label: 'render quad encoder',
    });
    const pass = encoder.beginRenderPass(renderPassDescriptor);
    pass.setPipeline(pipeline);
    pass.setBindGroup(0, bindGroup);
    pass.draw(6);
    pass.end();

    const commandBuffer = encoder.finish();
    device.queue.submit([commandBuffer]);
  }

  const observer = new ResizeObserver(entries => {
    for (const entry of entries) {
      const canvas = entry.target;
      const width = entry.contentBoxSize[0].inlineSize;
      const height = entry.contentBoxSize[0].blockSize;
      canvas.width = Math.max(1, Math.min(width, device.limits.maxTextureDimension2D));
      canvas.height = Math.max(1, Math.min(height, device.limits.maxTextureDimension2D));
      render();
    }
  });
  observer.observe(canvas);
}
main();

- Loads an image and gets an ImageBitmap from it
- Creates a texture with usage flags COPY_DST and RENDER_ATTACHMENT since they're required by copyExternalImageToTexture
- Use said function to copy the bitmap to the texture, the parameters are source, destination and size

Generating mips on the GPU

Changes to the mipmapFilter example:

const numMipLevels = (...sizes) => {
  const maxSize = Math.max(...sizes);
  return 1 + Math.log2(maxSize) | 0;
};

const texture = device.createTexture({
  mipLevelCount: numMipLevels(source.width, source.height),
  ...
});

const generateMips = (() => {
  let sampler;
  let module;
  const pipelineByFormat = {};

  return function generateMips(device, texture) {
    if (!module) {
      module = device.createShaderModule({
        label: 'textured quad shaders for mip level generation',
        code: `
          struct VSOutput {
            @builtin(position) position: vec4f,
            @location(0) texcoord: vec2f,
          };

          @vertex fn vs(
            @builtin(vertex_index) vertexIndex : u32
          ) -> VSOutput {
            let pos = array(
              vec2f( 0.0,  0.0),  // center
              vec2f( 1.0,  0.0),  // right, center
              vec2f( 0.0,  1.0),  // center, top
              vec2f( 0.0,  1.0),  // center, top
              vec2f( 1.0,  0.0),  // right, center
              vec2f( 1.0,  1.0),  // right, top
            );

            var vsOutput: VSOutput;
            let xy = pos[vertexIndex];
            vsOutput.position = vec4f(xy * 2.0 - 1.0, 0.0, 1.0);
            vsOutput.texcoord = vec2f(xy.x, 1.0 - xy.y);
            return vsOutput;
          }

          @group(0) @binding(0) var testSampler: sampler;
          @group(0) @binding(1) var testTexture: texture_2d<f32>;

          @fragment fn fs(fsInput: VSOutput) -> @location(0) vec4f {
            return textureSample(testTexture, testSampler, fsInput.texcoord);
          }
        `,
      });

      sampler = device.createSampler({
        minFilter: 'linear',
      });
    }

    if (!pipelineByFormat[texture.format]) {
      pipelineByFormat[texture.format] = device.createRenderPipeline({
        label: 'mip level generator pipeline',
        layout: 'auto',
        vertex: {
          module,
          entryPoint: 'vs',
        },
        fragment: {
          module,
          entryPoint: 'fs',
          targets: [{ format: texture.format }],
        },
      });
    }
    const pipeline = pipelineByFormat[texture.format];

    const encoder = device.createCommandEncoder({
      label: 'mip gen encoder',
    });

    let width = texture.width;
    let height = texture.height;
    let baseMipLevel = 0;
    while (width > 1 || height > 1) {
      width = Math.max(1, width / 2 | 0);
      height = Math.max(1, height / 2 | 0);

      const bindGroup = device.createBindGroup({
        layout: pipeline.getBindGroupLayout(0),
        entries: [
          { binding: 0, resource: sampler },
          { binding: 1, resource: texture.createView({baseMipLevel, mipLevelCount: 1}) },
        ],
      });

      ++baseMipLevel;

      const renderPassDescriptor = {
        label: 'basic canvas renderPass',
        colorAttachments: [
          {
            view: texture.createView({baseMipLevel, mipLevelCount: 1}),
            loadOp: 'clear',
            storeOp: 'store',
          },
        ],
      };

      const pass = encoder.beginRenderPass(renderPassDescriptor);
      pass.setPipeline(pipeline);
      pass.setBindGroup(0, bindGroup);
      pass.draw(6);
      pass.end();
    }

    device.queue.submit([encoder.finish()]);
  };
})();

function copySourceToTexture(device, texture, source, {flipY} = {}) {
  device.queue.copyExternalImageToTexture(
    { source, flipY, },
    { texture },
    { width: source.width, height: source.height },
  );

  if (texture.mipLevelCount > 1) {
    generateMips(device, texture);
  }
}

function createTextureFromSource(device, source, options = {}) {
  const texture = device.createTexture({
    format: 'rgba8unorm',
    mipLevelCount: options.mips ? numMipLevels(source.width, source.height) : 1,
    size: [source.width, source.height],
    usage: GPUTextureUsage.TEXTURE_BINDING |
           GPUTextureUsage.COPY_DST |
           GPUTextureUsage.RENDER_ATTACHMENT,
  });
  copySourceToTexture(device, texture, source, options);
  return texture;
}

async function createTextureFromImage(device, url, options) {
  const imgBitmap = await loadImageBitmap(url);
  return createTextureFromSource(device, imgBitmap, options);
}

const textures = await Promise.all([
  await createTextureFromImage(device,
      'resources/images/f-texture.png', {mips: true, flipY: false}),
  await createTextureFromImage(device,
      'resources/images/Granite_paving_tileable_512x512.jpeg', {mips: true}),
]);

...

- numMipLevels calculates the number of mip levels needed by using log2 on the the biggest size, which returns the amount of times it can be divided by 2, and adds 1 to that
- Uses that number for the mipLevelCount when creating the texture
- Creates a function to generate mips by drawing textured quads and using minFilter: 'linear'
- Creates a helper function that loads an image and calls another one that creates a texture with it and calls another one to fill it with the data
- Creates some textures using these functions

Loading Canvas

HTMLCanvasElement can be used to draw things in a 2d canvas, and then get the result in a texture in WGPU. It can have its benefits since the 2d canvas has a relatively high level API.

Example animation:

const size = 256;
const half = size / 2;

const ctx = document.createElement('canvas').getContext('2d');
ctx.canvas.width = size;
ctx.canvas.height = size;

const hsl = (h, s, l) => `hsl(${h * 360 | 0}, ${s * 100}%, ${l * 100 | 0}%)`;

function update2DCanvas(time) {
  time *= 0.0005;
  ctx.clearRect(0, 0, size, size);
  ctx.save();
  ctx.translate(half, half);
  const num = 20;
  for (let i = 0; i < num; ++i) {
    ctx.fillStyle = hsl(i / num * 0.2 + time * 0.1, 1, i % 2 * 0.5);
    ctx.fillRect(-half, -half, size, size);
    ctx.rotate(time * 0.5);
    ctx.scale(0.85, 0.85);
    ctx.translate(size / 16, 0);
  }
  ctx.restore();
}

function render(time) {
  update2DCanvas(time);
  requestAnimationFrame(render);
}
requestAnimationFrame(render);

document.body.appendChild(ctx.canvas);

Changes to the previous example to load that canvas into wgpu:

const texture = createTextureFromSource(device, ctx.canvas, {mips: true});
 
const textures = await Promise.all([
  texture,
]);

function render(time) {
  update2DCanvas(time);
  copySourceToTexture(device, texture, ctx.canvas);
   ...
  requestAnimationFrame(render);
}
requestAnimationFrame(render);

const observer = new ResizeObserver(entries => {
  for (const entry of entries) {
    const canvas = entry.target;
    const width = entry.contentBoxSize[0].inlineSize;
    const height = entry.contentBoxSize[0].blockSize;
    canvas.width = Math.max(1, Math.min(width, device.limits.maxTextureDimension2D));
    canvas.height = Math.max(1, Math.min(height, device.limits.maxTextureDimension2D));
  }
});
observer.observe(canvas);

Loading Video

Loading a video is similar but uses the <video> element.
The size properties on HTMLVideoElement are called videoWidth and videoHeight.

Updates to the previous code to handle that difference:

function getSourceSize(source) {
  return [
    source.videoWidth || source.width,
    source.videoHeight || source.height,
  ];
}

function copySourceToTexture(device, texture, source, {flipY} = {}) {
  device.queue.copyExternalImageToTexture(
    { source, flipY, },
    { texture },
    getSourceSize(source),
  );
  ...
}

function createTextureFromSource(device, source, options = {}) {
  const size = getSourceSize(source);
  const texture = device.createTexture({
    size,
    ...
  });
  ...
}

function startPlayingAndWaitForVideo(video) {
  return new Promise((resolve, reject) => {
    video.addEventListener('error', reject);
    if ('requestVideoFrameCallback' in video) {
      video.requestVideoFrameCallback(resolve);
    } else {
      const timeWatcher = () => {
        if (video.currentTime > 0) {
          resolve();
        } else {
          requestAnimationFrame(timeWatcher);
        }
      };
      timeWatcher();
    }
    video.play().catch(reject);
  });
}

function waitForClick() {
  return new Promise(resolve => {
    window.addEventListener(
      'click',
      () => {
        document.querySelector('#start').style.display = 'none';
        resolve();
      },
      { once: true });
  });
}

const video = document.createElement('video');
video.muted = true;
video.loop = true;
video.preload = 'auto';
video.src = 'resources/videos/example.webm';
await waitForClick();
await startPlayingAndWaitForVideo(video);

canvas.addEventListener('click', () => {
  if (video.paused) {
    video.play();
  } else {
    video.pause();
  }
});

const texture = createTextureFromSource(device, video, {mips: true});

function render() {
  copySourceToTexture(device, texture, video);
  ...
}

- A video needs to have started playing before it can be passed to wgpu,
this can be done with video.requestVideoFrameCallback in modern browsers,
and for a fallback for older ones wait for the time to advance

Optimization

Only update the video for each new frame but the texture at its own frame rate:

...
await startPlayingAndWaitForVideo(video);

let alwaysUpdateVideo = !('requestVideoFrameCallback' in video);
let haveNewVideoFrame = false;
if (!alwaysUpdateVideo) {
  function recordHaveNewFrame() {
    haveNewVideoFrame = true;
    video.requestVideoFrameCallback(recordHaveNewFrame);
  }
  video.requestVideoFrameCallback(recordHaveNewFrame);
}

function render() {
  if (alwaysUpdateVideo || haveNewVideoFrame) {
    haveNewVideoFrame = false;
    copySourceToTexture(device, texture, video);
  }
  ...
}

Texture Atlases

A texture atlas is an image containing multiple smaller images packed together to reduce overall dimensions. The texture coordinates can be used to select which parts go where.
This is alot more efficient than using multiple textures.

Example that creates a rotating cube and provides texture coordinates that select each portion of an image onto a specific face of the cube (3D stuff covered later):

import { mat4 } from 'https://wgpu-matrix.org/dist/2.x/wgpu-matrix.module.js';

function createCubeVertices() {
  const vertexData = new Float32Array([
    //  position   |  texture coordinate
    -1,  1,  1,        0   , 0  ,
    -1, -1,  1,        0   , 0.5,
     1,  1,  1,        0.25, 0  ,
     1, -1,  1,        0.25, 0.5,

     1,  1, -1,        0.25, 0  ,
     1,  1,  1,        0.5 , 0  ,
     1, -1, -1,        0.25, 0.5,
     1, -1,  1,        0.5 , 0.5,

     1,  1, -1,        0.5 , 0  ,
     1, -1, -1,        0.5 , 0.5,
    -1,  1, -1,        0.75, 0  ,
    -1, -1, -1,        0.75, 0.5,

    -1,  1,  1,        0   , 0.5,
    -1,  1, -1,        0.25, 0.5,
    -1, -1,  1,        0   , 1  ,
    -1, -1, -1,        0.25, 1  ,

     1, -1,  1,        0.25, 0.5,
    -1, -1,  1,        0.5 , 0.5,
     1, -1, -1,        0.25, 1  ,
    -1, -1, -1,        0.5 , 1  ,

    -1,  1,  1,        0.5 , 0.5,
     1,  1,  1,        0.75, 0.5,
    -1,  1, -1,        0.5 , 1  ,
     1,  1, -1,        0.75, 1  ,
  ]);

  const indexData = new Uint16Array([
     0,  1,  2,  2,  1,  3,
     4,  5,  6,  6,  5,  7,
     8,  9, 10, 10,  9, 11,
    12, 13, 14, 14, 13, 15,
    16, 17, 18, 18, 17, 19,
    20, 21, 22, 22, 21, 23,
  ]);

  return {
    vertexData,
    indexData,
    numVertices: indexData.length,
  };
}

async function main() {
  const adapter = await navigator.gpu?.requestAdapter();
  const device = await adapter?.requestDevice();
  if (!device) {
    alert('need a browser that supports WebGPU');
    return;
  }

  const canvas = document.querySelector('canvas');
  const context = canvas.getContext('webgpu');
  const presentationFormat = navigator.gpu.getPreferredCanvasFormat();
  context.configure({
    device,
    format: presentationFormat,
    alphaMode: 'premultiplied',
  });

  const module = device.createShaderModule({
    code: `
      struct Uniforms {
        matrix: mat4x4f,
      };

      struct Vertex {
        @location(0) position: vec4f,
        @location(1) texcoord: vec2f,
      };

      struct VSOutput {
        @builtin(position) position: vec4f,
        @location(0) texcoord: vec2f,
      };

      @group(0) @binding(0) var<uniform> uni: Uniforms;
      @group(0) @binding(1) var testSampler: sampler;
      @group(0) @binding(2) var testTexture: texture_2d<f32>;

      @vertex fn vs(vert: Vertex) -> VSOutput {
        var vsOut: VSOutput;
        vsOut.position = uni.matrix * vert.position;
        vsOut.texcoord = vert.texcoord;
        return vsOut;
      }

      @fragment fn fs(vsOut: VSOutput) -> @location(0) vec4f {
        return textureSample(testTexture, testSampler, vsOut.texcoord);
      }
    `,
  });

  const pipeline = device.createRenderPipeline({
    label: '2 attributes',
    layout: 'auto',
    vertex: {
      module,
      entryPoint: 'vs',
      buffers: [
        {
          arrayStride: (3 + 2) * 4, // (3+2) floats 4 bytes each
          attributes: [
            {shaderLocation: 0, offset: 0, format: 'float32x3'},  // position
            {shaderLocation: 1, offset: 12, format: 'float32x2'},  // texcoord
          ],
        },
      ],
    },
    fragment: {
      module,
      entryPoint: 'fs',
      targets: [{ format: presentationFormat }],
    },
    primitive: {
      cullMode: 'back',
    },
    depthStencil: {
      depthWriteEnabled: true,
      depthCompare: 'less',
      format: 'depth24plus',
    },
  });

  const numMipLevels = (...sizes) => {
    const maxSize = Math.max(...sizes);
    return 1 + Math.log2(maxSize) | 0;
  };

  async function loadImageBitmap(url) {
    const res = await fetch(url);
    const blob = await res.blob();
    return await createImageBitmap(blob, { colorSpaceConversion: 'none' });
  }

  function copySourceToTexture(device, texture, source, {flipY} = {}) {
    device.queue.copyExternalImageToTexture(
      { source, flipY, },
      { texture },
      { width: source.width, height: source.height },
    );

    if (texture.mipLevelCount > 1) {
      generateMips(device, texture);
    }
  }

  function createTextureFromSource(device, source, options = {}) {
    const texture = device.createTexture({
      format: 'rgba8unorm',
      mipLevelCount: options.mips ? numMipLevels(source.width, source.height) : 1,
      size: [source.width, source.height],
      usage: GPUTextureUsage.TEXTURE_BINDING |
             GPUTextureUsage.COPY_DST |
             GPUTextureUsage.RENDER_ATTACHMENT,
    });
    copySourceToTexture(device, texture, source, options);
    return texture;
  }

  const generateMips = (() => {
    let sampler;
    let module;
    const pipelineByFormat = {};

    return function generateMips(device, texture) {
      if (!module) {
        module = device.createShaderModule({
          label: 'textured quad shaders for mip level generation',
          code: `
            struct VSOutput {
              @builtin(position) position: vec4f,
              @location(0) texcoord: vec2f,
            };

            @vertex fn vs(
              @builtin(vertex_index) vertexIndex : u32
            ) -> VSOutput {
              let pos = array(

                vec2f( 0.0,  0.0),  // center
                vec2f( 1.0,  0.0),  // right, center
                vec2f( 0.0,  1.0),  // center, top

                // 2st triangle
                vec2f( 0.0,  1.0),  // center, top
                vec2f( 1.0,  0.0),  // right, center
                vec2f( 1.0,  1.0),  // right, top
              );

              var vsOutput: VSOutput;
              let xy = pos[vertexIndex];
              vsOutput.position = vec4f(xy * 2.0 - 1.0, 0.0, 1.0);
              vsOutput.texcoord = vec2f(xy.x, 1.0 - xy.y);
              return vsOutput;
            }

            @group(0) @binding(0) var testSampler: sampler;
            @group(0) @binding(1) var testTexture: texture_2d<f32>;

            @fragment fn fs(fsInput: VSOutput) -> @location(0) vec4f {
              return textureSample(testTexture, testSampler, fsInput.texcoord);
            }
          `,
        });

        sampler = device.createSampler({
          minFilter: 'linear',
          magFilter: 'linear',
        });
      }

      if (!pipelineByFormat[texture.format]) {
        pipelineByFormat[texture.format] = device.createRenderPipeline({
          label: 'mip level generator pipeline',
          layout: 'auto',
          vertex: {
            module,
            entryPoint: 'vs',
          },
          fragment: {
            module,
            entryPoint: 'fs',
            targets: [{ format: texture.format }],
          },
        });
      }
      const pipeline = pipelineByFormat[texture.format];

      const encoder = device.createCommandEncoder({
        label: 'mip gen encoder',
      });

      let width = texture.width;
      let height = texture.height;
      let baseMipLevel = 0;
      while (width > 1 || height > 1) {
        width = Math.max(1, width / 2 | 0);
        height = Math.max(1, height / 2 | 0);

        const bindGroup = device.createBindGroup({
          layout: pipeline.getBindGroupLayout(0),
          entries: [
            { binding: 0, resource: sampler },
            { binding: 1, resource: texture.createView({baseMipLevel, mipLevelCount: 1}) },
          ],
        });

        ++baseMipLevel;

        const renderPassDescriptor = {
          label: 'basic canvas renderPass',
          colorAttachments: [
            {
              view: texture.createView({baseMipLevel, mipLevelCount: 1}),
              loadOp: 'clear',
              storeOp: 'store',
            },
          ],
        };

        const pass = encoder.beginRenderPass(renderPassDescriptor);
        pass.setPipeline(pipeline);
        pass.setBindGroup(0, bindGroup);
        pass.draw(6);
        pass.end();
      }

      const commandBuffer = encoder.finish();
      device.queue.submit([commandBuffer]);
    };
  })();

  async function createTextureFromImage(device, url, options) {
    const imgBitmap = await loadImageBitmap(url);
    return createTextureFromSource(device, imgBitmap, options);
  }

  const texture = await createTextureFromImage(device,
      'https://webgpufundamentals.org/webgpu/resources/images/noodles.jpg', {mips: true, flipY: false}); 

  const sampler = device.createSampler({
    magFilter: 'linear',
    minFilter: 'linear',
    mipmapFilter: 'linear',
  });

  const uniformBufferSize = (16) * 4;
  const uniformBuffer = device.createBuffer({
    label: 'uniforms',
    size: uniformBufferSize,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
  });

  const uniformValues = new Float32Array(uniformBufferSize / 4);
  const kMatrixOffset = 0;
  const matrixValue = uniformValues.subarray(kMatrixOffset, kMatrixOffset + 16);

  const { vertexData, indexData, numVertices } = createCubeVertices();
  const vertexBuffer = device.createBuffer({
    label: 'vertex buffer vertices',
    size: vertexData.byteLength,
    usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
  });
  device.queue.writeBuffer(vertexBuffer, 0, vertexData);

  const indexBuffer = device.createBuffer({
    label: 'index buffer',
    size: vertexData.byteLength,
    usage: GPUBufferUsage.INDEX | GPUBufferUsage.COPY_DST,
  });
  device.queue.writeBuffer(indexBuffer, 0, indexData);

  const bindGroup = device.createBindGroup({
    label: 'bind group for object',
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: { buffer: uniformBuffer }},
      { binding: 1, resource: sampler },
      { binding: 2, resource: texture.createView() },
    ],
  });

  const renderPassDescriptor = {
    label: 'basic canvas renderPass',
    colorAttachments: [
      {
        loadOp: 'clear',
        storeOp: 'store',
      },
    ],
    depthStencilAttachment: {
      depthClearValue: 1.0,
      depthLoadOp: 'clear',
      depthStoreOp: 'store',
    },
  };

  const degToRad = d => d * Math.PI / 180;
  const rotation = [degToRad(20), degToRad(25), degToRad(0)];
  let depthTexture;

  function render(time) {
    canvas.width = canvas.clientWidth * window.devicePixelRatio;
    canvas.height = canvas.clientHeight * window.devicePixelRatio;
    const canvasTexture = context.getCurrentTexture();
    renderPassDescriptor.colorAttachments[0].view = canvasTexture.createView();
    time *= 0.0006;

    if (!depthTexture ||
        depthTexture.width !== canvasTexture.width ||
        depthTexture.height !== canvasTexture.height) {
      if (depthTexture) {
        depthTexture.destroy();
      }
      depthTexture = device.createTexture({
        size: [canvasTexture.width, canvasTexture.height],
        format: 'depth24plus',
        usage: GPUTextureUsage.RENDER_ATTACHMENT,
      });
    }
    renderPassDescriptor.depthStencilAttachment.view = depthTexture.createView();

    const encoder = device.createCommandEncoder();
    const pass = encoder.beginRenderPass(renderPassDescriptor);
    pass.setPipeline(pipeline);
    pass.setVertexBuffer(0, vertexBuffer);
    pass.setIndexBuffer(indexBuffer, 'uint16');

    const aspect = canvas.clientWidth / canvas.clientHeight;
    mat4.perspective(
        60 * Math.PI / 180,
        aspect,
        0.1,      // zNear
        10,      // zFar
        matrixValue,
    );
    const view = mat4.lookAt(
      [0, 1, 5],  // camera position
      [0, 0, 0],  // target
      [0, 1, 0],  // up
    );
    mat4.multiply(matrixValue, view, matrixValue);
    mat4.rotateX(matrixValue, rotation[0], matrixValue);
    mat4.rotateY(matrixValue, time * 0.5, matrixValue);
    mat4.rotateZ(matrixValue, rotation[2], matrixValue);

    device.queue.writeBuffer(uniformBuffer, 0, uniformValues);
    pass.setBindGroup(0, bindGroup);
    pass.drawIndexed(numVertices);

    pass.end();

    const commandBuffer = encoder.finish();
    device.queue.submit([commandBuffer]);

    requestAnimationFrame(render);
  }
  requestAnimationFrame(render);
  
}
main();

Using Video Efficiently

Beside copyExternalImageToTexture, which copies the current frame from the video itself into a pre-existing texture,
there's also importExternalTexture which provides a GPUExternalTexture that represents the data in the video directly, no copy made.

Note: the texture is only valid until the current javascript task is exited,
in the shader, texture_external has to be used,
and since external textures cant have a mipmap, textureSampleBaseClampToEdge has to be used which only samples the base texture mip level 0

Changes to the previous video example:

const module = device.createShaderModule({
  label: 'our hardcoded textured quad shaders',
  code: `
      ...
      @vertex fn vs(
        @builtin(vertex_index) vertexIndex : u32
      ) -> OurVertexShaderOutput {
        ...
        vsOutput.texcoord = xy;
        return vsOutput;
      }

      @group(0) @binding(0) var ourSampler: sampler;
      @group(0) @binding(1) var ourTexture: texture_external;

      @fragment fn fs(fsInput: OurVertexShaderOutput) -> @location(0) vec4f {
        return textureSampleBaseClampToEdge(
          ourTexture,
          ourSampler,
          fsInput.texcoord,
        );
      }
  `,
});

...
const sampler = device.createSampler({
  addressModeU: 'repeat',
  addressModeV: 'repeat',
  magFilter: (i & 1) ? 'linear' : 'nearest',
  minFilter: (i & 2) ? 'linear' : 'nearest',
});
...

function render() {
  ...
  const texture = device.importExternalTexture({source: video});

  objectInfos.forEach(({sampler, matrix, uniformBuffer, uniformValues}, i) => {
    const bindGroup = device.createBindGroup({
      layout: pipeline.getBindGroupLayout(0),
      entries: [
        { binding: 0, resource: sampler },
        { binding: 1, resource: texture },
        { binding: 2, resource: { buffer: uniformBuffer }},
      ],
    });
    ...

    pass.setBindGroup(0, bindGroup);
    pass.draw(6);
  });

  pass.end();
  ...
}

- In the shader, texture_2d<f32> becomes texture_external and textureSample becomes textureSampleBaseClampToEdge
- Remove the mipmapFilter setting from createSampler
- In the render function, instead of calling copySourceToTexture, call importExternalTexture and create the bindgroups there

Note: video is often delivered with the luminance part (brightness of each pixel) separate from the chroma part (color). A common way to separate and encode this is YUV, it also compresses better.
When using texture_external and textureSampleBaseClampToEdge in a shader, WGPU converts it to RGBA. If more flexibility is needed, this can be done manually by copying the data to an RGBA texture and using it as texture_2d<f32>. This method is slower though.

Cubemaps

A type of texture that consists of 6 faces of a cube and uses a normal (3D) direction instead of the traditional texture coordinates.

struct Uniforms {
  matrix: mat4x4f,
};
 
struct Vertex {
  @location(0) position: vec4f,
};
 
struct VSOutput {
  @builtin(position) position: vec4f,
  @location(0) normal: vec3f,
};

@group(0) @binding(0) var<uniform> uni: Uniforms;
@group(0) @binding(1) var ourSampler: sampler;
@group(0) @binding(2) var ourTexture: texture_cube<f32>;

@vertex fn vs(vert: Vertex) -> VSOutput {
  var vsOut: VSOutput;
  vsOut.position = uni.matrix * vert.position;
  vsOut.normal = normalize(vert.position.xyz);
  return vsOut;
}
 
@fragment fn fs(vsOut: VSOutput) -> @location(0) vec4f {
  return textureSample(ourTexture, ourSampler, normalize(vsOut.normal));
}

- Uses texture_cube instead of texture_2d in the fragment shader and therefore takes a vec3f direction in textureSample

function copySourcesToTexture(device, texture, sources, {flipY} = {}) {
  sources.forEach((source, layer) => {
    device.queue.copyExternalImageToTexture(
      { source, flipY, },
      { texture, origin: [0, 0, layer] },
      { width: source.width, height: source.height },
    );
  });
}

- Change the texture helper functions to take multiple sources, and set an origin to specify the layer when copying a source to the texture

const bindGroup = device.createBindGroup({
  label: 'bind group for object',
  layout: pipeline.getBindGroupLayout(0),
  entries: [
    { binding: 0, resource: { buffer: uniformBuffer }},
    { binding: 1, resource: sampler },
    { binding: 2, resource: texture.createView({dimension: 'cube'}) },
  ],
});

- In the bindgroup, set the view dimension to cube

Note: cubemaps are usually not used for cubes, the standard way to texture a cube is a texture atlas

Storage Textures

Its like a storage buffer that is used as 2d array,
some differences are that storage textures have format interpretation,
dimensions (buffer only has length),
and they can also still be used as normal texture in a different shader

Limitations:

- Only the formats r32float, r32sint, and r32uint can be read_write
- Only the texture formats rgba8(unorm/snorm/sint/uint), rgba16(float/sint/unit), rg32(float/sint/uint), and rgba32(float/sint/uint) can be used as storage textures
- Storage textures cannot use samplers

Canvas as a storage texture

Configure the context to provide a texture that can be used as storage texture

const presentationFormat = navigator.gpu.getPreferredCanvasFormat()
context.configure({
  device,
  format: presentationFormat,
  usage: GPUTextureUsage.TEXTURE_BINDING |
          GPUTextureUsage.STORAGE_BINDING,
});

To render to the texture via render pass, it would also use RENDER_ATTACHMENT

Since getPreferredCanvasFormat() will return either rgba8unorm or bgra8unorm depending on the user's system, and since bgra8unorm cant be used for storage textures, the feature bgra8unorm-storage needs to be enabled. It may not be available on some platforms though, so a check has to be done.

const adapter = await navigator.gpu?.requestAdapter();
const hasBGRA8unormStorage = adapter.features.has('bgra8unorm-storage');
const device = await adapter?.requestDevice({
  requiredFeatures: hasBGRA8unormStorage
    ? ['bgra8unorm-storage']
    : [],
});
if (!device) {
  alert('need a browser that supports WebGPU');
  return;
}

const canvas = document.querySelector('canvas');
const context = canvas.getContext('webgpu');
const presentationFormat = hasBGRA8unormStorage
   ? navigator.gpu.getPreferredCanvasFormat()
   : 'rgba8unorm';
context.configure({
  device,
  format: presentationFormat,
  usage: GPUTextureUsage.TEXTURE_BINDING |
         GPUTextureUsage.STORAGE_BINDING,
});

Multisampling (MSAA)

Anti-aliasing means trying to prevent the blocky problem when trying to draw a vector shape as discrete pixels. Multisampling is a more efficient way of rendering that uses a "multisample texture" and reduces the amount of fragment shader calls by writing a result to all the samples inside a selected area in one call instead of making a call for each pixel before resolving the result.
Another difference is that instead of checking for centers of pixels being inside an area it uses "sample positions" which are not in a grid so the process of resolving them is not bilinear filtering but up to the GPU.

WebGPU currently only supports a sample count of 4.
1 is the default and means the texture is not multisampled.

[]

The following is needed to use multisampling:
Pipeline set to render to a multisample texture, create a ms texture the same size as the final texture, set the render pass to render to the ms texture and resolve to the final texture (canvas)

Example:

Points of note

- Advance per vertex or per instance
- Which buffer type to use (uniform, storage, ...)
- Use normalized values to save space
-

Notes on WebGPU Fundamentals

Initialization, Configuration

Create shader module with a vertex shader and fragment shader function

Create Render Pipeline

Create Render Pass

Rendering

Computations

Canvas Resizing

Extra Checks

Inter-Stage Variables

Checkerboard

Splitted Modules

Interpolation Settings

Memory Layout

vec and mat types

Uniforms

Randomize variables

Split Uniforms

Storage Buffers

Instancing

Using storage buffers for vertex data

Vertex Buffers

Instancing

Using normalized values to save space

Index Buffers

Textures

magFilter

minFilter

mipmapFilter

Texture Types and Views

Texture Formats

Loading Images into Textures

Generating mips on the GPU

Loading Canvas

Loading Video

Optimization

Texture Atlases

Using Video Efficiently

Cubemaps

Storage Textures

Multisampling (MSAA)

More from Cili's NotesAll posts

The PEAN Stack with GraphQL

More from Cili's Notes
All posts