Featured

From Theory to Practice: Quantization and Dequantization Made Simple
Quantization transforms floating-point values (‘float32’) into lower-precision formats, such as …

The Simple Path to PyTorch Graphs: Dynamo and AOT Autograd Explained
Graph acquisition in PyTorch refers to the process of creating and managing the computational graph …

All Stories

Understanding Triton Kernels from First Principles
A deep dive into how Triton kernels work, explained from absolute basics to complete understanding. …
Under the Hood: How PyTorch Chooses Attention Kernels and Why It Matters for Performance
A deep dive into PyTorch’s attention kernel selection and what each choice means for your …
Breaking Down Vision Transformers: A Code-Driven Explanation
In this article, I’ll break down the layers of a ViT step by step with code snippets, and a …
Turn 3D Gaussian Splat Files into Stunning Assets in Unity 6
This guide walks you through the process of loading splat files in Unity 6 using the Gaussian …
Intel GPU Scheduling: Exploring Matrix Addition with SYCL and PyTorch
If you’ve ever worked with GPUs, you know how crucial it is to understand how they manage workloads. …
HLSL Ray Tracing: Crafting Realistic Scenes in Unity, One Ray at a Time
Instead of just slapping textures on polygons, ray tracing lets us simulate how light interacts with …