Rudrite Research

https://research.rudrite.com/
Interactive, animated, visual explainers of landmark AI & ML papers — the frontier, made legible.
Tue, 09 Jun 2026 00:00:00 GMT

Scaling Laws for Neural Language Models — interactive visual explainer
https://research.rudrite.com/scaling-laws
Loss falls as a clean power law in size, data, and compute — and tells you how to spend the budget.
A free, interactive, animated walkthrough of Scaling Laws…
Tue, 09 Jun 2026 00:00:00 GMT

Adam: A Method for Stochastic Optimization — interactive visual explainer
https://research.rudrite.com/adam
A per-parameter adaptive learning rate built from two moving averages: one of the gradient, one of its square.
A free, interactive, animated walkthrough of Adam: A Method for Stochastic…
Tue, 09 Jun 2026 00:00:00 GMT

Deep Residual Learning for Image Recognition — interactive visual explainer
https://research.rudrite.com/resnet
Add the input back — the identity skip that made 152-layer nets trainable.
A free, interactive, animated walkthrough of Deep Residual Learning for Image…
Tue, 09 Jun 2026 00:00:00 GMT

Denoising Diffusion Probabilistic Models — interactive visual explainer
https://research.rudrite.com/ddpm
Add noise to an image, then learn the reverse — the recipe behind modern diffusion.
A free, interactive, animated walkthrough of Denoising Diffusion…
Tue, 09 Jun 2026 00:00:00 GMT

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity — interactive visual explainer
https://research.rudrite.com/switch-transformers
Send each token to a single expert — and scale a model to a trillion parameters.
A free, interactive, animated walkthrough of Switch Transformers: Scaling to…
Tue, 09 Jun 2026 00:00:00 GMT

LoRA: Low-Rank Adaptation of Large Language Models — interactive visual explainer
https://research.rudrite.com/lora
Freeze the model, learn its change as two skinny matrices — 10,000× fewer trainable weights, zero added latency.
A free, interactive, animated walkthrough of…
Tue, 09 Jun 2026 00:00:00 GMT

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism — interactive visual explainer
https://research.rudrite.com/gpipe
Split a giant model across chips and pipeline micro-batches to keep them all busy.
A free, interactive, animated walkthrough of GPipe: Efficient Training of…
Tue, 09 Jun 2026 00:00:00 GMT

GSPMD: General and Scalable Parallelization for ML Computation Graphs — interactive visual explainer
https://research.rudrite.com/gspmd
Annotate a few tensors; the compiler shards the trillion-parameter rest.
A free, interactive, animated walkthrough of GSPMD: General and Scalable…
Tue, 09 Jun 2026 00:00:00 GMT

Pathways: Asynchronous Distributed Dataflow for ML — interactive visual explainer
https://research.rudrite.com/pathways
One controller, thousands of accelerators — parallel dispatch makes single-controller ML as fast as SPMD.
A free, interactive, animated walkthrough of…
Tue, 09 Jun 2026 00:00:00 GMT

Ring Attention with Blockwise Transformers for Near-Infinite Context — interactive visual explainer
https://research.rudrite.com/ring-attention
Shard one sequence across a ring of devices, rotate the KV blocks — context scales with device count.
A free, interactive, animated walkthrough of Ring…
Tue, 09 Jun 2026 00:00:00 GMT

Efficiently Scaling Transformer Inference — interactive visual explainer
https://research.rudrite.com/scaling-inference
Chop a 540B model across a TPU pod: 29 ms/token, 76% MFU, 32x longer context.
A free, interactive, animated walkthrough of Efficiently Scaling Transformer…
Tue, 09 Jun 2026 00:00:00 GMT

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving — interactive visual explainer
https://research.rudrite.com/mooncake
Schedule the KV cache, not the GPU: disaggregated prefill/decode serving that survives overload.
A free, interactive, animated walkthrough of Mooncake: A…
Tue, 09 Jun 2026 00:00:00 GMT

Fast Inference from Transformers via Speculative Decoding — interactive visual explainer
https://research.rudrite.com/speculative-decoding
A small model guesses ahead, the big one verifies in parallel — same output, 2–3× faster.
A free, interactive, animated walkthrough of Fast Inference from…
Tue, 09 Jun 2026 00:00:00 GMT

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — interactive visual explainer
https://research.rudrite.com/chain-of-thought
Add worked examples to the prompt — and reasoning emerges in big models, with no training.
A free, interactive, animated walkthrough of Chain-of-Thought Prompting…
Tue, 09 Jun 2026 00:00:00 GMT

Training language models to follow instructions with human feedback — interactive visual explainer
https://research.rudrite.com/instructgpt
RLHF: align GPT-3 from human feedback — labelers prefer a 1.3B InstructGPT model over the 175B GPT-3.
A free, interactive, animated walkthrough of Training language models to…
Tue, 09 Jun 2026 00:00:00 GMT

Direct Preference Optimization: Your Language Model is Secretly a Reward Model — interactive visual explainer
https://research.rudrite.com/dpo
Skip the reward model and the RL — one cross-entropy loss aligns the policy directly from preferences.
A free, interactive, animated walkthrough of Direct…
Tue, 09 Jun 2026 00:00:00 GMT

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models — interactive visual explainer
https://research.rudrite.com/deepseekmath
A 7B open model hits 51.7% on MATH — by web-mining 120B math tokens and inventing GRPO.
A free, interactive, animated walkthrough of DeepSeekMath: Pushing the…
Tue, 09 Jun 2026 00:00:00 GMT

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters — interactive visual explainer
https://research.rudrite.com/test-time-compute
Think longer on hard prompts — and let difficulty decide how to spend the compute.
A free, interactive, animated walkthrough of Scaling LLM Test-Time Compute…
Tue, 09 Jun 2026 00:00:00 GMT

Constitutional AI: Harmlessness from AI Feedback — interactive visual explainer
https://research.rudrite.com/constitutional-ai
Train a harmless, non-evasive assistant from a written constitution — zero human harm labels.
A free, interactive, animated walkthrough of Constitutional AI:…
Tue, 09 Jun 2026 00:00:00 GMT

DAPO: An Open-Source LLM Reinforcement Learning System at Scale — interactive visual explainer
https://research.rudrite.com/dapo
Four named techniques turn DeepSeek-style RL into a reproducible run to AIME 50.
A free, interactive, animated walkthrough of DAPO: An Open-Source LLM…
Tue, 09 Jun 2026 00:00:00 GMT

Tree of Thoughts: Deliberate Problem Solving with Large Language Models — interactive visual explainer
https://research.rudrite.com/tree-of-thoughts
Wrap a frozen GPT-4 in tree search — branch, self-evaluate, prune. Game of 24: 4% to 74%.
A free, interactive, animated walkthrough of Tree of Thoughts:…
Tue, 09 Jun 2026 00:00:00 GMT

ReAct: Synergizing Reasoning and Acting in Language Models — interactive visual explainer
https://research.rudrite.com/react
A frozen LLM that thinks, acts, and reads results in one loop — the blueprint for every agent.
A free, interactive, animated walkthrough of ReAct: Synergizing…
Tue, 09 Jun 2026 00:00:00 GMT

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision — interactive visual explainer
https://research.rudrite.com/flash-attention-3
Rebuilds attention for Hopper — async warps + FP8 — reaching 740 TFLOPs/s in FP16, 1.5-2.0x over FA-2.
A free, interactive, animated walkthrough of FlashAttention-3: Fast…
Tue, 09 Jun 2026 00:00:00 GMT

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality — interactive visual explainer
https://research.rudrite.com/mamba-2
Selective SSMs and masked attention are one structured matrix, computed two ways.
A free, interactive, animated walkthrough of Transformers are SSMs:…
Tue, 09 Jun 2026 00:00:00 GMT

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model — interactive visual explainer
https://research.rudrite.com/deepseek-v2
236B MoE, 21B active per token — MLA compresses each token's keys and values into one small latent vector.
A free, interactive, animated walkthrough of DeepSeek-V2: A Strong,…
Tue, 09 Jun 2026 00:00:00 GMT

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty — interactive visual explainer
https://research.rudrite.com/eagle
Draft one layer down: autoregress on features, not tokens — 2.7–3.5× faster, losslessly.
A free, interactive, animated walkthrough of EAGLE: Speculative…
Tue, 09 Jun 2026 00:00:00 GMT

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration — interactive visual explainer
https://research.rudrite.com/awq
Find the 1% of weights that matter by watching activations, then scale to protect them at INT4.
A free, interactive, animated walkthrough of AWQ:…
Tue, 09 Jun 2026 00:00:00 GMT

RoFormer: Enhanced Transformer with Rotary Position Embedding — interactive visual explainer
https://research.rudrite.com/rope
Encode position by rotating Q and K, so attention sees only the relative offset m−n.
A free, interactive, animated walkthrough of RoFormer: Enhanced…
Tue, 09 Jun 2026 00:00:00 GMT

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale — interactive visual explainer
https://research.rudrite.com/vision-transformer
Cut an image into 16×16 patches, call each a word, feed a plain Transformer.
A free, interactive, animated walkthrough of An Image is Worth 16x16 Words:…
Tue, 09 Jun 2026 00:00:00 GMT

Learning Transferable Visual Models From Natural Language Supervision — interactive visual explainer
https://research.rudrite.com/clip
Match captions to images, and you get a classifier for any concept you can name.
A free, interactive, animated walkthrough of Learning Transferable Visual…
Tue, 09 Jun 2026 00:00:00 GMT

High-Resolution Image Synthesis with Latent Diffusion Models — interactive visual explainer
https://research.rudrite.com/latent-diffusion
Move diffusion into a compact latent space — cheaper, and the architecture behind Stable Diffusion.
A free, interactive, animated walkthrough of…
Tue, 09 Jun 2026 00:00:00 GMT

Scalable Diffusion Models with Transformers — interactive visual explainer
https://research.rudrite.com/dit
Drop the U-Net: a plain transformer on latent patches whose quality scales with Gflops.
A free, interactive, animated walkthrough of Scalable Diffusion Models…
Tue, 09 Jun 2026 00:00:00 GMT

Robust Speech Recognition via Large-Scale Weak Supervision — interactive visual explainer
https://research.rudrite.com/whisper
680k hours of weak supervision → one Transformer that transcribes the real world, zero-shot.
A free, interactive, animated walkthrough of Robust Speech…
Tue, 09 Jun 2026 00:00:00 GMT

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention — interactive visual explainer
https://research.rudrite.com/native-sparse-attention
Trainable, hardware-aligned sparse attention: 3 gated branches, 11.6x faster decoding, and quality that matches or beats full attention.
A free, interactive, animated walkthrough of Native Sparse…
Tue, 09 Jun 2026 00:00:00 GMT

Group Sequence Policy Optimization — interactive visual explainer
https://research.rudrite.com/gspo
Reward lands on the whole sequence — so the importance ratio should be computed per sequence too, not per token.
A free, interactive, animated walkthrough of Group Sequence Policy…
Tue, 09 Jun 2026 00:00:00 GMT

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving — interactive visual explainer
https://research.rudrite.com/distserve
Split a request's timeline into prefill and decode GPU pools — 4.48x more requests under SLO.
A free, interactive, animated walkthrough of DistServe:…
Tue, 09 Jun 2026 00:00:00 GMT

CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion — interactive visual explainer
https://research.rudrite.com/cacheblend
Reuse every retrieved chunk's KV cache anywhere, then recompute the ~15% of tokens that stitch cross-attention back.
A free, interactive, animated walkthrough…
Tue, 09 Jun 2026 00:00:00 GMT

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding — interactive visual explainer
https://research.rudrite.com/gshard
Top-2 experts per token + an SPMD compiler: a 600B model trained in 4 days.
A free, interactive, animated walkthrough of GShard: Scaling Giant Models with…
Tue, 09 Jun 2026 00:00:00 GMT

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints — interactive visual explainer
https://research.rudrite.com/gqa
One dial from MQA to MHA — near-MHA quality at near-MQA decode speed, retrofitted cheaply.
A free, interactive, animated walkthrough of GQA: Training…
Tue, 09 Jun 2026 00:00:00 GMT

YaRN: Efficient Context Window Extension of Large Language Models — interactive visual explainer
https://research.rudrite.com/yarn
Extend a RoPE model to 128k by reshaping frequencies per wavelength — with 10x fewer fine-tuning tokens.
A free, interactive, animated walkthrough of YaRN: Efficient…
Tue, 09 Jun 2026 00:00:00 GMT

Efficient Streaming Language Models with Attention Sinks — interactive visual explainer
https://research.rudrite.com/streaming-llm
Pin 4 "attention-sink" tokens + a rolling window — stream 4M tokens, no fine-tuning.
A free, interactive, animated walkthrough of Efficient Streaming Language…
Tue, 09 Jun 2026 00:00:00 GMT

Generative Adversarial Networks — interactive visual explainer
https://research.rudrite.com/gan
Two networks duel — a forger and a detective — until the fakes pass for real.
A free, interactive, animated walkthrough of Generative Adversarial Networks —…
Tue, 09 Jun 2026 00:00:00 GMT

Segment Anything — interactive visual explainer
https://research.rudrite.com/segment-anything
Point at anything, get a clean mask back in milliseconds — segmentation as a foundation model.
A free, interactive, animated walkthrough of Segment Anything —…
Tue, 09 Jun 2026 00:00:00 GMT

Visual Instruction Tuning — interactive visual explainer
https://research.rudrite.com/llava
A blind GPT-4 writes the lessons; one matrix turns sight into tokens — the open VLM template.
A free, interactive, animated walkthrough of Visual Instruction…
Tue, 09 Jun 2026 00:00:00 GMT

PPO vs DPO vs GRPO — what's the difference?
https://research.rudrite.com/compare/ppo-vs-dpo-vs-grpo
Three ways to turn preferences into a better policy — a full RL loop, a single classification loss, or group-relative RL without a critic.
A clear,…
Tue, 09 Jun 2026 00:00:00 GMT

MHA vs GQA vs MLA — what's the difference?
https://research.rudrite.com/compare/mha-vs-gqa-vs-mla
Three points on the attention-memory curve — how much of the KV cache you keep decides how long a context you can afford to serve.
A clear, side-by-side…
Tue, 09 Jun 2026 00:00:00 GMT

GAN vs VAE vs Diffusion — what's the difference?
https://research.rudrite.com/compare/gan-vs-vae-vs-diffusion
Three ways to learn a distribution and sample from it — an adversarial game, a probabilistic autoencoder, and an iterative denoiser.
A clear, side-by-side…
Tue, 09 Jun 2026 00:00:00 GMT

FlashAttention vs FlashAttention-3 — what's the difference?
https://research.rudrite.com/compare/flashattention-vs-flashattention-3
The same exact-attention algorithm, rebuilt for a new generation of GPU — IO-aware tiling, then Hopper-era asynchrony and FP8.
A clear, side-by-side…
Tue, 09 Jun 2026 00:00:00 GMT

Transformers vs Mamba — what's the difference?
https://research.rudrite.com/compare/transformers-vs-mamba
All-pairs attention versus a selective state-space recurrence — quadratic recall against linear-time throughput.
A clear, side-by-side comparison with examples.
Mon, 08 Jun 2026 00:00:00 GMT

FlashAttention vs PagedAttention — what's the difference?
https://research.rudrite.com/compare/flashattention-vs-pagedattention
Two attention optimizations that solve different problems — and are used together, not instead of each other.
A clear, side-by-side comparison with examples.
Mon, 08 Jun 2026 00:00:00 GMT

Dense vs Mixture-of-Experts — what's the difference?
https://research.rudrite.com/compare/dense-vs-mixture-of-experts
Activate every parameter for every token, or route each token to a few of many experts.
A clear, side-by-side comparison with examples.
Mon, 08 Jun 2026 00:00:00 GMT

DeepSeek-V3 — interactive visual explainer
https://research.rudrite.com/deepseek-v3
A 671B mixture-of-experts that activates only 37B per token — via latent-KV attention and auxiliary-loss-free load balancing.
A free, interactive, animated walkthrough of DeepSeek-V3 —…
Sun, 07 Jun 2026 00:00:00 GMT

Qwen3 — interactive visual explainer
https://research.rudrite.com/qwen3
One family, dense and MoE — with a unified thinking / non-thinking switch.
A free, interactive, animated walkthrough of Qwen3 — Qwen Team, 2025.
Sun, 07 Jun 2026 00:00:00 GMT

OLMo 2 — interactive visual explainer
https://research.rudrite.com/olmo-2
A fully open model, stabilized by applying the norms to each sublayer's output and normalizing queries and keys (QK-norm).
A free, interactive, animated walkthrough of OLMo 2 — Ai2, 2025.
Sun, 07 Jun 2026 00:00:00 GMT

MiniMax-01 — interactive visual explainer
https://research.rudrite.com/minimax-01
Near-linear attention at 456B — lightning attention, with a softmax layer every eighth block.
A free, interactive, animated walkthrough of MiniMax-01 —…
Sun, 07 Jun 2026 00:00:00 GMT

Gemma 4 — interactive visual explainer
https://research.rudrite.com/gemma-4
Five sizes, one design — interleaved sliding-window (local) and global attention, now with MoE.
A free, interactive, animated walkthrough of Gemma 4 — Google…
Sun, 07 Jun 2026 00:00:00 GMT

Attention Is All You Need — interactive visual explainer
https://research.rudrite.com/attention
The 2017 paper behind every LLM you use — watch attention decide what matters.
A free, interactive, animated walkthrough of Attention Is All You Need —…
Fri, 05 Jun 2026 00:00:00 GMT

FlashAttention — interactive visual explainer
https://research.rudrite.com/flash-attention
Exact attention, made fast by never writing the full attention matrix to GPU memory (HBM).
A free, interactive, animated walkthrough of FlashAttention — Dao et al., NeurIPS 2022.
Fri, 05 Jun 2026 00:00:00 GMT

PagedAttention (vLLM) — interactive visual explainer
https://research.rudrite.com/paged-attention
Serve far more requests by paging the KV cache the way an operating system pages virtual memory.
A free, interactive, animated walkthrough of PagedAttention (vLLM) — Kwon et al.,…
Fri, 05 Jun 2026 00:00:00 GMT

Megatron-LM — interactive visual explainer
https://research.rudrite.com/megatron-lm
Split each layer's weight matrices across GPUs — and train billions of parameters.
A free, interactive, animated walkthrough of Megatron-LM — Shoeybi et al.,…
Fri, 05 Jun 2026 00:00:00 GMT

DeepSeek-R1 — interactive visual explainer
https://research.rudrite.com/deepseek-r1
Reasoning that emerges from reinforcement learning, not imitation.
A free, interactive, animated walkthrough of DeepSeek-R1 — DeepSeek-AI, 2025.
Fri, 05 Jun 2026 00:00:00 GMT

GPT-3: Language Models are Few-Shot Learners — interactive visual explainer
https://research.rudrite.com/gpt-3
Scale a language model until it learns new tasks from a few examples.
A free, interactive, animated walkthrough of GPT-3: Language Models are Few-Shot…
Fri, 05 Jun 2026 00:00:00 GMT

ZeRO: Zero Redundancy Optimizer — interactive visual explainer
https://research.rudrite.com/zero
Partition optimizer states, gradients, and parameters across GPUs instead of replicating them — and train toward a trillion parameters.
A free, interactive, animated walkthrough of ZeRO: Zero…
Fri, 05 Jun 2026 00:00:00 GMT

Mixtral of Experts — interactive visual explainer
https://research.rudrite.com/mixtral
Grow capacity without growing per-token cost — route each token to two of eight experts.
A free, interactive, animated walkthrough of Mixtral of Experts —…
Fri, 05 Jun 2026 00:00:00 GMT

Training Compute-Optimal Large Language Models — interactive visual explainer
https://research.rudrite.com/chinchilla
Given a fixed compute budget, scale model size and training data in equal proportion: double the model, double the data.
A free, interactive, animated walkthrough of Training…
Fri, 05 Jun 2026 00:00:00 GMT

Mamba: Linear-Time Sequence Modeling with Selective State Spaces — interactive visual explainer
https://research.rudrite.com/mamba
Make a state-space model's parameters depend on what it's reading — and a recurrence outruns attention.
A free, interactive, animated walkthrough of Mamba: Linear-Time Sequence…
Fri, 05 Jun 2026 00:00:00 GMT

BERT: Pre-training of Deep Bidirectional Transformers — interactive visual explainer
https://research.rudrite.com/bert
Read the whole sentence at once — pre-train by filling in the blanks, then fine-tune anywhere.
A free, interactive, animated walkthrough of BERT: Pre-training…
Fri, 05 Jun 2026 00:00:00 GMT