Welcome to the AI Terminal
This blog is a space for going deep on AI/ML — the engineering, the research, and everything in between. If you’re here, you probably care about how things actually work under the hood, not just the hype cycle.
Who Is This For
This is going to get geeky. If you’re looking for high-level LinkedIn thought leadership about how “AI will change everything,” you’re in the wrong terminal.
This blog is for technical people — engineers, researchers, and practitioners who want to understand the details. People who read papers, write training loops, debug CUDA kernels, and argue about attention mechanisms.
Expect content that goes both wide and deep:
- Engineering — building production ML systems, training infrastructure, serving at scale, the unglamorous plumbing that makes models actually work
- Research — reading, implementing, and dissecting papers, because understanding the math matters when your gradient norms explode at 3 AM
- The intersection — turning research ideas into real systems, and the hard trade-offs that come with it
Whether you’re a seasoned ML engineer or someone ramping up and wanting to go beyond the tutorials, there should be something here for you.
Written by a Human, Enhanced by LLM
Let’s talk about how this blog gets written.
Every post starts with me — my experience, my opinions, my mistakes. I use LLMs as a writing tool the same way I use them for code: to accelerate, to refine, to catch what I miss. But the ideas, the technical depth, and the perspective are mine.
This is not LLM slop. You won’t find generic “10 Things You Need to Know About Transformers” content here. If I’m writing about something, it’s because I’ve actually built it, broken it, or spent too many hours debugging it. Authenticity over polish.
The bar is simple: every post should teach you something you couldn’t get from a quick ChatGPT prompt.
Code-First
I believe the best way to understand ML is to look at real code, run real commands, and work through real math. Posts here will be code-first — not pseudocode hand-waving, but actual implementations you can read, run, and learn from.
Here’s the kind of thing you’ll see. A PyTorch implementation of scaled dot-product attention:
import torch
import torch.nn.functional as F
def scaled_dot_product_attention(Q, K, V, mask=None):
    """Compute scaled dot-product attention."""
    d_k = Q.size(-1)
    scores = torch.matmul(Q, K.transpose(-2, -1)) / torch.sqrt(
        torch.tensor(d_k, dtype=torch.float32)
    )
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, V), weights
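As a sanity check, the hand-rolled version should agree with PyTorch's fused `F.scaled_dot_product_attention` (available since PyTorch 2.0). Here the manual computation is inlined so the snippet runs on its own; the shapes are illustrative:

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Batch of 2, 4 heads, sequence length 8, head dimension 16
Q, K, V = (torch.randn(2, 4, 8, 16) for _ in range(3))

# Manual scaled dot-product attention, same math as the function above
scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(Q.size(-1))
weights = F.softmax(scores, dim=-1)
manual = torch.matmul(weights, V)

# The fused kernel should agree to floating-point tolerance
fused = F.scaled_dot_product_attention(Q, K, V)
assert torch.allclose(manual, fused, atol=1e-5)
```

If the assertion holds, the implementation is doing the same math as the production kernel — just without the memory savings of the fused version.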
Real training commands, not toy examples:
torchrun --nproc_per_node=4 train.py \
    --model_name_or_path meta-llama/Llama-3-8b \
    --dataset my_dataset \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --output_dir ./checkpoints
And the math that makes it all work. The attention formula behind that code:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V$$
The cross-entropy loss that drives language model training:

$$\mathcal{L}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})$$
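That formula is exactly what `F.cross_entropy` computes over next-token logits. A quick check with random data (the shapes are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, V = 2, 5, 100  # batch size, sequence length, vocab size
logits = torch.randn(B, T, V)
targets = torch.randint(0, V, (B, T))

# Library cross-entropy over flattened (B*T, V) logits
loss = F.cross_entropy(logits.view(-1, V), targets.view(-1))

# Manual: mean negative log-probability of each target token
log_probs = F.log_softmax(logits, dim=-1)
manual = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1).mean()

assert torch.allclose(loss, manual, atol=1e-6)
```

Same number both ways — the library call just fuses the log-softmax and the gather for numerical stability.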
Inline math works too — learning rate $\alpha = 2 \times 10^{-5}$, batch size $B = 32$, gradient accumulation steps to simulate larger effective batches.
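Gradient accumulation itself is just a matter of delaying the optimizer step. A minimal sketch with a toy model — the `accum_steps` name and sizes here are illustrative, not tied to any particular framework:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 2)
opt = torch.optim.SGD(model.parameters(), lr=2e-5)

micro_batch, accum_steps = 32, 8  # effective batch = 32 * 8 = 256 per update

opt.zero_grad()
for _ in range(accum_steps):
    x = torch.randn(micro_batch, 16)
    loss = model(x).pow(2).mean()
    # Scale so the accumulated gradient averages over the effective batch
    (loss / accum_steps).backward()
opt.step()  # one update, as if a single batch of 256 had been used
```

The only subtlety is the `loss / accum_steps` scaling: without it, the summed gradients would be `accum_steps` times larger than a true big-batch gradient.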
Code, commands, and math. That’s the language we’ll be speaking here.
What’s Coming
Topics in the pipeline include LoRA fine-tuning from scratch, KV cache optimization, building evaluation harnesses for LLMs, and deep dives into recent papers. Stay tuned.