Signal — Five Links. Every Tuesday. No Noise.

Vol. IV · No. 08

Tuesday, 25 Feb 2026

Data Science Edition

Five links. Every Tuesday. No noise.

For the data scientist who reads the paper, not the tweet about it.

LINK 01 / ISSUE #08

arxiv · cs.LG

"Flash Attention 3: Fast and Accurate Attention with Asynchrony and Low-precision"

The headline number is up to 75% utilization on H100s (1.5-2.0x faster than FlashAttention-2 in FP16), but the more interesting detail is the asynchrony model: they pipeline softmax and matmul across warp groups so neither unit sits idle. If you're running long-context workloads and haven't upgraded your attention kernel since 2023, this is the one that will make you feel the gap.

Jay Shah, Ganesh Bikshandi, et al. · 4 min read

From Issue #08 ↗

Read Last Tuesday's Issue

No signup required. Read the full issue free.

Five Links / Every Tuesday / No Noise / Curated by Hand / Vol. IV · No. 08 / arxiv · GitHub · Substack / For Data Scientists / Not Another Feed

Issue #08 · Feb 25, 2026

This week's five. Typeset exactly as they land in your inbox.

GitHub · pytorch/ao · Tooling

torchao: PyTorch Architecture Optimization

Meta quietly shipped the quantization library that makes INT8 inference actually ergonomic. Two lines take you from fp32 to int8 dynamic quant with no accuracy drop on standard benchmarks, and unlike bitsandbytes, it doesn't require a custom CUDA build. The repo has been starred 3k times in two weeks. Your inference costs will feel it.

PyTorch Core Team · 6 min + repo
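For readers who want the mechanics behind "int8 dynamic quant", here is a minimal sketch in plain Python. It is not torchao's API: every function name below is hypothetical, and it assumes the simplest scheme (symmetric, one scale per tensor). The weights are quantized once ahead of time, while the activation scale is computed at call time, which is the "dynamic" part.

```python
# Illustrative only: symmetric per-tensor int8 quantization in plain Python.
# This is NOT torchao's API; every name here is hypothetical.

def quantize_int8(values):
    """Map floats to int8 codes using a single absmax-derived scale."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    # Clamp to the symmetric int8 range [-127, 127].
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]


def int8_dot(w_codes, w_scale, activations):
    """Dot product with pre-quantized weights and dynamically quantized activations."""
    a_codes, a_scale = quantize_int8(activations)  # scale chosen per call
    acc = sum(w * a for w, a in zip(w_codes, a_codes))  # integer accumulation
    return acc * w_scale * a_scale  # rescale once at the end


weights = [0.50, -1.27, 0.03, 0.91]
activations = [1.0, 2.0, -0.5, 0.25]

w_codes, w_scale = quantize_int8(weights)  # done once, ahead of time
approx = int8_dot(w_codes, w_scale, activations)
exact = sum(w * a for w, a in zip(weights, activations))
```

Comparing `approx` with `exact` on your own tensors is a quick way to see how much error a single per-tensor scale introduces before you reach for a real library.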

Blog · Eugene Yan · Systems

Patterns for Building LLM-Based Systems & Products

The most grounded post on production LLM systems published this year. Yan catalogues evals, fallbacks, guardrails, and caching patterns from actual shipped products — not demos. Section 4 on "model cascades" alone is worth the read. Drop this in your team's Notion before Thursday's architecture review.

Eugene Yan · 18 min read
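If "model cascade" is a new term for you, the idea can be sketched in a few lines. This is one possible reading, not code from the post: the model functions, the confidence scores, and the threshold are all hypothetical stand-ins. Queries go to the cheapest model first, escalate only when confidence is low, and the final answer is cached.

```python
# Hypothetical sketch of a model cascade with a cache; not code from the post.
from typing import Callable, Dict, List, Tuple

# A "model" here is any callable returning (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def make_cascade(models: List[Model], threshold: float) -> Callable[[str], str]:
    """Try models cheapest-first and accept the first confident answer."""
    cache: Dict[str, str] = {}

    def answer(query: str) -> str:
        if query in cache:  # caching pattern: skip all model calls
            return cache[query]
        result = ""
        for model in models:
            result, confidence = model(query)
            if confidence >= threshold:
                break  # confident enough, stop escalating
        # If no model was confident, the last model's answer is the fallback.
        cache[query] = result
        return result

    return answer


calls = {"small": 0, "large": 0}


def small_model(query: str) -> Tuple[str, float]:
    """Toy cheap model: confident only on short queries."""
    calls["small"] += 1
    confidence = 0.9 if len(query) < 10 else 0.3
    return f"small:{query}", confidence


def large_model(query: str) -> Tuple[str, float]:
    """Toy expensive model: always confident."""
    calls["large"] += 1
    return f"large:{query}", 0.95


ask = make_cascade([small_model, large_model], threshold=0.8)
```

Calling `ask` twice with the same query hits the cache the second time, so the `calls` counters make the cost savings easy to inspect.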

arxiv · stat.ML · Research

Scaling Laws for Reward Model Overoptimization

If you're fine-tuning anything with human feedback, this paper explains why optimizing against your reward model eventually lowers true reward, and gives you the KL-divergence budget math to predict when. The proxy-gaming curve in Figure 3 is the most useful single chart for anyone doing RLHF at scale. This one's been in my "re-read" folder for three weeks.

Gao, Schulman, Hilton · 12 min read
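The "KL budget math" can be sketched as follows. The functional forms are the ones I recall from the paper: with d = sqrt(KL(policy || initial policy)), gold reward is modeled as d(α − βd) for best-of-n sampling and d(α − β log d) for RL. The coefficient values below are made up for illustration, since the real ones are fitted per setup, and the helper names are hypothetical.

```python
# Sketch of the overoptimization curves; coefficients are illustrative only.
import math


def gold_reward_bon(d: float, alpha: float, beta: float) -> float:
    """Best-of-n form: R(d) = d * (alpha - beta * d), with d = sqrt(KL)."""
    return d * (alpha - beta * d)


def gold_reward_rl(d: float, alpha: float, beta: float) -> float:
    """RL form: R(d) = d * (alpha - beta * log d), defined for d > 0."""
    return d * (alpha - beta * math.log(d))


def peak_distance_bon(alpha: float, beta: float) -> float:
    """Set dR/dd = alpha - 2*beta*d to zero."""
    return alpha / (2.0 * beta)


def peak_distance_rl(alpha: float, beta: float) -> float:
    """Set dR/dd = alpha - beta*log(d) - beta to zero."""
    return math.exp(alpha / beta - 1.0)


def kl_budget(d: float) -> float:
    """Convert the distance d back to a KL budget in nats (KL = d^2)."""
    return d * d


ALPHA, BETA = 1.0, 0.5  # made-up values, not fitted coefficients

d_bon = peak_distance_bon(ALPHA, BETA)
d_rl = peak_distance_rl(ALPHA, BETA)
print(f"best-of-n: gold reward peaks near KL = {kl_budget(d_bon):.2f} nats")
print(f"RL:        gold reward peaks near KL = {kl_budget(d_rl):.2f} nats")
```

Past the peak, further optimization against the proxy buys more KL but less gold reward, which is the point at which you would stop or add a stronger KL penalty.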

“Link five is always the wildcard — the one that has nothing to do with your current sprint and everything to do with where the field is going in eighteen months. That's the one readers forward.”

From the Editor

A Tuesday in February

How five links weave through a real workday.

7:14 AM

Morning

Inbox

SIGNAL · Vol. IV · No. 08

The notification lands.

You're still on your first coffee. Signal drops at 7 AM every Tuesday — same time, every week, no exceptions. The subject line is always just the date. You open it because you already know the format: five links, two sentences each, zero padding.

7:16 AM

Research

Link 01

arxiv · cs.LG · 4 min

You read the first annotation.

Flash Attention 3. The two sentences tell you exactly why it matters for your H100 workload — not the paper abstract, not a summary, the actual implication for the code you shipped last sprint. You tab it open. You'll finish it before standup.

11:40 AM

Shared

Link 03

Blog · Eugene Yan · 18 min

You drop it into #ml-eng.

"Has everyone seen this? Section 4 on model cascades." Three replies in four minutes. Your tech lead adds it to the architecture doc. The annotation you forwarded is better than anything the team would have written themselves — because it was written for someone exactly like you.

3:00 PM

Shipped

Link 02

GitHub · pytorch/ao · 6 min

The repo is already cloned.

torchao. You ran the two-line INT8 quant on your inference pipeline during lunch. 31% cost reduction on your staging environment. You didn't find it on Hacker News — it was buried in a weekend GitHub trending list that you'd stopped checking six months ago. Signal found it for you.

The algorithm gives you everything.
Signal gives you five.

Your Tuesday feed

01 · r/MachineLearning post with 847 upvotes and no context

02 · arxiv alert: 23 new papers matching "attention mechanism"

03 · HN thread: "Ask HN: What ML papers should I read?"

04 · Twitter/X thread: "🧵 10 papers every ML engineer should read"

05 · Medium post: "Top 15 Data Science Tools for 2026"

06 · LinkedIn: "Exciting to share my journey into AI..."

07 · YouTube: "GPT-5 Explained in 47 minutes"

08 · Newsletter with 12 links and no annotations

09 · Substack: "The future of AI is..." (paywalled)

10 · Discord: "Has anyone tried the new Llama fine-tune?"

+ 847 more items today

Signal · Issue #08

Flash Attention 3: up to 75% utilization on H100s. Here's why your inference pipeline should care this sprint.

arxiv · cs.LG

torchao ships INT8 quant in two lines. No custom CUDA build. Your inference costs will feel it.

GitHub · pytorch/ao

Eugene Yan's production LLM patterns. Section 4 on model cascades is the one for your architecture doc.

Blog · Eugene Yan

RLHF reward model overoptimization — the KL-divergence budget math you need before scaling.

arxiv · stat.ML

The wildcard: a 2019 database paper that every ML engineer building feature stores is about to rediscover.

VLDB · Systems

Curated by a human. Annotated for your context.

5 links per issue

104 issues published

2 min to read the whole thing

Issue #08 · Available Now

Read before Thursday's standup.

The full issue is free. Five annotated links, each one chosen because it changes something about how you work this week. No email required to read it.

Read Last Tuesday's Issue

Opens in a hosted archive. No paywall. No signup wall. Just the dispatch.

Get Next Tuesday's Issue

Vol. IV · No. 09 drops in 6 days