On-device email classification, explained

Most email clients in 2026 that ship “AI features” run them in the cloud. Your message is sent to a server and processed there, and the result is returned. This is fast, capable, and well-understood. It also means your email content visits a server you do not own.

STAMP does classification on-device. We do not send email content to anyone's server for processing, ours or otherwise. This post explains how and why.

What classification means in this context

Before we get to the implementation, the definition. Classification, for STAMP, is the act of looking at an incoming email and assigning it tags:

Reply needed / informational / auto-generated.
Urgent / ordinary / whenever.
VIP sender / known contact / cold outreach.
Frustrated / neutral.
Newsletter / receipt / personal / work.

Each of those is a small decision the model makes. Together they drive triage.
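To make the tag set concrete, here is a minimal sketch of how it could be modeled as data. The dimension names (Action, Urgency, Sender, Tone, Kind) and the helper needs_attention_now are our own illustrative labels, not STAMP's internal types; only the tag values come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REPLY_NEEDED = "reply needed"
    INFORMATIONAL = "informational"
    AUTO_GENERATED = "auto-generated"


class Urgency(Enum):
    URGENT = "urgent"
    ORDINARY = "ordinary"
    WHENEVER = "whenever"


class Sender(Enum):
    VIP = "VIP sender"
    KNOWN = "known contact"
    COLD = "cold outreach"


class Tone(Enum):
    FRUSTRATED = "frustrated"
    NEUTRAL = "neutral"


class Kind(Enum):
    NEWSLETTER = "newsletter"
    RECEIPT = "receipt"
    PERSONAL = "personal"
    WORK = "work"


@dataclass(frozen=True)
class Tags:
    """One value per dimension: five small decisions for each message."""
    action: Action
    urgency: Urgency
    sender: Sender
    tone: Tone
    kind: Kind


def needs_attention_now(tags: Tags) -> bool:
    """Hypothetical triage rule: surface urgent mail that wants a reply."""
    return tags.action is Action.REPLY_NEEDED and tags.urgency is Urgency.URGENT
```

Keeping each dimension as its own small enum means a triage rule reads as a plain comparison on two or three fields rather than a lookup in one large label space.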

Why on-device

Three reasons.

One: privacy. Your investor's confidential thread does not need to visit a third-party server to be classified. The model can run on your Mac, and we never see the content.

Two: latency. A server round-trip is 50 to 300 ms depending on connection. On-device classification is sub-10 ms. For a triage layer that processes every incoming message, that adds up.

Three: offline. When your wifi drops, your inbox still works. Classification still runs. Tags still apply. We tested this on planes. It is nice.

What runs locally

A small transformer model. Quantized. Roughly 80 MB on disk. Loads into memory at app launch. Runs on Apple Silicon's Neural Engine when available, falls back to CPU on Intel.
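For readers who have not met quantization before, the sketch below shows one common scheme, symmetric 8-bit quantization, in plain Python. It is an illustration of the general idea only: this post does not say which scheme or bit width STAMP uses, and the 80-million-parameter figure in the usage note is a hypothetical chosen because it makes the arithmetic land on 80 MB.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate floats; each is off by at most half a step."""
    return [q * scale for q in quantized]


def size_mb(num_params, bits_per_weight):
    """Storage for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1_000_000
```

Under these assumptions, size_mb(80_000_000, 8) gives 80.0 while the same weights at 32 bits would take 320.0, which is the kind of saving that makes shipping a model inside an app practical.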

Inference time per message: 4 to 8 milliseconds. Batch processing 200 messages at once: about 1.2 seconds. Cold-start hit on the very first message after launch: under a second.
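The figures quoted in this post can be checked against each other with a few lines of arithmetic. The sketch below uses only those numbers; the one assumption is the cloud comparison, which treats every message as its own sequential round trip, something a real cloud client could avoid by batching.

```python
def per_message_ms(batch_seconds, batch_size):
    """Average cost per message when a batch is processed in one go."""
    return batch_seconds * 1000 / batch_size


def serial_round_trips_s(messages, round_trip_ms):
    """Total wait if each message needs its own server round trip.

    Assumption for illustration: strictly sequential, unbatched requests.
    """
    return messages * round_trip_ms / 1000
```

per_message_ms(1.2, 200) comes out at about 6 ms, inside the 4 to 8 ms range above. For the same 200 messages, serial_round_trips_s gives 10 seconds at 50 ms per round trip and 60 seconds at 300 ms.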

This is a small, narrow model. It is not GPT-4. It does one job (classify email into our tag set) and does it fast. The narrowness is the feature.

The training data question

You will reasonably ask: what was the model trained on?

We trained on a dataset of about 4 million labeled emails, most of them synthetic. None of it was collected from users. The sources were public corpora (Enron, after redaction), generated examples, and human-labeled samples from consenting employees.

We did not train on user email. We did not collect user email. We will never train on user email without explicit, paid, per-user opt-in under a clear data agreement. We have no plans to ask.

How tags improve over time

If we do not train on your email, how does the model learn that your CFO is VIP?

Three layers.

Layer one: the base model. Out of the box, the model classifies the average inbox correctly about 89 percent of the time. Tags like “newsletter” and “reply needed” follow patterns consistent enough across inboxes to be near-universal.

Layer two: local fine-tuning. When you press U to mark a thread urgent, or V to mark a sender VIP, STAMP records the correction. Locally. We use those corrections to fine-tune a small adapter that runs on top of the base model. The adapter is yours. It never leaves your machine.
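As a rough picture of what "an adapter on top of the base model" means, here is a toy stand-in. It is not real adapter fine-tuning and not STAMP's code: it simply keeps per-sender score offsets in memory and adds them to the base model's scores. The class LocalAdapter, its step parameter, and the example scores are all hypothetical.

```python
from collections import defaultdict


class LocalAdapter:
    """Toy stand-in for an on-device adapter: per-sender score offsets."""

    def __init__(self, step=1.0):
        self.step = step
        # (sender, label) -> additive offset; lives only in local memory.
        self.offsets = defaultdict(float)

    def record_correction(self, sender, label):
        """Called when the user overrides a tag, e.g. marks a sender VIP."""
        self.offsets[(sender, label)] += self.step

    def adjust(self, sender, base_scores):
        """Return the base model's scores plus any learned offsets."""
        return {
            label: score + self.offsets.get((sender, label), 0.0)
            for label, score in base_scores.items()
        }


def top_label(scores):
    """Pick the highest-scoring label."""
    return max(scores, key=scores.get)
```

The point of the shape is that the base model is never modified. Corrections accumulate in a small separate structure, so the base model can be replaced in an app update without discarding what was learned locally.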

Layer three: rule-based overrides. Some classifications you want to be deterministic. “Anything from this domain is always VIP” is a rule, not a model decision. STAMP supports rules for the cases where the model would be too soft.
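A rule of the "anything from this domain is always VIP" kind can be sketched in a few lines. The names DomainRule and apply_rules, and the dictionary representation of tags, are assumptions made for this example rather than STAMP's actual rule format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRule:
    """Deterministic override: mail from `domain` always gets `value`."""
    domain: str
    field: str
    value: str

    def matches(self, sender_address):
        # Compare the part after the last "@", case-insensitively.
        return sender_address.lower().rpartition("@")[2] == self.domain.lower()


def apply_rules(sender_address, model_tags, rules):
    """Start from the model's tags, then let matching rules overwrite them."""
    final = dict(model_tags)
    for rule in rules:
        if rule.matches(sender_address):
            final[rule.field] = rule.value
    return final
```

Because rules run after the model and overwrite its output, a matching rule always wins, which is what makes the result deterministic. This sketch matches the domain exactly, so a rule for board.example would not cover mail.board.example.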

Together, these get most users to about 96 to 98 percent accuracy on their personal triage within two weeks.

The tradeoffs

We are honest about the costs.

Smaller model than cloud. A model that fits in 80 MB is not a 200-billion-parameter cloud model. We give up some sophistication. For pure classification, that gap is small. For tasks like “summarize this thread,” the gap is real, which is why we do not ship a one-click summary.

Storage. The model adds 80 MB to the app. Most users do not notice. A few do.

Slower inference on Intel Macs. Apple Silicon is fast. Intel Macs have no Neural Engine, so classification runs on the CPU and is slower: about 25 ms per message on a 2018 Intel MacBook Pro instead of 5. Still fast enough to be invisible.
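The hardware split can be pictured as a simple backend choice. The sketch below is illustrative only: the backend labels are hypothetical, the per-message timings are the approximate figures from this post, and the check relies on Python's platform.machine(), which reports "arm64" for a native process on an Apple Silicon Mac.

```python
import platform

# Approximate per-message latency from this post, in milliseconds.
EXPECTED_MS = {"neural_engine": 5, "cpu": 25}


def pick_backend(machine=None):
    """Choose a compute backend label from the CPU architecture string."""
    machine = machine or platform.machine()
    return "neural_engine" if machine == "arm64" else "cpu"


def estimate_batch_seconds(messages, backend):
    """Rough time to classify a batch one message at a time."""
    return messages * EXPECTED_MS[backend] / 1000
```

With these numbers, a 200-message backlog takes roughly 1 second on the Neural Engine path and roughly 5 seconds on an Intel CPU.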

Updates. When we improve the base model, we ship a new version with the app update. Users on old app versions get older model behavior until they update. We try to keep updates monthly.

What we do not do on-device

To be precise about the privacy posture: classification runs on-device. Other things happen in the normal way.

Email transport. SMTP and IMAP and OAuth still happen. We do not run an email server. Your mail still passes through Google, Microsoft, or your provider as usual. We do not change that.
Sync state. When you mark a message read on your phone, that state syncs through your provider, like always.
Crash reports. If STAMP crashes, we collect a stack trace. The trace contains no email content. We have audited this carefully.
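One way to keep email content out of crash reports is to build them from an allowlist of fields, so anything not explicitly named is dropped. The sketch below shows that pattern. The field names are hypothetical, and this is not a description of STAMP's actual crash reporter.

```python
# Hypothetical field names: only these ever leave the machine.
ALLOWED_FIELDS = {"app_version", "os_version", "exception_type", "stack_frames"}


def build_crash_report(raw):
    """Keep allowlisted fields only; everything else is discarded."""
    return {key: value for key, value in raw.items() if key in ALLOWED_FIELDS}
```

An allowlist fails safe. If a later change attaches a new field to the crash context, such as a message subject, it is excluded by default rather than included by accident.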

What we do not do, anywhere, in any pipeline:

Send email content to our servers.
Send email content to any third-party service for processing.
Train models on your email without explicit opt-in.

The argument for cloud AI

To be fair to the other side: cloud-side AI has real advantages. Bigger models, faster updates, more capabilities (summarization, draft generation, semantic search). If those features are decisive for you, a cloud-AI client is the right choice.

We picked the privacy bet. The tradeoff is real. We think it is the right one for a primary email client.

Privacy is not a feature. It is the absence of a problem.

How to verify the claims

We are an early-access product. You should not take privacy claims on faith.

Our binary is signed and notarized. You can audit network traffic with Little Snitch. STAMP makes no outbound calls during classification.
We will publish a transparency report annually. The first one comes out in 2027.
We are committing to an audit by an independent security firm in the second half of 2026.
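Watching traffic from outside with a tool like Little Snitch is the check a user can run. The same property can also be tested from inside a codebase by making any connection attempt fail loudly while classification runs. The sketch below shows that idea in Python. It is an assumption-laden illustration and not STAMP's test suite: no_network and OutboundCallAttempted are hypothetical names, and it only intercepts socket.connect, not every possible network path.

```python
import socket
from contextlib import contextmanager


class OutboundCallAttempted(RuntimeError):
    """Raised when code under test tries to open a network connection."""


@contextmanager
def no_network():
    """Temporarily make socket.connect raise instead of connecting."""
    original_connect = socket.socket.connect

    def blocked_connect(self, address):
        raise OutboundCallAttempted(f"attempted connection to {address!r}")

    socket.socket.connect = blocked_connect
    try:
        yield
    finally:
        # Always restore the real method, even if the body raised.
        socket.socket.connect = original_connect
```

Wrapping a classification call in "with no_network():" turns a silent outbound request into a test failure, which is a useful complement to, not a replacement for, external traffic inspection.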

If we ever change the privacy posture, it will be in a release note in 18-point bold and an email to every user before the change ships.

Where to go from here

For the broader argument, see “Why your email client shouldn't read your email.” For the protocol layer, see “IMAP, SMTP, OAuth — what actually happens when you connect an account.”

Privacy by architecture. hello@stamp.email
