mdcompress strips the parts a model doesn't need (badges, boilerplate, hedging prose, dead tables of contents, repeated context) from your READMEs and agent-context files and writes the result to a hidden, token-optimized mirror. It ships 35 deterministic rules, an optional LLM rewriter, and a faithfulness audit.
Every time you hand a model a README or a docs tree, you pay for the badges, the table of contents, the “it is worth noting that” padding — tokens that carry no meaning the model can use.
Agent-context files like CLAUDE.md and AGENTS.md are worse: they're re-sent on every call, so the same bloat is billed over and over, all session long.
mdcompress produces a meaning-preserving mirror that's cheaper to read — without you hand-editing a single doc.
Pick how aggressive to be. Tiers stack: each includes everything below it. The default, Tier 2, is deterministic except where you opt into the LLM rewriter.
Tier 1 (Safe). Deterministic, lossless-to-meaning rules: strip frontmatter, badges, HTML comments, dead tables of contents, tracking params, shell prompts in code fences. Nothing that changes what the document says.
Tier 2. Adds prose-simplification and cross-file work: strip hedging phrases and admonition prefixes, drop benchmark narration, factor repeated paragraphs and code blocks across the repo into back-references. The default.
Tier 3. Section-level rewriting with a language model, each section gated by a faithfulness audit (a separate model answers questions about the original and the rewrite; rewrites that drift are rejected). CLI-only.
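To make the first tier concrete, here is a minimal sketch of what a few "lossless-to-meaning" rules could look like as plain text transformations. The patterns and the function names `strip_safe` and `strip_shell_prompts` are assumptions written for illustration, not mdcompress's actual rules, and the sketch does not protect code fences from the comment and badge patterns.

```python
import re

# A Markdown fence marker, built this way so the example itself stays fence-safe.
FENCE = "`" * 3

# Hypothetical patterns for illustration; the real rule set may differ.
FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)   # YAML block at file start
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)          # single- or multi-line comments
BADGE_LINE = re.compile(r"\[!\[[^\]]*\]\([^)]*\)\]\([^)]*\)[ \t]*\n?")  # linked image badge


def strip_shell_prompts(text: str) -> str:
    """Drop a leading "$ " from lines inside fenced code blocks only."""
    out, in_fence = [], False
    for line in text.split("\n"):
        if line.startswith(FENCE):
            in_fence = not in_fence          # entering or leaving a fence
        elif in_fence and line.startswith("$ "):
            line = line[2:]                  # keep the command, drop the prompt
        out.append(line)
    return "\n".join(out)


def strip_safe(text: str) -> str:
    """Apply the illustrative rules in a fixed order."""
    text = FRONTMATTER.sub("", text)
    text = HTML_COMMENT.sub("", text)
    text = BADGE_LINE.sub("", text)
    return strip_shell_prompts(text)
```

Each rule only removes material that carries no instruction or fact for a model, which is the property the Safe tier promises; prose outside code fences is left word for word.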
Most of the work is plain, predictable text transformation: no model required, no surprises. Rules run in a fixed sequence, and four of the most aggressive are opt-in even when their tier is active.
The full rule list lives in the README.
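The combination of stacked tiers, a fixed rule order, and opt-in rules can be sketched as a small pipeline. Everything below is hypothetical: the `Rule` type, the `run` function, and the three rule names are invented for illustration and are not taken from mdcompress's source.

```python
import re
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    tier: int                       # lowest tier at which the rule is active
    apply: Callable[[str], str]
    opt_in: bool = False            # opt-in rules also need explicit enabling


# Hypothetical rules; position in this list is the execution order.
PIPELINE: List[Rule] = [
    Rule("strip-html-comments", 1,
         lambda t: re.sub(r"<!--.*?-->", "", t, flags=re.DOTALL)),
    Rule("strip-hedging", 2,
         lambda t: t.replace("It is worth noting that ", "")),
    Rule("drop-blank-runs", 2,
         lambda t: re.sub(r"\n{3,}", "\n\n", t), opt_in=True),
]


def run(text: str, tier: int,
        enabled: FrozenSet[str] = frozenset(),
        disabled: FrozenSet[str] = frozenset()) -> Tuple[str, List[str]]:
    """Run every eligible rule in list order; return the text and applied names."""
    applied = []
    for rule in PIPELINE:
        if rule.tier > tier or rule.name in disabled:
            continue                # tiers stack: a tier includes all lower ones
        if rule.opt_in and rule.name not in enabled:
            continue                # active tier alone is not enough for opt-in rules
        text = rule.apply(text)
        applied.append(rule.name)
    return text, applied
```

Calling `run(text, tier=2)` in this sketch applies the first two rules only; the third runs when its name is also passed in `enabled`. Returning the list of applied rule names is one simple way to support a per-rule breakdown.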
Numbers from a 20-repo benchmark corpus on Tier 2. mdcompress is strongest on marketing-heavy READMEs, repeated agent context, and generated command output — and honest about where it isn't.
It helps less on dense technical reference where most tokens are code, API names, or tables — expect single-digit savings there, and check per-rule diffs before enabling aggressive rules broadly. Full methodology in BENCHMARKS.md, or run the live benchmarks.
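For checking savings on your own documents, a rough percentage can be computed as in the sketch below. The token count here is a crude word-and-punctuation approximation chosen for illustration; it is an assumption, not the tokenizer or the methodology used in BENCHMARKS.md.

```python
import re


def rough_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: count words and punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def savings_pct(original: str, compressed: str) -> float:
    """Percentage of (approximate) tokens removed by compression."""
    before = rough_tokens(original)
    if before == 0:
        return 0.0                  # avoid dividing by zero on empty input
    return 100.0 * (before - rough_tokens(compressed)) / before
```

Comparing this figure per file, before turning on aggressive rules repo-wide, gives a quick sense of whether a document is prose-heavy enough to benefit.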
mdcompress's single-document rules are compiled to WebAssembly and shipped into this site's Lab — paste a README, pick a tier, and watch the tokens drop, with a per-rule breakdown. Your text never leaves the tab.
The browser build runs the single-document rules. Cross-file deduplication and the Tier-3 LLM rewriter need the CLI — that's what the quickstart below is for.
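Cross-file deduplication needs to see the whole repo, which is why it cannot run on a single pasted document. A minimal sketch of the idea, assuming paragraph-level hashing and a made-up back-reference format (`[same as in <path>]`); the function name `dedup_paragraphs` and the `min_chars` cutoff are likewise hypothetical:

```python
import hashlib
from typing import Dict


def dedup_paragraphs(files: Dict[str, str], min_chars: int = 40) -> Dict[str, str]:
    """Replace paragraphs already seen in an earlier file with a back-reference."""
    seen: Dict[str, str] = {}       # paragraph hash -> path of first occurrence
    result: Dict[str, str] = {}
    for path in sorted(files):      # sorted so "first occurrence" is deterministic
        kept = []
        for para in files[path].split("\n\n"):
            # Normalize whitespace so re-wrapped copies hash the same.
            normalized = " ".join(para.split())
            key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
            first = seen.get(key)
            if len(para) >= min_chars and first is not None and first != path:
                kept.append(f"[same as in {first}]")   # hypothetical reference format
            else:
                seen.setdefault(key, path)
                kept.append(para)
        result[path] = "\n\n".join(kept)
    return result
```

Short paragraphs are left alone in this sketch because a back-reference would cost about as much as the text it replaces.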
Install the CLI, set it up once per repo, and compress. Nothing leaves your machine; the originals are never mutated — output goes to a hidden mirror.
    # one-line installer
    curl -fsSL https://raw.githubusercontent.com/dhruv1794/mdcompress/main/install.sh | sh

    # or with the Go toolchain
    go install github.com/dhruv1794/mdcompress/cmd/mdcompress@latest
    # set up config in any repo
    mdcompress init

    # compress every tracked .md
    mdcompress run --all

    # see cumulative savings
    mdcompress status
    version: 1
    tier: aggressive
    rules:
      disabled:
        - dedup-cross-section
        - collapse-example-output
    eval:
      backend: ollama
      model: llama3.1:8b
      threshold: 0.95
mdcompress is v3.2: the deterministic engine is the core, but it reaches further when you want it to.
Expose compression to an AI agent over the Model Context Protocol — the agent compresses context on demand instead of re-sending bloated docs.
Tier-3 section rewriting guarded by a faithfulness check (default threshold 0.95) so aggressive prose edits can’t silently change meaning.
Register your own rules against the same fixed-order pipeline — the 35 built-in rules are just the defaults.
`mdcompress web` serves an interactive test page with per-rule diffs, token/byte stats, and a cost estimate — all on localhost.
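The faithfulness check described above (a separate model answers questions about the original and the rewrite, and rewrites that drift are rejected) can be sketched as a simple gate. How mdcompress actually scores agreement is not stated here, so the exact-match comparison, the `Answerer` callable standing in for the evaluation model, and the function names are all assumptions; only the 0.95 default threshold comes from the text.

```python
from typing import Callable, Sequence

# Stand-in for the evaluation model: (document_text, question) -> answer.
Answerer = Callable[[str, str], str]


def faithfulness(original: str, rewrite: str,
                 questions: Sequence[str], answer: Answerer) -> float:
    """Fraction of questions answered the same way from both versions."""
    if not questions:
        return 0.0                  # nothing verified, so nothing is trusted
    agree = sum(
        1 for q in questions
        if answer(original, q).strip().lower() == answer(rewrite, q).strip().lower()
    )
    return agree / len(questions)


def gate(original: str, rewrite: str, questions: Sequence[str],
         answer: Answerer, threshold: float = 0.95) -> str:
    """Keep the rewrite only if it scores at or above the threshold."""
    score = faithfulness(original, rewrite, questions, answer)
    return rewrite if score >= threshold else original
```

Falling back to the original section on a failed check means a bad rewrite costs tokens but never changes what the mirror says.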
MIT licensed. Runs in your browser or your shell — nothing uploaded.