A deterministic middleware to compress LLM prompts by 50-80%
Hi HN, I’m working on Skillware, an open-source framework that treats AI capabilities as installable, self-contained modules.

I just added a "Prompt Token Rewriter" skill. It’s an offline heuristic middleware that strips conversational filler and redundant context from long agentic loops before they hit the LLM. It saves significant token costs and inference time, and it's 100% deterministic (no extra model calls).

We're building a registry of "Agentic Know-How" (Logic + Cognition + Governance). If you have a specialized tool for LLMs or want to see what a "standard" skill looks like, I'd love your feedback or a PR: https://github.com/ARPAHLS/skillware
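To make the idea concrete, here is a minimal sketch of what a deterministic, regex-based prompt rewriter could look like. The filler patterns and function name are illustrative assumptions, not Skillware's actual implementation; the repo is the source of truth.

```python
import re

# Hypothetical filler phrases; the real skill's heuristics may differ.
FILLER_PATTERNS = [
    r"\b(?:please note that|it is worth noting that)\s*",
    r"\b(?:basically|actually|really|very)\s+",
    r"\b(?:i think|i believe|in my opinion),?\s*",
]

def rewrite_prompt(prompt: str) -> str:
    """Deterministically compress a prompt: strip filler phrases,
    collapse runs of spaces/tabs, and drop exact-duplicate lines."""
    text = prompt
    for pattern in FILLER_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)

    seen, kept = set(), []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and stripped in seen:
            continue  # redundant context: drop a line we've already kept
        if stripped:
            seen.add(stripped)
        kept.append(re.sub(r"[ \t]+", " ", line.rstrip()))
    return "\n".join(kept).strip()
```

Because the pipeline is pure string manipulation (no model calls), the same input always yields the same output, which is what makes this kind of middleware auditable and cheap to run on every turn.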
- AI Agent
- Integrations
- LLM
✨ AI Summary
Skillware is an open-source framework that offers a "Prompt Token Rewriter" skill. This middleware deterministically compresses LLM prompts by removing conversational filler and redundant context, reducing token costs and inference time.
Best For
Developers building LLM applications, AI engineers optimizing inference costs, and teams running long agentic loops
Why It Matters
Skillware's Prompt Token Rewriter deterministically reduces LLM prompt size by 50-80%, saving costs and speeding up inference without additional model calls.
Key Features
- Compresses LLM prompts by 50-80%
- Strips conversational filler and redundant context
- Operates as offline heuristic middleware
- Reduces token costs and inference time
Use Cases
- A developer building a customer-support agent can integrate the Prompt Token Rewriter to cut the token count of each turn in a long conversation, lowering API costs and speeding up responses.
- A data scientist experimenting with LLM-based summarization can pre-process lengthy documents with the rewriter, stripping extraneous conversational elements before they reach the summarization model.
- An AI researcher building an autonomous agent with multiple internal reasoning steps can use deterministic compression so that intermediate thoughts and context passed between agent modules do not inflate the final prompt, keeping computational costs predictable.
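In each of these cases the rewriter acts as middleware sitting between the agent and the model. A minimal sketch of that wiring, assuming a callable LLM client and a rewrite function (all names here are hypothetical, not Skillware's real API):

```python
from typing import Callable

def with_prompt_rewriter(call_llm: Callable[[str], str],
                         rewrite: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap an LLM call so every outgoing prompt is compressed first.
    `call_llm` and `rewrite` are placeholder callables for illustration."""
    def wrapped(prompt: str) -> str:
        return call_llm(rewrite(prompt))
    return wrapped
```

Wrapping the client once means every turn of the loop is compressed without touching the agent's own logic, which is the usual appeal of a middleware design.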