Needle: We Distilled Gemini Tool Calling into a 26M Model

  • Hacker News
  • Published: May 12, 2026
  • First seen: May 12, 2026

Product Summary

Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. We were always frustrated by the little effort made towards building agentic models that run on b...
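
To put the quoted throughput figures in perspective, here is a rough back-of-the-envelope sketch of per-call latency. It assumes the 6000 tok/s prefill and 1200 tok/s decode rates are sustained and ignores model loading, tokenization, and runtime overhead. The prompt and output token counts are hypothetical values chosen for illustration, not figures from the post.

```python
# Throughput figures as quoted in the Hacker News post.
PREFILL_TOK_PER_S = 6000
DECODE_TOK_PER_S = 1200


def estimate_latency_ms(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate latency in milliseconds for one tool call.

    Simplifying assumption: prefill and decode each run at a constant
    rate, and nothing else contributes to latency.
    """
    prefill_s = prompt_tokens / PREFILL_TOK_PER_S
    decode_s = output_tokens / DECODE_TOK_PER_S
    return (prefill_s + decode_s) * 1000.0


if __name__ == "__main__":
    # Hypothetical request: 600 prompt tokens (tool schemas plus the user
    # query) and 60 generated tokens (the function call itself).
    print(f"{estimate_latency_ms(600, 60):.0f} ms")
```

Under these assumptions, prefill takes 100 ms and decode 50 ms. Real-world latency depends on the device and runtime.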

Best for

Developers who want to review the code directly: a public GitHub repo is available. The primary discovery source is Hacker News.

Why it matters

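Needle packs function calling (tool use) into a 26M-parameter model that reportedly runs on consumer devices. So far the primary discovery source is Hacker News.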

Key Features

  • Primary public product URL is https://github.com/cactus-compute/needle.
  • Open-source 26M-parameter function-calling (tool use) model from Cactus, reported to run at 6000 tok/s prefill and 1200 tok/s decode on consumer devices.
  • GitHub repository is linked as cactus-compute/needle.
  • Listed on Hacker News as "Needle: We Distilled Gemini Tool Calling into a 26M Model".

Use Cases

  • Primary discovery source is Hacker News.
  • A public GitHub repo is available for direct technical review.
  • Hacker News mention is recent (2026-05-12).

Why Now

"Needle: We Distilled Gemini Tool Calling into a 26M Model" is appearing on fresh discovery surfaces, so it is worth reviewing while momentum is still forming. Confidence is currently medium (49/100); treat this as an early signal rather than a settled trend.

Intelligence Breakdown

Facts

  • Listed on Hacker News as "Needle: We Distilled Gemini Tool Calling into a 26M Model".
  • Source description: Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. We were always frustrated by the litt....
  • Source publish date is 2026-05-12.
  • GitHub repository is linked as cactus-compute/needle.
  • Primary public product URL is https://github.com/cactus-compute/needle.

Signals

  • Hacker News mention is recent (2026-05-12).
  • A public GitHub repo is available for direct technical review.
  • Primary discovery source is Hacker News.

Inference

  • Public code access can lower evaluation friction for developer audiences.

Unknowns

  • Documentation is not explicitly linked in the current allowed evidence set.
  • No tagline is stored on the current product record.
  • Pricing details are not explicitly linked in the current allowed evidence set.
  • Recent changelog or release history is not explicitly linked in the current allowed evidence set.
  • Release cadence cannot be confirmed unless a changelog or release link is explicitly provided.

Evidence Snapshots

Needle: We Distilled Gemini Tool Calling into a 26M Model

Listed on Hacker News as "Needle: We Distilled Gemini Tool Calling into a 26M Model".

Needle: We Distilled Gemini Tool Calling into a 26M Model GitHub repository

GitHub repository is linked as cactus-compute/needle.

Needle: We Distilled Gemini Tool Calling into a 26M Model official profile

Primary public product URL is https://github.com/cactus-compute/needle.

Original Sources