Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. We were always frustrated by the little effort made towards building agentic models that run on b...
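To put the quoted throughput in perspective, here is a back-of-envelope latency sketch. It assumes the 6000 tok/s prefill and 1200 tok/s decode figures hold for the whole request and ignores model loading, tokenization, and other overhead. The function name and the example token counts are hypothetical, chosen only for illustration.

```python
# Throughput figures quoted in the Hacker News post.
PREFILL_TOK_PER_S = 6000
DECODE_TOK_PER_S = 1200


def estimate_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Rough end-to-end latency: prefill time plus decode time.

    Assumption: throughput is constant; load time and overhead are ignored.
    """
    prefill_s = prompt_tokens / PREFILL_TOK_PER_S
    decode_s = output_tokens / DECODE_TOK_PER_S
    return prefill_s + decode_s


# Hypothetical request: a 600-token prompt (tool schemas plus user query)
# and a 60-token tool call as output.
latency = estimate_latency_s(prompt_tokens=600, output_tokens=60)
print(f"{latency * 1000:.0f} ms")  # prints "150 ms"
```

Under these assumptions, prefill accounts for 100 ms and decode for 50 ms, so a short tool call would complete in roughly 150 ms.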
Best for
Primary discovery source is Hacker News.
A public GitHub repo is available for direct technical review.
Why it matters
Primary discovery source is Hacker News.
Key Features
Primary public product URL is https://github.com/cactus-compute/needle.
GitHub repository is linked as cactus-compute/needle.
Listed on Hacker News as "Needle: We Distilled Gemini Tool Calling into a 26M Model".
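For readers unfamiliar with function calling (tool use), the sketch below shows the general pattern: a model emits a structured call, and the host application parses it and dispatches to real code. This is a generic illustration, not Needle's actual output format, which the source does not describe. The JSON shape, the `get_weather` tool, the `TOOLS` registry, and the `dispatch` helper are all hypothetical.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"(stub) weather for {city}"


# Registry mapping tool names to callables (hypothetical).
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(model_output: str) -> Any:
    """Parse a JSON tool call and invoke the matching tool.

    Assumed call shape: {"name": <tool name>, "arguments": {<kwargs>}}.
    """
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for text a function-calling model might produce.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
print(result)  # prints "(stub) weather for Paris"
```

Rejecting unknown tool names before calling anything keeps a malformed or unexpected model output from reaching application code.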
Use Cases
Hacker News mention is recent (2026-05-12).
Why Now
"Needle: We Distilled Gemini Tool Calling into a 26M Model" was posted to Hacker News on 2026-05-12, so it is worth reviewing while momentum is still forming. Confidence is currently medium (49/100); treat this as an early signal rather than a settled trend.
Intelligence Breakdown
Facts
Listed on Hacker News as "Needle: We Distilled Gemini Tool Calling into a 26M Model".
Source description: Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. We were always frustrated by the litt....
Source publish date is 2026-05-12.
GitHub repository is linked as cactus-compute/needle.
Primary public product URL is https://github.com/cactus-compute/needle.
Signals
Hacker News mention is recent (2026-05-12).
A public GitHub repo is available for direct technical review.
Primary discovery source is Hacker News.
Inference
Public code access can lower evaluation friction for developer audiences.
Unknowns
Documentation is not explicitly linked in the current allowed evidence set.
No tagline is stored on the current product record.
Pricing details are not explicitly linked in the current allowed evidence set.
Recent changelog or release history is not explicitly linked in the current allowed evidence set.
Release cadence cannot be confirmed unless a changelog or release link is explicitly provided.
Evidence Snapshots
Needle: We Distilled Gemini Tool Calling into a 26M Model
Listed on Hacker News as "Needle: We Distilled Gemini Tool Calling into a 26M Model".
Needle: We Distilled Gemini Tool Calling into a 26M Model GitHub repository
GitHub repository is linked as cactus-compute/needle.
Needle: We Distilled Gemini Tool Calling into a 26M Model official profile
Primary public product URL is https://github.com/cactus-compute/needle.