llamafile 0.10.0 rebuilt, with Qwen3.5, lfm2, and Anthropic API support
Hi all, I'm Peter, a Staff Engineer at Mozilla.ai, and I want to share our idea for a standard for shared agent learning. Conceptually it fits my mental model as a Stack Overflow for agents. The project explores whether agents (any agent, any model) can propose 'knowledge units' (KUs) in a standard schema, based on gotchas they run into during use, and proactively query existing KUs for insights they can verify and confirm when those insights prove useful.

It's currently very much a PoC, with a loftier proposal in the repo: we're trying to iterate from local use up to team level, and ideally eventually reach some kind of public commons. At the team level (see our Docker Compose example), you configure your coding agent to point at the team's API address so KUs are sent there instead, where a human in the loop (HITL) can review them in a browser UI before they're allowed to appear in queries from other agents on your team.

We're learning a lot even from using it locally on various internal repos, not just about the kinds of KUs it generates, but also, from a UX perspective, about making it easy to start using and to approve KUs in the browser dashboard. There are bigger, more complex problems to solve in the future around data privacy, governance, etc., but for now we're focused on getting something people can see real value from quickly in their day-to-day work.
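The post doesn't show the KU schema itself, but conceptually a knowledge unit carries a gotcha, the context it applies in, and a confidence score that rises as other agents confirm it. A minimal hypothetical sketch of that idea (all field names and the confidence increment are assumptions for illustration, not cq's actual schema):

```python
from dataclasses import dataclass


@dataclass
class KnowledgeUnit:
    """Hypothetical knowledge unit; fields are illustrative, not cq's schema."""
    title: str               # short description of the gotcha
    body: str                # what the agent should know
    context: str             # when this applies (task, tool, repo pattern)
    confidence: float = 0.5  # raised each time another agent confirms the KU
    confirmations: int = 0

    def confirm(self) -> None:
        # A verifying agent found the KU useful; nudge confidence toward 1.0.
        self.confirmations += 1
        self.confidence = min(1.0, self.confidence + 0.1)


ku = KnowledgeUnit(
    title="GitHub Actions versions in training data are stale",
    body="Check the marketplace for the latest major version before pinning an action.",
    context="writing GitHub Actions workflows",
)
ku.confirm()
```

In this sketch, confirmation is the signal that distinguishes a verified KU from a freshly proposed one, mirroring the propose → query → verify → confirm loop described above.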
Tech stack:

* Skills - markdown
* Local Python MCP server (FastMCP) managing a local SQLite knowledge store
* Optional team API (FastAPI, Docker) for sharing knowledge across an org
* Installs as a Claude Code plugin or OpenCode MCP server
* Local-first by default; your knowledge stays on your machine unless you opt into team sync by setting the address in config
* OSS (Apache 2.0 licensed)

Here's an example of something that seemed straightforward: when asked to write a GitHub Action, Claude Code often used actions that were multiple major versions out of date because of its training data. In this case I told the agent what I saw when I reviewed the GitHub Action YAML file it created, and it proposed a knowledge unit to be persisted. Next time, in a completely different repo using OpenCode and an OpenAI model, the cq skill was used up front before the task started; it retrieved the gotcha about major versions in training data, checked GitHub proactively, and used the correct, latest major versions. It then confirmed the KU, increasing its confidence score.

Some folks might say: well, there's a CLAUDE.md in your repo, or in ~/.claude/. But we're looking further than that: we want this to be available to all agents and all models, and, maybe more importantly, we don't want to stuff AGENTS.md or CLAUDE.md with loads of rules that lead to unpredictable behaviour. This is targeted information on a particular task, which seems a lot more useful.

Right now it can be installed locally as a plugin for Claude Code and OpenCode:

claude plugin marketplace add mozilla-ai/cq
claude plugin install cq

This captures data in your local ~/.cq/local.db (the data doesn't get sent anywhere else). We'd love feedback on this; the repo is open and public, so GitHub issues are welcome.
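The local store is described as SQLite managed by a FastMCP server, though the actual schema isn't shown. A minimal sketch of the pattern — persist proposed KUs, query them by keyword, and bump confidence on confirmation (table name, columns, and the naive LIKE match are assumptions; the real store could use FTS or embeddings):

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # The real plugin uses ~/.cq/local.db; in-memory here for the sketch.
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS knowledge_units (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        body TEXT NOT NULL,
        confidence REAL DEFAULT 0.5)""")
    return conn


def propose(conn: sqlite3.Connection, title: str, body: str) -> int:
    cur = conn.execute(
        "INSERT INTO knowledge_units (title, body) VALUES (?, ?)", (title, body))
    conn.commit()
    return cur.lastrowid


def query(conn: sqlite3.Connection, term: str) -> list:
    # Naive keyword match for the sketch.
    return conn.execute(
        "SELECT id, title, confidence FROM knowledge_units WHERE title LIKE ?",
        (f"%{term}%",)).fetchall()


def confirm(conn: sqlite3.Connection, ku_id: int) -> None:
    # Confirmation raises confidence, capped at 1.0.
    conn.execute(
        "UPDATE knowledge_units SET confidence = MIN(1.0, confidence + 0.1) "
        "WHERE id = ?", (ku_id,))
    conn.commit()


conn = open_store()
ku_id = propose(conn, "GitHub Actions versions are stale in training data",
                "Check the latest major version before pinning an action.")
confirm(conn, ku_id)
hits = query(conn, "GitHub Actions")
```

The same three operations map naturally onto MCP tools, which is roughly what "a local Python MCP server managing a SQLite knowledge store" implies.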
We've posted on some of our social media platforms with a link to the blog post (below), so feel free to reply if you found it useful or ran into friction; we want to make this something that's accessible to everyone. Blog post with the full story: https://blog.mozilla.ai/cq-stack-overflow-for-agents/ GitHub repo: https://github.com/mozilla-ai/cq Thanks again for your time.
✨ AI Summary
llamafile 0.10.0 is a rebuilt release that adds support for the Qwen3.5 and lfm2 models and integration with the Anthropic API. It packages a large language model and its dependencies into a single executable file.
Who it's for
Developers who want simple local LLM deployment, researchers experimenting with different model architectures, and hobbyists who want to run AI models without complex setup.
Why it matters
It simplifies running and switching between different large language models by delivering them as a single, portable executable.
Key features
- Rebuilt architecture for improved performance and stability
- Integration of the Qwen3.5 and LFM2 language models
- Anthropic API compatibility
- Single-file distribution for easy deployment
Use cases
- A developer wants to test different AI models locally without managing multiple installations. They download llamafile, run different models such as Qwen3.5 directly from the executable, and compare outputs for a prototype application.
- A researcher needs to run a language model offline to analyze sensitive data. They use llamafile to run the model on a local machine, keeping the data inside a secure environment while still using advanced AI capabilities.
- An educator creates interactive AI demos for a classroom with limited internet access. They distribute llamafile executables with preconfigured models that students can run on their own laptops in hands-on workshops.
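Once a llamafile is running in server mode, it exposes an OpenAI-compatible HTTP API locally, so switching models can be as simple as pointing a standard chat-completions request at the local endpoint. A hedged sketch of such a client; the default port (8080), the launch command in the comment, and the model name are assumptions for illustration:

```python
import json
from urllib import request


def chat_payload(prompt: str, model: str = "qwen") -> dict:
    # Standard OpenAI-style chat-completions body, which llamafile's
    # built-in server accepts.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }


def ask(prompt: str, base_url: str = "http://localhost:8080") -> str:
    # Requires a llamafile already running in server mode on this machine.
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


payload = chat_payload("Summarize this file in one sentence.")
```

Because the endpoint shape is OpenAI-compatible, the same client code works unchanged whether the executable bundles Qwen3.5, LFM2, or another model.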