Clippy – screen-aware voice AI in the browser

✨ AI Summary

A friend and I built a browser prototype that answers questions about whatever’s on your screen using getDisplayMedia, client-side wake-word detection, and server-side multimodal inference.

Hard parts:
– Getting the model to point to specific UI elements
– Keeping it coherent across multi-step workflows (“Help me create a sword in Tinkercad”)
– Preventing the infinite mirror effect and confusion between window vs full-screen sharing
– Keeping voice → screenshot → inference → voice latency low enough to feel conversational

We packaged it as “Clippy” for fun, but the real experiment is letting a model tool-call fresh screenshots to help it gather more context. One practical use case is remote tech support: I'm sending this to my mom next time she calls instead of screen sharing. Curious what breaks.
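
The tool-calling idea in the post (the model asking for a fresh screenshot when it needs more context) could be sketched roughly as below. This is a minimal sketch under assumptions, not the authors' implementation: the `Screenshot` and `ModelReply` shapes, the `take_screenshot` tool name, the `answerWithScreenshots` function, and the `maxShots` cap are all hypothetical. The `capture` and `callModel` dependencies are injected so the loop can run without a browser. In a real page, `capture` would presumably grab a frame from the getDisplayMedia stream and `callModel` would call the server-side inference endpoint.

```typescript
// Hypothetical shapes: the post does not describe the real message format.
type Screenshot = { takenAt: number; dataUrl: string };

type ModelReply =
  | { kind: "tool_call"; tool: "take_screenshot" }
  | { kind: "answer"; text: string };

type Deps = {
  // Assumption: in a browser this would draw a frame of the shared
  // screen (from getDisplayMedia) onto a canvas and export it.
  capture: () => Promise<Screenshot>;
  // Assumption: stand-in for the server-side multimodal inference call.
  callModel: (question: string, shots: Screenshot[]) => Promise<ModelReply>;
};

// Answer one spoken question. The model may request fresh screenshots,
// but at most `maxShots` frames are captured per turn.
async function answerWithScreenshots(
  question: string,
  deps: Deps,
  maxShots = 3,
): Promise<{ text: string; shotsUsed: number }> {
  // Always start the turn with one frame of the current screen.
  const shots: Screenshot[] = [await deps.capture()];

  while (true) {
    const reply = await deps.callModel(question, shots);

    if (reply.kind === "answer") {
      return { text: reply.text, shotsUsed: shots.length };
    }

    if (shots.length >= maxShots) {
      // Screenshot budget exhausted: stop looping and give a fallback answer.
      return {
        text: "I could not work that out from the screen.",
        shotsUsed: shots.length,
      };
    }

    // The model asked for more context: capture a fresh frame and retry.
    shots.push(await deps.capture());
  }
}
```

Capping the number of screenshots per turn is one way to keep the voice → screenshot → inference → voice round trip bounded, at the cost of an occasional fallback answer when the model keeps asking for more frames.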

Recommended for

Developers, product teams, and technically focused founders.

Use cases

Review original launch sources before making adoption decisions.
Track community momentum from Product Hunt, GitHub, and Hacker News.

Original source

Hacker News discussion →