I ran a language model on a PS2
The PS2's Emotion Engine has 32 MB of main RAM total, so the trick is streaming weights from the CD-ROM one matrix at a time during the forward pass: only the activations, the KV cache, and the embeddings live in RAM. That means models bigger than RAM can still run; they just read more from disc. I had to build a custom quantized weight format (PSNT), handle endianness quirks, write a tokenizer pipeline, and write most of a PS2 SDK from scratch (releasing that separately). The model itself is also custom: a 10M-parameter Llama-style architecture I trained specifically for this. And it works. On real hardware.