SmolLM2-135M Distilled

85MB LLM running entirely in your browser. Distilled from SmolLM2-1.7B.

Don't expect an accurate or coherent chatbot!

This isn't a usable LLM, but rather an experiment in (attempted) b1.58 quantization and distillation of an already incredibly small model. Unsurprisingly, there isn't much redundancy to work with when starting from a 135M model. But at ~85MB and ~40 tokens per second via WASM in the browser, it's an incredibly small and efficient proof of concept.
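For context, "b1.58" refers to ternary weights in {-1, 0, +1}; encoding three states takes log2(3) ≈ 1.58 bits per weight. Below is a minimal sketch of the absmean ternary quantization scheme from the BitNet b1.58 paper; it illustrates the general technique, not necessarily the exact recipe used for this model, and the layer shape is hypothetical.

```python
import numpy as np

def quantize_b158(w: np.ndarray, eps: float = 1e-6) -> tuple[np.ndarray, float]:
    # Absmean scheme (BitNet b1.58): scale by the mean absolute weight,
    # then round and clip each weight to the ternary set {-1, 0, +1}.
    scale = float(np.abs(w).mean()) + eps
    w_q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return w_q, scale

def dequantize_b158(w_q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original weights for inference.
    return w_q.astype(np.float32) * scale

# Example: quantizing one (hypothetical) weight matrix from a small model.
w = np.random.randn(576, 576).astype(np.float32)
w_q, scale = quantize_b158(w)
print(w_q.dtype, np.unique(w_q), scale)  # int8, [-1 0 1], per-tensor scale
```

Ternary matrices can be bit-packed for storage, and the matmul against {-1, 0, +1} weights reduces to additions and subtractions, which is part of what makes a small download and fast WASM inference plausible.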