I ran my own LLM on a GPU. Cold starts killed the experience. Here's the real tradeoff between quality, speed, and money when you self-host inference.