App developers can now deploy private, local, offline AI models in their mobile apps, with up to 150 tokens/sec throughput and under 50 ms time to first token. Cactus is used by 3k+ developers and handles 500k+ weekly inference tasks on phones today. It is open-source! Check out the repo: https://github.com/cactus-compute/cactus