MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capabilities - visual grounding, multi-step planning, tool use, and code execution - making it well-suited for complex real-world tasks that span modalities. It supports a 256K context window.
by Xiaomi|262K context|$0.40/M input tokens|$2.00/M output tokens
Endpoints
Available providers for this model, with details on pricing, context limits, and real-time health metrics.
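As a quick sanity check on the listed rates ($0.40/M input tokens, $2.00/M output tokens), here is a minimal sketch of the per-request cost arithmetic; the function name and token counts are illustrative, not part of any provider API:

```python
# Listed rates for MiMo-V2-Omni (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.40
OUTPUT_PRICE_PER_M = 2.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 100K-token prompt with a 2K-token reply:
print(f"${request_cost(100_000, 2_000):.4f}")  # → $0.0440
```

At these rates, output tokens cost five times as much as input tokens, so long generations dominate the bill even for prompts near the context limit.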