Mume AI

DeepSeek V4 Flash

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and high-throughput workloads, while maintaining strong reasoning and coding performance. The model includes hybrid attention for efficient long-context processing. Reasoning efforts `high` and `xhigh` are supported; `xhigh` maps to max reasoning. It is well suited for applications such as coding assistants, chat systems, and agent workflows where responsiveness and cost efficiency are important.

by Deepseek|1M context|$0.14/M input tokens|$0.28/M output tokens

Endpoints

Available providers for this model, with details on pricing, context limits, and real-time health metrics.

No explicit endpoints reported for this model.