Mercury is the first commercial-scale diffusion large language model (dLLM). Using a discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to build responsive user experiences, including voice agents, search interfaces, and chatbots. Read more in the launch blog post.
by Inception | 128K context | $0.25/M input tokens | $1.00/M output tokens
Endpoints
Available providers for this model, with details on pricing, context limits, and real-time health metrics.
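As a rough sketch of how a request to a hosted Mercury endpoint might be assembled, the snippet below builds a chat-completions-style payload. The base URL and model slug are assumptions for illustration, not values from this page; check the provider's endpoint documentation for the real ones.

```python
import json

# Hypothetical endpoint URL -- replace with the provider's documented base URL.
BASE_URL = "https://api.example.com/v1/chat/completions"

# Chat-completions-style request body; the model slug here is an assumption.
payload = {
    "model": "inception/mercury",
    "messages": [
        {"role": "user", "content": "Summarize diffusion LLMs in one sentence."}
    ],
    "max_tokens": 128,
}

# Serialize to the JSON body that would be POSTed with an Authorization header.
body = json.dumps(payload)
print(body)
```

The payload shape mirrors the widely used chat-completions convention, so existing client libraries that accept a custom base URL can typically be pointed at such an endpoint unchanged.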