An enhanced version of Qwen's large vision-language model. It is significantly upgraded for fine-detail recognition and text recognition, and it supports image inputs at ultra-high resolutions (up to millions of pixels) and with extreme aspect ratios. It delivers strong performance across a broad range of visual tasks.
by Qwen | 8K context | $0.21/M input tokens | $0.63/M output tokens
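Per-request cost follows from the listed per-million-token prices. The sketch below is a minimal illustration. It assumes that billing is linear in token counts at exactly these rates and that image inputs are already included in the input token count. The function name `estimate_cost_usd` is hypothetical.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float = 0.21,
                      output_price_per_m: float = 0.63) -> float:
    """Estimate a request's cost in USD from per-million-token prices.

    Hypothetical helper: assumes linear per-token billing at the listed
    rates, with image inputs already counted in input_tokens.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Multiply each token count by its price, then scale from "per million" to "per token".
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Example: 6,000 input tokens and 1,000 output tokens.
print(estimate_cost_usd(6_000, 1_000))
```

With these rates, 6,000 input tokens and 1,000 output tokens come to about $0.00189. Output tokens cost three times as much as input tokens, so long generated answers dominate the cost more quickly than long prompts do.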