GPT-4o Mini Transcribe is OpenAI's smaller, cost-efficient speech-to-text model built on GPT-4o Mini audio capabilities. It's priced per token (input and output), making it suitable for high-volume transcription workflows that benefit from token-level billing transparency at a lower price point.
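Per-token billing like this reduces to simple arithmetic over input and output token counts. A minimal sketch, using HYPOTHETICAL per-1M-token rates as placeholders (the real rates come from the published pricing page):

```python
# Sketch: estimating the cost of one transcription request from token
# counts. The rates below are placeholders, not published prices.

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Return the dollar cost of one request, given rates per 1M tokens."""
    return (input_tokens * input_rate_per_m +
            output_tokens * output_rate_per_m) / 1_000_000

# e.g. ~6,000 audio input tokens producing ~1,500 transcript tokens,
# at assumed rates of $3.00 (input) and $5.00 (output) per 1M tokens:
cost = estimate_cost(6_000, 1_500, 3.00, 5.00)
print(f"${cost:.4f}")  # → $0.0255
```

Because both sides of the bill are token-denominated, the same formula applies to any request size without per-minute rounding.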
Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs and produces text output, and is suited to agentic workflows, instruction-following tasks, and applications that require high factual accuracy. Reasoning is always active and cannot be disabled or tuned via an effort-level setting. It supports a 1 million token context window with no output token limit, making it well suited to long-document analysis, deep research, and multi-step agentic tasks. Pricing is tiered: requests exceeding 200k total tokens are billed at a higher rate.
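The tiered billing described above can be sketched as a rate selection on total request size. Two assumptions are labeled here: the rates are placeholders, and the whole request is billed at the higher tier once the threshold is crossed (rather than only the overage):

```python
# Sketch of tiered billing: a request whose total token count exceeds
# 200k is assumed to be billed entirely at the higher rate.
# Both rates are HYPOTHETICAL placeholders per 1M tokens.

TIER_THRESHOLD = 200_000

def billed_rate(total_tokens: int, base_rate: float, high_rate: float) -> float:
    """Pick the per-1M-token rate for a request (whole-request tiering assumed)."""
    return high_rate if total_tokens > TIER_THRESHOLD else base_rate

def request_cost(total_tokens: int, base_rate: float = 3.0,
                 high_rate: float = 6.0) -> float:
    """Dollar cost for one request under the assumed tiering rule."""
    return total_tokens * billed_rate(total_tokens, base_rate, high_rate) / 1_000_000
```

Under this rule a 250k-token request costs more than 25% above a 200k-token one, since the entire request moves to the higher tier.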
Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window and is designed for enterprise tasks including tool calling, retrieval-augmented generation (RAG), code generation with fill-in-the-middle support, text summarization, classification, and extraction. The model handles 12 languages (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese) and implements OpenAI-compatible tool calling. Released under the Apache 2.0 license.
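Since the model implements OpenAI-compatible tool calling, tool definitions follow the standard JSON-schema shape. A minimal sketch; the weather function is a made-up illustration, not part of the model or any real API:

```python
# Sketch: one tool definition in the OpenAI-compatible format.
# `get_weather` is a hypothetical function used only for illustration.

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

# This list would be passed as the `tools` field of a chat-completions
# request to any OpenAI-compatible serving endpoint hosting the model.
```

The same schema works unchanged across OpenAI-compatible servers, which is the practical payoff of the compatibility claim.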
Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex multi-step reasoning. It is particularly strong at reliable multi-tool calling and long-horizon tasks, with a 256K context window, configurable reasoning effort per request, and a custom vision encoder that handles variable image sizes and aspect ratios. It is self-hostable on as few as four GPUs and released with open weights.
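Per-request reasoning effort means the effort level travels with each request rather than being a model-wide setting. A sketch of such a request body; the field name `reasoning_effort`, its values, and the model identifier are ASSUMPTIONS for illustration, so the actual parameter should be taken from the provider's API reference:

```python
# Sketch: a chat request carrying a per-request effort setting.
# "reasoning_effort" and "mistral-medium-3.5" are hypothetical names
# used for illustration only; consult the API reference for the
# real parameter and model identifier.

payload = {
    "model": "mistral-medium-3.5",
    "messages": [
        {"role": "user", "content": "Plan the refactor step by step."},
    ],
    "reasoning_effort": "high",  # hypothetical values: "low" | "medium" | "high"
}
```

The point of the sketch is the granularity: two consecutive requests to the same deployment can ask for different effort levels.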
GPT-4o Transcribe is OpenAI's high-quality speech-to-text model built on GPT-4o audio capabilities. It's priced per token (input and output), making it suitable for workflows that benefit from token-level billing transparency.
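A transcription request sends the audio file and model name to the audio transcriptions endpoint. The sketch below shows the request shape as plain fields rather than SDK calls so the structure is explicit; the endpoint path and field names follow the OpenAI convention, and should be verified against the current API reference:

```python
# Sketch: the multipart form fields for a speech-to-text request
# against POST /v1/audio/transcriptions (OpenAI-style endpoint).

def build_transcription_request(model: str, audio_path: str) -> dict:
    """Assemble the pieces of a transcription request; the audio file
    itself would be sent as multipart form data."""
    return {
        "url": "https://api.openai.com/v1/audio/transcriptions",
        "fields": {
            "model": model,            # e.g. "gpt-4o-transcribe"
            "file": audio_path,        # path to the audio to transcribe
            "response_format": "json", # response carries the transcript text
        },
    }

req = build_transcription_request("gpt-4o-transcribe", "meeting.wav")
```

Token usage reported in the response is what the per-token billing described above is applied to.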
Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window. It is an updated release of Qwen3.5 Plus; pricing is tiered, with requests above 256K tokens billed at a higher rate.
Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Pricing is tiered, with requests above 256K tokens billed at a higher rate. Prompt caching is supported, with separate rates for cache reads and explicit cache creation.
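With separate cache-read and cache-creation rates, the input side of a bill splits three ways: fresh tokens at the normal input rate, cached tokens at the (cheaper) read rate, and a one-time charge when a prefix is first written to the cache. A sketch with HYPOTHETICAL placeholder rates:

```python
# Sketch of cache-aware input billing. All three rates are placeholders
# per 1M tokens, not published prices; cache reads are assumed cheaper
# than fresh input, which is the usual point of prompt caching.

def input_cost(fresh_tokens: int, cached_tokens: int, created_tokens: int,
               input_rate: float = 0.60, cache_read_rate: float = 0.12,
               cache_write_rate: float = 0.75) -> float:
    """Dollar cost of the input side of one request."""
    return (fresh_tokens * input_rate
            + cached_tokens * cache_read_rate
            + created_tokens * cache_write_rate) / 1_000_000
```

For a long, stable system prompt, paying the creation charge once and the read rate thereafter quickly beats re-sending the prefix at the full input rate.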
Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated DeltaNet linear attention with standard gated attention layers, enabling efficient inference at a fraction of the compute cost of a comparably sized dense model. The model supports a 262K token native context window (extensible to 1M via YaRN) and accepts text, image, and video inputs. It includes integrated thinking mode with reasoning traces preserved across multi-turn conversations, function calling, and structured output. Released under the Apache 2.0 license.
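Extending the native window via YaRN is a rope-scaling configuration applied at serving time. A sketch of the `rope_scaling` entry such a config would carry (the same shape Hugging Face configs and vLLM overrides use); the 4x factor follows from 262,144 x 4 = 1,048,576, but the recommended values should come from the model card:

```python
# Sketch: YaRN rope-scaling settings for stretching the native 262K
# context toward 1M when self-hosting. The factor is derived from the
# two window sizes stated above; treat exact values as illustrative.

NATIVE_CTX = 262_144

rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": NATIVE_CTX,
}

extended_ctx = int(NATIVE_CTX * rope_scaling["factor"])  # 1,048,576 tokens
```

Since YaRN scaling can degrade short-context quality, it is typically enabled only when requests actually need the longer window.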
Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximately 1 trillion total parameters. It is optimized for agentic coding, tool use, and long-context reasoning, supporting a 262K token context window. The model includes an integrated thinking mode that preserves reasoning traces across multi-turn conversations and supports structured output and function calling. Access is available exclusively through the Alibaba Cloud Model Studio and Qwen Studio APIs; no open weights are provided.
Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video inputs — and supports a 262,144-token context window. The model is designed for agentic coding and reasoning tasks, with particular strength in repository-level code comprehension, front-end development workflows, and multi-step problem solving. It includes a built-in thinking mode for extended reasoning and preserves thinking context across conversation history. Qwen3.6 27B supports 201 languages and dialects and is released under the Apache 2.0 license.
GPT-5.5 Pro is OpenAI’s high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, and is designed for long-horizon problem solving, agentic coding, and precise execution across multi-step workflows.
GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on hard tasks. It features a 1M+ token context window (922K input, 128K output) with support for text and image inputs, enabling large-scale reasoning, coding, and multimodal workflows within a single system.