AI Models

356 models | Free & Paid | Updated 13 hours ago

Fast-mode variant of [Opus 4.7](/anthropic/claude-opus-4.7): identical capabilities and higher output speed, at 6x the standard price. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode

by |May 2026 |1M context |$30.00/M input |$150.00/M output
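The 6x multiplier can be checked against the per-million-token prices on this page ($5.00/$25.00 for standard Opus 4.7, $30.00/$150.00 for fast mode). A minimal sketch, assuming flat per-token billing with no prompt-caching or batch discounts; the dictionary keys are local labels, not API model IDs.

```python
# Per-million-token prices (USD) as listed on this page.
# Keys are illustrative local labels, not official model identifiers.
PRICES = {
    "opus-4.7": {"input": 5.00, "output": 25.00},
    "opus-4.7-fast": {"input": 30.00, "output": 150.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, assuming flat per-token pricing."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 20K prompt tokens, 2K completion tokens.
    std = request_cost("opus-4.7", 20_000, 2_000)
    fast = request_cost("opus-4.7-fast", 20_000, 2_000)
    print(f"standard: ${std:.4f}  fast: ${fast:.4f}  ratio: {fast / std:.1f}x")
```

Because both the input and the output price scale by the same factor, the ratio stays at 6x regardless of the prompt/completion mix.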

Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for video and embodied reasoning. It accepts image and video inputs paired with natural-language queries and produces detailed visual understanding...

by |May 2026 |33K context |$0.1500/M input |$1.50/M output

Ring-2.6-1T is a 1T-parameter-scale thinking model with 63B active parameters, built for real-world agent workflows that require both strong capability and operational efficiency. It is optimized for coding agents, tool...

by |May 2026 |262K context |$0.0750/M input |$0.6250/M output

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic...

by |May 2026 |1M context |$0.2500/M input |$1.50/M output

CoBuddy is a code generation model from Baidu, optimized for coding tasks and AI Agent workflows. It features high inference throughput and low end-to-end latency, with native support for tool...

by |May 2026 |131K context |Free input |Free output

GPT Chat Latest points to OpenAI's stable API alias `chat-latest` that always resolves to the latest Instant chat model used in ChatGPT. As OpenAI rolls out new Instant model updates...

by |May 2026 |400K context |$5.00/M input |$30.00/M output

Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual...

by |Apr 2026 |1M context |$1.25/M input |$2.50/M output

Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window and is designed for enterprise tasks...

by |Apr 2026 |131K context |$0.0500/M input |$0.1000/M output

Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...

by |Apr 2026 |262K context |$1.50/M input |$7.50/M output

Owl Alpha is a high-performance foundation model designed for agentic workloads. It natively supports tool use and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution...

by |Apr 2026 |1M context |Free input |Free output

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...

by |Apr 2026 |256K context |Free input |Free output

Laguna XS.2 is the second-generation XS-size model in [Poolside](https://poolside.ai)'s efficient coding agent series. It combines tool calling and reasoning capabilities with a compact footprint, offering...

by |Apr 2026 |131K context |Free input |Free output

Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K...

by |Apr 2026 |131K context |Free input |Free output

This model always redirects to the latest model in the Anthropic Claude Haiku family.

by |Apr 2026 |200K context |$1.00/M input |$5.00/M output

This model always redirects to the latest model in the OpenAI GPT Mini family.

by |Apr 2026 |400K context |$0.7500/M input |$4.50/M output

This model always redirects to the latest model in the Google Gemini Pro family.

by |Apr 2026 |1M context |$2.00/M input |$12.00/M output

This model always redirects to the latest model in the MoonshotAI Kimi family.

by |Apr 2026 |262K context |$0.7300/M input |$3.49/M output

This model always redirects to the latest model in the Google Gemini Flash family.

by |Apr 2026 |1M context |$0.5000/M input |$3.00/M output

This model always redirects to the latest model in the Anthropic Claude Sonnet family.

by |Apr 2026 |1M context |$3.00/M input |$15.00/M output

This model always redirects to the latest model in the OpenAI GPT family.

by |Apr 2026 |1.1M context |$5.00/M input |$30.00/M output
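The platform performs this redirection, so callers never pin a version themselves. To illustrate "latest in the family" semantics only, a resolver might compare dotted version numbers numerically. The alias names and registry below are hypothetical (the versions are taken from entries on this page), and the platform may instead resolve by release date or an explicit pointer.

```python
from typing import Dict, List, Tuple

# Hypothetical registry mapping a family alias to concrete model IDs.
# Versions come from entries on this page; the layout is an assumption.
REGISTRY: Dict[str, List[str]] = {
    "claude-opus-latest": ["claude-opus-4.6", "claude-opus-4.7"],
    "gpt-latest": ["gpt-5.4", "gpt-5.5"],
}


def version_key(model_id: str) -> Tuple[int, ...]:
    """Parse the trailing dotted version, e.g. 'gpt-5.5' -> (5, 5)."""
    version = model_id.rsplit("-", 1)[-1]
    # Compare numerically so that 5.10 sorts after 5.9.
    return tuple(int(part) for part in version.split("."))


def resolve(alias: str) -> str:
    """Return the newest concrete model behind a family alias."""
    return max(REGISTRY[alias], key=version_key)


if __name__ == "__main__":
    print(resolve("claude-opus-latest"), resolve("gpt-latest"))
```

Numeric tuple comparison avoids the string-ordering trap where "5.10" would sort before "5.9".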

Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window. This...

by |Apr 2026 |1M context |$0.3000/M input |$1.80/M output

Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Tiered pricing kicks in...

by |Apr 2026 |1M context |$0.1875/M input |$1.13/M output

Qwen3.6-35B-A3B is an open-weight multimodal model from Alibaba Cloud with 35 billion total parameters and 3 billion active parameters per token. It uses a hybrid sparse mixture-of-experts architecture combining Gated...

by |Apr 2026 |262K context |$0.1500/M input |$1.00/M output

Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximately 1 trillion total parameters. It is optimized for agentic coding, tool use, and...

by |Apr 2026 |262K context |$1.04/M input |$6.24/M output

Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities — accepting text, image, and video inputs...

by |Apr 2026 |262K context |$0.3200/M input |$3.20/M output

GPT-5.5 Pro is OpenAI’s high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It features a 1M+ token context window (922K input, 128K output) with support for...

by |Apr 2026 |1.1M context |$30.00/M input |$180.00/M output
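The split quoted for GPT-5.5 Pro (922K input, 128K output) sums to 1,050,000 tokens, which the listing rounds to 1.1M. A minimal pre-flight check, assuming the two limits are enforced independently and that "K" means 1,000 tokens; OpenAI's documentation is authoritative for the exact values.

```python
# Limits quoted in the GPT-5.5 Pro entry; "K" is assumed to mean 1,000 tokens.
MAX_INPUT_TOKENS = 922_000
MAX_OUTPUT_TOKENS = 128_000


def fits(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Return True if a request stays within both assumed limits.

    The caps are treated as independent: a short prompt does not
    raise the output ceiling above MAX_OUTPUT_TOKENS.
    """
    return (
        0 <= prompt_tokens <= MAX_INPUT_TOKENS
        and 0 <= max_output_tokens <= MAX_OUTPUT_TOKENS
    )


if __name__ == "__main__":
    print(MAX_INPUT_TOKENS + MAX_OUTPUT_TOKENS)  # combined window
    print(fits(900_000, 100_000), fits(950_000, 10_000), fits(10_000, 200_000))
```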

GPT-5.5 is OpenAI’s frontier model designed for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on hard tasks. It features a 1M+ token...

by |Apr 2026 |1.1M context |$5.00/M input |$30.00/M output

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding,...

by |Apr 2026 |1M context |$0.4350/M input |$0.8700/M output

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...

by |Apr 2026 |1M context |Free input |Free output

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...

by |Apr 2026 |1M context |$0.1120/M input |$0.2240/M output
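Several entries quote both total and active parameter counts for Mixture-of-Experts models. Dividing the two gives the share of weights used per token: per-token compute scales roughly with active parameters, while the memory needed to hold the model still scales with the total. The counts below are copied from this page, with Ring-2.6-1T's "1T-parameter-scale" approximated as 1,000B.

```python
from typing import Dict, Tuple

# (total, active) parameter counts in billions, as stated on this page.
MOE_MODELS: Dict[str, Tuple[float, float]] = {
    "Ring-2.6-1T": (1000.0, 63.0),  # "1T-parameter-scale", approximated
    "Qwen3.6-35B-A3B": (35.0, 3.0),
    "DeepSeek V4 Pro": (1600.0, 49.0),
    "DeepSeek V4 Flash": (284.0, 13.0),
    "Ling-2.6-flash": (104.0, 7.4),
    "Gemma 4 26B A4B IT": (25.2, 3.8),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters activated per token."""
    return active_b / total_b


if __name__ == "__main__":
    # Print models from sparsest to densest activation.
    ranked = sorted(MOE_MODELS.items(), key=lambda kv: active_fraction(*kv[1]))
    for name, (total, active) in ranked:
        share = active_fraction(total, active)
        print(f"{name:<20} {active:>6.1f}B of {total:>7.1f}B = {share:6.2%}")
```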

Ling-2.6-1T is an instant (instruct) model from inclusionAI and the company’s trillion-parameter flagship, designed for real-world agents that require fast execution and high efficiency at scale. It uses a “fast...

by |Apr 2026 |262K context |$0.3000/M input |$2.50/M output

Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes, allowing it to...

by |Apr 2026 |262K context |$0.0660/M input |$0.2600/M output

MiMo-V2.5-Pro is Xiaomi’s flagship model, delivering strong performance in general agentic capabilities, complex software engineering, and long-horizon tasks, with top rankings on benchmarks such as ClawEval, GDPVal, and SWE-bench Pro....

by |Apr 2026 |1M context |$1.00/M input |$3.00/M output

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding...

by |Apr 2026 |1M context |$0.4000/M input |$2.00/M output

[GPT-5.4](https://openrouter.ai/openai/gpt-5.4) Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image generation capabilities from GPT Image 2. It enables rich multimodal workflows, allowing users to seamlessly move between reasoning, coding, and...

by |Apr 2026 |272K context |$8.00/M input |$15.00/M output

Ling-2.6-flash is an instant (instruct) model from inclusionAI with 104B total parameters and 7.4B active parameters, designed for real-world agents that require fast responses, strong execution, and high token efficiency....

by |Apr 2026 |262K context |$0.0100/M input |$0.0300/M output

The Pareto Router maintains a tiered shortlist of strong coding models, ranked by [Artificial Analysis](https://artificialanalysis.ai/) coding percentiles. Set min_coding_score between 0 and 1 on the [pareto-router plugin](https://openrouter.ai/docs/guides/routing/routers/pareto-router#the-min_coding_score-parameter) to control how...

by |Apr 2026 |2M context |Free input |Free output
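The entry is cut off before it explains exactly what `min_coding_score` controls. As a minimal sketch, assuming it acts as a minimum coding-percentile threshold applied to the ranked shortlist, the filtering step could look like this. `Candidate`, `shortlist`, and the model names and scores are hypothetical; this is not OpenRouter's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    model: str                # hypothetical model identifier
    coding_percentile: float  # assumed coding percentile in [0, 1]


def shortlist(candidates: List[Candidate], min_coding_score: float) -> List[Candidate]:
    """Keep candidates whose coding percentile meets the threshold.

    Assumption: min_coding_score is a minimum-percentile filter;
    the real router's semantics may differ.
    """
    if not 0.0 <= min_coding_score <= 1.0:
        raise ValueError("min_coding_score must be between 0 and 1")
    # Preserve the incoming (tiered) order; only drop models below the bar.
    return [c for c in candidates if c.coding_percentile >= min_coding_score]


if __name__ == "__main__":
    pool = [
        Candidate("model-a", 0.95),
        Candidate("model-b", 0.80),
        Candidate("model-c", 0.60),
    ]
    print([c.model for c in shortlist(pool, 0.75)])
```

Raising the threshold narrows the pool to stronger coders; lowering it toward 0 admits the whole shortlist.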

Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR. By leveraging specialized OCR training data while preserving versatile multimodal intelligence, it provides a powerful performance upgrade over Qianfan-OCR.

by |Apr 2026 |66K context |$0.6800/M input |$2.81/M output

Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, coding-driven UI/UX generation, and multi-agent orchestration. It handles complex end-to-end coding tasks across Python, Rust, and Go, and...

by |Apr 2026 |262K context |$0.7300/M input |$3.49/M output

Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on...

by |Apr 2026 |1M context |$5.00/M input |$25.00/M output

Fast-mode variant of [Opus 4.6](/anthropic/claude-opus-4.6): identical capabilities and higher output speed, at 6x the standard price. Learn more in Anthropic's docs: https://platform.claude.com/docs/en/build-with-claude/fast-mode

by |Apr 2026 |1M context |$30.00/M input |$150.00/M output

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on...

by |Apr 2026 |203K context |$0.9800/M input |$3.08/M output

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

by |Apr 2026 |262K context |Free input |Free output

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

by |Apr 2026 |262K context |$0.0600/M input |$0.3300/M output

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

by |Apr 2026 |262K context |Free input |Free output

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

by |Apr 2026 |262K context |$0.1200/M input |$0.3700/M output

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...

by |Apr 2026 |1M context |$0.3250/M input |$1.95/M output

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding,...

by |Apr 2026 |203K context |$1.20/M input |$4.00/M output

Trinity Large Thinking is a powerful open-source reasoning model from the team at Arcee AI. It shows strong performance on PinchBench, agentic workloads, and reasoning tasks. Launch video: https://youtu.be/Gc82AXLa0Rg?si=4RLn6WBz33qT--B7...

by |Apr 2026 |262K context |$0.2200/M input |$0.8500/M output