AI Models

357 models, Free & Paid. Updated: 9 hours ago

GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments. While limited in reasoning depth compared to its larger...

by |Aug 2025 |400K context |$0.0500/M input |$0.4000/M output
400K tokens
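
The prices in this listing are quoted per million tokens (the "/M" suffix), so the cost of a single request follows from simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost_usd` and the token counts are assumptions, and the prices are the GPT-5-Nano figures listed above.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request from per-million-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Hypothetical request: 200K input tokens and 10K output tokens,
# priced at $0.05/M input and $0.40/M output.
cost = estimate_cost_usd(200_000, 10_000, 0.05, 0.40)
print(f"${cost:.4f}")  # $0.0140
```

Entries listed as free on both input and output would evaluate to zero under the same formula.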

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...

by |Aug 2025 |131K context |Free input |Free output
131K tokens
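
Several entries in this listing quote both a total and an active parameter count for Mixture-of-Experts models. As a rough illustration of what that means, the sketch below computes the share of parameters used per forward pass from the gpt-oss-120b figures above (5.1B active of 117B total). The helper name `active_fraction` is an assumption introduced here for illustration.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Return the fraction of an MoE model's parameters active per forward pass."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


# gpt-oss-120b as described in the listing: 5.1B active of 117B total.
frac = active_fraction(5.1, 117.0)
print(f"{frac:.1%}")  # 4.4%
```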

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...

by |Aug 2025 |131K context |$0.0390/M input |$0.1800/M output
131K tokens

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...

by |Aug 2025 |131K context |Free input |Free output
131K tokens

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for...

by |Aug 2025 |131K context |$0.0300/M input |$0.1400/M output
131K tokens

Claude Opus 4.1 is an updated version of Anthropic’s flagship model, offering improved performance in coding, reasoning, and agentic tasks. It achieves 74.5% on SWE-bench Verified and shows notable gains...

by |Aug 2025 |200K context |$15.00/M input |$75.00/M output
200K tokens

Mistral's cutting-edge language model for coding released end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation. [Blog Post](https://mistral.ai/news/codestral-25-08)

by |Aug 2025 |256K context |$0.3000/M input |$0.9000/M output
256K tokens

Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the...

by |Jul 2025 |160K context |$0.0700/M input |$0.2700/M output
160K tokens

Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and...

by |Jul 2025 |262K context |$0.0900/M input |$0.3000/M output
262K tokens

GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly...

by |Jul 2025 |131K context |$0.6000/M input |$2.20/M output
131K tokens

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter...

by |Jul 2025 |131K context |Free input |Free output
131K tokens

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter...

by |Jul 2025 |131K context |$0.1300/M input |$0.8500/M output
131K tokens

Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144...

by |Jul 2025 |262K context |$0.1495/M input |$1.50/M output
262K tokens

GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It...

by |Jul 2025 |128K context |$0.1000/M input |$0.1000/M output
128K tokens

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...

by |Jul 2025 |1M context |Free input |Free output
1M tokens

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...

by |Jul 2025 |1M context |$0.2200/M input |$1.80/M output
1M tokens

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...

by |Jul 2025 |128K context |$0.1000/M input |$0.2000/M output
128K tokens

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...

by |Jul 2025 |1M context |$0.1000/M input |$0.4000/M output
1M tokens

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following,...

by |Jul 2025 |262K context |$0.0710/M input |$0.1000/M output
262K tokens

Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library. As the world of LLMs advances, our router gets smarter, ensuring you...

by |Jul 2025 |131K context |$0.8500/M input |$3.40/M output
131K tokens

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for...

by |Jul 2025 |131K context |$0.5700/M input |$2.30/M output
131K tokens

Devstral Medium is a high-performance code generation and agentic reasoning model developed jointly by Mistral AI and All Hands AI. Positioned as a step up from Devstral Small, it achieves...

by |Jul 2025 |131K context |$0.4000/M input |$2.00/M output
131K tokens

Devstral Small 1.1 is a 24B parameter open-weight language model for software engineering agents, developed by Mistral AI in collaboration with All Hands AI. Finetuned from Mistral Small 3.1 and...

by |Jul 2025 |131K context |$0.1000/M input |$0.3000/M output
131K tokens

Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving...

by |Jul 2025 |33K context |Free input |Free output

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark...

by |Jul 2025 |131K context |$0.1400/M input |$0.5700/M output
131K tokens

Morph's high-accuracy apply model for complex code edits. ~4,500 tokens/sec with 98% accuracy for precise code transformations. The model requires the prompt to be in the following format: {instruction} {initial_code}...

by |Jul 2025 |262K context |$0.9000/M input |$1.90/M output
262K tokens

Morph's fastest apply model for code edits. ~10,500 tokens/sec with 96% accuracy for rapid code transformations. The model requires the prompt to be in the following format: {instruction} {initial_code} {edit_snippet}...

by |Jul 2025 |82K context |$0.8000/M input |$1.20/M output
82K tokens

ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data...

by |Jun 2025 |131K context |$0.4200/M input |$1.25/M output
131K tokens

ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in...

by |Jun 2025 |131K context |$0.2800/M input |$1.10/M output
131K tokens

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on...

by |Jun 2025 |128K context |$0.0750/M input |$0.2000/M output
128K tokens

MiniMax-M1 is a large-scale, open-weight reasoning model designed for extended context and high-efficiency inference. It leverages a hybrid Mixture-of-Experts (MoE) architecture paired with a custom "lightning attention" mechanism, allowing it...

by |Jun 2025 |1M context |$0.4000/M input |$2.20/M output
1M tokens

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...

by |Jun 2025 |1M context |$0.3000/M input |$2.50/M output
1M tokens

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

by |Jun 2025 |1M context |$1.25/M input |$10.00/M output
1M tokens

The o-series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o3-pro model uses more compute to think harder and provide consistently...

by |Jun 2025 |200K context |$20.00/M input |$80.00/M output
200K tokens

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

by |Jun 2025 |1M context |$1.25/M input |$10.00/M output
1M tokens

May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1) Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active...

by |May 2025 |164K context |$0.5000/M input |$2.15/M output
164K tokens

Claude Opus 4 is benchmarked as the world’s best coding model, at time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in...

by |May 2025 |200K context |$15.00/M input |$75.00/M output
200K tokens

Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%),...

by |May 2025 |1M context |$3.00/M input |$15.00/M output
1M tokens

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks...

by |May 2025 |33K context |$0.0600/M input |$0.1200/M output

Mistral Medium 3 is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost...

by |May 2025 |131K context |$0.4000/M input |$2.00/M output
131K tokens

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...

by |May 2025 |1M context |$1.25/M input |$10.00/M output
1M tokens

Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight image‑text grounding tasks. It offers a 32 k‑token context window, enabling rich multimodal...

by |May 2025 |131K context |$0.1800/M input |$0.1800/M output
131K tokens

Maestro Reasoning is Arcee's flagship analysis model: a 32 B‑parameter derivative of Qwen 2.5‑32 B tuned with DPO and chain‑of‑thought RL for step‑by‑step logic. Compared to the earlier 7 B...

by |May 2025 |131K context |$0.9000/M input |$3.30/M output
131K tokens

Virtuoso‑Large is Arcee's top‑tier general‑purpose LLM at 72 B parameters, tuned to tackle cross‑domain reasoning, creative writing and enterprise QA. Unlike many 70 B peers, it retains the 128 k...

by |May 2025 |131K context |$0.7500/M input |$1.20/M output
131K tokens

Coder‑Large is a 32 B‑parameter offspring of Qwen 2.5‑Instruct that has been further trained on permissively‑licensed GitHub, CodeSearchNet and synthetic bug‑fix corpora. It supports a 32k context window, enabling multi‑file...

by |May 2025 |33K context |$0.5000/M input |$0.8000/M output

Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM...

by |Apr 2025 |164K context |$0.1800/M input |$0.1800/M output
164K tokens

Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique...

by |Apr 2025 |131K context |$0.0900/M input |$0.4500/M output
131K tokens

Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math,...

by |Apr 2025 |131K context |$0.0500/M input |$0.4000/M output
131K tokens

Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...

by |Apr 2025 |132K context |$0.1000/M input |$0.2400/M output
132K tokens

Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...

by |Apr 2025 |131K context |$0.0800/M input |$0.2800/M output
131K tokens