Recent open-weight model releases — downloadable weights, sorted newest first. Updated daily.
Poolside · MoE: 33B total / 3B active (256 experts + 1 shared) · Apache 2.0
Min ~19 GB (Q4) · Rec ~75 GB (BF16)
First US-lab open-weight agentic coding model since Llama 4. 131K context, FP8 KV cache, mixed sliding-window/global attention 3:1 across 40 layers, native interleaved reasoning. SWE-bench Verified 68.2 / Pro 44.5 / Terminal-Bench 30.1; runs on a Mac with 36 GB RAM via Ollama.
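The Min/Rec figures in this list follow roughly from parameter count times bytes per weight, plus headroom for the KV cache and runtime buffers. A minimal sketch of that arithmetic; the 15% headroom factor is an assumption for illustration, not a published spec:

```python
# Back-of-envelope for the Min/Rec figures: weights dominate, so footprint
# ~= parameter count x bytes per weight, plus headroom for the KV cache and
# activations. The 15% headroom is an illustrative assumption.

BYTES_PER_PARAM = {"q4": 0.5, "fp8": 1.0, "bf16": 2.0}

def estimate_gb(total_params_b: float, dtype: str, overhead: float = 0.15) -> float:
    """Estimate resident memory in GB for `total_params_b` billion params in `dtype`."""
    weights_gb = total_params_b * BYTES_PER_PARAM[dtype]  # 1B params ~ 1 GB at 1 byte/param
    return weights_gb * (1 + overhead)

# Example: the 33B-total MoE above.
print(f"Q4   ~ {estimate_gb(33, 'q4'):.0f} GB")    # ~19 GB, matching the Min figure
print(f"BF16 ~ {estimate_gb(33, 'bf16'):.0f} GB")  # ~76 GB, close to the Rec figure
```

Note that the total parameter count, not the active count, sets resident memory: for MoE models every expert must be loaded even though only a few billion parameters run per token.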
DeepSeek · MoE: Pro 1.6T / 49B active · Flash 284B / 13B active · MIT
Min ~880 GB / 156 GB (Q4, Pro / Flash) · Rec ~3.5 TB / 625 GB (BF16) — Pro is multi-node only
Frontier-class two-variant drop, both with 1M-token context. Pro reportedly matches or exceeds Claude Opus 4.6 on coding and reasoning benchmarks at a fraction of the price; Flash trades capability for cheaper deployment. Both on Hugging Face under MIT.
Tencent Hunyuan team · MoE: 295B total / 21B active (192 experts, top-8) · Tencent Hunyuan Community License
Min ~162 GB (Q4) · Rec ~649 GB (BF16)
First open-weights release from the rebuilt Hunyuan stack. 256K context, 80 layers plus an MTP layer; reports SWE-bench Verified 74.4 and Terminal-Bench 2.0 54.4. Trained in under three months from cold start to release.
Alibaba · 27B dense · Apache 2.0
Min ~16 GB (Q4) · Rec ~60 GB (BF16)
Dense 27B variant of the Qwen3.6 family with 262K native context (extensible to ~1M). Targets agentic coding and frontend repo-level tasks; reportedly outperforms Qwen3.5-397B-A17B on coding benchmarks.
Ant Group · MoE: 104B total / 7.4B active · MIT
Min ~58 GB (Q4) · Rec ~230 GB (BF16)
Sparse MoE optimized for inference cost — Ant claims ~86% fewer tokens consumed than peers on the Artificial Analysis index, with 340 tok/s prefill on 4xH20. Tuned for tool-use and SWE-bench Verified-style agentic coding.
Moonshot AI · MoE: 1T total / 32B active · Modified MIT
Min ~550 GB (Q4) · Rec ~2.2 TB (BF16) — multi-node only
Trillion-parameter MoE with 384 experts, 256K context, and a 400M MoonViT vision encoder. Scores 58.6 on SWE-bench Pro and 54.0 on HLE-Full with tools, and supports 300-sub-agent swarms over 4,000 steps.
Alibaba · MoE: 35B total / 3B active · Apache 2.0
Min ~20 GB (Q4) · Rec ~77 GB (BF16)
Lightweight MoE Qwen3.6 variant with thinking-preservation across turns, designed for cheap agentic coding loops. Native 262K context; first Qwen3.6 open-weight drop.
Liquid AI · 350M dense · LFM Open License v1.0
Min ~400 MB (Q4) · Rec ~1 GB (BF16) — phone/edge
Tiny on-device model built on Liquid's non-transformer hybrid architecture and trained on a 28T-token corpus with RL. Targets phone-class deployment for assistants and tool-use.
MiniMax · MoE: 230B total / 10B active · Custom (non-commercial free; commercial requires authorization)
Min ~127 GB (Q4) · Rec ~506 GB (BF16)
Sparse 256-expert MoE with 200K context, tuned for multi-file code edits and agentic tool-use loops. Trended top-30 on HF within 48 hours of release.
Liquid AI · 450M dense · LFM Open License v1.0
Min ~500 MB (Q4) · Rec ~1.2 GB (BF16) — phone/edge
Vision-language model with bounding-box prediction and multilingual prompt support, designed for sub-250ms edge inference on consumer hardware.
Allen Institute for AI (Ai2) · 4B / 8B (Qwen3-based) · Apache 2.0
Min ~3–5 GB (Q4) · Rec ~10–18 GB (BF16)
Open-weight visual web agent built on Molmo 2; operates a browser via screenshots. First fully-open agent to top closed systems on four web-navigation benchmarks; 30K human task trajectories shipped with the weights.
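A hedged sketch of the screenshot-in, action-out loop described above; the browser driver and `vlm_propose_action` call are hypothetical placeholders, not Ai2's published interface.

```python
# Hypothetical sketch of a screenshot-driven browser-agent loop.
# None of these names come from the Ai2 release; they stand in for whatever
# VLM call and browser driver a real implementation would use.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "scroll", "done"
    x: int = 0
    y: int = 0
    text: str = ""

def run_episode(browser, vlm_propose_action, task: str, max_steps: int = 30) -> bool:
    """Loop: screenshot -> VLM proposes an action -> execute -> repeat."""
    for _ in range(max_steps):
        png = browser.screenshot()              # raw pixels only, no DOM access
        action = vlm_propose_action(task, png)  # model grounds the action in the image
        if action.kind == "done":
            return True
        if action.kind == "click":
            browser.click(action.x, action.y)
        elif action.kind == "type":
            browser.type_text(action.text)
        elif action.kind == "scroll":
            browser.scroll(action.y)
    return False
```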
Z.ai (formerly Zhipu AI) · MoE: ~745B total / 44B active · MIT
Min ~410 GB (Q4) · Rec ~1.6 TB (BF16) — multi-node only
A GLM-5 variant tuned for long-horizon coding agents; reported 58.4 on SWE-bench Pro, ahead of GPT-5.4 and Claude Opus 4.6 on that benchmark. Trained on Huawei Ascend hardware.
Google DeepMind · E2B / E4B / 26B-A4B MoE / 31B dense · Apache 2.0
Min ~2–18 GB (Q4, smallest to 31B) · Rec ~6–68 GB (BF16)
First Gemma family shipped without a custom license. Multimodal incl. audio; 31B dense reports 89.2 on AIME 2026 and 80.0 on LiveCodeBench v6 — large jumps over Gemma 3 27B.
Mistral AI · ~4B · CC BY-NC 4.0
Min ~3 GB (Q4) · Rec ~9 GB (BF16)
Mistral's first text-to-speech model. Nine languages, ~70ms latency, zero-shot voice cloning from 3 seconds of reference audio. Non-commercial weights only; commercial via Mistral API.
Mistral AI · MoE: 119B total / 6B active · Apache 2.0
Min ~66 GB (Q4) · Rec ~262 GB (BF16)
Unifies the previously separate Magistral (reasoning), Pixtral (vision), and Devstral (agentic coding) lines into one MoE with 256K context. Day-zero support across vLLM, llama.cpp, SGLang, and NIM.
NVIDIA · MoE: 120B total / 12B active · NVIDIA Open Model License
Min ~66 GB (Q4) · Rec ~264 GB (BF16)
Hybrid MoE with multi-token prediction and a four-experts-for-the-cost-of-one activation trick, claimed ~3x faster inference than peers. NVIDIA also published 10T tokens of training data.
Allen Institute for AI (Ai2) · 7B dense · Apache 2.0
Min ~5 GB (Q4) · Rec ~16 GB (BF16)
Interleaves transformer attention with Gated DeltaNet (linear-RNN) layers; matches OLMo 3 on MMLU using 49% fewer tokens, with ~75% better long-context throughput. Weights, data, and training logs all public.
Microsoft · 15B dense · MIT
Min ~9 GB (Q4) · Rec ~33 GB (BF16)
Multimodal reasoning model built on Phi-4-Reasoning + SigLIP-2. Includes a learned policy for when to enter chain-of-thought versus answer directly, aimed at saving tokens on easy queries.
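The release describes that routing as learned inside the model; the sketch below only illustrates the control flow with a hypothetical external gate (`looks_hard` and `generate` are stand-ins, not Microsoft's API).

```python
# Purely illustrative: the actual think-or-answer policy is learned inside
# the model. `looks_hard` and `generate` are hypothetical stand-ins.

def looks_hard(prompt: str) -> bool:
    """Toy difficulty heuristic standing in for the learned gate."""
    return len(prompt) > 500 or any(k in prompt.lower() for k in ("prove", "derive", "step by step"))

def answer(prompt: str, generate) -> str:
    if looks_hard(prompt):
        # Hard query: enter chain-of-thought and spend the extra reasoning tokens.
        return generate(prompt, reasoning=True)
    # Easy query: answer directly and save the tokens.
    return generate(prompt, reasoning=False)
```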
Nous Research · N/A (agent framework) · MIT
Memory depends on backing LLM (framework only)
Self-hostable agent framework with three-layer memory and 118 skills, built atop existing open-weight LLMs. Listed because of architectural impact, not weights — reached 95K GitHub stars in 7 weeks.
Liquid AI · MoE: 24B total / 2B active · LFM Open License v1.0
Min ~14 GB (Q4) · Rec ~53 GB (BF16)
Hybrid attention+convolution MoE designed to scale Liquid's edge architecture upward. Day-zero llama.cpp/vLLM/SGLang support; runs on a single consumer GPU with 4-bit quant.
Cohere Labs · 3.35B dense · CC BY-NC 4.0
Min ~2 GB (Q4, per variant) · Rec ~7 GB (BF16, per variant)
Five-model release covering 70+ languages, including three regional fine-tunes (Africa/W. Asia, S. Asia, Asia-Pacific/Europe). Targets phone-class multilingual deployment.
Alibaba · MoE: 397B total / 17B active · Apache 2.0
Min ~220 GB (Q4) · Rec ~875 GB (BF16) — multi-GPU
Headline Qwen3.5 release: native multimodal, 201-language support, and the largest open-weight MoE Alibaba has shipped. Smaller dense variants (122B, 35B, 27B, and 9B down to 0.8B) followed Feb 24 and Mar 2.
Z.ai (formerly Zhipu AI) · MoE: 745B total / 44B active · MIT
Min ~410 GB (Q4) · Rec ~1.6 TB (BF16) — multi-node only
First frontier-class model from a publicly traded Chinese AI company; trained entirely on Huawei Ascend chips. Replaced six weeks later by GLM-5.1.
Alibaba · MoE: 80B total / 3B active · Apache 2.0
Min ~45 GB (Q4) · Rec ~176 GB (BF16)
Hybrid sparse-MoE coding model that activates only 3B parameters per token; scores >70% on SWE-bench Verified using SWE-agent scaffolding, putting it in range of much larger dense coding models.
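The 3B-active-per-token figure follows from top-k expert routing: a router scores every expert, only the best k run for the token, and their outputs are mixed with the router's weights. A minimal sketch with illustrative shapes, not this model's actual config:

```python
import numpy as np

def moe_forward(x, router_w, experts, k=8):
    """Top-k mixture-of-experts routing for one token.

    x        : (d,) token hidden state
    router_w : (num_experts, d) router weights
    experts  : list of callables, one per expert FFN
    Only k experts run, so active parameters per token stay small
    even when total parameters are huge.
    """
    logits = router_w @ x                      # score every expert
    top = np.argsort(logits)[-k:]              # indices of the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                   # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy usage: 16 tiny experts, route one 4-dim token through the top 2.
rng = np.random.default_rng(0)
d, n_experts = 4, 16
experts = [(lambda W: (lambda x: W @ x))(rng.normal(size=(d, d))) for _ in range(n_experts)]
print(moe_forward(rng.normal(size=d), rng.normal(size=(n_experts, d)), experts, k=2))
```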
Black Forest Labs · 4B (Apache) / 9B · Apache 2.0 (4B variant)
Min ~3–6 GB (Q4) · Rec ~9–20 GB (BF16)
Distilled image generators from the FLUX.2 family targeting sub-second generation. The 4B variant is the first FLUX model under a fully permissive license; pairs with the open Apache 2.0 FLUX.2 VAE.