Recent open-weight model releases — downloadable weights, sorted newest first. Updated daily.
Poolside · MoE: 33B total / 3B active (256 experts + 1 shared) · Apache 2.0
Min ~19 GB (Q4) · Rec ~75 GB (BF16)
First US-lab open-weight agentic coding model since Llama 4. 131K context, FP8 KV cache, mixed sliding-window/global attention 3:1 across 40 layers, native interleaved reasoning. SWE-bench Verified 68.2 / Pro 44.5 / Terminal-Bench 30.1; runs on a Mac with 36 GB RAM via Ollama.
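The Min/Rec figures in this list follow roughly from parameter count times bytes per weight, plus headroom for the KV cache and runtime buffers. A minimal sketch of that arithmetic; the 15% headroom factor is an assumption for illustration, not a published spec:

```python
# Back-of-envelope for the Min/Rec figures: weights dominate, so footprint
# ~= parameter count x bytes per weight, plus headroom for the KV cache and
# activations. The 15% headroom is an illustrative assumption.

BYTES_PER_PARAM = {"q4": 0.5, "fp8": 1.0, "bf16": 2.0}

def estimate_gb(total_params_b: float, dtype: str, overhead: float = 0.15) -> float:
    """Estimate resident memory in GB for `total_params_b` billion params in `dtype`."""
    weights_gb = total_params_b * BYTES_PER_PARAM[dtype]  # 1B params ~ 1 GB at 1 byte/param
    return weights_gb * (1 + overhead)

# Example: the 33B-total MoE above.
print(f"Q4   ~ {estimate_gb(33, 'q4'):.0f} GB")    # ~19 GB, matching the Min figure
print(f"BF16 ~ {estimate_gb(33, 'bf16'):.0f} GB")  # ~76 GB, close to the Rec figure
```

Note that the total parameter count, not the active count, sets resident memory: for MoE models every expert must be loaded even though only a few billion parameters run per token.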
DeepSeek · MoE: Pro 1.6T / 49B active · Flash 284B / 13B active · MIT
Min ~880 GB / 156 GB (Q4, Pro / Flash) · Rec ~3.5 TB / 625 GB (BF16) — Pro is multi-node only
Frontier-class two-variant drop, both with 1M-token context. Pro reportedly matches or exceeds Claude Opus 4.6 on coding and reasoning benchmarks at a fraction of the price; Flash trades capability for cheaper deployment. Both on Hugging Face under MIT.
Tencent Hunyuan team · MoE: 295B total / 21B active (192 experts, top-8) · Tencent Hunyuan Community License
Min ~162 GB (Q4) · Rec ~649 GB (BF16)
First open-weights release from the rebuilt Hunyuan stack. 256K context, 80 layers plus an MTP layer; reports SWE-bench Verified 74.4 and Terminal-Bench 2.0 54.4. Trained in under three months from cold start to release.
Alibaba · 27B dense · Apache 2.0
Min ~16 GB (Q4) · Rec ~60 GB (BF16)
Dense 27B variant of the Qwen3.6 family with 262K native context (extensible to ~1M). Targets agentic coding and frontend repo-level tasks; reportedly outperforms Qwen3.5-397B-A17B on coding benchmarks.
Ant Group · MoE: 104B total / 7.4B active · MIT
Min ~58 GB (Q4) · Rec ~230 GB (BF16)
Sparse MoE optimized for inference cost — Ant claims ~86% fewer tokens consumed than peers on the Artificial Analysis index, with 340 tok/s prefill on 4xH20. Tuned for tool-use and SWE-bench Verified-style agentic coding.
Moonshot AI · MoE: 1T total / 32B active · Modified MIT
Min ~550 GB (Q4) · Rec ~2.2 TB (BF16) — multi-node only
Trillion-parameter MoE with 384 experts, 256K context, and a 400M MoonViT vision encoder. Scores 58.6 on SWE-bench Pro and 54.0 on HLE-Full with tools, and supports 300-sub-agent swarms over 4,000 steps.
Alibaba · MoE: 35B total / 3B active · Apache 2.0
Min ~20 GB (Q4) · Rec ~77 GB (BF16)
Lightweight MoE Qwen3.6 variant with thinking-preservation across turns, designed for cheap agentic coding loops. Native 262K context; first Qwen3.6 open-weight drop.
Liquid AI · 350M dense · LFM Open License v1.0
Min ~400 MB (Q4) · Rec ~1 GB (BF16) — phone/edge
Tiny on-device model built on Liquid's non-transformer hybrid architecture and trained on a 28T-token corpus with RL. Targets phone-class deployment for assistants and tool-use.
MiniMax · MoE: 230B total / 10B active · Custom (non-commercial free; commercial requires authorization)
Min ~127 GB (Q4) · Rec ~506 GB (BF16)
Sparse 256-expert MoE with 200K context, tuned for multi-file code edits and agentic tool-use loops. Trended top-30 on HF within 48 hours of release.
Liquid AI · 450M dense · LFM Open License v1.0
Min ~500 MB (Q4) · Rec ~1.2 GB (BF16) — phone/edge
Vision-language model with bounding-box prediction and multilingual prompt support, designed for sub-250ms edge inference on consumer hardware.
Allen Institute for AI (Ai2) · 4B / 8B (Qwen3-based) · Apache 2.0
Min ~3–5 GB (Q4) · Rec ~10–18 GB (BF16)
Open-weight visual web agent built on Molmo 2; operates a browser via screenshots. First fully-open agent to top closed systems on four web-navigation benchmarks; 30K human task trajectories shipped with the weights.
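A hedged sketch of the screenshot-in, action-out loop described above; the browser driver and `vlm_propose_action` call are hypothetical placeholders, not Ai2's published interface.

```python
# Hypothetical sketch of a screenshot-driven browser-agent loop.
# None of these names come from the Ai2 release; they stand in for whatever
# VLM call and browser driver a real implementation would use.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "scroll", "done"
    x: int = 0
    y: int = 0
    text: str = ""

def run_episode(browser, vlm_propose_action, task: str, max_steps: int = 30) -> bool:
    """Loop: screenshot -> VLM proposes an action -> execute -> repeat."""
    for _ in range(max_steps):
        png = browser.screenshot()              # raw pixels only, no DOM access
        action = vlm_propose_action(task, png)  # model grounds the action in the image
        if action.kind == "done":
            return True
        if action.kind == "click":
            browser.click(action.x, action.y)
        elif action.kind == "type":
            browser.type_text(action.text)
        elif action.kind == "scroll":
            browser.scroll(action.y)
    return False
```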
Z.ai (formerly Zhipu AI) · MoE: ~745B total / 44B active · MIT
Min ~410 GB (Q4) · Rec ~1.6 TB (BF16) — multi-node only
A GLM-5 variant tuned for long-horizon coding agents; reported 58.4 on SWE-bench Pro, ahead of GPT-5.4 and Claude Opus 4.6 on that benchmark. Trained on Huawei Ascend hardware.
Google DeepMind · E2B / E4B / 26B-A4B MoE / 31B dense · Apache 2.0
Min ~2–18 GB (Q4, smallest to 31B) · Rec ~6–68 GB (BF16)
First Gemma family shipped without a custom license. Multimodal incl. audio; 31B dense reports 89.2 on AIME 2026 and 80.0 on LiveCodeBench v6 — large jumps over Gemma 3 27B.
Mistral AI · ~4B · CC BY-NC 4.0
Min ~3 GB (Q4) · Rec ~9 GB (BF16)
Mistral's first text-to-speech model. Nine languages, ~70ms latency, zero-shot voice cloning from 3 seconds of reference audio. Non-commercial weights only; commercial via Mistral API.
Mistral AI · MoE: 119B total / 6B active · Apache 2.0
Min ~66 GB (Q4) · Rec ~262 GB (BF16)
Unifies the previously separate Magistral (reasoning), Pixtral (vision), and Devstral (agentic coding) lines into one MoE with 256K context. Day-zero support across vLLM, llama.cpp, SGLang, and NIM.
NVIDIA · MoE: 120B total / 12B active · NVIDIA Open Model License
Min ~66 GB (Q4) · Rec ~264 GB (BF16)
Hybrid MoE with multi-token prediction and a four-experts-for-the-cost-of-one activation trick, claimed ~3x faster inference than peers. NVIDIA also published 10T tokens of training data.
Allen Institute for AI (Ai2) · 7B dense · Apache 2.0
Min ~5 GB (Q4) · Rec ~16 GB (BF16)
Interleaves transformer attention with Gated DeltaNet (linear-RNN) layers; matches OLMo 3 on MMLU using 49% fewer tokens, with ~75% better long-context throughput. Weights, data, and training logs all public.
Microsoft · 15B dense · MIT
Min ~9 GB (Q4) · Rec ~33 GB (BF16)
Multimodal reasoning model built on Phi-4-Reasoning + SigLIP-2. Includes a learned policy for when to enter chain-of-thought versus answer directly, aimed at saving tokens on easy queries.
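The release describes that routing as learned inside the model; the sketch below only illustrates the control flow with a hypothetical external gate (`looks_hard` and `generate` are stand-ins, not Microsoft's API).

```python
# Purely illustrative: the actual think-or-answer policy is learned inside
# the model. `looks_hard` and `generate` are hypothetical stand-ins.

def looks_hard(prompt: str) -> bool:
    """Toy difficulty heuristic standing in for the learned gate."""
    return len(prompt) > 500 or any(k in prompt.lower() for k in ("prove", "derive", "step by step"))

def answer(prompt: str, generate) -> str:
    if looks_hard(prompt):
        # Hard query: enter chain-of-thought and spend the extra reasoning tokens.
        return generate(prompt, reasoning=True)
    # Easy query: answer directly and save the tokens.
    return generate(prompt, reasoning=False)
```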
Nous Research · N/A (agent framework) · MIT
Memory depends on backing LLM (framework only)
Self-hostable agent framework with three-layer memory and 118 skills, built atop existing open-weight LLMs. Listed because of architectural impact, not weights — reached 95K GitHub stars in 7 weeks.
Liquid AI · MoE: 24B total / 2B active · LFM Open License v1.0
Min ~14 GB (Q4) · Rec ~53 GB (BF16)
Hybrid attention+convolution MoE designed to scale Liquid's edge architecture upward. Day-zero llama.cpp/vLLM/SGLang support; runs on a single consumer GPU with 4-bit quant.
Cohere Labs · 3.35B dense · CC BY-NC 4.0
Min ~2 GB (Q4, per variant) · Rec ~7 GB (BF16, per variant)
Five-model release covering 70+ languages, including three regional fine-tunes (Africa/W. Asia, S. Asia, Asia-Pacific/Europe). Targets phone-class multilingual deployment.
Alibaba · MoE: 397B total / 17B active · Apache 2.0
Min ~220 GB (Q4) · Rec ~875 GB (BF16) — multi-GPU
Headline Qwen3.5 release: native multimodal, 201-language support, and the largest open-weight MoE Alibaba has shipped. Smaller dense variants (122B, 35B, 27B, and 9B down to 0.8B) followed Feb 24 and Mar 2.
Z.ai (formerly Zhipu AI) · MoE: 745B total / 44B active · MIT
Min ~410 GB (Q4) · Rec ~1.6 TB (BF16) — multi-node only
First frontier-class model from a publicly traded Chinese AI company; trained entirely on Huawei Ascend chips. Replaced six weeks later by GLM-5.1.
Alibaba · MoE: 80B total / 3B active · Apache 2.0
Min ~45 GB (Q4) · Rec ~176 GB (BF16)
Hybrid sparse-MoE coding model that activates only 3B parameters per token; scores >70% on SWE-bench Verified using SWE-agent scaffolding, putting it in range of much larger dense coding models.
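The 3B-active-per-token figure follows from top-k expert routing: a router scores every expert, only the best k run for the token, and their outputs are mixed with the router's weights. A minimal sketch with illustrative shapes, not this model's actual config:

```python
import numpy as np

def moe_forward(x, router_w, experts, k=8):
    """Top-k mixture-of-experts routing for one token.

    x        : (d,) token hidden state
    router_w : (num_experts, d) router weights
    experts  : list of callables, one per expert FFN
    Only k experts run, so active parameters per token stay small
    even when total parameters are huge.
    """
    logits = router_w @ x                      # score every expert
    top = np.argsort(logits)[-k:]              # indices of the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                   # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy usage: 16 tiny experts, route one 4-dim token through the top 2.
rng = np.random.default_rng(0)
d, n_experts = 4, 16
experts = [(lambda W: (lambda x: W @ x))(rng.normal(size=(d, d))) for _ in range(n_experts)]
print(moe_forward(rng.normal(size=d), rng.normal(size=(n_experts, d)), experts, k=2))
```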
Black Forest Labs · 4B (Apache) / 9B · Apache 2.0 (4B variant)
Min ~3–6 GB (Q4) · Rec ~9–20 GB (BF16)
Distilled image generators from the FLUX.2 family targeting sub-second generation. The 4B variant is the first FLUX model under a fully permissive license; pairs with the open Apache 2.0 FLUX.2 VAE.