Extended Catalogs · 6 Families · 54 Models · Free to Download

Specialty Model Catalogs

The default NotesXML catalog ships 10 Pareto-optimised models. These extended family catalogs let you explore the full breadth of each model lineage — from sub-1 GB ultra-lights to 400B datacenter giants.

How to load a specialty catalog

  1. Download a catalog JSON file from the cards below.
  2. In NotesXML, open Settings → AI → Model Catalog → Import Catalog.
  3. Select the downloaded file. NotesXML will load the new model list immediately.
  4. To return to defaults, tap Reset to Default Catalog in the same menu.

⚠️ Specialty catalogs include models that are significantly larger than the defaults. Check the RAM and storage requirements for each model before downloading it inside the app. Large models (20 GB+) are intended for desktop hardware with 32 GB+ RAM.
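
The pre-download check described above can be sketched in a few lines. This is a minimal illustration, not NotesXML code. The `ModelEntry` type, its fields and `check_fit` are hypothetical. Treating "weights at least as large as installed RAM" as a blocker is a rough assumption, because real memory use also depends on context length and runtime overhead. The 20 GB and 32 GB thresholds come from the warning itself.

```python
from dataclasses import dataclass

# Hypothetical catalog entry: these fields are illustrative and are NOT
# the actual NotesXML catalog schema.
@dataclass
class ModelEntry:
    name: str
    size_gb: float  # download size of the model file, in GB

LARGE_MODEL_GB = 20.0  # "Large models (20 GB+)"
DESKTOP_RAM_GB = 32.0  # "...desktop hardware with 32 GB+ RAM"


def check_fit(model: ModelEntry, ram_gb: float, free_storage_gb: float) -> list[str]:
    """Return a list of blockers; an empty list means no obvious problem."""
    problems = []
    # The file must fit on disk before it can be downloaded.
    if model.size_gb > free_storage_gb:
        problems.append(f"needs {model.size_gb:g} GB of storage, {free_storage_gb:g} GB free")
    # Large models are intended for desktops with 32 GB+ RAM.
    if model.size_gb >= LARGE_MODEL_GB and ram_gb < DESKTOP_RAM_GB:
        problems.append(f"large model: intended for {DESKTOP_RAM_GB:g}+ GB RAM")
    # Assumption: weights at least as large as installed RAM cannot load.
    if model.size_gb >= ram_gb:
        problems.append(f"{model.size_gb:g} GB of weights vs {ram_gb:g} GB RAM")
    return problems


# Illustrative values only, not real catalog entries.
print(check_fit(ModelEntry("hypothetical-small", 2.0), ram_gb=16, free_storage_gb=64))  # → []
print(check_fit(ModelEntry("hypothetical-large", 24.0), ram_gb=16, free_storage_gb=100))
```

The second call reports two blockers: the model is in the 20 GB+ class on a 16 GB machine, and its weights are larger than the installed RAM.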

Choose a model family

Each catalog focuses on one model lineage. Models that already ship in the default catalog are included here too — the full family gives you context about where the defaults sit within their lineage.

🟢

Gemma Family

Google · Apache 2.0 / Gemma ToU

Models: 10 · Vision-capable: 7 · Model size: 0.8–19 GB

The complete Gemma lineage: Gemma 3 (1B text-only; 4B, 12B and 27B with vision), Gemma 3n (E2B/E4B, text-only), and Gemma 4 (E2B/E4B/26B/31B, all with vision). Includes the Gemma 4 26B-A4B MoE and 31B dense models for high-RAM desktop systems.

Gemma 3 · Gemma 3n · Gemma 4 · 3 Desktop-only · Apache 2.0 & Gemma ToU

🔶

Mistral Family

Mistral AI · Apache 2.0

Models: 14 · Vision-capable: 7 · Model size: 2.1–80 GB

The broadest family catalog: Ministral 3/8/14B, Mistral 7B, Nemo 12B, Pixtral 12B (vision), Codestral 25.01, Mistral Small 3.1 & 3.2, Devstral Small, Mixtral 8x7B & 8x22B, Magistral Small 24B, and the new Mistral Small 4 119B MoE.

Ministral · Mistral Small · Mixtral MoE · Pixtral · 8 Desktop-only

🦙

Llama Family

Meta · Llama Community License

Models: 10 · Vision-capable: 4 · Model size: 0.8–243 GB

From Llama 3.2 1B through Llama 4 Maverick 402B MoE: Llama 3.1 (8B/70B/405B), Llama 3.2 (1B/3B text-only, 11B/90B vision), Llama 3.3 70B, Llama 4 Scout 109B-16E, and Llama 4 Maverick 402B-128E. Warning: the 400B+ entries require 250+ GB RAM.

Llama 3.1 · Llama 3.2 · Llama 3.3 · Llama 4 · 6 Desktop-only

🔷

Phi Family

Microsoft · MIT License

Models: 9 · Vision-capable: 1 · Model size: 2.2–24 GB

Microsoft's Phi line: Phi-3.5 Mini & MoE 42B, Phi-4 14B, Phi-4 Mini, Phi-4 Multimodal 5.6B (vision), Phi-4 Reasoning, Phi-4 Reasoning Plus, Phi-4 Mini Reasoning, and Phi-4 Reasoning Vision 15B (text backbone only). All MIT licensed. Note: all Phi-4 models use hybrid Mamba1/SWA — flash-attn is disabled.

Phi-3.5 · Phi-4 · Reasoning · 5 Desktop-only · MIT

🪨

Granite Family

IBM · Apache 2.0

Models: 9 · Vision-capable: 0 · Model size: 0.2–20 GB

IBM's enterprise-grade Granite stack: 3.0 (1B-A400M MoE, 2B, 8B), 3.2 8B with thinking, 4.0 350M (ultra-compact dense), 4.0 H-Small 32B MoE, and the full 4.1 generation (3B, 8B, 30B). All Apache 2.0. Text-only across the board — GGUF vision support not yet available.

Granite 3.0 · Granite 3.2 · Granite 4.0 · Granite 4.1 · Text-only · Apache 2.0

GPT-OSS Family

OpenAI · Apache 2.0

Models: 2 · Vision-capable: 0 · Model size: 14–63 GB

OpenAI's first open-weight language models since GPT-2 (Aug 2025): GPT-OSS 20B (21B total / 3.6B active MoE) and GPT-OSS 120B (117B total / 5.1B active MoE). Both are reasoning models that use the "harmony" response format. Desktop-only. Minimum 16 GB RAM for 20B; 80+ GB RAM for 120B.

GPT-OSS 20B · GPT-OSS 120B · Reasoning · MoE · Desktop-only · Apache 2.0

Why the default catalog has only 10 models

The default NotesXML catalog is built using Pareto-frontier curation. Every evaluated model is scored on two axes, inference speed (tokens per second) and benchmark quality (AvgScore), and only models that no other model dominates ship by default. If model A is at least as fast and at least as accurate as model B, and strictly better on one of the two, model B is dominated and removed: no user would choose it.
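
The dominance rule can be made concrete with a short sketch. It illustrates standard Pareto filtering and is not the actual NotesXML curation script: `Candidate`, `dominates` and `pareto_frontier` are hypothetical names, and the sample numbers are invented rather than taken from the benchmark data.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    tokens_per_sec: float  # inference speed, higher is better
    avg_score: float       # benchmark quality (AvgScore), higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.tokens_per_sec >= b.tokens_per_sec and a.avg_score >= b.avg_score
    better = a.tokens_per_sec > b.tokens_per_sec or a.avg_score > b.avg_score
    return no_worse and better


def pareto_frontier(models: list[Candidate]) -> list[Candidate]:
    """Keep the models no other model dominates. O(n^2), fine for a few dozen entries."""
    # A model never dominates itself, so comparing against the full list is safe.
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Invented numbers, for illustration only.
models = [
    Candidate("A", 40, 60),
    Candidate("B", 30, 55),  # slower and lower quality than A: dominated
    Candidate("C", 20, 70),  # slowest, but highest quality
    Candidate("D", 50, 40),  # fastest, but lowest quality
]
frontier = pareto_frontier(models)
print([m.name for m in frontier])  # → ['A', 'C', 'D']
```

In this toy example, B is the kind of model that ends up only in a specialty catalog, while A, C and D each represent a different speed/quality trade-off that no other candidate beats outright.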

The specialty catalogs here include every model that was evaluated but fell off the Pareto frontier. They are perfectly functional — they just aren't the optimal choice when faster or higher-quality alternatives exist at similar sizes. You may prefer them for architecture familiarity, licensing requirements, or specific use-cases not captured in the benchmarks.

Benchmark data and the full methodology are published in the AI Benchmarks article.

Ready to experiment?

Download NotesXML, import a specialty catalog, and run any of these models entirely on your own hardware.