100% Local · 11 Models · 1B–26B Parameters · Zero Cloud

AI that runs on your hardware

NotesXML ships with 11 AI models you can download and run entirely on your device. No API keys, no subscriptions, no data leaving your machine. Here's how they compare to the cloud giants.

Local models vs. cloud services

Every model listed below runs inside NotesXML via llama.cpp, a widely used open-source inference engine. Cloud models are shown for reference only.

Educational equivalent legend: PhD / Researcher · Post-Graduate · University · High School · Basic

| Model | Parameters | RAM (peak) | Context | Vision | Tier | Educational Equivalent |
| GPT-5.4 Pro [Cloud] | Proprietary | API | 1M+ | Yes | Cloud | PhD / Researcher |
| Claude Opus 4.6 [Cloud] | Proprietary | API | 1M | Yes | Cloud | PhD / Researcher |
| Llama 3.2 1B Instruct [Local, Free Tier] | 1.2B | 1.3 GB | 128K | No | Ultra-Light | Basic / Elementary |
| SmolLM2 1.7B Instruct [Local] | 1.7B | 1.8 GB | 8K | No | Lightweight | High School |
| Granite 4.0 H-1B (Hybrid) [Local] | 1B | 1.5 GB | 128K | No | Ultra-Light | High School |
| Llama 3.2 3B Instruct [Local] | 3.2B | 3.2 GB | 128K | No | Lightweight | High School |
| Granite 3.0 2B Instruct [Local] | 2.6B | 2.6 GB | 4K | No | Lightweight | High School |
| Gemma 4 E2B [Local] | 2.3B effective | 4.5 GB | 128K | Yes | Lightweight | High School |
| Ministral 3 3B [Local, Free Tier] | 3B | 2.6 GB | 256K | Yes | Lightweight | High School |
| Gemma 4 E4B [Local, Electron Default] | 4.5B effective | 6.4 GB | 128K | Yes | Enhanced | Undergraduate |
| Ministral 3 8B [Local] | 8B | 7.0 GB | 256K | Yes | Enhanced | Undergraduate |
| Gemma 4 12B [Local] | 12B | 10.0 GB | 256K | Yes | Enhanced | Undergraduate |
| Gemma 4 26B-A4B (MoE) [Desktop only] | 26B (4B active, MoE) | 22.0 GB | 256K | Yes | Desktop Pro | Post-Graduate |

RAM (peak) = peak memory used by the model during inference (per NX-AI-MODEL-015 RAM safety floor). This is the model's own memory consumption, not the total device RAM required: your operating system, background apps, and the NotesXML application itself also consume RAM. As a guideline, add 4–6 GB to the peak RAM figure for Android devices or 5–8 GB for desktops to estimate the total device RAM needed.

Context = native context window.

MoE = Mixture of Experts (active parameters per token shown in parentheses).

Vision-capable models accept images, PDFs, and screenshots as input.

Free Tier models are available without a Professional license.

Catalog source: notesxml-model-catalog.json v2026.05.27.02.
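The RAM guideline above is simple addition, and a short sketch makes the arithmetic concrete. The function name `estimate_total_ram_gb` and the `OVERHEAD_GB` table are hypothetical names chosen for illustration; only the overhead ranges (4–6 GB on Android, 5–8 GB on desktop) come from the guideline itself.

```python
# Overhead ranges in GB taken from the guideline: OS, background apps,
# and the NotesXML application itself. The dictionary name is illustrative.
OVERHEAD_GB = {
    "android": (4.0, 6.0),
    "desktop": (5.0, 8.0),
}


def estimate_total_ram_gb(peak_model_ram_gb, platform):
    """Return a (low, high) estimate of total device RAM in GB.

    peak_model_ram_gb: the "RAM (peak)" figure from the comparison table.
    platform: "android" or "desktop".
    """
    low, high = OVERHEAD_GB[platform]
    return (round(peak_model_ram_gb + low, 1), round(peak_model_ram_gb + high, 1))


if __name__ == "__main__":
    # Llama 3.2 1B Instruct (1.3 GB peak) on an Android phone.
    print(estimate_total_ram_gb(1.3, "android"))
    # Gemma 4 E4B (6.4 GB peak) on a desktop.
    print(estimate_total_ram_gb(6.4, "desktop"))
```

Treat the result as a rough range rather than a hard requirement, since actual overhead depends on what else is running on the device.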

AI models run entirely on your device and produce results based on statistical patterns. Output quality varies by model and task. Always review AI-generated content before relying on it.

Want more models?

Specialty family catalogs are available for Gemma, Llama, Mistral, Phi, Granite, and GPT-OSS — 54 additional models across 6 families.


Four tiers, one app

The catalog is organized into four tiers based on RAM requirements and capability. Pick the tier that matches your hardware.

Ultra-Light

2 models · ~1.3–1.5 GB model runtime RAM · 6–8 GB total device RAM · Free Tier (Llama 3.2 1B)

Llama 3.2 1B Instruct and Granite 4.0 H-1B (Hybrid). Both fit comfortably on phones and low-memory devices, and both offer a 128K-token context window. Llama 3.2 1B is the fastest model in the catalog and serves as the Android Free-tier default, ideal for auto-titling notes, simple text cleanup, and basic transcription polish. Granite 4.0 H-1B is an Apache 2.0 hybrid Mamba-2 + transformer model that delivers stronger long-context and tool-calling performance than Llama 3.2 1B at a comparable footprint.

Lightweight

5 models · 1.8–4.5 GB model runtime RAM · 8–14 GB total device RAM recommended

SmolLM2 1.7B, Llama 3.2 3B, Granite 3.0 2B Instruct, Gemma 4 E2B (vision), Ministral 3 3B (Free Tier & vision). Ideal for quick tasks: formatting, basic summarization, general conversation. Ministral 3 3B and Gemma 4 E2B both offer image analysis at the Lightweight footprint; Ministral 3 3B ships as the Free-tier upper model.

Enhanced

3 models · 6.4–10.0 GB model runtime RAM · 16 GB total system RAM recommended · Professional

Gemma 4 E4B (Electron default, vision), Ministral 3 8B (vision), and Gemma 4 12B (vision). Gemma 4 E4B delivers expert-level tool calling and image analysis. Ministral 3 8B leads the catalog on long-context retention. Gemma 4 12B is the highest-quality model that still runs cross-platform: it is second only to the desktop-only 26B MoE on overall quality, scoring 97% on tool calling and 96% on PhD-level reasoning. All three require the Professional tier.

Desktop Pro

1 model · ~22 GB model runtime RAM · 32 GB total system RAM or 16+ GB VRAM · Desktop & Professional

Gemma 4 26B-A4B (MoE) — 26B total parameters with 4B active per token via Mixture-of-Experts, vision-capable, 256K-token native context. The highest-quality model in the catalog: top scores on HEE, PLE, and PhD Philosophy, plus the strongest SDB-100 deductive-reasoning result. Best with a discrete GPU with 16+ GB VRAM (e.g., RTX 4080, RTX 5070 Ti, or higher); CPU-only inference is possible but significantly slower. Apache 2.0.
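As a rough illustration of picking a tier from your hardware, the sketch below maps total device RAM to the highest tier that fits. The thresholds are assumptions: they take the lower bound of each tier's total-RAM figure quoted above (6, 8, 16, and 32 GB) as a minimum. The names `TIERS` and `highest_tier` are hypothetical and are not part of NotesXML. The sketch ignores the 16+ GB VRAM alternative for Desktop Pro and does not model license requirements.

```python
from typing import Optional

# (tier name, assumed minimum total RAM in GB, desktop-only flag).
# Minimums are an interpretation of the tier summaries, not official limits.
TIERS = [
    ("Ultra-Light", 6, False),
    ("Lightweight", 8, False),
    ("Enhanced", 16, False),
    ("Desktop Pro", 32, True),
]


def highest_tier(total_ram_gb: float, is_desktop: bool) -> Optional[str]:
    """Return the most capable tier the device can run, or None if none fit."""
    best = None
    for name, min_ram_gb, desktop_only in TIERS:
        fits_ram = total_ram_gb >= min_ram_gb
        fits_platform = is_desktop or not desktop_only
        if fits_ram and fits_platform:
            best = name  # TIERS is ordered from least to most capable
    return best


if __name__ == "__main__":
    print(highest_tier(8, is_desktop=False))   # a phone with 8 GB of RAM
    print(highest_tier(32, is_desktop=True))   # a desktop with 32 GB of RAM
```

A device that qualifies for a higher tier can still run any model from a lower one, so the result is an upper bound rather than a recommendation.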

Why local AI matters

Your data stays yours

Cloud AI means sending your notes, documents, and ideas to someone else's server. With NotesXML, the AI runs on your device. Your data never leaves your machine — not even for processing.

No subscriptions

Cloud AI services typically charge ongoing subscription fees, sometimes hundreds of dollars per month for top-tier models, plus per-token fees for usage. NotesXML Professional costs $39.99 for a lifetime license and includes access to all 11 models with no per-token charges, ever.

Works offline

No internet? No problem. Once you've downloaded a model, it works everywhere — airplanes, rural areas, secure facilities. Cloud models require a constant internet connection.
