AI Model Comparison

Catalog

Local models vs. cloud services

Every local model listed below runs inside NotesXML via llama.cpp, the same inference engine used by researchers worldwide. Cloud models are shown for reference only.

Educational equivalents: PhD / Researcher · Post-Graduate · Undergraduate · High School · Basic / Elementary

Model	Parameters	RAM (peak)	Context	Vision	Tier	Educational Equivalent
GPT-5.4 Pro (Cloud)	Proprietary	API	1M+	Yes	Cloud	PhD / Researcher
Claude Opus 4.6 (Cloud)	Proprietary	API	1M	Yes	Cloud	PhD / Researcher
Llama 3.2 1B Instruct (Local, Free Tier)	1.2B	1.3 GB	128K	No	Ultra-Light	Basic / Elementary
Ministral 3 3B (Local, Free Tier, Electron default)	3B	3.0 GB	256K	Yes	Lightweight	High School
Llama 3.2 3B Instruct (Local)	3.2B	3.2 GB	128K	No	Lightweight	High School
Phi-4 Mini 3.8B (Local)	3.8B	3.9 GB	128K	No	Lightweight	High School
Gemma 4 E2B (Local)	2.3B effective	4.5 GB	128K	Yes	Standard	High School
Gemma 4 E4B (Local)	4.5B effective	6.4 GB	128K	Yes	Enhanced	Undergraduate
Ministral 3 8B (Local)	8B	7.0 GB	256K	Yes	Enhanced	Undergraduate
Gemma 4 12B (Local)	12B	10.0 GB	256K	Yes	Enhanced	Undergraduate
Ministral 3 14B (Desktop only, Intel GPU pick)	14B	11.0 GB	256K	Yes	Advanced	Undergraduate
Gemma 4 26B-A4B (MoE, Desktop only)	26B (4B active, MoE)	22.0 GB	256K	Yes	Desktop Pro	Post-Graduate

RAM (peak) = peak memory used by the model during inference (per NX-AI-MODEL-015 RAM safety floor). This is the model’s own memory consumption, not the total device RAM required: your operating system, background apps, and the NotesXML application itself also consume RAM. As a guideline, add 4–6 GB to the peak RAM figure for Android devices or 5–8 GB for desktops to estimate the total device RAM needed.

Context = native context window. MoE = Mixture of Experts (active parameters per token shown in parentheses). Vision-capable models accept images, PDFs, and screenshots as input. Free Tier models are available without a Professional license.

Catalog source: notesxml-model-catalog.json v2026.07.08.03.
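As a rough illustration of the RAM guideline above, the following Python sketch adds the stated overhead range to a model's peak RAM figure. The function name and the platform keys are illustrative and not part of NotesXML; only the overhead ranges and the peak figures come from this page.

```python
# Overhead ranges in GB, taken from the guideline on this page:
# add 4-6 GB on Android, 5-8 GB on desktops.
OVERHEAD_GB = {"android": (4.0, 6.0), "desktop": (5.0, 8.0)}


def estimate_total_ram(peak_gb: float, platform: str) -> tuple[float, float]:
    """Return a (low, high) estimate of total device RAM in GB.

    Hypothetical helper: peak_gb is the "RAM (peak)" value from the table,
    platform is "android" or "desktop".
    """
    low, high = OVERHEAD_GB[platform]
    return (round(peak_gb + low, 1), round(peak_gb + high, 1))


# Peak RAM figures copied from the comparison table.
examples = [
    ("Llama 3.2 1B Instruct", 1.3, "android"),
    ("Ministral 3 8B", 7.0, "android"),
    ("Gemma 4 26B-A4B (MoE)", 22.0, "desktop"),
]
for name, peak, platform in examples:
    low, high = estimate_total_ram(peak, platform)
    print(f"{name}: about {low}-{high} GB total on {platform}")
```

The result is only an estimate; actual headroom depends on what else is running on the device.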

Local AI models run entirely on your device and produce results based on statistical patterns. Output quality varies by model and task. Always review AI-generated content before relying on it.

Want more models?

Specialty family catalogs are available for Gemma, Llama, Mistral, Phi, Granite, and GPT-OSS — 54 additional models across 6 families.

Browse Specialty Catalogs →

Tiers

Six tiers, one app

The catalog is organized into six tiers based on RAM requirements and capability. Pick the tier that matches your hardware.

Ultra-Light

1 model · ~1.3 GB model runtime RAM · 6–8 GB total device RAM · Free Tier (Llama 3.2 1B)

Llama 3.2 1B Instruct (Android Free Tier default). The fastest model in the catalog, suited to auto-titling notes, simple text cleanup, and basic transcription polish. Fits comfortably on phones and low-memory devices, with a 128K-token context window.

Lightweight

3 models · 3.0–3.9 GB model runtime RAM · 8–14 GB total device RAM recommended

Ministral 3 3B (Free Tier, vision, Electron default), Llama 3.2 3B, and Phi-4 Mini 3.8B. Ideal for quick tasks: formatting, basic summarization, general conversation. Ministral 3 3B brings image analysis into the Free Tier, with a 256K context. Phi-4 Mini offers strong instruction-following, an MIT license, and a 128K context.

Standard

1 model · ~4.5 GB model runtime RAM · 8–14 GB total device RAM recommended

Gemma 4 E2B (vision). A compact, fast, vision-capable model (~2.3B effective parameters) that sits between the Lightweight trio and the Enhanced tier. Strong knowledge scores for its size, with image analysis at a modest footprint. Cross-platform, 128K context, Apache 2.0.

Enhanced

3 models · 6.4–10.0 GB model runtime RAM · 16 GB total system RAM recommended · Professional

Gemma 4 E4B (vision, tool calling), Ministral 3 8B (vision), and Gemma 4 12B (vision). Gemma 4 E4B delivers expert-level tool calling and image analysis. Ministral 3 8B is the catalog’s speed/quality leader for AI Chat and structured actions. Gemma 4 12B is the highest-quality model that still runs cross-platform, second only to the desktop-only 26B MoE. All three require a Professional license.

Advanced

1 model · ~11 GB model runtime RAM · 24 GB total system RAM recommended · Desktop & Professional

Ministral 3 14B (vision, Intel GPU pick). The large-model choice for Intel iGPU/GPU systems — it runs on the Intel GPU, where the Gemma 4 models fall back to CPU-only. 14B dense, vision-capable, 256K-token context, Apache 2.0. Desktop only (Windows and Linux); does not appear in the Android model picker.

Desktop Pro

1 model · ~22 GB model runtime RAM · 32 GB total system RAM or 16+ GB VRAM · Desktop & Professional

Gemma 4 26B-A4B (MoE) — 26B total parameters with 4B active per token via Mixture-of-Experts, vision-capable, 256K-token native context. The highest-quality model in the catalog: top scores on HEE, PLE, and PhD Philosophy, plus the strongest SDB-100 deductive-reasoning result. Best with a discrete GPU with 16+ GB VRAM (e.g., RTX 4080, RTX 5070 Ti, or higher); CPU-only inference is possible but significantly slower. Apache 2.0.
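One way to read the tier list above is as a lookup from device RAM to the tiers that fit. The sketch below is a hypothetical helper, not NotesXML's actual selection logic. It assumes the lower bound of each tier's recommended total RAM as the cutoff, treats Advanced and Desktop Pro as desktop-only, and ignores both licensing and the 16+ GB VRAM alternative for Desktop Pro.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_total_ram_gb: int  # assumption: lower bound of the recommended total RAM
    desktop_only: bool


# Values read off the tier descriptions on this page.
TIERS = [
    Tier("Ultra-Light", 6, False),
    Tier("Lightweight", 8, False),
    Tier("Standard", 8, False),
    Tier("Enhanced", 16, False),
    Tier("Advanced", 24, True),
    Tier("Desktop Pro", 32, True),
]


def tiers_for(device_ram_gb: float, is_desktop: bool) -> list[str]:
    """List the tiers whose assumed RAM cutoff the device meets."""
    return [
        tier.name
        for tier in TIERS
        if device_ram_gb >= tier.min_total_ram_gb
        and (is_desktop or not tier.desktop_only)
    ]


print(tiers_for(8, is_desktop=False))
print(tiers_for(24, is_desktop=True))
```

A device at the bottom of a recommended range may still run short of memory under load, so the estimate is best treated as a starting point.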

The Privacy Advantage

Why local AI matters

Your data stays yours

Cloud AI means sending your notes, documents, and ideas to someone else's server. With NotesXML, the AI runs on your device. Your data never leaves your machine — not even for processing.

No subscriptions

Cloud AI services typically charge ongoing subscription fees, sometimes hundreds of dollars per month for top-tier models, plus per-token usage fees. NotesXML Professional is a $39.99 lifetime license that includes access to all 10 local models, with no per-token charges.

Works offline

No internet? No problem. Once you've downloaded a model, it works everywhere — airplanes, rural areas, secure facilities. Cloud models require a constant internet connection.

AI that runs on your hardware