NotesXML ships with 11 AI models you can download and run entirely on your device. No API keys, no subscriptions, no data leaving your machine. Here's how they compare to the cloud giants.
Every model listed below runs inside NotesXML via llama.cpp — the same inference engine used by researchers worldwide. Cloud models are shown for reference only.
| Model | Availability | Parameters | RAM (peak) | Context | Vision | Tier | Educational Equivalent |
|---|---|---|---|---|---|---|---|
| GPT-5.4 Pro | Cloud | Proprietary | API | 1M+ | Yes | Cloud | PhD / Researcher |
| Claude Opus 4.6 | Cloud | Proprietary | API | 1M | Yes | Cloud | PhD / Researcher |
| Llama 3.2 1B Instruct | Local, Free Tier | 1.2B | 1.3 GB | 128K | — | Ultra-Light | Basic / Elementary |
| SmolLM2 1.7B Instruct | Local | 1.7B | 1.8 GB | 8K | — | Lightweight | High School |
| Granite 4.0 H-1B (Hybrid) | Local | 1B | 1.5 GB | 128K | — | Ultra-Light | High School |
| Llama 3.2 3B Instruct | Local | 3.2B | 3.2 GB | 128K | — | Lightweight | High School |
| Granite 3.0 2B Instruct | Local | 2.6B | 2.6 GB | 4K | — | Lightweight | High School |
| Gemma 4 E2B | Local | 2.3B effective | 4.5 GB | 128K | Yes | Lightweight | High School |
| Ministral 3 3B | Local, Free Tier | 3B | 2.6 GB | 256K | Yes | Lightweight | High School |
| Gemma 4 E4B | Local, Electron default | 4.5B effective | 6.4 GB | 128K | Yes | Enhanced | Undergraduate |
| Ministral 3 8B | Local | 8B | 7.0 GB | 256K | Yes | Enhanced | Undergraduate |
| Gemma 4 12B | Local | 12B | 10.0 GB | 256K | Yes | Enhanced | Undergraduate |
| Gemma 4 26B-A4B (MoE) | Desktop only | 26B (4B active, MoE) | 22.0 GB | 256K | Yes | Desktop Pro | Post-Graduate |
RAM (peak) = peak memory used by the model during inference (per NX-AI-MODEL-015 RAM safety floor). This is the model's own memory consumption, not the total device RAM required: your operating system, background apps, and the NotesXML application itself also consume RAM. As a guideline, add 4–6 GB to the peak RAM figure for Android devices or 5–8 GB for desktops to estimate the total device RAM needed.

Context = native context window. MoE = Mixture of Experts (active parameters per token shown in parentheses). Vision-capable models accept images, PDFs, and screenshots as input. Free Tier models are available without a Professional license.

Catalog source: notesxml-model-catalog.json v2026.05.27.02.
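The RAM guideline above is simple addition, and a short sketch can make it concrete. The function name `estimate_total_ram` and the `OVERHEAD_GB` table are illustrative and not part of NotesXML; the overhead ranges are the 4–6 GB (Android) and 5–8 GB (desktop) figures quoted in the note.

```python
# Overhead ranges in GB, taken from the guideline in the table note.
# The names here are hypothetical, chosen only for this example.
OVERHEAD_GB = {"android": (4.0, 6.0), "desktop": (5.0, 8.0)}


def estimate_total_ram(peak_gb: float, platform: str) -> tuple[float, float]:
    """Return a (low, high) estimate of total device RAM in GB.

    peak_gb is the model's "RAM (peak)" value from the table.
    """
    low, high = OVERHEAD_GB[platform]
    return (round(peak_gb + low, 1), round(peak_gb + high, 1))


# Gemma 4 E4B (6.4 GB peak) on a desktop
print(estimate_total_ram(6.4, "desktop"))
# Llama 3.2 1B Instruct (1.3 GB peak) on Android
print(estimate_total_ram(1.3, "android"))
```

For Gemma 4 E4B on a desktop this gives roughly 11.4 to 14.4 GB, which is why a 16 GB machine is a comfortable fit for that model.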
AI models run entirely on your device and produce results based on statistical patterns. Output quality varies by model and task. Always review AI-generated content before relying on it.
Specialty family catalogs are available for Gemma, Llama, Mistral, Phi, Granite, and GPT-OSS — 54 additional models across 6 families.
The catalog is organized into four tiers based on RAM requirements and capability. Pick the tier that matches your hardware.
Ultra-Light: 2 models · ~1.3–1.5 GB model runtime RAM · 6–8 GB total device RAM · Free Tier (Llama 3.2 1B)
This tier contains Llama 3.2 1B Instruct and Granite 4.0 H-1B (Hybrid). Both fit comfortably on phones and low-memory devices, and both offer a 128K-token context window. Llama 3.2 1B is the fastest model in the catalog and the Android Free Tier default, well suited to auto-titling notes, simple text cleanup, and basic transcription polish. Granite 4.0 H-1B is an Apache 2.0 hybrid Mamba-2 + transformer model that delivers stronger long-context and tool-calling performance than Llama 3.2 1B at a comparable footprint.
Lightweight: 5 models · 1.8–4.5 GB model runtime RAM · 8–14 GB total device RAM recommended
This tier contains SmolLM2 1.7B, Llama 3.2 3B, Granite 3.0 2B Instruct, Gemma 4 E2B (vision), and Ministral 3 3B (Free Tier, vision). These models are ideal for quick tasks: formatting, basic summarization, and general conversation. Ministral 3 3B and Gemma 4 E2B both offer image analysis at the Lightweight footprint; Ministral 3 3B is the larger of the two Free Tier models.
Enhanced: 3 models · 6.4–10.0 GB model runtime RAM · 16 GB total system RAM recommended · Professional
This tier contains Gemma 4 E4B (Electron default, vision), Ministral 3 8B (vision), and Gemma 4 12B (vision). Gemma 4 E4B delivers expert-level tool calling and image analysis. Ministral 3 8B leads the catalog on long-context retention. Gemma 4 12B is the highest-quality model that still runs cross-platform, second only to the desktop-only 26B MoE on overall quality, with scores of 97% on tool calling and 96% on PhD-level reasoning. All three require the Professional tier.
Desktop Pro: 1 model · ~22 GB model runtime RAM · 32 GB total system RAM or 16+ GB VRAM · Desktop & Professional
Gemma 4 26B-A4B (MoE) — 26B total parameters with 4B active per token via Mixture-of-Experts, vision-capable, 256K-token native context. The highest-quality model in the catalog: top scores on HEE, PLE, and PhD Philosophy, plus the strongest SDB-100 deductive-reasoning result. Best with a discrete GPU with 16+ GB VRAM (e.g., RTX 4080, RTX 5070 Ti, or higher); CPU-only inference is possible but significantly slower. Apache 2.0.
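Picking a tier comes down to checking which models fit in the RAM left over after the operating system and other apps take their share. The sketch below is one way to do that check, using the peak RAM figures from the comparison table. `Model`, `CATALOG`, and `models_that_fit` are hypothetical names for illustration, not a NotesXML API, and the overhead you pass in is your own estimate (the table note suggests 4–6 GB on Android and 5–8 GB on desktops).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    peak_gb: float            # "RAM (peak)" from the comparison table
    desktop_only: bool = False


# Peak RAM values copied from the comparison table.
CATALOG = [
    Model("Llama 3.2 1B Instruct", 1.3),
    Model("Granite 4.0 H-1B (Hybrid)", 1.5),
    Model("SmolLM2 1.7B Instruct", 1.8),
    Model("Granite 3.0 2B Instruct", 2.6),
    Model("Ministral 3 3B", 2.6),
    Model("Llama 3.2 3B Instruct", 3.2),
    Model("Gemma 4 E2B", 4.5),
    Model("Gemma 4 E4B", 6.4),
    Model("Ministral 3 8B", 7.0),
    Model("Gemma 4 12B", 10.0),
    Model("Gemma 4 26B-A4B (MoE)", 22.0, desktop_only=True),
]


def models_that_fit(total_ram_gb: float, overhead_gb: float,
                    desktop: bool = True) -> list[str]:
    """List models whose peak RAM fits in total RAM minus the assumed overhead."""
    budget = total_ram_gb - overhead_gb
    return [
        m.name
        for m in CATALOG
        if m.peak_gb <= budget and (desktop or not m.desktop_only)
    ]


# A 16 GB desktop, assuming the low end (5 GB) of the desktop overhead range
print(models_that_fit(16, overhead_gb=5))
# An 8 GB Android phone, assuming the low end (4 GB) of the Android range
print(models_that_fit(8, overhead_gb=4, desktop=False))
```

Passing the high end of the overhead range instead gives a more conservative list. This sketch only accounts for RAM; it does not model license tier or GPU VRAM.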
Cloud AI means sending your notes, documents, and ideas to someone else's server. With NotesXML, the AI runs on your device. Your data never leaves your machine — not even for processing.
Cloud AI services typically charge ongoing subscription fees, sometimes hundreds of dollars per month for top-tier models, plus per-token fees for usage. NotesXML Professional costs $39.99 for a lifetime license and includes access to all 11 models with no per-token charges, ever.
No internet? No problem. Once you've downloaded a model, it works everywhere — airplanes, rural areas, secure facilities. Cloud models require a constant internet connection.