Ollama Model Library: Every Available Model

Updated May 2026
The Ollama model library hosts over 4,500 model variants from dozens of model families, ranging from 1 billion to 405 billion parameters. Every model is pre-quantized and ready for immediate local inference. This guide covers every major model family, their strengths, available sizes, and the best use cases for each.

How the Library Is Organized

The Ollama library at ollama.com/library organizes models by family name, with each family offering multiple size variants and quantization options. When you run ollama pull llama4, Ollama downloads the default variant, typically the Q4_K_M quantization of the most popular size. You can specify exact variants using tags like ollama pull qwen3:14b-q8_0 to get a specific size and quantization combination.

Models are stored locally in a layered format similar to Docker images. Shared base layers between models of the same family reduce total disk usage. The ollama list command shows all installed models with their sizes and modification dates. The ollama show command displays detailed metadata for a specific model, including its Modelfile parameters, template format, and license information.

New models are added to the library regularly as open source model releases continue at a rapid pace. The Ollama team curates the library, adding new models within days of their public release and providing pre-quantized variants at multiple quality levels for each supported model.

Meta Llama Family

Meta's Llama models are the most widely used open source language models and form the foundation of the Ollama ecosystem. The family includes several generations, each improving on its predecessor. Llama 3.2 introduced smaller variants at 1B and 3B parameters, making local inference accessible on minimal hardware. Llama 3.2 Vision added multimodal capability with an 11B model that processes both text and images.

Llama 4, released in April 2026, represents a major architectural shift with its mixture-of-experts design. Scout (17B active, 109B total) and Maverick (17B active, 400B total) use MoE to deliver quality far beyond what their active parameter count would suggest. Scout is the most popular model on Ollama, offering excellent all-around performance at a reasonable 10GB VRAM requirement.

All Llama models are released under the Llama Community License, which permits commercial use with some restrictions for very large deployments. For most users and organizations, the license places no practical limitations on how you use the models.

Alibaba Qwen Family

Qwen3 from Alibaba Cloud has become the fastest-growing model family on Ollama, driven by exceptional coding performance and strong multilingual support. The family spans sizes from 0.6B to 235B parameters, with the MoE variant at 30B active parameters out of 235B total offering the best quality per VRAM ratio for demanding tasks.

Qwen3 models support a thinking mode that reveals the model's internal reasoning process, similar to chain-of-thought prompting but built into the model architecture. This makes them particularly effective for complex reasoning, mathematical problem solving, and code generation where seeing the model's approach helps evaluate output quality.

The Qwen2.5 generation remains available and competitive, with Qwen2.5-Coder providing strong code-specific performance at sizes from 1.5B to 32B. For multilingual applications, Qwen models support Chinese, English, Japanese, Korean, and many other languages with native-level quality, reflecting Alibaba's diverse training data.

DeepSeek Family

DeepSeek models are known primarily for their reasoning capabilities. DeepSeek-R1 is a family of reasoning models that produce explicit chain-of-thought outputs, working through problems step by step before providing final answers. Available in sizes from 1.5B to 671B parameters, with the 7B, 14B, and 32B variants being the most commonly used on consumer hardware.

DeepSeek-V3 is the company's latest general-purpose model, harmonizing computational efficiency with strong reasoning and agent performance. Its MoE architecture provides frontier-level quality while keeping active parameter counts manageable for local deployment.

DeepSeek models are released under permissive open source licenses and have gained a strong reputation in the AI research community for their transparent training methodology and competitive benchmark performance. The R1 models in particular have become a standard recommendation for anyone needing strong reasoning capability from a local model.

Google Gemma Family

Google's Gemma models bring Google's research capabilities to the open source community. Gemma 4 at 9B parameters supports vision and text input with native tool calling, making it the most versatile small model available. It handles image analysis, text generation, and function calling in a single model, which is particularly valuable for building multimodal agent systems.

Earlier Gemma versions remain available, with Gemma 2 at 9B and 27B offering strong text-only performance. CodeGemma provides code-specific variants with fill-in-the-middle capability, making it useful for IDE integration and code completion scenarios. RecurrentGemma offers an alternative architecture based on recurrent neural networks rather than pure transformers, providing unique efficiency characteristics for certain deployment scenarios.

Mistral AI Family

Mistral AI from France produces some of the most efficient models available. Mistral 7B delivers remarkably strong performance for its size, consistently outperforming other 7B models on quality benchmarks. Mixtral 8x7B uses a mixture-of-experts design with 47B total parameters but only 13B active, providing quality well above its active parameter count at reasonable memory costs.

Mistral Large and Mistral Medium fill the higher-end slots in the Mistral lineup. Codestral is Mistral's code-focused variant, providing strong coding performance with particular emphasis on speed and accuracy for code completion tasks. Mistral models are popular among developers who need the best possible quality from smaller models that fit on modest hardware.

Microsoft Phi Family

Microsoft's Phi models are designed to maximize quality at small parameter counts. Phi-4 at 14B parameters delivers quality that competes with models twice its size on many benchmarks, thanks to Microsoft's focus on high-quality training data and efficient architecture design. Phi-4 Mini at 3.8B is one of the best options for extremely constrained hardware, running comfortably on machines with as little as 4GB of available memory.

The Phi family is particularly popular for edge deployment, mobile applications, and scenarios where you need the best possible quality from minimal hardware. Its small size also makes it one of the fastest models for time-sensitive applications where generation speed matters more than peak quality.

Specialized Model Categories

Beyond the major families, the Ollama library includes specialized models for specific tasks. Embedding models like nomic-embed-text and mxbai-embed-large generate vector representations for semantic search and RAG pipelines. These models are small, fast, and essential for any application that involves document retrieval or similarity matching.

Vision and multimodal models support image input alongside text. In addition to Gemma 4 and Llama 3.2 Vision, models like LLaVA provide image understanding at various parameter scales. These models can describe images, answer questions about visual content, and extract information from screenshots, diagrams, and photographs.

Medical, legal, and domain-specific fine-tuned models appear in the library as well, though their availability varies. These models have been further trained on specialized datasets to improve performance in specific professional domains, trading general breadth for domain depth.

Key Takeaway

The Ollama library covers every major open source model family with pre-quantized, ready-to-run variants. Browse at ollama.com/library, use tags to select specific size and quantization combinations, and run ollama list to manage your locally installed models.