Guides

Model Management

Pull, serve, route, and govern local or hosted models.

Prompt pack
M-07
Source material
  • maple/docs/core/llama-first.md

MAPLE treats models like governed runtime dependencies. The user-facing mental model is intentionally close to Ollama, but the operating model is broader: local models, hosted APIs, routing policy, and cost-aware enforcement all sit behind the same control surface.

Core commands

maple model pull llama3.2:8b-q4
maple model ls
maple model run llama3.2:8b-q4
maple model serve --openai-compatible
maple model inspect llama3.2:8b-q4
maple model rm llama3.2:8b-q4

Local and cloud backends

  • Local: Ollama, llama.cpp, private inference services
  • Hosted: OpenAI, Anthropic, Gemini, Grok, or other provider adapters

The point of the abstraction is policy control. Agent packages ask for a capability class or a preferred model profile; the platform decides which backend is permitted in the active environment.

MapleModelfile

FROM llama3.2:8b-q4
PARAMETER temperature 0.2
PARAMETER top_p 0.9
SYSTEM "You are the governed support planner for MAPLE."

Use this when you need a reusable, versioned local model profile instead of one-off CLI arguments.
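To make the profile concrete, here is a small loader that turns a MapleModelfile into a runtime profile dict. The directive grammar is inferred from the example above, not from a published spec, so treat the parsing rules as assumptions.

```python
def parse_modelfile(text: str) -> dict:
    """Parse MapleModelfile directives (FROM / PARAMETER / SYSTEM) into a dict.

    Grammar is an assumption inferred from the documented example.
    """
    profile: dict = {"parameters": {}}
    for line in text.strip().splitlines():
        directive, _, rest = line.partition(" ")
        if directive == "FROM":
            profile["base"] = rest
        elif directive == "PARAMETER":
            name, _, value = rest.partition(" ")
            profile["parameters"][name] = float(value)
        elif directive == "SYSTEM":
            profile["system"] = rest.strip('"')
    return profile

example = """FROM llama3.2:8b-q4
PARAMETER temperature 0.2
PARAMETER top_p 0.9
SYSTEM "You are the governed support planner for MAPLE."
"""
profile = parse_modelfile(example)
```

Because the profile is a plain file, it can be versioned and reviewed like any other dependency manifest.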

Routing policy

routes:
  - match:
      dataClass: regulated
    use: ollama:llama3.2:8b-q4

  - match:
      taskClass: deep-reasoning
    use: anthropic:claude-sonnet

MAPLE routing is where model neutrality becomes operational rather than aspirational: the same agent request can land on a local or hosted backend depending entirely on the active policy, with no change to agent code.

Governance controls

  • Backend allowlists by tenant
  • Jurisdiction-aware routing
  • Cost ceilings per workload
  • Approval paths for model swaps
  • Benchmark gating before promotion
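One of these controls, cost ceilings per workload, reduces to a simple admission check at request time. The ledger shape and dollar figures below are illustrative assumptions, not MAPLE's billing model.

```python
# Hypothetical per-workload spend ledger with a hard ceiling.
class WorkloadBudget:
    def __init__(self, ceiling_usd: float):
        self.ceiling_usd = ceiling_usd
        self.spent_usd = 0.0

    def admit(self, estimated_cost_usd: float) -> bool:
        """Admit a request only if it keeps the workload under its ceiling."""
        if self.spent_usd + estimated_cost_usd > self.ceiling_usd:
            return False  # governance denial; caller can route cheaper or queue
        self.spent_usd += estimated_cost_usd
        return True

budget = WorkloadBudget(ceiling_usd=10.00)
```

A denial here is a policy outcome like any other: the router can respond by downgrading to a cheaper backend rather than failing the request outright.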

Benchmarking and improvement

maple model benchmark llama3.2:8b-q4 --suite support-routing

Foundry can then use the benchmark output to decide whether a student model is ready to take traffic or should remain in shadow mode.
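The promote-or-shadow decision can be sketched as thresholding the suite's output. The metric names and cutoff values are assumptions for illustration; the actual benchmark schema is whatever `maple model benchmark` emits.

```python
# Illustrative promotion gate over benchmark suite results.
# Metric names and thresholds are assumptions, not MAPLE's schema.
THRESHOLDS = {"accuracy_min": 0.90, "p95_latency_ms_max": 800}

def promotion_decision(results: dict) -> str:
    """Return 'promote' if every gate passes, else keep the model in 'shadow'."""
    passes = (
        results.get("accuracy", 0.0) >= THRESHOLDS["accuracy_min"]
        and results.get("p95_latency_ms", float("inf")) <= THRESHOLDS["p95_latency_ms_max"]
    )
    return "promote" if passes else "shadow"
```

Keeping the gate declarative means the promotion criteria themselves can sit under the same review and approval paths as any other governance change.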