Live Model Testing

This guide describes how to verify WebTest AI's model router, session workflows, discovery, and guarded maintenance against real model providers.

Unit tests use fake transports so they are deterministic and safe. Live model testing is a separate smoke layer for provider compatibility and model behavior.

Provider Options

Ollama
Local model runner. WebTest AI talks to it through the ollama adapter at http://127.0.0.1:11434. This is the best first path for open-source and local-first testing.
Moonshot API
Official hosted Kimi API from Moonshot AI. Use this when you want full hosted Kimi performance without local hardware. Test it through an OpenAI-compatible profile or a future dedicated Moonshot profile, depending on endpoint support.
OpenRouter
Hosted gateway for many model families. Use this for quick cross-model comparison with one API key. WebTest AI should use the openrouter-compatible adapter profile.

Start with these Chinese/open-source reasoning and agentic models:

| Model | Provider path | Why test it |
| --- | --- | --- |
| qwen3:8b | Ollama | Strong open Chinese reasoning and agentic baseline. |
| deepseek-r1:8b | Ollama | Reasoning-focused Chinese open model. |
| kimi-k2:1t-cloud | Ollama cloud model | Agentic/coding-oriented Kimi K2 compatibility check. |
| kimi-k2-thinking:cloud | Ollama cloud model | Thinking/agentic Kimi path; most relevant for discovery/maintenance sessions. |
| glm4 | Ollama | Chinese multilingual/general reasoning compatibility check. |
| yi:9b | Ollama | Chinese/English bilingual model compatibility check. |

Useful model references:

Ollama Setup

Install and start Ollama, then pull the models you want to test:

```shell
ollama serve
ollama pull qwen3:8b
ollama pull deepseek-r1:8b
ollama pull kimi-k2:1t-cloud
ollama pull kimi-k2-thinking:cloud
ollama pull glm4
ollama pull yi:9b
```

Large/cloud-tagged models may require network access, an Ollama account, or provider-side availability. If a pull fails, skip that row and record the provider error.

Local Config Template

For the recommended first local smoke test, use the checked-in Qwen3 config at examples/config/ollama-qwen3.config.json.

The ollama adapter disables model thinking for structured JSON calls when a profile declares "reasoning": true. This keeps Qwen3-style reasoning traces out of machine-readable responses while still documenting that the model is reasoning-capable.

To compare other models, copy examples/config/ollama-qwen3.config.json to a temporary local file and change only the profile model value.

Generic temporary config shape:

```json
{
  "models": {
    "activeProfile": "live-model",
    "profiles": {
      "live-model": {
        "provider": "ollama",
        "model": "qwen3:8b",
        "endpoint": "http://127.0.0.1:11434",
        "apiKeyEnv": null,
        "capabilities": {
          "structuredJson": true,
          "reasoning": true,
          "toolCalling": false,
          "streaming": false,
          "vision": false
        },
        "limits": {
          "timeoutMs": 240000,
          "retries": 0,
          "maxInputBytes": 120000,
          "maxOutputTokens": 1024,
          "maxSessionTurns": 3
        }
      }
    },
    "writePolicy": {
      "roots": ["specs", "artifacts", ".webtest-ai"],
      "extensions": [".md", ".json", ".js"]
    }
  }
}
```

For each model in the matrix, change only models.profiles.live-model.model.

Smoke 1: Router JSON Completion

This verifies the adapter can call the model and parse structured JSON.

```shell
node - <<'NODE'
const { complete } = require("./src/models/router");
const config = require("./examples/config/ollama-qwen3.config.json");

complete({
  config,
  purpose: "live.router_smoke",
  messages: [
    { role: "system", content: "Return strict JSON only." },
    { role: "user", content: "Return {\"ok\":true,\"steps\":[\"Open \\\"/\\\"\"]}." }
  ],
  modelCalls: []
}).then((result) => {
  console.log(JSON.stringify({
    success: result.success,
    status: result.status,
    provider: result.provider,
    model: result.model,
    output: result.output,
    warnings: result.warnings,
    error: result.error
  }, null, 2));
}).catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
NODE
```

Pass criteria:

- `success` is `true` and `output` is the parsed JSON object, not a raw string.
- `provider` and `model` match the active profile.
- `error` is empty and any `warnings` are benign.

Common failures:

- Empty output: the model produced nothing parseable before the timeout.
- Reasoning text before JSON: a thinking trace leaked into the response and broke parsing.
- Connection refused: `ollama serve` is not running at `http://127.0.0.1:11434`.

Smoke 2: Discovery Workflow

This verifies the session layer and webtest-ai discover command.

Start the demo site in another terminal:

```shell
npm run demo:site
```

Then run:

```shell
node ./src/cli/index.js discover \
  --url http://127.0.0.1:4010 \
  --config ./examples/config/ollama-qwen3.config.json \
  --dry-run
```

Pass criteria:

- The command completes without errors and reports either a proposed flow or an explicit no-op.
- Because of `--dry-run`, no spec or artifact files are written.

A no-op result means the integration worked but the model did not propose a useful flow.

Smoke 3: Guarded Maintenance

This verifies that model-proposed writes stay policy-gated. Use a safe target under artifacts first:

```shell
node - <<'NODE'
const path = require("path");
const { runMaintenanceWorkflow } = require("./src/autonomy/maintenance");
const config = require("./examples/config/ollama-qwen3.config.json");

runMaintenanceWorkflow({
  config,
  baseDir: process.cwd(),
  targetPaths: ["artifacts/live-maintenance/proposal.md"],
  apply: false,
  context: {
    instruction: "Propose a tiny markdown test note only. Do not include secrets."
  }
}).then((result) => {
  console.log(JSON.stringify({
    status: result.status,
    success: result.success,
    proposedWrites: result.proposedWrites.map((write) => ({
      path: write.path,
      allowed: write.allowed,
      reason: write.reason
    })),
    blockedWrites: result.blockedWrites,
    warnings: result.warnings
  }, null, 2));
}).catch((error) => {
  console.error(error);
  process.exitCode = 1;
});
NODE
```

Pass criteria:

- `status` and `success` report a completed proposal run.
- `proposedWrites` lists the `artifacts/live-maintenance/proposal.md` target with `allowed: true`.
- Any path outside the write policy appears in `blockedWrites` instead of being written.
- No file is created, because `apply` is `false`.

Do not use apply: true until proposal behavior is reviewed.

Recording Results

Track each model with a row covering the three smokes and any notes:

- Router: Smoke 1 outcome (for example ok, empty, or error).
- Discovery: Smoke 2 outcome.
- Maintenance: Smoke 3 outcome.
- Notes: parsing quirks, retries, and provider errors.

Example:

| Model | Router | Discovery | Maintenance | Notes |
| --- | --- | --- | --- | --- |
| qwen3:8b | ok | proposal, 2 flows | proposal, 1 allowed write | Good JSON after one retry. |
| deepseek-r1:8b | empty | not run | not run | Returned reasoning text before JSON. |

Safety Notes