vLLM

Track inference requests, batches, latency, tokens, and serving behavior.

Overview

The vLLM integration gives Matcha visibility into inference workload behavior: request timing, batching, latency, and token flow. Matcha uses this context to estimate energy use by request group, model, tenant, and serving workload.
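This page does not describe how Matcha attributes energy internally. As a rough illustration only, one plausible approach is to split a batch's measured energy across its requests in proportion to token count, then roll the result up by model or tenant. The `RequestRecord` fields, function names, and sample values below are hypothetical and are not part of vLLM or Matcha.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    # Hypothetical fields for illustration; not a vLLM or Matcha schema.
    request_id: str
    model: str
    tenant: str
    tokens: int  # tokens processed for this request within the batch


def apportion_energy(batch_energy_j, requests):
    """Split a batch's energy (joules) across requests by token share."""
    total_tokens = sum(r.tokens for r in requests)
    if total_tokens == 0:
        return {r.request_id: 0.0 for r in requests}
    return {
        r.request_id: batch_energy_j * r.tokens / total_tokens
        for r in requests
    }


def rollup(per_request_j, requests, key):
    """Sum per-request energy by an attribute such as 'model' or 'tenant'."""
    totals = defaultdict(float)
    for r in requests:
        totals[getattr(r, key)] += per_request_j[r.request_id]
    return dict(totals)


# Made-up example: one batch that consumed 120 J in total.
batch = [
    RequestRecord("req-1", "model-a", "tenant-a", 300),
    RequestRecord("req-2", "model-a", "tenant-b", 100),
]
per_request = apportion_energy(120.0, batch)
by_tenant = rollup(per_request, batch, "tenant")
by_model = rollup(per_request, batch, "model")
```

Token-proportional splitting is a simplifying assumption: it ignores differences between prefill and decode cost, so treat it as a starting point rather than a faithful model of GPU work.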

Configuration Steps

  1. Enable request and batch logging in vLLM.

  2. Connect logs or traces to Matcha.

  3. Map model names, tenants, and request IDs so Matcha can attribute energy to them.

  4. Sync inference windows with GPU telemetry.

  5. Review energy by model, request, and batch.
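To make step 4 concrete, the sketch below shows one way an inference window could be lined up with GPU power samples: integrate power over the window with the trapezoidal rule to get energy in joules. It assumes the window timestamps and the telemetry samples share the same clock, and that samples arrive as sorted `(time_s, watts)` pairs. The function names and sample data are hypothetical and do not reflect Matcha's actual implementation.

```python
from bisect import bisect_left


def interpolate(samples, t):
    """Linearly interpolate power (W) at time t from sorted (time_s, watts) samples."""
    times = [s[0] for s in samples]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return samples[i][1]
    if i == 0 or i == len(times):
        raise ValueError("window extends beyond telemetry coverage")
    (t0, p0), (t1, p1) = samples[i - 1], samples[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


def window_energy_j(samples, start_s, end_s):
    """Trapezoidal integral of power over [start_s, end_s], in joules."""
    if end_s <= start_s:
        return 0.0
    # Keep samples strictly inside the window, then add interpolated endpoints.
    inner = [(t, p) for t, p in samples if start_s < t < end_s]
    points = (
        [(start_s, interpolate(samples, start_s))]
        + inner
        + [(end_s, interpolate(samples, end_s))]
    )
    return sum(
        (t1 - t0) * (p0 + p1) / 2
        for (t0, p0), (t1, p1) in zip(points, points[1:])
    )


# Made-up telemetry: one power reading per second.
gpu_power = [(0.0, 100.0), (1.0, 200.0), (2.0, 200.0), (3.0, 100.0)]
energy = window_energy_j(gpu_power, 0.5, 2.5)  # batch ran from 0.5 s to 2.5 s
```

Interpolating at the window edges matters when batches are short relative to the telemetry sampling interval; without it, a batch that falls between two samples would be assigned zero energy.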