vLLM

Track inference requests, batches, latency, tokens, and serving behavior.

Built by vLLM

Overview

The vLLM integration gives Matcha visibility into inference workload behavior: request timing, batching, latency, and token flow. Matcha uses this context to estimate energy by request group, model, tenant, and serving workload.
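As a rough illustration of what estimating energy by model and tenant can look like, the sketch below splits one batch's measured energy across its requests in proportion to token counts, then sums per (model, tenant) group. This page does not describe Matcha's actual attribution method. The token-share rule, the `Request` fields, and the `energy_by_group` helper are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record shape; not a vLLM or Matcha type.
    request_id: str
    model: str
    tenant: str
    tokens: int  # prompt tokens + generated tokens


def energy_by_group(requests, batch_energy_joules):
    """Split a batch's energy by token share and sum per (model, tenant)."""
    total_tokens = sum(r.tokens for r in requests)
    totals = defaultdict(float)
    if total_tokens == 0:
        # Nothing to attribute against; return an empty mapping.
        return dict(totals)
    for r in requests:
        share = r.tokens / total_tokens
        totals[(r.model, r.tenant)] += share * batch_energy_joules
    return dict(totals)


# Illustrative values only.
batch = [
    Request("req-1", "model-a", "team-a", 300),
    Request("req-2", "model-a", "team-b", 100),
    Request("req-3", "model-a", "team-a", 100),
]
result = energy_by_group(batch, batch_energy_joules=500.0)
```

Token share is only one possible weighting. Weighting by request duration or by prefill versus decode phases would give different splits, so the choice should match how the serving workload actually uses the GPU.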

Configuration Steps

Enable request and batch logging in vLLM.
Connect logs or traces to Matcha.
Map model names, tenants, and request IDs.
Sync inference windows with GPU telemetry.
Review energy by model, request, and batch.
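The step of syncing inference windows with GPU telemetry can be sketched as a small integration problem: given timestamped power samples and a window's start and end times, integrate power over the window to get energy. The code below is a minimal sketch under assumptions, not Matcha's implementation. The sample format (seconds, watts), the `window_energy_joules` helper, and the use of linear interpolation with the trapezoidal rule are all hypothetical choices.

```python
def window_energy_joules(samples, start, end):
    """Integrate GPU power over [start, end] with the trapezoidal rule.

    samples: list of (timestamp_seconds, watts), sorted by timestamp.
    Power at the window edges is linearly interpolated between samples
    and clamped to the first/last sample outside the sampled range.
    """
    if not samples or end <= start:
        return 0.0

    def power_at(t):
        if t <= samples[0][0]:
            return samples[0][1]
        if t >= samples[-1][0]:
            return samples[-1][1]
        for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
            if t0 <= t <= t1:
                if t1 == t0:
                    return p0
                return p0 + (p1 - p0) * (t - t0) / (t1 - t0)

    # Window edges plus every sample strictly inside the window.
    points = [(start, power_at(start))]
    points += [(t, p) for t, p in samples if start < t < end]
    points.append((end, power_at(end)))

    energy = 0.0
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)  # watts * seconds = joules
    return energy


# Illustrative telemetry: one sample per second.
gpu_samples = [(0.0, 100.0), (1.0, 200.0), (2.0, 200.0), (3.0, 100.0)]
energy = window_energy_joules(gpu_samples, start=0.5, end=2.5)
```

This only works if the inference timestamps and the telemetry timestamps share a clock. In practice, clock offset between the serving host and the telemetry source would need to be corrected before integrating.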

Explore more apps in Workload Traces

PyTorch

Attach training runs, steps, duration, model metadata, and experiment context.

OpenTelemetry

Use traces, spans, and service events to connect workloads with infrastructure signals.

Hugging Face

Connect model, fine-tuning, and experiment metadata to energy usage.

Matcha by Keeya Labs

Energy Observability for AI workloads
