Integration
Connect Matcha to the infrastructure signals you already use.
Bring together GPU telemetry, workload traces, cluster metadata, and cost data without replacing your existing observability stack.
All
GPU Telemetry
Workload Traces
Cluster Metadata
Observability
Cost & Export

NVIDIA DCGM
Collect GPU power, utilization, memory, temperature, and health metrics.

Kubernetes
Map pods, jobs, namespaces, nodes, and scheduling context to GPU energy.

Slurm
Connect training jobs, allocations, users, and cluster scheduling metadata.

PyTorch
Attach training runs, steps, duration, model metadata, and experiment context.

vLLM
Track inference requests, batches, latency, tokens, and serving behavior.

OpenTelemetry
Use traces, spans, and service events to connect workloads with infrastructure signals.

Prometheus
Scrape, store, and export attributed energy metrics.

Grafana
Visualize energy, cost, utilization, and workload attribution dashboards.

Datadog
Send energy insights into existing infrastructure monitoring and logs.

Hugging Face
Connect model, fine-tuning, and experiment metadata to energy usage.

Amazon S3
Store telemetry, traces, and reports for downstream workflows.

NVML
Access low-level NVIDIA GPU telemetry for per-device energy monitoring.

Bring workload-level energy visibility to your AI infrastructure.
We’re working with early AI infrastructure teams, GPU operators, and enterprises running training or inference workloads.
