Integration

Connect Matcha to the infrastructure signals you already use.

Bring together GPU telemetry, workload traces, cluster metadata, and cost data without replacing your existing observability stack.

Categories: GPU Telemetry, Workload Traces, Cluster Metadata, Observability, Cost & Export

NVIDIA DCGM

Collect GPU power, utilization, memory, temperature, and health metrics.

Kubernetes

Map pods, jobs, namespaces, nodes, and scheduling context to GPU energy.

Slurm

Connect training jobs, allocations, users, and cluster scheduling metadata.

PyTorch

Attach training runs, steps, duration, model metadata, and experiment context.

vLLM

Track inference requests, batches, latency, tokens, and serving behavior.

OpenTelemetry

Use traces, spans, and service events to connect workloads with infrastructure signals.

Prometheus

Scrape, store, and export attributed energy metrics.

Grafana

Visualize energy, cost, utilization, and workload attribution in dashboards.

Datadog

Send energy insights into existing infrastructure monitoring and logs.

Hugging Face

Connect model, fine-tuning, and experiment metadata to energy usage.

Amazon S3

Store telemetry, traces, and reports for downstream workflows.

NVML

Access low-level NVIDIA GPU telemetry for per-device energy monitoring.

Bring workload-level energy visibility to your AI infrastructure.

We’re working with early AI infrastructure teams, GPU operators, and enterprises running training or inference workloads.

Talk to us
