AgentInfra Radar
Free directory

LLM observability and evaluation tools

A public directory of observability, tracing, monitoring, and evaluation tools for LLM applications and agentic systems.

Public project examples
Langfuse
Open-source LLM observability
Public-source checked
Why it matters

Combines traces, prompt management, evals, and debugging for LLM apps, making it a common reference point when comparing AI observability tools.

Helicone
LLM observability and gateway
Public-source checked
Why it matters

Pairs request logging and observability with gateway functions, useful for teams watching the overlap between monitoring and model access.

Arize Phoenix
Open-source AI observability
Public-source checked
Why it matters

Provides tracing, evals, datasets, and experiments for LLM applications, making it relevant for teams moving from prototype to production.

OpenLLMetry
OpenTelemetry for LLM apps
Public-source checked
Why it matters

Extends OpenTelemetry patterns to LLM applications, useful for teams that want AI traces to connect with existing observability stacks.

Braintrust
AI evals and observability
Public-source checked
Why it matters

Focuses on tracing production behavior, running evals, and catching regressions, which is central to quality-focused AI teams.

Comet Opik
Open-source AI observability
Public-source checked
Why it matters

Covers tracing, evaluation, and prompt and agent optimization, making it useful for LLM apps, RAG systems, and agentic workflows.

W&B Weave
LLM observability and evals
Public-source checked
Why it matters

Brings W&B-style tracking, debugging, and evaluation into generative AI workflows, useful for teams that already work with MLOps tooling.

Literal AI
LLM observability and monitoring
Needs verification for current status
Why it matters

Appears in LLM observability integrations and production-grade LLM monitoring workflows. Its current product status has not been confirmed, so check the public source before relying on this entry.

Research boundary

Public-source research, not a private contact database

This directory is based on public websites, documentation, GitHub repositories, and public product pages. It is meant for research and market mapping, not automated outreach.

Manual verification

Manually verify before outreach

Categories and URLs can change. Before using any project for sales, partnership, investment, or security research, review the current public source and confirm the contact angle is accurate.

Next step

Want the lead version of this directory?

The free directory is only a starting point. AgentInfra Radar can turn a category into a reviewed lead pack with possibleNeed, leadScore, contactAngle, source, and verificationStatus.
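The lead-pack fields named above could be modeled as a simple record. This is a minimal sketch, not AgentInfra Radar's actual schema: the field types, the 0 to 100 score range, the `project` field, the `VerificationStatus` values (borrowed from the status labels used in this directory), and the `needs_manual_check` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class VerificationStatus(Enum):
    # Assumed values: they mirror the status labels shown in the directory,
    # but the source does not say what verificationStatus may contain.
    PUBLIC_SOURCE_CHECKED = "public-source checked"
    NEEDS_VERIFICATION = "needs verification"


@dataclass(frozen=True)
class Lead:
    project: str            # hypothetical field: name of the directory entry
    possibleNeed: str       # field names follow the spelling used in the text
    leadScore: int          # assumed to be an integer from 0 to 100
    contactAngle: str
    source: str             # public page the entry is based on
    verificationStatus: VerificationStatus

    def __post_init__(self) -> None:
        # Reject scores outside the assumed range.
        if not 0 <= self.leadScore <= 100:
            raise ValueError("leadScore must be between 0 and 100")


def needs_manual_check(lead: Lead) -> bool:
    """Return True when the entry is not marked as public-source checked."""
    return lead.verificationStatus is not VerificationStatus.PUBLIC_SOURCE_CHECKED


# Placeholder values only; this does not describe a real project.
example = Lead(
    project="Example Project",
    possibleNeed="tracing for agent workflows",
    leadScore=70,
    contactAngle="public documentation mentions evals",
    source="https://example.com",
    verificationStatus=VerificationStatus.NEEDS_VERIFICATION,
)
print(needs_manual_check(example))
```

Keeping the verification status as an explicit field makes it straightforward to filter out entries that still need a manual review before outreach.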