Set up AI Evals & Logs with Maxim

Updated by Santiago Cardona

Turn.io integrates with Maxim to give you real-time monitoring and evaluation of your AI-powered journeys. Once connected, every AI interaction in your journeys is automatically traced and sent to Maxim, where you can review logs, run evaluations, and improve your AI agents over time.

What are AI Logs?

AI Logs capture every interaction between your users and the AI blocks in your Journeys. Each log records what the user said, how the AI responded, and the underlying details like which model was used and how long it took. This gives you full visibility into how your AI agents are performing in production.

In Maxim, logs are organized as:

  • Traces — Individual AI interactions (a single question and response).
  • Sessions — Full multi-turn conversations that group related traces together.
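As a rough illustration, a single trace might be thought of as a record like the following (the field names here are hypothetical, not Maxim's exact schema):

```json
{
  "trace_id": "tr_abc123",
  "session_id": "sess_xyz789",
  "input": "What are your opening hours?",
  "output": "We're open Monday to Friday, 9am to 5pm.",
  "model": "gpt-4o",
  "latency_ms": 840
}
```

Traces that share the same session identifier are grouped into one multi-turn conversation in Maxim.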

What are AI Evaluations?

Evaluations let you automatically assess the quality of your AI responses. Instead of manually reading through every conversation, you can set up evaluators that score responses for things like bias, clarity, relevance, tone, or safety.

Maxim supports automated evaluations that run continuously on your logs, so you can catch quality issues early and track improvements over time. You can learn more about setting up evaluations in Maxim's evaluation guide.

Prerequisites

Before you begin, you'll need:

  1. A Maxim account. Sign up if you don't have one yet.
  2. Your Maxim API Key and Repository ID — both found in your Maxim dashboard.

Connect Maxim to Turn.io

Step 1: Get your Repository ID

You can find your Repository ID in your Maxim dashboard under Logs. It looks something like cmkys49u1027tnsmqkw22tjqg.

Step 2: Get your API Key

You can find your API key in your Maxim dashboard under Settings.

Step 3: Open the Maxim integration

There are two ways to access the integration setup:

  • Navigate to Settings → AI in the sidebar, then click Evals.
  • Or, from AI Evals & Logs under Journeys in the sidebar, click the Set up button.

Enter both your Repository ID and your API key. Turn.io will automatically validate them against Maxim's API.

Step 4: Save

Once both fields are validated, click Save. Your AI traces will now be exported to Maxim automatically.

Edit or remove the integration

To edit your API key, open the Maxim integration modal and click the Edit button next to the masked key. Enter your new key and save.

To remove the integration entirely, open the modal and click Delete. This will stop sending AI traces to Maxim.

What happens next?

Once the integration is active, every AI interaction in your journeys — text generation, classification, AI agent conversations — is automatically logged to your Maxim repository.

From there, you can:

  • Browse logs to see exactly what your AI agents are saying to users.
  • Set up evaluations to automatically score response quality using LLM-as-a-Judge or human review.
  • Run simulations to test and iterate on your prompts before deploying changes.

For more on getting the most out of Maxim, visit the Maxim documentation.

Set up Evaluations

Once your logs are flowing into Maxim, you can set up auto evaluations — automated quality checks that continuously score your AI responses as they come in. This means you don't have to manually review every conversation to spot issues.

You can add and configure different types of evaluators in the Evaluators section. Choose the ones you consider relevant for your use case.

Types of evaluators

Maxim offers three categories of evaluators:

  • AI evaluators (LLM-as-a-Judge) use a language model to assess your AI's responses. These are the most flexible and are great for nuanced quality checks:
    • Faithfulness — Is the response grounded in the provided context? Catches hallucinations.
    • Output Relevance — Does the response actually answer what the user asked?
    • Toxicity — Does the response contain harmful, offensive, or inappropriate content?
    • Clarity — Is the response easy to understand?
    • Conciseness — Is the response appropriately brief without losing important information?
    • Task Success — Did the AI accomplish the intended goal?
    • PII Detection — Does the response inadvertently expose personal information?
    • Bias — Does the response show unfair bias?
  • Programmatic evaluators use rule-based logic for deterministic checks:
    • Format validators (valid JSON, valid URL, valid email, etc.)
    • Pattern matching and content analysis (word counts, special characters)
  • Statistical evaluators compare outputs against expected results using metrics like cosine similarity, BLEU, and ROUGE scores. These are useful if you have reference answers to compare against.
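To make the last two categories concrete, here is a plain-Python sketch of what a programmatic check and a statistical check compute. This is a generic illustration, not Maxim's implementation; Maxim runs these evaluators for you on your logs.

```python
import json
import math
from collections import Counter


def is_valid_json(text: str) -> bool:
    """Programmatic evaluator: does the response parse as JSON?"""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


def cosine_similarity(reference: str, response: str) -> float:
    """Statistical evaluator: bag-of-words cosine similarity between
    a reference answer and the model's response, from 0.0 to 1.0."""
    ref = Counter(reference.lower().split())
    res = Counter(response.lower().split())
    dot = sum(ref[w] * res[w] for w in set(ref) & set(res))
    norm = (math.sqrt(sum(v * v for v in ref.values()))
            * math.sqrt(sum(v * v for v in res.values())))
    return dot / norm if norm else 0.0


print(is_valid_json('{"status": "ok"}'))  # True
print(is_valid_json('not json'))          # False
print(cosine_similarity("open monday to friday",
                        "we are open monday to friday"))
```

Programmatic checks are pass/fail and fully deterministic; statistical checks produce a score you can threshold, but they require a reference answer to compare against.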

How evaluations work

Evaluations run automatically on your logs based on rules you define. Each evaluator scores a specific quality dimension of your AI responses. You can combine multiple evaluators to get a comprehensive quality picture.

Maxim supports evaluations at different levels of granularity:

  • Trace-level — Evaluate individual AI responses. Best for checking if a single answer was accurate, relevant, or safe. This is the most common starting point.
  • Session-level — Evaluate entire multi-turn conversations. Useful for assessing overall conversation quality, coherence, and whether the user's goal was met.

Step-by-step: Set up auto evaluation on logs

  1. Navigate to the Logs section in Maxim and open your Repository.
  2. Click Manage evaluation in the top right corner.
  3. Click Add configuration and choose the evaluation level — Trace is the best starting point for most use cases.
  4. Select the evaluators you want to use.
  5. Map your variables — connect evaluator inputs to your log data. For example, map trace.output to the evaluator's "response" input so it knows what to score. For session-level evaluations, use trace[*].output to reference outputs across the full conversation.
  6. (Optional) Add filter rules to narrow which logs get evaluated. You can filter by model type, error status, tags, latency, token usage, and more. Combine multiple conditions with AND/OR logic.
  7. (Optional) Set a sampling rate to control costs. For example, evaluate 20% of traces instead of all of them — useful for high-traffic journeys.
  8. Click Save. New logs will be evaluated automatically as they arrive.
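Conceptually, steps 3–7 combine into a single evaluation configuration. The JSON below is an illustrative sketch of those choices, not Maxim's actual configuration format:

```json
{
  "level": "trace",
  "evaluators": ["faithfulness", "output-relevance", "toxicity"],
  "variable_mapping": {
    "input": "trace.input",
    "response": "trace.output"
  },
  "filters": {
    "logic": "AND",
    "rules": [
      { "field": "error", "operator": "equals", "value": false }
    ]
  },
  "sampling_rate": 0.2
}
```

Reading it top to bottom: evaluate individual traces, run three AI evaluators on each, feed each evaluator the trace's input and output, skip error traces, and only evaluate 20% of matching logs.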

Reviewing evaluation results

Open any trace in your Maxim logs and click the Evaluation tab. You'll see scores from each evaluator, along with explanations for why the score was given (for AI evaluators).

Tips for effective evaluations

Start small, then expand. Begin with 2–3 evaluators that address your biggest concerns — typically Faithfulness, Output Relevance, and Toxicity. Add more once you're comfortable reading the results.

Use sampling for high-volume journeys. If your journey handles thousands of conversations daily, evaluating every single trace gets expensive. A 10–20% sample rate still gives you strong statistical confidence while keeping costs manageable.
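To see why a sample still gives reliable numbers, consider the standard normal-approximation margin of error for an observed pass rate. This is textbook statistics, not anything Maxim-specific:

```python
import math


def margin_of_error(pass_rate: float, sample_size: int, z: float = 1.96) -> float:
    """95% confidence margin of error for an observed pass rate."""
    return z * math.sqrt(pass_rate * (1 - pass_rate) / sample_size)


# Example: 5,000 daily traces sampled at 20% -> 1,000 evaluated traces.
# With an observed 90% pass rate, the estimate is within about +/-1.9%.
print(round(margin_of_error(0.9, 1000), 3))  # 0.019
```

In other words, evaluating 1,000 of 5,000 traces pins the true pass rate down to within roughly two percentage points, at a fifth of the evaluation cost.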

Combine AI and programmatic evaluators. AI evaluators are great for subjective quality, but programmatic evaluators give you deterministic guarantees. For example, if your AI agent returns JSON, add an isValidJSON evaluator alongside your quality checks.

Filter out noise. Use filter rules to skip evaluating error traces or traces from test users. This keeps your quality metrics clean and focused on real user interactions.

Set up alerts. Once your evaluations are running, configure alerts in Maxim to notify you when quality scores drop below a threshold. This turns evaluations from a passive dashboard into an active monitoring system.

For the full reference on evaluation configuration, see Maxim's auto evaluation guide.
