Evals & Logs
By Santiago Cardona
Turn.io integrates with Maxim to give you real-time monitoring and evaluation of your AI-powered journeys. Once connected, every AI interaction in your journeys is automatically traced and sent to Maxim, where you can review logs, run evaluations, and improve your AI agents over time.
What are AI Logs?
AI Logs capture every interaction between your users and the AI blocks in your Journeys. Each log records what the user said, how the AI responded, and the underlying details like which model was used and how long it took. This gives you full visibility into how your AI agents are performing in production.

In Maxim, logs are organized as:
- Traces — Individual AI interactions (a single question and response).
- Sessions — Full multi-turn conversations that group related traces together.
What are AI Evaluations?
Evaluations let you automatically assess the quality of your AI responses. Instead of manually reading through every conversation, you can set up evaluators that score responses for things like bias, clarity, relevance, tone, or safety.

Maxim supports automated evaluations that run continuously on your logs, so you can catch quality issues early and track improvements over time. You can learn more about setting up evaluations in Maxim's evaluation guide.
Prerequisites
Before you begin, you'll need:
- A Maxim account. Sign up if you don't have one yet.
- Your Maxim API Key and Repository ID — both found in your Maxim dashboard.
Connect Maxim to Turn.io
Step 1: Sign up for Maxim
First, sign up for Maxim using the button below:
Step 2: Create a repository
Repositories are where logs live. In the main navigation, click Logs:

Then click Create a new Repository. The default settings are fine.
Step 3: Get your Repository ID
On the repository you just created, open the (...) menu and click Copy ID. The ID looks something like this: cmkys49u1027tnsmqkw22tjqg.
Paste it somewhere safe; you'll need it later in this process.

Step 4: Get your API Key
You can find your API key in the Maxim dashboard under Settings.
Paste it somewhere safe; you'll need it in the next step.

Step 5: On Turn.io, Integrate with Maxim
Head over to Settings → AI in the sidebar, then click Evals at the top.

Now, paste the Repository ID and API key you copied earlier.
Step 6: Save
Once everything checks out, click Save.
And you're done! 🎉 From now on, Turn.io will automatically send AI logs to Maxim. If your journeys already use AI and are receiving traffic, logs should start appearing in Maxim within a few minutes.
Edit or remove the integration
To edit your API key, open the Maxim integration modal and click the Edit button next to the masked key. Enter your new key and save.
To remove the integration entirely, open the modal and click Delete. Turn.io will stop sending AI traces to Maxim.
What happens next?
Once the integration is active, every AI interaction in your journeys — text generation, classification, AI agent conversations — is automatically logged to your Maxim repository.
From there, you can:
- Browse logs to see exactly what your AI agents are saying to users.
- Set up evaluations to automatically score response quality using LLM-as-a-Judge or human review.
- Run simulations to test and iterate on your prompts before deploying changes.
For more on getting the most out of Maxim, visit the Maxim documentation.
Browse Logs
Once your Maxim integration is active, every AI interaction from your journeys is automatically exported as a log entry. Here's how to find, navigate, and make sense of your logs in Maxim.
Where to find your logs
- Open your Repository in the Maxim dashboard.
- Click the Logs tab. You'll see a table of all ingested AI interactions.

Each row in the table shows a summary of one trace:
| Column | What it shows |
| --- | --- |
| Timestamp | When the AI interaction happened |
| Name | The trace's name, taken from your journey block (e.g. "AI Agent" or "Text Generation") |
| Input | The user's message or prompt |
| Output | The AI's response |
| Model | Which AI model was used |
| Tokens | Total tokens consumed (input + output) |
| Cost | Estimated cost in USD for the interaction |
| Latency | How long the AI took to respond |
| Tags | Metadata from Turn.io (journey name, block name, etc.) |
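As a quick sanity check on the Tokens and Cost columns, here's the arithmetic behind a cost estimate. The per-token prices below are hypothetical placeholders, not Maxim's or any provider's actual rates:

```python
# Hypothetical per-million-token prices; real rates depend on the model
# and provider. These numbers are placeholders for illustration only.
PRICE_PER_M_INPUT = 3.00    # USD per 1M input tokens (made up)
PRICE_PER_M_OUTPUT = 15.00  # USD per 1M output tokens (made up)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one AI interaction from its token counts."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# A trace that consumed 1,200 input tokens and 300 output tokens:
print(f"${estimate_cost(1_200, 300):.4f}")  # -> $0.0081
```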
Understand the log hierarchy
Turn.io organizes AI logs using a hierarchy that maps to how conversations flow through your journeys:
- Session — A full conversation between a user and your AI agent. All interactions within the same journey session share a session ID.
- Trace — A single AI "run" within that session. For example, if your AI agent block processes one user message, that's one trace. A multi-turn conversation produces multiple traces grouped under the same session.
- Generation — The actual LLM call details: the messages sent to the model, the model's response, token counts, and parameters.
- Span — Individual steps within a trace, such as individual LLM calls or tool executions. These are the building blocks of each trace.
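As a mental model, the hierarchy nests like the sketch below. The field names are illustrative, not Maxim's exact schema:

```python
# Illustrative shape of the log hierarchy; field names are not Maxim's exact schema.
session = {
    "session_id": "abc123",            # one full user conversation
    "traces": [                        # one trace per AI "run" in the journey
        {
            "name": "AI Agent",        # named after the journey block
            "spans": [                 # individual steps within the trace
                {
                    "type": "generation",       # the actual LLM call
                    "input_messages": ["..."],  # system prompt, history, user message
                    "output": "...",            # the model's response
                    "tokens": {"input": 1200, "output": 300},
                },
            ],
        },
        # ...more traces as the conversation continues
    ],
}
```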

Drill down to the generation details
The most useful information — the exact prompts and responses — is a few clicks deep. Here's how to get there:
- In the Logs tab, click on any row to open the trace detail panel.
- You'll see a breakdown of all spans within the trace. Look for the root span, which is named after your journey block (e.g. "AI Agent" or "Text Generation").
- Click on the root span to expand it. You'll see child spans representing individual events like LLM calls.
- In the span detail view, you can see the generation data: the full input messages (system prompt, conversation history, user message), the model's output, token usage, and latency.

This is where you can verify exactly what your AI agent received and how it responded — invaluable for debugging unexpected behavior.
Sessions: following multi-turn conversations
In WhatsApp, users rarely interact with your AI agent just once — they ask a question, get a response, follow up, clarify, and continue the conversation. Sessions in Maxim capture these multi-turn interactions as a single coherent unit, making it easy to review the full arc of a conversation rather than piecing together isolated traces.

How Turn.io maps conversations to sessions
Every AI interaction in a Turn.io journey runs within a journey session. Turn.io automatically exports this session ID to Maxim as session_id, which means:
- All traces from the same journey run are grouped under the same session.
- A user's back-and-forth with an AI agent block — every user message and every AI response — ends up as separate traces within one session.
- When the user starts a new journey, a new session begins.
This mapping mirrors how conversations actually happen: one session = one continuous interaction between a user and your AI agent.
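To make the mapping concrete, here's a minimal sketch that groups trace records by their session_id, the way Maxim groups them into sessions. The records themselves are made up:

```python
from collections import defaultdict

# Made-up trace records; the only field taken from this guide is session_id,
# which Turn.io attaches to every trace it exports.
traces = [
    {"session_id": "s-1", "input": "What are your opening hours?", "output": "We open at 9am."},
    {"session_id": "s-1", "input": "And on weekends?", "output": "10am on Saturdays."},
    {"session_id": "s-2", "input": "How do I reset my password?", "output": "Tap 'Forgot password'."},
]

# Group traces into sessions: one session = one continuous conversation.
sessions: dict[str, list[dict]] = defaultdict(list)
for trace in traces:
    sessions[trace["session_id"]].append(trace)

for session_id, turns in sessions.items():
    print(session_id, "->", len(turns), "turn(s)")
# s-1 -> 2 turn(s)
# s-2 -> 1 turn(s)
```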
Why sessions matter
Individual traces only tell you about one AI response in isolation. Sessions give you the full conversational context, which is essential when:
- Debugging drift — An AI agent gradually loses track of what the user wants. You can only see this by looking at the full session, not a single trace.
- Evaluating coherence — Did the AI stay on topic and remember earlier user inputs? Session-level evaluations answer this.
- Measuring task success — Did the user actually accomplish their goal by the end of the conversation? That's a session-level question.
- Reviewing user experience — When a user reports a problem, you want to read the entire conversation, not just the moment things went wrong.
Finding and reviewing sessions
To follow a specific conversation:
- In the Logs tab, click on the Sessions button.
- All traces of the same journey session will be grouped together in chronological order. Click on any session you want to explore.
- Then, click through each trace to see how the conversation evolved — user input, AI response, user input, AI response — along with full generation details at every step.

Filter logs to find what you need
With high-traffic journeys, you'll want to narrow down your logs. Maxim supports filtering using the tags that Turn.io attaches to every trace:
| Tag | What it contains | Use it to... |
| --- | --- | --- |
| journey_name | The name of the journey that triggered the AI interaction | Filter all AI activity for a specific journey |
| journey_uuid | The unique ID of the journey. You can get this inside your Journey. | Precisely identify a journey (useful if names change) |
| journey_block_name | The name of the specific AI block within the journey | Compare performance across different AI blocks |
| session_id | The conversation session ID | Follow an entire user conversation across multiple traces |
| | The type of AI evaluation context | Filter by production journeys or simulations |
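Put together, the tags on a single trace might look like the sketch below. The keys come from the table above; the values are invented examples:

```python
# Example tags on a single trace. Keys come from the table above;
# the values are invented for illustration.
tags = {
    "journey_name": "Customer Support Bot",
    "journey_uuid": "0f3a9c...",          # stable even if the journey is renamed
    "journey_block_name": "Claims Processing Agent",
    "session_id": "s-1",                  # shared by every trace in one conversation
}
```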
To filter your logs:
- In the Logs tab, click the filter controls above the table.
- Select the tag you want to filter by (e.g. journey_name).
- Enter the value to match (e.g. "Customer Support Bot").
- You can combine multiple filters with AND/OR logic to narrow down further — for example, filter by journey_name and a specific date range, as in the sketch below.
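Conceptually, combined filters behave like the plain-Python sketch below. This illustrates the AND logic, not Maxim's actual query syntax:

```python
from datetime import date

def matches(trace: dict, journey_name: str, start: date, end: date) -> bool:
    """AND two conditions: a journey_name tag match and a date range."""
    return (
        trace["tags"]["journey_name"] == journey_name  # condition 1: tag match
        and start <= trace["date"] <= end              # condition 2: date range
    )

trace = {"tags": {"journey_name": "Customer Support Bot"}, "date": date(2025, 6, 10)}
print(matches(trace, "Customer Support Bot", date(2025, 6, 1), date(2025, 6, 30)))  # True
```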

Common scenarios
"I want to compare how two AI blocks perform" Filter by journey_block_name to isolate logs for each block. Compare latency, token usage, and costs across them.
"I want to see all AI activity for a specific journey" Filter by journey_name or journey_uuid. This shows every AI interaction triggered by that journey, across all users and sessions.
"I want to monitor costs" Sort the Logs table by the Cost column to identify expensive traces. Check which models and journeys are consuming the most tokens.
Tips for working with logs
Use the Overview tab for trends. Before diving into individual logs, check the Overview tab in your repository. It shows aggregate metrics like total traces, latency graphs, and error rates over time — useful for spotting patterns before drilling into specifics.
Set up alerts for errors. In the Alerts tab, configure notifications (Slack, PagerDuty, or Opsgenie) for when error rates spike or latency exceeds a threshold. This way, you don't have to manually check logs every day.
Keep tags clean. The tags Turn.io sends are automatic — you don't need to configure them. But be aware that having descriptive journey and block names in Turn.io makes filtering in Maxim much easier. A block named "AI Agent" is harder to find than "Claims Processing Agent".
Set up Evaluations
Once your logs are flowing into Maxim, you can set up auto evaluations — automated quality checks that continuously score your AI responses as they come in. This means you don't have to manually review every conversation to spot issues.
You can add and configure different types of evaluators in the Evaluators section. Choose the ones you consider relevant for your use case.

Types of evaluators
Maxim offers three categories of evaluators:
- AI evaluators (LLM-as-a-Judge) use a language model to assess your AI's responses. These are the most flexible and are great for nuanced quality checks:
  - Faithfulness — Is the response grounded in the provided context? Catches hallucinations.
  - Output Relevance — Does the response actually answer what the user asked?
  - Toxicity — Does the response contain harmful, offensive, or inappropriate content?
  - Clarity — Is the response easy to understand?
  - Conciseness — Is the response appropriately brief without losing important information?
  - Task Success — Did the AI accomplish the intended goal?
  - PII Detection — Does the response inadvertently expose personal information?
  - Bias — Does the response show unfair bias?
- Programmatic evaluators use rule-based logic for deterministic checks:
  - Format validators (valid JSON, valid URL, valid email, etc.)
  - Pattern matching and content analysis (word counts, special characters)
- Statistical evaluators compare outputs against expected results using metrics like cosine similarity, BLEU, and ROUGE scores. These are useful if you have reference answers to compare against. The sketch below illustrates the last two categories.
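To give a feel for what those two categories compute, here are minimal sketches of a format validator and a cosine-similarity score. These illustrate the ideas, not Maxim's implementations:

```python
import json
import math

def is_valid_json(output: str) -> bool:
    """Programmatic evaluator: a deterministic pass/fail on output format."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Statistical evaluator: compare an output embedding against a reference."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(is_valid_json('{"status": "ok"}'))            # True
print(is_valid_json("not json"))                    # False
print(round(cosine_similarity([1, 0], [1, 1]), 3))  # 0.707
```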
How evaluations work
Evaluations run automatically on your logs based on rules you define. Each evaluator scores a specific quality dimension of your AI responses. You can combine multiple evaluators to get a comprehensive quality picture.

Maxim supports evaluations at different levels of granularity:
- Trace-level — Evaluate individual AI responses. Best for checking if a single answer was accurate, relevant, or safe. This is the most common starting point.
- Session-level — Evaluate entire multi-turn conversations. Useful for assessing overall conversation quality, coherence, and whether the user's goal was met.
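The difference in granularity shows up in what an evaluator receives. A rough sketch of the two levels, assuming each trace record carries an output field:

```python
# Illustrative, not Maxim's internal format: what an evaluator sees at each level.
session_traces = [
    {"output": "We open at 9am."},
    {"output": "10am on Saturdays."},
]

# Trace-level: the evaluator scores one response in isolation.
trace_level_input = session_traces[0]["output"]

# Session-level: the evaluator sees every response in the conversation,
# which is what the trace[*].output mapping expresses.
session_level_input = [t["output"] for t in session_traces]
```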
Step-by-step: Set up auto evaluation on logs
- Navigate to the Logs section in Maxim and open your Repository.
- Click Manage evaluation in the top right corner.
- Click Add configuration and choose the evaluation level — Trace is the best starting point for most use cases.
- Select the evaluators you want to use.
- Map your variables — connect evaluator inputs to your log data. For example, map trace.output to the evaluator's "response" input so it knows what to score. For session-level evaluations, use trace[*].output to reference outputs across the full conversation.
- (Optional) Add filter rules to narrow which logs get evaluated. You can filter by model type, error status, tags, latency, token usage, and more. Combine multiple conditions with AND/OR logic.
- (Optional) Set a sampling rate to control costs. For example, evaluate 20% of traces instead of all of them — useful for high-traffic journeys (see the sketch after these steps).
- Click Save. New logs will be evaluated automatically as they arrive.
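As a back-of-the-envelope, a sampling rate is just an independent per-trace coin flip. The sketch below shows the decision a 20% rate makes; it's plain Python, not Maxim's configuration format:

```python
import random

SAMPLING_RATE = 0.20  # evaluate roughly 1 in 5 traces

def should_evaluate() -> bool:
    """Decide independently per trace whether it gets evaluated."""
    return random.random() < SAMPLING_RATE

# Over 1,000 daily traces, expect roughly 200 evaluations:
evaluated = sum(should_evaluate() for _ in range(1_000))
print(evaluated)  # ~200, varies run to run
```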
Reviewing evaluation results
Open any trace in your Maxim logs and click the Evaluation tab. You'll see scores from each evaluator, along with explanations for why the score was given (for AI evaluators).

Tips for effective evaluations
Start small, then expand. Begin with 2–3 evaluators that address your biggest concerns — typically Faithfulness, Output Relevance, and Toxicity. Add more once you're comfortable reading the results.
Use sampling for high-volume journeys. If your journey handles thousands of conversations daily, evaluating every single trace gets expensive. A 10–20% sample rate still gives you strong statistical confidence while keeping costs manageable.
Combine AI and programmatic evaluators. AI evaluators are great for subjective quality, but programmatic evaluators give you deterministic guarantees. For example, if your AI agent returns JSON, add an isValidJSON evaluator alongside your quality checks.
Filter out noise. Use filter rules to skip evaluating error traces or traces from test users. This keeps your quality metrics clean and focused on real user interactions.
Set up alerts. Once your evaluations are running, configure alerts in Maxim to notify you when quality scores drop below a threshold. This turns evaluations from a passive dashboard into an active monitoring system.
For the full reference on evaluation configuration, see Maxim's auto evaluation guide.
Simulating conversations
While you're developing or iterating on a prompt, it's helpful to simulate conversations with AI before shipping changes to your users — and then use Evals to check whether the changes had a positive (or negative) impact.
To do that, Maxim also provides a simulation feature. You can see how to set it up here: Simulate & Test AI Journeys.
