Tracing AI requests: how to quickly find a failure in production
Article -> plan in AI
Paste this article URL into any AI and get an implementation plan for your project.
Read this article: https://vibecode.morecil.ru/en/dannye-i-khranenie/trassirovka-ai-zaprosov-kak-bystro-naiti-sboi-v-prode/
Work in my current project context.
Create an implementation plan for this stack:
1) what to change
2) which files to edit
3) risks and typical mistakes
4) how to verify everything works
If there are options, provide "quick" and "production-ready" variants.
How to use
- Copy this prompt and send it to your AI chat.
- Attach your project or open the repository folder in the AI tool.
- Ask for file-level changes, risks, and a quick verification checklist.
Introduction
Simple words
When an AI feature breaks in production, you usually only see the result: the user got an error, the bot hung, the task never finished. It is not clear where exactly things broke: in the API, in the queue, in the model, in the database, or in your own code.
This article is for a beginner who has already shipped a first AI scenario and now wants to stop fixing failures blindly. By the end, you will have a working minimum template: what to log, how to add a trace_id, how to link the steps of a single request, and how to quickly find the root cause.
How to do it in practice
Just remember the main principle: one user request = one trace identifier (trace_id) across all steps. Without it, you cannot say for sure where a failure occurred.
Mini difficulty ladder:
- Base: write structured logs in JSON.
- Working minimum: add trace_id and pass it through all calls.
- Strengthening: connect OpenTelemetry and view traces in a UI.
What to do right now:
- Select one critical scenario (e.g., “create an assistant response”).
- Check whether a single trace_id is present at each step.
- If not, add it as a requirement to your next task.
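The principle can be seen in two log lines. Below is a minimal sketch: the field names (trace_id, span, status) and the helper makeLogEntry are illustrative choices of ours, not a fixed standard.

```javascript
import { randomUUID } from "node:crypto";

// Build one structured log entry; every step of one request reuses the same
// trace_id, which is what lets you filter the whole request later.
function makeLogEntry(traceId, span, extra = {}) {
  return {
    timestamp: new Date().toISOString(),
    trace_id: traceId,
    span,
    ...extra,
  };
}

// Two steps of the same request, linked by one trace_id:
const traceId = randomUUID();
console.log(JSON.stringify(makeLogEntry(traceId, "request.in", { status: "ok" })));
console.log(JSON.stringify(makeLogEntry(traceId, "llm.call", { status: "ok", duration_ms: 1200 })));
```

Grepping your logs for that one trace_id now returns the full story of the request.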
In short: the essence in 5 points
Simple words
In short: observability for an AI scenario is built not around “beautiful dashboards” but around answering the question “why did this request fail or become slow”.
How to do it in practice
- Logs without structure are almost useless during an incident.
- Without trace_id, you cannot link the steps of a single request.
- First, cover only 3 points: the input, the model call, and recording the result.
- Set up 3 alerts: error growth, latency growth, timeout growth.
- Regularly review 5-10 real traces after each release.
Mini difficulty ladder:
- Base: readable logs with uniform fields.
- Working minimum: tracing and basic alerts.
- Strengthening: sampling, dashboards per prompt version, cost control.
What to do right now:
- Add the log field template to the ticket (trace_id, span, status, duration_ms).
- Assign someone responsible for trace quality.
Dictionary of terms
Simple words
Below is a short dictionary to avoid confusion in terms.
How to do it in practice
- Observability: the ability to understand the state of the system from logs, metrics, and traces.
- Trace: the complete chain of steps from input to result.
- Span: one individual step inside a trace, such as an LLM call or a database write.
- Trace ID: a unique trace number that links all the steps of a single request.
- Structured logs: JSON logs with a consistent set of fields.
- Latency: how long a step or the entire request took.
- Sampling: keeping only a portion of traces to reduce load and storage costs.
- Alert: an automatic notification when a metric crosses a threshold.
- SLO (service level objective): a pre-agreed quality goal, e.g. “95% of requests under 3 seconds”.
What to do right now:
- Make sure the entire team understands the terms trace and span in the same way.
- Add this dictionary to the service README.
Base and context: why AI scripts are hard to debug
Simple words
In a regular web service, a request often takes 1-2 steps. In an AI scenario, there are more: fetching context, calling the model, calling a tool, post-processing, recording the result. A delay or error can occur at any of them.
The beginner's problem is that logs exist but are scattered. One service writes to a file, another to stdout, and a third is silent altogether. This turns even a simple mistake into detective work.
How to do it in practice
Divide the request path into blocks and assign each a span name:
- request.in – HTTP/webhook request input;
- context.load – reading data from the database/cache;
- llm.call – request to the model;
- tool.call – an external API or internal tool;
- response.save – recording the result;
- request.out – returning the response.
Mini difficulty ladder:
- Base: write logs for each block.
- Working minimum: each block has trace_id and duration_ms.
- Strengthening: add model_name, prompt_version, retry_count, token_usage.
What to do right now:
- Draw the current request path in 5-7 steps.
- For each step, note where context is lost.
- Identify the 3 steps to cover first.
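The block names above can live as one shared constant so every service names spans identically. A sketch; the constant and helper names are our own suggestion:

```javascript
// One ordered list of span names, reused by every service in the scenario.
// The name SPAN_NAMES is our own choice, not a standard.
const SPAN_NAMES = [
  "request.in",
  "context.load",
  "llm.call",
  "tool.call",
  "response.save",
  "request.out",
];

// Quick sanity check for a finished trace: which expected steps never logged?
function missingSpans(trace) {
  const seen = new Set(trace.map((entry) => entry.span));
  return SPAN_NAMES.filter((name) => !seen.has(name));
}
```

Running missingSpans over a real trace immediately shows where context is being lost.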
Practical part by steps
Step 1. Enter a single log format
Simple words
If every developer writes logs however they like, you won't be able to search and filter events quickly. A single format removes the chaos.
How to do it in practice
Minimum fields of JSON log:
timestamp, level, service, trace_id, span, message, status, duration_ms
Example: Node.js + pino
npm i pino pino-http
import pino from "pino";

const logger = pino({ level: process.env.LOG_LEVEL || "info" });

// Log one step with the shared base fields; the caller adds trace_id, span, etc.
export function logStep(data) {
  logger.info({
    timestamp: new Date().toISOString(),
    service: "ai-gateway",
    ...data,
  });
}
Mini difficulty ladder:
- Base: a unified JSON format.
- Working minimum: mandatory fields are validated in code.
- Strengthening: the log schema is checked in CI.
What to do right now:
- Define the required log fields in docs/logging.md.
- Check that trace_id is never empty.
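One way to reach the "mandatory fields are validated in code" rung is a small validator in front of the logger. A sketch, assuming the field list named earlier; adjust REQUIRED to your own standard:

```javascript
// Required fields from the team's log standard (illustrative list).
const REQUIRED = ["timestamp", "level", "service", "trace_id", "span", "message", "status"];

// Collect every required field that is missing or empty, so a bad entry
// can be rejected (or flagged) before it reaches the log sink.
function validateLogEntry(entry) {
  const problems = REQUIRED.filter(
    (field) => entry[field] === undefined || entry[field] === ""
  );
  return { ok: problems.length === 0, problems };
}
```

In development you can throw on `!ok`; in production it is usually safer to log the violation and continue.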
Step 2. Pass trace_id through the entire scenario
Simple words
trace_id should be created once at the input and passed on to each call. Then you can always reconstruct the full picture.
How to do it in practice
- At the HTTP entry point, take traceparent from the header or generate a new trace_id.
- Pass it on to functions, background tasks, and external APIs.
- Return trace_id in error responses so that support can quickly find the trace.
Example: Express middleware
import { randomUUID } from "node:crypto";
export function traceMiddleware(req, res, next) {
const incoming = req.headers["x-trace-id"];
const traceId = typeof incoming === "string" && incoming ? incoming : randomUUID();
req.traceId = traceId;
res.setHeader("x-trace-id", traceId);
next();
}
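The middleware above can be split so the same resolution rule is reusable outside HTTP, in queue consumers and workers. A sketch; the helper name resolveTraceId is ours:

```javascript
import { randomUUID } from "node:crypto";

// One rule everywhere (HTTP, queues, workers): keep an incoming trace_id
// if it is a non-empty string, otherwise generate a new one.
function resolveTraceId(incoming) {
  return typeof incoming === "string" && incoming !== "" ? incoming : randomUUID();
}
```

A queue consumer would call `resolveTraceId(message.trace_id)` before processing, so re-delivered tasks keep their original trace.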
Mini difficulty ladder:
- Base: trace_id in the HTTP layer.
- Working minimum: trace_id in queues and workers.
- Strengthening: support for W3C traceparent.
What to do right now:
- Check that trace_id reaches background tasks.
- Add x-trace-id to API responses.
Step 3. Add tracing for LLM calls and tools
Simple words
Usually, the biggest delays and errors are in the model call and in external tools. These steps deserve separate spans.
How to do it in practice
What to record in llm.call:
- provider (who is answering: OpenAI, Anthropic, etc.);
- model;
- prompt_version (the prompt template version);
- duration_ms;
- status (ok, timeout, error);
- token_usage (if available).
What to record in tool.call:
- tool_name;
- http_status;
- retry_count;
- duration_ms;
- a brief error cause without sensitive data.
Important: do not log secrets, tokens, or personal data. If necessary, mask them.
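The fields listed above can be produced by one small timing wrapper around any model or tool call. A sketch under our own naming; the real LLM or tool call is injected as `fn`:

```javascript
// Wrap an async call and produce span fields: status, duration_ms, plus
// whatever metadata (provider, model, prompt_version...) the caller passes.
// The name tracedCall and the TimeoutError convention are our own choices.
async function tracedCall(span, meta, fn) {
  const started = Date.now();
  try {
    const result = await fn();
    return { span, ...meta, status: "ok", duration_ms: Date.now() - started, result };
  } catch (err) {
    const status = err.name === "TimeoutError" ? "timeout" : "error";
    // Keep only a brief, non-sensitive cause of the error.
    return { span, ...meta, status, duration_ms: Date.now() - started, error: String(err.message).slice(0, 200) };
  }
}
```

The returned object can go straight into your structured logger, so llm.call and tool.call always carry the same fields.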
Mini difficulty ladder:
- Base: separate spans for llm.call and tool.call.
- Working minimum: record the duration and status of each span.
- Strengthening: add prompt and agent versions.
What to do right now:
- Add 2 spans: llm.call and tool.call.
- Make sure there are no tokens or passwords in the logs.
Step 4. Connect OpenTelemetry and view trails
Simple words
OpenTelemetry is an open standard and a set of libraries for collecting telemetry: logs, metrics, and traces. It exists so you don't have to collect everything manually or lock yourself into one vendor.
How to do it in practice
Basic launch in Node.js:
npm i @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
const sdk = new NodeSDK({
instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
Next, send the traces to a convenient backend (such as Tempo, Jaeger, or Sentry) and make sure the entire request chain is displayed.
Mini difficulty ladder:
- Base: auto-collection of basic spans.
- Working minimum: manual spans for critical steps.
- Strengthening: custom attributes (release version, prompt, client).
What to do right now:
- Run one trace backend locally.
- Make sure the entire request is visible as a single trace.
Step 5. Set up alerts that are really useful
Simple words
Without alerts, you learn about problems from users. But too many alerts are also bad: the team stops responding. You need a working minimum.
How to do it in practice
Starting set of alerts:
- Errors: status=error above the threshold over 5 minutes.
- Latency: p95 response time above the threshold.
- External provider timeouts: growth of timeout in llm.call.
Example starting thresholds:
- error rate > 3% over 5 minutes;
- p95 > 6 seconds over 10 minutes;
- LLM timeouts > 2% over 10 minutes.
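The thresholds above boil down to a simple window check. In practice the metrics backend (e.g. Prometheus or Sentry) evaluates this, not your application code, but the logic fits in a few lines. A sketch with our own names:

```javascript
// The "error rate > 3% over 5 minutes" rule: look only at entries inside
// the time window, then compare the error share against the threshold.
function shouldAlert(entries, { windowMs = 5 * 60 * 1000, maxErrorRate = 0.03, now = Date.now() } = {}) {
  const recent = entries.filter((e) => now - e.timestamp <= windowMs);
  if (recent.length === 0) return false; // no traffic, nothing to alert on
  const errors = recent.filter((e) => e.status === "error").length;
  return errors / recent.length > maxErrorRate;
}
```

The `now` parameter is injected so the rule is easy to test; the same shape works for the p95 and timeout rules with different predicates.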
Mini difficulty ladder:
- Base: 1 alert for errors.
- Working minimum: 3 alerts (errors, latency, timeouts).
- Strengthening: individual thresholds per model and request type.
What to do right now:
- Configure at least one alert for error growth.
- Check who receives the notification and how.
Real use cases
Simple words
Below are three typical cases where tracing saves hours of manual parsing.
How to do it in practice
- A “500” error affects only some users. The trace shows a single crm.lookup tool failing on a rare input field format.
- A sharp rise in latency. The spans show it is not the LLM, but a slow database write after the model responds.
- Intermittent duplicate messages. trace_id reveals re-delivery of the task from the queue and a missing idempotency-key check.
Mini difficulty ladder:
- Base: find the problem in one trace.
- Working minimum: group similar traces by error.
- Strengthening: build a dashboard by incident class.
What to do right now:
- Walk through the latest production failure via a single trace.
- Note which field was missing for quick analysis.
Tools and technologies
Simple words
You don't have to adopt everything at once. To start, one logger, one trace collector, and one viewing interface are enough.
How to do it in practice
A beginner's working kit:
- pino or winston for structured logs in Node.js;
- the OpenTelemetry SDK for collecting spans;
- Jaeger/Tempo/Sentry for viewing traces;
- PostgreSQL or ClickHouse for storing aggregated logs.
How to choose simply:
- If you need a quick local start: Jaeger.
- If you already have Grafana: Tempo.
- If you need a product with ready-made alerts and an issue workflow: Sentry.
Mini difficulty ladder:
- Base: one trace visualization tool.
- Working minimum: logs + traces + 3 alerts.
- Strengthening: a single dashboard for AI scenario quality.
What to do right now:
- Pick one trace backend and fix it as the team standard.
- Don't change the stack until you've worked through 2-3 real incidents.
Comparative table of approaches
Simple words
The table helps to choose the approach according to the maturity of the team, not the fashion.
How to do it in practice
| Approach | What you see | What you don't see | When it fits |
|---|---|---|---|
| Text logs only | Individual errors and messages | The full request path | Day one, very small project |
| Logs + trace_id | Linked steps of a single request | Detailed latency visualization | Working minimum for most teams |
| OpenTelemetry + trace backend | Full path, bottlenecks, problem spans | Business context beyond your custom fields | Production and regular releases |
| Full observability stack (logs + traces + metrics + alerts) | System state in real time | Nothing critical, if configured correctly | Team with constant load and an SLA |
What to do right now:
- Honestly note where you are in the table.
- Your next step is to reach the “Logs + trace_id” level.
Implementation checklist
Simple words
The checklist is needed so as not to miss basic things and not get stuck in theory.
How to do it in practice
- There is a single JSON log format.
- trace_id is created or received at the input.
- trace_id passes through the API, the queue, and the worker.
- llm.call records status and duration_ms.
- tool.call records http_status and retry_count.
- No secrets leak into the logs.
- Alerts exist for errors and latency.
- The team can find the cause of an incident from a single trace.
Mini difficulty ladder:
- Base: the first 4 points.
- Working minimum: the first 7 points.
- Strengthening: all points + a regular trace review after each release.
What to do right now:
- Go through the checklist on one service.
- Mark 2 gaps and set a deadline to fix them.
Typical errors and how to fix them
Simple words
Mistakes are repeated by almost all teams. The good news is that they can be closed with simple rules.
How to do it in practice
- Mistake: logging only "an error occurred" without context. Fix: add trace_id, span, duration_ms, status.
- Mistake: trace_id exists in the API but gets lost in the queue. Fix: send trace_id as a mandatory message field.
- Mistake: logging full prompts and personal data. Fix: mask sensitive fields and store only the needed metadata.
- Mistake: alerts are too noisy. Fix: start with 3 alerts and tune the thresholds to the actual load.
- Mistake: after a release, nobody looks at the traces. Fix: introduce a short 10-minute post-release review.
Mini difficulty ladder:
- Base: eliminate trace_id loss.
- Working minimum: reduce alert noise.
- Strengthening: regular reviews of trace quality.
What to do right now:
- Take the last crash and check which fields were missing.
- Add these fields to the log standard.
FAQ
Simple words
Brief answers to questions that beginners usually have.
How to do it in practice
**1. Do I need a full observability stack?**
No. Start with JSON logs and trace_id, then add tracing for critical steps.
**2. Which is more important: traces or metrics?**
To find the cause of a particular failure, a trace is more useful. Metrics matter for controlling overall stability. Do both at the start, but with minimal coverage.
**3. Can you live on logs alone?**
On a very small project, yes; but as load and the number of services grow, this quickly stops working.
**4. Do I need to log the full prompt?**
Usually not. It is better to store prompt_version, size, and key metadata. Keep the full text only under strict security rules.
**5. How do you know the implementation has succeeded?**
If the team finds the root cause of a typical incident in minutes rather than hours, you're on the right track.
What to do right now:
- Collect your 3 internal FAQs for the project.
- Add answers to internal documentation.
Outcome and next practical step
Simple words
Tracing AI requests is not about “beautiful analytics” but about recovery speed and calm operations. The most important result: you stop guessing and start seeing the exact cause of a failure.
How to do it in practice
Your next step for today:
- In one critical scenario, implement trace_id from input to output.
- Add llm.call and tool.call spans.
- Set up one alert for error growth.
If you do this, you already have a working minimum that delivers real benefit in production.