---
title: "Building an AI-Powered Debugging Assistant in 72 Hours"
date: "2026-04-15T07:15:00"
url: "https://www.godaddy.com/resources/news/building-an-ai-powered-debugging-assistant-in-72-hours"
---
# Building an AI-Powered Debugging Assistant in 72 Hours

## How on-call debugging actually works (and why it hurts)

If you've ever covered on-call for a platform that handles payment processing, you know the drill. An alert fires. Terminals at a bunch of retail locations can't download over-the-air (OTA) firmware updates. You open Kibana to start digging, and immediately hit the first wall: which field name applies — serialNumber or serial_number? Which index should you search? What time range makes sense here?

At GoDaddy, we run a smart terminal platform that processes payments for many retail businesses. When something goes wrong, the investigation often looks like this: spend a few minutes remembering Kibana Query Language (KQL) syntax, another few minutes guessing at field names across 80+ log fields, then manually scan through log entries trying to spot patterns. Once you've figured out the root cause, you still have to switch over to your IDE, write a fix, create a branch, and open a PR.

For engineers who've worked on the team a while, this whole loop takes 15 to 30 minutes per incident. For someone newer who hasn't memorized the field names and common patterns, it can easily take an hour.

## The hackathon idea

We had a 72-hour internal hackathon approaching, and this debugging workflow kept surfacing in conversations. LLMs handle natural language and code generation well. What if we could skip the query-writing step entirely and just ask a question in plain English? And what if the tool could also suggest a code fix and open a PR?

**logiq** grew out of that idea. The pitch: type a question, get AI-analyzed results, generate a pull request.

## What logiq does

**logiq** ties together three systems:

- **Elasticsearch/Kibana** for production log access (read-only)
- **Claude Haiku** for query generation and log analysis
- **GitHub** for automatic PR creation

Elasticsearch calls use credentials tied to a read-only role and search-only API usage, so logiq can query indices but cannot write to data or change cluster settings.
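As a rough sketch of what that constraint looks like in practice, the request builder below only ever targets the `_search` endpoint, so even a compromised key bound to a read-only role can't mutate data. The URL, index names, and function names here are illustrative, not logiq's actual code:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Hypothetical placeholder for an API key bound to a read-only role:
// searches succeed, writes and cluster changes are rejected server-side.
const readOnlyKey = "REDACTED"

// buildSearchRequest constructs a search-only Elasticsearch request.
// Because the code path only ever builds POST /{index}/_search calls,
// there is no way for a generated query to write to an index.
func buildSearchRequest(baseURL, index, kql string, size int) (*http.Request, error) {
	body, err := json.Marshal(map[string]any{
		"size": size,
		"query": map[string]any{
			"query_string": map[string]any{"query": kql},
		},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		fmt.Sprintf("%s/%s/_search", baseURL, index), bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "ApiKey "+readOnlyKey)
	return req, nil
}

func main() {
	req, err := buildSearchRequest("https://es.example.internal",
		"mothership-dp*", `level: "ERROR" AND path: "/submit_ota"`, 50)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```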

The following sections describe how a typical session plays out.

### 1. Ask a question in plain English

No KQL syntax. No field names. Just ask a question.

```
$ logiq --es
> How many OTA failures occurred in the last 24 hours?
```

### 2. logiq generates the KQL query

Behind the scenes, the tool translates that question into a KQL query.

It maps "OTA" to specific API endpoints, traces the relevant code paths, and highlights which fields matter.

### 3. The AI analyzes and categorizes the results

Instead of handing you dozens of raw log entries to scroll through, Claude groups them by severity and explains the failure pattern:

**CRITICAL (1 error)** /submit_ota

> Context cancellation during database operations indicates timeouts or interrupted requests. With 100% failure rate across all submission attempts in a 4-second window, this prevents any new OTA packages from being added to the system, blocking the entire firmware release pipeline.

**MEDIUM (5 errors)** /submit_ota, /configure_ota, /get_otas

> Multiple admin token authentication failures across OTA management endpoints. Could be an expired token, compromised credential, or integration issue with admin tooling.

The model doesn't just cluster the errors. It considers API criticality, downstream impact, error frequency, and whether failures look one-off or systematic.

### 4. Click "Suggest Fix" and get a code patch

Each category includes a button to generate a fix. When you click it, Claude reads the stack trace, locates the relevant source file in Mothership (our Go monorepo that powers the smart terminal management platform), and produces a patch:

```
--- a/ota/submit_ota_handler.go
+++ b/ota/submit_ota_handler.go
@@ -180,39 +180,91 @@
 	logger := logging.GetDefaultLogger()
 	does_ota_already_exist := false
 
-	// Check if OTA already exists
+	// Create a child context with extended timeout for database operations
+	dbCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
+	defer cancel()
+
+	// Check if OTA already exists with retry logic
 	var fingerprint, md5sum string
 	var enabled bool
-	...
```

*We simplified the preceding diff for brevity; the full patch includes extra retry and error-handling logic.*

Along with the code, you get a root cause explanation and the reasoning behind the fix.

### 5. One more click and you have a PR

Click "Create PR on GitHub" and logiq handles the rest: creates a branch (`logiq-fix/[timestamp]`), commits the change with a descriptive message, and opens a pull request that includes the error pattern, root cause analysis, fix description, and the code diff.

The PR awaits code review. No context switching, no copy-pasting error messages into commit descriptions, no manual branch creation.

## The technical details

The following sections describe the technical details of logiq.

### Teaching Claude about the platform

This part proved hardest. LLMs don't know that serialNumber identifies a terminal in our system, or that OTA logs live in specific Elasticsearch indices. The API wiring looked straightforward by comparison.

We built a field registry that documents the entire log schema:

```
type FieldRegistry struct {
    Fields  map[string]FieldDefinition
    Indices []IndexMapping
    Aliases map[string]string // "terminal" → "serialNumber"
}
```

This registry lives in the Mothership monorepo and gets injected into every prompt. When you ask about "terminal errors," the LLM knows to filter on serialNumber and query the right indices.
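A trimmed-down version of that lookup might look like the sketch below. The `Resolve` helper and the example field set are ours for illustration, not Mothership's actual API; the idea is just that a human word from the question resolves to a canonical field before it ever reaches a query:

```go
package main

import (
	"fmt"
	"strings"
)

// FieldDefinition documents one log field for the LLM prompt.
type FieldDefinition struct {
	Type        string
	Description string
}

// FieldRegistry is a simplified version of the registry described above
// (index mappings omitted for brevity).
type FieldRegistry struct {
	Fields  map[string]FieldDefinition
	Aliases map[string]string // "terminal" → "serialNumber"
}

// Resolve maps a word from the user's question to a canonical log field,
// checking aliases first and exact field names second.
func (r *FieldRegistry) Resolve(word string) (string, bool) {
	if canonical, ok := r.Aliases[strings.ToLower(word)]; ok {
		return canonical, true
	}
	if _, ok := r.Fields[word]; ok {
		return word, true
	}
	return "", false
}

func main() {
	reg := &FieldRegistry{
		Fields: map[string]FieldDefinition{
			"serialNumber": {Type: "keyword", Description: "terminal identifier"},
		},
		Aliases: map[string]string{"terminal": "serialNumber"},
	}
	field, _ := reg.Resolve("terminal")
	fmt.Println(field)
}
```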

### How query generation works

The prompt includes the full field registry with type information, index mappings, routing rules, common query patterns, and the user's natural language question. Claude returns structured JSON:

```
{
  "query": "level: \"ERROR\" AND ...",
  "indices": ["mothership-dp*", "logs-gdelastic.katana.mothership-*"],
  "explanation": "Searching ERROR logs across OTA endpoints..."
}
```

logiq always shows the generated query. When it misses, you can inspect what went wrong and adjust the query manually in Kibana.
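Decoding that structured response is a natural place to validate the model's output before running anything. A minimal sketch, assuming the JSON shape above (the struct and function names are ours):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GeneratedQuery mirrors the JSON the model is asked to return.
type GeneratedQuery struct {
	Query       string   `json:"query"`
	Indices     []string `json:"indices"`
	Explanation string   `json:"explanation"`
}

// parseGeneratedQuery decodes the model response and rejects replies that
// are missing the pieces needed to actually run a search.
func parseGeneratedQuery(raw []byte) (GeneratedQuery, error) {
	var q GeneratedQuery
	if err := json.Unmarshal(raw, &q); err != nil {
		return q, fmt.Errorf("model did not return valid JSON: %w", err)
	}
	if q.Query == "" || len(q.Indices) == 0 {
		return q, fmt.Errorf("incomplete response: query=%q indices=%v", q.Query, q.Indices)
	}
	return q, nil
}

func main() {
	raw := []byte(`{"query":"level: \"ERROR\"","indices":["mothership-dp*"],"explanation":"ERROR logs"}`)
	q, err := parseGeneratedQuery(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(q.Query, q.Indices[0])
}
```

Rejecting incomplete responses here, rather than at search time, is what makes it safe to show the query to the user as the single source of truth.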

### How the analysis works

For the analysis step, we send Claude up to 50 log entries along with the field registry and instructions to group by pattern. The prompt asks the model to weigh these factors: whether the log itself marks something as CRITICAL/ERROR/WARN, endpoint criticality (payment processing versus metrics collection), frequency patterns, and downstream impact.

You get a categorized summary with explanations instead of a wall of raw JSON.
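The bounded-context part of that step can be sketched as follows. The `LogEntry` shape and formatting are simplified stand-ins for the real parsed hits; the 50-entry cap matches the limit described above:

```go
package main

import (
	"fmt"
	"strings"
)

// LogEntry is a minimal stand-in for a parsed Elasticsearch hit.
type LogEntry struct {
	Level   string
	Path    string
	Message string
}

// maxEntries caps how many hits go into the analysis prompt, keeping
// token usage (and cost) bounded regardless of how many errors matched.
const maxEntries = 50

// buildAnalysisContext flattens up to maxEntries hits into the text block
// sent to the model alongside the field registry and grouping instructions.
func buildAnalysisContext(entries []LogEntry) string {
	if len(entries) > maxEntries {
		entries = entries[:maxEntries]
	}
	var b strings.Builder
	for _, e := range entries {
		fmt.Fprintf(&b, "[%s] %s - %s\n", e.Level, e.Path, e.Message)
	}
	return b.String()
}

func main() {
	out := buildAnalysisContext([]LogEntry{
		{Level: "ERROR", Path: "/submit_ota", Message: "context canceled"},
	})
	fmt.Print(out)
}
```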

### From logs to pull request

When you click "Suggest Fix," logiq:

1. Parses the stack trace to find the source file and function.
2. Reads the actual Go source code from your local Mothership checkout (logiq runs inside that codebase).
3. Sends the error context and code to Claude with a fix generation prompt.
4. Gets back a git diff with proper Go syntax.
5. Uses the GitHub REST API to create a branch and open a PR.

The whole thing takes about 30 seconds.
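Step 1 relies on the fact that Go runtime stack traces have a predictable shape: the frame's source location sits on an indented line as `file.go:line`. A sketch of that extraction, with an illustrative path rather than a real Mothership file:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// frameRe matches the indented "file.go:line" part of a Go stack frame.
var frameRe = regexp.MustCompile(`(?m)^\s+(\S+\.go):(\d+)`)

// firstFrame extracts the top source location from a Go stack trace so the
// fix-generation prompt knows which file to read from the local checkout.
func firstFrame(trace string) (file string, line int, ok bool) {
	m := frameRe.FindStringSubmatch(trace)
	if m == nil {
		return "", 0, false
	}
	line, _ = strconv.Atoi(m[2])
	return m[1], line, true
}

func main() {
	trace := "goroutine 1 [running]:\n" +
		"main.submitOTA(...)\n" +
		"\t/repo/ota/submit_ota_handler.go:187 +0x1a\n"
	f, n, _ := firstFrame(trace)
	fmt.Println(f, n)
}
```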

## A live example

During the hackathon demo, we ran a live investigation:

1. Typed "OTA failures in the last 24 hours".
2. logiq found 53 errors across distinct patterns.
3. Claude flagged a critical context cancellation bug in /submit_ota.
4. Clicked "Suggest Fix" then "Create PR".
5. 30 seconds later, a mergeable PR sat in GitHub.

The following image displays logiq's main interface (a natural language search with AI-powered error analysis and suggested fixes):

![a graphical user interface](https://www.godaddy.com/resources/wp-content/uploads/2026/04/logiq_display.png?size=1024x1024)

The following image shows a pull request generated by logiq, complete with root cause analysis and code diff.

![graphical user interface, text, application](https://www.godaddy.com/resources/wp-content/uploads/2026/04/logiq_pr.png?size=1024x1024)

## Next steps

We plan to roll this out to the full Device OS and Payments team after we iron out a few things: better edge case handling, query validation, rate limiting, caching the field registry to reduce API calls, and a proper onboarding guide. We also queued a security review to make sure the read-only access constraints stay bulletproof.

Longer term, a few directions still merit exploration: cross-index correlation (joining terminal logs with payment transaction data), deploy correlation (overlaying deployment timestamps to see which release broke something), and anomaly detection (flagging unusual error patterns before alerts even fire).

Beyond that, we want to evolve logiq from a reactive tool into something that watches proactively. The next iteration would connect directly to Application Performance Monitoring (APM) metrics and CloudWatch, detecting abnormal log patterns and metric spikes before an engineer even opens a terminal. We're also interested in adding AWS cost observability — surfacing unexpected spend changes alongside error patterns so the team can weigh reliability decisions against budget in the same workflow. The end goal is to shift on-call from "respond to an alert and investigate" to "the system already investigated and has a suggestion ready when you look."

## How to build something like logiq

The core pieces you'd need:

1. Document your log schema. Field names, types, relationships, common aliases.
2. Build prompts that inject this schema into the LLM context.
3. Stream the LLM output to a terminal UI. We used Go's [Bubble Tea](https://github.com/charmbracelet/bubbletea) library for this.
4. Connect to your log storage with read-only access.
5. Optionally, add local code access for fix generation.

Each of those steps involves real work, especially building out the field registry so the LLM has enough context to generate accurate queries. But none of the individual pieces is exotic. Most of the value comes from the domain knowledge you feed the model, not the plumbing around it.

## A few things we learned

**Domain knowledge beats model size.** We use Claude Haiku, a smaller and faster model. The field registry gives it comprehensive context about the platform, and that context is what lets it perform well; a bigger model without it would do worse.

**Show the generated query.** Engineers need to verify the AI's output, and that builds trust fast. When the query misses, they can correct it and continue in Kibana.

**AI should speed things up, not take over.** logiq doesn't merge PRs. It creates them, and a human reviews and merges. We aim to cut out the tedious 80% so engineers can focus on the 20% that actually requires judgment.

**It's not always right.** The generated queries sometimes miss the correct index or misinterpret a field name, especially for questions that span multiple services. The suggested fixes can be directionally correct but incomplete — they might add retry logic where the real problem is a misconfigured timeout upstream. That's exactly why the tool always shows the generated query for inspection and why every fix goes through code review. We treat logiq's output as a strong first draft, not a final answer.

## Cost

Each query costs about $0.004 including the summarization step. With prompt caching for the system prompt and field registry, cost drops even lower. For a team of 10 with medium usage, expect roughly $5 to $25 a month. The tool runs locally, with no infrastructure cost.

## Wrapping up

Before logiq, investigating an OTA failure meant writing KQL by hand, scanning through logs, switching to an IDE, writing a fix, creating a branch, and opening a PR — 15 to 30 minutes of tedious context switching.

Today that means a question and two clicks. Under a minute from alert to PR.

The approach generalizes to any platform with structured logs and a codebase the LLM can read. The secret ingredient isn't the model — it's the domain knowledge you give it. If you invest the time to document your log schema well enough for an LLM to reason about, the rest of the pipeline almost builds itself.