From Scans to Insights: Building AI-Powered Lab Result Extraction and Interpretation


Apr 30, 2025

Sodiq Ridwan

Recently, we've been integrating with a digital health platform designed to help users take control of their health by making personal medical data—such as lab results—more accessible, understandable, and actionable. Our mission? To improve how users interact with their personal health data. One of the latest features we worked on enables users to upload lab results—either as PDFs or image files—which are then automatically parsed, stored, and interpreted by the platform. The goal is to turn unstructured medical data into structured, actionable insights with minimal user input.

Sounds simple enough, right? In reality, it turned into a deep dive into the messy world of OCR, document structure, and prompt engineering. Here's the journey we went on—from inaccurate scans to AI-powered interpretations.

The Problem: Lab Results Are a Mess

Medical lab results are often delivered as unstructured documents—scanned images, blurry PDFs, or photos taken on a phone. These documents don’t follow a universal template, making it incredibly hard to extract information programmatically.

We needed a way to:

  1. Accurately extract the test name, result, unit, and reference range.

  2. Interpret whether a test result was normal, high, or low.

  3. Display all of this clearly to the user.

Sounds like a job for AI? We thought so too—but it wasn’t straightforward.

Early Attempts: OCR and Structure Wars

Our first instinct was to use Tesseract, a popular open-source OCR engine (a minimal invocation is sketched after the list below). It did a good job maintaining the layout and structure, which is important for preserving rows of test results. But it struggled with:

  • Low-quality scans

  • Blurred or skewed text

  • Misreading characters (e.g., mixing up 0 and O)
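For reference, a Tesseract pass via the pytesseract wrapper looks roughly like the sketch below. The page-segmentation mode and file name are illustrative choices, not necessarily the settings we shipped.

```python
import pytesseract
from PIL import Image

# --psm 6 tells Tesseract to treat the page as a single uniform block of text,
# which helps keep a test's name, result, unit, and range on one output line.
text = pytesseract.image_to_string(Image.open("lab_result.png"), config="--psm 6")
print(text)
```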

Next, we tried Google Document AI. It excelled at recognizing characters—even in tough image conditions—but came with its own issues:

  • It completely lost the structural layout, sometimes assigning values from one row to the wrong test in another row.

  • It lacked configurability to preserve spatial relationships or tables.
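For completeness, calling a Document AI OCR processor looks roughly like this; the project, location, and processor IDs are placeholders for your own setup.

```python
from google.cloud import documentai_v1 as documentai

# Placeholders: substitute your own project, location, and processor ID.
client = documentai.DocumentProcessorServiceClient()
name = client.processor_path("my-project", "us", "my-processor-id")

with open("lab_result.pdf", "rb") as f:
    raw_document = documentai.RawDocument(content=f.read(), mime_type="application/pdf")

result = client.process_document(
    request=documentai.ProcessRequest(name=name, raw_document=raw_document)
)

# result.document.text is a flat string; the row structure we cared about
# is not preserved out of the box.
print(result.document.text)
```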

We even experimented with GPT models to extract structured data. They preserved layout surprisingly well and seemed to “understand” what was going on. But again, character recognition wasn't perfect: GPT sometimes misread numbers or misaligned data when the underlying OCR output wasn't accurate.
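As an illustration of that approach, the sketch below asks a chat model to turn raw OCR text into structured rows. The model name, file path, and JSON schema are illustrative, not the exact ones we used.

```python
from pathlib import Path

from openai import OpenAI

client = OpenAI()

# Raw text produced by the OCR step (placeholder path).
ocr_text = Path("ocr_output.txt").read_text()

prompt = (
    "Extract every lab test from the report text below. Return a JSON object "
    "with a 'tests' array; each entry must have the keys test_name, result, "
    "unit, and reference_range.\n\n" + ocr_text
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)
```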

Turning Point: Image Preprocessing

We realized that most of the OCR failures were due to one thing: bad input. So we stopped seeking new tools and focused on improving what we already had.

We added a preprocessing step before sending the image to the OCR engine or GPT, which:

  • Increased contrast

  • Boosted sharpness

  • Deskewed the image

This small change made a big difference. Tesseract’s accuracy improved drastically. Even GPT-based models began outputting much more reliable results when fed the cleaner extracted text.
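A minimal sketch of that preprocessing step, assuming OpenCV: the contrast and sharpening settings are illustrative, and the deskew is the common minAreaRect recipe rather than our production code.

```python
import cv2
import numpy as np


def preprocess(path: str) -> np.ndarray:
    """Clean a scanned lab result before OCR: contrast, sharpness, deskew."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Increase contrast with adaptive histogram equalisation.
    img = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(img)

    # Boost sharpness with a simple sharpening kernel.
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
    img = cv2.filter2D(img, -1, kernel)

    # Deskew: estimate the dominant text angle and rotate to correct it.
    thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    # minAreaRect's angle convention varies across OpenCV versions;
    # normalise it to a small correction in [-45, 45] degrees.
    if angle > 45:
        angle -= 90
    h, w = img.shape
    matrix = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    return cv2.warpAffine(img, matrix, (w, h),
                          flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
```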

The Pipeline: Clean Input, Clean Output

At this point, our workflow looked like this:

  1. User uploads a lab result (PDF or image)

  2. We preprocess the image

  3. OCR extracts the text (Tesseract or GPT-based parser)

  4. Structured results are stored with the client for display

This was good enough to extract the raw data. But we weren’t done yet.
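For step 4, the structured record is essentially one row per test. A simplified sketch of that shape, with illustrative field names and values rather than our exact schema:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabResultRow:
    test_name: str
    result: str
    unit: str
    reference_range: str
    flag: Optional[str] = None  # "normal", "high", or "low" once interpreted


# Example of a single parsed row (values are illustrative).
row = LabResultRow(test_name="Hemoglobin", result="13.5",
                   unit="g/dL", reference_range="13.0-17.0")
print(row)
```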

Next Challenge: Interpretation

We didn’t just want to show users a bunch of numbers—we wanted to tell them what those numbers meant.

We had a reference dataset that included:

  • Lab test names

  • Measurement units

  • Acceptable ranges (e.g., Glucose → mmol → 10–20)

The plan was to include this in the prompt alongside the extracted test results and have GPT interpret each result as normal, high, or low.
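Concretely, the prompt looked roughly like the sketch below, with the reference dataset serialized alongside the extracted results. The entries shown are illustrative placeholders; the real dataset is far larger than this slice.

```python
import json

# Illustrative slice of the reference dataset (placeholder entries).
reference_ranges = [
    {"test": "Hemoglobin", "unit": "g/dL", "range": "13.0-17.0"},
    {"test": "Platelets", "unit": "10^9/L", "range": "150-400"},
]

# Results produced by the extraction step (placeholder values).
extracted_results = [
    {"test": "Hemoglobin", "result": "11.2", "unit": "g/dL"},
]

prompt = (
    "Using the reference ranges below, label each extracted result as "
    "normal, high, or low.\n\n"
    f"Reference ranges:\n{json.dumps(reference_ranges, indent=2)}\n\n"
    f"Extracted results:\n{json.dumps(extracted_results, indent=2)}"
)
```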

But… we ran into context size limits.

With large panels of tests, the GPT model either:

  • Skipped interpreting some results entirely

  • Used the wrong value or range in its analysis

We tried condensing the dataset into a compact format, and even tested a retrieval system that would fetch only the relevant reference ranges. But performance was inconsistent, and we didn’t have time to build a fully optimized retriever—this was still an MVP.
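At its core, the retrieval idea was a lookup that kept only the reference entries for tests present in the current report, roughly like this simplified sketch. Exact-name matching is one reason results were inconsistent, since reports don't name tests uniformly.

```python
def relevant_ranges(reference_ranges: list[dict], extracted: list[dict]) -> list[dict]:
    """Keep only reference entries for tests that appear in the current report."""
    wanted = {row["test"].strip().lower() for row in extracted}
    return [ref for ref in reference_ranges if ref["test"].strip().lower() in wanted]
```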

What Worked: Letting the Lab Speak for Itself

The breakthrough came when we flipped the approach.

Instead of trying to feed GPT external data like reference ranges, we asked: What if we just passed the preprocessed lab result and nothing else?

Turns out, this worked incredibly well.

Most lab results already include the reference range in the document. With the right prompt, GPT was able to:

  • Identify the test name, result, and unit

  • Read the reference range from the same row

  • Determine if the result was high, low, or normal

  • Even explain the test in plain English

No need to engineer huge prompts. No need to pass our full dataset. Just clean, structured input and thoughtful prompt design.
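The resulting prompt is therefore fairly small. A sketch of its shape, with illustrative wording and a placeholder path, sent via the same chat-completion call shown earlier:

```python
from pathlib import Path

# Cleaned OCR text of the uploaded document (placeholder path).
preprocessed_text = Path("ocr_output.txt").read_text()

prompt = (
    "Below is the text of a lab report. For each test, return test_name, "
    "result, unit, and the reference_range printed in the report; flag the "
    "result as normal, high, or low against that range; and add a one-sentence "
    "plain-English explanation of what the test measures.\n\n"
    + preprocessed_text
)
```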

Lessons Learned

  • Garbage in, garbage out. Clean images make all the difference in OCR.

  • You don’t always need more data. Sometimes what you need is already in the document.

  • Prompt engineering matters. You can get great results with the right framing—even from general-purpose AI models.

  • Keep it simple. In MVPs, simplicity and reliability beat complexity and fragility.

What’s Next

With extraction and interpretation up and running, our next steps are:

  • Enhancing user-facing explanations (What does "high LDL" mean?)

  • Adding alerts for abnormal values

  • Mapping tests to medical codes like LOINC

  • Exploring longitudinal tracking of lab results over time

We're excited about what's next, and even more excited to put these capabilities into the hands of users ready to take ownership of their health journey!