Extracting data from medical records and contracts is hard. Here’s how AI is making it easier
By Lauren Pendo, Rachael Burns, and Linda Schilling
While LLMs are great at understanding text, when it comes to medical records, that understanding isn’t always enough. After all, when you look at the average medical record, you see not just text but also images and tables – often in non-standard layouts.
Fortunately, most of the major models today are multimodal, meaning that they process not just text but images, audio, and even video. This has major implications for our work at Oscar. The big question: How can multimodal capabilities help us improve our understanding of medical records while also preserving the structure of those records, which is often critical to that comprehension?
In this article, we share lessons we’ve learned from our recent efforts to answer that question. First, we dig into three approaches to extracting data from patient records; then we lay out a few ways we’ve applied these approaches to common healthcare applications.
Three approaches to extracting medical record data
Approach 1 - PDF → Text using OCR → GPT: In this approach, we extract text from PDFs of patient records using traditional optical character recognition (OCR). Then, we pass that extracted text to GPT.
Approach 2 - PDF → Image → GPT: In this approach, we leverage the multimodal capabilities of GPT. First, we convert PDFs to images. Then, we pass the images directly to GPT and ask questions about them. For example, “Do the medical record and claim violate our provider billing policies?”
Approach 3 - PDF → Image → Text using GPT → GPT: In this final approach, we leverage GPT as a text extractor in place of traditional OCR. We convert PDFs to images, then prompt GPT to extract text from the images in one call. Finally, we pass that text back into GPT and ask questions about it. For example, we might have GPT extract the section on reimbursement policies from a contract, then pass the extracted text back into GPT to answer a question about those policies.
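To make the three pipelines concrete, here is a minimal sketch in Python. It assumes the OpenAI Python SDK plus the pdf2image and pytesseract libraries (with Poppler and Tesseract installed locally); the model name and prompts are illustrative, not our production setup.

```python
import base64
import io

import pytesseract
from openai import OpenAI
from pdf2image import convert_from_path

client = OpenAI()

def ask_gpt(content):
    """Send a single user message (plain text or multimodal parts) to GPT."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content

def image_part(page):
    """Encode a rendered PDF page (PIL image) as a data-URL image part."""
    buf = io.BytesIO()
    page.save(buf, format="PNG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}

def approach_1(pdf_path, question):
    """Approach 1: PDF -> text via traditional OCR -> GPT."""
    pages = convert_from_path(pdf_path)
    text = "\n".join(pytesseract.image_to_string(p) for p in pages)
    return ask_gpt(f"{question}\n\n{text}")

def approach_2(pdf_path, question):
    """Approach 2: PDF -> images -> GPT, asking directly on the images."""
    pages = convert_from_path(pdf_path)
    return ask_gpt([{"type": "text", "text": question}] + [image_part(p) for p in pages])

def approach_3(pdf_path, question):
    """Approach 3: PDF -> images -> GPT as the OCR step -> GPT."""
    pages = convert_from_path(pdf_path)
    extracted = ask_gpt(
        [{"type": "text", "text": "Extract all text from these pages."}]
        + [image_part(p) for p in pages]
    )
    return ask_gpt(f"{question}\n\n{extracted}")
```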
Use case #1: Identifying image-based signatures
Winner: Approach 2 - PDF → Image → GPT
Oscar’s policies stipulate that medical records must be correctly signed and dated by supervising providers. This dated provider signature is a requirement for three important clinical workflows:
Detecting inconsistent billing practices from providers to prevent fraud, waste, and abuse
Ensuring the accurate risk profile of Oscar’s membership base as part of the government’s annual risk transfer program
Determining the medical necessity of complex and often expensive procedures requested for prior authorization
Through our experiments, we saw that the second approach (PDF → Image → GPT) provided a 3-4% accuracy boost to AI’s ability to detect whether a medical record is signed within the required window. Adding a scaling factor (i.e., making the images larger) increased the accuracy even further.
Each of these signature identifications has different requirements around the timeliness of the provider signature. Here are the full results:
These results make intuitive sense: While many providers sign their medical records digitally, some still depend on handwritten signatures.
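As for the scaling factor mentioned above, it amounts to upsampling each page image before the vision call. Here’s a minimal sketch assuming Pillow and pdf2image; the 2x factor is illustrative:

```python
from pdf2image import convert_from_path
from PIL import Image

def scaled_pages(pdf_path, factor=2):
    """Render PDF pages and upscale them before passing to the vision model."""
    pages = convert_from_path(pdf_path)
    return [
        p.resize((p.width * factor, p.height * factor), Image.LANCZOS)
        for p in pages
    ]
```

Rendering the pages at a higher DPI in the first place (e.g., `convert_from_path(pdf_path, dpi=300)`) achieves a similar effect in one step.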
Use case #2: Enhancing disease classification prompts
Winner: Approach 2 - PDF → Image → GPT
In addition to using vision models to OCR medical records before feeding them into a text-based model, we're also using these models to improve our prompts.
For example, when we extract ICD-10 codes (which are used to classify and code diagnoses, symptoms, and procedures), we need to include some general guidelines in our prompts. These guidelines come in PDF format and contain many charts with complex logic and explanations. By applying vision-based OCR to these guidelines, we can deliver clean and clear text to our prompt, helping our extraction model accurately identify which ICD-10 codes should or shouldn’t be extracted.
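As a rough illustration of how this fits together, the vision-OCR’ed guideline text (produced by an Approach 3-style extraction over the guideline PDFs) can be folded into the extraction prompt. The helper and prompt wording below are hypothetical, not our production prompt:

```python
def build_icd10_prompt(guideline_text, record_text):
    """Combine vision-OCR'ed coding guidelines with a record for code extraction."""
    return (
        "Using the coding guidelines below, list the ICD-10 codes supported by "
        "this medical record, and flag any codes that should NOT be extracted.\n\n"
        f"Guidelines:\n{guideline_text}\n\nMedical record:\n{record_text}"
    )
```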
Use case #3: Extracting content from contracts
Winner: Approach 2 - PDF → Image → GPT
Provider contracts are another important natural language data source for various workflows at Oscar.
Contracts determine how we should pay claims, which providers are in our network, and whether there are legally binding obligations for how we operate with respect to our provider partners. Considering that contracts tend to be quite long – sometimes hundreds of pages – the right application of AI can dramatically improve how quickly contracts can be parsed.
To that end, the Oscar AI team built a tool that:
Takes in a query or set of queries from the user based on their use case
Searches the relevant parts of the contract necessary to answer the query
Returns an answer (or set of answers)
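Here’s a minimal sketch of how such a tool can work, assuming contract pages have already been converted to text (e.g., via Approach 2 or 3) and using embeddings for the search step; the function names and embedding model are illustrative, not the tool’s actual internals:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    """Embed a list of texts into vectors for similarity search."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def top_k_pages(query, pages, k=5):
    """Rank contract pages by cosine similarity to the query."""
    page_vecs = embed(pages)
    q = embed([query])[0]
    scores = page_vecs @ q / (np.linalg.norm(page_vecs, axis=1) * np.linalg.norm(q))
    return [pages[i] for i in np.argsort(-scores)[:k]]

def answer(query, pages):
    """Answer a user query from the most relevant contract pages."""
    context = "\n\n".join(top_k_pages(query, pages))
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"{query}\n\nContract excerpts:\n{context}"}],
    )
    return resp.choices[0].message.content
```

A user query would then look like `answer("Is this contract exclusive?", pages)`.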
We tested Approach 1 (PDF → Text using OCR → GPT) and Approach 2 (PDF → Image → GPT) on two different types of queries that users have for Oscar’s contracts:
Whether a contract is exclusive, i.e., whether we are allowed to add providers from a different health system to the network
Whether Oscar is allowed to audit the claims for the provider (an audit is when we request medical records for a claim to validate that the amount we paid is correct)
We observed that Approach 2 (PDF → Image → GPT) yielded better search results than Approach 1 (PDF → Text using OCR → GPT). Using vision to interpret legal documents likely preserves information encoded in the visual format of the contract. Provider contracts often use tables, which GPT can report in markdown. Other methods we tried for markdown table extraction proved unreliable and/or prone to hallucination.
Here’s an excerpt of a contract with hierarchical sections, and what traditional OCR yields.
Additionally, and less related to the visual format of the contract, we observed that using vision to translate images to text produced fewer typos than traditional OCR.
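For the transcription step itself, an illustrative prompt (reusing the `ask_gpt` and `image_part` helpers from the earlier sketch; the wording is an assumption, not our production prompt):

```python
TRANSCRIBE_PROMPT = (
    "Transcribe this contract page to markdown. Preserve the section hierarchy "
    "as nested headings, render any tables as markdown tables, and do not "
    "summarize or omit any text."
)

def page_to_markdown(page):
    """Transcribe one contract page to markdown via the vision model."""
    return ask_gpt([{"type": "text", "text": TRANSCRIBE_PROMPT}, image_part(page)])
```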
Use case #4: Identifying billing inconsistencies
Winner: Approach 1 - PDF → Text using OCR → GPT
We found that when asking for an answer that required synthesis rather than extraction or identification, passing natural language text to GPT led to better results.
We compared the performance of GPT at identifying billing inconsistencies using images of a medical record vs. using the OCR’ed text. Approach 1 (PDF → Text using OCR → GPT) outperformed Approach 2 (PDF → Image → GPT) at identifying billing violations across multiple prompts and samples. An example billing violation would be an incident-to billing case where the practitioner on the medical record is not the practitioner billed on the claim.
The accuracy of Approach 1 (text extracted from the medical record using traditional OCR) was 91% and 83% on two respective samples. Meanwhile, the accuracy of Approach 2 (vision) was 71% and 76% on the same samples. This suggests that directly swapping OCR’ed text for images of medical records does not necessarily yield sweeping performance gains, despite the often unstructured and messy nature of medical records.
Conclusion
For specific image-based tasks, such as identifying signatures or extracting a section of a contract, we can improve performance by using images rather than OCR’ed text. However, when we need to answer a complex question about a source document, we found it is better to pass in an excerpt or the full document as text rather than as an image.
At Oscar, we’re pushing the boundaries of AI to make healthcare smarter and more efficient. Interested in joining us? Check out our open roles.