Transforming the contract auditing process with GPT

Apr 24

By Rachael Burns, AI Tech Lead at Oscar

Managing contracts is a growing challenge in healthcare.

Contracts in our industry are rarely uniform and often include multiple versions, amendments, and critical clauses that require careful scrutiny. At Oscar alone, we have over 15,000 contracts with doctors and hospitals all over the country, which makes the traditional approach of manually reviewing these documents highly time-consuming.

Recognizing this reality, we recently partnered with Oscar’s Payment Integrity team to develop an AI-driven solution aimed at streamlining contract audits, remediating billing discrepancies, and ultimately reducing costs for our members.

At Oscar, we’re pushing the limits of AI to make healthcare smarter and more efficient. Interested in joining us? Check out our open roles.

Prototype: Extracting structured data from contract excerpts

First, we focused on determining whether contracts were eligible for audits by analyzing hundreds of contract excerpts that had been extracted by a third-party vendor. The goal of this first step was to determine whether GPT’s answers to audit questions were consistent with our existing audit configurations.

Here’s an example of the kind of clause we examined:

"Upon thirty (30) calendar days prior written notice, or sooner if required by a governmental agency, Oscar may review and/or audit any claims submitted by Provider for payment of Covered Services either prior to or after adjudication, including claims which are paid by Oscar, in order to verify the accuracy of claims submitted (“Claims Reviews”) in accordance with this Agreement. For a period of up to nine (9) months following the date of payment for inpatient claims, Oscar may perform Claims Reviews."

To facilitate this process, we used an open-source platform called Argilla, which provided a user-friendly interface for reviewing AI-generated outputs. After each review cycle, we ensured our findings aligned with both team feedback and the existing data on configured audit rights.

As we progressed, we realized that the excerpts were becoming outdated due to new contract amendments. To address this, we developed a search algorithm to extract relevant information directly from the raw contracts, ensuring our data remained current. This approach allowed us to verify that we were capturing all necessary information and refine our extraction process.
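As a minimal sketch of what such a search could look like (not the algorithm used here; the keyword list, the function name, and the paragraph-window approach are all illustrative assumptions), one simple option is to keep every paragraph that mentions an audit-related term, along with its neighbors for context:

```python
import re

# Hypothetical keyword list for the "audits" category; not taken from Oscar's system.
AUDIT_TERMS = ["audit", "claims review", "review of claims"]


def find_excerpts(contract_text, terms, window=1):
    """Return paragraphs that mention any term, plus `window` neighbors on each side."""
    # Treat blank lines as paragraph boundaries.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", contract_text) if p.strip()]
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)

    keep = set()
    for i, paragraph in enumerate(paragraphs):
        if pattern.search(paragraph):
            start = max(0, i - window)
            end = min(len(paragraphs), i + window + 1)
            keep.update(range(start, end))

    # Preserve the original document order.
    return [paragraphs[i] for i in sorted(keep)]
```

A wider window trades precision for recall: more surrounding text reaches the model, but fewer relevant clauses are missed.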

Despite being developed with basic prompt engineering and local Python scripts, our prototype proved highly successful, reaching 98% question-answering accuracy and 99.5% search recall. The business value delivered by these early experiments was also clear and significant.
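For readers unfamiliar with these two metrics, here is a short sketch of how they are commonly computed. The function names and the dictionary and ID-list inputs are assumptions made for illustration; the exact evaluation setup may have differed.

```python
def qa_accuracy(predicted, expected):
    """Fraction of reference answers that the model's answers match exactly.

    Both arguments map a (hypothetical) contract or question ID to an answer.
    """
    if not expected:
        raise ValueError("expected answers must not be empty")
    matches = sum(1 for key, answer in expected.items() if predicted.get(key) == answer)
    return matches / len(expected)


def search_recall(retrieved_ids, relevant_ids):
    """Fraction of the truly relevant excerpts that the search returned."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    return len(relevant & set(retrieved_ids)) / len(relevant)
```

Accuracy here compares model answers against the existing audit configurations, while recall measures how many of the known relevant excerpts the search step surfaced.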

Soon, we expanded our efforts to extract relevant excerpts from contracts for various question categories, such as audits, recoveries, exclusivity, appeals, and notice addresses. We refined our extraction algorithm to capture all relevant excerpts, which it did nearly perfectly. Confident in our results, we used Argilla again, this time involving more reviewers from each relevant domain to provide feedback on the questions derived from these excerpts. Overall, performance across all of these additional experiments exceeded 95%.

For the notice address use case, we also employed address and email validation services to verify that the extracted information corresponded to actual physical and email addresses. Through this process, we were able to extract notice addresses for over 4,000 provider partners.
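A cheap local check can filter out obviously malformed values before they are sent to a validation service. The snippet below is only an assumed pre-filter with a deliberately loose pattern; it does not replace the external services mentioned above and says nothing about whether a mailbox actually exists.

```python
import re

# Deliberately loose shape check: exactly one "@", no whitespace, a dot in the domain.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email(value):
    """Return True if the extracted string has the rough shape of an email address."""
    return bool(EMAIL_SHAPE.match(value.strip()))
```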

Lessons learned

Thanks to the out-of-the-box capabilities of GPT, this workstream has been one of the highest-return AI projects Oscar has undertaken so far. With basic prompt engineering and local Python scripts, we have delivered business impact across a number of key use cases. The process also led to significant lessons that will inform how we develop this and other AI-driven solutions in the future.

First, a key result from this work was the decision to build a solution in-house rather than outsourcing it. Given the high performance of GPT and immediate business impact resulting from relatively low tech investment, we were able to offer our internal team a strong point of view that we do not need to partner with an outside vendor for this use case.

Second, we learned where GPT excels and how to maximize its potential for our use cases:

GPT is better at answering questions from document excerpts than whole documents, both because it can focus on a smaller body of text, and because of the significant size of some contracts.
Metaprompting, which involves using a language model to enhance your initial prompt, is effective when there is a reliable ground truth. It's important to provide the model with the flexibility to challenge or disagree with the ground truth when necessary.
Careful phrasing of questions is crucial when working with GPT. It struggles with negative constructions and may misinterpret ambiguous language unless explicitly clarified. For instance, instead of asking "Are we allowed to conduct pre-payment audits?" (where we are allowed unless specific language says we are prohibited) or "Are we not allowed to conduct pre-payment audits?", ask "Are we prohibited from conducting pre-payment audits?", and clearly define and list all synonyms for "pre-payment."
OpenAI's o1 reasoning model is helpful when interpretation is more ambiguous and speed and cost aren't concerns. For other cases, the GPT-4o or GPT-4o mini models work well.
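To make the phrasing and model-selection lessons concrete, here is a small sketch. The synonym list, the prompt wording, the function names, and the rule for choosing between the two smaller models are illustrative assumptions, not the prompts or routing logic used in production. The code only builds a prompt string and returns a model name; it does not call any API.

```python
# Hypothetical synonym list; a real one would come from domain reviewers.
PREPAYMENT_SYNONYMS = ["pre-payment", "prepayment", "prior to adjudication", "before payment"]


def build_audit_prompt(excerpt, synonyms):
    """Phrase the question positively ("prohibited") and spell out the synonyms."""
    synonym_list = ", ".join(f'"{s}"' for s in synonyms)
    return (
        "You are reviewing an excerpt from a provider contract.\n"
        f"Treat these terms as equivalent to pre-payment: {synonym_list}.\n"
        "Question: Are we prohibited from conducting pre-payment audits?\n"
        'Answer "yes" only if the excerpt contains language that prohibits them; '
        'otherwise answer "no".\n\n'
        f"Excerpt:\n{excerpt}"
    )


def choose_model(ambiguous, latency_sensitive):
    """Pick a model name: reasoning model for ambiguous, non-urgent questions."""
    if ambiguous and not latency_sensitive:
        return "o1"
    # Assumed tie-breaker: use the smaller model when speed or cost matters.
    return "gpt-4o-mini" if latency_sensitive else "gpt-4o"
```

Stating the default explicitly ("answer yes only if the excerpt prohibits it") removes the ambiguity that makes "Are we allowed...?" hard to answer from a contract that is silent on the topic.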

In the coming months, we plan to harden and productize our prototype to enable more self-service applications across our teams. While our current solution relies on scripts, we are developing an application to standardize contracts over time. This will involve renegotiating contracts to fit a standard template, facilitating automatic term extraction and analysis.

Integrating our AI system with this new standardization tool will require further development and productionization, but it promises to unlock even greater business value.



