
Why Clinical Sites Miss Eligible Patients (And What the EHR Already Knows)

Rebecca Nwosu, November 4, 2024


Every site coordinator has had this experience. You're two weeks post-SIV on a Phase III metabolic trial, you've got twelve active inclusion criteria and six exclusion criteria in front of you, and you're running through the EHR trying to find patients who fit. You open a chart. Check the HbA1c value. Check the date — it's 14 months old, outside the 90-day window. You close the chart. Open the next one. The process takes about four minutes per patient, and you have 800 patients in your active panel.

This is not a coordinator skill problem. It is a structural mismatch between how protocols are written and how EHR data is organized. Understanding the mechanics of that mismatch explains why the average time from site initiation visit to first screened patient has been measured at approximately eleven weeks across Phase II and III trials — and why the problem persists despite coordinators who are diligent, experienced, and working hard.

The SIV handoff and what it asks of coordinators

A site initiation visit marks the point at which a site becomes operational for a trial. The sponsor or CRO has completed site qualification and selection. The protocol has been finalized. The IRB approval is in hand. The investigational product is authorized. From that moment, the site is on the clock to enroll.

What happens next at most sites is this: the coordinator assigned to the trial reads the protocol, identifies the I/E criteria that can be pre-screened using EHR data (usually the objective, data-captured criteria — lab values, diagnosis codes, age, BMI), and begins manually searching the EHR to find patients who might qualify.

The manual search typically follows a logic like this: identify a broad patient population by querying on one or two primary diagnosis codes (in a T2DM trial, something like ICD-10 E11.x), then open individual records and check the remaining criteria by eye. Some coordinators build saved searches in the EHR. Some maintain spreadsheets. Some use printed protocol I/E checklists. The common thread is that each individual criterion check requires a human to read a value from a record and mentally compare it to the protocol's specification.

Four minutes per patient may be a generous estimate, but a faster pace does not change the picture. A coordinator working a 400-patient panel with twelve I/E criteria, taking three minutes per record after the initial population pull, is looking at approximately 20 hours of chart review before having a shortlist. In practice, that work is distributed across a coordinator's full trial portfolio, so it stretches across two to three weeks.
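The arithmetic behind these figures is easy to reproduce. The sketch below is illustrative only; the function name is hypothetical, and the inputs are the panel sizes and per-record times quoted in this article.

```python
def chart_review_hours(panel_size: int, minutes_per_record: float) -> float:
    """Total manual review time, in hours, for one pass through a panel."""
    return panel_size * minutes_per_record / 60


# 400-patient panel at three minutes per record
print(chart_review_hours(400, 3))            # 20.0

# 800-patient panel at four minutes per record (the opening scenario)
print(round(chart_review_hours(800, 4), 1))  # 53.3
```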

What the EHR actually contains

Here is the counterintuitive part: the data needed to identify eligible patients is already in the EHR, and it is structured. This is not a problem of missing information. A patient with T2DM, HbA1c of 8.4%, eGFR of 56, BMI of 31, and no current insulin prescription has all of that information in the record. Exposed through a FHIR API, it sits in Observation resources with LOINC codes, in Condition resources with ICD-10 codes, and in MedicationRequest resources with RxNorm identifiers.

The gap is not data availability. The gap is that the protocol's I/E criteria have never been mapped to those data fields in a machine-readable way. The protocol says "HbA1c > 7.5% and ≤ 11.0% within the prior 12 weeks." The EHR contains an Observation with LOINC code 4548-4, value 8.4%, date 2024-10-12. These two pieces of information could be compared programmatically in milliseconds. Instead, a coordinator reads one, opens the other, and writes the result in a spreadsheet cell.
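To make the comparison concrete, here is a minimal sketch of checking the HbA1c criterion above against a single FHIR-style Observation. The class and function names are hypothetical, the Observation is a simplified dictionary rather than a full resource, and the reference date of November 4, 2024 is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LabCriterion:
    """One lab-based criterion (hypothetical structure for illustration)."""
    loinc_code: str
    lower_exclusive: float
    upper_inclusive: float
    window: timedelta


# "HbA1c > 7.5% and <= 11.0% within the prior 12 weeks"
HBA1C = LabCriterion("4548-4", 7.5, 11.0, timedelta(weeks=12))


def observation_meets(obs: dict, criterion: LabCriterion, as_of: date) -> bool:
    """Return True if the observation has the right code, range, and date."""
    codes = {c.get("code") for c in obs.get("code", {}).get("coding", [])}
    if criterion.loinc_code not in codes:
        return False
    value = obs.get("valueQuantity", {}).get("value")
    effective = obs.get("effectiveDateTime")
    if value is None or effective is None:
        return False  # a data gap is not a match
    observed_on = date.fromisoformat(effective[:10])
    in_window = as_of - criterion.window <= observed_on <= as_of
    in_range = criterion.lower_exclusive < value <= criterion.upper_inclusive
    return in_window and in_range


observation = {
    "resourceType": "Observation",
    "code": {"coding": [{"system": "http://loinc.org", "code": "4548-4"}]},
    "valueQuantity": {"value": 8.4, "unit": "%"},
    "effectiveDateTime": "2024-10-12",
}

# Assumed reference date; a real check would use the planned screening date.
print(observation_meets(observation, HBA1C, as_of=date(2024, 11, 4)))  # True
```

A production check would also need to handle unit conversion and alternative LOINC codes for the same analyte, which this sketch ignores.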

The disconnect exists because protocols are written in regulatory English — language designed for FDA reviewers and IRB submissions, not for database queries. And until recently, there was no practical mechanism to translate one into the other at the speed enrollment timelines require.

The compounding effect of coordinator workload

The chart review bottleneck would be problematic even if coordinators had only one trial to manage. Most do not. A coordinator at a mid-size academic site commonly manages three to six active protocols simultaneously, each at a different enrollment stage. One trial is in follow-up and requires only visit scheduling. One is in active enrollment and requires daily chart review attention. One just had its SIV and is in the pre-screening build phase.

Chart review for the new trial competes with visit documentation, query resolution for existing subjects, sponsor-required data entry in the EDC, adverse event reporting, and protocol deviation documentation. ICH E6(R2) GCP guidance requires coordinators to maintain complete and accurate records across all of these domains. The work is non-discretionary.

The result is that pre-screening for a newly activated trial gets done in fragments — an hour here, two hours there — stretched across two to three weeks rather than concentrated in the first few days after SIV. During those weeks, eligible patients continue to be seen in clinic, potentially becoming ineligible: a patient who met the eGFR cutoff four weeks ago may have had a lab redrawn that now shows a value below threshold. A patient who was insulin-naive last month may have been started on basal insulin by their endocrinologist. The pool of eligible patients is not static, and delayed pre-screening means some of those patients age out of eligibility before they are ever identified.

The three categories of missed patients

When enrollment runs behind target, it is tempting to attribute the shortfall to a low prevalence of eligible patients in the catchment population. That is sometimes true — narrow I/E criteria in a rare disease indication genuinely constrain the available pool. But in common condition trials (metabolic disease, cardiology, oncology), the available pool is usually larger than sites realize. The missed patients typically fall into three categories.

The first category is patients who were never searched. Coordinators begin chart review with a broad diagnostic query, but the query logic determines who gets considered. A coordinator searching for T2DM patients using E11.9 (T2DM without complications) may miss patients coded as E11.65 (T2DM with hyperglycemia) or E11.40 (T2DM with diabetic neuropathy, unspecified). ICD-10 code specificity varies significantly across sites and even within sites, depending on the physician. A coordinator unfamiliar with the full ICD-10 subcategory tree for a condition may inadvertently exclude a meaningful fraction of the eligible population from the search entirely.
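The difference between an exact-code search and a category-level search can be shown with a small sketch. The patient identifiers and code lists below are invented for illustration, and the helper name is hypothetical; the point is only that matching on the E11 category captures subcodes that an exact match on E11.9 does not.

```python
def has_t2dm_code(codes: list[str], category: str = "E11") -> bool:
    """True if any diagnosis code falls under the given ICD-10 category."""
    return any(code.upper().startswith(category) for code in codes)


# Invented example panel: patient id -> coded diagnoses
patients = {
    "A": ["E11.9"],          # T2DM without complications
    "B": ["E11.65", "I10"],  # T2DM with hyperglycemia, plus hypertension
    "C": ["E11.40"],         # T2DM with diabetic neuropathy, unspecified
    "D": ["E10.9"],          # type 1 diabetes, should not match
}

exact = sorted(p for p, codes in patients.items() if "E11.9" in codes)
by_category = sorted(p for p, codes in patients.items() if has_t2dm_code(codes))

print(exact)        # ['A']
print(by_category)  # ['A', 'B', 'C']
```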

The second category is patients whose records were reviewed but whose eligibility was incorrectly assessed. This happens most often with temporal criteria, which specify a time window for a lab value or clinical event. A coordinator reviewing a chart under time pressure may note that the patient has a qualifying HbA1c result without verifying whether the result falls within the protocol's 12-week window. The inverse also happens: a coordinator sees that the first result on screen falls outside the window and moves on, while a qualifying result inside the window goes unnoticed in a dense lab history.
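Both failure modes come down to checking the value and the date together across the whole lab history rather than for whichever result is seen first. The following is a rough sketch under simplifying assumptions: results are plain (date, value) pairs, the thresholds are the HbA1c limits used elsewhere in this article, and the function name and reference date are hypothetical.

```python
from datetime import date, timedelta


def qualifying_results(results, as_of, window=timedelta(weeks=12),
                       low=7.5, high=11.0):
    """Return every (date, value) pair that is in range AND inside the window."""
    earliest = as_of - window
    return sorted(
        (d, v) for d, v in results
        if earliest <= d <= as_of and low < v <= high
    )


# Invented lab history, deliberately not in chronological order
history = [
    (date(2023, 9, 2), 8.9),    # in range, but far outside the window
    (date(2024, 10, 12), 8.4),  # in range and inside the window
    (date(2024, 6, 20), 7.2),   # outside the window and out of range
]

matches = qualifying_results(history, as_of=date(2024, 11, 4))
print(matches)
```

Only the October 12 result survives both checks. A reviewer who stopped at the first row would have rejected the patient, and one who looked only at values would have accepted the stale 8.9% result.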

The third category is patients who were identified as candidates but were not contacted before the enrollment window closed — either because the coordinator ran out of time to work through the full shortlist, or because the shortlist itself was prioritized in a suboptimal order. If a coordinator has 60 candidates to contact for 12 enrollment slots and contacts them in the order they appeared in a diagnostic code search, the high-probability candidates may be buried at the bottom of the list. A patient who meets 11 of 12 criteria with confirmatory lab data gets a call in week four; a patient who meets 3 criteria and requires five additional chart reviews gets a call in week one.

What structured protocol-to-EHR mapping changes

The bottleneck is not the EHR and not the coordinator. It is the absence of structured mapping between the protocol's I/E logic and the EHR's data model. When that mapping exists — when each criterion has been translated into a specific FHIR R4 resource type, data element, value range, and temporal constraint — the pre-screening process becomes a database query rather than a manual review.
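As one possible shape for that mapping, the sketch below represents a criterion as a small record and turns it into a FHIR search string. The class name, field names, and reference date are hypothetical. The search parameters (`code`, `value-quantity`, `date`) and the `gt`/`le`/`ge` prefixes come from the FHIR R4 search specification, but whether a given EHR's API supports them for population-level queries is an assumption that has to be verified per server.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from urllib.parse import urlencode


@dataclass(frozen=True)
class MappedCriterion:
    """A protocol criterion mapped to FHIR terms (hypothetical structure)."""
    resource_type: str
    code_system: str
    code: str
    min_exclusive: float
    max_inclusive: float
    lookback: timedelta


def to_search_query(c: MappedCriterion, as_of: date) -> str:
    """Build a relative FHIR search URL for one mapped criterion."""
    params = [
        ("code", f"{c.code_system}|{c.code}"),
        ("value-quantity", f"gt{c.min_exclusive}"),  # repeated parameters
        ("value-quantity", f"le{c.max_inclusive}"),  # are ANDed together
        ("date", f"ge{(as_of - c.lookback).isoformat()}"),
    ]
    return f"{c.resource_type}?{urlencode(params)}"


hba1c = MappedCriterion(
    resource_type="Observation",
    code_system="http://loinc.org",
    code="4548-4",
    min_exclusive=7.5,
    max_inclusive=11.0,
    lookback=timedelta(weeks=12),
)

query = to_search_query(hba1c, as_of=date(2024, 11, 4))
print(query)
```

The value of the intermediate record is that the mapping is written once per protocol and can be reviewed by a clinician before any query runs.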

A coordinator who receives a ranked candidate list with criterion-level match detail — this patient meets 11 of 12 criteria, the 12th is the BMI field which was last recorded eight months ago — is doing a fundamentally different kind of work than a coordinator who opens 400 charts. The coordinator with the ranked list is making outreach decisions and confirming data gaps. The coordinator without the ranked list is doing the initial identification work that should have been automated.

The information the EHR already holds is sufficient to pre-screen the majority of common-criteria trials without manual abstraction. The missing piece is the mapping step — translating protocol language into structured queries and running those queries at population scale before the coordinator has opened a single chart.

That is a solvable problem. It is, in fact, the only problem that needs solving to recover those roughly eleven weeks.

The priority ordering problem and candidate-to-consent ratios

Even when coordinators do produce a working shortlist of pre-screened candidates, how that list is ordered has a disproportionate effect on enrollment outcomes. The candidate-to-consent ratio — the number of candidates a site must contact to produce one consented subject — varies considerably depending on whether high-probability candidates are contacted first or whether the list is worked in an arbitrary sequence.

Consider a coordinator with a shortlist of 55 candidates for 10 enrollment slots. The coordinator needs to contact enough candidates to fill the slots, allowing for outreach calls that do not convert: some candidates will be unreachable, some will decline after hearing the protocol requirements, and some will disclose exclusionary information during the call. If high-probability candidates (those with all I/E criteria confirmed from the EHR and no data gaps) are concentrated at the top of the list, the coordinator fills the 10 slots after contacting perhaps 28–32 candidates. If the list is unordered, the coordinator may exhaust 45–50 contacts before filling the slots, because lower-probability candidates consumed outreach time earlier in the process.
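The mechanism can be illustrated with a toy example. The sketch below does not reproduce the figures above; it uses an invented list of 12 candidates and 4 slots, with each entry marking whether that candidate would consent if contacted. The function name is hypothetical.

```python
def contacts_to_fill(would_consent, slots):
    """Count contacts made, in list order, until `slots` candidates consent.

    Returns None if the list runs out before the slots are filled.
    """
    consented = 0
    for contacted, outcome in enumerate(would_consent, start=1):
        if outcome:
            consented += 1
        if consented == slots:
            return contacted
    return None


# Same 12 candidates (4 would consent), in two different contact orders
ranked = [True, True, False, True, True, False,
          False, False, False, False, False, False]
unordered = [False, False, True, False, False, True,
             False, True, False, True, False, False]

print(contacts_to_fill(ranked, slots=4))     # 5
print(contacts_to_fill(unordered, slots=4))  # 10
```

The pool is identical in both cases; only the order changes, and the number of calls needed to fill the slots doubles.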

The difference matters for the enrollment timeline. In a trial with weekly enrollment targets, working through a disordered candidate list adds two to three weeks to the time required to reach target enrollment. The eligible patients were there all along; the outreach effort was simply sequenced suboptimally. The candidates who consented in week six of the campaign were available in week one; the ordering prevented them from being reached until week six.

We are not saying that every coordinator is ordering their contact list incorrectly — experienced coordinators develop intuition for which candidates to prioritize, and that intuition is often accurate. What we are saying is that intuition is not scalable across a portfolio of three to six concurrent active protocols, each with its own candidate list and outreach timeline. An eligibility confidence score that ranks candidates by the completeness and recency of their I/E criterion data gives a coordinator a consistent, data-derived starting point that does not depend on the coordinator's recollection of each individual patient's clinical context.
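One way such a score could be computed is sketched below. This is not a description of any particular product's scoring method: the class, the weights (full credit for fresh data, half credit for stale data, none for gaps), and the 90-day freshness cutoff are all assumptions chosen to show the idea of ranking by completeness and recency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CriterionEvidence:
    """EHR evidence for one criterion (hypothetical structure)."""
    met: Optional[bool]      # None means no data was found in the EHR
    age_days: Optional[int]  # age of the supporting data point, if any


def confidence_score(evidence, fresh_days=90):
    """Score from 0.0 to 1.0 based on completeness and recency of evidence."""
    if not evidence or any(e.met is False for e in evidence):
        return 0.0  # a documented failure rules the candidate out
    total = 0.0
    for e in evidence:
        if e.met is None:
            continue  # data gap: contributes nothing
        fresh = e.age_days is not None and e.age_days <= fresh_days
        total += 1.0 if fresh else 0.5  # assumed weights
    return total / len(evidence)


fresh = CriterionEvidence(met=True, age_days=20)
stale = CriterionEvidence(met=True, age_days=240)  # e.g. BMI from 8 months ago
gap = CriterionEvidence(met=None, age_days=None)
failed = CriterionEvidence(met=False, age_days=10)

candidates = {
    "A": [fresh] * 11 + [stale],   # 11 confirmed, 1 stale
    "B": [fresh] * 3 + [gap] * 9,  # 3 confirmed, 9 data gaps
    "C": [fresh] * 11 + [failed],  # documented exclusion
}

scores = {name: confidence_score(ev) for name, ev in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['A', 'B', 'C']
```

Whatever the exact weights, the design goal is the same: the candidate with 11 confirmed criteria and one stale BMI lands at the top of the call list, ahead of the candidate who needs nine more chart lookups.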

The enrollment window and the cost of late identification

The structural cost of delayed pre-screening extends beyond the site level. From a sponsor's perspective, enrollment lag at individual sites compounds across the trial network. A 100-patient Phase III trial distributed across 15 sites, where each site takes four weeks longer than planned to produce its first enrolled subject, results in a trial-level timeline delay measured in months, not weeks. The enrollment curve starts late, ramps slowly, and reaches completion well past the planned date.

Under ICH E6(R2) GCP principles, sponsors are responsible for monitoring enrollment progress and taking action when sites deviate from projected timelines. In practice, the response to enrollment lag is often to add rescue sites — new sites brought in six months after trial initiation to make up for underperforming original sites. Rescue site activation requires a fresh round of SIV preparation, IRB submissions at the new sites, and coordinator training. The logistical overhead of rescue site addition typically runs 8–14 weeks from the decision to add a site to first enrollment at that site. By that point, the trial timeline has already absorbed most of the impact that better pre-screening at the original sites would have prevented.

The FHIR R4 standard, now required under the 21st Century Cures Act's interoperability provisions for certified EHR systems, provides the technical foundation for population-scale pre-screening queries. Sites with SMART on FHIR access to their EHR's patient population can, in principle, run a structured pre-screening query the day of the SIV rather than three weeks later. The API infrastructure is increasingly in place. The operational bottleneck is the mapping step — translating protocol I/E language into executable FHIR queries — and the workflow integration to surface the results to coordinators in a form they can act on without requiring informatics expertise.

None of this requires the coordinator to become a database engineer. It requires the mapping work to be done once per protocol, upstream of the site, and the query results to be delivered in a format that fits the coordinator's existing workflow. The EHR already knows which patients are eligible. The question is when — and in what order — that knowledge reaches the people who can act on it.