
Site Feasibility Scoring Beyond Spreadsheets: What Data-Driven Selection Actually Looks Like

Rebecca Nwosu, April 21, 2025

[Figure: Site feasibility scoring dashboard with enrollment metrics by therapeutic area]

Site feasibility assessment is how sponsors and CROs decide which clinical sites to include in a trial. Done well, it predicts enrollment performance accurately enough that a 100-patient Phase III trial can be allocated across 20 sites with confidence that each site will meet its quarterly enrollment targets. Done poorly, it selects sites that look good on paper and consistently under-enroll, generating rescue site additions six months into the trial and pushing completion timelines out by 12–18 months.

Most feasibility assessment is still done primarily through questionnaires. A site receives a feasibility questionnaire, usually a spreadsheet or PDF from the sponsor's clinical operations team, and is asked to estimate: patient population size for the target indication, historical enrollment rate per month for similar trials, available coordinator capacity, PI experience with the therapeutic area, and infrastructure availability (specialized equipment, imaging, laboratory capabilities). The site completes the questionnaire based on the coordinator's or PI's recollection, the sponsor scores the responses, and sites above a threshold score are selected.

The limitations of this approach are well-documented among clinical operations professionals, if not always stated directly to sponsors. Questionnaire-based feasibility scoring has three structural problems that data-driven assessment can largely address.

The three structural problems with questionnaire feasibility

Self-reported population estimates are systematically inflated. When a coordinator is asked "how many patients with T2DM and HbA1c > 7.5% do you have in your active panel," the answer is almost always a guess. The coordinator may have a rough sense of the practice's T2DM caseload; perhaps they know the endocrinology service sees approximately 800 T2DM patients per year. What they do not know, without running a structured EHR query, is how many of those patients have HbA1c values above the 7.5% threshold, how many of those have qualifying eGFR values, how many are on medications that would trigger exclusion, and how many of the remainder are unavailable because they are enrolled in another trial or in a washout period after one.

Studies examining the correlation between feasibility questionnaire estimates and actual enrollment performance have consistently found that sites overestimate their patient populations by a factor of two to four. A site that estimates 150 eligible patients in its questionnaire response typically produces 40–60 candidates when the EHR population is analyzed against the actual I/E criteria. That overestimation is not dishonesty; it reflects the genuine difficulty of estimating a structured query result without running the query.
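To make the shrinkage concrete, the successive filters described above can be sketched as a funnel over patient records. This is a minimal illustration on made-up data: the `Patient` fields, the eGFR cutoff of 45, and the six sample records are assumptions for the example, and only the HbA1c > 7.5% threshold comes from the text.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    hba1c: float                      # most recent HbA1c, percent
    egfr: float                       # most recent eGFR
    on_excluded_med: bool             # takes a medication that triggers exclusion
    in_other_trial_or_washout: bool   # enrolled elsewhere or in a washout period

HBA1C_MIN = 7.5   # threshold from the example in the text
EGFR_MIN = 45.0   # hypothetical cutoff, not taken from any real protocol

def eligibility_funnel(patients):
    """Return (stage label, remaining count) after each successive filter."""
    stages = [
        ("T2DM caseload", lambda p: True),
        ("HbA1c > 7.5%", lambda p: p.hba1c > HBA1C_MIN),
        ("qualifying eGFR", lambda p: p.egfr >= EGFR_MIN),
        ("no excluded medication", lambda p: not p.on_excluded_med),
        ("not in another trial or washout", lambda p: not p.in_other_trial_or_washout),
    ]
    remaining = list(patients)
    funnel = []
    for label, keep in stages:
        remaining = [p for p in remaining if keep(p)]
        funnel.append((label, len(remaining)))
    return funnel

# Made-up panel of six T2DM patients
panel = [
    Patient(8.1, 72.0, False, False),
    Patient(7.2, 88.0, False, False),   # HbA1c below threshold
    Patient(9.0, 38.0, False, False),   # eGFR below the assumed cutoff
    Patient(8.4, 65.0, True, False),    # excluded medication
    Patient(7.9, 70.0, False, True),    # in a washout period
    Patient(8.8, 90.0, False, False),
]

for label, count in eligibility_funnel(panel):
    print(f"{label}: {count}")
```

Each stage removes patients that a caseload-level recollection would still count, which is why the headline figure and the eligible pool diverge.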

Historical enrollment rate data is not protocol-specific. When a site reports enrolling 3 patients per month in "similar trials," that figure is drawn from some mental average of past performance across multiple trials with different I/E criteria stringency, different therapeutic areas with different patient prevalence, and different protocol burden profiles. A site that enrolled 3 patients per month in a broad-criteria hypertension trial may enroll 0.8 patients per month in a narrow-criteria treatment-resistant hypertension trial, because the patient population overlap is much smaller than the therapeutic area overlap implies.

Protocol-specific enrollment rate projection requires knowing how many patients at the site actually satisfy the specific protocol's I/E criteria — which requires running the criteria against the EHR, not applying historical rates from a different protocol to a new enrollment estimate.

Coordinator capacity is rarely validated against actual protocol burden. A site reports having two FTE coordinators available for the trial. What the questionnaire does not capture is whether those coordinators are already managing a high-burden Phase III trial that generates 40 sponsor queries per month, whether one coordinator is scheduled for leave during the critical enrollment window, or whether the site has a monitoring visit due that will consume two days of coordinator time in the month projected as high-enrollment. Available capacity at the time of questionnaire completion is not the same as available capacity averaged across the enrollment period.
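The gap between capacity at questionnaire time and capacity across the enrollment window can be sketched with simple arithmetic. Everything numeric here is a hypothetical assumption (140 productive hours per FTE per month, 120 hours consumed by the existing trial, a three-month window); only the two FTEs, the two-day monitoring visit, and the leave scenario echo the text.

```python
HOURS_PER_FTE_MONTH = 140.0  # hypothetical productive hours per coordinator FTE per month

def available_hours(fte, committed_hours):
    """Coordinator hours left for the new trial in one month."""
    return max(0.0, fte * HOURS_PER_FTE_MONTH - sum(committed_hours))

# Hypothetical commitments (hours) for each month of a three-month enrollment window
enrollment_window = {
    "month 1": [120.0],           # existing high-burden Phase III trial
    "month 2": [120.0, 16.0],     # plus a two-day monitoring visit
    "month 3": [120.0, 140.0],    # plus one coordinator on leave for the month
}

monthly = {month: available_hours(2.0, hours) for month, hours in enrollment_window.items()}
average = sum(monthly.values()) / len(monthly)
reported = 2.0 * HOURS_PER_FTE_MONTH  # what "two FTE available" implies on paper

print(f"reported capacity: {reported:.0f} h/month")
for month, hours in monthly.items():
    print(f"{month}: {hours:.0f} h available")
print(f"average over window: {average:.0f} h/month")
```

Under these assumptions the site reports 280 hours per month but averages 108 over the window, with almost nothing left in the third month.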

What data-driven feasibility actually measures

Data-driven feasibility assessment replaces self-reported estimates with structured queries run against the EHR, supplemented by historical CTMS data for sites that have existing performance records.

The core query is the population analysis: how many patients in the site's active EHR panel satisfy the draft I/E criteria for the protocol? This is not a questionnaire question. It is a FHIR query run against the EHR population using the actual protocol criteria (the same criteria that will govern enrollment) rather than a broad therapeutic area descriptor. The query produces three counts: patients confirmed to meet the criteria, patients with data gaps requiring follow-up, and patients who are definitively ineligible.
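One way to get those three counts is to let each criterion return true, false, or "unknown" when the field is missing. The sketch below assumes patient data has already been pulled into plain dictionaries; the field names, the eGFR cutoff, and the helper names `above`, `at_least`, and `classify` are illustrative, not part of any specific tool.

```python
from collections import Counter

def above(field, threshold):
    """Criterion: value strictly above threshold; None when the value is missing."""
    def criterion(record):
        value = record.get(field)
        return None if value is None else value > threshold
    return criterion

def at_least(field, threshold):
    """Criterion: value at or above threshold; None when the value is missing."""
    def criterion(record):
        value = record.get(field)
        return None if value is None else value >= threshold
    return criterion

# HbA1c threshold from the earlier example; the eGFR cutoff is a hypothetical placeholder
CRITERIA = [above("hba1c", 7.5), at_least("egfr", 45.0)]

def classify(record, criteria):
    results = [criterion(record) for criterion in criteria]
    if any(r is False for r in results):
        return "ineligible"   # one failed criterion is decisive, even with gaps elsewhere
    if any(r is None for r in results):
        return "data_gap"     # nothing failed, but something is missing
    return "eligible"

records = [
    {"hba1c": 8.2, "egfr": 70.0},
    {"hba1c": 8.0, "egfr": None},    # missing eGFR
    {"hba1c": 6.9, "egfr": None},    # fails HbA1c regardless of the gap
    {"hba1c": None, "egfr": 30.0},   # fails eGFR regardless of the gap
]

counts = Counter(classify(r, CRITERIA) for r in records)
print(dict(counts))
```

Checking for a definite failure before checking for gaps keeps patients out of the follow-up queue when no amount of follow-up could make them eligible.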

The population analysis output has several immediate uses for feasibility scoring. First, it produces a realistic enrollment rate projection. If the site has 80 patients who appear to meet the criteria, and historical conversion rates for similar trials at the site suggest that approximately 35% of pre-screened candidates complete the screen visit and 70% of those consent (conservative but realistic figures), then about 28 candidates complete the screen visit and the projected enrollment capacity is approximately 20 subjects (80 × 0.35 × 0.70 ≈ 19.6). If the trial requires 15 subjects from this site, that projection suggests adequate capacity with some buffer. If the trial requires 25 subjects, the target exceeds the site's estimated capacity, and a sponsor that understands this will assign a more modest enrollment target and monitor closely rather than over-assigning a target that the site cannot meet.
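The projection arithmetic can be written out as a two-step funnel. This is a minimal sketch: the function names `project_enrollment` and `headroom` are illustrative, while the 80 candidates, the 35% and 70% rates, and the 15- and 25-subject targets come from the example. Multiplying through, about 28 candidates complete the screen visit and roughly 20 of those consent.

```python
def project_enrollment(qualifying, screen_visit_rate, consent_rate):
    """Return (expected screen visits, expected consented subjects)."""
    screened = qualifying * screen_visit_rate
    consented = screened * consent_rate
    return screened, consented

def headroom(capacity, target):
    """Positive when projected capacity exceeds the assigned target."""
    return capacity - target

screened, consented = project_enrollment(80, 0.35, 0.70)
print(f"expected screen visits: {screened:.1f}")
print(f"expected consented subjects: {consented:.1f}")

for target in (15, 25):
    margin = headroom(consented, target)
    verdict = "within projected capacity" if margin >= 0 else "exceeds projected capacity"
    print(f"target {target}: {verdict} (margin {margin:+.1f})")
```

Keeping the two conversion steps separate makes it easy to see which stage of the funnel limits the site.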

Second, the population analysis reveals the distribution of data completeness across the candidate pool. If 60% of the 80 qualifying patients have at least one data gap in the I/E criteria fields, the site's coordinator workload for pre-screening will be substantially higher than if 90% have complete data. A site with a large qualifying population but widespread data gaps requires more coordinator time per candidate than a site with a smaller population and higher data completeness. That distinction is invisible to questionnaire-based feasibility scoring.
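The workload effect of data gaps can be estimated with a rough per-candidate time model. The minutes per candidate below are hypothetical placeholders that a site would replace with its own figures; the 80 candidates and the 60% versus 10% gap rates follow the example above.

```python
MIN_COMPLETE = 20   # hypothetical minutes to pre-screen a candidate with complete data
MIN_WITH_GAP = 60   # hypothetical minutes when a gap needs chart review or follow-up

def prescreen_hours(candidates, with_gaps):
    """Estimated coordinator hours to pre-screen the whole candidate pool."""
    complete = candidates - with_gaps
    return (complete * MIN_COMPLETE + with_gaps * MIN_WITH_GAP) / 60

# 80 qualifying patients: 60% with gaps (48) versus 10% with gaps (8)
high_gap = prescreen_hours(80, 48)
low_gap = prescreen_hours(80, 8)
print(f"60% with gaps: {high_gap:.1f} h")
print(f"10% with gaps: {low_gap:.1f} h")
```

With these assumed timings, the same 80-patient pool costs nearly twice as much coordinator time when most records have gaps.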

The role of historical CTMS data

Population analysis tells you what the EHR contains today. Historical CTMS data tells you how that population translated into enrollment performance in prior trials. The two data sources are complementary.

A site with a large qualifying population in the EHR but a history of low screen-to-consent rates in prior trials has a predictable risk profile: the patients are there, but something in the site's conversion process — outreach approach, consent discussion quality, patient travel burden, coordinator follow-through — is limiting enrollment yield. That risk can be addressed through coordinator workflow support, and the sponsor can factor it into site selection and resource allocation decisions.

A site with a small qualifying EHR population but a history of high screen-to-consent rates in prior trials has a different risk profile: each candidate is efficiently converted, but there are not many candidates. That site may still be selected for a trial with a modest enrollment target, and its performance projection should be based on the smaller qualifying population, not on its historical monthly rate from trials with broader criteria.

Sponsors and CROs that have access to CTMS performance data across their site networks can build enrollment rate models that incorporate both EHR-based population counts and historical conversion rates. Sites where both data sources point to the same enrollment capacity estimate are lower-risk selections. Sites where the two data sources diverge (large EHR population but poor historical conversion, or small EHR population but high conversion rates) warrant closer review during feasibility scoring.
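A simple way to surface that divergence is to compare the two capacity estimates per site and flag large relative differences. This is a sketch under stated assumptions: the 50% tolerance, the site names, and the estimates are invented for illustration, and a real model would likely weight the two sources rather than apply a single cutoff.

```python
def needs_review(ehr_estimate, ctms_estimate, tolerance=0.5):
    """True when the two capacity estimates differ by more than `tolerance`,
    measured relative to the larger of the two."""
    larger = max(ehr_estimate, ctms_estimate)
    if larger == 0:
        return False
    return abs(ehr_estimate - ctms_estimate) / larger > tolerance

# Hypothetical sites: (EHR-based capacity estimate, CTMS history-based estimate)
sites = {
    "Site A": (20, 18),   # sources agree
    "Site B": (30, 9),    # large EHR pool, poor historical conversion
    "Site C": (6, 16),    # small EHR pool, strong historical conversion
}

for name, (ehr, ctms) in sites.items():
    status = "closer review" if needs_review(ehr, ctms) else "consistent"
    print(f"{name}: EHR={ehr}, CTMS={ctms} -> {status}")
```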

Stated capabilities vs. actual patient counts

Site feasibility questionnaires typically ask about site capabilities as proxies for patient availability: "does your site see patients with treatment-resistant hypertension?" The answer is almost always yes — sites selected for feasibility review are generally selected because they have the relevant therapeutic expertise. The capability question is rarely the discriminating one.

The discriminating question is: how many patients at this site satisfy the specific I/E criteria for this trial, within the current active patient panel, with data available in the EHR at a recency that satisfies the protocol's temporal constraints?

That question cannot be answered by a questionnaire. It requires a structured data query. Sites that can run that query against their EHR before the feasibility questionnaire deadline, rather than substituting an estimate, are providing sponsors with information that is genuinely predictive of enrollment performance rather than information that looks like a prediction but is actually a guess stated with confidence.

Implementation realities for site teams

Running a structured EHR population analysis for a feasibility questionnaire requires three things: access to a FHIR R4 API on the EHR, a mechanism to translate the draft I/E criteria into structured queries, and time to run and review the output before the questionnaire deadline.
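As an illustration of the translation step, a single lab-value criterion such as "HbA1c > 7.5%" might map to a FHIR R4 Observation search like the one built below. This is a hedged sketch: the base URL is a placeholder, the helper name is invented, LOINC 4548-4 is used as the HbA1c code on the assumption that the site codes the test that way, and whether a given EHR supports the `value-quantity` search parameter varies by vendor. Authentication and result paging are omitted.

```python
from urllib.parse import urlencode

def hba1c_search_url(base_url, threshold, since):
    """Build a FHIR Observation search URL for HbA1c results above a threshold."""
    params = {
        "code": "http://loinc.org|4548-4",   # HbA1c; assumes LOINC-coded results
        "value-quantity": f"gt{threshold}",  # 'gt' is the FHIR greater-than prefix
        "date": f"ge{since}",                # only results on or after this date
        "_count": "100",                     # page size; servers may cap this
    }
    return f"{base_url.rstrip('/')}/Observation?{urlencode(params)}"

# Placeholder endpoint, not a real server
url = hba1c_search_url("https://ehr.example.org/fhir/R4", 7.5, "2024-01-01")
print(url)
```

A search like this returns observations rather than distinct patients, so the results still need to be de-duplicated by patient and combined with the other criteria before they amount to a population count.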

The last constraint is the most pressing in practice. Feasibility questionnaire turnaround times are typically one to two weeks from receipt. A coordinator managing three active trials who receives a feasibility questionnaire for a potential fourth trial on a Monday with a deadline two Fridays later does not have two days to build a custom EHR query from scratch. The manual EHR query approach is not operationally realistic at most sites without a dedicated informatics resource.

The sites that are currently doing data-driven feasibility well tend to be academic medical centers with dedicated research informatics staff who can build I/E-based population queries against the institutional EHR. Community sites and independent practice networks, which represent a large fraction of the total clinical trial site network, typically do not have those resources.

The opportunity for technology to improve this situation is real. A tool that ingests draft I/E criteria from the feasibility questionnaire, runs structured FHIR queries against the site's EHR, and returns a population count with data completeness breakdown within minutes — rather than requiring a custom informatics build — changes the feasibility response from an estimate to a measurement. That change in data quality ripples through the site selection process, producing enrollment projections that are accurate enough to be planned against rather than merely cited as baseline assumptions before the actual enrollment data comes in.

Questionnaire-based feasibility scoring will persist because it is low-cost and familiar. But for sponsors running trials where enrollment timeline accuracy is operationally critical (late-phase pivotal trials where a 6-month enrollment delay can cost tens of millions in delayed development), the value of EHR-based population analysis at the feasibility stage is substantial. Sites that can offer that analysis, and sponsors that can request it systematically, are operating with a fundamentally different view of enrollment capacity than the industry average.

Where data-driven feasibility has limits

The case for EHR-based population analysis in feasibility is strong, but it does not eliminate all sources of feasibility scoring error. It is worth being precise about what the analysis does and does not capture, so that sites and sponsors do not substitute one form of overconfidence for another.

EHR population analysis accurately measures the size of the qualifying patient pool as documented in the accessible record. It does not measure patient willingness to participate. A site with 120 protocol-qualifying patients and a population with strong cultural or practical barriers to trial participation — long travel distances, language barriers, demanding protocol visit schedules — may produce lower enrollment yields than a site with 80 qualifying patients and a population with high research participation rates. The conversion from qualifying candidate to consented subject is influenced by factors that do not appear in FHIR data.

Similarly, EHR population analysis does not capture patients who are currently enrolled in competing trials and therefore unavailable. A site in an academic medical center with high research participation rates may have a smaller effective qualifying pool than the EHR analysis suggests, because a meaningful fraction of protocol-qualifying patients are already enrolled elsewhere or are in washout periods or exclusion windows from prior study participation. Competing trial enrollment is rarely documented as a structured field in the EHR; it is tracked in the CTMS but typically not surfaced through FHIR queries.

The coordinator's clinical knowledge of the patient panel — who is likely to be interested in research participation, who has a history of participating in prior trials, which patients have logistical constraints that would make them unlikely completers — remains a necessary complement to the structured data analysis. The goal of data-driven feasibility is not to replace that clinical knowledge with a query result. It is to give the coordinator's clinical judgment a more accurate empirical foundation to work from: a realistic patient count and data completeness map rather than an optimistic guess.

A feasibility process that combines EHR-based population analysis with the site team's qualitative assessment of that population's engagement profile is more predictive than either input alone. The structured query tells you how many patients are there. The coordinator's knowledge tells you how many of those patients are realistically reachable and retainable. Together, they produce an enrollment projection that sponsors can actually plan against.
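Combining the two inputs can be sketched as a query count scaled by the coordinator's qualitative estimates. The fractions below are invented for illustration and would come from the site team's judgment rather than from any data source; the 120- and 80-patient pools follow the earlier comparison.

```python
def reachable_pool(qualifying, competing_trial_fraction, reachable_fraction):
    """Scale the EHR count by estimated availability and reachability."""
    available = qualifying * (1 - competing_trial_fraction)
    return available * reachable_fraction

# Hypothetical coordinator estimates for the two sites compared earlier
site_with_barriers = reachable_pool(120, competing_trial_fraction=0.10, reachable_fraction=0.40)
site_high_participation = reachable_pool(80, competing_trial_fraction=0.10, reachable_fraction=0.75)

print(f"120 qualifying, significant barriers: {site_with_barriers:.1f} reachable")
print(f"80 qualifying, high participation: {site_high_participation:.1f} reachable")
```

Under these assumed fractions the site with the smaller qualifying pool ends up with the larger reachable one, which is the reversal that the structured count alone cannot show.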