
Site Feasibility Scoring Beyond Spreadsheets: What Data-Driven Selection Actually Looks Like

By Rebecca Nwosu, April 21, 2025

[Figure: Site feasibility scoring dashboard with enrollment metrics by therapeutic area]

Site feasibility assessment is how sponsors and CROs decide which clinical sites to include in a trial. Done well, it predicts enrollment performance accurately enough that a 100-patient Phase III trial can be allocated across 20 sites with reasonable confidence that each will meet quarterly targets. Done poorly, it selects sites that look good on paper and consistently under-enroll, forcing rescue-site additions and pushing completion timelines out by 12–18 months.

Most feasibility assessment is still done primarily through questionnaires. A site receives a spreadsheet or PDF and is asked to estimate: patient population size for the target indication, historical enrollment rate for similar trials, coordinator capacity, PI experience, and infrastructure availability. The site completes this based on coordinator or PI recollection. The sponsor scores responses and selects sites above a threshold.

The limitations are well understood among clinical operations professionals, if not always stated directly to sponsors.

Self-reported population estimates are systematically overestimated

When a coordinator is asked how many patients with T2DM and HbA1c >7.5% are in their active panel, the answer is almost always a guess. The coordinator may know the practice sees approximately 800 T2DM patients per year. What they do not know, without a structured EHR query, is how many of those patients have HbA1c values above the threshold within the required time window, how many have qualifying eGFR values, how many are on excluded medications, and how many are already enrolled in other trials or in washout periods.
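To make "a structured EHR query" concrete, here is a minimal sketch that builds a FHIR search URL for HbA1c observations above 7.5% within a lookback window. The base URL, the 180-day window, and the function name are assumptions for illustration. LOINC 4548-4 is a common code for HbA1c, but a given site's EHR may map the test differently, and server support for `value-quantity` and `_summary=count` varies.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Common LOINC code for HbA1c; a site's EHR may use additional or different codes.
LOINC_HBA1C = "http://loinc.org|4548-4"


def hba1c_search_url(base_url: str, threshold: float, as_of: date, lookback_days: int) -> str:
    """Build a FHIR Observation search for HbA1c results above a threshold.

    Note: the server counts matching Observations, not distinct patients,
    so a patient with two qualifying results is counted twice.
    """
    earliest = as_of - timedelta(days=lookback_days)
    params = [
        ("code", LOINC_HBA1C),
        ("value-quantity", f"gt{threshold}"),    # "gt" is the FHIR greater-than prefix
        ("date", f"ge{earliest.isoformat()}"),  # results on or after the window start
        ("_summary", "count"),                  # ask for a total only, no resources
    ]
    return f"{base_url.rstrip('/')}/Observation?{urlencode(params)}"


# Hypothetical endpoint and lookback window.
url = hba1c_search_url("https://ehr.example.org/fhir", 7.5, date(2025, 4, 21), 180)
print(url)
```

A single search like this answers only the first of the unknowns listed above. The eGFR, medication, and concurrent-trial checks each need their own query, and the results have to be combined per patient.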

Studies examining the correlation between questionnaire estimates and actual enrollment performance consistently find sites overestimate their patient populations by a factor of two to four. A site estimating 150 eligible patients typically produces 40–60 candidates when the EHR is analyzed against the actual inclusion/exclusion (I/E) criteria. This overestimation is not dishonesty. It reflects the genuine difficulty of estimating a structured query result without running the query.
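As a quick check on those figures, the overestimation factor is the questionnaire estimate divided by the EHR-derived count. The function name below is illustrative.

```python
def overestimation_factor(estimated: int, measured: int) -> float:
    """Ratio of the self-reported estimate to the EHR-derived candidate count."""
    if measured <= 0:
        raise ValueError("measured count must be positive")
    return estimated / measured


# The example from the text: 150 estimated, 40 to 60 found by query.
low = overestimation_factor(150, 60)   # 2.5
high = overestimation_factor(150, 40)  # 3.75
print(f"Overestimated by a factor of {low} to {high}")
```

Both ends of the example fall inside the two-to-four range.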

Historical enrollment rate data is not protocol-specific

When a site reports enrolling three patients per month in "similar trials," that figure averages performance across multiple trials with different I/E criteria stringency, different therapeutic areas with different patient prevalence, and different protocol burden. A site that enrolled three patients per month in a broad-criteria hypertension trial may enroll 0.8 per month in a treatment-resistant hypertension trial, because the patient population overlap is much smaller than the therapeutic area overlap implies.

Protocol-specific enrollment rate projection requires knowing how many patients at the site satisfy the specific protocol's I/E criteria — which requires running those criteria against the EHR, not applying historical rates from a different protocol.

What data-driven feasibility actually measures

Data-driven feasibility replaces self-reported estimates with structured queries run against the EHR, supplemented by historical CTMS data for sites with existing performance records.

The core analysis: how many patients in the site's active EHR panel satisfy the draft I/E criteria for the protocol? This is not a questionnaire question. It is a FHIR query run against the EHR population using the actual protocol criteria, the same criteria that will govern enrollment. The output is three counts: patients confirmed to meet the criteria, patients with data gaps requiring follow-up, and patients who are definitively ineligible.
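The three-way output can be sketched as follows, assuming patient data has already been pulled from the EHR into plain dictionaries. Each criterion returns True, False, or None when the data is missing. The field names, the eGFR cutoff of 30, and the `excluded_drug` placeholder are hypothetical and not taken from any protocol.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

Patient = dict
# A criterion returns True (met), False (failed), or None (data missing).
Criterion = Callable[[Patient], Optional[bool]]


def above(field: str, threshold: float) -> Criterion:
    def check(patient: Patient) -> Optional[bool]:
        value = patient.get(field)
        return None if value is None else value > threshold
    return check


def at_least(field: str, minimum: float) -> Criterion:
    def check(patient: Patient) -> Optional[bool]:
        value = patient.get(field)
        return None if value is None else value >= minimum
    return check


def none_of(field: str, excluded: set) -> Criterion:
    def check(patient: Patient) -> Optional[bool]:
        values = patient.get(field)
        return None if values is None else not (set(values) & excluded)
    return check


def classify(patient: Patient, criteria: Iterable[Criterion]) -> str:
    results = [criterion(patient) for criterion in criteria]
    if any(r is False for r in results):
        return "ineligible"       # one failed criterion is decisive
    if any(r is None for r in results):
        return "data_gap"         # nothing failed, but something is unknown
    return "meets_criteria"


def summarize(patients: Iterable[Patient], criteria: list) -> Counter:
    return Counter(classify(p, criteria) for p in patients)


# Hypothetical criteria and a five-patient panel.
criteria = [above("hba1c", 7.5), at_least("egfr", 30.0),
            none_of("medications", {"excluded_drug"})]
panel = [
    {"hba1c": 8.2, "egfr": 65.0, "medications": ["metformin"]},
    {"hba1c": 8.9, "egfr": None, "medications": ["metformin"]},
    {"hba1c": 6.8, "egfr": 70.0, "medications": []},
    {"hba1c": 9.1, "egfr": 55.0, "medications": ["excluded_drug"]},
    {"hba1c": 8.0, "medications": ["metformin"]},
]
counts = summarize(panel, criteria)
print(dict(counts))
```

Checking for a failed criterion before checking for missing data means a patient who fails any one criterion is never counted as a data gap, so follow-up effort goes only to patients who could still qualify.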

This output has immediate value for feasibility scoring. If the site has 80 patients who appear to meet the criteria, and historical conversion rates for similar trials suggest approximately 35% of pre-screened candidates complete screening and 70% of those consent, then about 28 complete screening and the projected enrollment capacity is roughly 20 subjects (80 × 0.35 × 0.70 ≈ 19.6). If the trial requires 15 from this site, there is adequate capacity with some buffer. If it requires 25, the target exceeds the site's estimated capacity. A sponsor who understands this will assign a modest target and monitor closely rather than over-assign a target the site cannot meet.
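The funnel arithmetic can be sketched as below, treating the two conversion rates as successive steps applied to the 80 candidates. The function and parameter names are illustrative, and real conversion rates would come from the site's own history.

```python
def projected_enrollment(candidates: int, screen_completion_rate: float,
                         consent_rate: float) -> float:
    """Apply the two conversion steps in sequence to the pre-screened pool."""
    return candidates * screen_completion_rate * consent_rate


def headroom(capacity: float, target: int) -> float:
    """Positive means spare capacity; negative means the target exceeds it."""
    return capacity - target


capacity = projected_enrollment(80, 0.35, 0.70)
print(f"Projected capacity: about {round(capacity)} subjects")
for target in (15, 25):
    print(f"Target {target}: headroom {headroom(capacity, target):+.1f}")
```

With these inputs the capacity comes out to about 20 subjects, so a target of 15 leaves a few subjects of buffer while a target of 25 falls short by roughly five.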

Stated capabilities vs. actual patient counts

Site feasibility questionnaires typically ask about capabilities as proxies for patient availability: "does your site see patients with treatment-resistant hypertension?" The answer is almost always yes — sites selected for feasibility review are selected because they have relevant expertise. The capability question is rarely the discriminating one.

The discriminating question is: how many patients at this site satisfy the specific I/E criteria for this trial, within the current active panel, with data at a recency satisfying the protocol's temporal constraints? That question cannot be answered by a questionnaire. It requires a structured data query.

Sites that can run that query before the feasibility questionnaire deadline — rather than substituting an estimate — are providing sponsors with information genuinely predictive of enrollment performance. The sites currently doing data-driven feasibility well tend to be academic medical centers with dedicated research informatics staff. Community sites and independent practice networks — a large fraction of the total trial site network — typically lack those resources.

A tool that ingests draft I/E criteria from the feasibility questionnaire, runs structured FHIR queries against the site's EHR, and returns a population count with a data-completeness breakdown in minutes changes the feasibility response from an estimate to a measurement. That shift in data quality propagates through site selection into enrollment projections that are accurate enough to plan against, rather than baseline assumptions that get abandoned once actual enrollment data arrives.

Questionnaire-based feasibility scoring will persist because it is low-cost and familiar. But for trials where enrollment timeline accuracy is operationally critical — late-phase pivotal studies where a six-month delay has significant development cost implications — the value of EHR-based population analysis at the feasibility stage is real. Sites that can offer it, and sponsors that request it systematically, are operating with a fundamentally different view of enrollment capacity than the industry average.