Are We Paying for Value or Just Activity? Rethinking Vendor Performance in Health Benefits

Dr. Olu Albert

4/27/2026

During my tenure as Assistant Division Director for Health Benefits Operations and Contract Compliance for the State of New Jersey, I encountered a recurring pattern that continues to shape my view of population health today. Vendors consistently reported strong performance: high engagement rates, extensive outreach, and robust program activity. However, when program teams examined outcomes more closely, the story was often far more complex. This experience raises a fundamental question for those leading health policy, overseeing public programs, or managing large-scale federal and state contracts: are we truly improving outcomes, or are we rewarding activity that appears meaningful but fails to deliver measurable value?

Billions of dollars flow annually through federal programs administered by the Centers for Medicare & Medicaid Services (CMS), as well as through state-administered health benefits programs and Medicaid managed care arrangements. These investments increasingly rely on external vendors to deliver care management, analytics, and member engagement. Yet despite this scale, performance evaluation frameworks often lag behind the complexity and financial exposure of the systems they are intended to govern.


In many cases, vendor performance is still assessed through process-oriented metrics such as outreach volume, program participation, or service delivery counts. These indicators are easy to track, align with reporting requirements, and are often embedded in procurement and contract deliverables. However, they provide limited insight into whether interventions are improving health outcomes in real-world settings.

In practice, this creates a dangerous illusion of progress. Activity, no matter how well-documented, is not the same as impact. And when public programs and contracted entities conflate the two, the consequences extend beyond inefficiency: they affect program integrity, fiscal stewardship, and population health outcomes at scale. Through contract oversight and performance evaluation in a state-administered health benefits environment, one lesson became clear: effective vendor management is not about monitoring what vendors do; it is about holding them accountable for what they achieve. We need to move toward a structured, integrated evaluation model that aligns with the broader value-based transformation priorities of CMS and state health authorities.

Such a model must assess performance across clinical outcomes, financial results, patient experience, and health equity. These domains must not be treated as independent silos, but as a dynamic system in which each dimension influences the others. When patient experience declines, engagement weakens. When engagement weakens, adherence suffers. As adherence declines, clinical outcomes deteriorate, and costs rise. These relationships are not theoretical; they are observable across Medicare Advantage, Medicaid managed care, and state employee health programs. Yet many evaluation frameworks fail to capture this interdependence, instead relying on fragmented metrics that obscure true performance.

One of the most persistent challenges observed in public-sector oversight is the reliance on aggregate performance data. At a high level, results often appear strong, meeting contractual thresholds and reporting standards. However, when performance is stratified by race, socioeconomic status, geography, or clinical risk, the narrative often shifts. Certain populations consistently experience poorer outcomes. This reflects a well-established concept in epidemiology: the ecological fallacy, where group-level averages mask meaningful variation within subpopulations. In the context of federal and state programs, this has significant implications. Without stratified accountability, vendors can meet overall performance targets while underperforming for the very populations these programs are designed to serve.
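The ecological fallacy described above can be made concrete with a small sketch. The following Python snippet, using entirely hypothetical data and group labels, shows how an aggregate rate can satisfy a performance target while one subpopulation falls well short; it is an illustration of the statistical point, not the author's actual methodology.

```python
# Hypothetical sketch: an aggregate rate can meet a target while a
# subpopulation lags well behind it (the ecological fallacy).

def stratified_rates(records):
    """records: iterable of (group, outcome) pairs with outcome in {0, 1}.
    Returns (overall_rate, {group: rate})."""
    totals = {}
    for group, outcome in records:
        hits, n = totals.get(group, (0, 0))
        totals[group] = (hits + outcome, n + 1)
    overall_hits = sum(h for h, _ in totals.values())
    overall_n = sum(n for _, n in totals.values())
    return overall_hits / overall_n, {g: h / n for g, (h, n) in totals.items()}

# Hypothetical vendor data: 100 urban members, 20 rural members.
records = ([("urban", 1)] * 85 + [("urban", 0)] * 15
           + [("rural", 1)] * 12 + [("rural", 0)] * 8)
overall, by_group = stratified_rates(records)
# The overall rate clears an 80% threshold even though the rural
# subgroup's rate is only 60% -- the aggregate masks the disparity.
```

A vendor reporting only the top-line number would appear compliant; only the stratified view reveals the underperforming subgroup.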


Recognizing this, CMS and organizations such as the National Committee for Quality Assurance (NCQA) have increasingly emphasized health equity measurement and stratified reporting. However, measurement alone is insufficient without aligning these metrics to contractual accountability and financial incentives.

Financial performance introduces an equally critical dimension. Within federal and state contracts, cost savings are often cited as indicators of success. Yet from an oversight perspective, it becomes essential to distinguish between true value creation and what might be described as “false savings.” Some savings reflect improved care coordination, reduced unnecessary utilization, and stronger alignment with primary care. Others, however, may result from reduced access, delayed care, or cost shifting across systems. In federally funded programs, where accountability for taxpayer dollars is paramount, this distinction is not academic; it is fundamental.

This is where claims target guarantees become particularly important. In practice, these guarantees are designed to limit the growth in cost per claim over a defined measurement period, often compared against a baseline period with a specified runout window for claims completion. While conceptually straightforward, their interpretation and enforcement can be complex. In contract oversight, I encountered instances where the definition of “runout” became a point of contention: whether a standard three-month runout applied, or whether a broader interpretation would dilute accountability. These nuances are not technicalities; they materially affect whether guarantees are enforceable and whether financial penalties are appropriately assessed.
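To make the mechanics concrete, here is a minimal Python sketch of a claims target guarantee check. The claim record fields (`incurred`, `paid`, `amount`), the dates, and the 5% growth cap are illustrative assumptions, not terms from any actual contract.

```python
from datetime import date

def cost_per_claim(claims, period_start, period_end, runout_cutoff):
    """Average cost per claim incurred in the measurement period,
    counting only claims paid by the runout cutoff (claims completion)."""
    eligible = [c for c in claims
                if period_start <= c["incurred"] <= period_end
                and c["paid"] <= runout_cutoff]
    return sum(c["amount"] for c in eligible) / len(eligible)

def guarantee_met(baseline_cpc, measured_cpc, max_growth=0.05):
    """True if cost-per-claim growth over baseline stays within the cap."""
    return (measured_cpc - baseline_cpc) / baseline_cpc <= max_growth

# Illustrative data: the second claim is paid after the runout cutoff,
# so it is excluded -- exactly the definitional nuance disputed above.
claims = [
    {"incurred": date(2025, 1, 15), "paid": date(2025, 2, 1), "amount": 200.0},
    {"incurred": date(2025, 3, 10), "paid": date(2025, 7, 15), "amount": 400.0},
    {"incurred": date(2025, 2, 20), "paid": date(2025, 4, 30), "amount": 100.0},
]
cpc = cost_per_claim(claims, date(2025, 1, 1), date(2025, 3, 31),
                     runout_cutoff=date(2025, 6, 30))  # three-month runout
```

Widening the runout cutoff would pull the late-paid $400 claim back into the calculation and raise the measured cost per claim, which is why the runout definition materially affects whether a guarantee is breached and a penalty is owed.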

Similarly, performance guarantees, including claims accuracy, timeliness, reporting quality, and member service standards, are often structured with a portion of administrative fees placed at risk. In theory, these guarantees create accountability. In practice, however, their effectiveness depends entirely on how rigorously they are measured, validated, and enforced.
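A hedged sketch of how such an at-risk fee pool might be settled against a set of guarantees: the weights, targets, and 2% at-risk share below are invented for illustration and do not reflect any specific contract.

```python
def fees_forfeited(annual_admin_fees, at_risk_share, guarantees):
    """guarantees: list of (weight, target, actual, higher_is_better).
    Weights should sum to 1.0; each missed guarantee forfeits its
    weighted share of the at-risk fee pool."""
    pool = annual_admin_fees * at_risk_share
    missed_weight = sum(
        weight for weight, target, actual, higher_is_better in guarantees
        if not (actual >= target if higher_is_better else actual <= target))
    return pool * missed_weight

# Illustrative: $1M in admin fees, 2% at risk, two equally weighted guarantees.
penalty = fees_forfeited(1_000_000, 0.02, [
    (0.5, 0.99, 0.985, True),   # claims accuracy: missed (98.5% < 99%)
    (0.5, 30, 28, False),       # avg days to pay: met (28 <= 30)
])
```

The arithmetic is trivial; the hard part in practice is agreeing on the data sources and measurement methodology behind `actual`, which is where most disputes arise.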

In my experience, one of the most common gaps in contract compliance is not the absence of guarantees, but the failure to operationalize them. Vendors may meet reporting requirements while underlying performance issues remain unaddressed. Disputes often arise not because standards are unclear, but because measurement methodologies, data sources, or reconciliation timelines are inconsistently applied.

This challenge is compounded by poor underlying data infrastructure. Within the New Jersey State Health Benefits Program (SHBP) and School Employees’ Health Benefits Program (SEHBP), core eligibility and administrative data are housed in legacy systems such as the State Health Benefits Information Processing System (SHIPS). While historically foundational, this architecture remains largely antiquated and insufficiently integrated with claims, clinical, and vendor-reported data systems. The lack of interoperability creates fragmentation across data sources, delays reconciliation, and limits the ability to conduct timely, accurate performance validation.

From a contract compliance perspective, this is not merely a technical limitation; it is a structural barrier to accountability. When eligibility, claims, and program data are not seamlessly integrated, performance measurement becomes reactive rather than proactive. Discrepancies take longer to identify, disputes are more difficult to resolve, and enforcement of guarantees becomes more complex. In effect, antiquated data architecture can dilute the very accountability mechanisms that contracts are designed to uphold.

This is where emerging technologies, particularly artificial intelligence (AI), present a meaningful opportunity to strengthen oversight and performance management. AI-driven analytics can be leveraged to identify abnormal claim patterns and detect utilization spikes in near real time, allowing organizations to intervene before costs escalate. Similarly, AI can enhance eligibility validation by identifying inconsistencies or anomalies across enrollment data, reducing errors that can distort both financial reporting and performance measurement. Beyond operational oversight, AI also enables predictive analysis that can inform public health prevention initiatives. By analyzing historical claims, clinical indicators, and utilization patterns, AI can help identify at-risk populations earlier and guide targeted interventions before high-cost events occur. In this way, AI shifts the model from reactive cost management to proactive population health strategy. When implemented thoughtfully, these capabilities can play a critical role in cost containment while improving outcomes.
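The spike-detection idea can be illustrated with a far simpler stand-in than a full AI pipeline: a trailing z-score over weekly claim costs. This is a minimal sketch, not the sophisticated models the paragraph envisions; the window size, threshold, and data are arbitrary assumptions.

```python
import statistics

def flag_spikes(weekly_costs, window=8, z_threshold=3.0):
    """Return indices of weeks whose cost deviates more than z_threshold
    standard deviations above the trailing-window mean."""
    flags = []
    for i in range(window, len(weekly_costs)):
        hist = weekly_costs[i - window:i]
        mean = statistics.fmean(hist)
        sd = statistics.stdev(hist)
        if sd > 0 and (weekly_costs[i] - mean) / sd > z_threshold:
            flags.append(i)
    return flags

# Illustrative series: a sudden utilization spike in week 8.
weekly = [100, 102, 98, 101, 99, 100, 103, 97, 250, 100]
spikes = flag_spikes(weekly)
```

A production system would need seasonality adjustment, risk stratification, and validated clinical features, but even this toy version shows the shift the paragraph describes: flagging anomalies as they occur rather than discovering them at reconciliation.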

However, the value of AI is contingent on the same foundational principles that govern all performance evaluation: data integrity, interoperability, and clear governance structures. Without reliable and integrated data, even the most advanced analytical tools will produce limited or misleading insights. Technology does not replace accountability; it amplifies it when properly aligned.

Implementation science provides a valuable framework for integrating these capabilities into real-world systems. It emphasizes that evaluation must extend beyond whether an intervention was implemented to whether it was effective, for whom it was effective, under what conditions, and whether it can be sustained within complex environments. Frameworks such as RE-AIM (Reach, Effectiveness, Adoption, Implementation, and Maintenance) and CFIR (Consolidated Framework for Implementation Research) reinforce the importance of assessing both outcomes and context, particularly in large-scale federal and state programs where variability in population needs, provider networks, and operational environments is significant.

Two concepts are particularly relevant in this context: fidelity and validity. Fidelity assesses whether a program was delivered as intended, while validity ensures that the measures used accurately reflect meaningful outcomes. In contract oversight, overlooking either dimension can lead to a false sense of performance—programs may appear successful on paper while failing to deliver real-world impact. Perhaps the most critical lesson from public-sector health benefits operations is that measurement without enforcement does not drive accountability. Performance expectations must be clearly defined within procurement specifications, embedded in contract language, and tied to financial consequences.

Mechanisms such as performance guarantees, at-risk fee structures, liquidated damages, and claims target guarantees are essential tools for aligning vendor incentives with program goals. Equally important are structured governance processes: regular performance reviews, escalation pathways, and corrective action plans that ensure issues are identified and addressed proactively. Without these mechanisms, performance evaluation remains largely symbolic, and vendor accountability becomes difficult to enforce in practice.

The greatest risk in population health, particularly within federally and state-funded programs, is not vendor failure. It is the misinterpretation of vendor performance. When activity is mistaken for value, the consequences extend beyond inefficiency. They affect program integrity, resource allocation, and the ability of public systems to deliver equitable, high-quality care.

The path forward requires a deliberate paradigm shift. Public programs must move from monitoring to accountability, from activity-based reporting to outcome-based evaluation, and from fragmented oversight to integrated performance management. This includes modernizing data architecture, strengthening interoperability across eligibility, claims, and clinical systems, responsibly leveraging artificial intelligence for real-time monitoring and predictive insights, embedding equity into performance measurement, and ensuring that contractual structures reinforce, not undermine, accountability.

The Population Health Vendor Accountability Model (PH-VAM) provides a structured, contract-driven approach to evaluating vendor performance in health benefits programs by integrating four core domains (clinical outcomes, cost, patient experience, and health equity) into a unified system of accountability. It shifts the focus from activity to measurable value by determining whether interventions improve outcomes, lower the total cost of care through true efficiency (not “false savings”), strengthen engagement, and deliver equitable results across populations. Grounded in implementation science, PH-VAM assesses both execution (fidelity) and measurement accuracy (validity), supported by robust data governance across eligibility, claims, and clinical systems. It reinforces accountability through performance guarantees, claims target guarantees, and financial risk arrangements.

To operationalize the model, organizations should establish outcome-based contract metrics, build interoperable data infrastructure, align performance with financial incentives, stratify results to address equity, and embed continuous improvement through the Plan-Do-Check-Act cycle. The future of population health, particularly within federal and state systems, will not be defined by the number of vendors engaged or programs implemented. It will be defined by organizations' ability to rigorously measure, validate, and enforce meaningful performance across diverse populations and complex delivery systems. In the end, value is not what vendors report. It is what public programs are willing to define, verify, and hold accountable.

About the Author

Olu Albert is the President and CEO of Mello Health Strategy Group, a consulting firm specializing in health care strategy and population health solutions.