Unsupervised Machine Learning for Explainable Health Care Fraud Detection

Working Paper: NBER ID: w30946

Authors: Shubhranshu Shekhar; Jetson Leder-Luis; Leman Akoglu

Abstract: The US spends more than 4 trillion dollars per year on health care, largely conducted by private providers and reimbursed by insurers. A major concern in this system is overbilling, waste and fraud by providers, who face incentives to misreport on their claims in order to receive higher payments. In this work, we develop novel machine learning tools to identify providers that overbill insurers. Using large-scale claims data from Medicare, the US federal health insurance program for elderly adults and the disabled, we identify patterns consistent with fraud or overbilling among inpatient hospitalizations. Our proposed approach for fraud detection is fully unsupervised, not relying on any labeled training data, and is explainable to end users, providing reasoning and interpretable insights into the potentially suspicious behavior of the flagged providers. Data from the Department of Justice on providers facing anti-fraud lawsuits and case studies of suspicious providers validate our approach and findings. We also perform a post-analysis to understand which hospital characteristics, not used for detection, are associated with a high suspiciousness score. Our method provides an 8-fold lift over random targeting, and can be used to guide investigations and auditing of suspicious providers for both public and private health insurance systems.
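
The "lift over random targeting" figure can be read as the hit rate among the top-ranked providers divided by the hit rate across all providers. The sketch below illustrates that ratio only; it is not the authors' validation procedure, which this summary does not describe. The function name `lift_at_k`, the cutoff `k`, and the toy scores and labels are all hypothetical.

```python
def lift_at_k(scores, labels, k):
    """Lift of the top-k ranked providers over random targeting.

    scores: suspiciousness score per provider (higher = more suspicious)
    labels: 1 if the provider appears in an external validation list, else 0
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of providers")
    # Hit rate expected when providers are picked at random.
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("no positive labels; lift is undefined")
    # Rank providers by descending score and keep the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_k_rate = sum(label for _, label in ranked[:k]) / k
    return top_k_rate / base_rate


# Toy data (hypothetical): 10 providers, 2 of them in the validation list.
scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.15, 0.05, 0.4, 0.25, 0.35]
labels = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
print(lift_at_k(scores, labels, k=2))
```

In this toy case one of the top two providers is a validated hit, so the top-2 hit rate is 0.5 against a base rate of 0.2. A lift of 8 would mean the flagged group contains validated cases at eight times the rate of a randomly chosen group of the same size.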

Keywords: health care fraud detection; machine learning; Medicare; unsupervised learning

JEL Codes: C19; D73; I13; K42; M42

Causal Claims Network Graph

Edges that are evidenced by causal inference methods are in orange, and the rest are in light blue.

Causal Claims

Cause	Effect
provider behavior (I11)	healthcare fraud (I18)
higher-than-expected billing (L97)	healthcare fraud (I18)
rare ICD-10 coding patterns (I12)	healthcare fraud (I18)
peer-based analysis of billing codes (I11)	healthcare fraud (I18)
provider behavior (I11)	suspiciousness ranking (D80)
suspiciousness ranking (D80)	further investigation (Y50)
