Journal: Cancer Discovery
This study developed and validated PRE-Screen-HCC, an interpretable machine-learning framework to stratify hepatocellular carcinoma (HCC) risk using routinely collected clinical and biomarker data.
Using prospectively collected, multimodal data from more than 900,000 individuals and 983 HCC cases across two large population cohorts (one for model development and one for external validation), the authors integrated:
- Demographics
- Lifestyle factors
- Health records
- Blood measurements
- Genomics
- Metabolomics
They systematically evaluated how each of these data modalities, alone and in combination, contributed to HCC risk prediction.
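One way to picture the "alone and in combination" evaluation is as a grid over every non-empty subset of the six modalities. The sketch below is only illustrative: the summary does not say whether the authors tested every subset or a selected few, and the names `MODALITIES`, `modality_subsets`, `rank_subsets`, and the `evaluate` callback are hypothetical stand-ins, not the authors' code.

```python
from itertools import combinations

# Modality names taken from the list above; the identifiers are illustrative.
MODALITIES = [
    "demographics",
    "lifestyle",
    "health_records",
    "blood",
    "genomics",
    "metabolomics",
]


def modality_subsets(modalities):
    """Yield every non-empty combination of modalities, smallest first."""
    for size in range(1, len(modalities) + 1):
        for combo in combinations(modalities, size):
            yield combo


def rank_subsets(modalities, evaluate):
    """Score each subset with a caller-supplied function and sort best first.

    `evaluate` is a hypothetical callback that would train a model on the
    given modalities and return a held-out performance score.
    """
    results = [(combo, evaluate(combo)) for combo in modality_subsets(modalities)]
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder scorer (subset size) so the sketch runs without any data.
    ranked = rank_subsets(MODALITIES, evaluate=len)
    print(len(ranked), "subsets; top:", ranked[0][0])
```

With six modalities there are 2^6 - 1 = 63 non-empty subsets, so an exhaustive grid of this kind stays small enough to train one model per subset.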
The final models, based on random forests, outperformed all currently available HCC risk scores on both the internal and the external test sets. The framework remained robust across different ethnic subgroups.
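The summary does not name the metric behind this comparison. As a hedged illustration only, the sketch below assumes discrimination is measured by the area under the ROC curve (AUROC), a common choice for risk scores, and shows how it could be computed overall and per subgroup. The functions `auroc` and `auroc_by_group` and the toy data are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair. The pairwise form is quadratic in
    sample size, which is fine for a sketch but slow for large cohorts.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    correct = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                correct += 1.0
            elif case_score == control_score:
                correct += 0.5
    return correct / (len(cases) * len(controls))


def auroc_by_group(labels, scores, groups):
    """Compute AUROC separately within each subgroup label."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auroc(ys, ss) for g, (ys, ss) in buckets.items()}


if __name__ == "__main__":
    # Toy data, invented purely so the sketch runs.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    groups = ["A", "B", "B", "A"]
    print(auroc(labels, scores))
    print(auroc_by_group(labels, scores, groups))
```

Running the same function on a new model's scores and on an existing risk score, over the same test set, gives a like-for-like comparison. The per-group version is one simple way to check whether performance holds across subgroups.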
The authors emphasized interpretability, detailing individual and cumulative feature contributions. They also made the full code, model weights, and an online calculator available to enable external validation and potential integration into clinical or research workflows.
Overall, PRE-Screen-HCC is presented as a robust, interpretable, and externally validated risk stratification tool that could facilitate earlier detection of HCC using data types already common in large health systems and cohorts.