A Classification Approach for Cancer Survivors from Those Cancer-Free, Based on Health Behaviors: Analysis of the Lifelines Cohort
Abstract:
Health behaviors affect health status in cancer survivors. We hypothesized that nonlinear algorithms would identify distinct key health behaviors compared to a linear algorithm and would better classify cancer survivors. We aimed to use three nonlinear algorithms to identify such key health behaviors and to compare their performance with that of logistic regression in distinguishing cancer survivors from those without cancer in a population-based cohort study. We used six health behaviors and three socioeconomic factors for analysis. Participants from the Lifelines population-based cohort were classified into a cancer-survivor group or a cancer-free group using either nonlinear algorithms or logistic regression, and performance was compared using the area under the curve (AUC). In addition, we performed case-control analyses (matched by age, sex, and education level) to evaluate classification performance based on health behaviors alone. Data were collected for 107,624 cancer-free participants and 2760 cancer survivors. Using all variables resulted in an AUC of 0.75 ± 0.01. Using only the six health behaviors, the logistic regression and nonlinear algorithms differentiated cancer survivors from cancer-free participants with AUCs of 0.62 ± 0.01 and 0.60 ± 0.01, respectively. The main distinctive classifier was age. Though not relevant to classification, the main distinctive health behaviors were body mass index and alcohol consumption. In the case-control analyses, the algorithms produced AUCs of 0.52 ± 0.01. Neither the linear nor the nonlinear algorithms identified key health behaviors that differentiate cancer survivors from cancer-free participants in this population-based cohort.
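To help interpret the AUC values reported above, the following minimal sketch computes an AUC directly from its rank-based definition: the probability that a randomly chosen positive case (here, a cancer survivor) receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the toy scores and labels are hypothetical illustrations, not data or code from the study. An AUC of 0.5 corresponds to chance-level discrimination, which is why a value of 0.52 indicates almost no separation between groups.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a random positive > score of a random negative).

    Ties between a positive and a negative contribute 0.5.
    `labels` uses 1 for the positive class and 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the Lifelines cohort).
labels = [1, 1, 1, 0, 0, 0]
informative = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]  # scores that mostly rank positives higher
uninformative = [0.5] * 6                     # identical scores carry no ranking information

print(auc(informative, labels))    # 8 of 9 positive-negative pairs are ranked correctly
print(auc(uninformative, labels))  # every pair is a tie, so the AUC is 0.5
```

The pairwise loop is quadratic in the number of participants, so for a cohort of this size a sort-based implementation or an established library routine would be the practical choice; the loop is used here only because it mirrors the definition.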