The research investigates whether five blood markers can be used to classify levels of alcohol consumption and predict potential liver disorders: mean corpuscular volume (MCV), alkaline phosphatase, SGPT (alanine aminotransferase), SGOT (aspartate aminotransferase), and gamma-glutamyl transferase (GGT). Unsupervised techniques such as Principal Component Analysis (PCA) and k-means clustering were first applied to assess how well different drinking behaviors separate, but these methods yielded limited differentiation. Supervised learning models, including Logistic Regression, support vector machines (SVM), Random Forest, and Decision Trees, were then trained, and Logistic Regression performed best by capturing linear relationships among the biomarkers.
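As an illustration of the supervised step, the following is a minimal logistic regression sketch. It is not the study's code: it uses pure Python (no scikit-learn) so it runs without dependencies, z-scores the five markers, and fits the weights by batch gradient descent on the mean log-loss. The feature order, the binary label (0 = lighter, 1 = heavier drinking), the learning rate, the epoch count, and the eight example rows are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed column order; names follow the five markers in the text.
FEATURES = ("mcv", "alkphos", "sgpt", "sgot", "gammagt")


def standardize(rows: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[float], List[float]]:
    """Z-score each column and return (scaled_rows, means, stds)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    scaled = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    return scaled, means, stds


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x: List[List[float]], y: List[int], lr: float = 0.1,
                 epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(x), len(x[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # p - y
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(x: List[List[float]], w: List[float], b: float, threshold: float = 0.5) -> List[int]:
    """Label 1 when the predicted probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold) for xi in x]


# Hypothetical toy rows (NOT study data), columns in FEATURES order.
rows = [
    [85, 60, 20, 18, 15], [88, 65, 22, 20, 18], [86, 58, 18, 17, 12], [90, 62, 25, 21, 20],
    [95, 80, 60, 45, 90], [97, 85, 55, 50, 110], [94, 78, 65, 40, 85], [96, 90, 70, 55, 120],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical: 0 = lighter, 1 = heavier drinking

scaled, means, stds = standardize(rows)
w, b = fit_logistic(scaled, labels)
print(dict(zip(FEATURES, (round(v, 2) for v in w))))  # one weight per standardized marker
print(predict(scaled, w, b))  # training-set predictions on the toy rows
```

Because every marker is standardized, the fitted weights are on a comparable scale, which is one reason a linear model is easy to interpret here. In a real evaluation, the training-split means and standard deviations would be reused to scale held-out rows, and performance would be measured on that held-out data rather than on the training rows as in this toy run.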
Despite these encouraging results, the models achieved only moderate accuracy and recall, suggesting that these five biomarkers capture just part of the relationship between alcohol intake and liver health. The study concludes that adding further relevant biomarkers to the dataset and exploring more advanced modeling techniques may improve predictive performance and, in turn, support earlier detection of and intervention in liver disorders.
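Accuracy and recall can diverge, so it helps to see how each is computed. Accuracy is the share of all predictions that are correct, while recall is the share of truly positive cases (assumed here to be the heavier-drinking or at-risk class) that the model actually flags, which is the quantity that matters most for early detection. The minimal sketch below uses hypothetical labels and predictions, not the study's results, to show a classifier that is 70% accurate yet catches only half of the positive cases.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for a binary task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # flagged and truly positive
            else:
                fp += 1  # flagged but actually negative
        elif t == positive:
            fn += 1      # missed positive case
        else:
            tn += 1      # correctly left unflagged
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of truly positive cases that were predicted positive."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical labels and predictions, NOT results from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 2, 5)
print(accuracy(y_true, y_pred))          # 0.7
print(recall(y_true, y_pred))            # 0.5
```

When the negative class dominates, accuracy can look acceptable even while many positive cases are missed, which is why recall is reported alongside it.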
Further details are given in the report below.