AI better predicts back surgery outcomes – Futurity: Research News
Posted: June 11, 2024 at 2:48 am
You are free to share this article under the Attribution 4.0 International license.
Researchers who had been using Fitbit data to help predict surgical outcomes have a new method to more accurately gauge how patients may recover from spine surgery.
Using machine-learning techniques, researchers worked to develop a way to more accurately predict recovery from lumbar spine surgery.
The results, published in the journal Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, show that their model outperforms previous models at predicting spine surgery outcomes.
This is important because, in lower back surgery and many other types of orthopedic operations, outcomes vary widely depending not only on the patient's structural disease but also on physical and mental health characteristics that differ across patients.
Surgical recovery is influenced by both physical and mental health before the operation. Some people may have excessive worry in the face of pain that can make pain and recovery worse. Others may suffer from physiological problems that worsen pain. If physicians can get a heads-up on the various pitfalls a patient faces, they can better tailor treatment plans.
"By predicting the outcomes before the surgery, we can help establish some expectations and help with early interventions and identify high risk factors," says first author Ziqi Xu, a PhD student in the lab of Chenyang Lu, a professor in the McKelvey School of Engineering at Washington University in St. Louis.
Previous work in predicting surgery outcomes typically used patient questionnaires given once or twice in clinics, capturing a static slice of time.
"It failed to capture the long-term dynamics of physical and psychological patterns of the patients," Xu says. Prior work training machine-learning algorithms focused on just one aspect of surgery outcome but ignored the inherent multidimensional nature of surgery recovery, she adds.
Researchers have used mobile health data from Fitbit devices to monitor and measure recovery and compare activity levels over time. But the new research has shown that combining activity data with longitudinal assessment data predicts more accurately how the patient will do after surgery, says Jacob Greenberg, an assistant professor of neurosurgery at the School of Medicine.
The current work offers a proof of principle showing that, with multimodal machine learning, doctors can see a more accurate big picture of the interrelated factors that affect recovery. Before beginning this work, the team first laid out the statistical methods and protocol to ensure they were feeding the artificial intelligence system the right balanced diet of data.
Previously, the team had published work in the journal Neurosurgery showing for the first time that patient-reported and objective wearable measurements improve predictions of early recovery compared to traditional patient assessments.
In addition to Greenberg and Xu, Madelynn Frumkin, a PhD student studying psychological and brain sciences in Thomas Rodebaugh's laboratory, was a co-first author on that work. Wilson Zack Ray, a professor of neurosurgery at the School of Medicine, was co-senior author, along with Rodebaugh and Lu. Rodebaugh is now at the University of North Carolina at Chapel Hill.
In that research, they show that Fitbit data can be correlated with multiple surveys that assess a person's social and emotional state. They collected that data via ecological momentary assessments (EMAs), which use smartphones to prompt patients to assess mood, pain levels, and behavior multiple times throughout the day.
"We combine wearables, EMA, and clinical records to capture a broad range of information about the patients, from physical activities to subjective reports of pain and mental health, and to clinical characteristics," Lu says.
Greenberg adds that state-of-the-art statistical tools that Rodebaugh and Frumkin have helped advance, such as Dynamic Structural Equation Modeling, were key in analyzing the complex, longitudinal EMA data.
For the most recent study, they took all those factors and developed a new machine-learning technique of Multi-Modal Multi-Task Learning to effectively combine these different types of data to predict multiple recovery outcomes.
"In this approach, the AI learns to weigh the relatedness among the outcomes while capturing their differences from the multimodal data," Lu adds.
This method takes shared information on interrelated tasks of predicting different outcomes and then leverages the shared information to help the model understand how to make an accurate prediction, according to Xu.
It all comes together in the final package, producing a predicted change for each patient's post-operative pain interference and physical function score.
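To make the multimodal multi-task idea concrete, below is a minimal sketch of how such a model could be wired up in PyTorch: a shared trunk fuses wearable, EMA, and clinical features, and two small heads regress the change in pain interference and physical function. The layer sizes, input dimensions, and training details are illustrative assumptions, not the authors' published architecture.

```python
# Minimal sketch of a multimodal multi-task model (hypothetical dimensions,
# not the authors' published architecture). A shared trunk fuses wearable,
# EMA, and clinical features; two heads predict the change in pain
# interference and physical function scores.
import torch
import torch.nn as nn

class MultimodalMultiTask(nn.Module):
    def __init__(self, n_wearable=32, n_ema=16, n_clinical=24, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(n_wearable + n_ema + n_clinical, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
        )
        self.pain_head = nn.Linear(hidden, 1)      # change in pain interference
        self.function_head = nn.Linear(hidden, 1)  # change in physical function

    def forward(self, wearable, ema, clinical):
        x = torch.cat([wearable, ema, clinical], dim=1)
        h = self.shared(x)
        return self.pain_head(h), self.function_head(h)

model = MultimodalMultiTask()
loss_fn = nn.MSELoss()
# Joint loss: the shared trunk lets the two related outcomes inform each other.
pain_pred, func_pred = model(torch.randn(8, 32), torch.randn(8, 16), torch.randn(8, 24))
loss = loss_fn(pain_pred, torch.randn(8, 1)) + loss_fn(func_pred, torch.randn(8, 1))
loss.backward()
```

Because both heads share the trunk, the combined loss encourages the model to exploit the relatedness between outcomes while still producing separate predictions, which is the essence of the multi-task approach described above.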
Greenberg says the study is ongoing as the researchers continue to fine-tune their models so they can take more detailed assessments, predict outcomes and, most notably, understand what types of factors can potentially be modified to improve longer-term outcomes.
Funding for the study came from AO Spine North America, the Cervical Spine Research Society, the Scoliosis Research Society, the Foundation for Barnes-Jewish Hospital, Washington University/BJC Healthcare Big Ideas Competition, the Fullgraf Foundation, and the National Institute of Mental Health.
Source: Washington University in St. Louis
How to think about the economics of AI – Top1000funds.com
Posted: at 2:48 am
The most underrated area of innovation in artificial intelligence is not in computing, nor is it in the development of algorithms or techniques for data collection. It is in the human ability to recast problems in terms of predictions.
Leading economist and academic Ajay Agrawal told the Fiduciary Investors Symposium in Toronto that it helps to think of AI and machine learning as simply a drop in the cost of prediction.
Agrawal serves as the Geoffrey Taber Chair in Entrepreneurship and Innovation at the University of Toronto's Rotman School of Management, as well as being a Professor of Strategic Management.
"AI is computational statistics that does prediction," Agrawal said.
"That's all it is. And so, on the one hand, that seems very limiting. On the other hand, the thing that's so remarkable about it is all the things we've discovered that we can do with high-fidelity prediction."
Agrawal said prediction is, in simple terms, taking information you have to generate information you don't have. And it is the creativity of people in recasting problems that none of us would have characterised as prediction problems into prediction problems that underpins developments in, and the potential of, AI, he said.
"Five years ago, probably nobody in this room would have said driving is a prediction problem.
"Very few people in the room would have said translation is a prediction problem. Very few of you would have said replying to email is a prediction problem. But that's precisely how we're solving all those things today."
Whether it's predictive text when replying to an email or enhancing investment performance, the supporting AI systems are all implementations of statistics and prediction, Agrawal said.
These prediction models reached a zenith in large language models (LLMs), where machines were trained on how to predict the next most likely word in a sequence of words that made up sentences, paragraphs and whole responses.
"If you think about language, let's say English, every book, every poem, every scripture that you've ever read is a resequencing of the same characters: 26 letters, a few punctuation marks just re-sequenced over and over again makes all the books. What if we could do that with actions?" Agrawal said.
The principles of LLMs (next most likely word) are now being applied to large behavioural models – robots – by training them to predict the next most likely verb or action.
"In that case, we could take all the tasks – think about everyone that you know, every job they do, and every job probably has 30 or 40 different tasks, so there's hundreds of thousands of tasks. But what if all those tasks are just really sequences of a small number of verbs?
"So what they're doing is they're training the robots to do a handful of verbs – 50, 80, 120 verbs. Then you give the robot a prompt, just like ChatGPT. You say to the robot, can you please unpack those boxes and put the tools on the shelf? The robot hears the prompt, and then predicts what is the optimal sequence of verbs in order to complete the task."
It is, Agrawal said, another application of prediction.
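To illustrate the "next most likely word" framing in the simplest possible terms, the toy sketch below builds a bigram counter and predicts the most frequent following word. It is deliberately simplistic (real LLMs use large neural networks trained on vast corpora); it only demonstrates the prediction framing Agrawal describes.

```python
# Toy illustration of "prediction" as next-word selection: a bigram counter
# picks the most likely following word. Real LLMs use neural networks, but
# the framing -- predict the next token -- is the same.
from collections import Counter, defaultdict

corpus = "the robot unpacks the boxes and puts the tools on the shelf".split()
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    # Return the most frequently observed word following `word`, if any.
    return bigrams[word].most_common(1)[0][0] if bigrams[word] else None

print(predict_next("the"))
```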
Agrawal said that businesses and industries are now facing a tidal wave of problems that have been recast as prediction problems.
"So we now are pointing machine intelligence at many of these.
"The problem is, it has come so hard and so fast that people seem to be struggling with: where do we start? And how do we actually point this towards something useful?"
Agrawal said it pays to be very specific about "the metric or the performance measure that needs to be improved, and then [point] the AI at that".
"AIs are mathematical optimisers; they have to know what they're optimising towards," he said.
"If the problem is a tidal wave of new solutions available, and the problem is we don't know how to harness it, here is a way to think about the solution – a short-term and a long-term strategy."
Agrawal said short-term strategies are basically productivity enhancements. They're deployable within a year, aim for 20 per cent productivity gains, and have a payback period of no more than two years.
"And here's the key point: no change in the workflow," he said.
"In other words, it's truly a technology project where you just drop it in, but the rest of the system stays the same."
Long-term strategies take longer to deploy but they're genuine game-changers, offering gains 10 times or more greater than short-term deployments. But critically, they require a redesign of workflows. Agrawal said AI, like electricity, is a general-purpose technology, and a useful analogy is when factories were first electrified and started to move away from steam-powered engines.
"In the first 20 years after electricity was invented, there was very low take-up – less than 3 per cent of factories used electricity, and when they did, the main value proposition was it will reduce your input costs by doing things like replacing gas lamps.
"Nobody wanted to tear apart their existing infrastructure in order to have that marginal benefit," Agrawal said.
"The only ones that were experimenting with electricity were entrepreneurs building new factories, and even then, most of them said, 'No, I want to stick with what I know in terms of factory design.'"
But a few entrepreneurs realised there was a chance to completely reimagine and redesign a factory that was powered by electricity, because no longer was it dependent on transmitting power from engines outside the factory via long steel shafts to drive the factory machinery.
When the shafts became obsolete, so did the large columns inside the factories to support them. And that opened the door to lightweight, lower-cost construction, and factory design and layout changed to having everything on one level.
"They redesigned the entire workflow," Agrawal said.
"The machines, the materials, the material handling, the people flow, everything [was] redesigned. Some of the factories got up to 600 per cent productivity lift."
Agrawal said initially, the productivity differences between electrified and non-electrified factories were very small.
"You could be operating a non-electrified factory and think those guys who want the newfangled electricity, it's more trouble than it's worth," he said.
"But the productivity benefits just started taking off from electricity.
"Now we're seeing the same thing with machine intelligence [and] the adoption rate of AI."
However, Agrawal said the characteristic that makes AI different from every other tool we've ever had in human history is that it is the only one that learns from us.
He said this explains the headlong development rush and the commitment of so much capital to the technology.
"The way AI works is that whoever gets an early lead, their AI gets better; when their AI gets better, they get more users; when they get more users, they get more data; when they get more data, then the AI – the prediction – improves," he said.
"And so, once they get that flywheel turning, it gets very hard to catch up to them."
Agrawal said AI and machine learning are developing so quickly it's virtually impossible for companies and businesses to keep up, let alone implement and adapt.
"The thing I would pay attention to is not so much the technology capability, because obviously that's important and it's moving quickly," he said.
"But what I'm watching are the unit economics of the companies who are first experimenting with it, and then putting it into production," he said.
"Cost just keeps going down because the AI is learning and getting better. And so, my sense there is, just pay very laser-close attention to the unit economics of what it costs to do a thing.
"And you can go right down the stack of every good and service, watching how, when you start applying these machine intelligence solutions to that thing, do the unit economics change?"
Deep learning models for predicting the survival of patients with hepatocellular carcinoma based on a surveillance … – Nature.com
Posted: at 2:48 am
Data description
In this study, 35,444 HCC patients were screened from the SEER database between 2010 and 2015, with 2197 patients meeting the criteria for inclusion. Table 1 shows the patients' main baseline clinical characteristics (eTable 1 in the Supplement). Among the 2197 participants, 70% (n=1548) were aged 66 years and below, 23% (n=505) were between 66 and 77 years old, and 6.6% (n=144) were over 77 years old. Male participants accounted for 78% (n=1915), while females represented 22% (n=550). In terms of race, the majority of participants were White, accounting for 66% (n=1455), followed by Asians or Pacific Islanders at 22% (n=478), Black individuals at 10% (n=228), and Native Americans/Alaskan Natives at only 1.6% (n=36). Regarding marital status, 60% (n=1319) were married, and the remaining 40% (n=878) were of other marital statuses. Histologically, most participants (98%, n=2154) were of type 8170. Additionally, 50% (n=1104) of the patients were grade II differentiated, 18% (n=402) were grade III, 1.0% (n=22) were grade IV, and 30% (n=669) were grade I. In terms of tumor staging, 48% (n=1054) of participants were at stage I, 29% (n=642) at stage II, 16% (n=344) at stage III, and 7.1% (n=157) at stage IV. Regarding the TNM classification, 49% (n=1079) were T1, 31% (n=677) were T2, 96% (n=2114) were N0, and 95% (n=2090) were M0. 66% (n=1444) of the participants had a positive/elevated AFP. 70% (n=1532) showed high levels of liver fibrosis. 92% (n=2012) had a single tumor, while the remaining 8.4% (n=185) had multiple tumors. 32% (n=704) underwent lobectomy, 14% (n=311) underwent local tumor destruction, 34% (n=753) had no surgery, and 20% (n=429) underwent wedge or segmental resection. Finally, 2.1% (n=46) received radiation therapy, with 62% (n=1352) not receiving chemotherapy and 38% (n=855) undergoing chemotherapy. The average overall survival (OS) for participants was 45 ± 34 months, with 1327 (60%) surviving at the end of follow-up.
Following univariate Cox regression analysis, we identified several factors significantly correlated with the survival rate of hepatocellular carcinoma patients (p<0.05). These factors included age, race, marital status, histological type, tumor grade, tumor stage, T stage, N stage, M stage, alpha-fetoprotein levels, tumor size, type of surgery, and chemotherapy status. These variables all significantly impacted patient survival in the univariate analysis. However, in the multivariate Cox regression analysis, we further confirmed that only age, marital status, histological type, tumor grade, tumor stage, and tumor size were independent factors affecting patient survival (p<0.05) (Table 1). Additionally, through collinearity analysis, we observed a significantly high degree of collinearity between tumor staging (Stage) and the individual stages of T, N, and M (Fig. 1). This phenomenon occurs primarily because the overall tumor stage (Stage) is directly determined based on the results of the TNM assessment. This collinearity suggests the need for cautious handling of these variables during modeling to avoid overfitting and reduced predictive performance. Despite certain variables not being identified as independent predictors in multivariable analysis, we incorporated them into the construction of our deep learning model for several compelling reasons. Firstly, these variables may capture subtle interactions and nonlinear relationships that are not readily apparent in traditional regression models, but can be discerned through more sophisticated modeling techniques such as deep learning. Secondly, including a broader set of variables may enhance the generalizability and robustness of the model across diverse clinical scenarios, allowing it to better account for variations among patient subgroups or treatment conditions. Based on this analysis, we ultimately selected 12 key factors (age, race, marital status, histological type, tumor grade, T stage, N stage, M stage, alpha-fetoprotein, tumor size, type of surgery, chemotherapy) for inclusion in the construction of the predictive model. We divided the dataset into two subsets: a training set containing 1537 samples and a test set containing 660 samples (Table 2). By training and testing the model on these data, we aim to develop a model that can accurately predict the survival rate of hepatocellular carcinoma patients, assisting in clinical decision-making and improving patient prognosis.
Correlation coefficients for each pair of variables in the data set.
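For readers who want to see the shape of this workflow, here is a minimal sketch of univariate Cox screening followed by a variance-inflation-factor collinearity check, using the lifelines and statsmodels packages. The file name and covariate columns are hypothetical placeholders, categorical variables are assumed to be already numerically encoded, and this is not the authors' code.

```python
# Sketch of univariate Cox screening followed by a variance-inflation-factor
# (VIF) check for collinearity. File and column names are placeholders.
import pandas as pd
from lifelines import CoxPHFitter
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

df = pd.read_csv("seer_hcc.csv")  # assumed: 'time', 'event', plus encoded covariates
covariates = ["age", "grade", "stage", "tumor_size", "afp", "chemotherapy"]  # placeholders

significant = []
for var in covariates:
    cph = CoxPHFitter()
    cph.fit(df[["time", "event", var]], duration_col="time", event_col="event")
    if cph.summary.loc[var, "p"] < 0.05:
        significant.append(var)

# Collinearity check: VIF above ~10 is a common warning sign.
X = add_constant(df[significant].astype(float))
vif = {col: variance_inflation_factor(X.values, i)
       for i, col in enumerate(X.columns) if col != "const"}
print(significant, vif)
```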
Initially, we conducted fivefold cross-validation on the training set and performed 1000 iterations of random search. Among all these validations, we selected parameters that showed the highest average concordance index (C-index) and identified them as the optimal parameters. Figure 2 displays the loss function graphs for the two deep learning models, NMTLR and DeepSurv. This set of graphs reveals the loss changes of these two models during the training process.
Loss convergence graph for (A) DeepSurv, (B) neural network multitask logistic regression (N-MTLR) models.
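A sketch of the cross-validated random search described above is shown below, using a penalized Cox model from lifelines as a simple stand-in for the deep survival networks (DeepSurv and NMTLR). The hyperparameter space, file name, and number of sampled configurations are illustrative assumptions.

```python
# Sketch of random hyperparameter search with five-fold cross-validation,
# scored by the concordance index (C-index). A penalized Cox model stands in
# for the deep survival models; the search space is a placeholder.
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, ParameterSampler
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

df = pd.read_csv("train.csv")  # assumed: 'time', 'event', plus encoded covariates
param_space = {"penalizer": list(np.logspace(-3, 0, 20)), "l1_ratio": [0.0, 0.5, 1.0]}

def cv_cindex(params, n_splits=5):
    scores = []
    for tr, va in KFold(n_splits, shuffle=True, random_state=0).split(df):
        cph = CoxPHFitter(**params)
        cph.fit(df.iloc[tr], duration_col="time", event_col="event")
        risk = cph.predict_partial_hazard(df.iloc[va])
        # lifelines expects higher scores for longer survival, hence the negation
        scores.append(concordance_index(df["time"].iloc[va], -risk, df["event"].iloc[va]))
    return np.mean(scores)

# The paper used 1000 random configurations; a smaller number is shown here.
best = max(ParameterSampler(param_space, n_iter=50, random_state=0), key=cv_cindex)
print(best)
```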
When comparing the machine learning models with the standard Cox Proportional Hazards (CoxPH) model in terms of predictive performance, Table 3 presents the performance of each model on the test set. In our analysis, we employed the log-rank test to compare the concordance indices (C-index) across models. The results indicated that the three machine learning models – DeepSurv, N-MTLR, and RSF – demonstrated significantly superior discriminative ability compared to the standard Cox Proportional Hazards (CoxPH) model (p<0.01), as detailed in Table 4. Specifically, the C-index for DeepSurv was 0.7317, for NMTLR was 0.7353, and for RSF was 0.7336, compared to only 0.6837 for the standard CoxPH model. Among these three machine learning models, NMTLR had the highest C-index, demonstrating its superiority in predictive performance. Further analysis of the Integrated Brier Score (IBS) for each model revealed that the IBS for the four models were 0.1598 (NMTLR), 0.1632 (DeepSurv), 0.1648 (RSF), and 0.1789 (CoxPH), respectively (Fig. 3). The NMTLR model had the lowest IBS value, indicating its best performance in terms of uncertainty in the predictions. Additionally, there was no significant difference between the C-indices obtained from the training and test sets, suggesting that the NMTLR model has better generalization performance in the face of real-world complex data and can effectively avoid the phenomenon of overfitting.
Through calibration plots (Fig. 4), we observed that the NMTLR model demonstrated the best consistency between model predictions and actual observations for 1-year, 3-year, and 5-year overall survival rates, followed by the DeepSurv model, RSF model, and CoxPH model. This consistency was also reflected in the AUC values: for the prediction of 1-year, 3-year, and 5-year survival rates, the NMTLR and DeepSurv models had higher AUC values than the RSF and CoxPH models. Specifically, the 1-year AUC values were 0.803 for NMTLR and 0.794 for DeepSurv, compared to 0.786 for RSF and 0.766 for CoxPH; the 3-year AUC values were 0.808 for NMTLR and 0.809 for DeepSurv, compared to 0.797 for RSF and 0.772 for CoxPH; the 5-year AUC values were 0.819 for both DeepSurv and NMTLR, compared to 0.812 for RSF and 0.772 for CoxPH. The results indicate that, in predicting the survival prognosis of patients with hepatocellular carcinoma, the deep learning models – DeepSurv and NMTLR – demonstrate higher accuracy than the RSF and classical CoxPH models. The NMTLR model exhibited the best performance across multiple evaluation metrics.
The receiver operating characteristic (ROC) curves and calibration curves for 1-, 3-, and 5-year survival predictions. ROC curves for (A) 1-, (C) 3-, and (E) 5-year survival predictions. Calibration curves for (B) 1-, (D) 3-, and (F) 5-year survival predictions.
In the feature analysis of deep learning models, the impact of a feature on model accuracy when its values are replaced with random data can be measured by the percentage decrease in the concordance index (C-index). A higher decrease percentage indicates the feature's significant importance in maintaining the model's predictive accuracy. Figure 5 shows the feature importance heatmaps for the DeepSurv, NMTLR, and RSF models.
Heatmap of feature importance for DeepSurv, neural network multitask logistic regression (NMTLR) and random survival forest (RSF) models.
In the NMTLR model, the replacement of features such as age, race, marital status, histological type, tumor grade, T stage, N stage, alpha-fetoprotein, tumor size, type of surgery, and chemotherapy led to an average decrease in the concordance index by more than 0.1%. In the DeepSurv model, features like age, race, marital status, histological type, T stage, N stage, alpha-fetoprotein, tumor size, and type of surgery saw a similar average decrease in the concordance index when replaced with random data. In the RSF model, we found that features including age, race, tumor grade, T stage, M stage, tumor size, and type of surgery significantly impacted the model's accuracy, as evidenced by a noticeable decrease in the C-index, averaging a reduction of over 0.1% when replaced with random data.
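The permutation procedure behind these importance estimates can be sketched as follows: shuffle one feature at a time in the held-out data, recompute the C-index, and record the drop relative to the unshuffled baseline. A Cox model stands in for the deep models here, and the file and column names are placeholders.

```python
# Sketch of permutation feature importance measured as the drop in C-index
# when a single feature is replaced with a random shuffle of itself.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

train = pd.read_csv("train.csv")  # assumed: 'time', 'event', plus encoded covariates
test = pd.read_csv("test.csv")
cph = CoxPHFitter().fit(train, duration_col="time", event_col="event")

def cindex(data):
    # Negate the partial hazard: lifelines' C-index expects higher = longer survival.
    return concordance_index(data["time"], -cph.predict_partial_hazard(data), data["event"])

baseline = cindex(test)
rng = np.random.default_rng(0)
importance = {}
for col in test.columns.drop(["time", "event"]):
    shuffled = test.copy()
    shuffled[col] = rng.permutation(shuffled[col].values)
    importance[col] = baseline - cindex(shuffled)  # larger drop = more important feature

print(sorted(importance.items(), key=lambda kv: -kv[1]))
```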
In the training cohort, the NMTLR model was employed to predict patient risk probabilities. Optimal threshold values for these probabilities were determined using X-tile software. Patients were stratified into low-risk (<178.8), medium-risk (178.8–248.4), and high-risk (>248.4) categories based on these cutoff points. Statistically significant differences were observed in the survival curves among the groups, with a p-value of less than 0.001, as depicted in Fig. 6A. Similar results were replicated in the external validation cohort, as shown in Fig. 6B, underscoring the robust risk stratification capability of the NMTLR model.
Kaplan–Meier curves evaluated the risk stratification ability of the NMTLR model.
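A minimal sketch of this kind of risk stratification is shown below: predicted risk scores are cut at the reported thresholds (178.8 and 248.4), Kaplan–Meier curves are fitted per group, and a multivariate log-rank test compares them. The input file and column names are assumptions.

```python
# Sketch of risk stratification: cut predicted risk at fixed thresholds and
# compare Kaplan-Meier curves with a log-rank test. Thresholds follow the text.
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import multivariate_logrank_test

df = pd.read_csv("cohort_with_risk.csv")  # assumed: 'time', 'event', 'risk'
bins = [-float("inf"), 178.8, 248.4, float("inf")]
df["group"] = pd.cut(df["risk"], bins, labels=["low", "medium", "high"])

# One survival curve per risk group.
for name, grp in df.groupby("group", observed=True):
    KaplanMeierFitter().fit(grp["time"], grp["event"], label=str(name)).plot_survival_function()

# Multivariate log-rank test across the three groups.
result = multivariate_logrank_test(df["time"], df["group"], df["event"])
print(result.p_value)
```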
The web application developed in this study, primarily intended for research or informational purposes, is publicly accessible at http://120.55.167.119:8501/. The functionality and output visualization of this application are illustrated in Fig.7 and eFigure 1 in the Supplement.
The online web-based application of NMTLR model.
Enhancing customer retention in telecom industry with machine learning driven churn prediction | Scientific Reports – Nature.com
Posted: at 2:48 am
Developing a prognostic model using machine learning for disulfidptosis related lncRNA in lung adenocarcinoma … – Nature.com
Posted: at 2:48 am
Identification of prognostically relevant DRLs and construction of prognostic models
In our investigation of the LUAD landscape, we analyzed 16,882 lncRNAs derived from the TCGA-LUAD database. This comprehensive evaluation led to the identification of 708 DRLs, which demonstrate significant interactions with DRGs, as depicted in a Sankey diagram (Fig. 2A). Through further analysis incorporating data from three GEO databases, we narrowed these DRLs down to 199 lncRNAs consistently present across datasets, suggesting a pivotal role in LUAD pathogenesis (Fig. 2B). Our prognostic assessment using univariate Cox regression analysis revealed 37 lncRNAs with significant implications for LUAD patient outcomes (Fig. 2C). Leveraging these lncRNAs, we constructed a predictive model employing an ensemble of machine learning techniques, with the ensemble model (Supplementary Table 2) achieving a notably high C-index of 0.677 [95% confidence interval (CI) 0.63 to 0.73], suggesting robust predictive performance (Fig. 2D). This model's effectiveness was further validated through a risk stratification system, categorizing patients into high- and low-risk groups based on their lncRNA expression profiles. This stratification was substantiated by principal component analysis (PCA), which confirmed the distinct separation between the risk groups, underscoring the potential of our model in clinical risk assessment (Fig. 2E).
Construction of prognostic model composed of 27 DRLs. (A) Sankey diagram illustrating the relationship between DRGs and associated lncRNAs. (B) The intersection of DRLs sourced from the TCGA database and GEO database. (C) 27 lncRNAs after univariate Cox regression. (D) 101 prediction models evaluated, with C-index calculated for each across all validation datasets. (E) Principal Component Analysis of the low-risk and high-risk cohorts based on 27 DRLs.
Our survival analysis using the TCGA-LUAD dataset revealed a significant distinction in OS between the high- and low-risk groups identified through our model (p<0.001, log-rank test) (Fig. 3A). This finding was consistently replicated across three independent GEO datasets, demonstrating significant differences in both OS (GSE31210, p=0.001; GSE30219, p=0.019; GSE50081, p=0.025) (Fig. 3B–D) and DFS (GSE31210, p<0.001; GSE30219, p=0.009; GSE50081, p=0.023) (Supplementary Fig. S1A–C). The predictive power of the risk score was superior to that of traditional prognostic factors such as age, gender, and staging, as evidenced by the C-index comparison (Supplementary Fig. S1D). The risk score also emerged as an independent prognostic indicator in our univariate and multivariate Cox analyses (p<0.001) (Supplementary Table 3). Multicollinearity within the model was assessed using the variance inflation factor, which was below 10 for all variables (Supplementary Table 4). The AUC analysis further validated the robustness of our model, with one-year, two-year, and three-year AUCs of 0.76, 0.72, and 0.74, respectively, in the TCGA-LUAD dataset (Fig. 3F). The external validation using GEO datasets underscored the model's accuracy, particularly notable in GSE30219, GSE50081, and GSE31210 for the evaluated intervals (Fig. 3G–I).
Efficacy of the DRLs Survival Prognostic Risk Model. Kaplan–Meier (KM) analyses for high-risk and low-risk groups are exhibited in (A) TCGA-LUAD, (B) GSE31210, (C) GSE30219, and (D) GSE50081. (E) Kaplan–Meier (KM) survival curves for mutant and non-mutant groups. Analysis of 1-, 2-, and 3-year ROC curves for (F) TCGA-LUAD, (G) GSE30219, (H) GSE50081, and (I) GSE31210.
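The time-dependent AUCs reported above can be estimated with scikit-survival's IPCW-based estimator, roughly as in the sketch below. It assumes survival times are recorded in months and that a per-patient risk score has already been computed; the file and column names are placeholders.

```python
# Sketch of time-dependent AUC at 1-, 2-, and 3-year horizons using
# scikit-survival's cumulative/dynamic AUC. 'risk' is the model's predicted risk.
import pandas as pd
from sksurv.util import Surv
from sksurv.metrics import cumulative_dynamic_auc

train = pd.read_csv("train.csv")  # assumed: 'time' (months), 'event' (0/1), 'risk'
test = pd.read_csv("test.csv")

y_train = Surv.from_arrays(event=train["event"].astype(bool), time=train["time"])
y_test = Surv.from_arrays(event=test["event"].astype(bool), time=test["time"])

times = [12, 24, 36]  # evaluation horizons in months
auc, mean_auc = cumulative_dynamic_auc(y_train, y_test, test["risk"], times)
print(dict(zip(times, auc)), mean_auc)
```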
Further analysis showed gender-specific differences in risk scores across various pathological stages. In early stages (I and II), men exhibited significantly higher risk scores compared to women (Stage I: p=0.015; Stage II: p=0.006; Wilcoxon test) (Supplementary Fig. S2A,B). However, these differences were not observed in later stages (III/IV) (p=0.900, Wilcoxon test) (Supplementary Fig. S2C), suggesting stage-specific risk dynamics. In addition, our study uncovered notable disparities in risk scores among patients with mutations in EGFR, ALK, and KRAS genes in the GSE31210 dataset (p<0.001, Kruskal–Wallis test) (Supplementary Fig. S2D). Patients harboring these mutations also exhibited better OS compared to those without (p=0.018, log-rank test) (Fig. 3E), highlighting the potential prognostic relevance of genetic profiles in LUAD. The impact of smoking, a known risk factor for LUAD, was evident as significant differences in risk scores between smokers and non-smokers were observed in analyses of the GSE31210 and GSE50081 datasets (GSE31210, p=0.003; GSE50081, p=0.027; Wilcoxon test) (Supplementary Fig. S2E,F).
To enhance our model's utility in clinical decision-making, we developed a nomogram that incorporates the identified risk scores alongside essential clinical parameters – age, gender, and TNM staging. This integration aims to provide a more comprehensive tool for predicting the prognosis of LUAD patients (Fig. 4A). We rigorously validated the nomogram's predictive accuracy using calibration curves, which compare the predicted survival probabilities against the observed outcomes. The results demonstrated a high degree of concordance, indicating that our nomogram accurately reflects patient survival rates (Fig. 4B). Further assessment through DCA (Fig. 4C–E) confirmed that the nomogram provides substantial clinical benefit. Notably, the analysis showed that the nomogram significantly outperforms the predictive capabilities of the risk score alone, particularly in terms of net benefit across a wide range of threshold probabilities.
Development of a Nomogram for Risk Prediction & Analysis of Mutation Patterns in Both Risk Groups. (A) Nomogram that combines the model and clinicopathological factors. (B) Calibration curves at 1-, 3-, and 5-year for the nomogram. (C–E) The decision curve analysis (DCA) of the nomogram and clinical characteristics at 1-, 3-, and 5-year. (F) TMB levels between the high-risk and low-risk groups. (G) Gene mutation waterfall chart of the low-risk group. (H) Gene mutation waterfall chart of the high-risk group.
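Decision curve analysis compares the net benefit of acting on the model's predictions against treat-all and treat-none strategies across threshold probabilities. Below is a minimal sketch of the binary-outcome form of the calculation (a full survival-outcome DCA would additionally account for censoring); the synthetic data are placeholders, not study results.

```python
# Sketch of decision curve analysis: net benefit of a binary risk prediction
# across threshold probabilities, compared against a treat-all strategy.
import numpy as np

def net_benefit(y_true, y_prob, thresholds):
    # Net benefit = TP/n - FP/n * (pt / (1 - pt)) at each threshold pt.
    n = len(y_true)
    out = []
    for pt in thresholds:
        pred = y_prob >= pt
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        out.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(out)

thresholds = np.linspace(0.05, 0.60, 50)
rng = np.random.default_rng(0)
y_prob = rng.uniform(0, 1, 500)                          # placeholder predicted risks
y_true = (rng.uniform(0, 1, 500) < y_prob).astype(int)   # synthetic outcomes
nb_model = net_benefit(y_true, y_prob, thresholds)
nb_all = net_benefit(y_true, np.ones(500), thresholds)   # treat-everyone strategy
print(nb_model[:5], nb_all[:5])
```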
A marked difference in TMB was discerned between the high- and low-risk cohorts (p<0.001 by Wilcoxon test) (Fig. 4F). The waterfall plot delineates the mutational landscape of the ten most prevalent genes across both risk strata. In the low-risk cohort, approximately 84.53% of specimens exhibited gene mutations (Fig. 4G), whereas in the high-risk stratum, mutations were observed in roughly 95.33% of specimens (Fig. 4H). Predominant mutations within the high-risk category included TP53, TTN, and CSMD3.
The differential expression analysis revealed a total of 1474 DEGs between the low-risk and high-risk cohorts. Among these, 568 genes were upregulated and 906 genes were downregulated. The volcano plot (Supplementary Fig. S2G) illustrates the distribution of these DEGs. These results indicate that specific genes are significantly associated with risk stratification in our study cohort. In the GO analysis (Fig. 5A,D), DEGs showed predominant enrichment in terms of molecular functions such as organic anion transport and carboxylic acid transport. Regarding cellular components, the main enrichment was observed in the apical plasma membrane (Fig. 5C). Figure 5E demonstrates the GSEA results, highlighting significant enrichment of specific gene sets related to metabolic processes, DNA binding, and hyperkeratosis. The KEGG results highlighted a significant enrichment of DEGs in neuroactive ligand-receptor interaction and the cAMP signaling pathway (Fig. 5B).
Biological function analysis of the DRLs risk score model. The top 5 significant terms of (A) GO function enrichment and (B) KEGG function enrichment. (C,D) System clustering dendrogram of cellular components. (E) Gene set enrichment analysis.
To validate the precision of our results, we employed seven techniques – CIBERSORT, EPIC, MCP-counter, xCell, TIMER, quanTIseq, and ssGSEA – to assess immune cell infiltration in both high-risk and low-risk categories (Fig. 6A). With the ssGSEA data, we explored the connection between the TME and several characteristics of lung adenocarcinoma patients, such as age, gender, and disease stage (Fig. 6B). We then visualized this data with box plots for both CIBERSORT and ssGSEA (Fig. 6C,D). These plots showed that the infiltration levels of memory B cells, resting memory CD4 T cells, and monocytes were notably lower in the high-risk group than in the low-risk group. With the help of the ESTIMATE algorithm, we evaluated the stromal (Fig. 6F), immune (Fig. 6E), and ESTIMATE scores (Supplementary Fig. S3A) across the different risk groups. This allowed us to gauge tumor purity. Our study suggests that the high-risk group has reduced stromal, ESTIMATE, and immune scores. Conversely, tumor purity was higher in the high-risk group than in the low-risk group (Supplementary Fig. S3B).
The tumor microenvironment between high-risk and low-risk groups based on DRLs. (A) Comparing the levels of immune cell infiltration for different immune cell types in the CIBERSORT, EPIC, MCP-counter, xCell, TIMER and quanTIseq algorithms for low-risk and high-risk groups. (B) Immune infiltration across different lung adenocarcinoma patient characteristics. Box plots of the difference in immune cell infiltration between the high-risk and low-risk score groups based on (C) CIBERSORT and (D) ssGSEA. *p-value<0.05, **p-value<0.01, ***p-value<0.001, ns=no significance. (E) Immune score and (F) stromal score were lower in the high-risk group than in the low-risk group.
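The group comparisons of immune and stromal scores reported here are typically performed with a Wilcoxon rank-sum (Mann-Whitney U) test, as in the sketch below. The input file and column names are hypothetical, and the scores themselves would come from the ESTIMATE algorithm run upstream.

```python
# Sketch comparing ESTIMATE-derived immune and stromal scores between risk
# groups with a Wilcoxon rank-sum (Mann-Whitney U) test. Names are placeholders.
import pandas as pd
from scipy.stats import mannwhitneyu

df = pd.read_csv("estimate_scores.csv")  # assumed: 'risk_group', 'immune', 'stromal'
high = df[df["risk_group"] == "high"]
low = df[df["risk_group"] == "low"]

for score in ["immune", "stromal"]:
    stat, p = mannwhitneyu(high[score], low[score], alternative="two-sided")
    print(score, p)
```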
We calculated the TIDE score and predicted the immunotherapy response in both the high-risk and low-risk groups (Fig. 7A). Based on results from both datasets, patients in the low-risk group seem more inclined to show a positive response to immunotherapy. Additionally, IPS for the combination of anti-CTLA4 and anti-PDL1 treatment, as well as for anti-CTLA4 alone, was consistently higher in the low-risk group (Fig. 7B,C). However, the analysis of anti-PDL1 treatment alone (p=0.170) did not reach statistical significance (Fig. 7D). This suggests that low-risk patients may respond better to anti-CTLA4 and/or anti-PDL1 immunotherapy. Recently, research has found a link between tumor tertiary lymphoid structures (TLS) and outcomes in several tumor types. In line with these discoveries, our review of the TCGA-LUAD dataset showed that LUAD patients with high TLS scores had more favorable outcomes than those with low scores (Fig. 7F). We also noticed that the TLS score was higher in the low-risk group compared to the high-risk group (Fig. 7E).
Immunotherapeutic sensitivity between high-risk and low-risk groups based on DRLs. (A) Differences in risk scores between the TIDE responsive and nonresponsive groups. (B–D) Sensitivity of high- and low-risk groups to combination therapy, anti-CTLA4, and anti-PDL1 by different IPS scores. (E) Differences in tumor tertiary lymphoid structure (TLS) scores between high-risk and low-risk groups in TCGA-LUAD. (F) KM analysis of high-TLS and low-TLS groups.
In our assessment of the relationship between risk scores and sensitivity to chemotherapy, we measured the IC50 of several widely used chemotherapeutic agents. Our findings showed that the high-risk group was more sensitive to drugs such as Cisplatin, Vinblastine, Cytarabine, Vinorelbine, Bexarotene, Cetuximab, Docetaxel, and Doxorubicin than the low-risk group (Fig. 8A–P).
Immunotherapy sensitivity analysis and in-depth study of LINC00857. (A–P) Differences in drug sensitivity between high-risk and low-risk groups. (Q) Volcano plot for GTEX_Lung vs. TCGA_Lung_Adenocarcinoma.
Through differential gene expression analysis of tumor tissues and normal tissues, 13,995 DEGs (|logFC|>1.5, p-value<0.050) were identified (Fig. 8Q, Supplementary Fig. S3C). By cross-referencing with the 27 lncRNAs that form our prognostic model, we pinpointed LINC01003. Supplementary Fig. S4A presents a heatmap demonstrating the expression levels of LINC01003 across different NSCLC datasets and cell types. The results indicate that LINC01003 is differentially expressed, with notably high expression in monocytes/macrophages and endothelial cells across several datasets, suggesting its potential involvement in these cell types within the NSCLC tumor microenvironment. Supplementary Figure S4B further illustrates the expression profile of LINC01003 in different cell populations from the GSE143423 dataset. The violin plot shows significant expression of LINC01003 in malignant cells compared to other cell types, indicating its potential role in tumor progression.
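The DEG filter described above (|logFC| > 1.5, p < 0.05) reduces to a simple thresholding step once a differential expression results table is available, for example as sketched below. The results file and its column names are hypothetical placeholders (e.g., output of an upstream limma- or DESeq2-style analysis).

```python
# Sketch of the DEG filter described in the text: |logFC| > 1.5 and p < 0.05.
import pandas as pd

res = pd.read_csv("diff_expression.csv")  # assumed columns: 'gene', 'logFC', 'pvalue'
degs = res[(res["logFC"].abs() > 1.5) & (res["pvalue"] < 0.05)]
up = degs[degs["logFC"] > 0]      # upregulated in tumor vs. normal
down = degs[degs["logFC"] < 0]    # downregulated in tumor vs. normal
print(len(degs), len(up), len(down))
```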
To decipher the LINC00857 related regulatory mechanisms, we constructed a lncRNA-miRNA-mRNA network (Supplementary Fig. S4C). This network illustrates the intricate interactions between LINC00857 and various miRNAs and mRNAs. In this network, LINC00857 acts as a central regulatory hub, potentially influencing gene expression by sequestering multiple miRNAs, such as hsa-miR-4709-5p, hsa-miR-760, and hsa-miR-340-5p. These miRNAs, in turn, are connected to a wide array of target genes, including YWHAZ, BCL2L2, PTEN, and MYC, which are critical in cellular processes such as cell cycle regulation, apoptosis, and signal transduction.
Assessing calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare … – Nature.com
Posted: at 2:48 am
The primary training cohort used to recalibrate the model included 49,652 patients (median [IQR] age = 66.0 [26.0]), of which 49.9% self-identified as female, 29.6% self-identified as Black or African American, 54.8% were on Medicare and 27.8% on Medicaid. 11,664 (24%) malnutrition cases were identified. Baseline characteristics are summarized in Table 1 and malnutrition event rates are summarized in Supplementary Table 2. The validation cohort used to test the model included 17,278 patients (median [IQR] age = 66.0 [27.0]), of which 49.8% self-identified as female, 27.1% self-identified as Black or African American, 52.9% were on Medicare, and 28.2% on Medicaid. 4,005 (23%) malnutrition cases were identified.
Although the model overall had a c-index of 0.81 (95% CI: 0.80, 0.81), it was miscalibrated according to both weak and moderate calibration metrics, with a Brier score of 0.26 (95% CI: 0.25, 0.26) (Table 2), indicating that the model is relatively inaccurate [17]. It also overfitted the risk estimate distribution, as evidenced by the calibration curve (Supplementary Fig. 1). Logistic recalibration of the model successfully improved calibration, bringing the calibration intercept to −0.07 (95% CI: −0.11, −0.03) and the calibration slope to 0.88 (95% CI: 0.86, 0.91), and significantly decreasing the Brier score (0.21, 95% CI: 0.20, 0.22), Emax (0.03, 95% CI: 0.01, 0.05), and Eavg (0.01, 95% CI: 0.01, 0.02). Recalibrating the model improved specificity (0.74 to 0.93), PPV (0.47 to 0.60), and accuracy (0.74 to 0.80) while decreasing sensitivity (0.75 to 0.35) and NPV (0.91 to 0.83) (Supplementary Tables 2 and 3).
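For readers unfamiliar with logistic recalibration, the sketch below shows the usual recipe: regress the observed outcome on the logit of the original predicted probability to obtain the calibration slope, fit an intercept-only model with that logit as an offset to obtain the calibration intercept, and use the refit model's predictions as the recalibrated risks. The data here are synthetic stand-ins, not the study's patient data.

```python
# Sketch of logistic recalibration plus calibration intercept, slope, and Brier score.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import brier_score_loss

def logit(p):
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

# Synthetic stand-ins: p_orig would be the deployed model's predicted probabilities,
# y the observed malnutrition outcomes (0/1).
rng = np.random.default_rng(0)
p_orig = rng.uniform(0.01, 0.99, 5000)
y = (rng.uniform(0, 1, 5000) < p_orig ** 1.5).astype(int)  # deliberately miscalibrated

lp = logit(p_orig)

# Calibration slope: coefficient from a logistic regression of outcome on the logit.
slope_fit = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()
cal_slope = slope_fit.params[1]

# Calibration intercept (calibration-in-the-large): intercept-only model with the
# original logit entered as an offset, i.e. slope fixed at 1.
int_fit = sm.GLM(y, np.ones((len(lp), 1)), family=sm.families.Binomial(), offset=lp).fit()
cal_intercept = int_fit.params[0]

# Logistic recalibration: the refit model's fitted probabilities are the new risks.
p_recal = slope_fit.predict(sm.add_constant(lp))

print(cal_intercept, cal_slope,
      brier_score_loss(y, p_orig), brier_score_loss(y, p_recal))
```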
Weak and moderate calibration metrics between Black and White patients significantly differed prior to recalibration (Table 3, Supplementary Fig. 2A, B), with the model having a more negative calibration intercept for White patients on average compared to Black patients (−1.17 vs. −1.07), and Black patients having a higher calibration slope compared to White patients (1.43 vs. 1.29). Black patients had a higher Brier score of 0.30 (95% CI: 0.29, 0.31) compared to White patients with 0.24 (95% CI: 0.23, 0.24). Logistic recalibration significantly improved calibration for both Black and White patients (Table 4, Fig. 1a–c). For Black patients within the hold-out set, the recalibrated calibration intercept was 0 (95% CI: −0.07, 0.05), the calibration slope was 0.91 (95% CI: 0.87, 0.95), and the Brier score improved from 0.30 to 0.23 (95% CI: 0.21, 0.25). For White patients within the hold-out set, the recalibrated calibration intercept was −0.15 (95% CI: −0.20, −0.10), the calibration slope was 0.82 (95% CI: 0.78, 0.85), and the Brier score improved from 0.24 to 0.19 (95% CI: 0.18, 0.21). Post-recalibration, calibration for Black and White patients still differed significantly according to weak calibration metrics, but not according to moderate calibration metrics and the strong calibration curves (Table 4, Fig. 1). Calibration curves of the recalibrated model showed good concordance between actual and predicted event probabilities, although the predicted risks for Black and White patients differed between the 30th and 60th risk percentiles. Logistic recalibration also improved the specificity, PPV, and accuracy, but decreased the sensitivity and NPV of the model across both White and Black patients (Supplementary Tables 2 and 3). Discriminative ability was not significantly different for White and Black patients before and after recalibration. We also found calibration statistics to be relatively similar in Asian patients (Supplementary Table 4).
Columns from left to right are curves for (a) No Recalibration, (b) Recalibration-in-the-Large, and (c) Logistic Recalibration for Black vs. White patients; and (d) No Recalibration, (e) Recalibration-in-the-Large, and (f) Logistic Recalibration for male vs. female patients.
Calibration metrics between male and female patients also significantly differed prior to recalibration (Table 3, Supplementary Fig. 2C, D). The model had a more negative calibration intercept for female patients on average compared to male patients (−1.49 vs. −0.88). Logistic recalibration significantly improved calibration for both male and female patients (Table 4, Fig. 1d–f). In male patients within the hold-out set, the recalibrated calibration intercept was 0 (95% CI: −0.05, 0.03), the calibration slope was 0.88 (95% CI: 0.85, 0.90), and the Brier score improved from 0.29 to 0.23 (95% CI: 0.22, 0.24). In female patients within the hold-out set, the recalibrated calibration intercept was −0.11 (95% CI: −0.16, −0.06) and the calibration slope was 0.91 (95% CI: 0.87, 0.94), but the Brier score did not significantly improve. After logistic recalibration, only calibration intercepts differed between male and female patients. Calibration curves of the recalibrated model showed good concordance, although the predicted risks for males and females differed between the 10th and 30th risk percentiles. Discrimination metrics for male and female patients were significantly different before recalibration. The model had a higher sensitivity and NPV for females than males, but a lower specificity, PPV, and accuracy (Supplementary Table 2). The recalibrated model had the highest specificity (0.95, 95% CI: 0.94, 0.96), NPV (0.84, 95% CI: 0.83, 0.85), and accuracy (0.82, 95% CI: 0.81, 0.83) for female patients, at the cost of substantially decreasing sensitivity (0.27, 95% CI: 0.25, 0.30) (Supplementary Table 3).
We also assessed calibration by payor type and hospital type as sensitivity analyses. In the payor type analysis, we found that malnutrition predicted risk was more miscalibrated in patients with commercial insurance with more extreme calibration intercepts, Emax, and Eavg suggesting overestimation of risk (Supplementary Tables 5 and 6, Supplementary Fig. 3A, B). We did not observe substantial differences in weak or moderate calibration across hospital type (community, tertiary, quaternary) except that tertiary acute care centers had a more extreme calibration intercept, suggesting an overestimation of risk (Supplementary Tables 7 and 8, Supplementary Fig. 3C, D). Across both subgroups, logistic recalibration significantly improved calibration across weak, moderate, and strong hierarchy tiers (Supplementary Table 5, Supplementary Table 7, Supplementary Figs. 4 and 5).
5 Key Ways AI and ML Can Transform Retail Business Operations – InformationWeek
Posted: at 2:48 am
Odds are you've heard more about artificial intelligence and machine learning in the last two years than you had in the previous 20. That's because advances in the technology have been exponential, and many of the world's largest brands, from Walmart and Amazon to eBay and Alibaba, are leveraging AI to generate content, power recommendation engines, and much more.
Investment in this technology is substantial, with exponential growth projected -- the AI in retail market was valued at $7.14 billion in 2023, with the potential to reach $85 billion by 2032.
Brands of all sizes are eyeing this technology to see how it fits into their retail strategies. Let's take a look at some of the impactful ways AI and ML can be leveraged to drive business growth.
One of the major hurdles for retailers -- particularly those with large numbers of SKUs -- is creating compelling, accurate product descriptions for every new product added to their assortment. When you factor in the ever-increasing number of platforms on which a product can be sold, from third-party vendors like Amazon to social selling sites to a brand's own website, populating that amount of content can be unsustainable.
One of the areas in which generative AI excels is creating compelling product copy at scale. Natural language generation (NLG) algorithms can analyze vast amounts of product data and create compelling, tailored descriptions automatically. This copy can also be adapted to each channel, fitting specific parameters and messaging towards focused audiences. For example, generative AI engines understand the word count restrictions for a particular social channel. They can focus copy to those specifications, tailored to the demographic data of the person who will encounter that message. This level of personalization at scale is astonishing.
This use of AI has the potential to help brands achieve business objectives through product discoverability and conversion by creating compelling content optimized for search.
Another area in which AI and ML excel is in the cataloging and organizing of data. Again, when brands deal with product catalogs with hundreds of thousands of SKUs spread across many channels, it is increasingly difficult to maintain consistency and clarity of information. Product, inventory, and eCommerce managers spend countless hours attempting to keep all product information straight and up-to-date, and they still make mistakes.
Brands can leverage AI to automate tasks such as product categorization, attribute extraction, and metadata tagging, ensuring accuracy and scalability in data management across all channels. This use of AI takes the guesswork and labor out of meticulous tasks and can have wide-ranging business implications. More accurate product information means a reduction in returns and improved product searchability and discoverability through intuitive data architecture.
As online shopping has evolved over the past decade, consumer expectations have shifted. Customers rarely go to company websites and browse endless product pages to discover the product they're looking for. Rather, customers expect a curated and personalized experience, regardless of the channel through which they're encountering the brand. A report from McKinsey showed that 71% of customers expect personalization from a brand, and 76% get frustrated when they don't encounter it.
Brands have been offering personalized experiences for decades, but AI and ML unlock entirely new avenues for personalization. Once again, AI enables an unprecedented level of scale and nuance in personalized customer interactions. By analyzing vast amounts of customer data, AI algorithms can connect the dots between customer order history, preferences, location, and other identifying user data to create tailored product recommendations, marketing messages, shopping experiences, and more.
This focus on personalization is key for business strategy and hitting benchmarks. Personalization efforts lead to increases in conversion, higher customer engagement and satisfaction, and better brand experiences, which can lead to long-term loyalty and customer advocacy.
Search functionalities are in a constant state of evolution, and the integration of AI and ML is that next leap. AI-powered search algorithms are better able to process natural language, enabling a brand to understand user intent and context, which improves search accuracy and relevance.
Whats more, AI-driven search can provide valuable insights into customer behavior and preferences, enabling brands to optimize product offerings and marketing strategies. By analyzing search patterns and user interactions, brands can identify emerging trends, optimize product placement, and tailor promotions to specific customer segments. Ultimately, this enhanced search experience improves customer engagement while driving sales growth and fostering long-term customer relationships.
At its core, the main benefit of AI and ML tools is that they're always working and never burn out. This fact is felt strongest when applied to customer support. Tools like chatbots and virtual assistants enable brands to provide instant, personalized assistance around the clock and around the world. This automation reduces wait times, improves response efficiency, and frees staff to focus on higher-level tasks.
Much like personalization engines used in sales, AI-powered customer support tools can process vast amounts of customer data to tailor responses based on a customers order history and preferences. Also, like personalization, these tools can be deployed to radically reduce the amount of time customer support teams spend on low-level inquiries like checking order status or processing returns. Leveraging AI in support allows a brand to allocate resources in more impactful ways without sacrificing customer satisfaction.
Brands are just scratching the surface of the capabilities of AI and ML. Still, early indicators show that this technology can have a profound impact on driving business growth. Embracing AI can put brands in a position to transform operational efficiency while maintaining customer satisfaction.
Machine learning helps find advantageous combination of salts and organic solvents for easier anti-icing operations – Phys.org
Posted: at 2:48 am
An Osaka Metropolitan University research team has found a deicing mixture with high effectiveness and low environmental impact after using machine learning to analyze ice melting mechanisms of aqueous solutions of 21 salts and 16 organic solvents. The research appears in Scientific Reports on June 7, 2024.
The dangers of frozen roads, airplane engines, and runways are well known, but using commercial deicing products often means trading long-term environmental degradation for short-term safety. Seeking a better product, Osaka Metropolitan University researchers have developed a deicing mixture offering higher performance than deicers on the market while also having less impact on the environment.
The team, made up of graduate student Kai Ito, Assistant Professor Arisa Fukatsu, Associate Professor Kenji Okada, and Professor Masahide Takahashi of the Graduate School of Engineering, used machine learning to analyze ice melting mechanisms of aqueous solutions of 21 salts and 16 organic solvents. The group then conducted experiments to find that a mixture of propylene glycol and aqueous sodium formate solution showed the best ice penetration capacity.
Because of the mixture's effectiveness, less of the substance needs to be used, thereby also lessening the environmental impact. It is also not corrosive, preventing damage, for example, when used for airport runways.
"We are proposing an effective and environmentally friendly deicer that combines the advantages of salts and organic solvents," said Dr. Fukatsu.
The results of this research also provide new insights into the ice melting process.
"The development of highly efficient deicers is expected to make deicing and anti-icing operations easier," Professor Takahashi added. "This will also lessen the environmental impact by reducing the amount of deicer used."
More information: Machine learning-assisted chemical design of highly efficient deicers, Scientific Reports (2024). DOI: 10.1038/s41598-024-62942-y
Efficient deep learning-based approach for malaria detection using red blood cell smears | Scientific Reports – Nature.com
Posted: at 2:48 am
This section contains the details of the proposed methodology and details of the pre-trained models employed in this study. Figure 1 shows the workflow of the proposed methodology. The details of each step are provided in the subsequent sections.
Work flow of proposed methodology.
The dataset used in this study was obtained from the public data repository. It contains a total of 27,558 cell images with 13,779 parasitized images and 13,779 uninfected images. These images were obtained from 150 unhealthy patients (infected individuals) and 50 healthy patients. The expert slide-readers and pathologists manually annotated the whole dataset. Color variations in red cell images are due to different blood stains during the image acquisition process. Figure 2 shows samples of parasitized cell images and uninfected cell images.
Samples taken from the red blood cell image datasets contain parasitized cell images and uninfected cell images.
Preprocessing is a crucial initial step in deep learning image classification tasks. The dataset contains 13,779 images of parasitized cells and 13,779 images of uninfected cells, so the two classes are equally balanced. The cell images vary in width and height, whereas the deep learning models require fixed-size input, so we resized the images to a common size to ensure the models' robustness and compatibility. After resizing, the next important step is to split the cell images into two parts: training and testing. 80% of the data is used to train the deep learning models and 20% is kept for testing model efficacy and performance. Table 1 shows the numbers of parasitized and uninfected images after splitting the data into train and test sets.
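As a rough illustration of the resizing and 80/20 split described above, the snippet below uses Keras dataset utilities. The directory path, the 160x160 target size, and the batch size are assumptions made for illustration rather than details taken from the study.

```python
# Minimal sketch of the preprocessing step: resize all cell images to a fixed size
# and split them 80/20 into training and test sets. Paths and sizes are assumed.
import tensorflow as tf

IMG_SIZE = (160, 160)   # assumed fixed input size
COMMON = dict(
    validation_split=0.2,   # 80% train / 20% test
    seed=42,
    image_size=IMG_SIZE,    # resizes every image to a common shape
    batch_size=32,
    label_mode="binary",    # parasitized vs. uninfected
)

train_ds = tf.keras.utils.image_dataset_from_directory(
    "cell_images/",         # hypothetical folder with Parasitized/ and Uninfected/ subfolders
    subset="training", **COMMON)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "cell_images/",
    subset="validation", **COMMON)
```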
Deep learning models learn complex patterns in data through multiple layers and have demonstrated effectiveness in many image classification tasks in medical, engineering, and other applications. Deep learning models work well on large datasets but consume substantial computational resources. Careful hyperparameter settings, the choice of loss function, and the arrangement of layers help address these problems by speeding up the training of deep learning models, reducing computational time, reducing the number of layers, and producing efficient deep models30. Transfer learning is a popular technique that reuses pre-trained models trained on large datasets such as ImageNet and produces better results for small datasets (Table 2).
Architecture of proposed deep learning model for malaria detection.
EfficientNet-B2 is a CNN model that is exceptionally accurate and reliable and is mostly used for image classification problems. It is well suited to problems that require fewer parameters and minimal processing resources. Using depth-wise separable convolutions (DWSC) and an efficient scaling approach, this model improves classification accuracy. The main motivation for using EfficientNet-B2 in disease detection is its efficiency and accuracy combined with its small model size and minimal computing requirements. Figure 3 shows the architecture of the proposed model. On top of the EfficientNet-B2 base, a dropout layer is added, yielding an output shape of (5, 5, 1408). A flatten layer converts this multi-dimensional output into a one-dimensional vector. After flattening, we use three dense layers, four batch normalization layers, and three activation layers. The first two dense layers use ReLU activation functions; the Rectified Linear Unit (ReLU) not only captures complicated patterns effectively but also lowers the risk of overfitting and generalization error, improving the model overall. The last dense layer employs the sigmoid activation function, which is appropriate for binary classification. Batch normalization is an essential component of deep learning architectures that improves accuracy while simultaneously speeding up the training process.
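A hedged Keras sketch of the classification head described above is given below. The hidden-layer widths, dropout rate, and optimizer are illustrative assumptions; the 160x160 input size is chosen only so that the frozen EfficientNet-B2 base emits the (5, 5, 1408) feature map mentioned in the text, and the layer counts (three dense, four batch normalization, three activation layers) follow the description.

```python
# Hedged sketch of an EfficientNet-B2-based binary classifier as described above.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.EfficientNetB2(
    include_top=False, weights="imagenet", input_shape=(160, 160, 3))
base.trainable = False  # transfer learning: reuse ImageNet features

model = models.Sequential([
    base,
    layers.Dropout(0.3),                 # dropout layer noted in the text (rate assumed)
    layers.Flatten(),                    # (5, 5, 1408) -> one-dimensional vector
    layers.BatchNormalization(),
    layers.Dense(256), layers.BatchNormalization(), layers.Activation("relu"),
    layers.Dense(64),  layers.BatchNormalization(), layers.Activation("relu"),
    layers.Dense(1),   layers.Activation("sigmoid"),   # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```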
During training, batch normalization uses a mini-batch of data to calculate the mean and standard deviation of each feature and then uses these statistics to standardize the input. This minimizes internal covariate shift, the change in the distribution of network activations caused by changes in the parameters during training, so that training can proceed more efficiently. Standardizing the input also increases the effectiveness of optimization techniques, so the model can be trained more quickly and is less likely to suffer from vanishing or exploding gradients. Additionally, batch normalization acts as a regularizer, reducing the need for other regularization methods.
Malaria can be detected by analyzing red blood cell images for signs of infection using deep learning models. The proposed model is trained to identify malaria-related features from a collection of expert-labeled blood cell images. Once adequately trained, the model can evaluate newly obtained blood cell samples, classify them as either infected or uninfected with malaria, and offer medical personnel useful information, enabling a faster and more precise diagnosis. Deploying deep learning-based malaria detection models in clinical settings offers several potential advantages. These models can deliver precise and prompt diagnoses, particularly in regions where skilled microscopists are scarce, and they allow front-line healthcare professionals to identify the infection promptly so that treatment can begin sooner, reducing the incidence and mortality associated with malaria. Moreover, automated analysis can efficiently handle large volumes of samples, alleviating the workload of laboratory personnel, particularly during outbreaks or monitoring initiatives.
This study also employed fine-tuned deep learning models such as a custom CNN, VGG-16, DenseNet versions 121, 169, and 201, and Inception version 3 for malaria detection. The different pre-trained, fine-tuned deep learning models and their trainable parameters are given in Table 3.
A CNN is a type of neural network consisting of numerous layers that aims to identify patterns directly from image pixels and requires minimal pre-processing31. The convolution layer, the pooling layer, and the fully connected layer are the three essential layers widely considered to be the foundation of a CNN. We utilized three convolution blocks, three max-pooling blocks, and three blocks of batch normalization, ReLU activation, and dropout layers. The convolution layer, a fundamental component of a CNN, performs the majority of the computational work; it applies the convolution (filtering) operation to the input and then passes the response to the subsequent layer. We place a pooling layer between successive convolution layers to spatially reduce the input representation and the required processing. This layer performs the pooling operation on each slice of the input, thereby reducing the computational workload for the subsequent convolution layer. After that, we flatten all the layers into a single dimension and then add two dense layers with batch normalization and ReLU activation. A final fully connected (sigmoid) layer generates the output, whose size equals the number of classes32. The detailed architecture of the CNN is shown in Fig. 4.
Detailed CNN architecture.
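A minimal sketch of such a CNN, following the description above (three convolution blocks with batch normalization, ReLU, pooling, and dropout, then two dense blocks and a sigmoid output), might look as follows; the filter counts, dropout rates, and input size are assumptions.

```python
# Hedged sketch of the custom CNN described above.
from tensorflow.keras import layers, models

def conv_block(x, filters):
    """Conv -> BatchNorm -> ReLU -> MaxPool -> Dropout, as one convolution block."""
    x = layers.Conv2D(filters, (3, 3), padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    return layers.Dropout(0.25)(x)

inputs = layers.Input(shape=(160, 160, 3))          # input size assumed
x = inputs
for filters in (32, 64, 128):                       # three convolution blocks
    x = conv_block(x, filters)
x = layers.Flatten()(x)
for units in (128, 64):                             # two dense blocks with BN + ReLU
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # binary: parasitized vs. uninfected
cnn = models.Model(inputs, outputs)
```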
In 2014, VGG16 won the ILSVRC (ImageNet) competition and is now considered one of the most advanced vision models available. The VGG-16 network was trained on the ImageNet database and consists of 16 weighted layers, including 13 convolutional layers and 3 fully connected layers. Despite limited image datasets, the VGG-16 network delivers high accuracy due to its extensive training. VGG16 is capable of both object detection and classification with 92.7% accuracy, classifying images into 1000 unique categories. It is a widely used image classification model that is easy to apply using transfer learning. By adding new layers to the network and utilizing batch normalization, the training process can be accelerated, making learning easier and the model more robust33.
Inception V3 is a deep CNN architecture introduced in 2015 by Google researchers. It is the third version of the Inception family of models and is designed to be more efficient and accurate than its predecessors. The Inception V3 model boasts a more expansive network compared to its predecessors, the Inception V1 and V2 models. This deep CNN is specifically designed to be trained on low-configuration computers, although it is still a challenging process that can take several days. Transfer learning provides a solution to this issue by retaining the parameters of previous layers while only updating the last layer for new categories. This approach involves deconstructing the Inception V3 model by removing its final layer, thereby leveraging the benefits of transfer learning34.
DenseNet121 is a CNN architecture that has gained widespread use in image classification tasks since its introduction in 2017. The DenseNet121 architecture aims to increase the depth of deep learning networks while improving their training efficiency, which is achieved through short connections between layers: in DenseNet, each layer is connected to every deeper layer in the network. The number 121 refers to the count of layers with trainable weights, excluding batch normalization layers. The remaining 5 layers consist of the initial 7x7 convolutional layer, 3 transition layers, and a fully connected layer35.
DenseNet169 is a deep CNN architecture that is part of the DenseNet family of models. It was introduced by researchers at Facebook AI Research in 2017 as an improvement over the original DenseNet model. DenseNet169 has 169 layers, which is more than the original DenseNet but less than DenseNet201. Like other DenseNet models, DenseNet169 uses dense connectivity to promote feature reuse and reduce the number of parameters needed to train the network. It also includes bottleneck layers to reduce the computational cost of convolutions. DenseNet169 has achieved state-of-the-art performance on several benchmark datasets, making it a popular choice for image classification tasks requiring high accuracy36.
DenseNet20137 is a deep CNN architecture. DenseNet201 uses a dense connectivity structure, where each layer is connected to every other layer in a feed-forward fashion. This dense connectivity promotes feature reuse and reduces the number of parameters needed to train the network. DenseNet201 also includes bottleneck layers, which reduce the computational cost of convolutions by using 1x1 convolutions to reduce the dimensionality of the input. DenseNet201 has achieved state-of-the-art performance on several benchmark datasets and is widely used in image classification tasks.
ResNet50, an architecture in deep learning, was introduced in 2015 by Microsoft researchers. It has found applications in a range of computer vision tasks, including the analysis of medical images. ResNet50 is designed to overcome the challenge of vanishing gradients by introducing shortcut connections that allow the network to learn residual representations. By utilizing ResNet50, researchers have been able to attain various results in computer vision tasks, including object detection, image classification, and medical image analysis38.
EfficientNet-B1 is a neural network architecture that was proposed by Google researchers in 2019. It is part of the EfficientNet family of models that are designed to achieve high accuracy while minimizing computational resources. It has fewer parameters and floating-point operations (FLOP) than larger models but still achieves competitive performance on various benchmark datasets. EfficientNet-B1 has been used in a range of computer vision tasks, including image classification, object detection, and segmentation39. Its efficient design makes it particularly suitable for mobile and embedded devices.
EfficientNet-B7 is a powerful model that has shown promising results in a variety of computer vision tasks, including medical image analysis. It is the largest model in the EfficientNet family and has significantly more parameters and FLOP than smaller models in the family. EfficientNet-B740 achieves state-of-the-art performance on various benchmark datasets, including ImageNet, with significantly fewer computational resources than previous state-of-the-art models. However, due to its large size, EfficientNet-B7 may not be suitable for mobile and embedded devices with limited computational resources.
MobileNet is a family of neural network architectures that are designed to be efficient on mobile and embedded devices with limited computational resources. It was proposed by Google researchers in 2017 and has since become a popular choice for a range of computer vision tasks. MobileNet achieves its efficiency by using depth-wise separable convolutions, which separate the spatial and channel-wise dimensions of convolutions and reduce the number of parameters and computations. This design allows MobileNet to achieve high accuracy while requiring significantly fewer resources than larger models. MobileNet has been implemented in various frameworks and is widely used in real-world applications41.
MobileNetV2 is a follow-up to the original MobileNet architecture, proposed by Google researchers in 2018. It further improves the efficiency and accuracy of the original architecture by introducing several novel features. One of the key improvements is the use of a bottleneck block that expands and then contracts the number of channels, allowing for better feature extraction. MobileNetV2 also uses a technique called linear bottlenecks, in which the projection that follows each depth-wise convolution uses a linear rather than non-linear activation, preserving information in the low-dimensional representation while keeping computational cost low. These innovations make MobileNetV2 one of the most efficient neural network architectures for mobile and embedded devices, while still achieving high accuracy on a range of computer vision tasks39.
The performance of all models used in this study was evaluated using precision, recall, F1 score, and accuracy. After training, the held-out test split is used to evaluate each model's efficiency and classification performance. Performance is also assessed using the confusion matrix, which comprises TP, TN, FP, and FN predictions.
TP: True positives are samples of the actual positive class that are correctly predicted as positive.
TN: True negatives are the correct negative predictions made by the model among all negative records.
FP: False positives are records of the actual negative class that the model classifies as positive.
FN: False negatives are records that belong to the positive class but are predicted as negative by the model.
Accuracy: Accuracy is the proportion of correctly classified predictions among all predictions made by the model, computed by dividing TP plus TN by the total number of predictions.
$$\begin{aligned} Accuracy = \frac{TP+TN}{TP+TN+FP+FN} \end{aligned}$$
(1)
Precision: Precision is the fraction of positive predictions made by the model that are truly positive, computed by dividing TP by TP plus FP.
$$\begin{aligned} Precision = \frac{TP}{TP+FP} \end{aligned}$$
(2)
Recall: Recall is the fraction of actual positive samples that the model correctly identifies as positive, computed by dividing TP by TP plus FN.
$$\begin{aligned} Recall = \frac{TP}{TP+FN} \end{aligned}$$
(3)
F1 score: The F1 score is an evaluation metric that summarizes model performance as the harmonic mean of precision and recall.
$$\begin{aligned} F1 = 2\times \frac{Precision\times Recall}{Precision+Recall} \end{aligned}$$
(4)
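The four metrics in Eqs. (1)-(4) can be computed directly from the confusion-matrix counts; the short helper below is an illustrative sketch with assumed variable names.

```python
# Sketch: compute accuracy, precision, recall, and F1 from binary label arrays.
import numpy as np

def evaluate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # Eq. (1)
    precision = tp / (tp + fp)                                  # Eq. (2)
    recall    = tp / (tp + fn)                                  # Eq. (3)
    f1        = 2 * precision * recall / (precision + recall)   # Eq. (4)
    return accuracy, precision, recall, f1
```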
Predicting sales and cross-border e-commerce supply chain management using artificial neural networks and the … – Nature.com
Posted: at 2:48 am
This section presents a model for supply chain management in CBEC using artificial intelligence (AI). The approach provides resource provisioning by using a collection of ANNs to forecast future events. Prior to going into depth about this method, the dataset specifications utilized in this study are given.
The performance of seven active sellers in the sphere of international goods trade over the course of a month was examined to obtain the data for this study. All of these sellers participate in the bulk physical product exchange market at the global level, which means that all goods bought by customers must be shipped by land, air, or sea. Each seller in this market uses a minimum of four online sales platforms to trade its items. Each of the 945 records that make up the dataset assembled for each vendor contains data on the number of orders that consumers placed with that particular vendor. The bulk product transactions in each record range from a minimum of 3 to a maximum of 29 units. Every record is described by a total of twenty-three distinct attributes, including order registration time, date, month, method (platform type used), order volume, destination, product type, shipping method, active inventory level, the seller's product shipping delay history over the previous seven transactions, and the product order volume history throughout the previous seven days. Each of the two history attributes is encoded as a single numerical vector.
This section describes a CBEC system that incorporates a tangible product supply chain under the management of numerous retailers and platforms. The primary objective of this study is to enhance the supply chain performance in CBEC through the implementation of machine learning (ML) and Internet of Things (IoT) architectures. This framework comprises four primary components:
Retailers: They are responsible for marketing and selling products.
Common sales platform: Provides a venue for retailers to introduce and sell products.
Product warehouse: The place where each retailer stores its products.
Supply center: Responsible for promptly providing the resources needed by retailers.

The CBEC system model comprises N autonomous retailers, all of which are authorized to engage in marketing and distribution of one or more products. Each retailer maintains a minimum of one warehouse for product storage. Additionally, retailers may utilize multiple online sales platforms to market and sell their products.
Consumers place orders via these electronic commerce platforms in order to acquire the products they prefer. Through the platform, the registered orders are transmitted to the product's proprietor. The retailer generates and transmits the sales form to the data center situated within the supply center as soon as it receives the order. The supply center is responsible for delivering the essential resources to each retailer in a timely manner. In traditional applications of the CBEC system, the supply center provides resources in a reactive capacity. This approach contributes to an extended order processing time, which ultimately erodes customer confidence and may result in the dissolution of the relationship. Proactive implementation of this procedure is incorporated into the proposed framework. Machine learning methods are applied to predict the number of orders that will be submitted by each agent at future time intervals. Following this, the allocation of resources in the storage facilities of each agent is ascertained by the results of these forecasts. In accordance with the proposed framework, the agent's warehouse inventory is modified in the data center after the sales form is transmitted to the data center. Additionally, a model based on ensemble learning is employed to forecast the quantity of upcoming orders for the product held by the retailer. The supply center subsequently acquires the required resources for the retailer in light of the forecast's outcome. The likelihood of inventory depletion and the time required to process orders are both substantially reduced through the implementation of this procedure.
As mentioned earlier, the efficacy of the supply chain is enhanced in this framework via the integration of IoT architecture. For this purpose, RFID technology is implemented in supply management. Every individual product included in the proposed framework is assigned a unique RFID identification tag. The integration of passive identifiers into the proposed model reduces the system's ultimate implementation cost. In the proposed paradigm, the electronic tag serves as an automated data carrier for the RFID-based asset management system. The architecture of this system integrates passive RFID devices that function within the UHF band. In addition, tag reader gateways are installed in the product warehouses of each retailer to monitor merchandise entering and leaving the premises. The proposed model begins the product entry and exit procedure by using the tag reader to extract the unique identifier data contained in the RFID tags. This identifier is then transmitted to the controller to which the reader node is connected. A query containing the product's unique identifier is transmitted by the controller node to the data center to acquire product information, including entry/exit authorization. Upon authorization, the controller node transmits a storage command to the data center to register the product transfer information. This registration then updates the inventory of the retailer's product warehouse. Therefore, the overall operation of the proposed system can be divided into the following two overarching phases:
Predicting the number of future orders of each retailer in future time intervals using ML techniques.
Assigning resources to the warehouses of specific agents based on the outcomes of predictions and verifying the currency of the data center inventory for each agent's warehouse. The following sub-sections will be dedicated to delivering clarifications for each of the aforementioned phases.
The imminent order volume for each vendor is forecasted within this framework through a weighted ensemble model. The number of prediction models is directly proportional to the number of retailers participating in the CBEC system. To predict the future volume of customer orders for its affiliated retailer, each ensemble model combines the forecasts produced by its internal learning models. The supplier furnishes the requisite supplies to each agent in accordance with these projections. By proactively alleviating the delay that arises from the reactive supply of requested products, this methodology shortens the overall duration of the supply chain product delivery process. Utilizing a combination of FSFS and ANOVA, the initial step in forecasting sales volume is to identify which attributes have the greatest bearing on the sales volume of particular merchants. Sales projections are then generated by a weighted ensemble model that combines sales volume with the most pertinent features. The proposed weighted ensemble model for forecasting the order volume of a specific retailer trains each of the three ANN models comprising the ensemble on that retailer's order patterns. While ensemble learning can enhance the accuracy of predictions produced by learning systems, two additional factors should be considered to optimize its performance further.
Acceptable performance of each learning model: Every learning component in an ensemble system has to perform satisfactorily in order to lower the total prediction error when their outputs are combined. This calls for the deployment of well-configured learning models, such that every model continues to operate as intended even while handling a variety of data patterns.
Output weighting: In the majority of ensemble system application scenarios, the efficacy of the learning components comprising the system differs. To clarify, while certain learning models exhibit a reduced error rate in forecasting the objective variable, others display a higher error rate. Consequently, in contrast to the methodology employed in traditional ensemble systems, it is not appropriate to assign an identical weight to the output of every predictive component. To address this issue, a weighting strategy can be applied to the outputs of each learning component, thereby creating a weighted ensemble system.
CapSA is utilized in the proposed method to address these two concerns. The operation of the proposed weighted ensemble model for forecasting customer order volumes is illustrated in Fig.1.
Operation of the proposed weighted ensemble model for predicting order volume.
As illustrated in Fig.1, the ensemble model under consideration comprises three predictive components that collaborate to forecast the order volume of a retailer, drawing inspiration from the structure of the ANN. Every individual learning model undergoes training using a distinct subset of sales history data associated with its respective retailer. The proposed method utilizes CapSA to execute the tasks of determining the optimal configuration and modifying the weight vector of each ANN model. It is important to acknowledge that the configuration of every ANN model is distinct from that of the other two models. By employing parallel processing techniques, the configuration and training of each model can be expedited. Every ANN model strives to determine the parameter values in a way that minimizes the mean absolute error criterion during the configuration phase. An optimal configuration set of learning models can be obtained through the utilization of this mechanism, thereby guaranteeing that every component functions at its designated level. After the configuration of each ANN component is complete, the procedure to determine the weight of the output of the predictive component is carried out. In order to accomplish this goal, CapSA is employed. During this phase, CapSA attempts to ascertain the output value of each learning model in relation to its performance.
After employing CapSA to optimize the weight values, the assembled and weighted models can be utilized to predict the volume of orders for new samples. To achieve this, during the testing phase, input features are provided to each of the predictive components ANN1, ANN2, and ANN3. The final output of the proposed model is computed as the weighted average of the outputs from these components.
It is possible for the set of characteristics characterizing the sales pattern to contain unrelated characteristics. Hence, the proposed approach employs one-way ANOVA analysis to determine the significance of the input feature set and identify characteristics that are associated with the sales pattern. The F-score values of the features are computed in this manner utilizing the ANOVA test. Generally speaking, characteristics that possess greater F values hold greater significance during the prediction stage and are thus more conspicuous. Following the ranking of the features, the FSFS method is utilized to select the desired features. The primary function of FSFS is to determine the most visible and appropriate subset of ranked features. The algorithm generates the optimal subset of features by iteratively selecting features from the input set in accordance with their ranking. As each new feature is incorporated into the feature subset at each stage, the learning model's prediction error is assessed. The feature addition procedure concludes when the performance of the classification model is negatively impacted by the addition of a new feature. In such cases, the optimal subset is determined as the feature subset with the smallest error. Utilizing the resultant feature set, the ensemble system's components are trained in order to forecast sales volume.
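A minimal sketch of this two-stage selection (ANOVA F-score ranking followed by forward sequential selection that stops when the validation error worsens) is shown below. The scikit-learn utilities, the small MLP used to score candidate subsets, and the validation split are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch: rank features by ANOVA F-score, then add them one at a time
# (forward sequential selection) while the validation MAE keeps improving.
import numpy as np
from sklearn.feature_selection import f_regression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

def select_features(X, y):
    f_scores, _ = f_regression(X, y)
    ranking = np.argsort(f_scores)[::-1]               # most prominent features first
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
    best_err, chosen = np.inf, []
    for feat in ranking:
        trial = chosen + [feat]
        model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
        model.fit(X_tr[:, trial], y_tr)
        err = mean_absolute_error(y_val, model.predict(X_val[:, trial]))
        if err >= best_err:                             # adding this feature hurt performance
            break
        best_err, chosen = err, trial
    return chosen                                       # indices of the selected features
```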
CapSA is tasked with the responsibility of identifying the most appropriate neural network topologies and optimal weight values within the proposed method. As previously stated, the ensemble model under consideration comprises three ANNs, with each one tasked with forecasting the forthcoming sales volume for a specific retailer. Using CapSA, the configuration and training processes for each of these ANN models are conducted independently. This section provides an explanation of the procedure involved in determining the optimal configuration and modifying the weight vector for each ANN model. Hence, the subsequent section outlines the steps required to solve the aforementioned optimization problem using CapSA, after which the structure of the solution vector and the objective function are defined. The suggested method's optimization algorithm makes use of the solution vector to determine the topology, network biases, and weights of neuronal connections. As a result, every solution vector in the optimization process consists of two linked parts. The first part of the solution vector specifies the network topology. Next, in the second part, the weights of the neurons and biases (which match the topology given in the first part of the solution vector) are determined. As a result, the defined topology of the neural network determines the variable length of the solution vectors in CapSA. Because a neural network might have an endless number of topological states, it is necessary to include certain restrictions in the solution vector that relate to the topology of the network. The first part of the solution vector is constrained by the following in order to narrow down the search space:
Each neural network has exactly one hidden layer. As such, the first part of the solution vector consists of a single element whose value represents the number of neurons assigned to the hidden layer of the neural network.
The hidden layer of the neural network has a minimum of 4 and a maximum of 15 neurons.
The number of input features and target classes, respectively, determine the dimensions of the input and output layers of the neural network. As a result, the first segment of the solution vector, the topology determination, only specifies the number of neurons in the hidden layer. Given that the length of the second part of the solution vector is determined by the topology specified in the first part, the value in the first part determines how many neurons, and hence how many weights, the network contains. For a neural network with I input neurons, H hidden neurons, and P output neurons, the length of the second part of the solution vector in CapSA is equal to \(H\times (I+1)+P\times (H+1)\).
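To make the encoding concrete, the sketch below decodes such a solution vector into weight matrices and runs a forward pass; the hidden-layer activation and helper names are assumptions for illustration.

```python
# Sketch of decoding a CapSA solution vector into an ANN, following the encoding above:
# the first element gives the hidden-layer size H, and the remaining
# H*(I+1) + P*(H+1) values give the weights and biases.
import numpy as np

def decode_solution(vector, n_inputs, n_outputs):
    h = int(round(vector[0]))                 # hidden neurons, constrained to [4, 15]
    h = max(4, min(15, h))
    weights = np.asarray(vector[1:])
    n_hidden = h * (n_inputs + 1)             # hidden weights plus biases
    W1 = weights[:n_hidden].reshape(h, n_inputs + 1)
    W2 = weights[n_hidden:n_hidden + n_outputs * (h + 1)].reshape(n_outputs, h + 1)
    return W1, W2

def forward(W1, W2, x):
    """Feed-forward pass with a tanh hidden layer (activation choice assumed)."""
    hidden = np.tanh(W1 @ np.append(x, 1.0))  # append 1 for the bias term
    return W2 @ np.append(hidden, 1.0)
```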
In CapSA, the identification of optimal solutions involves applying a fitness function to each candidate solution. To achieve this, after the neural network's weights and topology have been configured from the solution vector, the network produces outputs for the training samples, and these outputs are compared to the actual target values. The mean absolute error criterion is then applied to assess the neural network's performance and the optimality of the generated solution. CapSA's fitness function is thus characterized as follows:
$$MAE=\frac{1}{N}\sum_{i=1}^{N}\left|{T}_{i}-{Z}_{i}\right|$$
(1)
In this context, N denotes the number of training samples, while Ti signifies the target value for the i-th training sample. Furthermore, the output generated by the neural network for the i-th training sample is denoted as Zi. The proposed method utilizes CapSA to ascertain a neural network structure capable of minimizing Eq. (1). In CapSA, both the initial population and the search bounds for the second portion of the solution vector are established at random within [-1, +1]. Thus, all weight values assigned to the connections between neurons and the biases of the neural network fall within this range. CapSA determines the optimal solution through the following procedures:
Step 1 The initial population of Capuchin agents is randomly valued.
Step 2 The fitness of each solution vector (Capuchin) is calculated based on Eq.(1).
Step 3 The initial speed of each Capuchin agent is set.
Step 4 Half of the Capuchin population is randomly selected as leaders and the rest are designated as follower Capuchins.
Step 5 If the number of algorithm iterations has reached the maximum G, go to step 13, otherwise, repeat the following steps:
Step 6 The CapSA lifespan parameter is calculated as follows27:
$$\tau ={\beta }_{0}{e}^{{\left(-\frac{{\beta }_{1}g}{G}\right)}^{{\beta }_{2}}}$$
(2)
where g represents the current number of iterations, and the parameters \({\beta }_{0}\), \({\beta }_{1}\), and \({\beta }_{2}\) have values of 2, 21, and 2, respectively.
Step 7 Repeat the following steps for each Capuchin agent (leader and follower) like i:
Step 8 If i is a leader Capuchin, update its speed based on Eq. (3)27:
$${v}_{j}^{i}=\rho {v}_{j}^{i}+\tau {a}_{1}\left({x}_{best_{j}}^{i}-{x}_{j}^{i}\right){r}_{1}+\tau {a}_{2}\left(F-{x}_{j}^{i}\right){r}_{2}$$
(3)
where the index j represents the dimensions of the problem and \({v}_{j}^{i}\) represents the speed of Capuchin i in dimension j. \({x}_{j}^{i}\) indicates the position of Capuchin i for the j-th variable and \({x}_{best_{j}}^{i}\) describes the best position of Capuchin i for the j-th variable so far. Also, \({r}_{1}\) and \({r}_{2}\) are two random numbers in the range [0,1]. Finally, \(\rho\) is the parameter weighting the previous speed, which is set to 0.7.
Step 9 Update the new position of the leader Capuchins based on their speed and movement pattern.
Step 10 Update the new position of the follower Capuchins based on their speed and the leader's position.
Step 11 Calculate the fitness of the population members based on Eq.(1).
Step 12 If the entire population's position has been updated, go to Step 5; otherwise, repeat the algorithm from Step 7.
Step 13 Return the solution with the least fitness as the optimal configuration of the ANN model.
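A greatly condensed sketch of Steps 1-13 is given below. Only the lifespan parameter of Eq. (2) and the leader speed update of Eq. (3) are reproduced; the follower movement rule, boundary handling, and the parameter values a1 and a2 are simplifying assumptions, and the fitness argument is understood to be the MAE of Eq. (1) evaluated on the ANN decoded from a candidate solution vector.

```python
# Simplified, hedged sketch of the CapSA loop described in Steps 1-13.
import numpy as np

def capsa(fitness, dim, n_agents=20, G=100, rho=0.7, a1=1.25, a2=1.5,
          beta0=2.0, beta1=21.0, beta2=2.0):
    rng = np.random.default_rng(0)
    pos = rng.uniform(-1.0, 1.0, size=(n_agents, dim))    # Step 1: random initial population
    vel = np.zeros_like(pos)                              # Step 3: initial speeds
    best_pos = pos.copy()                                  # each Capuchin's best position so far
    best_fit = np.array([fitness(p) for p in pos])         # Step 2: fitness of each solution
    leader_idx = sorted(rng.permutation(n_agents)[: n_agents // 2].tolist())  # Step 4

    for g in range(G):                                     # Step 5: main loop
        tau = beta0 * np.exp(-((beta1 * g / G) ** beta2))  # Step 6: lifespan parameter, Eq. (2)
        food = best_pos[np.argmin(best_fit)]               # global best position so far
        for i in range(n_agents):                          # Step 7
            r1, r2 = rng.random(dim), rng.random(dim)
            if i in leader_idx:                            # Step 8: leader speed update, Eq. (3)
                vel[i] = (rho * vel[i]
                          + tau * a1 * (best_pos[i] - pos[i]) * r1
                          + tau * a2 * (food - pos[i]) * r2)
                pos[i] = pos[i] + vel[i]                   # Step 9 (movement pattern simplified)
            else:                                          # Step 10: followers trail a leader
                leader = leader_idx[i % len(leader_idx)]
                pos[i] = 0.5 * (pos[i] + pos[leader])
        for i in range(n_agents):                          # Step 11: re-evaluate fitness
            f = fitness(pos[i])
            if f < best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i].copy()
    return best_pos[np.argmin(best_fit)]                   # Step 13: least fitness wins
```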
Once each predictive component has been configured and trained, CapSA is utilized once more to assign the most advantageous weights to each of these components. Determining the significance coefficient of the output produced by each of the predictive components ANN1, ANN2, and ANN3 with respect to the final output of the proposed ensemble system is the objective of optimal weight allocation. Therefore, the optimization variables for the three estimation components comprising the proposed ensemble model correspond to the set of optimal coefficients in this specific implementation of CapSA. Therefore, the length of each Capuchin in CapSA is fixed at three in order to determine the ensemble model output, and the weight coefficients are assigned to the outputs of ANN1, ANN2, and ANN3, correspondingly. Each optimization variable's search range is a real number between 0 and 1. After providing an overview of the computational methods employed in CapSA in the preceding section, the sole remaining point in this section is an explanation of the incorporated fitness function. The following describes the fitness function utilized by CapSA to assign weights to the learning components according to the mean absolute error criterion:
$$fitness=\frac{1}{n} \sum_{i=1}^{n}\left|{T}_{i}-\frac{\sum_{j=1}^{3}{w}_{j}\times {y}_{j}^{i}}{\sum_{j=1}^{3}{w}_{j}}\right|$$
(4)
where \({T}_{i}\) represents the actual value of the target variable for the i-th sample. Also, \({y}_{j}^{i}\) represents the output estimated by the ANNj model for the i-th training sample, and wj indicates the weight value assigned to the ANNj model via the solution vector. Finally, n denotes the number of training samples.
A weight coefficient is allocated to each algorithm within the interval [0,1], delineating how that algorithm contributes to the final output of the ensemble model. It is crucial to note that the weighting phase of the learning components is executed only once, after the training and configuration processes have been completed. Once the optimal weight values for each learning component have been determined by CapSA, the trained models and the specified weight values are used to predict the volume of forthcoming orders. Once the predictive outputs of all three implemented ANN models have been obtained, the number of forthcoming orders is computed by the proposed weighted ensemble model as follows:
$$output=\frac{\sum_{i=1}^{3}{w}_{i}\times {y}_{i}}{\sum_{i=1}^{3}{w}_{i}}$$
(5)
Within this framework, the weight value (wi) and predicted value (yi) denote the ANNi model's assigned weight and predicted value, respectively, for the provided input sample. Ultimately, the retailer satisfies its future obligations in accordance with the value prediction produced by this ensemble model.
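In code, the combination rule of Eq. (5) is simply a weighted average of the three ANN forecasts; the sketch below assumes each trained model exposes a predict method and that the CapSA-derived weights lie in [0, 1].

```python
# Minimal sketch of the weighted ensemble output of Eq. (5).
import numpy as np

def ensemble_predict(models, weights, x):
    """Weighted average of the three ANN forecasts for one input sample."""
    preds = np.array([m.predict(x) for m in models], dtype=float)  # y1, y2, y3
    w = np.asarray(weights, dtype=float)                           # w1, w2, w3 in [0, 1]
    return float(np.sum(w * preds) / np.sum(w))                    # Eq. (5)
```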
By predicting the sales volume of the product for specific retailers, it becomes possible to procure the requisite resources for each retailer in alignment with the projected sales volume. By ensuring that the supplier's limited resources are distributed equitably, this mechanism attempts to maximize the effectiveness of the sales system. In the following analysis, the sales volume predicted by the model for each retailer designated as i is represented by pi, whereas the agent's current inventory is denoted by vi. Furthermore, the total distribution capacity of the supplier is represented as L. In such a case, the supplier shall allocate the requisite resources to the retailer as follows:
Sales volume prediction: Applying the model described in the previous part, the upcoming sales volume for each agent in the future time interval (pi) is predicted.
Receiving warehouse inventory: The current inventory of every agent (vi) is received through supply chain management systems.
Calculating the required resources: The amount of resources required for the warehouse of each retailer is calculated as follows:
$$S_{i} = \max \left( 0, p_{i} - v_{i} \right)$$
(6)
Calculating each agent's share of allocatable resources: The share of each retailer from the allocatable resources is calculated by Eq. (7), where N represents the number of retailers:
$${R}_{i}=\frac{{S}_{i}}{\sum_{j=1}^{N}{S}_{j}}$$
(7)
Resource allocation: The supply center sends the needed resources for each agent, according to the allocated share (Ri), to that agent's warehouse.
Inventory update: The inventory of every agent is updated upon receipt of new resources.
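The allocation procedure in Eqs. (6) and (7) can be expressed compactly as follows; the function and variable names are illustrative.

```python
# Sketch of the allocation rule in Eqs. (6) and (7): each retailer's shortfall S_i is the
# predicted sales volume minus current inventory (never negative), and the supply center
# splits its capacity in proportion to those shortfalls.
import numpy as np

def allocate(predicted, inventory, capacity):
    predicted = np.asarray(predicted, dtype=float)
    inventory = np.asarray(inventory, dtype=float)
    shortfall = np.maximum(0.0, predicted - inventory)   # Eq. (6): S_i
    total = shortfall.sum()
    if total == 0:
        return np.zeros_like(shortfall)                  # no retailer needs resources
    share = shortfall / total                            # Eq. (7): R_i
    return share * capacity                              # resources shipped to each warehouse

# Example: three retailers, supplier capacity of 100 units
print(allocate([50, 30, 10], [20, 35, 5], 100))          # approximately [85.7, 0.0, 14.3]
```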