Indonesian Journal of Data and Science <p align="justify">Indonesian Journal of Data and Science (IJODAS) is an electronic periodical published by Yocto Brain (YB), a non-commercial company focused on education and training. IJODAS provides an online medium for publishing scientific articles from research in the fields of Data Science, Data Mining, Data Communication, and Data Security. IJODAS is registered in the National Library with online International Standard Serial Number (ISSN) <a title="SK ISSN" href="" target="_blank" rel="noopener"><strong>2715-9930</strong></a>.</p> yocto brain en-US Indonesian Journal of Data and Science 2715-9930 <p><strong>License and Copyright Agreement</strong></p> <p>In submitting the manuscript to the journal, the authors certify that:</p> <ul> <li class="show">They are authorized by their co-authors to enter into these arrangements.</li> <li class="show">The work described has not been formally published before, except in the form of an abstract or as part of a published lecture, review, thesis, or overlay journal.</li> <li class="show">The work is not under consideration for publication elsewhere.</li> <li class="show">The work has been approved by all the author(s) and by the responsible authorities – tacitly or explicitly – of the institutes where the work has been carried out.</li> <li class="show">They secure the right to reproduce any material that has already been published or copyrighted elsewhere.</li> <li class="show">They agree to the following license and copyright agreement.</li> </ul> <p><strong>Copyright</strong></p> <p>Authors who publish with Indonesian Journal of Data and Science agree to the following terms:</p> <ol> <li class="show">Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a&nbsp;<a href="" rel="license">Creative Commons Attribution-NonCommercial 4.0 International License</a><a href="" target="_blank" 
rel="noopener">&nbsp;(CC BY-NC 4.0)</a>&nbsp;that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.&nbsp;</li> <li class="show">Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.</li> <li class="show">Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.</li> </ol> Assessing the Predictive Power of Logistic Regression on Liver Disease Prevalence in the Indian Context <p>This study delves into the application of Logistic Regression through a Voting Classifier to predict liver disease prevalence within the Indian demographic, specifically analyzing data from the NorthEast of Andhra Pradesh. Employing a dataset encompassing 584 patient records, the research utilizes a 5-fold cross-validation approach to evaluate the model's performance across accuracy, precision, recall, and F1-Score metrics. The findings reveal accuracy rates ranging from 69.23% to 74.14%, with variable precision and recall, indicating a promising yet improvable predictive capability of the model. The study significantly contributes to the existing body of knowledge by demonstrating the potential of Logistic Regression in medical diagnostics, especially in the context of liver disease, and highlighting the critical role of machine learning models in enhancing diagnostic processes. 
Through a detailed discussion, the research aligns with previous studies on the efficacy of machine learning in healthcare, advocating for the integration of more comprehensive data and suggesting further exploration into the model's applicability across diverse populations. The study's implications extend to healthcare professionals and policymakers, underscoring the necessity for advanced diagnostic tools in the early detection of liver diseases.</p> Izmy Alwiah Umar Zaky Aris Wahyu Murdiyanto Copyright (c) 2024 Indonesian Journal of Data and Science 2024-03-31 2024-03-31 5 1 1 7 10.56705/ijodas.v5i1.121 Improving Mental Health Diagnostics through Advanced Algorithmic Models: A Case Study of Bipolar and Depressive Disorders <p>This study explores the efficacy of a voting classifier integrating K-Nearest Neighbors (K-NN), Gaussian Naive Bayes (GNB), and Random Forest algorithms in diagnosing bipolar and depressive disorders. Utilizing a dataset of 120 psychology patients exhibiting 17 essential symptoms, the research employs a 5-fold cross-validation method to assess the model's diagnostic performance. Results indicate variability in accuracy (66.67% to 91.67%), precision (66.46% to 93.75%), recall (identical to accuracy), and F1-Scores (65.96% to 91.43%) across folds, demonstrating the model's robustness and potential to enhance psychiatric diagnostic processes. The findings suggest that the voting classifier significantly outperforms traditional diagnostic methods, offering a promising tool for more accurate and efficient mental health diagnostics. This research contributes to the burgeoning field of machine learning applications in mental health care, highlighting the potential of ensemble methods in addressing the complexities of psychiatric diagnosis. 
Given the limitations related to data diversity and model sensitivity, future research should focus on employing larger, more varied datasets and exploring the integration of additional algorithms to further refine diagnostic accuracy. This study lays the groundwork for advancing mental health diagnostics through innovative machine learning techniques.</p> Adityo Permana Wibowo Medi Taruk Thomas Edyson Tarigan Muhammad Habibi Copyright (c) 2024 Indonesian Journal of Data and Science 2024-03-31 2024-03-31 5 1 8 14 10.56705/ijodas.v5i1.122 Advancements in Agricultural Automation: SVM Classifier with Hu Moments for Vegetable Identification <p>This study investigates the application of Support Vector Machine (SVM) classifiers in conjunction with Hu Moments for the purpose of classifying segmented images of vegetables, specifically Broccoli, Cabbage, and Cauliflower. Utilizing a dataset comprising segmented vegetable images, this research employs the Canny method for image segmentation and Hu Moments for feature extraction to prepare the data for classification. Through the implementation of a 5-fold cross-validation technique, the performance of the SVM classifier was thoroughly evaluated, revealing moderate accuracy, precision, recall, and F1-scores across all folds. The findings highlight the classifier's potential in distinguishing between different vegetable types, albeit with identified areas for improvement. This research contributes to the growing field of agricultural automation by demonstrating the feasibility of using SVM classifiers and image processing techniques for the task of vegetable identification. The moderate performance metrics emphasize the need for further optimization in feature extraction and classifier tuning to enhance classification accuracy. 
Future recommendations include exploring alternative machine learning algorithms, advanced feature extraction methods, and expanding the dataset to improve the classifier's robustness and applicability in agricultural settings. This study lays a foundation for future advancements in automated vegetable sorting and quality control, offering insights that could lead to more efficient agricultural practices.</p> Bagus Satrio Waluyo Poetro Eny Maria Hamada Zein Effan Najwaini Dian Hafidh Zulfikar Copyright (c) 2024 Indonesian Journal of Data and Science 2024-03-31 2024-03-31 5 1 15 22 10.56705/ijodas.v5i1.123 Evaluating the Performance of Voting Classifier in Multiclass Classification of Dry Bean Varieties <p>This study explores the application of a voting classifier, integrating Decision Tree, Logistic Regression, and Gaussian Naive Bayes models, for the multiclass classification of dry bean varieties. Utilizing a dataset comprising 13,611 images of dry bean grains, captured through a high-resolution computer vision system, we extracted 16 features to train and test the classifier. Through a rigorous 5-fold cross-validation process, we assessed the model's performance, focusing on accuracy, precision, recall, and F1-score metrics. The results demonstrated significant variability in the classifier's performance across different data subsets, with accuracy rates fluctuating between 31.23% and 96.73%. This variability highlights the classifier's potential under specific conditions while also indicating areas for improvement. The research contributes to the agricultural informatics field by showcasing the effectiveness and challenges of using ensemble learning methods for crop variety classification, a crucial task for enhancing agricultural productivity and food security. 
Recommendations for future research include exploring additional features to improve model generalization, extending the dataset for broader applicability, and comparing the voting classifier's performance with other ensemble methods or advanced machine learning models. This study underscores the importance of machine learning in advancing agricultural classification tasks, paving the way for more efficient and accurate crop sorting and grading processes.</p> I Putu Adi Pratama Ery Setiyawan Jullev Atmadji Dwi Amalia Purnamasar Edi Faizal Copyright (c) 2024 Indonesian Journal of Data and Science 2024-03-31 2024-03-31 5 1 23 29 10.56705/ijodas.v5i1.124 Leveraging K-Nearest Neighbors for Enhanced Fruit Classification and Quality Assessment <p>This study investigates the application of the K-Nearest Neighbors (KNN) algorithm for fruit classification and quality assessment, aiming to enhance agricultural practices through machine learning. Employing a comprehensive dataset that encapsulates various fruit attributes such as size, weight, sweetness, crunchiness, juiciness, ripeness, acidity, and quality, the research leverages a 5-fold cross-validation method to ensure the reliability and generalizability of the KNN model's performance. The findings reveal that the KNN algorithm demonstrates high accuracy, precision, recall, and F1-Score across all metrics, indicating its efficacy in classifying fruits and predicting their quality accurately. These results not only validate the algorithm's potential in agricultural applications but also align with existing research on machine learning's capability to tackle complex classification problems. The study's discussions extend to the practical implications of implementing a KNN-based model in the agricultural sector, highlighting the possibility of revolutionizing quality control and inventory management processes. 
Moreover, the research contributes to the field by confirming the hypothesis regarding the effectiveness of KNN in agricultural settings and lays the foundation for future explorations that could integrate multiple machine learning techniques for enhanced outcomes. Recommendations for subsequent studies include expanding the dataset and exploring algorithmic synergies, aiming to further the advancements in agricultural technology and machine learning applications.</p> I Gede Iwan Sudipa Rezania Agramanisti Azdy Ika Arfiani Nicodemus Mardanus Setiohardjo Sumiyatun Copyright (c) 2024 Indonesian Journal of Data and Science 2024-03-31 2024-03-31 5 1 30 36 10.56705/ijodas.v5i1.125 Estimating Obesity Levels Using Decision Trees and K-Fold Cross-Validation: A Study on Eating Habits and Physical Conditions <p>This study harnesses the predictive capabilities of machine learning to explore the determinants of obesity within populations from Mexico, Peru, and Colombia, using a Decision Tree algorithm bolstered by 5-fold cross-validation. Our comprehensive analysis of 2111 individuals' lifestyle and physical condition data yielded accuracy, precision, recall, and F1-scores that notably peaked in the third and fifth folds. The findings affirmed the significance of dietary habits and physical activity as substantial predictors of obesity levels. The variability in model performance across the folds underscored the importance of robust cross-validation in enhancing the model's generalizability. This research contributes to the burgeoning field of data science in public health by providing a viable model for obesity prediction and laying the groundwork for targeted health interventions. Our study's insights are pivotal for public health officials and policymakers, serving as a stepping stone towards more sophisticated, data-driven approaches to combating obesity. 
The study, however, recognizes the inherent limitations of self-reported data and the need for broader datasets that encompass more diverse variables. Future research directions include the analysis of longitudinal data to establish causal relationships and the comparison of various machine learning models to optimize predictive performance.</p> Fadhila Tangguh Admojo Nurul Rismayanti Copyright (c) 2024 Indonesian Journal of Data and Science 2024-03-31 2024-03-31 5 1 37 44 10.56705/ijodas.v5i1.126 Automated Classification of COVID-19 Chest X-ray Images Using Ensemble Machine Learning Methods <p>This study delves into the efficacy of ensemble machine learning techniques for classifying chest X-ray images into three distinct categories: Normal, COVID-19, and Lung Opacity. Employing the Random Forest Classifier and a rigorous 5-fold cross-validation framework, we aimed to enhance diagnostic accuracy for one of the most urgent medical challenges today: rapid and reliable COVID-19 detection. The analysis revealed an average accuracy of 51%, with varying precision and recall across different folds. The F1-score remained consistently around 35%, indicating a need for improved balance between precision and recall. Visualizations such as performance metric trends and a confusion matrix provided further insight into the classifier's performance, highlighting a notable degree of misclassification. Despite moderate success in the automated classification of the images, our research illustrates the complexity of applying machine learning to medical imaging, especially in differentiating between diseases with overlapping radiographic features. The study’s findings emphasize the potential of machine learning models to support diagnostic processes and suggest the necessity of advanced pre-processing techniques and extended datasets for enhanced model training. 
The research contributes to the growing body of knowledge in computational diagnostics and underscores the importance of developing robust, accurate machine learning tools to aid in the global healthcare crisis precipitated by the pandemic.</p> A. Sinra Husni Angriani Copyright (c) 2024 Indonesian Journal of Data and Science 2024-03-31 2024-03-31 5 1 45 53 10.56705/ijodas.v5i1.127 Classification of Rice Grain Varieties Using Ensemble Learning and Image Analysis Techniques <p>This research explored the efficacy of machine learning techniques, specifically the Bagging meta-estimator, in the classification of rice grain images. Utilizing a dataset composed of 45,000 images of Arborio, Basmati, and Jasmine rice varieties, a 5-fold cross-validation was employed to evaluate the model's performance. The results were highly promising, with the model consistently achieving over 96% in accuracy, precision, recall, and F1-score across all folds, indicating its robustness and reliability. The study confirmed that ensemble learning techniques could significantly improve the classification accuracy over single classifier systems in agricultural applications. The findings offer a significant contribution to automated agricultural processes, suggesting that machine learning can greatly enhance the efficiency and precision of rice variety classification. These results pave the way for further research into the integration of such models into agricultural quality control and provide a foundation for the exploration of advanced image processing and deep learning techniques for improved performance. Future research directions include expanding the model to encompass a wider variety of crops and integrating additional data modalities to refine classification accuracy further. 
Practical applications should explore the incorporation of this technology into existing agricultural systems to maximize the benefits of automation in quality control.</p> Rudi Setiawan Hayatou Oumarou Copyright (c) 2024 Indonesian Journal of Data and Science 2024-03-31 2024-03-31 5 1 54 63 10.56705/ijodas.v5i1.129 Federated Learning for Bronchus Cancer Detection Using Tiny Machine Learning Edge Devices <p>In deep learning, acquiring sufficient data is crucial for making informed decisions. However, due to concerns regarding security and privacy, obtaining enough data for training models in the era of deep learning is challenging. There is a growing need for machine learning (ML) solutions that can derive accurate conclusions from small data while preserving privacy. Smartphones, which are widely used and generate large amounts of data, can serve as an excellent source for data generation. One suitable approach for regularly evaluating real-world data from edge devices is Tiny Machine Learning (TinyML). With the increasing number of edge devices involved in transmitting private data, it's vital to have a method that allows computations to be performed on edge devices and pushed to the edge rather than over the network. Considering these obstacles, the combination of TinyML edge devices and Federated Learning can be applied in the early treatment of Bronchus Cancer. Under the framework of federated learning, local edge devices are trained independently and then integrated into the server without exchanging edge device data. This approach enables the creation of secure models without sharing information, resulting in a highly efficient solution with enhanced data security and accessibility. This article provides a comprehensive discussion of the key challenges addressed in recent literature, accompanied by an extensive examination of relevant studies. 
Additionally, a novel model based on edge devices and federated learning is proposed.</p> Musa Dima Genemo Copyright (c) 2024 Indonesian Journal of Data and Science 2024-03-31 2024-03-31 5 1 64 69 10.56705/ijodas.v5i1.116 Comparison of Three Resouces Allocation Technique in Cloud Computing <p>The shift to the cloud enables organizations of all sizes to swiftly, efficiently, and innovatively move their operations. The adoption of cloud computing has significantly transformed most organizations' work, communication, and collaboration methods, making it a crucial necessity for maintaining competitiveness in the digital age. Organizations are implementing cloud bursting to handle IT demand peaks by utilizing private cloud capacity and public cloud capacity, freeing up local resources for critical applications, and reverting data to the private cloud. Organizations face challenges in allocating resources in cloud computing to automatically switch from private cloud to public cloud, leading to system issues, user frustration, operational failure, increased stress, and revenue loss. To address these concerns, this paper investigates traffic predictions by comparing three prediction tools, namely support vector machines, spatio-temporal, and edge-cloud collaborative schemes, and proposing conceptual solutions. Efficient cloud computing traffic management can prevent system bottlenecks, especially during peak periods, potentially leading to dissatisfied clients.</p> Sello Prince Sekwatlakwatla Copyright (c) 2024 Indonesian Journal of Data and Science 2024-03-31 2024-03-31 5 1 70 75 10.56705/ijodas.v5i1.118