Indonesian Journal of Data and Science

Classification of Cavendish Banana Ripeness With CNN Method

2025-11-29T07:50:16+07:00

Cavendish bananas are one of the most widely consumed tropical fruits in Indonesia due to their sweet taste and high nutritional content. However, as they ripen, the sugar content in bananas increases, which can be a problem for diabetics. To help diabetics choose bananas with the right level of ripeness, this study developed a Cavendish banana ripeness classification model using artificial intelligence technology, namely the ResNet50 Convolutional Neural Network (CNN) architecture. The banana data is divided into five ripeness categories: green, yellowish green, yellow, spotted yellow, and spotted brownish yellow. The model was trained with two approaches, with and without data augmentation, using two types of training algorithms (optimizers), namely Adam and SGD, as well as a k-fold cross-validation method to ensure accurate results. The results showed that the ResNet50 model produced the highest accuracy of 98% when trained using data augmentation and the Adam optimizer with a learning rate setting of 0.0001.
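
The k-fold cross-validation step mentioned above can be sketched as follows. This is a minimal illustration only: the abstract does not state the number of folds or the dataset size, so `n_samples`, `k`, and `seed` are hypothetical parameters, and the authors' actual tooling is not known.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread samples as evenly as possible: the first n % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for the demonstration.
    for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, 5)):
        print(fold, len(train_idx), len(val_idx))
```

Each sample appears in exactly one validation fold, so averaging the per-fold scores uses every image for validation once.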

Hybrid CNN-LSTM and Cox Model for Bipolar Risk Analysis Using Social Media Data

2025-11-29T07:50:02+07:00

Introduction: Mental disorders such as bipolar disorder are becoming increasingly prominent, particularly with the rise of emotional expression through social media. Early detection remains a significant challenge due to the lack of non-invasive, real-time assessment methods. Methods: This study proposes a hybrid deep learning approach combining Convolutional Neural Network–Long Short-Term Memory (CNN-LSTM) and the Cox Proportional Hazards (Cox PH) model to analyze the risk and timing of bipolar disorder onset. A dataset of 3,511 tweets from 517 Twitter users was collected. The CNN-LSTM model classified bipolar risk levels based on text data, while the Cox PH model estimated the time-to-event for high-risk conditions using behavioral features and predicted risk labels. Results: The hybrid model demonstrated strong predictive performance. The risk label significantly influenced the time to high-risk condition (hazard ratio = 5.39, p < 0.005). The model achieved a concordance index (C-index) of 0.816, indicating high reliability in survival prediction. Conclusions: This case study highlights the potential of integrating deep learning and survival analysis for early bipolar disorder detection using social media data. The proposed non-invasive method can support mental health monitoring while raising awareness of ethical and privacy considerations.
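
To make the reported concordance index concrete, here is a minimal sketch of Harrell's C-index in plain Python. It is not the authors' code; the function name and the toy data are assumptions for illustration, and ties in predicted risk are counted as half-concordant by convention.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs whose predicted risk ordering is correct.

    A pair (i, j) is comparable when subject i has the shorter time and an
    observed event (events[i] == 1). It is concordant when subject i also has
    the higher predicted risk; equal risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data (hypothetical): time to event, event flag, predicted risk.
    print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the abstract's 0.816 should be read.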

Classification Of Bougainvillea Flower Varieties Using Variant Of CNN: Resnet50

2025-11-29T07:49:49+07:00

Bougainvillea is a tropical ornamental plant renowned for its vibrant colors and variety of cultivars, yet classifying its species remains challenging due to morphological similarities. This study aims to develop an automated classification system using the ResNet50 deep learning architecture to identify Bougainvillea flower varieties based on visual imagery. The dataset consists of 700 images from seven distinct classes, captured under natural lighting using a smartphone camera. The research process includes image preprocessing (resizing to 224x224 pixels), geometric data augmentation to increase dataset diversity, and evaluation using K-Fold Cross Validation to ensure robust model assessment. The model was trained using transfer learning, and its performance was compared between augmented and non-augmented datasets through evaluation metrics such as accuracy, precision, recall, and F1-score. The results show that augmentation significantly improved the model's performance, achieving an average accuracy of 99.67% on augmented data compared to 93.39% on non-augmented data. The augmented model also exhibited greater consistency across all folds, with several achieving perfect scores. These findings highlight that combining ResNet50 with transfer learning and image augmentation produces a highly accurate and reliable Bougainvillea classification system. This research contributes to the field of AI-based plant phenotyping and lays the groundwork for future applications in horticulture, biodiversity conservation, and education. Further development is recommended to explore larger and more diverse datasets, investigate advanced architectures such as EfficientNet or Vision Transformers, and build real-time mobile-based classification tools for practical field usage.

Classification Of Organic And Inorganic Waste Using Resnet50

2025-11-29T07:49:34+07:00

Waste generation, particularly from organic and inorganic sources, has become a growing environmental issue, especially in culturally unique regions like Bali where traditional offerings contribute to organic waste volumes. Despite regulations such as Gianyar Regency Regulation No. 76 of 2023 mandating source-level separation, on-ground implementation remains inconsistent due to low public awareness and operational limitations. This study addresses the challenge by developing an automated image-based classification system using the ResNet50 deep learning architecture to distinguish between organic and inorganic waste. A total of 200 images (100 per class) were collected using smartphone cameras, and the dataset was expanded to 1,400 images through geometric data augmentation techniques such as rotation, flipping, and zooming. Images were resized to 224x224 pixels and evaluated using K-Fold Cross Validation to ensure model stability. The model was trained using transfer learning and tested under two conditions, with and without augmentation, while tuning hyperparameters such as learning rates (0.0001 and 0.00001) and optimizers (Adam and SGD). The results demonstrate that augmentation significantly enhanced model performance, with the augmented model achieving an average accuracy of 99.25%, precision of 99.32%, recall of 99.25%, and F1-score of 99.25%, compared to 89.88% accuracy in the non-augmented model. These findings confirm that ResNet50, when combined with geometric augmentation and proper preprocessing, offers a robust, accurate, and scalable solution for waste classification tasks. This research contributes to the advancement of AI-driven environmental technologies and offers a potential framework for smart waste management systems, with future directions including real-time deployment, multi-class classification, and expansion to more diverse and real-world datasets.

Comparation Analysis of Otsu Method for Image Braille Segmentation : Python Approaches

2025-11-29T07:49:20+07:00

Braille plays a crucial role in supporting literacy for individuals with visual impairments. However, converting Braille documents into digital text remains a technical challenge, particularly in accurately segmenting Braille dots from scanned images. This study aims to evaluate and compare the effectiveness of several classical image segmentation techniques, namely Otsu, Otsu Inverse, Otsu Morphology, and Otsu Inverse Morphology, in enhancing Braille image pre-processing. The methods were tested using a set of Braille image datasets and evaluated based on six quantitative image quality metrics: Peak Signal-to-Noise Ratio (PSNR), Mean Squared Error (MSE), Mean Absolute Error (MAE), Structural Similarity Index (SSIM), Feature Similarity Index (FSIM), and Edge Similarity Index (ESSIM). The results show that the Otsu Morphology method achieved the highest PSNR (27.6798) and SSIM (0.5548), indicating superior image fidelity and structural preservation, while the standard Otsu method yielded the lowest MSE (113.3485). These findings demonstrate that applying morphological operations in combination with thresholding significantly enhances the segmentation quality of Braille images, supporting better accuracy in subsequent recognition tasks. This approach offers a practical and efficient alternative to deep learning models, particularly for resource-constrained systems such as portable Braille readers.
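
As a rough illustration of the baseline technique and two of the metrics named above, the following sketch implements Otsu thresholding, MSE, and PSNR in plain Python on a flat list of 8-bit pixel values. It is an assumption-laden sketch rather than the study's pipeline: the inverse and morphological variants are omitted, and all function names are hypothetical.

```python
import math


def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count at or below the candidate threshold
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break     # upper class is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to 255 and the rest to 0."""
    return [255 if p > threshold else 0 for p in pixels]


def mse(a, b):
    """Mean squared error between two equal-length pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)
```

For a single image pair, PSNR is a decreasing function of MSE, so the two metrics rank methods identically; rankings can differ only after averaging over a dataset.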

YOLOv8 Implementation on British Sign Language System with Edge Detection Extraction

2025-11-29T07:49:06+07:00

This study presents the development and implementation of a deep learning-based system for recognizing static hand gestures in British Sign Language (BSL). The system utilizes the YOLOv8 model in conjunction with edge detection extraction techniques. The objective of this study is to enhance the accuracy of recognition and facilitate communication for individuals with hearing impairments. The dataset was obtained from Kaggle and comprises images of various BSL hand signs captured against a uniform green background under consistent lighting conditions. The preprocessing steps entailed resizing the images to 640×640 pixels, implementing pixel normalization, filtering out low-quality images, and employing data augmentation techniques such as horizontal flipping, rotation, shear, and brightness adjustments to enhance robustness. Edge detection was implemented to accentuate the contours of the hand, thereby facilitating more precise gesture identification. Manual annotation was performed to generate both bounding boxes and segmentation masks, allowing for the training of two model variants: YOLOv8 (non-segmentation) and YOLOv8-seg (segmentation). Both models were trained for 100 epochs using the Adam optimizer and binary cross-entropy loss. The training-to-testing data splits utilized were 50:50, 60:40, 70:30, and 80:20. The evaluation metrics employed included mAP@50, precision, recall, and F1-score. The YOLOv8-seg model with an 80:20 split demonstrated the best performance, exhibiting a precision of 0.974, a recall of 0.968, and mAP@50 of 0.979. These metrics signify the model's capacity for robust detection and localization. Despite requiring greater computational resources, the segmentation model offers enhanced contour recognition, rendering it well-suited for high-precision applications. However, the generalizability of the model is constrained by its reliance on static gestures and controlled backgrounds. In the future, researchers should consider incorporating dynamic gestures, varied backgrounds, and uncontrolled lighting to enhance real-world performance.

Classification of Lontara Script Using K-NN Algorithm, Decision Tree, and Random Forest Based on Hu Moments and Canny Segmentation

2025-11-29T07:48:53+07:00

Lontara script is a traditional writing system of the Bugis-Makassar people in South Sulawesi, used to write the Bugis, Makassar, and Mandar languages. The system is an abugida, in which each letter represents a consonant with an inherent vowel. It was once used to record history, customary law, and literature, but its use has declined due to the influence of the Latin alphabet. Today, the Lontara script is preserved through education and digitization as part of the cultural heritage of the Indonesian archipelago. This article uses a dataset of handwritten Lontara Bugis-Makassar characters. The process begins with the collection of character datasets, which are then processed through Canny segmentation and Hu Moment feature extraction to obtain a representation of the shape that is invariant to rotation and scale. The processed data was divided into training and testing data, then classified using the K-NN, Decision Tree, and Random Forest algorithms. The results showed that the K-NN algorithm with 6 neighbors achieved the highest accuracy, precision, and recall of 98%. The Decision Tree algorithm achieved an accuracy of 96.67%, precision of 96.22%, recall of 95.33%, and an F1-score of 95.98%. Meanwhile, Random Forest showed an accuracy of 96.67%, precision of 96.34%, recall of 96%, and an F1-score of 95.98%.

Deep Learning-Based Blood Cell Image Classification Using ResNet18 Architecture

2025-11-29T07:48:39+07:00

The classification of white blood cells (WBC) plays a critical role in haematological diagnostics, yet manual examination remains a labour-intensive and subjective process. In response to this challenge, this study investigates the application of deep learning, specifically the ResNet18 convolutional neural network architecture, for the automated classification of blood cell images into four classes: eosinophils, lymphocytes, monocytes, and neutrophils. The dataset used comprises microscopic images annotated by cell type and is divided into training and validation sets with an 80:20 ratio. Standard pre-processing techniques such as image normalization and augmentation were applied to enhance model robustness and generalization. The model was fine-tuned using transfer learning with pre-trained weights from ImageNet and optimized using the Adam optimizer. Performance was evaluated through a comprehensive set of metrics including accuracy, precision, recall, F1-score, mean squared error (MSE), and root mean squared error (RMSE). The best model achieved a validation accuracy of 86.89%, with macro-averaged precision, recall, and F1-score of 0.8738, 0.8690, and 0.8688, respectively. Lymphocyte classification yielded the highest F1-score (0.9515), while eosinophils posed the greatest classification challenge, as evidenced by lower precision and higher misclassification rates in the confusion matrix. Error-based evaluation further supported the model’s consistency, with an MSE of 0.7125 and RMSE of 0.8441. These results confirm that ResNet18 is capable of learning discriminative features in complex haematological imagery, providing an efficient and reliable alternative to manual analysis. The findings suggest potential for practical implementation in clinical workflows and pave the way for further research involving multi-model ensembles or cell segmentation pre-processing for improved precision.

Heart Disease Classification Using the K-Nearest Neighbor Method

2025-11-28T07:51:09+07:00

Globally, cardiovascular disease is the leading cause of death each year. Cardiovascular disease is a disease caused by impaired function of the heart and blood vessels (Kemenkes RI, 2014). K-Nearest Neighbor (KNN) is a method that finds the group of objects in the training data most similar to an object in new or test data (Lestari, 2014). This study measures the performance (accuracy, precision, recall, and F-measure) of the KNN method for K values from 3 to 9 on 1,000 heart disease patient records obtained from the UCI Machine Learning Repository. The performance measurements show that the best K value is 6, with an accuracy of 85%, precision of 78%, recall of 93%, and F-measure of 85%.
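
A minimal sketch of the K-Nearest Neighbor vote and the F-measure calculation follows. It is illustrative only: the Euclidean distance, the tie-breaking rule, and the toy points are assumptions, since the abstract does not specify them. The final check uses the precision (78%) and recall (93%) reported for K = 6.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    # With an even k, a tie goes to the label seen first among the neighbors.
    return votes.most_common(1)[0][0]


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Toy 2-D points (hypothetical), two classes.
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = [0, 0, 0, 1, 1, 1]
    print(knn_predict(X, y, (0.2, 0.2), 3))
    print(round(f_measure(0.78, 0.93), 2))
```

The reported precision and recall give an F-measure of about 0.848, which rounds to the 85% stated in the abstract.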

Design of a Decision Support System for Allocating Social Assistance Funds in Pinrang Regency Using the AHP Method

2025-11-28T07:50:55+07:00

This study aims to build a system that can help the Pinrang Regency government determine eligible recipients of social assistance. The system is a web-based decision support system that uses the Analytic Hierarchy Process (AHP) method. It includes six criteria that help the government weigh the benefits and risks of each decision, and these criteria are analyzed with the web-based AHP method. The study seeks to build a decision support system that is expected to help decision makers carry out their deliberations. The system will make it easy for decision makers to create, delete, or edit the available assessment models. By identifying the most appropriate model for each group or proposal, it is hoped that social assistance funds for businesses, particularly in Pinrang Regency, South Sulawesi Province, can reach the communities and areas that truly need them.
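
The core AHP calculation can be sketched as below: criterion weights are derived from a pairwise comparison matrix, and a consistency ratio checks whether the judgments are coherent. This is a generic sketch, not the system described in the abstract. The geometric-mean approximation of the principal eigenvector is one common choice, the 3×3 matrix is hypothetical (the abstract's six criteria are not named), and the random index values are Saaty's standard table.

```python
import math

# Saaty's random consistency index for matrix sizes 3..7.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}


def ahp_weights(matrix):
    """Approximate the priority vector by normalized row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Return CR = CI / RI; values below about 0.10 are usually accepted."""
    n = len(matrix)
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


if __name__ == "__main__":
    # Hypothetical comparison of three criteria: entry [i][j] states how
    # strongly criterion i is preferred over criterion j.
    pairwise = [
        [1.0, 2.0, 6.0],
        [1 / 2, 1.0, 3.0],
        [1 / 6, 1 / 3, 1.0],
    ]
    w = ahp_weights(pairwise)
    print([round(x, 3) for x in w], round(consistency_ratio(pairwise, w), 3))
```

For a perfectly consistent matrix such as this one, the geometric-mean weights match the eigenvector exactly and the consistency ratio is zero.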

Design and Build an Automatic Spraying System for Shallot Plants using Soil Moisture and Air Temperature Sensors with the Fuzzy Method

2025-11-28T07:50:42+07:00

Agriculture utilizes biological resources to produce food, industrial raw materials, and energy sources, and to manage the environment. This sector plays a strategic role in national economic development. This research aims to design an automatic spraying system for shallot plants based on soil moisture. The system utilizes soil moisture sensors to detect the water content in the soil as well as air sensors to measure the temperature and humidity around the plants. Data from both sensors are processed by the microcontroller to regulate the timing and duration of the spraying. The prototype was built using soil moisture sensors, air sensors, microcontrollers, water pumps, solenoid valves, and other supporting components. Testing was conducted in the field with shallot plants as the test subjects. The results show that the system functions effectively in watering plants based on soil moisture levels. The sensor measures water content accurately, while the microcontroller controls the spraying as intended. The implementation of this system increased water usage efficiency and supported better growth of the shallot plants. Thus, this automatic spraying system offers an environmentally friendly and efficient solution for irrigation based on soil moisture and air sensors.

Performance Comparison of MobileNet and EfficientNet Architectures in Automatic Classification of Bacterial Colonies

2025-11-28T07:50:27+07:00

Bacterial colony classification is crucial in microbiology but remains labor-intensive and time-consuming when performed manually. Deep learning, particularly Convolutional Neural Networks (CNNs), enables automated classification, improving accuracy and efficiency. This study compares MobileNetV2 and EfficientNet-B0 for bacterial colony classification, evaluating the impact of data augmentation on model performance. Using the Neurosys AGAR dataset, preprocessing techniques such as histogram equalization, gamma correction, and Gaussian blur were applied, while data augmentation (rotation, noise addition, luminosity adjustments) improved model generalization. The dataset was split (80% training, 20% testing), and models were trained with learning rates (0.0001, 0.001) and epochs (100, 150, 200). Results show EfficientNet-B0 outperforms MobileNetV2, achieving higher validation accuracy and stability, with optimal performance at 150–200 epochs and a lower learning rate (0.0001). Data augmentation significantly improved accuracy and reduced overfitting. While MobileNetV2 remains a lightweight alternative, its performance is heavily reliant on augmentation. These findings highlight EfficientNet-B0 as the superior model, supporting the automation of microbiological diagnostics. Future research should explore hybrid CNN architectures, Vision Transformers (ViTs), and real-time implementation for improved classification efficiency.

Yoga Posture Recognition and Classification Using YOLOv5

2025-12-03T10:10:18+07:00

Yoga, a centuries-old health practice from India, has gained global recognition for its benefits to physical, mental, and emotional well-being. However, incorrect execution of yoga poses can lead to injuries or diminished results. This research develops an automated system for recognizing and classifying yoga postures using YOLOv5, a state-of-the-art deep learning algorithm. YOLOv5, part of the YOLO (You Only Look Once) series, is designed for real-time object detection and offers enhanced performance through features like anchor-free detection and adaptive training strategies. The study collects a dataset of 1,000 images across 20 yoga pose categories, followed by manual annotation and training using transfer learning. Validation results show strong performance, achieving an accuracy of 90% with precision and recall scores of 0.942 and 0.941, respectively, and mAP@50 and mAP@50-95 values of 0.976 and 0.866. Despite challenges with certain poses showing lower accuracy due to variations in posture and dataset limitations, the model demonstrates robustness in detecting and classifying yoga postures effectively. This system has potential applications in artificial intelligence-driven yoga education, enabling practitioners to train independently with real-time feedback.

Assessing Machine Learning Techniques for Cryptographic Attack Detection: A Systematic Review and Meta-Analysis

2025-11-28T07:50:00+07:00

The detection of cryptographic attacks is a vital aspect of maintaining cybersecurity, especially as digital infrastructures become increasingly intricate and susceptible to sophisticated threats. This systematic review aims to examine and compare a range of machine learning approaches applied to cryptographic attack detection, focusing on their performance in terms of detection rates, efficiency, and overall effectiveness. A comprehensive review and meta-analysis were conducted, focusing on existing research that utilized machine learning models for identifying cryptographic attacks. The models included in the review were Naïve Bayes, C4.5, Random Forest, Decision Tree, K-Means, and Particle Swarm Optimization (PSO) combined with Neural Networks. Studies were selected based on their relevance to cryptographic security, with particular attention paid to performance metrics like classification accuracy, precision, recall, and area under the curve (AUC). The findings indicated that the C4.5 decision tree model achieved a high classification rate of 98.8%, while both Random Forest and Decision Tree models performed with an accuracy of 99.9%, making them highly suitable for real-time attack detection. Additionally, the PSO + Neural Network model showed enhanced detection precision, illustrating the value of integrating optimization techniques with machine learning models. The use of machine learning, especially with ensemble methods such as Random Forest and Decision Trees, proves to be highly effective for cryptographic attack detection. The study underscores the necessity for customized machine learning solutions in cybersecurity, balancing both high accuracy and operational efficiency. Further research should focus on the real-world deployment of hybrid models to confirm their practical effectiveness.

A Comparative Study of Public Opinion on Indonesian Police: Examining Cases in the Aftermath of the Kanjuruhan Football Disaster

2025-11-28T07:49:46+07:00

This research explores public sentiment towards the Indonesian police using sentiment analysis and machine learning techniques. The study addresses the challenge of understanding public opinion based on social media comments related to significant police cases. The aim is to compare reported satisfaction levels with actual public sentiment. Utilizing the Indonesian RoBERTa base IndoLEM sentiment classifier, comments were analyzed and preprocessed. The classification was conducted using Random Forest (RF) and Complement Naive Bayes (CNB) models, incorporating unigram and bi-gram features. Oversampling techniques were applied to handle data imbalance. The best-performing model, Random Forest with bi-gram features, achieved high evaluation scores, including a precision of 0.91 and accuracy of 0.91. The findings reveal significant insights into public opinion, contributing to improved law enforcement strategies and public trust.

Performance Comparison of MobileNetV2 and NASNetMobile Architectures in Soybean Leaf Disease Classification

2025-11-28T07:49:32+07:00

Soybean is one of the essential commodities in Indonesia, commonly used as a raw material for tofu and tempeh, making it highly sought after. However, soybean production has decreased by up to 30% due to disease attacks, necessitating preventive measures. This study aims to compare two Convolutional Neural Network (CNN) architectures, MobileNetV2 and NASNetMobile, in classifying soybean leaf diseases. The models were trained using a leaf image dataset collected directly from agricultural fields and categorized into five classes. The dataset underwent augmentation to increase its size, resulting in a total of 6,000 images, which were then split with an 80:10:10 ratio. The models were trained using the Adam optimizer with a learning rate of 0.001, optimized using ReduceLROnPlateau, and a dropout rate of 0.2 to prevent overfitting. Evaluation results using a confusion matrix indicated that MobileNetV2 performed better with an accuracy of 96.67%, precision of 96.70%, recall of 96.67%, and an F1-score of 96.68%, compared to NASNetMobile, which achieved an accuracy of 86.33%, precision of 86.91%, recall of 86.33%, and an F1-score of 86.40%.
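
The learning-rate schedule named above reduces the rate when the validation loss stops improving. The following plain-Python sketch simulates that idea; it is a simplified stand-in rather than the framework callback itself (which has further options such as a cooldown and a minimum improvement). Only the initial rate of 0.001 comes from the abstract; `factor`, `patience`, `min_lr`, and the loss values are assumptions.

```python
def reduce_lr_on_plateau(val_losses, initial_lr=0.001, factor=0.5,
                         patience=2, min_lr=1e-6):
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by `factor` whenever the validation loss has
    failed to improve on its best value for `patience` consecutive epochs.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0  # epochs since the last improvement
    schedule = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        schedule.append(lr)
    return schedule


if __name__ == "__main__":
    # Hypothetical validation losses over seven epochs.
    print(reduce_lr_on_plateau([1.0, 0.8, 0.9, 0.85, 0.7, 0.7, 0.7]))
```

Lowering the rate only on a plateau lets training start with larger steps and refine with smaller ones without a hand-tuned decay schedule.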

Comparative Analysis of Random Forest and LSTM Models for Customer Churn Prediction Based on Customer Satisfaction and Retention

2025-11-28T07:49:19+07:00

Customer churn prediction is important for sustaining long-term customer relationships and enhancing profitability in competitive markets. This study compares the performance of Random Forest (RF) and Long Short-Term Memory (LSTM) models in predicting customer churn using a dataset of 2,850 customers. The dataset comprises behavioral, transactional, and satisfaction metrics. Key evaluation metrics include accuracy, precision, recall, F1-score, and AUC-ROC. The results show that while Random Forest offers strong baseline performance with interpretable results, LSTM captures temporal patterns effectively and performs better in identifying subtle churn indicators, especially in sequential customer satisfaction data. LSTM achieved an accuracy of 88.6%, precision of 85.3%, recall of 82.5%, F1-score of 83.9%, and AUC-ROC of 0.92, while Random Forest achieved an accuracy of 85.2%, precision of 81.5%, recall of 77.0%, F1-score of 79.2%, and AUC-ROC of 0.89. This suggests that LSTM is preferable for rapidly changing, large-volume datasets, whereas RF excels on less complicated and sparse datasets.

Enhanced NER Tagging Model using Relative Positional Encoding Transformer Model

2025-11-28T07:49:05+07:00

Named Entity Recognition remains pivotal for structuring unstructured text, yet existing models face challenges with long-range dependencies, domain generalisation, and reliance on large, annotated datasets. To address these limitations, this paper introduces a hybrid architecture combining a transformer model enhanced with relative positional encoding and a rule-based refinement module. Relative positional encoding improves contextual understanding by capturing token relationships dynamically, while rule-based post-processing corrects inconsistencies in entity tagging. After being evaluated on the Groningen Meaning Bank and Wikipedia Location datasets, the proposed model achieves state-of-the-art performance, with validation accuracies of 98.91% for Wikipedia and 98.50% for GMB with rule-based refinement, surpassing existing benchmark research of 94.0%. The relative positional encoding contributes 34.42% to the attention mechanism’s magnitude, underscoring its efficacy in modelling token interactions. Results demonstrate that integrating transformer-based architectures with rule-based corrections significantly enhances entity classification accuracy, particularly in complex and morphologically diverse contexts. This work highlights the potential of hybrid approaches to optimise sequence labelling tasks across domains.

Integration of Yolov8 And Instance Segmentation in The Chinese Sign Language (CSL) Recognition System

2025-11-28T07:48:51+07:00

This research aims to develop an advanced recognition system for Chinese Sign Language (CSL) by integrating YOLOv8 and instance segmentation techniques. Communication through sign language is essential for the deaf community, and although CSL has been standardized in China, recognizing complex hand movements remains a significant challenge. YOLOv8 is employed for real-time object detection, while instance segmentation is used to provide more detailed analysis of hand gestures. This integration seeks to improve hand gesture recognition under varying lighting and background conditions, which is crucial for more effective communication between the deaf community and the wider society. The study evaluates the system’s performance using common metrics such as Mean Average Precision (mAP), precision, recall, and F1-score. The findings indicate that the non-segmentation model performs better than the segmentation model in terms of precision, recall, and mAP, especially when trained with a larger dataset ratio. The non-segmentation model provides faster and more accurate detection, while the segmentation model, despite using the same amount of data, shows potential for more detailed recognition of gestures. Although the segmentation model shows improvements in the F1-score with more detailed accuracy, the non-segmentation model remains superior in overall detection speed and accuracy. This research highlights the importance of integrating YOLOv8 and instance segmentation for improving CSL recognition, with the non-segmentation model giving the better results in support of more effective communication for the deaf community.

Performance Analysis of Random Forest and Naive Bayes Methods for Classifying Tomato Leaf Disease Datasets

2025-11-28T07:48:36+07:00

Tomato productivity is often disrupted by diseases affecting tomato plants, such as early blight and late blight, which can significantly reduce crop yields. Early detection of these diseases is crucial to prevent greater losses. This study compares two machine learning-based classification methods, namely Random Forest and Naïve Bayes, in identifying diseases on tomato leaves. The dataset consists of 1,255 images obtained from Kaggle, divided into two classes: early blight with 627 images and late blight with 628 images. The data then underwent preprocessing and splitting under three ratio scenarios (70:30, 80:20, and 90:10) for training and testing. The results show that Naïve Bayes achieved an accuracy of only 76.98%, while the Random Forest method had the highest accuracy of 92.86% in the 90:10 data ratio scenario. Thus, the Random Forest method proves to be more effective in classifying tomato leaf diseases compared to Naïve Bayes. The implementation of this model can help farmers detect diseases more quickly and accurately, thereby increasing agricultural productivity.

Optimization of Nglegena Javanese Script Recognition With Machine Learning Based on Zoning And Normalization of Feature Extraction

2025-11-28T07:48:23+07:00

Machine learning offers promising solutions for the recognition of handwritten Javanese Nglegena script, which is crucial for preserving Indonesia's cultural heritage. This study explores the application of several supervised learning algorithms, namely K-Nearest Neighbors (KNN), Naïve Bayes, Decision Tree, and Random Forest, for classifying handwritten images of Nglegena Javanese script. Feature extraction is performed using a zoning technique, where each character image is divided into multiple zones (16, 25, 36, and 64) to capture local details. The extracted features are further processed using normalization methods, including Min-Max, Z-Score, and Binary normalization, to ensure uniform data distribution. The dataset, consisting of 600 images representing Javanese Nglegena characters, is split into training and testing sets using various ratios. Experimental results show that the combination of Naïve Bayes classification, 36-zone feature extraction, and Min-Max or Z-Score normalization achieves the highest accuracy of 65%. These findings demonstrate that optimizing zoning and normalization can significantly enhance the accuracy of machine learning models for Javanese script recognition. The research contributes to developing Optical Character Recognition (OCR) technology for Javanese script, supporting the digital preservation of Indonesia's historical and cultural heritage.
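
The zoning and normalization steps can be sketched as follows. The abstract does not say what is measured inside each zone, so this sketch assumes the feature is the fraction of ink pixels per zone in a square binary image whose side is divisible by the grid size; all function names are hypothetical. The zone counts 16, 25, 36, and 64 correspond to 4×4, 5×5, 6×6, and 8×8 grids.

```python
def zoning_features(image, zones_per_side):
    """Return the ink density of each zone, in row-major order.

    `image` is a square list of rows containing 0 (background) or 1 (ink).
    """
    size = len(image)
    step = size // zones_per_side  # zone edge length in pixels
    features = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            ink = sum(
                image[r][c]
                for r in range(zr * step, (zr + 1) * step)
                for c in range(zc * step, (zc + 1) * step)
            )
            features.append(ink / (step * step))
    return features


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Shift and scale values to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]
```

Each image becomes a fixed-length vector (36 values for a 6×6 grid), which is what lets classifiers such as Naïve Bayes or KNN operate on images of handwriting.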

Comparative Analysis of OCR Methods Integrated with Fuzzy Matching for Food Ingredient Detection in Japanese Packaged Products

2025-11-28T07:48:09+07:00

Advances in digital technology offer a solution to the challenges foreign consumers face in understanding ingredient information on Japanese food packaging, especially due to the use of Kanji, Hiragana, and Katakana characters. This study develops and evaluates an allergen detection method based on Optical Character Recognition (OCR) and fuzzy matching applied to Japanese food packaging. Three OCR methods (Google Vision OCR, PaddleOCR, and Tesseract OCR) were compared and evaluated using Precision, Recall, F1-Score, and Confusion Matrix metrics. The study began with the collection of food product images from various sources, followed by text extraction using the three OCR methods. The extracted text was then cleaned and normalized before being matched with ground truth data using fuzzy matching. Testing was conducted on 10 product image samples with varying quality and lighting conditions. The results showed that Google Vision OCR outperformed the others, achieving an average F1 score of 1.00, followed by PaddleOCR (0.75) and Tesseract OCR (0.30). Google Vision was the most consistent in detecting allergens such as 乳 (milk), 小麦 (wheat), and 卵 (egg). These findings suggest that the integration of OCR and fuzzy matching is effective in detecting allergens, even in the presence of textual variations and recognition errors. This study contributes to the development of automated food recommendation systems for foreign consumers, especially those with dietary restrictions due to allergies, religious beliefs, or personal preferences.
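
The fuzzy matching step can be illustrated with Python's standard `difflib` module. This is a sketch under stated assumptions, not the study's pipeline: the keyword list, the sliding-window comparison, and the 0.7 threshold are all hypothetical choices, and the sample string imitates an OCR error (扮 read in place of 粉).

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical ingredient keywords mapped to the allergen they indicate.
ALLERGEN_TERMS: Dict[str, str] = {
    "小麦粉": "wheat",    # wheat flour
    "脱脂粉乳": "milk",   # skim milk powder
    "卵黄": "egg",        # egg yolk
}


def best_ratio(term: str, text: str) -> float:
    """Highest similarity between term and any same-length window of text."""
    n = len(term)
    if len(text) < n:
        return SequenceMatcher(None, term, text).ratio()
    return max(SequenceMatcher(None, term, text[i:i + n]).ratio()
               for i in range(len(text) - n + 1))


def detect_allergens(ocr_text: str, threshold: float = 0.7) -> List[str]:
    """Return allergens whose keyword approximately appears in the OCR text."""
    cleaned = "".join(ocr_text.split())  # simple normalization: drop whitespace
    return [allergen for term, allergen in ALLERGEN_TERMS.items()
            if best_ratio(term, cleaned) >= threshold]


if __name__ == "__main__":
    # "脱脂扮乳" simulates a single-character recognition error.
    print(detect_allergens("原材料名:小麦粉、砂糖、脱脂扮乳、食塩"))
```

Very short keywords leave little room for fuzziness, since a one-character term either matches exactly or not at all, so the threshold would need tuning against real OCR output.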

Bayesian Analysis of Two Parameter Weibull Distribution Using Different Loss Functions

2025-10-30T06:13:25+07:00

This paper focuses on Bayesian estimation of the parameters of the two-parameter Weibull distribution. We use both informative and non-informative priors and calculate the estimators and their posterior risks under different symmetric and asymmetric loss functions. Because the Bayes estimators do not have a closed form under these loss functions, we use the approximation approach established by Lindley to obtain the Bayes estimates. The suggested estimators are compared through Monte Carlo simulation on the basis of their posterior risks. We also analyze the impact of distinct loss functions when using various priors.

Improving Multi-Class Classification on 5-Celebrity-Faces Dataset using Ensemble Classification Methods

2025-09-20T07:25:27+07:00

This study aims to compare the performance of the Random Forest Classifier and the Gaussian Naïve Bayes Classifier in multi-class classification on the 5-Celebrity-Faces dataset. Several evaluation metrics, namely accuracy, precision, recall, and F1-score, were used to analyze the performance of both models. The dataset used has specific characteristics that influence the evaluation results. The research findings indicate that the Random Forest Classifier outperforms the Gaussian Naïve Bayes Classifier in most of the evaluation metrics, achieving higher accuracy and better weighted precision, recall, and F1-score. However, the Random Forest Classifier also shows more outliers than the Gaussian Naïve Bayes Classifier when the results are visualized using boxplots. Therefore, selecting a classification model involves a trade-off between higher performance and sensitivity to outliers. Further statistical testing and advanced evaluation are required to gain a deeper understanding of the impact and interpretation of the obtained results. This study provides valuable insights into the comparison between these two classification models and their implications in different contexts.

M2SmallLint: Software Health Monitoring Tool

2025-09-20T07:25:11+07:00

Developing error-free applications is a major challenge for computer scientists. Tools to remedy this problem have been developed, notably rule checkers and proof assistants. As a particular case of error, a bug is by nature intangible, invisible, and difficult to trace. We propose to investigate the correlations between the alerts generated by rule checkers and the internal quality of the software system. In this first version of the work, we present M2SmallLint, a tool for visualizing and navigating through source code properties in order to locate potential errors. This tool enables the visualization of software health.

Design and Implementation of Health Supplies Inventory Monitoring System Using First Expired First Out Method

2025-09-20T07:24:22+07:00

Pharmacists at UPTD Puskesmas Toari, Southeast Sulawesi, manage the health supplies inventory manually, and this manual recording makes stock monitoring less effective. This study aims to design a monitoring system for health supplies inventory using the First Expired First Out (FEFO) method. FEFO dispenses the supplies closest to their expiration date first, so that stock is used before it expires. The system is developed with the waterfall method, which consists of five stages: Requirement Analysis, Design, Implementation, Testing, and Maintenance. The web-based system provides efficient management of health supplies inventory, including medications, disposable medical materials (BMHP), and medical equipment, along with timely and accurate information. The research findings indicate that the system can effectively monitor the health supplies inventory at the UPTD Puskesmas Toari Pharmacy Warehouse, using the FEFO method to prioritize the dispensing of health supplies with the nearest expiration date. The system also includes a notification feature that flags health supplies needing immediate dispensing. Questionnaire results showed that respondents agreed with the implementation of the monitoring system at the UPTD Puskesmas Toari Pharmacy Warehouse.
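
The FEFO rule and the notification feature can be sketched in a few lines. The `Batch` record, the function names, and the 30-day warning window are hypothetical and chosen only for illustration; the abstract does not describe the system's data model. For brevity the sketch does not exclude batches that have already expired.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Batch:
    item: str
    quantity: int
    expires: date


def dispense_fefo(batches: List[Batch], item: str, amount: int) -> List[Tuple[date, int]]:
    """Dispense `amount` units of `item`, drawing from the batches that
    expire soonest. Returns (expiry date, units taken) pairs and reduces
    the quantities of the batches in place."""
    candidates = sorted((b for b in batches if b.item == item and b.quantity > 0),
                        key=lambda b: b.expires)
    if sum(b.quantity for b in candidates) < amount:
        raise ValueError("insufficient stock")
    picks = []
    for batch in candidates:
        if amount == 0:
            break
        taken = min(batch.quantity, amount)
        batch.quantity -= taken
        amount -= taken
        picks.append((batch.expires, taken))
    return picks


def expiring_soon(batches: List[Batch], today: date, within_days: int = 30) -> List[Batch]:
    """Batches still in stock that expire within `within_days` of `today`."""
    return [b for b in batches
            if b.quantity > 0 and 0 <= (b.expires - today).days <= within_days]


if __name__ == "__main__":
    stock = [Batch("paracetamol", 50, date(2026, 3, 1)),
             Batch("paracetamol", 30, date(2025, 12, 15))]
    print(dispense_fefo(stock, "paracetamol", 40))
```

Sorting by expiry date rather than arrival date is what distinguishes FEFO from first-in-first-out: a batch received later is still dispensed first if it expires earlier.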

Comparison of Performance of Four Distance Metric Algorithms in K-Nearest Neighbor Method on Diabetes Patient Data

2025-09-20T07:24:01+07:00

Diabetes is a chronic disease that occurs when the pancreas no longer produces insulin or when the body cannot effectively use the insulin it produces. The aim of this study is to analyze and compare classification performance on a diabetes patient dataset using four distance metrics in the K-Nearest Neighbor (K-NN) method. In previous research, the performance values obtained were not sufficiently high, not exceeding 80%. Further experiments are therefore needed to obtain new performance values and compare them with previous studies. Based on the test results using the confusion matrix, the Euclidean distance obtained its best performance at k=17 with 10-fold cross-validation, with an accuracy of 85.71%, precision of 86.24%, recall of 85.71%, and F-measure of 85.12%. The Manhattan distance obtained its best performance at k=25 with 10-fold cross-validation, with an accuracy of 85.53%, precision of 85.54%, recall of 85.53%, and F-measure of 85.10%. The Minkowski distance obtained its best performance at k=17 with 10-fold cross-validation, with an accuracy of 85.71%, precision of 86.24%, recall of 85.71%, and F-measure of 85.12%. The Hamming distance, on the other hand, obtained its best performance at k=23 with 10-fold cross-validation, with an accuracy of 75.32%, precision of 79.27%, recall of 75.32%, and F-measure of 71.45%.
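
For reference, the four distance metrics and a majority-vote K-NN prediction can be written as below. This is a minimal sketch with hypothetical function names; the Hamming distance is taken here as the fraction of differing positions, which is one common convention and an assumption about the study. Minkowski distance with p=2 is identical to Euclidean distance, which would be consistent with the matching figures reported for the two, although the abstract does not state which p was used.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

Vector = Sequence[float]


def minkowski(a: Vector, b: Vector, p: float) -> float:
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean(a: Vector, b: Vector) -> float:
    return minkowski(a, b, 2)


def manhattan(a: Vector, b: Vector) -> float:
    return minkowski(a, b, 1)


def hamming(a: Vector, b: Vector) -> float:
    """Fraction of positions at which the two vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_predict(train_x: List[Vector], train_y: List[Hashable], query: Vector,
                k: int, distance: Callable[[Vector, Vector], float]) -> Hashable:
    """Label of the majority class among the k nearest training points."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: distance(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    xs = [[1, 1], [1, 2], [8, 8], [9, 8]]
    ys = [0, 0, 1, 1]
    print(knn_predict(xs, ys, [2, 1], k=3, distance=euclidean))
```

Passing the distance as a function argument makes it straightforward to repeat the same cross-validation loop for each metric and each value of k.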

Comparison of K-Nearest Neighbor and Decision Tree Methods using Principal Component Analysis Technique in Heart Disease Classification

2025-09-20T07:23:39+07:00

Heart disease has become a global health issue that can threaten anyone, regardless of age. Numerous research efforts have been made to develop classification methods that can aid in diagnosing heart disease. In this study, we compared two classification methods, K-Nearest Neighbor (KNN) and Decision Tree, combined with Principal Component Analysis (PCA) for heart disease classification. The dataset used contains relevant clinical attributes. After analyzing the dataset and performing data preprocessing, we applied PCA to reduce the dataset's dimensions. PCA models with KNN and Decision Tree were implemented and evaluated using performance metrics such as Confusion Matrix, F1 Score, and Accuracy. The analysis results showed that the PCA model with Decision Tree outperformed the PCA model with KNN in terms of accuracy. The Decision Tree model classified all data correctly, while KNN had some misclassifications. This research therefore recommends the PCA model with Decision Tree for heart disease classification. However, further research with larger datasets is needed for a deeper understanding.

Comparison Analysis of Classification Model Performance in Lung Cancer Prediction Using Decision Tree, Naive Bayes, and Support Vector Machine

2025-09-20T07:23:21+07:00

This research aims to analyze the performance of three classification models, namely Decision Tree Classifier, Support Vector Machine, and Naive Bayes Classifier, in predicting lung cancer using the "Lung Cancer Prediction" dataset. The performance evaluation metrics used are accuracy, weighted precision, weighted recall, and weighted F1. As a preliminary step, exploratory data analysis (EDA) and dataset preprocessing, including feature selection, data cleaning, and data transformation, were conducted. The test data results showed that the Decision Tree Classifier and Naive Bayes Classifier had similar performances with high accuracy, precision, recall, and F1 values. The Support Vector Machine also exhibited competitive performance, although its weighted precision was slightly lower. Additionally, an outlier analysis using box plots revealed that the Decision Tree Classifier had 2 outlier values, the Support Vector Machine had 4, and Naive Bayes had none. In conclusion, all three classification models demonstrated good potential in lung cancer prediction. However, selecting the best model requires considering the evaluation metrics relevant to the application and the limitations of each model. Further evaluation and in-depth analysis are needed to ensure the reliability of the models in predicting lung cancer cases more accurately and consistently.
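
The weighted metrics named above average the per-class precision, recall, and F1, with each class weighted by its share of the true labels. The sketch below shows that calculation in plain Python; the function name and the toy labels are hypothetical, and undefined ratios are set to zero as a simplifying assumption.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def weighted_scores(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1."""
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    scores = {"accuracy": sum(t == p for t, p in pairs) / total,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label, support in Counter(y_true).items():
        true_pos = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = support / total  # class share of the true labels
        scores["precision"] += weight * precision
        scores["recall"] += weight * recall
        scores["f1"] += weight * f1
    return scores


if __name__ == "__main__":
    print(weighted_scores([1, 1, 1, 0], [1, 1, 0, 0]))
```

With this weighting, weighted recall always equals overall accuracy, so the two figures carry the same information when reported side by side.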

Spatial Prediction of Stunting Incidents Prevalence Using Support Vector Regression Method

2025-09-20T07:23:00+07:00

Stunting in toddlers is a major nutritional problem faced by Indonesia, with a high incidence rate in several provinces across the country. This nutritional issue can occur at any age, from the prenatal stage through infancy, childhood, adolescence, and adulthood, and even in the elderly. To reduce the prevalence of stunting in affected provinces, prevention efforts are essential, including predicting the spread of stunting incidents in each region. Therefore, this research conducted spatial prediction of the prevalence of stunting using machine learning, specifically Support Vector Regression (SVR). The study produced a prediction model with a Root Mean Square Error (RMSE) of 0.008689303 and a multiple correlation coefficient of 0.65912721. Based on these findings, the predictive model demonstrated satisfactory performance in predicting the prevalence of stunting in each area.
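
The two reported evaluation measures can be computed as sketched below. RMSE is the square root of the mean squared difference between observed and predicted values. The multiple correlation coefficient is taken here as the Pearson correlation between observed and predicted values, which is its usual definition but an assumption about how the study computed it. The function names and sample values are hypothetical.

```python
from math import sqrt
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def correlation(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)


if __name__ == "__main__":
    observed = [0.21, 0.30, 0.27, 0.35]   # made-up prevalence rates
    fitted = [0.22, 0.29, 0.28, 0.33]
    print(rmse(observed, fitted), correlation(observed, fitted))
```

RMSE is expressed in the units of the target variable, so a value near 0.0087 is small only relative to the scale on which prevalence was measured.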