DIIDevHeads IoT Integration Server
•Created by wafa_ath on 9/13/2024 in #middleware-and-os
Should I Adjust min_child_samples When Training LightGBM with 100% of the Data?
Hi, I'm training a LightGBM model to optimize performance for an embedded system application, specifically for real-time anomaly detection on edge devices. I'm currently facing a dilemma regarding parameter tuning when increasing the amount of training data.
Initially, I split my dataset into 90% for training and 10% for testing. Using grid search, I found the optimal parameters for the model. Now, I want to leverage 100% of the data to train the model to make it as robust as possible for deployment on resource-constrained devices.
My question is about parameters like min_child_samples, which depend on the volume of training data. When I go from 90% to 100% of the data, should I keep min_child_samples at the value found during the 90% training run? Or should I adjust it because the data volume has increased, considering the constraints of embedded systems?
Could someone provide guidance on how to handle this, or share best practices for re-tuning parameters when increasing the data size, to ensure optimal model performance in embedded system applications?
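For concreteness, here is the kind of adjustment I'm considering. This is purely a guess on my part (scaling the parameter proportionally to the data size), not something I've seen recommended in the LightGBM docs, and the starting value is hypothetical:

```python
# Hypothetical best value found by grid search on the 90% split
best_min_child_samples = 20
train_fraction = 0.9

# Candidate heuristic (my assumption, not an official rule):
# scale data-volume-dependent parameters in proportion to the
# increase in training data, so each leaf still represents
# roughly the same fraction of the dataset.
scaled = round(best_min_child_samples / train_fraction)
print(scaled)  # 22
```

Is this kind of proportional scaling reasonable, or is the effect small enough at a 90% → 100% change that keeping the original value is fine?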