The Knowledge Distillation technique
Knowledge Distillation is a teacher-student process that transfers knowledge from a large, complex model (the teacher) to a smaller model (the student) that is faster and more efficient, reducing memory usage and computational cost.
But how does this work?
We start with a complex teacher model trained on a large amount of data and use it to extract information such as its predictions (soft labels) and, optionally, its hidden-layer representations. Then we take a student model with a simpler architecture than the teacher's. Finally, we train the student on the original data together with this extracted information, which helps it learn the complex relationships the teacher has already captured.
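Here is a minimal sketch of what this training step can look like in PyTorch, using the common soft-label approach. The tiny teacher/student architectures, the temperature `T`, and the weighting `alpha` are illustrative assumptions, not something prescribed by the technique itself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical teacher and student: any classifiers with the same number of output classes.
teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: the student mimics the teacher's softened predictions (KL divergence).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the original labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
teacher.eval()  # the teacher is frozen; we only read its predictions

def train_step(x, y):
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    loss = distillation_loss(student_logits, teacher_logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call with fake 28x28 images and 10 classes, just to show the shapes.
x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))
print(train_step(x, y))
```

The temperature softens the teacher's probability distribution so the student can also learn from the relative scores the teacher assigns to wrong classes, not just the single predicted label.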
But why do we use this technique?
It allows us to deploy the student model on devices with limited resources. For example, a self-driving car may need image recognition based on a CNN (Convolutional Neural Network) that has a high accuracy rate but is far too large to run in the car. Knowledge Distillation is used to transfer what the CNN learned to a smaller model that can be deployed in the car.