KL Divergence

<Distillation> GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models

2023.07.05 · Paper Review

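I read a recent paper (June 2023) and wrote up a brief summary. If anything is missing or incorrect, please leave a comment 🙇‍♂️ [Google DeepMind] Overcomes the limitations of existing distillation through Generalized Knowledge Distillation (GKD). Validated on summarization, machine translation, and arithmetic reasoning tasks. Knowledge Distillation (KD) is a training approach in which a smaller model learns to mimic the probability distribution itself, so that it can acquire the capabilities of an LLM. However, this approach is not perfect: 'the distribution learned during training and the outpu..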
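
To make the idea of "mimicking the probability distribution itself" concrete, here is a minimal sketch of a token-level distillation loss. It illustrates the plain KD objective mentioned in the excerpt, not the GKD method from the paper. The function names (`softmax`, `kl_divergence`, `token_level_kd_loss`) and the toy logits are hypothetical, and the forward KL direction (teacher relative to student) is an assumption chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def token_level_kd_loss(teacher_logits, student_logits):
    """Average KL(teacher || student) over the token positions of one sequence.

    Both arguments are lists of per-position logit vectors (hypothetical
    toy inputs, not the paper's setup).
    """
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)


# Toy example: 2 token positions, vocabulary of 3 tokens.
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student = [[1.0, 1.0, 0.0], [0.0, 0.5, 0.5]]
loss = token_level_kd_loss(teacher, student)
```

The loss is zero when the student's per-token distributions match the teacher's exactly and grows as they drift apart, which is the sense in which the student is trained to imitate the teacher's distribution rather than only its top-1 outputs.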
