chanmuzi
<LK Lab, Multi-modal> [ZeroTA] Zeor-Shot Dense Video Captioning by Jointly Optimizing Text and Moment (2023.01)