chanmuzi
<Multi-modal> BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models