RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
https://arxiv.org/abs/2307.15818

We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enab..