PolyViT: Co-training Vision Transformers on Images, Videos and Audio

Published on arXiv, 2021

https://arxiv.org/abs/2111.12993

Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller and Mostafa Dehghani.