Jiang Lu, Head of Google’s VideoPoet Project, Joins TikTok
Chinese tech media outlet Jiaziguangnian has exclusively learned that Jiang Lu, a senior scientist at Google and part-time professor at Carnegie Mellon University’s School of Computer Science, has joined TikTok.
Jiang Lu is the head of Google’s VideoPoet project, a large-scale video generation model launched by Google in December 2023, similar to the recently released Sora by OpenAI.
In a further sign, Kunlun World Wide Technology founder Zhou Yahui wrote on social media on Thursday that an author of a paper had joined ByteDance North America as a tech lead. The reference was to Jiang Lu, an author of the paper “VideoPoet: A large language model for zero-shot video generation.”
Jiang Lu studied computer science at Xi’an Jiaotong University, the Free University of Brussels, and Carnegie Mellon University, and interned at Microsoft Research Asia, Google Research, and Yahoo Research. In 2017, after graduating, Jiang Lu joined Google as a founding member of Google Cloud AI and was the first researcher hired by Dr. Li Jia and Dr. Li Feifei. Subsequently, Jiang Lu worked at Google Research. His research has been applied to various Google products such as YouTube, cloud services, Cloud AutoML, advertising, Waymo, and translation services, impacting the daily lives of billions of users worldwide.
Jiang Lu’s work in natural language processing (ACL) and computer vision (CVPR) has been nominated for best paper at top conferences. He is an active member of the research community, serving as an AI reviewer for the U.S. National Seed Fund (NSF SBIR) and regularly chairing prominent conferences such as CVPR, ICCV, NeurIPS, ACM Multimedia, and AAAI.
Jiang Lu’s research interests primarily lie in the interdisciplinary field of multimedia, focusing on generative AI and video creation. Since 2019, he has been exploring the use of Transformers in image and video generation research.
Unlike the Diffusion+Transformer architecture used in Sora, VideoPoet, the video generation model led by Jiang Lu, uses a single Transformer architecture and can turn any autoregressive language model or large language model into a high-quality video generator. It supports square or vertical video output tailored to short-form content and can generate audio from a video input.
VideoPoet uses tokenizers to encode video and audio clips into sequences of discrete tokens that can be converted back into their original representations. Video and images are tokenized with MAGVIT-v2, and audio with SoundStream. By training an autoregressive language model on the outputs of these tokenizers, VideoPoet learns across video, image, audio, and text modalities. Once the model generates tokens conditioned on some context, the tokenizers’ decoders convert those tokens back into viewable representations.
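The encode, concatenate, and decode flow described above can be illustrated with a small toy sketch. This is not VideoPoet’s code and does not reproduce MAGVIT-v2 or SoundStream. It assumes a simple nearest-neighbour codebook as a stand-in tokenizer, and the class name `ToyTokenizer`, the hand-written codebooks, and the vocabulary offsets are all hypothetical, chosen only to show how separate modalities might share one discrete token sequence.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass
class ToyTokenizer:
    """Nearest-neighbour codebook quantizer (illustrative stand-in only)."""
    codebook: List[Vector]  # one vector per token id
    offset: int = 0         # start of this modality's range in the shared vocabulary

    def encode(self, features: Sequence[Vector]) -> List[int]:
        # Map each feature vector to the id of its closest codebook entry.
        tokens = []
        for f in features:
            dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in self.codebook]
            tokens.append(self.offset + dists.index(min(dists)))
        return tokens

    def decode(self, tokens: Sequence[int]) -> List[Vector]:
        # Look each token id back up in the codebook (lossy reconstruction).
        return [self.codebook[t - self.offset] for t in tokens]


# Hypothetical 2-D codebooks; real tokenizers learn theirs from data.
video_tok = ToyTokenizer(
    codebook=[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], offset=0
)
audio_tok = ToyTokenizer(codebook=[(-1.0, -1.0), (2.0, 2.0)], offset=4)

video_features = [(0.1, 0.9), (0.8, 0.2)]
audio_features = [(1.7, 2.2)]

video_tokens = video_tok.encode(video_features)
audio_tokens = audio_tok.encode(audio_features)

# One flat sequence over a shared vocabulary, the kind of input an
# autoregressive language model would be trained on.
sequence = video_tokens + audio_tokens

# Split the sequence by vocabulary range and decode each part
# with the tokenizer that owns that range.
decoded_video = video_tok.decode([t for t in sequence if t < audio_tok.offset])
decoded_audio = audio_tok.decode([t for t in sequence if t >= audio_tok.offset])

print(sequence)
print(decoded_video, decoded_audio)
```

Giving each modality its own range of token ids is one simple way to let a single model read and emit a mixed sequence. Decoding is lossy, since each token maps back to a codebook entry rather than to the exact input.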
Three weeks ago, Jiang Lu announced on a professional social platform that it was his last day at Google, expressing pride in the video generation projects he worked on there (VideoPoet, MAGVIT, WALT, and others). He said he would stay in the Bay Area and embark on a new journey in video generation. That new journey now appears to lead to TikTok.
ByteDance has also been investing broadly in large models, launching its self-developed “Yunque Large Model” at the model layer along with products such as Doubao, a ChatGPT-like chatbot. At the end of 2023, ByteDance established a new AI application department called Flow and introduced multiple products overseas, including Coze. Recently, Zhang Nan stepped down as CEO of the Douyin Group to focus on developing the AI tool Jianying.
ByteDance recently refuted rumors about the launch of a Chinese version of Sora, stating, “The product is not yet fully developed and there is a significant gap from foreign models.”
At the same time, ByteDance is evidently recruiting talent actively.