Moonshot AI’s Kimi Supports 2 Million Characters Input, Plans to Release Multimodal Product This Year

Mar 20, 2024, 16:01 Pandaily

On March 18, Moonshot AI announced that its conversational AI assistant, Kimi Intelligent Assistant, now supports lossless context input of up to 2 million characters. When it was released in October last year, Kimi supported a lossless context length of 200,000 characters.

At the same time, Moonshot AI has given Kimi access to more data sources. According to Xu Xinran, Vice President of Moonshot AI, when faced with a question, Kimi will search in several different directions and base its answer on the results. Response speed has also improved: Xu said that, thanks to infrastructure-level optimization, Kimi’s generation speed has tripled compared with October last year.

At present, Kimi is available as a web version, on Android and iOS, and as a mini program. According to SimilarWeb data, the web version received 2.919 million visits in February this year, up 104.99% from the previous month.

The context window has always been a key focus of competition among major model companies.

In application scenarios such as long-document question answering and long-text summarization, the context window is particularly important. In an interview with Tencent Technology, Yang Zhilin, CEO of Moonshot AI, once described large models as computers and ‘long context’ as the computer’s memory. In his view, this makes the new computing paradigm a more universal foundation.

There are several technical approaches to extending context length, such as retrieval-augmented generation (RAG) and sliding-window methods. However, these methods often bring side effects, such as a drop in the model’s intelligence and higher cost, as the context length grows.

As for Moonshot AI’s own solution, Yang Zhilin previously stated that it mainly involves two aspects: innovating on network structure and optimizing engineering. Xu Xinran added at the briefing that achieving a lossless increase in context length requires coordinated work across data, infrastructure, model training, and product. For this release, the team redesigned and rebuilt the system natively, from model pre-training through alignment to inference.

As context length increases, the usage scenarios for Kimi have also expanded. In traditional scenarios such as reading papers and analyzing financial reports, it can meet user needs more accurately. New usage scenarios have also emerged, such as acting as a game master in tabletop role-playing games (TRPGs).

It is worth mentioning that increasing context length also poses challenges for model evaluation. In the past, the context length of a large model product was evaluated with a “needle in a haystack” test: an unrelated sentence was hidden in a large amount of text, and a natural-language question (the prompt) was then used to see whether the AI could accurately extract the hidden sentence.
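The “needle in a haystack” procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Moonshot AI’s evaluation code: the function names, the filler text, the passphrase, and the stand-in model are all assumptions made for the example. The stand-in model simply echoes the last few thousand characters it was given, which mimics a system that loses text placed early in a long input.

```python
from typing import Callable


def build_haystack(filler: list[str], needle: str, depth: float) -> str:
    """Hide `needle` among the filler sentences at a relative depth in [0, 1]."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    position = round(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])


def run_trial(ask_model: Callable[[str], str], filler: list[str], needle: str,
              question: str, expected: str, depth: float) -> bool:
    """Return True if the model's answer contains the expected fact."""
    context = build_haystack(filler, needle, depth)
    prompt = f"{context}\n\nQuestion: {question}\nAnswer using only the text above."
    return expected.lower() in ask_model(prompt).lower()


def sweep(ask_model: Callable[[str], str], filler: list[str], needle: str,
          question: str, expected: str, depths: list[float]) -> dict[float, bool]:
    """Repeat the trial with the needle hidden at several depths."""
    return {d: run_trial(ask_model, filler, needle, question, expected, d)
            for d in depths}


def stub_model(prompt: str, window: int = 3000) -> str:
    """Hypothetical stand-in for a real model: it only 'sees' the last
    `window` characters of the prompt and echoes them back."""
    return prompt[-window:]


if __name__ == "__main__":
    filler = [f"Filler sentence number {i} about nothing in particular."
              for i in range(200)]
    results = sweep(stub_model, filler,
                    needle="The secret passphrase is blue-harbor-42.",
                    question="What is the secret passphrase?",
                    expected="blue-harbor-42",
                    depths=[0.0, 0.5, 1.0])
    for depth, found in results.items():
        print(f"depth={depth:.1f} found={found}")
```

In a real evaluation, `stub_model` would be replaced by a call to the model under test, and the sweep would typically cover many context lengths as well as many depths.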

But as the industry trains models specifically against particular metrics, the needle-in-a-haystack method is gradually losing its original value as a reference. As the context length of large models increases further, the dimensions of evaluation will also become more diverse. Xu Xinran said frankly that this is still an open question in academia.

Although Moonshot AI has made progress in long-text processing, the development of other AI companies in technologies such as natural language understanding and multimodal interaction cannot be ignored. The video generation capability demonstrated by Sora has made the Diffusion Transformer architecture (DiT) a preliminary consensus within the industry, with companies like Shengshu Technology and AIsphere stating their intention to catch up with Sora within this year.

Moonshot AI did not disclose its progress on multimodal work at the briefing. Zhou Xinyu, co-founder of Moonshot AI, stated that the company had been conducting multimodal research and development before Sora’s release. The work is currently progressing at its established pace, and related products are expected to be released later this year.

In less than a year since its establishment, Moonshot AI has completed two large financing rounds. After completing a round of over $1 billion earlier this year, the company’s valuation has reached $2.5 billion. However, its workforce is still only around 80 people. In response, Zhou Xinyu stated that talent density is more important than headcount and that future hiring will be based on demand. He said, ‘Every person we hire must raise the average level of the team.’
