
Wu Dao 2.0 – Bigger, Stronger, Faster AI From China

It is no secret that China has COVID-19 under control. When you travel there you need to go through a 2-week hotel quarantine but once you are in the country, you are safe. Probably even safer than before COVID as wearing a mask is now part of the etiquette, and the many other viral respiratory diseases are likely to be on the decline. Hence, when I got invited to speak at the annual conference of the Beijing Academy of Artificial Intelligence (BAAI) in the AI for healthcare section, I readily accepted.

The BAAI is a great platform for showcasing technology and talent across broad categories. The non-profit institute encourages scientists to tackle problems and promote breakthroughs in AI theories, tools, systems and applications. In addition, the BAAI has a unique focus on long-term research on AI technology.

AI is big in China. So big that over 70,000 people registered for the event, and many more tuned in to watch the BAAI presentations afterwards. Many of these presented very novel approaches, algorithms, systems, and applications. However, the real hit at the BAAI was Wu Dao 2.0, a system that surpassed OpenAI’s GPT-3 in many ways.

The Encyclopedia Britannica defines language as a ‘system of conventional spoken, manual, or written symbols by means of which human beings, as members of a social group and participants in its culture, express themselves.’ We can conclude from this definition that language is an integral part of human connection. Not only does it allow us to share ideas, thoughts and feelings with each other, language also allows us to create and build societies and empires. In simple words: language makes us human.

According to Professor Gareth Gaskell of the Department of Psychology at the University of York, the average 20-year-old knows between 27,000 and 52,000 different words. By age 60, that number averages between 35,000 and 56,000. Therefore, when we use words in a conversation, the brain has to make a quick decision regarding which words to use and in what sequence. In this context, the brain works as a processor that can do multiple things at the same time.

Language scientists suggest that each word we know is represented by a separate processing unit that has one job: to assess the likelihood of incoming speech matching that particular word. In the brain, the processing unit that represents a word is thought to correspond to a pattern of activity across a group of neurons. So when we hear the beginning of a word, several thousand such units become active because there are many possible matches.

Most people can comprehend up to about eight syllables per second. However, the goal is not to recognize the word but to access its stored meaning. The brain accesses many possible meanings of the word before it has been fully identified. Studies show that upon hearing a word fragment like ‘cap,’ listeners start to register multiple possible meanings like ‘captain’ or ‘capital’ before the full word emerges.
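The narrowing of candidates described above can be illustrated with a few lines of Python. This is only a rough sketch: the tiny vocabulary and the function name `active_candidates` are made up for illustration, and written letters stand in for speech sounds, which is a big simplification of what the brain does.

```python
# Toy vocabulary; a hypothetical stand-in for a listener's mental lexicon.
VOCABULARY = ["cap", "captain", "capital", "capture", "cat", "dog"]


def active_candidates(fragment, vocabulary=VOCABULARY):
    """Return every word still consistent with the fragment heard so far."""
    return [word for word in vocabulary if word.startswith(fragment)]


# As more of the word arrives, fewer candidates stay "active".
for heard in ["c", "ca", "cap", "capt", "capta"]:
    print(heard, "->", active_candidates(heard))
```

With the fragment ‘cap’, four candidates are still in play; by ‘capta’, only ‘captain’ remains.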

Like most things touched by artificial intelligence in the 21st century, language is also evolving to take different shapes and meanings. Recently the concept of ‘language models’ has taken center stage in AI. In essence, a language model analyzes large amounts of text and uses statistical and probabilistic techniques to estimate how likely a given sequence of words is. Language models are commonly used in natural language processing applications that generate text as output, such as machine translation and question answering.
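To make the idea of "the probability of a sequence of words" concrete, here is a minimal sketch of one of the simplest language models, a bigram model, which estimates each word's probability from the word just before it. The three-sentence corpus, the `<s>` and `</s>` boundary markers and the function names are assumptions chosen for illustration; models such as GPT-3 use neural networks and far more data, not raw counts.

```python
from collections import Counter

# Tiny hypothetical corpus; real models are estimated from vastly more text.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat saw the dog",
]


def train_bigram_counts(sentences):
    """Count words and adjacent word pairs, with start and end markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sequence_probability(sentence, unigrams, bigrams):
    """Multiply P(word | previous word) across the sentence (no smoothing)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for previous, current in zip(tokens, tokens[1:]):
        if unigrams[previous] == 0:
            return 0.0  # the previous word never appeared in the corpus
        probability *= bigrams[(previous, current)] / unigrams[previous]
    return probability


unigrams, bigrams = train_bigram_counts(CORPUS)
print(sequence_probability("the cat sat on the mat", unigrams, bigrams))
print(sequence_probability("mat the on sat cat the", unigrams, bigrams))
```

The sentence in natural word order gets a small but non-zero probability, while the scrambled version gets zero because the model never saw a sentence start with ‘mat’.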

When Microsoft revealed its language model Turing-NLG in February 2020, it was hailed as the largest model ever published and one that outperformed other models on various language modeling benchmarks. At its release, Turing-NLG had 17 billion parameters and could generate words to complete open-ended textual tasks. The model was also able to generate direct answers to questions and summaries of input documents.

In May of the same year, OpenAI unveiled its own autoregressive language model, Generative Pre-trained Transformer 3 (GPT-3), which uses deep learning to create human-like text. This third-generation language model in the GPT-n series has 175 billion machine learning parameters. OpenAI researchers released a paper in which they demonstrated that GPT-3 can generate news articles that human evaluators have difficulty distinguishing from articles written by humans. These researchers also claim that, once trained, the model can generate 100 pages of content for only a few cents in energy costs.
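"Autoregressive" simply means that the model writes one word at a time and feeds what it has written so far back in to pick the next word. The sketch below shows that loop in its barest form. Everything in it is hypothetical: the hand-written `NEXT_WORD` table stands in for a trained neural network, and it looks only at the last word, whereas a model like GPT-3 conditions on the whole preceding text.

```python
# Hypothetical next-word probabilities; a real model computes these
# with a neural network instead of a hand-written table.
NEXT_WORD = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"model": 0.6, "text": 0.4},
    "model": {"writes": 0.7, "</s>": 0.3},
    "writes": {"text": 0.8, "the": 0.2},
    "text": {"</s>": 1.0},
}


def generate(max_words=10):
    """Greedy autoregressive generation: always take the likeliest next word."""
    words = ["<s>"]
    while len(words) <= max_words:
        choices = NEXT_WORD[words[-1]]
        next_word = max(choices, key=choices.get)
        if next_word == "</s>":  # the end marker stops generation
            break
        words.append(next_word)
    return " ".join(words[1:])  # drop the start marker


print(generate())
```

Always picking the likeliest word is the simplest possible strategy; systems that generate text in practice usually sample from the probabilities so the output is less repetitive.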

GPT-3 was deemed so powerful that Microsoft licensed exclusive use of the language model and its underlying code.

Just a year later, however, another language model overtook both GPT-3 and Turing-NLG in terms of innovation and ingenuity.

This model, called Wu Dao 2.0, was showcased at the BAAI. The work behind Wu Dao 2.0, which has been dubbed China’s first homegrown super-scale intelligent model system, was led by BAAI Research Academic Vice President and Tsinghua University Professor Tang Jie. He was supported by a team of over 100 AI scientists from Peking University, Tsinghua University, Renmin University of China, the Chinese Academy of Sciences and other institutions.

Wu Dao 2.0 is actually the successor to Wu Dao 1.0, which was unveiled by the BAAI earlier this year. Wu Dao 2.0 truly is China’s bigger and better answer to GPT-3.

Firstly, unlike GPT-3, Wu Dao 2.0 works in both Chinese and English, with skills acquired by analyzing 4.9 terabytes of images and texts. Wu Dao 2.0 also has partnership agreements with 22 brands, including smartphone maker Xiaomi and video app Kuaishou. The Chinese model has 1.75 trillion parameters, 10 times as many as GPT-3’s 175 billion.

Wu Dao 2.0 can also write poems in traditional Chinese styles, answer questions, write essays, and write captions for images. Additionally, this language model either reached or surpassed state-of-the-art (SOTA) levels on nine benchmarks, as reported by BAAI. These include:

1-   ImageNet (zero-shot): SOTA, surpassing OpenAI CLIP.

2-   LAMA (factual and commonsense knowledge): Surpassed AutoPrompt.

3-   LAMBADA (cloze tasks): Surpassed Microsoft Turing NLG.

4-   SuperGLUE (few-shot): SOTA, surpassing OpenAI GPT-3.

5-   UC Merced Land Use (zero-shot): SOTA, surpassing OpenAI CLIP.

6-   MS COCO (text-to-image generation): Surpassed OpenAI DALL·E.

7-   MS COCO (English image-text retrieval): Surpassed OpenAI CLIP and Google ALIGN.

8-   MS COCO (multilingual image-text retrieval): Surpassed UC (best multilingual and multimodal pre-trained model).

9-   Multi 30K (multilingual image-text retrieval): Surpassed UC.

Lastly, Wu Dao 2.0 came with Hua Zhibing, billed as the world’s first Chinese virtual student. Hua can learn, draw pictures and compose poetry. In the future, she is expected to learn coding. This ability to keep learning stands in stark contrast to GPT-3.

Other details of exactly how Wu Dao 2.0 was trained, and on what data, are not available yet, which makes it difficult to compare it with GPT-3 directly. However, the new language model is a testament to China’s AI ambitions and its superb research programs. There is no doubt that AI innovation will increase in the coming years, and many of these developments will help advance other industries.

Dr. Kai-Fu Lee, an AI luminary and investor who helped build at least seven AI-powered unicorns, recently gave a talk at the Hong Kong Science and Technology Park in which he explained the power of transformers and of fine-tuning massive pre-trained models such as Wu Dao 2.0. These models can be fine-tuned for multiple industries and a large number of applications in education, finance, law, entertainment and, most importantly, healthcare and biomedical research.

The application of transformers to biomedical research is likely to yield new discoveries that will benefit humans regardless of where they live. And we sincerely hope that, despite the trade wars, governments will consider collaborating on biomedical research.

What do you think?
