SuperCLUE is a comprehensive benchmark for evaluating general-purpose Chinese large models.

The main question it addresses is how well Chinese large models perform now that general-purpose large models are developing rapidly. This includes, but is not limited to:

  • How well these models perform on different tasks
  • How far they have come compared with internationally representative models
  • How they compare with humans

To answer these questions, it evaluates a range of representative Chinese and international models across multiple capability dimensions. SuperCLUE extends the Chinese Language Understanding Evaluation benchmark (CLUE) for the era of general artificial intelligence.

Composition and Features of SuperCLUE

SuperCLUE is designed to evaluate large models comprehensively: it measures their overall performance and also examines how well they understand, and have accumulated knowledge for, tasks unique to Chinese. It evaluates models along three dimensions: basic abilities, professional abilities, and Chinese-specific abilities.

Basic abilities cover 10 common, representative model capabilities, such as semantic understanding, dialogue, logical reasoning, role-playing, coding, and generation and creative writing.

Professional abilities draw on middle school, university, and professional examinations, covering more than 50 abilities in subjects ranging from mathematics, physics, and geography to the social sciences.

Chinese-specific abilities cover 10 kinds of tasks with Chinese characteristics, such as Chinese idioms, poetry, literature, and character forms.

SuperCLUE features

  • Multi-dimensional capability evaluation (70+ sub-abilities in 3 categories)

Chinese large models are tested from three different angles to examine their overall ability, and each category contains ten or more distinct sub-abilities.

  • Automated evaluation (one-click evaluation)

Automated evaluation methods measure the performance of different models in a relatively objective way, so a large model can be evaluated with a single click.

  • Wide range of representative models (9 models)

Several representative, publicly available models from China and abroad were selected for evaluation, both to reflect the current state of Chinese large models and to identify their gaps, relative strengths, and weaknesses compared with leading international models.

Given the progress toward general artificial intelligence, SuperCLUE also provides metrics comparing model performance with human performance.
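The automated, relatively objective scoring described in the features above can be illustrated with a minimal Python sketch. It assumes each test item is a multiple-choice question with a single correct option letter, and that a "model" is any function mapping a prompt string to a reply string. The `Item` layout, the category labels, the answer-extraction regex, and the macro-averaged overall score are all hypothetical choices made for illustration; they are not SuperCLUE's actual data format or tooling.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Item:
    # Hypothetical item format for illustration, not SuperCLUE's real schema.
    category: str  # e.g. "basic", "professional", "chinese"
    prompt: str    # question text including options A-D
    answer: str    # correct option letter


def extract_choice(reply: str) -> Optional[str]:
    """Naively take the first standalone capital letter A-D in the reply."""
    match = re.search(r"\b([A-D])\b", reply)
    return match.group(1) if match else None


def evaluate(model: Callable[[str], str],
             items: List[Item]) -> Tuple[Dict[str, float], float]:
    """Score a model by exact match on option letters, grouped by category."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for item in items:
        total[item.category] += 1
        if extract_choice(model(item.prompt)) == item.answer:
            correct[item.category] += 1
    # Accuracy (in percent) for each category.
    per_category = {c: 100.0 * correct[c] / total[c] for c in total}
    # Hypothetical overall score: unweighted mean of category accuracies.
    overall = sum(per_category.values()) / len(per_category)
    return per_category, overall


# Usage with a stand-in "model" that always answers B.
items = [
    Item("basic", "1 + 1 = ? A. 1  B. 2  C. 3  D. 4", "B"),
    Item("basic", "Which is prime? A. 4  B. 6  C. 7  D. 9", "C"),
    Item("chinese", "Complete the idiom 画蛇添_: A. 足  B. 手  C. 头  D. 尾", "A"),
]


def always_b(prompt: str) -> str:
    return "The answer is B."


per_category, overall = evaluate(always_b, items)
print(per_category, overall)  # {'basic': 50.0, 'chinese': 0.0} 25.0
```

Exact-match scoring on option letters is what keeps this kind of evaluation objective and fully automatic. A real harness would need a more robust answer parser: this regex would, for example, misread a reply that begins with the English article "A".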

