GLM (General Language Model) is a general-purpose language model released by Tsinghua University. It is pretrained with an autoregressive blank-infilling objective and can be fine-tuned for a variety of natural language understanding and generation tasks.

GLM improves on blank-infilling pretraining by adding 2D positional encodings and allowing spans to be predicted in arbitrary order, which yields better performance than BERT and T5 on NLU tasks. At the same time, GLM can be pretrained for different types of tasks by varying the number and length of the blanks. Across a wide range of tasks covering NLU, conditional generation, and unconditional generation, GLM outperforms BERT, T5, and GPT given the same model size and data, and a single pretrained model with 1.25x the parameters of BERT-Large achieves the best performance on all of them, demonstrating its generalizability to different downstream tasks. A sketch of the blank-infilling setup follows below.
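
To make the objective concrete, here is a minimal, self-contained sketch (not the authors' implementation) of how a blank-infilling example and its 2D position ids can be constructed. The helper name `build_blank_infilling_example` and the markers `[MASK]` and `[S]` are illustrative placeholders for the special tokens the real tokenizer would use.

```python
# Sketch of GLM-style autoregressive blank infilling with 2D positional encoding.
# In real pretraining the Part B spans are also shuffled and an attention mask
# lets Part A attend bidirectionally while Part B is generated autoregressively.

def build_blank_infilling_example(tokens, spans):
    """tokens: list of str; spans: list of (start, end) index pairs to mask."""
    # Part A: the corrupted text, with each span collapsed into a single [MASK].
    part_a, mask_positions = [], []
    i = 0
    for start, end in sorted(spans):
        part_a.extend(tokens[i:start])
        mask_positions.append(len(part_a))   # where this span's [MASK] sits
        part_a.append("[MASK]")
        i = end
    part_a.extend(tokens[i:])

    # Part B: the masked spans, each prefixed with a start marker [S].
    part_b, pos_b1, pos_b2 = [], [], []
    for (start, end), mask_pos in zip(sorted(spans), mask_positions):
        span = ["[S]"] + tokens[start:end]
        part_b.extend(span)
        pos_b1.extend([mask_pos] * len(span))       # 1st id: position of the [MASK] in Part A
        pos_b2.extend(range(1, len(span) + 1))      # 2nd id: intra-span position

    # Part A tokens use their own index as the 1st id and 0 as the 2nd id.
    position_ids = list(range(len(part_a))) + pos_b1
    block_position_ids = [0] * len(part_a) + pos_b2
    return part_a + part_b, position_ids, block_position_ids


tokens = "the quick brown fox jumps over the lazy dog".split()
seq, pos, block_pos = build_blank_infilling_example(tokens, [(1, 3), (5, 6)])
print(seq)        # Part A with [MASK]s, followed by the spans to be generated
print(pos)        # 2D positional encoding, dimension 1
print(block_pos)  # 2D positional encoding, dimension 2
```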

For a detailed description of GLM, please refer to the paper "GLM: General Language Model Pretraining with Autoregressive Blank Infilling" (ACL 2022).

ChatGLM-6B is built on the GLM framework and optimized for Chinese question answering and dialogue.
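
A minimal usage sketch, assuming the THUDM/chatglm-6b checkpoint published on Hugging Face, the transformers library, and a CUDA GPU with enough memory for the FP16 weights (roughly 13 GB):

```python
# Sketch: chatting with ChatGLM-6B via Hugging Face transformers.
# Assumes the THUDM/chatglm-6b checkpoint; quantized variants can be used
# on smaller GPUs instead of .half().cuda().
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# The checkpoint's custom modeling code exposes a chat() helper that keeps dialogue history.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```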

Pretrained Models

The pretrained models used in the paper can be downloaded from OneDrive or Tsinghua-Cloud.

| Name | Params | Language | Corpus | Objective | File | Config |
|------|--------|----------|--------|-----------|------|--------|
| GLM-Base | 110M | English | Wiki+Book | Token | glm-base-blank.tar.bz2 | model_blocklm_base.sh |
| GLM-Large | 335M | English | Wiki+Book | Token | glm-large-blank.tar.bz2 | model_blocklm_large.sh |
| GLM-Large-Chinese | 335M | Chinese | WuDao Corpora | Token+Sent+Doc | glm-large-chinese.tar.bz2 | model_blocklm_large_chinese.sh |
| GLM-Doc | 335M | English | Wiki+Book | Token+Doc | glm-large-generation.tar.bz2 | model_blocklm_large_generation.sh |
| GLM-410M | 410M | English | Wiki+Book | Token+Doc | glm-1.25-generation.tar.bz2 | model_blocklm_1.25_generation.sh |
| GLM-515M | 515M | English | Wiki+Book | Token+Doc | glm-1.5-generation.tar.bz2 | model_blocklm_1.5_generation.sh |
| GLM-RoBERTa | 335M | English | RoBERTa | Token | glm-roberta-large-blank.tar.bz2 | model_blocklm_roberta_large.sh |
| GLM-2B | 2B | English | Pile | Token+Sent+Doc | glm-2b.tar.bz2 | model_blocklm_2B.sh |
| GLM-10B | 10B | English | Pile | Token+Sent+Doc | download | model_blocklm_10B.sh |
| GLM-10B-Chinese | 10B | Chinese | WuDao Corpora | Token+Sent+Doc | download | model_blocklm_10B_chinese.sh |

Unzip the downloaded file into a local folder and set CHECKPOINT_PATH in the corresponding script to that folder's path. A sketch of this step follows below.
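
As an illustration, a minimal Python sketch of the unpacking step; the archive name and target directory below are placeholders, and CHECKPOINT_PATH itself is set by editing the corresponding shell script as described above.

```python
# Sketch: unpack a downloaded GLM checkpoint archive.
# The file name and directory are examples; substitute your own paths.
import tarfile

archive = "glm-large-blank.tar.bz2"       # e.g. the GLM-Large checkpoint
checkpoint_dir = "/data/glm-checkpoints"  # folder that CHECKPOINT_PATH should point to

with tarfile.open(archive, "r:bz2") as tar:
    tar.extractall(checkpoint_dir)

# Then edit the corresponding script so that, for example:
#   CHECKPOINT_PATH=/data/glm-checkpoints
```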

Results

SuperGLUE

Validation set, single model, single-task fine-tuning

| Model | COPA | WSC | RTE | WiC | CB | MultiRC | BoolQ | ReCoRD |
|-------|------|-----|-----|-----|----|---------|-------|--------|
| GLM-10B | 98.0 | 95.2 | 93.1 | 75.7 | 98.7/98.2 | 88.1/63.3 | 88.7 | 94.4/94.0 |
| DeBERTa-XXLarge-v2 | 97.0 | – | 93.5 | – | – | 87.8/63.6 | 88.3 | 94.1/93.7 |

Seq2Seq

CNN/Daily Mail (test set, no extra data used)

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|-------|---------|---------|---------|
| GLM-10B | 44.7 | 21.4 | 41.4 |
| T5-11B | 43.5 | 21.6 | 40.7 |
| PEGASUS-Large | 44.2 | 21.5 | 41.4 |
| BART-Large | 44.2 | 21.3 | 40.9 |

XSum (test set, no extra data used)

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|-------|---------|---------|---------|
| GLM-10B | 48.9 | 25.7 | 40.4 |
| PEGASUS-Large | 47.2 | 24.6 | 39.3 |
| BART-Large | 45.1 | 22.3 | 37.3 |

Language Modeling

Test set, zero-shot

| Model | LAMBADA (accuracy) | Wikitext103 (perplexity) |
|-------|--------------------|--------------------------|
| GLM-10B (bi) | 72.35 | 11.33 |
| GLM-10B (uni) | 67.18 | 12.22 |
| GPT-2 | 52.66 | 17.48 |
| Megatron-LM (8.3B) | 66.51 | 10.81 |
| Turing-NLG | 67.98 | 10.21 |
