GLM (General Language Model) is a general-purpose language model released by Tsinghua University, pre-trained with an autoregressive blank-infilling objective. It can be fine-tuned for a variety of natural language understanding and generation tasks.

GLM improves on blank-infilling pre-training by adding 2D positional encodings and allowing spans to be predicted in arbitrary order, which yields better performance than BERT and T5 on NLU tasks. By varying the number and length of blanks, GLM can also be pre-trained for different types of tasks. Given the same model size and data, GLM outperforms BERT, T5, and GPT on a wide range of tasks spanning NLU, conditional generation, and unconditional generation, and a single pretrained model with 1.25x the parameters of BERT-Large achieves the best performance across all of them, demonstrating its generalizability to different downstream tasks.
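The blank-infilling corruption described above can be sketched as follows. This is an illustrative toy, not the official implementation: span positions and the `[MASK]`/`[S]`/`[E]` token names follow the paper's description, but GLM additionally shuffles the span order and adds a second positional channel indexing tokens within each span (the 2D positional encoding), both omitted here.

```python
def blank_infill(tokens, spans):
    """Toy sketch of GLM-style autoregressive blank infilling.

    `spans` is a list of (start, length) pairs. Each sampled span is
    replaced by a single [MASK] token in the corrupted input (Part A);
    the removed spans become autoregressive generation targets (Part B),
    each wrapped in [S]...[E]. GLM predicts Part B conditioned on Part A.
    """
    span_at = dict(spans)  # start index -> span length
    corrupted, targets = [], []
    i = 0
    while i < len(tokens):
        if i in span_at:
            length = span_at[i]
            corrupted.append("[MASK]")
            targets.append(["[S]"] + tokens[i:i + length] + ["[E]"])
            i += length
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, targets
```

For example, masking spans (2, 2) and (5, 1) of the sequence `x1 ... x6` produces Part A `["x1", "x2", "[MASK]", "x5", "[MASK]"]` and Part B targets `[S] x3 x4 [E]` and `[S] x6 [E]`.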

For a detailed description of GLM, please refer to the paper "GLM: General Language Model Pretraining with Autoregressive Blank Infilling" (ACL 2022).

ChatGLM-6B is a model built on the GLM framework and optimized for Chinese question answering and dialogue.

Pre-trained Models

The pretrained models used in the paper can be downloaded from OneDrive or Tsinghua-Cloud.

| Name | Params | Language | Corpus | Objective | File | Config |
|---|---|---|---|---|---|---|
| GLM-Base | 110M | English | Wiki+Book | Token | glm-base-blank.tar.bz2 | |
| GLM-Large | 335M | English | Wiki+Book | Token | glm-large-blank.tar.bz2 | |
| GLM-Large-Chinese | 335M | Chinese | WuDao Corpora | Token+Sent+Doc | glm-large-chinese.tar.bz2 | |
| GLM-Doc | 335M | English | Wiki+Book | Token+Doc | glm-large-generation.tar.bz2 | |
| GLM-410M | 410M | English | Wiki+Book | Token+Doc | glm-1.25-generation.tar.bz2 | |
| GLM-515M | 515M | English | Wiki+Book | Token+Doc | glm-1.5-generation.tar.bz2 | |
| GLM-RoBERTa | 335M | English | RoBERTa | Token | glm-roberta-large-blank.tar.bz2 | |
| GLM-2B | 2B | English | Pile | Token+Sent+Doc | glm-2b.tar.bz2 | |
| GLM-10B | 10B | English | Pile | Token+Sent+Doc | download | |
| GLM-10B-Chinese | 10B | Chinese | WuDao Corpora | Token+Sent+Doc | download | |

Unzip the downloaded file into a local folder and set CHECKPOINT_PATH in the corresponding script to that folder's path.



SuperGLUE (validation set, single model, single-task fine-tuning)

| Model | COPA | WSC | RTE | WiC | CB | MultiRC | BoolQ | ReCoRD |
|---|---|---|---|---|---|---|---|---|
| GLM-10B | 98.0 | 95.2 | 93.1 | 75.7 | 98.7/98.2 | 88.1/63.3 | 88.7 | 94.4/94.0 |
| DeBERTa-XXLarge-v2 | 97.0 | — | 93.5 | — | — | 87.8/63.6 | 88.3 | 94.1/93.7 |


CNN/Daily Mail (test set, no extra data used)

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| GLM-10B | 44.7 | 21.4 | 41.4 |
| T5-11B | 43.5 | 21.6 | 40.7 |
| PEGASUS-Large | 44.2 | 21.5 | 41.4 |
| BART-Large | 44.2 | 21.3 | 40.9 |

XSum (test set, no extra data used)

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| GLM-10B | 48.9 | 25.7 | 40.4 |
| PEGASUS-Large | 47.2 | 24.6 | 39.3 |
| BART-Large | 45.1 | 22.3 | 37.3 |

Language Modeling

test set, zero-shot

| Model | LAMBADA (accuracy) | WikiText-103 (perplexity) |
|---|---|---|
| GLM-10B (bi) | 72.35 | 11.33 |
| GLM-10B (uni) | 67.18 | 12.22 |
| GPT-2 | 52.66 | 17.48 |
| Megatron-LM (8.3B) | 66.51 | 10.81 |
| Turing-NLG | 67.98 | 10.21 |
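The two metrics reported above can be computed from per-example model outputs. A minimal sketch, assuming you already have per-token log-probabilities (for perplexity) and predicted final words (for LAMBADA accuracy); these helper names are illustrative, not from the GLM codebase.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood over the test
    tokens, as reported for WikiText-103 (lower is better)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def lambada_accuracy(predicted_words, gold_words):
    """LAMBADA accuracy: fraction of passages whose final word the model
    predicts exactly from the preceding context (higher is better)."""
    correct = sum(p == g for p, g in zip(predicted_words, gold_words))
    return correct / len(gold_words)
```

For instance, a model that assigns probability 0.5 to every token has perplexity `exp(-mean(log 0.5)) = 2.0`.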

