GLM (General Language Model) is a general-purpose language model released by Tsinghua University, pre-trained with an autoregressive blank-infilling objective. It can be fine-tuned for a variety of natural language understanding and generation tasks.

GLM improves on blank-infilling pre-training by adding 2D positional encodings and allowing spans to be predicted in arbitrary order, which yields better performance than BERT and T5 on NLU tasks. By varying the number and length of blanks, GLM can also be pre-trained for different types of tasks. Given the same model size and data, GLM outperforms BERT, T5, and GPT on a wide range of tasks spanning NLU, conditional generation, and unconditional generation, and a single pretrained model with 1.25x the parameters of BERT-Large achieves the best performance across all of them, demonstrating its generalizability to different downstream tasks.
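The blank-infilling corruption described above can be sketched as follows. This is an illustrative toy, not the official implementation: span positions and the `[MASK]`/`[S]`/`[E]` token names follow the paper's description, but GLM additionally shuffles the span order and adds a second positional channel indexing tokens within each span (the 2D positional encoding), both omitted here.

```python
def blank_infill(tokens, spans):
    """Toy sketch of GLM-style autoregressive blank infilling.

    `spans` is a list of (start, length) pairs. Each sampled span is
    replaced by a single [MASK] token in the corrupted input (Part A);
    the removed spans become autoregressive generation targets (Part B),
    each wrapped in [S]...[E]. GLM predicts Part B conditioned on Part A.
    """
    span_at = dict(spans)  # start index -> span length
    corrupted, targets = [], []
    i = 0
    while i < len(tokens):
        if i in span_at:
            length = span_at[i]
            corrupted.append("[MASK]")
            targets.append(["[S]"] + tokens[i:i + length] + ["[E]"])
            i += length
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, targets
```

For example, masking spans (2, 2) and (5, 1) of the sequence `x1 ... x6` produces Part A `["x1", "x2", "[MASK]", "x5", "[MASK]"]` and Part B targets `[S] x3 x4 [E]` and `[S] x6 [E]`.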

For a detailed description of GLM, please refer to the paper "GLM: General Language Model Pretraining with Autoregressive Blank Infilling" (ACL 2022).

ChatGLM-6B is a model built on the GLM framework and optimized for Chinese question answering and dialogue.

Pre-trained Models

The pretrained models used in the paper can be downloaded from OneDrive or Tsinghua-Cloud.

| Name | Params | Language | Corpus | Objective | File | Config |
|---|---|---|---|---|---|---|
| GLM-Base | 110M | English | Wiki+Book | Token | glm-base-blank.tar.bz2 | |
| GLM-Large | 335M | English | Wiki+Book | Token | glm-large-blank.tar.bz2 | |
| GLM-Large-Chinese | 335M | Chinese | WuDao Corpora | Token+Sent+Doc | glm-large-chinese.tar.bz2 | |
| GLM-Doc | 335M | English | Wiki+Book | Token+Doc | glm-large-generation.tar.bz2 | |
| GLM-410M | 410M | English | Wiki+Book | Token+Doc | glm-1.25-generation.tar.bz2 | |
| GLM-515M | 515M | English | Wiki+Book | Token+Doc | glm-1.5-generation.tar.bz2 | |
| GLM-RoBERTa | 335M | English | RoBERTa | Token | glm-roberta-large-blank.tar.bz2 | |
| GLM-2B | 2B | English | Pile | Token+Sent+Doc | glm-2b.tar.bz2 | |
| GLM-10B | 10B | English | Pile | Token+Sent+Doc | download | |
| GLM-10B-Chinese | 10B | Chinese | WuDao Corpora | Token+Sent+Doc | download | |

Unzip the downloaded file into a local folder and set CHECKPOINT_PATH in the corresponding script to that folder's path.



SuperGLUE (validation set, single model, single-task fine-tuning)

| Model | COPA | WSC | RTE | WiC | CB | MultiRC | BoolQ | ReCoRD |
|---|---|---|---|---|---|---|---|---|
| GLM-10B | 98.0 | 95.2 | 93.1 | 75.7 | 98.7/98.2 | 88.1/63.3 | 88.7 | 94.4/94.0 |
| DeBERTa-XXLarge-v2 | 97.0 | — | 93.5 | — | — | 87.8/63.6 | 88.3 | 94.1/93.7 |


CNN/Daily Mail (test set, no extra data used)

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| GLM-10B | 44.7 | 21.4 | 41.4 |
| T5-11B | 43.5 | 21.6 | 40.7 |
| PEGASUS-Large | 44.2 | 21.5 | 41.4 |
| BART-Large | 44.2 | 21.3 | 40.9 |

XSum (test set, no extra data used)

| Model | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| GLM-10B | 48.9 | 25.7 | 40.4 |
| PEGASUS-Large | 47.2 | 24.6 | 39.3 |
| BART-Large | 45.1 | 22.3 | 37.3 |

Language Modeling

test set, zero-shot

| Model | LAMBADA (accuracy) | WikiText-103 (perplexity) |
|---|---|---|
| GLM-10B (bi) | 72.35 | 11.33 |
| GLM-10B (uni) | 67.18 | 12.22 |
| GPT-2 | 52.66 | 17.48 |
| Megatron-LM (8.3B) | 66.51 | 10.81 |
| Turing-NLG | 67.98 | 10.21 |
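The two metrics reported above can be computed from per-example model outputs. A minimal sketch, assuming you already have per-token log-probabilities (for perplexity) and predicted final words (for LAMBADA accuracy); these helper names are illustrative, not from the GLM codebase.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood over the test
    tokens, as reported for WikiText-103 (lower is better)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def lambada_accuracy(predicted_words, gold_words):
    """LAMBADA accuracy: fraction of passages whose final word the model
    predicts exactly from the preceding context (higher is better)."""
    correct = sum(p == g for p, g in zip(predicted_words, gold_words))
    return correct / len(gold_words)
```

For instance, a model that assigns probability 0.5 to every token has perplexity `exp(-mean(log 0.5)) = 2.0`.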

