Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series model)

In the field of natural language processing, pre-trained language models have become an essential piece of foundational technology. To further advance research and development in Chinese information processing, the Joint Laboratory of HIT and iFLYTEK Research (HFL) released BERT-wwm, a Chinese pre-trained model based on Whole Word Masking, together with a family of closely related models: BERT-wwm-ext, RoBERTa-wwm-ext, RoBERTa-wwm-ext-large, RBT3, and RBTL3.

Whole Word Masking (wwm), provisionally translated as 全词Mask or 整词Mask, is an upgrade to BERT released by Google on May 31, 2019, which changes the training-sample generation strategy used in the original pre-training stage. In short, the original WordPiece tokenizer splits a complete word into several subwords, and when training samples are generated these subwords are selected and masked independently at random. Under Whole Word Masking, if any WordPiece subword of a complete word is masked, the remaining subwords of that word are masked as well; that is, the whole word is masked.
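
The idea can be sketched in a few lines of Python. This is an illustrative simplification, not the repository's actual data pipeline: it groups WordPiece tokens (where a `##` prefix marks a continuation piece) into whole words, then masks every piece of each selected word together.

```python
import random

def group_wordpieces(tokens):
    """Group WordPiece tokens into whole words: a token starting with
    '##' continues the previous word."""
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1].append(tok)
        else:
            words.append([tok])
    return words

def whole_word_mask(tokens, mask_ratio=0.15, rng=None):
    """Whole Word Masking: if any piece of a word is chosen, every
    piece of that word becomes [MASK]."""
    rng = rng or random.Random(0)
    words = group_wordpieces(tokens)
    n_to_mask = max(1, int(round(len(words) * mask_ratio)))
    chosen = set(rng.sample(range(len(words)), n_to_mask))
    out = []
    for i, word in enumerate(words):
        out.extend(["[MASK]"] * len(word) if i in chosen else word)
    return out

tokens = ["the", "pro", "##ba", "##bility", "of", "the", "next", "word"]
masked = whole_word_mask(tokens, mask_ratio=0.3, rng=random.Random(42))
```

With the original strategy, `##bility` alone could be masked while `pro` and `##ba` stay visible; under whole-word masking the three pieces are always masked or kept as a unit.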

It should be noted that "mask" here refers to the generalized mask used in BERT pre-training (replace with [MASK]; keep the original token; or replace with a random token), not only the case where a token is replaced by the [MASK] label. For more detailed explanation and examples, see #4.
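
The generalized mask follows BERT's standard 80/10/10 replacement policy for a position already selected for prediction. The sketch below is illustrative (the toy `VOCAB` list is an assumption for the example, not the model's real vocabulary):

```python
import random

VOCAB = ["use", "a", "language", "model", "to", "predict", "the", "next", "word"]

def apply_mask_policy(token, rng, vocab=VOCAB):
    """BERT's generalized masking for a selected position:
    80% -> [MASK], 10% -> keep the original token, 10% -> random token."""
    r = rng.random()
    if r < 0.8:
        return "[MASK]"
    elif r < 0.9:
        return token
    else:
        return rng.choice(vocab)

rng = random.Random(0)
outputs = [apply_mask_policy("model", rng) for _ in range(1000)]
```

Over many draws, roughly 80% of the outputs are [MASK]; the rest are either the original token or a random vocabulary token, which is why whole-word masking must propagate whichever action was chosen to every subword of the word.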

Similarly, because Google's official BERT-base, Chinese tokenizes Chinese at character granularity, it does not take into account Chinese word segmentation (CWS), a staple of traditional NLP. HFL applied the whole-word masking method to Chinese, training on Chinese Wikipedia (both simplified and traditional) and using HIT's LTP as the word segmentation tool, so that all Chinese characters belonging to the same word are masked together.
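
For Chinese, the grouping step therefore comes from a word segmenter rather than from `##` prefixes. The sketch below hardcodes a segmentation of the kind LTP might produce (the real pipeline would call the segmenter); BERT still sees one token per character, but masking decisions are made per CWS word:

```python
import random

def chinese_whole_word_mask(words, mask_ratio=0.15, rng=None):
    """Whole-word masking at character granularity: BERT tokenizes
    Chinese per character, but masking is decided per segmented word,
    so every character of a chosen word is masked together."""
    rng = rng or random.Random(0)
    n = max(1, int(round(len(words) * mask_ratio)))
    chosen = set(rng.sample(range(len(words)), n))
    tokens = []
    for i, word in enumerate(words):
        for ch in word:
            tokens.append("[MASK]" if i in chosen else ch)
    return tokens

# Hardcoded stand-in for LTP's segmentation output.
words = ["使用", "语言", "模型", "来", "预测", "下一个", "词", "的", "概率"]
tokens = chinese_whole_word_mask(words, mask_ratio=0.3, rng=random.Random(1))
```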

The table below shows samples generated with whole-word masking. Note: for ease of understanding, these examples only consider the case where tokens are replaced by the [MASK] label.

| Description | Sample |
| --- | --- |
| Original text | 使用语言模型来预测下一个词的probability。(Use a language model to predict the probability of the next word.) |
| Segmented text | 使用 语言 模型 来 预测 下一个 词 的 probability 。 |
| Original mask input | 使 用 语 言 [MASK] 型 来 [MASK] 测 下 一 个 词 的 pro [MASK] ##lity 。 |
| Whole-word mask input | 使 用 语 言 [MASK] [MASK] 来 [MASK] [MASK] 下 一 个 词 的 [MASK] [MASK] [MASK] 。 |

Chinese model download

This directory mainly contains base-size models, so `base` is not marked in the model abbreviations; models of other sizes carry a corresponding mark (e.g., `large`).

  • BERT-large model: 24-layer, 1024-hidden, 16-heads, 330M parameters
  • BERT-base model: 12-layer, 768-hidden, 12-heads, 110M parameters
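
The parameter figures in the bullets can be sanity-checked with a rough count. The formula below is a sketch, not an exact accounting (pooler and MLM head are excluded, and it assumes Google's 30,522-token WordPiece vocabulary; the Chinese models use a 21,128-token vocabulary and so come out somewhat smaller):

```python
def bert_param_count(layers, hidden, vocab=30522, max_pos=512, type_vocab=2):
    """Rough BERT encoder parameter count (weights + biases).
    Pooler and MLM head are excluded; vocab=30522 is Google's
    English WordPiece vocabulary size."""
    ffn = 4 * hidden  # BERT's intermediate size is 4x hidden
    embed = (vocab + max_pos + type_vocab) * hidden + 2 * hidden  # + LayerNorm
    attn = 4 * (hidden * hidden + hidden)       # Q, K, V, output projection
    ffn_p = hidden * ffn + ffn + ffn * hidden + hidden
    norms = 2 * 2 * hidden                      # two LayerNorms per layer
    return embed + layers * (attn + ffn_p + norms)

base = bert_param_count(12, 768)    # roughly 110M
large = bert_param_count(24, 1024)  # roughly 330M
```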

| Model abbreviation | Corpus | Google download | iFLYTEK Cloud download |
| --- | --- | --- | --- |
| RBT6, Chinese | EXT data[1] | | TensorFlow (password XNMA) |
| RBT4, Chinese | EXT data[1] | | TensorFlow (password e8dN) |
| RBTL3, Chinese | EXT data[1] | TensorFlow, PyTorch | TensorFlow (password vySW) |
| RBT3, Chinese | EXT data[1] | TensorFlow, PyTorch | TensorFlow (password b9nx) |
| RoBERTa-wwm-ext-large, Chinese | EXT data[1] | TensorFlow, PyTorch | TensorFlow (password u6gC) |
| RoBERTa-wwm-ext, Chinese | EXT data[1] | TensorFlow, PyTorch | TensorFlow (password Xe1p) |
| BERT-wwm-ext, Chinese | EXT data[1] | TensorFlow, PyTorch | TensorFlow (password 4cMG) |
| BERT-wwm, Chinese | Chinese Wiki | TensorFlow, PyTorch | TensorFlow (password 07Xj) |
| BERT-base, Chinese (Google) | Chinese Wiki | Google Cloud | |
| BERT-base, Multilingual Cased (Google) | Multilingual Wiki | Google Cloud | |
| BERT-base, Multilingual Uncased (Google) | Multilingual Wiki | Google Cloud | |

[1] EXT data includes Chinese Wikipedia plus other encyclopedia, news, and Q&A data, totaling 5.4B words.

