| bookmark-of | date | post_meta | tags | title | type | url |
|---|---|---|---|---|---|---|
| | 2022-04-02T01:25:42.935585 | | | models/official/projects/token_dropping at master · tensorflow/models | bookmarks | /bookmarks/2022/04/02/models-official-projects-token-dropping-at-master-tensorflow-models1648877142 |
Token dropping aims to accelerate the pretraining of transformer models such as BERT without degrading their performance on downstream tasks.
A BERT model pretrained with token dropping is no different from one pretrained in the conventional way: a checkpoint pretrained with token dropping can be viewed and used as a normal BERT checkpoint, for fine-tuning and other downstream uses.
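The core trick is to process only a subset of token positions in the middle transformer layers and restore the dropped positions afterwards, so the output sequence length (and the checkpoint format) is unchanged. The sketch below is only an illustration of that drop-and-restore pattern, not the repository's actual implementation: the real method ranks tokens by an importance score (e.g. a running masked-LM loss per position), whereas here a trivial keep-the-first-positions rule stands in for the scoring, and the names `drop_tokens`, `restore_tokens`, and `keep_fraction` are hypothetical.

```python
import tensorflow as tf

def drop_tokens(hidden_states, keep_fraction=0.5):
    """Reduce the sequence before the 'middle' transformer layers.

    hidden_states: [batch, seq_len, hidden] activations.
    Returns the kept activations plus the kept indices so the dropped
    positions can be restored later.

    NOTE: the real token-dropping method selects positions by an importance
    score; keeping the leading positions here is purely illustrative.
    """
    seq_len = tf.shape(hidden_states)[1]
    num_keep = tf.cast(tf.cast(seq_len, tf.float32) * keep_fraction, tf.int32)
    keep_indices = tf.range(num_keep)
    kept = tf.gather(hidden_states, keep_indices, axis=1)
    return kept, keep_indices

def restore_tokens(full_states, processed_kept, keep_indices):
    """Scatter the processed kept tokens back into the full-length sequence.

    Dropped positions keep the activations they had before the middle layers,
    so the final layers (and the checkpoint) see a normal full-length sequence.
    """
    # Work in [seq_len, batch, hidden] so we can scatter along the time axis.
    full = tf.transpose(full_states, [1, 0, 2])
    updates = tf.transpose(processed_kept, [1, 0, 2])
    full = tf.tensor_scatter_nd_update(full, keep_indices[:, None], updates)
    return tf.transpose(full, [1, 0, 2])

# Minimal usage example:
x = tf.random.normal([2, 8, 16])                 # [batch, seq_len, hidden]
kept, idx = drop_tokens(x, keep_fraction=0.5)    # kept: [2, 4, 16]
# ... run the middle transformer layers on `kept` only ...
restored = restore_tokens(x, kept, idx)          # back to [2, 8, 16]
```

Because the dropped positions are merged back before the final layers, the saved weights have the standard BERT shapes, which is why such a checkpoint can be loaded and fine-tuned like any conventionally pretrained one.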