ALBERT release
ALBERT was released in two steps, each with four checkpoints of different sizes. Checkpoints from the first release are suffixed "v1", and those from the second are suffixed "v2".
albert/albert-base-v1
This model has the following configuration:
- 12 repeating layers
- 128 embedding dimension
- 768 hidden dimension
- 12 attention heads
- 11M parameters
Metrics: Average (80.1), SQuAD v1.1 F1/EM (89.3/82.3), SQuAD v2.0 F1/EM (80.0/77.1), MNLI (81.6), SST-2 (90.3), RACE (64.0)
albert/albert-large-v1
This model has the following configuration:
- 24 repeating layers
- 128 embedding dimension
- 1024 hidden dimension
- 16 attention heads
- 17M parameters
Metrics: Average (82.4), SQuAD v1.1 F1/EM (90.6/83.9), SQuAD v2.0 F1/EM (82.3/79.4), MNLI (83.5), SST-2 (91.7), RACE (68.5)
albert/albert-xlarge-v1
This model has the following configuration:
- 24 repeating layers
- 128 embedding dimension
- 2048 hidden dimension
- 16 attention heads
- 58M parameters
Metrics: Average (85.5), SQuAD v1.1 F1/EM (92.5/86.1), SQuAD v2.0 F1/EM (86.1/83.1), MNLI (86.4), SST-2 (92.4), RACE (74.8)
albert/albert-xxlarge-v1
This model has the following configuration:
- 12 repeating layers
- 128 embedding dimension
- 4096 hidden dimension
- 64 attention heads
- 223M parameters
Metrics: Average (91.0), SQuAD v1.1 F1/EM (94.8/89.3), SQuAD v2.0 F1/EM (90.2/87.4), MNLI (90.8), SST-2 (96.9), RACE (86.5)
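The parameter counts listed for these configurations stay small because ALBERT reuses one set of transformer-layer weights for all of its "repeating layers" and factorizes the vocabulary embedding through a 128-dimensional space before projecting it up to the hidden size. The rough estimate below is a sketch, not the official way these counts were produced. It assumes a 30,000-token vocabulary, 512 positions, 2 token types, a feed-forward width of 4x the hidden size, and a pooler layer but no pretraining heads. `estimate_params`, `linear`, and `layer_norm` are hypothetical helpers written only for illustration.

```python
"""Back-of-the-envelope ALBERT parameter estimate (illustrative sketch)."""

# Assumptions (not stated in the model notes): table sizes and a
# feed-forward width of 4x the hidden size.
VOCAB = 30_000
MAX_POSITIONS = 512
TOKEN_TYPES = 2
FFN_MULT = 4


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


def estimate_params(hidden: int, embedding: int = 128, num_layers: int = 12,
                    shared: bool = True) -> int:
    """Hypothetical helper: approximate encoder parameter count.

    With shared=True (ALBERT-style cross-layer sharing), num_layers has no
    effect, because every repeated layer reuses the same weights.
    """
    # Factorized embeddings: narrow tables of width `embedding`,
    # followed by a single projection up to `hidden`.
    embeddings = (VOCAB + MAX_POSITIONS + TOKEN_TYPES) * embedding + layer_norm(embedding)
    projection = linear(embedding, hidden)

    # One transformer layer: Q, K, V, and output projections, then the feed-forward block.
    ffn = FFN_MULT * hidden
    attention = 4 * linear(hidden, hidden) + layer_norm(hidden)
    feed_forward = linear(hidden, ffn) + linear(ffn, hidden) + layer_norm(hidden)
    one_layer = attention + feed_forward

    pooler = linear(hidden, hidden)
    copies = 1 if shared else num_layers
    return embeddings + projection + copies * one_layer + pooler


# (hidden dimension, repeating layers, parameter count from the notes in millions)
CONFIGS = {
    "base": (768, 12, 11),
    "large": (1024, 24, 17),
    "xlarge": (2048, 24, 58),
    "xxlarge": (4096, 12, 223),
}

if __name__ == "__main__":
    for name, (hidden, layers, listed) in CONFIGS.items():
        shared = estimate_params(hidden, num_layers=layers)
        unshared = estimate_params(hidden, num_layers=layers, shared=False)
        print(f"{name:8s} shared ~{shared / 1e6:6.1f}M (notes: {listed}M), "
              f"unshared ~{unshared / 1e6:7.1f}M")
```

When run as a script, the sketch lands within about one million parameters of each listed figure. The remaining gap reflects rounding and choices about what gets counted. If the base configuration did not share its 12 layers, the same estimate would come to close to 90M parameters instead of about 12M.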
albert/albert-base-v2
This model has the following configuration:
- 12 repeating layers
- 128 embedding dimension
- 768 hidden dimension
- 12 attention heads
- 11M parameters
Metrics: Average (82.3), SQuAD v1.1 F1/EM (90.2/83.2), SQuAD v2.0 F1/EM (82.1/79.3), MNLI (84.6), SST-2 (92.9), RACE (66.8)
albert/albert-large-v2
This model has the following configuration:
- 24 repeating layers
- 128 embedding dimension
- 1024 hidden dimension
- 16 attention heads
- 17M parameters
Metrics: Average (85.7), SQuAD v1.1 F1/EM (91.8/85.2), SQuAD v2.0 F1/EM (84.9/81.8), MNLI (86.5), SST-2 (94.9), RACE (75.2)
albert/albert-xlarge-v2
This model has the following configuration:
- 24 repeating layers
- 128 embedding dimension
- 2048 hidden dimension
- 16 attention heads
- 58M parameters
Metrics: Average (87.9), SQuAD v1.1 F1/EM (92.9/86.4), SQuAD v2.0 F1/EM (87.9/84.1), MNLI (87.9), SST-2 (95.4), RACE (80.7)
albert/albert-xxlarge-v2
This model has the following configuration:
- 12 repeating layers
- 128 embedding dimension
- 4096 hidden dimension
- 64 attention heads
- 223M parameters
Metrics: Average (90.9), SQuAD v1.1 F1/EM (94.6/89.1), SQuAD v2.0 F1/EM (89.8/86.9), MNLI (90.6), SST-2 (96.8), RACE (86.8)
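To compare the two release steps, the sketch below copies the "Average" metric from each checkpoint's note and prints the difference between v2 and v1 for each size. The dictionary layout and the `average_delta` helper are illustrative choices. The numbers themselves are exactly those given in the notes.

```python
"""Compare the listed "Average" metric between the v1 and v2 checkpoints."""

# Average scores copied from the notes for each checkpoint.
AVERAGES = {
    "base": {"v1": 80.1, "v2": 82.3},
    "large": {"v1": 82.4, "v2": 85.7},
    "xlarge": {"v1": 85.5, "v2": 87.9},
    "xxlarge": {"v1": 91.0, "v2": 90.9},
}


def average_delta(size: str) -> float:
    """Return v2 minus v1 for one model size, rounded to one decimal place."""
    scores = AVERAGES[size]
    return round(scores["v2"] - scores["v1"], 1)


if __name__ == "__main__":
    for size in AVERAGES:
        print(f"albert-{size}: {average_delta(size):+.1f}")
```

On these figures, the second release raises the average for the base, large, and xlarge sizes by 2.2, 3.3, and 2.4 points. For xxlarge, the average is essentially unchanged (-0.1).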