Factorized the weight matrix in the GlobalAttentionPoolingHead, thus reducing the number of parameters in this layer by a factor of 48 a1e9f64 PeteBleackley committed on Mar 11, 2024
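A minimal sketch of the parameter arithmetic behind this commit: replacing a dense d×d weight matrix with two low-rank factors of shapes (d, r) and (r, d). The values d = 768 and r = 8 are assumptions chosen to reproduce the stated factor of 48, not figures from the commit itself.

```python
import numpy as np

# Assumed sizes (not from the commit): a RoBERTa-base style hidden
# dimension and a small factorization rank.
d = 768  # hidden size (assumption)
r = 8    # rank of the factorization (assumption)

# Dense attention weight matrix: d * d parameters.
dense_params = d * d

# Factorized form W ~= A @ B with A: (d, r), B: (r, d).
factored_params = d * r + r * d

reduction = dense_params / factored_params
print(reduction)  # 48.0 with these assumed sizes

# The factorized product still maps (batch, d) -> (batch, d):
x = np.random.randn(4, d)
A = np.random.randn(d, r)
B = np.random.randn(r, d)
out = x @ A @ B
print(out.shape)  # (4, 768)
```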
Making sure RoBERTa layers have all required arguments b2593fa PeteBleackley committed on Sep 25, 2023
tensorflow.vectorized_map might not like getting function arguments in a tuple 3f78694 PeteBleackley committed on Sep 24, 2023
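A sketch of the pattern this commit hints at: `tf.vectorized_map(fn, elems)` slices `elems` along axis 0, so packing fixed (non-batched) arguments into the `elems` tuple is fragile; a common workaround is to capture them in a closure and map only over the per-example data. The stand-in below mimics the calling convention in plain Python since the actual call site is not shown in the log; all names here are illustrative.

```python
# Plain-Python stand-in for tensorflow.vectorized_map, used only to
# illustrate the calling convention (not TensorFlow itself).
def vectorized_map(fn, elems):
    # Like tf.vectorized_map, apply fn to each slice along the leading
    # axis; a tuple of elems yields a tuple of slices per call.
    if isinstance(elems, tuple):
        return [fn(args) for args in zip(*elems)]
    return [fn(e) for e in elems]

xs = [1, 2, 3]   # per-example data (illustrative)
scale = 10       # fixed argument that should not be sliced

# Fragile pattern: broadcasting the fixed argument into the elems tuple
# so fn must unpack it on every slice.
packed = vectorized_map(lambda args: args[0] * args[1],
                        (xs, [scale] * len(xs)))

# Workaround: capture the fixed argument in a closure and map only
# over the batched tensor.
captured = vectorized_map(lambda x: x * scale, xs)

print(packed)    # [10, 20, 30]
print(captured)  # [10, 20, 30]
```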