PULI-HuBA 130M

PULI-HuBA 130M is a monolingual Hungarian foundation model based on the Mamba architecture, following the configuration of https://huggingface.co/state-spaces/mamba-130m-hf

Architecture (PyTorch module summary):

MambaForCausalLM(
  (backbone): MambaModel(
    (embeddings): Embedding(52000, 768)
    (layers): ModuleList(
      (0-23): 24 x MambaBlock(
        (norm): MambaRMSNorm(768, eps=1e-05)
        (mixer): MambaMixer(
          (conv1d): Conv1d(1536, 1536, kernel_size=(4,), stride=(1,), padding=(3,), groups=1536)
          (act): SiLU()
          (in_proj): Linear(in_features=768, out_features=3072, bias=False)
          (x_proj): Linear(in_features=1536, out_features=80, bias=False)
          (dt_proj): Linear(in_features=48, out_features=1536, bias=True)
          (out_proj): Linear(in_features=1536, out_features=768, bias=False)
        )
      )
    )
    (norm_f): MambaRMSNorm(768, eps=1e-05)
  )
  (lm_head): Linear(in_features=768, out_features=52000, bias=False)
)
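As a sanity check, the ~130M figure can be reproduced from the shapes in the summary above. This is a back-of-the-envelope sketch that assumes the lm_head is weight-tied to the embedding (counted once) and counts only the modules printed in the summary, so it lands slightly under 130M (Mamba's internal SSM state parameters do not appear in the module repr):

```python
# Rough parameter count from the module summary above.
# Assumption: lm_head shares weights with the embedding (counted once);
# only the printed modules are counted.
embedding = 52000 * 768                   # token embeddings (shared with lm_head)
per_block = (
    768                                   # MambaRMSNorm weight
    + 1536 * 4 + 1536                     # depthwise Conv1d weight + bias
    + 768 * 3072                          # in_proj
    + 1536 * 80                           # x_proj
    + 48 * 1536 + 1536                    # dt_proj weight + bias
    + 1536 * 768                          # out_proj
)
total = embedding + 24 * per_block + 768  # 24 blocks + final RMSNorm
print(f"{total:,}")                       # ≈ 129.8M, close to the quoted 130M
```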

Training Data (Pretraining)

The model was pretrained on a ~3.48-billion-token Hungarian corpus that was filtered for toxic content, deduplicated, and semantically segmented.

Training Details

License: Apache 2.0
Hardware: 4 × NVIDIA A100 (80GB) GPUs
Year of training: 2024
Input/output: Text only
Parameter count: 130 million
Available model size: Single variant
Data type: float32
Batch size: 10 per GPU
Learning rate: 3e-4
    Reference: GitHub issue
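For context, the float32 data type and the listed batch size imply the following rough memory and batch figures. This is a generic estimate based on standard AdamW bookkeeping, not a number reported by the authors:

```python
# Rough GPU memory footprint, assuming standard AdamW training in float32:
# weights + gradients + two optimizer moments = 4 copies of the parameters.
params = 130_000_000
bytes_per_param = 4                          # float32
weights_gb = params * bytes_per_param / 1e9  # 0.52 GB for the weights alone
train_state_gb = weights_gb * 4              # weights + grads + Adam m and v

# Effective batch size across the 4 GPUs listed above.
effective_batch = 4 * 10                     # 40 sequences per optimizer step

print(f"weights: {weights_gb:.2f} GB, training state: {train_state_gb:.2f} GB")
```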

Ethical Considerations

Concerns:

Potential for biased, incorrect, or harmful content generation.

Usage Example

To generate text using this model with Hugging Face's pipeline, use the following Python code:

from transformers import pipeline

# Load the model
model_name = "NYTK/PULI-HuBA130M"

# Initialize the text generation pipeline
generator = pipeline("text-generation", model=model_name)

# Generate text with recommended parameters
output = generator(
    # Example Hungarian prompt: "The fact that my mother tongue is Hungarian, and that
    # I speak, think, and write in Hungarian, is the greatest event of my life,
    # to which nothing compares."
    "Az a tény, hogy anyanyelvem magyar, és magyarul beszélek, gondolkozom, írok, életem legnagyobb eseménye, melyhez nincs fogható.",
    max_length=156,
    do_sample=True,
    repetition_penalty=1.35,
    temperature=0.2,
    top_k=100,
    top_p=0.99,
    truncation=True
)

# Print the generated text
print(output[0]["generated_text"])
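To see why the recommended decoding parameters yield conservative, low-variance output, here is a minimal sketch of the filtering that temperature, top_k, and top_p perform on the next-token distribution. This is a simplified re-implementation for illustration, not the transformers internals, and it omits repetition_penalty for brevity:

```python
import math

def filter_next_token(logits, temperature, top_k, top_p):
    """Return the token indices that survive temperature/top-k/top-p filtering."""
    # Temperature scaling: temperatures below 1 sharpen the distribution.
    scaled = [l / temperature for l in logits]
    # Softmax over the scaled logits (max-subtracted for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Top-k: keep only the k most probable tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Nucleus (top-p) over the renormalized top-k survivors: keep the
    # smallest prefix whose cumulative probability reaches top_p.
    z_k = sum(probs[i] for i in order)
    kept, cum = [], 0.0
    for i in order:
        if cum >= top_p:
            break
        kept.append(i)
        cum += probs[i] / z_k
    return sorted(kept)

# At temperature=0.2 the distribution becomes so peaked that even
# top_p=0.99 typically leaves a single candidate token:
print(filter_next_token([2.0, 1.0, 0.5, 0.1], 0.2, top_k=100, top_p=0.99))
```

With the flat temperature of 1.0 and a lower top_p, more candidates survive, which is why the recommended temperature=0.2 makes generation nearly deterministic despite the permissive top_k=100 and top_p=0.99.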

Contact

If you have any questions, please contact us: [email protected] or [email protected]
