[Cache Request] meta-llama/Meta-Llama-3-70B-Instruct-8192tokens
#169
by harmeet03 - opened
I need to deploy the Llama 3 model on an Inferentia2 machine so that it can take inputs up to 8192 tokens long. The existing cached model has a max token length of 4096.
Can you please compile and upload a model with:
```python
{
    'task': 'text-generation',
    'batch_size': 4,
    'num_cores': 24,
    'auto_cast_type': 'fp16',
    'sequence_length': 8192,
    'compiler_type': 'neuronx-cc'
}
```
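For reference, a minimal sketch of how such an export could be run locally with `optimum-neuron` on an inf2 instance, assuming the library is installed and the listed shapes/compiler settings are what's wanted; the output directory name `llama3-70b-neuron/` is arbitrary:

```python
from optimum.neuron import NeuronModelForCausalLM

# Compiler settings and static input shapes from the request above.
compiler_args = {"num_cores": 24, "auto_cast_type": "fp16"}
input_shapes = {"batch_size": 4, "sequence_length": 8192}

# export=True triggers neuronx-cc compilation of the checkpoint
# for Inferentia2 with the given shapes.
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct",
    export=True,
    **compiler_args,
    **input_shapes,
)

# Save the compiled artifacts so they can be reused or uploaded.
model.save_pretrained("llama3-70b-neuron/")
```

Note that compiling a 70B model at sequence length 8192 is memory- and time-intensive, which is why a prebuilt entry in the shared compilation cache would help here.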
Any update on this?