```
# code
https://huggingface.co/vincentoh/llama3_70b_no_robot_fsdp_qlora

# model
wget "https://huggingface.co/vincentoh/llama3-70b-GGUF/blob/main/vincentoh/llama3-70b-GGUF"

# memory usage
Thu May 16 15:53:07 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 PCIe               On  | 00000000:08:00.0 Off |                    0 |
| N/A   37C    P0              76W / 350W |  40441MiB / 81559MiB |     24%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     17735      C   ./main                                    40428MiB |
+---------------------------------------------------------------------------------------+

# token speed
<|begin_of_text|>Why is the sky blue? The sky is blue due to a phenomenon called Rayleigh scattering. This scattering refers to the scattering of electromagnetic radiation (light) by particles much smaller than the wavelength of the light.
The short-wavelength blue light is scattered more than the other colors of visible light, resulting in more blue light reaching the observer than the other colors of light.<|end_of_text|> [end of text]

llama_print_timings:        load time =    6244.37 ms
llama_print_timings:      sample time =       4.39 ms /    69 runs   (    0.06 ms per token, 15710.38 tokens per second)
llama_print_timings: prompt eval time =      90.86 ms /     7 tokens (   12.98 ms per token,    77.05 tokens per second)
llama_print_timings:        eval time =    2334.73 ms /    68 runs   (   34.33 ms per token,    29.13 tokens per second)
llama_print_timings:       total time =    2486.72 ms /    75 tokens
Log end
```
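
The tokens-per-second figures in the timing lines are the token count divided by the elapsed time. Below is a minimal sketch that recomputes them from the log text, assuming the `llama_print_timings` line format shown above. The names `LOG`, `PATTERN`, and `throughput` are illustrative and are not part of llama.cpp. The recomputed values can differ from the logged ones in the last digit because the log prints rounded milliseconds.

```python
import re

# Timing lines copied from the log above (whitespace compacted).
LOG = """\
llama_print_timings: load time = 6244.37 ms
llama_print_timings: sample time = 4.39 ms / 69 runs ( 0.06 ms per token, 15710.38 tokens per second)
llama_print_timings: prompt eval time = 90.86 ms / 7 tokens ( 12.98 ms per token, 77.05 tokens per second)
llama_print_timings: eval time = 2334.73 ms / 68 runs ( 34.33 ms per token, 29.13 tokens per second)
llama_print_timings: total time = 2486.72 ms / 75 tokens
"""

# Matches only lines that report "<ms> ms / <count> runs|tokens";
# the load-time line has no count and is skipped.
PATTERN = re.compile(
    r"llama_print_timings:\s+(?P<stage>[a-z ]+?) time\s*=\s*"
    r"(?P<ms>[\d.]+) ms\s*/\s*(?P<n>\d+) (?:runs|tokens)"
)


def throughput(log: str) -> dict[str, float]:
    """Return tokens per second for each stage that reports a token count."""
    rates = {}
    for m in PATTERN.finditer(log):
        seconds = float(m["ms"]) / 1000.0
        rates[m["stage"]] = int(m["n"]) / seconds
    return rates


rates = throughput(LOG)
for stage, rate in rates.items():
    print(f"{stage}: {rate:.2f} tokens/s")
```

The `eval` entry is the generation speed (about 29 tokens per second in this run), and `prompt eval` is the prompt-processing speed.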