In the context of LLMs, what is "Attention"? | |
In the context of LLMs, what is a completion? | |
In the context of LLMs, what is a prompt? | |
In the context of LLMs, what is GELU? | |
In the context of LLMs, what is RELU? | |
In the context of LLMs, what is softmax? | |
In the context of LLMs, what is decoding? | |
In the context of LLMs, what is encoding? | |
In the context of LLMs, what is tokenizing? | |
In the context of LLMs, what is an embedding? | |
In the context of LLMs, what is quantization? | |
In the context of LLMs, what is a tensor? | |
In the context of LLMs, what is a sparse tensor? | |
In the context of LLMs, what is a vector? | |
In the context of LLMs, how is attention implemented? | |
In the context of LLMs, why is attention all you need? | |
In the context of LLMs, what is "RoPe" and what is it used for? | |
In the context of LLMs, what is "LoRA" and what is it used for? | |
In the context of LLMs, what are weights? | |
In the context of LLMs, what are biases? | |
In the context of LLMs, what are checkpoints? | |
In the context of LLMs, what is "perplexity"? | |
In the context of LLMs, what are models? | |
In the context of machine-learning, what is "catastrophic forgetting"? | |
In the context of machine-learning, what is "elastic weight consolidation (EWC)"? | |
In the context of neural nets, what is a hidden layer? | |
In the context of neural nets, what is a convolution? | |
In the context of neural nets, what is dropout? | |
In the context of neural nets, what is cross-entropy? | |
In the context of neural nets, what is over-fitting? | |
In the context of neural nets, what is under-fitting? | |
What is the difference between an interpreted computer language and a compiled computer language? | |
In the context of software development, what is a debugger? | |
When processing using a GPU, what is off-loading? | |
When processing using a GPU, what is a batch? | |
When processing using a GPU, what is a block? | |
When processing using a GPU, what is the difference between a batch and a block? | |
When processing using a GPU, what is a scratch tensor? | |
When processing using a GPU, what is a layer? | |
When processing using a GPU, what is a cache? | |
When processing using a GPU, what is unified memory? | |
When processing using a GPU, what is VRAM? | |
When processing using a GPU, what is a kernel? | |
When processing using a GPU, what is "metal"? | |
In the context of LLMs, what are "Zero-Shot", "One-Shot" and "Few-Shot" learning models? | |
In the context of LLMs, what is the "Transformer-model" architecture? | |
In the context of LLMs, what is "Multi-Head Attention"? | |
In the context of LLMs, what is "Self-Attention"? | |
In the context of transformer-model architectures, how do attention mechanisms use masks? |
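As a companion to the attention questions above ("Attention", softmax, self-attention, multi-head attention, and masking), here is a minimal NumPy sketch of scaled dot-product attention with an optional causal mask. It is an illustrative toy, not any particular library's implementation; the shapes, the `causal` flag, and the function names are assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v, causal=False):
    # q, k, v: (seq_len, d) arrays. Scores are scaled by sqrt(d) so their
    # variance stays roughly constant as the head dimension grows.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if causal:
        # The causal mask sets future positions to -inf so that, after
        # softmax, each token attends only to itself and earlier tokens.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v                  # weighted sum of the value vectors

# Toy usage: 4 tokens, head dimension 8. Self-attention means q, k, and v
# are all derived from the same input sequence.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(q, k, v, causal=True)
print(out.shape)  # (4, 8)
```

Multi-head attention runs several of these in parallel on lower-dimensional projections of the input and concatenates the results.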
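The activation-function questions (GELU, ReLU) are also easy to state directly in code. The sketch below uses the widely used tanh approximation of GELU; the sample inputs are arbitrary.

```python
import numpy as np

def relu(x):
    # ReLU: zero for negative inputs, identity for positive inputs.
    return np.maximum(0.0, x)

def gelu(x):
    # GELU (tanh approximation): a smooth alternative to ReLU that scales
    # each input by an approximation of the Gaussian CDF at that point.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-3.0, 3.0, 7)
print(relu(x))
print(gelu(x))
```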
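For the tokenizing, encoding, decoding, and embedding questions, a toy character-level round trip shows the moving parts. Real LLMs use subword vocabularies (e.g. BPE) and embedding tables learned during training; the vocabulary, `d_model`, and random table below are assumptions made for illustration.

```python
import numpy as np

# Toy character-level tokenizer built from a tiny corpus.
vocab = sorted(set("hello world"))
stoi = {ch: i for i, ch in enumerate(vocab)}  # string -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> string

def encode(text):
    # Tokenizing/encoding: text in, sequence of integer token ids out.
    return [stoi[ch] for ch in text]

def decode(token_ids):
    # Decoding: token ids in, text back out.
    return "".join(itos[i] for i in token_ids)

tokens = encode("hello world")

# An embedding is a lookup table mapping each token id to a dense vector;
# here it is random, whereas a trained model learns it.
d_model = 4
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((len(vocab), d_model))
embedded = embedding_table[tokens]            # shape: (len(tokens), d_model)

print(tokens)
print(decode(tokens))     # "hello world"
print(embedded.shape)     # (11, 4)
```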
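For the quantization question, a symmetric int8 round trip is perhaps the simplest concrete example: store one byte per weight plus a scale, at the cost of a small rounding error. The helper names here are hypothetical, not an API from any library.

```python
import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: choose a scale so the largest
    # magnitude maps to 127, then round every weight to the nearest int8.
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float32 weights from the int8 values and scale.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(8).astype(np.float32)
q, scale = quantize_int8(w)
print(np.max(np.abs(w - dequantize(q, scale))))  # small rounding error
```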