llama3
Standalone Llama3 implementation, inspired by https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/07_gpt_to_llama/standalone-llama32.ipynb. For an explanation of grouped-query attention, see https://www.ibm.com/think/topics/grouped-query-attention.
- class mfai.pytorch.models.llms.llama3.GroupedQueryAttention(d_in, d_out, num_heads, num_kv_groups)[source]
  Bases: Module
  - forward(x, mask, cos, sin)[source]
    Define the computation performed at every call.
    Should be overridden by all subclasses.
    Return type: Tensor
    Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
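The idea behind grouped-query attention is that several query heads share a single key/value head, shrinking the KV cache. A minimal illustrative sketch in plain PyTorch (not the mfai implementation; shapes and names are assumptions for illustration):

```python
import torch

batch, seq, d = 2, 4, 64
num_heads, num_kv_groups = 8, 2
head_dim = d // num_heads  # 8

# Queries have one head per attention head; keys/values have one per KV group.
q = torch.randn(batch, num_heads, seq, head_dim)
k = torch.randn(batch, num_kv_groups, seq, head_dim)
v = torch.randn(batch, num_kv_groups, seq, head_dim)

# Each group of query heads shares one K/V head: expand K/V across the group.
group_size = num_heads // num_kv_groups  # 4 query heads per KV head
k = k.repeat_interleave(group_size, dim=1)  # (batch, num_heads, seq, head_dim)
v = v.repeat_interleave(group_size, dim=1)

# Standard scaled dot-product attention with a causal mask.
attn = (q @ k.transpose(-2, -1)) / head_dim**0.5
causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
attn = attn.masked_fill(causal, float("-inf")).softmax(dim=-1)
out = attn @ v  # (batch, num_heads, seq, head_dim)
```

With `num_kv_groups=2` and `num_heads=8`, the KV cache is four times smaller than with full multi-head attention, while the number of query heads is unchanged.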
- class mfai.pytorch.models.llms.llama3.Llama3(settings, vocab_size=32000)[source]
  Bases: Module
  Parameters:
    settings (Llama3Settings)
    vocab_size (int)
  - forward(tok_ids)[source]
    Define the computation performed at every call.
    Should be overridden by all subclasses.
    Return type: Tensor
    Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
  - forward_vectors(embeddings, first_embedding=None)[source]
    Process a batch of embeddings through the model. If first_embedding is supplied, the first token of each block is replaced by the corresponding embedding. Useful for multimodal models that inject vision-data embeddings at each stage.
  - model_type = 4
  - settings_kls
    alias of Llama3Settings
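The injection performed by forward_vectors can be sketched as follows. This is an illustrative stand-in in plain PyTorch, with assumed shapes (batch of blocks, sequence of embeddings per block), not the mfai code itself:

```python
import torch

# Hypothetical shapes: a batch of 2 blocks, each with 5 embeddings of dim 16.
embeddings = torch.randn(2, 5, 16)
first_embedding = torch.randn(2, 16)  # one replacement vector per block

# Sketch of the documented behaviour: the first token of each block is
# overwritten by the supplied embedding before the transformer stack runs.
injected = embeddings.clone()
injected[:, 0] = first_embedding
```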
- class mfai.pytorch.models.llms.llama3.Llama3Settings(emb_dim=256, context_length=512, n_heads=8, n_layers=8, hidden_dim=768, num_kv_groups=2, rope_base=500000.0)[source]
  Bases: object
  - classmethod from_dict(kvs, *, infer_missing=False)
  - classmethod from_json(s, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw)
  - classmethod schema(*, infer_missing=False, only=None, exclude=(), many=False, context=None, load_only=(), dump_only=(), partial=False, unknown=None)
  - to_json(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, indent=None, separators=None, default=None, sort_keys=False, **kw)
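The from_dict/from_json/to_json signatures above are the mixin API of the dataclasses-json package. A minimal sketch of the round-trip semantics, using a plain stand-in dataclass that mirrors the documented defaults (not the mfai class itself):

```python
import json
from dataclasses import asdict, dataclass


# Stand-in mirroring the documented Llama3Settings defaults, for illustration.
@dataclass
class Llama3Settings:
    emb_dim: int = 256
    context_length: int = 512
    n_heads: int = 8
    n_layers: int = 8
    hidden_dim: int = 768
    num_kv_groups: int = 2
    rope_base: float = 500000.0


# Equivalent of settings.to_json() followed by Llama3Settings.from_json(...):
payload = json.dumps(asdict(Llama3Settings()))
restored = Llama3Settings(**json.loads(payload))
```

This makes the settings easy to persist alongside model checkpoints and reload without any manual field-by-field handling.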
- class mfai.pytorch.models.llms.llama3.TransformerBlock(emb_dim, hidden_dim, num_heads, num_kv_groups, dtype=None)[source]
  Bases: Module
  - forward(x, mask, cos, sin)[source]
    Define the computation performed at every call.
    Should be overridden by all subclasses.
    Return type: Tensor
    Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
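The cos and sin arguments passed into forward are precomputed rotary position embedding (RoPE) tables derived from rope_base. A sketch of the standard RoPE precomputation, following the notebook this module is inspired by (head_dim and context_length here are assumed illustration values):

```python
import torch

head_dim, context_length, rope_base = 32, 16, 500000.0

# Inverse frequencies for each pair of channels (standard RoPE formulation).
inv_freq = 1.0 / rope_base ** (torch.arange(0, head_dim, 2).float() / head_dim)
positions = torch.arange(context_length).float()

# Rotation angle for every (position, frequency) pair.
angles = positions[:, None] * inv_freq[None, :]  # (context_length, head_dim // 2)
angles = torch.cat([angles, angles], dim=-1)     # (context_length, head_dim)

# Tables of the kind passed to forward(x, mask, cos, sin).
cos, sin = angles.cos(), angles.sin()
```

Because the tables depend only on context_length, head_dim, and rope_base, they can be computed once and shared by every TransformerBlock.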