llama2¶
PyTorch implementation of Llama2, largely inspired by Sebastian Raschka's book and work: https://github.com/rasbt/LLMs-from-scratch/.
- class mfai.pytorch.models.llms.llama2.FeedForwardLlama2(emb_dim, hidden_dim, dtype=None)[source]¶
Bases: Module
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
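Llama2's feed-forward block follows the gated SwiGLU pattern: a SiLU-activated gate multiplied element-wise with an up-projection, then projected back down, with no biases. As a rough illustration of that computation — the weight names and exact wiring here are assumptions about this implementation, not taken from the mfai source — a pure-Python sketch on plain lists:

```python
import math

def silu(x: float) -> float:
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def matvec(m, x):
    # Multiply a matrix (given as a list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

def swiglu_ffn(x, w_gate, w_up, w_down):
    # Assumed Llama2-style feed-forward: down(silu(gate(x)) * up(x)),
    # with no bias terms anywhere.
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)
```

With identity matrices for all three projections, the output reduces to `silu(x) * x` per feature, which makes the gating easy to verify by hand.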
- class mfai.pytorch.models.llms.llama2.Llama2(settings, vocab_size=32000)[source]¶
Bases: Module
Llama2 implementation, based on Sebastian Raschka's book and GitHub repo: https://github.com/rasbt/LLMs-from-scratch/.
- Parameters:
settings (Llama2Settings)
vocab_size (int)
- forward(tok_ids)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- forward_vectors(embeddings, first_embedding=None)[source]¶
Process a batch of embeddings through the model. If first_embedding is supplied, the first token of each block is replaced by the corresponding embedding. Useful for multimodal models with injection of vision data at each stage.
- model_type = 4¶
- settings_kls¶
alias of Llama2Settings
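The docstring of forward_vectors describes replacing the first token of each batch element with a supplied embedding. A minimal pure-Python sketch of that batch manipulation (the function name here is illustrative only; the real method operates on torch Tensors inside the model):

```python
def inject_first_embeddings(embeddings, first_embedding=None):
    # embeddings: a batch of sequences of embedding vectors.
    # If first_embedding is supplied, the first token of each sequence
    # is replaced by the corresponding supplied vector, e.g. a vision
    # embedding in a multimodal setup; otherwise the batch is unchanged.
    if first_embedding is None:
        return embeddings
    return [[first] + seq[1:] for first, seq in zip(first_embedding, embeddings)]
```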
- class mfai.pytorch.models.llms.llama2.Llama2Settings(emb_dim=256, context_length=512, n_heads=4, n_layers=4, hidden_dim=768)[source]¶
Bases: object
- classmethod from_dict(kvs, *, infer_missing=False)¶
- classmethod from_json(s, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw)¶
- classmethod schema(*, infer_missing=False, only=None, exclude=(), many=False, context=None, load_only=(), dump_only=(), partial=False, unknown=None)¶
- to_json(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, indent=None, separators=None, default=None, sort_keys=False, **kw)¶
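The from_dict/from_json/to_json classmethods follow the usual dataclass-to-JSON mixin pattern. A hedged sketch of how such settings round-trip through JSON, using a plain stdlib stand-in (Llama2SettingsSketch is hypothetical; only the field names and defaults come from the signature above):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Llama2SettingsSketch:
    # Field names and defaults mirror the documented Llama2Settings.
    emb_dim: int = 256
    context_length: int = 512
    n_heads: int = 4
    n_layers: int = 4
    hidden_dim: int = 768

settings = Llama2SettingsSketch(emb_dim=128)
payload = json.dumps(asdict(settings))                   # analogous to to_json()
restored = Llama2SettingsSketch(**json.loads(payload))   # analogous to from_json()
```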
- class mfai.pytorch.models.llms.llama2.MultiHeadAttentionPySDPALlama2(d_in, d_out, num_heads, context_length, dtype=None)[source]¶
Bases: Module
Multi-head attention using PyTorch's scaled_dot_product_attention.
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
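torch.nn.functional.scaled_dot_product_attention fuses the classic attention formula softmax(QKᵀ/√d)V into one efficient kernel. For reference, a single-head, unmasked pure-Python version of that formula (the real module adds head splitting, learned projections, and causal masking up to context_length):

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sdpa(queries, keys, values):
    # softmax(Q K^T / sqrt(d)) V, for Q, K, V given as lists of vectors.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[t] for w, v in zip(weights, values))
                    for t in range(len(values[0]))])
    return out
```

With a single key equal to the query, the weight is 1 and the value passes through unchanged; with two identical keys, the output is the mean of the two values.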
- class mfai.pytorch.models.llms.llama2.RMSNorm(emb_dim, eps=1e-05)[source]¶
Bases: Module
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
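RMSNorm rescales each feature vector by its root mean square and then applies a learned per-feature gain; unlike LayerNorm it subtracts no mean and adds no bias. A per-vector sketch of the formula (the learned weight is passed in explicitly here; the real module stores it as a parameter and operates on Tensors):

```python
import math

def rms_norm(x, weight, eps=1e-05):
    # y_i = weight_i * x_i / sqrt(mean(x^2) + eps)
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```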
- class mfai.pytorch.models.llms.llama2.SiLU[source]¶
Bases: Module
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
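SiLU (sigmoid linear unit, also known as swish) is simply x · sigmoid(x); Llama2 uses it inside the gated feed-forward. The definition in one line:

```python
import math

def silu(x: float) -> float:
    # SiLU / swish: x * sigmoid(x). Smooth, slightly non-monotonic for
    # negative inputs, and approaching the identity for large positive x.
    return x * (1.0 / (1.0 + math.exp(-x)))
```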
- class mfai.pytorch.models.llms.llama2.TransformerBlockLlama2(settings)[source]¶
Bases: Module
A transformer block, based on Sebastian Raschka's book and GitHub repo: https://github.com/rasbt/LLMs-from-scratch/.
The attention used is based on PyTorch's scaled_dot_product_attention (the most efficient MultiHeadAttention module according to S. Raschka's benchmark: https://github.com/rasbt/LLMs-from-scratch/tree/main/ch03/02_bonus_efficient-multihead-attention).
- Parameters:
settings (Llama2Settings)
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
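A Llama2 transformer block conventionally combines attention and the feed-forward with pre-norm residual connections: x = x + attn(norm1(x)), then x = x + ffn(norm2(x)). That wiring is an assumption taken from the standard Llama2 architecture, not read from the mfai source; sketched on plain vectors with the sub-layers passed in as callables:

```python
def transformer_block(x, attention, feed_forward, norm1, norm2):
    # Assumed pre-norm residual wiring of a Llama2 block:
    #   h   = x + attention(norm1(x))
    #   out = h + feed_forward(norm2(h))
    h = [a + b for a, b in zip(x, attention(norm1(x)))]
    return [a + b for a, b in zip(h, feed_forward(norm2(h)))]
```

With identity sub-layers the block simply doubles its input twice, which makes the residual structure easy to check.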