gpt2¶
PyTorch implementation of GPT-2, largely inspired by Sebastian Raschka's book and work: https://github.com/rasbt/LLMs-from-scratch/.
- class mfai.pytorch.models.llms.gpt2.CrossAttentionGPT2(settings, vocab_size=50257)[source]¶
Bases:
Module
A GPT-2 with cross-attention allowing vision/weather data to be injected as keys/values into some of the transformer blocks. Freely inspired by Llama 3.2 as described here: https://magazine.sebastianraschka.com/i/151078631/the-llama-herd-of-models.
- Parameters:
settings (CrossAttentionGPT2Settings)
vocab_size (int)
- forward(token_ids, vision_inputs)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- model_type = 4¶
- settings_kls¶
alias of
CrossAttentionGPT2Settings
- class mfai.pytorch.models.llms.gpt2.CrossAttentionGPT2Settings(emb_dim=768, context_length=1024, n_heads=12, n_layers=12, drop_rate=0.1, qkv_bias=False, model_size='124M', attn_tf_compat=False, x_att_ratio=4)[source]¶
Bases:
GPT2Settings
- Parameters:
- classmethod from_dict(kvs, *, infer_missing=False)¶
- classmethod from_json(s, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw)¶
- classmethod schema(*, infer_missing=False, only=None, exclude=(), many=False, context=None, load_only=(), dump_only=(), partial=False, unknown=None)¶
- to_json(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, indent=None, separators=None, default=None, sort_keys=False, **kw)¶
- class mfai.pytorch.models.llms.gpt2.CrossAttentionTransformerBlock(settings)[source]¶
Bases:
Module
A cross-attention transformer block.
- Parameters:
settings (CrossAttentionGPT2Settings)
- forward(x_q, x_kv)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class mfai.pytorch.models.llms.gpt2.FeedForward(emb_dim)[source]¶
Bases:
Module
- Parameters:
emb_dim (int)
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class mfai.pytorch.models.llms.gpt2.GELU[source]¶
Bases:
Module
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
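The GPT-2 reference implementation uses the tanh approximation of GELU. Whether this GELU module uses that form or the exact erf-based form is not stated above, but the tanh variant can be sketched in pure Python as follows:

```python
import math

def gelu(x: float) -> float:
    # GPT-2 style tanh approximation of GELU:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

For large positive inputs GELU behaves like the identity, and for large negative inputs it goes to zero, which is what makes it a smooth alternative to ReLU.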
- class mfai.pytorch.models.llms.gpt2.GPT2(settings, vocab_size=50257)[source]¶
Bases:
Module
GPT implementation, based on Sebastian Raschka's book and GitHub repo: https://github.com/rasbt/LLMs-from-scratch/.
- Parameters:
settings (GPT2Settings)
vocab_size (int)
- dowload_weights_from_tf_ckpt(model_dir)[source]¶
Downloads a TensorFlow checkpoint into model_dir and sets the weights of self.
- forward(tok_ids)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- forward_vectors(embeddings, first_embedding=None)[source]¶
Process a batch of embeddings through the model. If first_embedding is supplied, the first tokens of each block are replaced by the corresponding embeddings. Useful for multimodal models that inject vision data at each stage.
- load_weights_from_dict(params)[source]¶
Loads weights into self from a dict, typically produced by TensorFlow or another framework's training. Use this to fine-tune from the official weights.
- Parameters:
params (dict)
- model_type = 4¶
- settings_kls¶
alias of
GPT2Settings
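As an illustration of how a causal LM like this is typically driven at inference time, here is a minimal greedy-decoding loop in pure Python. The stub_logits function is a hypothetical stand-in for GPT2.forward, which in the real model maps token ids to logits over the vocabulary:

```python
def greedy_decode(logits_fn, prompt, max_new_tokens, context_length=1024):
    # logits_fn maps a list of token ids to per-vocab scores for the
    # last position (a stand-in for a call to GPT2.forward).
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        window = tokens[-context_length:]  # crop to the context window
        scores = logits_fn(window)
        next_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
        tokens.append(next_id)
    return tokens

# Hypothetical stub model: always prefers token (last_token + 1) mod vocab.
def stub_logits(tok_ids, vocab=10):
    target = (tok_ids[-1] + 1) % vocab
    return [1.0 if i == target else 0.0 for i in range(vocab)]
```

Swapping argmax for sampling from the softmax of the scores gives the usual temperature/top-k decoding variants.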
- class mfai.pytorch.models.llms.gpt2.GPT2Settings(emb_dim=768, context_length=1024, n_heads=12, n_layers=12, drop_rate=0.1, qkv_bias=False, model_size='124M', attn_tf_compat=False)[source]¶
Bases:
object
Default settings correspond to GPT-2 small ('124M').
- Parameters:
- classmethod from_dict(kvs, *, infer_missing=False)¶
- classmethod from_json(s, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw)¶
- classmethod schema(*, infer_missing=False, only=None, exclude=(), many=False, context=None, load_only=(), dump_only=(), partial=False, unknown=None)¶
- to_json(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, indent=None, separators=None, default=None, sort_keys=False, **kw)¶
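For illustration, the defaults above can be mirrored in a plain dataclass. This sketch is not the mfai class itself; it simply restates the signature shown above:

```python
from dataclasses import dataclass

@dataclass
class GPT2SettingsSketch:
    # Mirrors the defaults in the signature above (GPT-2 small '124M').
    emb_dim: int = 768
    context_length: int = 1024
    n_heads: int = 12
    n_layers: int = 12
    drop_rate: float = 0.1
    qkv_bias: bool = False
    model_size: str = "124M"
    attn_tf_compat: bool = False

settings = GPT2SettingsSketch()
# emb_dim must split evenly across heads: 768 / 12 = 64 per head.
head_dim = settings.emb_dim // settings.n_heads
```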
- class mfai.pytorch.models.llms.gpt2.LayerNorm(emb_dim)[source]¶
Bases:
Module
- Parameters:
emb_dim (int)
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
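As a sketch of what a LayerNorm over the embedding dimension computes, assuming the usual learnable scale and shift (which the signature above does not spell out):

```python
def layer_norm(x, scale=None, shift=None, eps=1e-5):
    # Normalize a single embedding vector to zero mean / unit variance,
    # then apply elementwise scale and shift (defaulting to identity).
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance
    scale = scale or [1.0] * n
    shift = shift or [0.0] * n
    return [scale[i] * (x[i] - mean) / (var + eps) ** 0.5 + shift[i]
            for i in range(n)]
```

Unlike batch norm, the statistics are computed per sample over the feature dimension, so the layer behaves identically at train and inference time.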
- class mfai.pytorch.models.llms.gpt2.MultiHeadAttention(d_in, d_out, num_heads, context_length, dropout=0.0, qkv_bias=False)[source]¶
Bases:
Module
Multi-head attention compatible with the original TensorFlow implementation and weights.
- Parameters:
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class mfai.pytorch.models.llms.gpt2.MultiHeadAttentionPySDPA(d_in, d_out, num_heads, context_length, dropout=0.0, qkv_bias=False)[source]¶
Bases:
Module
Multi-head attention using PyTorch's scaled_dot_product_attention.
- Parameters:
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
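PyTorch's scaled_dot_product_attention computes softmax(QK^T / sqrt(d)) V. A single-head pure-Python sketch, with the linear projections and head splitting omitted for clarity, makes the core explicit:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V, causal=False):
    # Q, K, V: lists of vectors (seq_len x d). Returns seq_len outputs.
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        scores = [sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d)
                  for k in K]
        if causal:
            # Mask out future positions for autoregressive decoding.
            scores = [s if j <= i else float("-inf")
                      for j, s in enumerate(scores)]
        w = softmax(scores)
        out.append([sum(w[j] * V[j][c] for j in range(len(V)))
                    for c in range(len(V[0]))])
    return out
```

The real module additionally projects inputs into per-head Q/K/V, runs this over every head in parallel, and recombines the heads with an output projection.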
- class mfai.pytorch.models.llms.gpt2.MultiHeadCrossAttentionPySDPA(d_in_q, d_in_kv, d_out, num_heads, context_length, dropout=0.0, qkv_bias=False)[source]¶
Bases:
Module
Multi-head cross-attention using PyTorch's scaled_dot_product_attention. The queries and key/values come from different sources.
- Parameters:
- forward(x_q, x_kv)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
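The only difference from self-attention is where the keys and values come from: queries are derived from one sequence (e.g. token embeddings) and keys/values from another (e.g. vision features), which may have a different length. A minimal single-head sketch, with the linear projections omitted for clarity:

```python
import math

def cross_attention(x_q, x_kv):
    # x_q: n_q query vectors (e.g. token embeddings).
    # x_kv: n_kv key/value vectors from another source (e.g. vision
    # features). Projections are omitted: queries come straight from
    # x_q, keys and values straight from x_kv.
    d = len(x_q[0])
    out = []
    for q in x_q:
        scores = [sum(a * b for a, b in zip(q, kv)) / math.sqrt(d)
                  for kv in x_kv]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        out.append([sum(w[j] * x_kv[j][c] for j in range(len(x_kv)))
                    for c in range(len(x_kv[0]))])
    return out
```

Note there is no causal mask here: every query position may attend to every key/value position of the other modality.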
- class mfai.pytorch.models.llms.gpt2.TransformerBlock(settings)[source]¶
Bases:
Module
A transformer block, based on Sebastian Raschka's book and GitHub repo: https://github.com/rasbt/LLMs-from-scratch/.
The attention used is based on PyTorch's scaled_dot_product_attention (the most efficient multi-head attention module according to S. Raschka's benchmark: https://github.com/rasbt/LLMs-from-scratch/tree/main/ch03/02_bonus_efficient-multihead-attention).
- Parameters:
settings (GPT2Settings)
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Return type: Tensor
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.