modules #

Classes:

Name	Description
`BNNMixin`	Abstract mixin class turning a torch module into a Bayesian neural network module.
`Conv1d`	Applies a 1D convolution over an input signal composed of several input planes.
`Conv2d`	Applies a 2D convolution over an input signal composed of several input planes.
`Conv3d`	Applies a 3D convolution over an input signal composed of several input planes.
`Linear`	Applies an affine transformation to the input.
`MultiheadAttention`	Attention layer (with multiple heads).
`Sequential`	A sequential container for modules.
`SinusoidalPositionalEncoding`	Sinusoidal Positional Encoding.

Functions:

Name	Description
`batched_forward`	Call a torch.nn.Module on inputs with arbitrarily many batch dimensions rather than just a single one.

BNNMixin #

BNNMixin(
    parametrization: (
        Parametrization | None
    ) = MaximalUpdate(),
    *args,
    **kwargs
)

Bases: ABC

Abstract mixin class turning a torch module into a Bayesian neural network module.

Parameters:

Name	Type	Description	Default
`parametrization`	`Parametrization \| None`	The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.	`MaximalUpdate()`

Methods:

Name	Description
`forward`	Forward pass of the module.
`parameters_and_lrs`	Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.
`reset_parameters`	Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

Attributes:

Name	Type	Description
`parametrization`	`Parametrization`	Parametrization of the module.

parametrization #

parametrization: Parametrization

Parametrization of the module.

forward #

forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size | None = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch *out_feature"]

Forward pass of the module.

Parameters:

Name	Type	Description	Default
`input`	`Float[Tensor, '*sample batch *in_feature']`	Input tensor.	required
`sample_shape`	`Size \| None`	Shape of samples. If None, runs a forward pass with just the mean parameters.	`Size([])`
`generator`	`Generator \| None`	Random number generator.	`None`
`input_contains_samples`	`bool`	Whether the input already contains samples. If `True`, the input is assumed to have `len(sample_shape)` leading dimensions containing input samples (typically outputs from previous layers).	`False`
`parameter_samples`	`dict[str, Float[Tensor, '*sample parameter']] \| None`	Dictionary of parameter samples. Used to pass sampled parameters to the module. Useful to jointly sample parameters of multiple layers.	`None`
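The interplay of `sample_shape` and `input_contains_samples` is mostly shape bookkeeping. The following is a minimal sketch of that bookkeeping for a linear-style layer, not the library's code: `expected_output_shape` is a hypothetical helper, and it assumes that sampled parameters prepend `sample_shape` to the batch dimensions when the input carries no samples, that an input with samples keeps its leading sample dimensions, and that the last feature dimension is replaced by `out_features`.

```python
from typing import Optional, Tuple


def expected_output_shape(
    input_shape: Tuple[int, ...],
    sample_shape: Optional[Tuple[int, ...]],
    input_contains_samples: bool,
    out_features: int,
) -> Tuple[int, ...]:
    """Hypothetical shape rule for a linear-style BNN layer (illustration only)."""
    # sample_shape=None: one forward pass with the mean parameters,
    # so no sample dimensions are added.
    if sample_shape is None:
        return (*input_shape[:-1], out_features)
    if input_contains_samples:
        # The first len(sample_shape) input dimensions already index samples.
        n = len(sample_shape)
        if tuple(input_shape[:n]) != tuple(sample_shape):
            raise ValueError("leading input dimensions must equal sample_shape")
        return (*input_shape[:-1], out_features)
    # Otherwise the sample dimensions are prepended to the batch dimensions.
    return (*sample_shape, *input_shape[:-1], out_features)
```

For example, under these assumptions an input of shape `(32, 10)` with `sample_shape=(8,)` and `out_features=5` would give `(8, 32, 5)`.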

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"]
) -> list[dict[str, Tensor | float]]

Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.

Parameters:

Name	Type	Description	Default
`lr`	`float`	The global learning rate.	required
`optimizer`	`Literal['SGD', 'Adam']`	The optimizer being used.	required

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

This method should be implemented by subclasses to reset the parameters of the module.

Conv1d #

Conv1d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_1_t,
    stride: _size_1_t = 1,
    padding: str | _size_1_t = 0,
    dilation: _size_1_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "zeros",
    layer_type: Literal[
        "input", "hidden", "output"
    ] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: _ConvNd

Applies a 1D convolution over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, L)\) and output \((N, C_{\text{out}}, L_{\text{out}})\) can be precisely described as:

\[ \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k) \]

where \(\star\) is the valid cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, and \(L\) is the length of the signal sequence.
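To make the formula concrete, here is a deliberately naive NumPy sketch of the same computation for the default hyperparameters (stride 1, no padding, dilation 1, one group) and real-valued inputs. `conv1d_reference` is a hypothetical name; it only mirrors the equation with fixed (e.g. mean) parameters and says nothing about how the layer samples its weights.

```python
import numpy as np


def conv1d_reference(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Naive 1D cross-correlation: stride 1, no padding, dilation 1, groups 1.

    x: (N, C_in, L), weight: (C_out, C_in, K), bias: (C_out,)
    returns: (N, C_out, L - K + 1)
    """
    n, c_in, length = x.shape
    c_out, c_in_w, k = weight.shape
    if c_in != c_in_w:
        raise ValueError("input and weight channel counts differ")
    l_out = length - k + 1
    out = np.empty((n, c_out, l_out), dtype=np.result_type(x, weight))
    for i in range(n):
        for j in range(c_out):
            # Start from bias(C_out_j), then add one term per input channel k.
            acc = np.full(l_out, bias[j], dtype=out.dtype)
            for c in range(c_in):
                # "valid" cross-correlation: no kernel flip, no padding.
                acc += np.correlate(x[i, c], weight[j, c], mode="valid")
            out[i, j] = acc
    return out
```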

Parameters:

Name	Type	Description	Default
`in_channels`	`int`	Number of channels in the input image.	required
`out_channels`	`int`	Number of channels produced by the convolution.	required
`kernel_size`	`_size_1_t`	Size of the convolving kernel.	required
`stride`	`_size_1_t`	Stride of the convolution.	`1`
`padding`	`str \| _size_1_t`	Padding added to both sides of the input.	`0`
`dilation`	`_size_1_t`	Spacing between kernel elements.	`1`
`groups`	`int`	Number of blocked connections from input channels to output channels.	`1`
`bias`	`bool`	If `True`, adds a learnable bias to the output.	`True`
`padding_mode`	`str`	`'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default is `'zeros'`.	`'zeros'`
`layer_type`	`Literal['input', 'hidden', 'output']`	Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters.	`'hidden'`
`cov`	`FactorizedCovariance \| None`	The covariance of the parameters.	`None`
`parametrization`	`Parametrization`	The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.	`MaximalUpdate()`
`device`	`device \| None`	Device on which to instantiate the parameters.	`None`
`dtype`	`dtype \| None`	Data type of the parameters.	`None`

Methods:

Name	Description
`extra_repr`
`forward`
`parameters_and_lrs`
`reset_parameters`	Reset the parameters of the module.

Attributes:

Name	Type	Description
`bias`	`Parameter`
`dilation`
`groups`
`in_channels`
`kernel_size`
`layer_type`
`out_channels`
`output_padding`
`padding`
`padding_mode`
`parametrization`	`Parametrization`	Parametrization of the module.
`params`
`stride`
`transposed`
`weight`	`Parameter`

bias #

bias: Parameter

dilation #

dilation = dilation

groups #

groups = groups

in_channels #

in_channels = in_channels

kernel_size #

kernel_size = kernel_size

layer_type #

layer_type = layer_type

out_channels #

out_channels = out_channels

output_padding #

output_padding = output_padding

padding #

padding = padding

padding_mode #

padding_mode = padding_mode

parametrization #

parametrization: Parametrization

Parametrization of the module.

params #

params = ParameterDict(mean_param_dict)

stride #

stride = stride

transposed #

transposed = transposed

weight #

weight: Parameter

extra_repr #

extra_repr()

forward #

forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size | None = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[
    Tensor, "*sample *batch out_channel *out_feature"
]

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"] = "SGD"
) -> list[dict[str, Tensor | float]]

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module.

Conv2d #

Conv2d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_2_t,
    stride: _size_2_t = 1,
    padding: str | _size_2_t = 0,
    dilation: _size_2_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "zeros",
    layer_type: Literal[
        "input", "hidden", "output"
    ] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: _ConvNd

Applies a 2D convolution over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, H, W)\) and output \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) can be precisely described as:

\[ \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k) \]

where \(\star\) is the valid 2D cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, \(H\) is the height of the input planes in pixels, and \(W\) is their width in pixels.
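A vectorised NumPy sketch of the 2D formula, again only for stride 1, no padding, dilation 1 and one group: every kernel-sized window is extracted and contracted against the weights over input channels and kernel positions, without flipping the kernel. `conv2d_reference` is a hypothetical name and uses fixed parameters rather than samples.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_reference(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """x: (N, C_in, H, W), weight: (C_out, C_in, kH, kW), bias: (C_out,)."""
    kh, kw = weight.shape[-2:]
    # windows[n, c, h, w, i, j] == x[n, c, h + i, w + j]
    windows = sliding_window_view(x, (kh, kw), axis=(2, 3))
    # Sum over input channels c and kernel offsets (i, j): cross-correlation.
    out = np.einsum("nchwij,ocij->nohw", windows, weight)
    # bias(C_out_j) is broadcast over batch and spatial positions.
    return out + bias[None, :, None, None]
```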

Parameters:

Name	Type	Description	Default
`in_channels`	`int`	Number of channels in the input image.	required
`out_channels`	`int`	Number of channels produced by the convolution.	required
`kernel_size`	`_size_2_t`	Size of the convolving kernel.	required
`stride`	`_size_2_t`	Stride of the convolution.	`1`
`padding`	`str \| _size_2_t`	Padding added to all four sides of the input.	`0`
`dilation`	`_size_2_t`	Spacing between kernel elements.	`1`
`groups`	`int`	Number of blocked connections from input channels to output channels.	`1`
`bias`	`bool`	If `True`, adds a learnable bias to the output.	`True`
`padding_mode`	`str`	`'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default is `'zeros'`.	`'zeros'`
`layer_type`	`Literal['input', 'hidden', 'output']`	Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters.	`'hidden'`
`cov`	`FactorizedCovariance \| None`	The covariance of the parameters.	`None`
`parametrization`	`Parametrization`	The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.	`MaximalUpdate()`
`device`	`device \| None`	Device on which to instantiate the parameters.	`None`
`dtype`	`dtype \| None`	Data type of the parameters.	`None`

Methods:

Name	Description
`extra_repr`
`forward`
`parameters_and_lrs`
`reset_parameters`	Reset the parameters of the module.

Attributes:

Name	Type	Description
`bias`	`Parameter`
`dilation`
`groups`
`in_channels`
`kernel_size`
`layer_type`
`out_channels`
`output_padding`
`padding`
`padding_mode`
`parametrization`	`Parametrization`	Parametrization of the module.
`params`
`stride`
`transposed`
`weight`	`Parameter`

bias #

bias: Parameter

dilation #

dilation = dilation

groups #

groups = groups

in_channels #

in_channels = in_channels

kernel_size #

kernel_size = kernel_size

layer_type #

layer_type = layer_type

out_channels #

out_channels = out_channels

output_padding #

output_padding = output_padding

padding #

padding = padding

padding_mode #

padding_mode = padding_mode

parametrization #

parametrization: Parametrization

Parametrization of the module.

params #

params = ParameterDict(mean_param_dict)

stride #

stride = stride

transposed #

transposed = transposed

weight #

weight: Parameter

extra_repr #

extra_repr()

forward #

forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size | None = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[
    Tensor, "*sample *batch out_channel *out_feature"
]

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"] = "SGD"
) -> list[dict[str, Tensor | float]]

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module.

Conv3d #

Conv3d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_3_t,
    stride: _size_3_t = 1,
    padding: str | _size_3_t = 0,
    dilation: _size_3_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "zeros",
    layer_type: Literal[
        "input", "hidden", "output"
    ] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: _ConvNd

Applies a 3D convolution over an input signal composed of several input planes.

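In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, D, H, W)\) and output \((N, C_{\text{out}}, D_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) can be precisely described as:

\[ \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k) \]

where \(\star\) is the valid 3D cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, and \(D\), \(H\), \(W\) are the depth, height and width of the input volume.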

Parameters:

Name	Type	Description	Default
`in_channels`	`int`	Number of channels in the input image.	required
`out_channels`	`int`	Number of channels produced by the convolution.	required
`kernel_size`	`_size_3_t`	Size of the convolving kernel.	required
`stride`	`_size_3_t`	Stride of the convolution.	`1`
`padding`	`str \| _size_3_t`	Padding added to all six sides of the input.	`0`
`dilation`	`_size_3_t`	Spacing between kernel elements.	`1`
`groups`	`int`	Number of blocked connections from input channels to output channels.	`1`
`bias`	`bool`	If `True`, adds a learnable bias to the output.	`True`
`padding_mode`	`str`	`'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default is `'zeros'`.	`'zeros'`
`layer_type`	`Literal['input', 'hidden', 'output']`	Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters.	`'hidden'`
`cov`	`FactorizedCovariance \| None`	The covariance of the parameters.	`None`
`parametrization`	`Parametrization`	The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.	`MaximalUpdate()`
`device`	`device \| None`	Device on which to instantiate the parameters.	`None`
`dtype`	`dtype \| None`	Data type of the parameters.	`None`
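The spatial output size \((D_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) depends on `kernel_size`, `stride`, `padding` and `dilation`. Assuming the layer follows the same rule as `torch.nn.Conv3d` for integer padding (string padding such as `'same'` is not covered here), each spatial dimension can be computed as in the sketch below; `conv3d_output_size` is a hypothetical helper, not part of the library.

```python
from typing import Sequence, Tuple, Union

IntOr3 = Union[int, Tuple[int, int, int]]


def _triple(v: IntOr3) -> Tuple[int, int, int]:
    # Expand a single int to a 3-tuple, mirroring the _size_3_t convention.
    return (v, v, v) if isinstance(v, int) else tuple(v)


def conv3d_output_size(
    in_size: Sequence[int],
    kernel_size: IntOr3,
    stride: IntOr3 = 1,
    padding: IntOr3 = 0,
    dilation: IntOr3 = 1,
) -> Tuple[int, int, int]:
    """Per dimension: floor((n + 2p - d * (k - 1) - 1) / s) + 1."""
    k, s, p, d = map(_triple, (kernel_size, stride, padding, dilation))
    return tuple(
        (n + 2 * p_i - d_i * (k_i - 1) - 1) // s_i + 1
        for n, k_i, s_i, p_i, d_i in zip(in_size, k, s, p, d)
    )
```

For instance, a `(16, 32, 32)` volume with `kernel_size=3`, `stride=2` and `padding=1` maps to `(8, 16, 16)`.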

Methods:

Name	Description
`extra_repr`
`forward`
`parameters_and_lrs`
`reset_parameters`	Reset the parameters of the module.

Attributes:

Name	Type	Description
`bias`	`Parameter`
`dilation`
`groups`
`in_channels`
`kernel_size`
`layer_type`
`out_channels`
`output_padding`
`padding`
`padding_mode`
`parametrization`	`Parametrization`	Parametrization of the module.
`params`
`stride`
`transposed`
`weight`	`Parameter`

bias #

bias: Parameter

dilation #

dilation = dilation

groups #

groups = groups

in_channels #

in_channels = in_channels

kernel_size #

kernel_size = kernel_size

layer_type #

layer_type = layer_type

out_channels #

out_channels = out_channels

output_padding #

output_padding = output_padding

padding #

padding = padding

padding_mode #

padding_mode = padding_mode

parametrization #

parametrization: Parametrization

Parametrization of the module.

params #

params = ParameterDict(mean_param_dict)

stride #

stride = stride

transposed #

transposed = transposed

weight #

weight: Parameter

extra_repr #

extra_repr()

forward #

forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size | None = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[
    Tensor, "*sample *batch out_channel *out_feature"
]

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"] = "SGD"
) -> list[dict[str, Tensor | float]]

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module.

Linear #

Linear(
    in_features: int,
    out_features: int,
    bias: bool = True,
    layer_type: Literal[
        "input", "hidden", "output"
    ] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: BNNMixin, Module

Applies an affine transformation to the input.

Parameters:

Name	Type	Description	Default
`in_features`	`int`	Size of each input sample.	required
`out_features`	`int`	Size of each output sample.	required
`bias`	`bool`	If set to `False`, the layer will not learn an additive bias.	`True`
`layer_type`	`Literal['input', 'hidden', 'output']`	Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters.	`'hidden'`
`cov`	`FactorizedCovariance \| None`	Covariance object for the parameters.	`None`
`parametrization`	`Parametrization`	The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.	`MaximalUpdate()`
`device`	`device \| None`	Device on which to instantiate the parameters.	`None`
`dtype`	`dtype \| None`	Data type of the parameters.	`None`

Methods:

Name	Description
`extra_repr`
`forward`
`parameters_and_lrs`
`reset_parameters`	Reset the parameters of the module.

Attributes:

Name	Type	Description
`bias`	`Parameter`
`in_features`
`layer_type`
`out_features`
`parametrization`	`Parametrization`	Parametrization of the module.
`params`
`weight`	`Parameter`

bias #

bias: Parameter

in_features #

in_features = in_features

layer_type #

layer_type = layer_type

out_features #

out_features = out_features

parametrization #

parametrization: Parametrization

Parametrization of the module.

params #

params = ParameterDict(mean_param_dict)

weight #

weight: Parameter

extra_repr #

extra_repr() -> str

forward #

forward(
    input: Float[Tensor, "*sample *batch in_feature"],
    /,
    sample_shape: Size | None = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch out_feature"]
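Under sampling, the affine map is applied once per parameter sample, which is where the leading `*sample` dimensions of the output come from. The NumPy sketch below illustrates this for a single sample dimension, under the assumption that each sample has its own weight matrix and bias; `sampled_affine` and its arguments are hypothetical and not part of the library.

```python
import numpy as np


def sampled_affine(x, weight_samples, bias_samples, input_contains_samples=False):
    """Hypothetical affine map under sampled parameters.

    weight_samples: (S, out_features, in_features)
    bias_samples:   (S, out_features)
    x: (*batch, in_features), or (S, *batch, in_features) if input_contains_samples
    returns: (S, *batch, out_features)
    """
    if input_contains_samples:
        # Input sample s is paired with parameter sample s.
        out = np.einsum("s...i,soi->s...o", x, weight_samples)
    else:
        # Every parameter sample is applied to the same input.
        out = np.einsum("...i,soi->s...o", x, weight_samples)
    # Reshape the bias to (S, 1, ..., 1, out_features) to broadcast over batch dims.
    n_batch = out.ndim - 2
    bias = bias_samples.reshape(bias_samples.shape[0], *([1] * n_batch), bias_samples.shape[1])
    return out + bias
```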

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"] = "SGD"
) -> list[dict[str, Tensor | float]]

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module.

MultiheadAttention #

MultiheadAttention(
    embed_dim: int,
    num_heads: int,
    dropout: float = 0.0,
    bias: bool = True,
    kdim: int | None = None,
    vdim: int | None = None,
    embed_dim_out: int | None = None,
    out_proj: bool = True,
    cov: (
        FactorizedCovariance
        | dict[FactorizedCovariance]
        | None
    ) = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: BNNMixin, Module

Attention layer (with multiple heads).

Multi-head (self-)attention layer with an optional attention mask, allowing a model to jointly attend to information from different representation subspaces. Consists of num_heads scaled dot-product attention modules, whose outputs are concatenated and then combined into an output sequence via a linear layer.

The module supports nested or padded tensors and is inspired by the following implementation.

Parameters:

Name	Type	Description	Default
`embed_dim`	`int`	Dimensionality of the inputs and outputs to the layer (i.e. dimensionality of the query embeddings).	required
`num_heads`	`int`	Number of attention heads.	required
`dropout`	`float`	Dropout probability; if greater than 0.0, dropout is applied.	`0.0`
`bias`	`bool`	Whether to add bias to query, key, value and output projections.	`True`
`kdim`	`int \| None`	Dimensionality of the key embeddings.	`None`
`vdim`	`int \| None`	Dimensionality of the value embeddings.	`None`
`embed_dim_out`	`int \| None`	Dimensionality of the output embeddings. If `None`, set to `embed_dim`.	`None`
`out_proj`	`bool`	Whether to include the output projection layer.	`True`
`cov`	`FactorizedCovariance \| dict[FactorizedCovariance] \| None`	Covariance structure of the weights. Either a single covariance structure used in all linear projections, or a dictionary with keys `k`, `q`, `v` and `out` and values containing either covariance structures or `None`.	`None`
`parametrization`	`Parametrization`	The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.	`MaximalUpdate()`
`device`	`device \| None`	Device on which to instantiate the parameters.	`None`
`dtype`	`dtype \| None`	Data type of the parameters.	`None`

Methods:

Name	Description
`forward`	Computes scaled dot product attention on query, key and value tensors, using an optional attention mask.
`parameters_and_lrs`	Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.
`reset_parameters`	Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

Attributes:

Name	Type	Description
`bias`
`dropout`
`embed_dim`
`embed_dim_out`
`head_dim`
`k_proj`
`kdim`
`num_heads`
`out_proj`
`parametrization`	`Parametrization`	Parametrization of the module.
`q_proj`
`v_proj`
`vdim`

bias #

bias = bias

dropout #

dropout = dropout

embed_dim #

embed_dim = embed_dim

embed_dim_out #

embed_dim_out = (
    embed_dim_out
    if embed_dim_out is not None
    else embed_dim
)

head_dim #

head_dim = embed_dim // num_heads

k_proj #

k_proj = Linear(
    kdim,
    embed_dim,
    bias=bias,
    cov=cov["k"],
    parametrization=parametrization,
    **factory_kwargs
)

kdim #

kdim = kdim if kdim is not None else embed_dim

num_heads #

num_heads = num_heads

out_proj #

out_proj = (
    Linear(
        embed_dim,
        embed_dim_out,
        bias=bias,
        cov=cov["out"],
        parametrization=parametrization,
        **factory_kwargs
    )
    if out_proj
    else None
)

parametrization #

parametrization: Parametrization

Parametrization of the module.

q_proj #

q_proj = Linear(
    embed_dim,
    embed_dim,
    bias=bias,
    cov=cov["q"],
    parametrization=parametrization,
    **factory_kwargs
)

v_proj #

v_proj = Linear(
    vdim,
    embed_dim,
    bias=bias,
    cov=cov["v"],
    parametrization=parametrization,
    **factory_kwargs
)

vdim #

vdim = vdim if vdim is not None else embed_dim

forward #

forward(
    query: Float[
        Tensor, "*sample batch query_token embed_dim"
    ],
    key: (
        Float[
            Tensor, "*sample batch keyval_token embed_dim_k"
        ]
        | None
    ),
    value: (
        Float[Tensor, "*sample batch token embed_dim_v"]
        | None
    ),
    attn_mask: (
        Float[Tensor, "batch query_token keyval_token"]
        | None
    ) = None,
    is_causal: bool = False,
    sample_shape: Size | None = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample batch query_token embed_dim"]

Computes scaled dot product attention on query, key and value tensors, using an optional attention mask.

Parameters:

Name	Type	Description	Default
`query`	`Float[Tensor, '*sample batch query_token embed_dim']`	Query tensor / embeddings.	required
`key`	`Float[Tensor, '*sample batch keyval_token embed_dim_k'] \| None`	Key tensor / embeddings.	required
`value`	`Float[Tensor, '*sample batch token embed_dim_v'] \| None`	Value tensor / embeddings.	required
`attn_mask`	`Float[Tensor, 'batch query_token keyval_token'] \| None`	Attention mask; its shape must be broadcastable to the shape of the attention weights. Two types of masks are supported: a boolean mask, where a value of `True` indicates that the element should take part in attention, and a float mask of the same dtype as query, key and value, which is added to the attention scores.	`None`
`is_causal`	`bool`	If `True`, applies causal attention masking. When the mask is a square matrix, it is lower triangular; when it is non-square, it takes the form of an upper-left causal bias due to the alignment (see `torch.nn.attention.bias.CausalBias`). An error is raised if both `attn_mask` and `is_causal` are set.	`False`
`sample_shape`	`Size \| None`	Shape of samples. If None, runs a forward pass with just the mean parameters.	`Size([])`
`generator`	`Generator \| None`	Random number generator.	`None`
`input_contains_samples`	`bool`	Whether the input already contains samples. If `True`, the input is assumed to have `len(sample_shape)` leading dimensions containing input samples (typically outputs from previous layers).	`False`
`parameter_samples`	`dict[str, Float[Tensor, '*sample parameter']] \| None`	Dictionary of parameter samples. Used to pass sampled parameters to the module. Useful to jointly sample parameters of multiple layers.	`None`
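The masking rules above can be sketched in NumPy as plain multi-head scaled dot-product attention. This is an illustration under simplifying assumptions, not the module's implementation: the `q_proj`, `k_proj`, `v_proj` and `out_proj` projections are treated as identities, dropout and sample dimensions are ignored, `embed_dim` must be divisible by `num_heads`, and `multihead_sdpa` is a hypothetical name. A 3D mask of shape `(batch, query_token, keyval_token)` is broadcast over the heads.

```python
import numpy as np


def multihead_sdpa(q, k, v, num_heads, attn_mask=None, is_causal=False):
    """Hypothetical reference: q (B, Tq, E), k and v (B, Tk, E) -> (B, Tq, E)."""
    if attn_mask is not None and is_causal:
        raise ValueError("attn_mask and is_causal cannot both be set")
    b, tq, e = q.shape
    tk = k.shape[1]
    head_dim = e // num_heads  # as in head_dim = embed_dim // num_heads

    def split_heads(t):
        # (B, T, E) -> (B, num_heads, T, head_dim)
        return t.reshape(b, t.shape[1], num_heads, head_dim).transpose(0, 2, 1, 3)

    qh, kh, vh = split_heads(q), split_heads(k), split_heads(v)
    scores = qh @ kh.transpose(0, 1, 3, 2) / np.sqrt(head_dim)  # (B, H, Tq, Tk)

    if is_causal:
        # Lower-triangular mask aligned to the upper-left corner.
        attn_mask = np.tril(np.ones((tq, tk), dtype=bool))
    if attn_mask is not None:
        if attn_mask.ndim == 3:
            attn_mask = attn_mask[:, None]  # broadcast over heads
        if attn_mask.dtype == bool:
            # True marks elements that take part in attention.
            scores = np.where(attn_mask, scores, -np.inf)
        else:
            # Float masks are added to the attention scores.
            scores = scores + attn_mask

    # Numerically stable softmax over the key/value tokens.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    out = weights @ vh  # (B, H, Tq, head_dim)
    # Concatenate the heads back into the embedding dimension.
    return out.transpose(0, 2, 1, 3).reshape(b, tq, num_heads * head_dim)
```

With `is_causal=True`, the first query token can only attend to the first key, so its output is exactly the first value vector.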

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"]
) -> list[dict[str, Tensor | float]]

Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.

Parameters:

Name	Type	Description	Default
`lr`	`float`	The global learning rate.	required
`optimizer`	`Literal['SGD', 'Adam']`	The optimizer being used.	required

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

This method should be implemented by subclasses to reset the parameters of the module.

Sequential #

Sequential(
    *args: BNNMixin | Module,
    parametrization: Parametrization | None = None
)

Sequential(
    arg: OrderedDict[str, BNNMixin | Module],
    parametrization: Parametrization | None = None,
)

Sequential(
    *args, parametrization: Parametrization | None = None
)

Bases: BNNMixin, Sequential

A sequential container for modules.

Modules will be added to it in the order they are passed in the constructor. Alternatively, an OrderedDict of modules can be passed in. The forward() method of Sequential accepts any input and forwards it to the first module it contains. It then "chains" outputs to inputs sequentially for each subsequent module, finally returning the output of the last module.

The value a Sequential provides over manually calling a sequence of modules is that it allows treating the whole container as a single module, such that performing a transformation on the Sequential applies to each of the modules it stores (which are each a registered submodule of the Sequential).
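The chaining behaviour amounts to a left fold over the contained modules. A minimal plain-Python sketch, using a hypothetical `chain` helper that ignores the sampling keyword arguments (such as `sample_shape`) accepted by `forward`:

```python
from functools import reduce
from typing import Any, Callable, Sequence


def chain(modules: Sequence[Callable[[Any], Any]], x: Any) -> Any:
    # Feed x to the first module, then each output into the next module;
    # the result is the output of the last module (x itself if empty).
    return reduce(lambda acc, module: module(acc), modules, x)
```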

Parameters:

Name	Type	Description	Default
`*args`		Any number of modules to add to the container.	`()`
`parametrization`	`Parametrization \| None`	The parametrization to use. If `None`, the parametrization of the modules in the container will be used. If a `Parametrization` object is passed, it will be used for all modules in the container.	`None`

Methods:

Name	Description
`forward`
`parameters_and_lrs`	Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.
`reset_parameters`	Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

Attributes:

Name	Type	Description
`parametrization`		Parametrization of the module.

parametrization #

parametrization = parametrization

Parametrization of the module.

forward #

forward(
    input: Float[Tensor, "*batch in_feature"],
    /,
    sample_shape: Size | None = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch out_feature"]

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"]
) -> list[dict[str, Tensor | float]]

Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.

Parameters:

Name	Type	Description	Default
`lr`	`float`	The global learning rate.	required
`optimizer`	`Literal['SGD', 'Adam']`	The optimizer being used.	required

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

This method should be implemented by subclasses to reset the parameters of the module.

SinusoidalPositionalEncoding #

SinusoidalPositionalEncoding(
    embed_dim: int,
    max_seq_len: int = 4096,
    base: int = 10000,
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: Module

Sinusoidal Positional Encoding.

This module adds fixed sinusoidal positional embeddings (Vaswani et al., 2017; Sec. 3.5) to input embeddings to provide the model with information about the relative or absolute position of tokens in a sequence.

The sinusoidal positional encoding uses sine and cosine functions of different frequencies:

\[ \begin{align*} \operatorname{PE}(\text{pos}, 2i) &= \sin(\text{base}^{-\frac{2i}{\text{embed\_dim}}} \cdot \text{pos}) \\ \operatorname{PE}(\text{pos}, 2i+1) &= \cos(\text{base}^{-\frac{2i}{\text{embed\_dim}}} \cdot \text{pos}) \end{align*} \]

where \(\text{pos}\) is the position index, \(0 \leq i < \frac{\text{embed\_dim}}{2}\) is the embedding dimension index and \(\text{embed\_dim}\) is the embedding dimensionality.

The encoding is designed so that each dimension of the positional encoding corresponds to a sinusoid with wavelengths forming a geometric progression from 2π to base·2π. This allows the model to easily learn to attend by relative positions.
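A minimal NumPy sketch of the encoding table defined by the equations above. The helper name `sinusoidal_encoding` is hypothetical, and the module's own buffer layout and dtype handling may differ.

```python
import numpy as np


def sinusoidal_encoding(max_seq_len: int, embed_dim: int, base: float = 10000.0) -> np.ndarray:
    """PE[pos, 2i] = sin(base**(-2i/embed_dim) * pos), PE[pos, 2i+1] = cos(same)."""
    if embed_dim % 2 != 0:
        raise ValueError("embed_dim should be even")
    pos = np.arange(max_seq_len)[:, None]        # (max_seq_len, 1)
    two_i = np.arange(0, embed_dim, 2)[None, :]  # (1, embed_dim // 2), values 2i
    freq = base ** (-two_i / embed_dim)          # angular frequency per pair
    pe = np.empty((max_seq_len, embed_dim))
    pe[:, 0::2] = np.sin(pos * freq)             # even dimensions
    pe[:, 1::2] = np.cos(pos * freq)             # odd dimensions
    return pe
```

Adding `pe[:seq_len]` to a `(batch, seq_len, embed_dim)` array broadcasts the same encoding over the batch. The wavelengths \(2\pi / \text{freq}\) start at \(2\pi\) and grow by the constant factor \(\text{base}^{2/\text{embed\_dim}}\), i.e. geometrically.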

Notes:

The positional encodings are added to the input embeddings, so both must have the same embedding dimension (embed_dim).
The precomputed encodings are truncated to match the input sequence length, so inputs longer than max_seq_len cannot be fully encoded and may cause errors. Ensure that max_seq_len is at least the longest expected sequence length.
The encoding is deterministic and does not require training.

Example:

```python
import torch
from inferno.bnn.modules import SinusoidalPositionalEncoding

# Create positional encoding for 512-dim embeddings
pos_enc = SinusoidalPositionalEncoding(embed_dim=512)

# Apply to input embeddings
batch_size, seq_len, embed_dim = 32, 100, 512
input_embeddings = torch.randn(batch_size, seq_len, embed_dim)
output = pos_enc(input_embeddings)
print(output.shape)  # torch.Size([32, 100, 512])
```

Parameters:

Name	Type	Description	Default
`embed_dim`	`int`	The embedding dimension. Should be even for proper sine/cosine pairing.	required
`max_seq_len`	`int`	Maximum sequence length to precompute encodings for.	`4096`
`base`	`int`	Base of the angular frequency.	`10000`
`dtype`	`dtype \| None`	Data type for the positional encodings.	`None`
`device`	`device \| None`	Device to place the positional encodings on.	`None`

Methods:

Name	Description
`forward`	Add positional encoding to input embeddings.

Attributes:

Name	Type	Description
`base`
`max_seq_len`

base #

base = base

max_seq_len #

max_seq_len = max_seq_len

forward #

forward(
    x: Float[Tensor, "batch token embed_dim"],
) -> Float[Tensor, "batch token embed_dim"]

Add positional encoding to input embeddings.

Parameters:

Name	Type	Description	Default
`x`	`Float[Tensor, 'batch token embed_dim']`	Input embeddings.	required

Returns:

Type	Description
`Float[Tensor, 'batch token embed_dim']`	Input embeddings with added positional encoding of the same shape.

batched_forward #

batched_forward(
    obj: Module, num_batch_dims: int
) -> Callable[
    [Float[Tensor, "*sample batch *in_feature"]],
    Float[Tensor, "*sample batch *out_feature"],
]

Call a torch.nn.Module on inputs with arbitrarily many batch dimensions rather than just a single one.

This is useful to extend the functionality of a torch.nn.Module to work with arbitrarily many batch dimensions, for example arbitrarily many sampling dimensions.

Parameters:

Name	Type	Description	Default
`obj`	`Module`	The torch.nn.Module to call.	required
`num_batch_dims`	`int`	The number of batch dimensions.	required
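One common way to achieve this, assumed here rather than taken from the library, is to flatten the leading batch dimensions into one, call the wrapped function once, and restore the batch shape afterwards. `batched_forward_sketch` is a hypothetical NumPy analogue, not the library's function:

```python
from typing import Callable

import numpy as np


def batched_forward_sketch(
    fn: Callable[[np.ndarray], np.ndarray], num_batch_dims: int
) -> Callable[[np.ndarray], np.ndarray]:
    """Wrap fn, which accepts exactly one batch dimension, to accept num_batch_dims."""

    def wrapped(x: np.ndarray) -> np.ndarray:
        batch_shape = x.shape[:num_batch_dims]
        # Flatten all leading batch dimensions into a single one.
        flat = x.reshape(-1, *x.shape[num_batch_dims:])
        out = fn(flat)
        # Restore the original batch dimensions in front of the output features.
        return out.reshape(*batch_shape, *out.shape[1:])

    return wrapped
```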