modules #

Classes:

Name Description
BNNMixin

Abstract mixin class turning a torch module into a Bayesian neural network module.

Conv1d

Applies a 1D convolution over an input signal composed of several input planes.

Conv2d

Applies a 2D convolution over an input signal composed of several input planes.

Conv3d

Applies a 3D convolution over an input signal composed of several input planes.

Linear

Applies an affine transformation to the input.

MultiheadAttention

Attention layer (with multiple heads).

Sequential

A sequential container for modules.

SinusoidalPositionalEncoding

Sinusoidal Positional Encoding.

Functions:

Name Description
batched_forward

Call a torch.nn.Module on inputs with arbitrarily many batch dimensions rather than just a single one.

BNNMixin #

BNNMixin(
    parametrization: (
        Parametrization | None
    ) = MaximalUpdate(),
    *args,
    **kwargs
)

Bases: ABC

Abstract mixin class turning a torch module into a Bayesian neural network module.

Parameters:

Name Type Description Default
parametrization Parametrization | None

The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.

MaximalUpdate()

Methods:

Name Description
forward

Forward pass of the module.

parameters_and_lrs

Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.

reset_parameters

Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

Attributes:

Name Type Description
parametrization Parametrization

Parametrization of the module.

parametrization #

parametrization: Parametrization

Parametrization of the module.

forward #

forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch *out_feature"]

Forward pass of the module.

Parameters:

Name Type Description Default
input Float[Tensor, '*sample batch *in_feature']

Input tensor.

required
sample_shape Size

Shape of samples.

Size([])
generator Generator | None

Random number generator.

None
input_contains_samples bool

Whether the input already contains samples. If True, the input is assumed to have len(sample_shape) many leading dimensions containing input samples (typically outputs from previous layers).

False
parameter_samples dict[str, Float[Tensor, '*sample parameter']] | None

Dictionary of parameter samples. Used to pass sampled parameters to the module. Useful to jointly sample parameters of multiple layers.

None
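
The shape annotations and the `input_contains_samples` flag above suggest a simple shape contract. The sketch below spells it out with plain tuples for a layer that maps a single trailing `in_feature` axis to `out_feature` (as `Linear` does). `expected_output_shape` is a hypothetical helper, not part of the library, and the rule that sample dimensions are prepended when `input_contains_samples=False` is inferred from the annotations rather than confirmed.

```python
def expected_output_shape(
    input_shape, out_features, sample_shape=(), input_contains_samples=False
):
    """Return the output shape implied by the forward() annotations.

    Hypothetical helper: assumes one trailing feature axis.
    """
    input_shape = tuple(input_shape)
    sample_shape = tuple(sample_shape)
    if input_contains_samples:
        # The input already carries len(sample_shape) leading sample dims.
        n = len(sample_shape)
        if input_shape[:n] != sample_shape:
            raise ValueError("leading dims of input must equal sample_shape")
        leading = input_shape[:-1]
    else:
        # Assumed: sample dims are prepended to the batch dims.
        leading = sample_shape + input_shape[:-1]
    return leading + (out_features,)


# First layer: deterministic input, 10 parameter samples.
first = expected_output_shape((32, 8), 16, sample_shape=(10,))
# Later layer: input is the sampled output of the previous layer.
second = expected_output_shape(
    first, 4, sample_shape=(10,), input_contains_samples=True
)
```

Under these assumptions, `first` is `(10, 32, 16)` and `second` is `(10, 32, 4)`, so the sample dimension is introduced once and then carried through.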

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"]
) -> list[dict[str, Tensor | float]]

Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.

Parameters:

Name Type Description Default
lr float

The global learning rate.

required
optimizer Literal['SGD', 'Adam']

The optimizer being used.

required
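
The return type, a list of dictionaries, has the same form as the per-parameter option groups accepted by torch optimizers, where each group holds a `"params"` entry plus overrides such as `"lr"`. The sketch below shows how an optimizer consumes such groups, using plain Python lists as stand-ins for tensors. The key names `"params"` and `"lr"` and the concrete learning rates are assumptions for illustration, not values produced by the library.

```python
# Stand-in for the list returned by parameters_and_lrs(lr=0.1, optimizer="SGD").
# Keys and per-group rates are assumed, not taken from the library.
param_groups = [
    {"params": [[1.0, 2.0]], "lr": 0.1},  # e.g. a layer using the global rate
    {"params": [[3.0]], "lr": 0.01},      # e.g. a layer with a scaled-down rate
]

# One gradient per parameter, in the same nesting as param_groups.
grads = [[[0.5, -0.5]], [[1.0]]]


def sgd_step(param_groups, grads):
    """Apply one plain SGD update, using each group's own learning rate."""
    for group, group_grads in zip(param_groups, grads):
        for param, grad in zip(group["params"], group_grads):
            for i, g in enumerate(grad):
                param[i] -= group["lr"] * g


sgd_step(param_groups, grads)
```

With torch, a list of this form can be passed as the first argument to `torch.optim.SGD` or `torch.optim.Adam`, which is presumably the intended use of the return value.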

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

This method should be implemented by subclasses to reset the parameters of the module.

Conv1d #

Conv1d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_1_t,
    stride: _size_1_t = 1,
    padding: str | _size_1_t = 0,
    dilation: _size_1_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "zeros",
    layer_type: Literal[
        "input", "hidden", "output"
    ] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: _ConvNd

Applies a 1D convolution over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, L)\) and output \((N, C_{\text{out}}, L_{\text{out}})\) can be precisely described as:

\[ \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k) \]

where \(\star\) is the valid cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, and \(L\) is the length of the signal sequence.
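
To make the formula concrete, here is a dependency-free sketch of the valid cross-correlation for a single batch element, with stride 1, no padding, no dilation and a single group. `conv1d_valid` is a hypothetical name, and nested lists stand in for tensors; it illustrates the mean computation only, not the library's implementation.

```python
def conv1d_valid(inputs, weight, bias):
    """Valid 1D cross-correlation for one batch element.

    inputs: [C_in][L], weight: [C_out][C_in][K], bias: [C_out]
    returns: [C_out][L - K + 1]
    """
    c_in = len(inputs)
    length = len(inputs[0])
    kernel = len(weight[0][0])
    out = []
    for j, w_j in enumerate(weight):
        row = []
        for t in range(length - kernel + 1):
            acc = bias[j]
            # Sum over input channels k, then over kernel offsets m.
            for k in range(c_in):
                for m in range(kernel):
                    acc += w_j[k][m] * inputs[k][t + m]
            row.append(acc)
        out.append(row)
    return out


signal = [[1, 2, 3, 4], [0, 1, 0, 1]]  # C_in = 2, L = 4
weight = [[[1, 0], [0, 1]]]            # C_out = 1, C_in = 2, K = 2
result = conv1d_valid(signal, weight, bias=[0.5])
print(result)  # [[2.5, 2.5, 4.5]]
```

Note that the kernel is not flipped, which is what distinguishes cross-correlation from convolution in the strict mathematical sense.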

Parameters:

Name Type Description Default
in_channels int

Number of channels in the input image.

required
out_channels int

Number of channels produced by the convolution.

required
kernel_size _size_1_t

Size of the convolving kernel.

required
stride _size_1_t

Stride of the convolution.

1
padding str | _size_1_t

Padding added to both sides of the input.

0
dilation _size_1_t

Spacing between kernel elements.

1
groups int

Number of blocked connections from input channels to output channels.

1
bias bool

If True, adds a learnable bias to the output.

True
padding_mode str

'zeros', 'reflect', 'replicate' or 'circular'. Default is 'zeros'.

'zeros'
layer_type Literal['input', 'hidden', 'output']

Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters.

'hidden'
cov FactorizedCovariance | None

The covariance of the parameters.

None
parametrization Parametrization

The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.

MaximalUpdate()
device device | None

The device on which to place the tensor.

None
dtype dtype | None

The desired data type of the returned tensor.

None

Methods:

Name Description
extra_repr
forward
parameters_and_lrs
reset_parameters

Reset the parameters of the module.

Attributes:

Name Type Description
bias Parameter
dilation
groups
in_channels
kernel_size
layer_type
out_channels
output_padding
padding
padding_mode
parametrization Parametrization

Parametrization of the module.

params
stride
transposed
weight Parameter

bias #

bias: Parameter

dilation #

dilation = dilation

groups #

groups = groups

in_channels #

in_channels = in_channels

kernel_size #

kernel_size = kernel_size

layer_type #

layer_type = layer_type

out_channels #

out_channels = out_channels

output_padding #

output_padding = output_padding

padding #

padding = padding

padding_mode #

padding_mode = padding_mode

parametrization #

parametrization: Parametrization

Parametrization of the module.

params #

params = ParameterDict(mean_param_dict)

stride #

stride = stride

transposed #

transposed = transposed

weight #

weight: Parameter

extra_repr #

extra_repr()

forward #

forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[
    Tensor, "*sample *batch out_channel *out_feature"
]

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"] = "SGD"
) -> list[dict[str, Tensor | float]]

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module.

Conv2d #

Conv2d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_2_t,
    stride: _size_2_t = 1,
    padding: str | _size_2_t = 0,
    dilation: _size_2_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "zeros",
    layer_type: Literal[
        "input", "hidden", "output"
    ] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: _ConvNd

Applies a 2D convolution over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, H, W)\) and output \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) can be precisely described as:

\[ \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k) \]

where \(\star\) is the valid 2D cross-correlation operator, \(N\) is the batch size, \(C\) denotes the number of channels, \(H\) is the height of the input planes in pixels, and \(W\) is the width in pixels.

Parameters:

Name Type Description Default
in_channels int

Number of channels in the input image.

required
out_channels int

Number of channels produced by the convolution.

required
kernel_size _size_2_t

Size of the convolving kernel.

required
stride _size_2_t

Stride of the convolution.

1
padding str | _size_2_t

Padding added to all four sides of the input.

0
dilation _size_2_t

Spacing between kernel elements.

1
groups int

Number of blocked connections from input channels to output channels.

1
bias bool

If True, adds a learnable bias to the output.

True
padding_mode str

'zeros', 'reflect', 'replicate' or 'circular'. Default is 'zeros'.

'zeros'
layer_type Literal['input', 'hidden', 'output']

Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters.

'hidden'
cov FactorizedCovariance | None

The covariance of the parameters.

None
parametrization Parametrization

The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.

MaximalUpdate()
device device | None

The device on which to place the tensor.

None
dtype dtype | None

The desired data type of the returned tensor.

None
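
The `stride`, `padding` and `dilation` parameters jointly determine the spatial size of the output. The sketch below uses the output-size formula documented for `torch.nn.Conv2d`, assuming this class follows the same rule since it mirrors that signature. `conv_output_size` is a hypothetical helper, and it covers integer padding only, not the string modes.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis (integer padding only)."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


# Apply per axis for a 2D input of height 32 and width 48.
h_out = conv_output_size(32, kernel_size=3, stride=2, padding=1)
w_out = conv_output_size(48, kernel_size=3, stride=2, padding=1)
```

For a 3x3 kernel, `padding=1` with `stride=1` preserves the spatial size, while `stride=2` roughly halves it.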

Methods:

Name Description
extra_repr
forward
parameters_and_lrs
reset_parameters

Reset the parameters of the module.

Attributes:

Name Type Description
bias Parameter
dilation
groups
in_channels
kernel_size
layer_type
out_channels
output_padding
padding
padding_mode
parametrization Parametrization

Parametrization of the module.

params
stride
transposed
weight Parameter

bias #

bias: Parameter

dilation #

dilation = dilation

groups #

groups = groups

in_channels #

in_channels = in_channels

kernel_size #

kernel_size = kernel_size

layer_type #

layer_type = layer_type

out_channels #

out_channels = out_channels

output_padding #

output_padding = output_padding

padding #

padding = padding

padding_mode #

padding_mode = padding_mode

parametrization #

parametrization: Parametrization

Parametrization of the module.

params #

params = ParameterDict(mean_param_dict)

stride #

stride = stride

transposed #

transposed = transposed

weight #

weight: Parameter

extra_repr #

extra_repr()

forward #

forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[
    Tensor, "*sample *batch out_channel *out_feature"
]

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"] = "SGD"
) -> list[dict[str, Tensor | float]]

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module.

Conv3d #

Conv3d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_3_t,
    stride: _size_3_t = 1,
    padding: str | _size_3_t = 0,
    dilation: _size_3_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "zeros",
    layer_type: Literal[
        "input", "hidden", "output"
    ] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: _ConvNd

Applies a 3D convolution over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size \((N, C_{in}, D, H, W)\) and output \((N, C_{out}, D_{out}, H_{out}, W_{out})\) can be precisely described as:

\[ out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k = 0}^{C_{in} - 1} weight(C_{out_j}, k) \star input(N_i, k) \]

where \(\star\) is the valid 3D cross-correlation operator.

Parameters:

Name Type Description Default
in_channels int

Number of channels in the input image.

required
out_channels int

Number of channels produced by the convolution.

required
kernel_size _size_3_t

Size of the convolving kernel.

required
stride _size_3_t

Stride of the convolution.

1
padding str | _size_3_t

Padding added to all six sides of the input.

0
dilation _size_3_t

Spacing between kernel elements.

1
groups int

Number of blocked connections from input channels to output channels.

1
bias bool

If True, adds a learnable bias to the output.

True
padding_mode str

'zeros', 'reflect', 'replicate' or 'circular'. Default is 'zeros'.

'zeros'
layer_type Literal['input', 'hidden', 'output']

Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters.

'hidden'
cov FactorizedCovariance | None

The covariance of the parameters.

None
parametrization Parametrization

The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.

MaximalUpdate()
device device | None

The device on which to place the tensor.

None
dtype dtype | None

The desired data type of the returned tensor.

None

Methods:

Name Description
extra_repr
forward
parameters_and_lrs
reset_parameters

Reset the parameters of the module.

Attributes:

Name Type Description
bias Parameter
dilation
groups
in_channels
kernel_size
layer_type
out_channels
output_padding
padding
padding_mode
parametrization Parametrization

Parametrization of the module.

params
stride
transposed
weight Parameter

bias #

bias: Parameter

dilation #

dilation = dilation

groups #

groups = groups

in_channels #

in_channels = in_channels

kernel_size #

kernel_size = kernel_size

layer_type #

layer_type = layer_type

out_channels #

out_channels = out_channels

output_padding #

output_padding = output_padding

padding #

padding = padding

padding_mode #

padding_mode = padding_mode

parametrization #

parametrization: Parametrization

Parametrization of the module.

params #

params = ParameterDict(mean_param_dict)

stride #

stride = stride

transposed #

transposed = transposed

weight #

weight: Parameter

extra_repr #

extra_repr()

forward #

forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[
    Tensor, "*sample *batch out_channel *out_feature"
]

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"] = "SGD"
) -> list[dict[str, Tensor | float]]

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module.

Linear #

Linear(
    in_features: int,
    out_features: int,
    bias: bool = True,
    layer_type: Literal[
        "input", "hidden", "output"
    ] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: BNNMixin, Module

Applies an affine transformation to the input.

Parameters:

Name Type Description Default
in_features int

Size of each input sample.

required
out_features int

Size of each output sample.

required
bias bool

If set to False, the layer will not learn an additive bias.

True
layer_type Literal['input', 'hidden', 'output']

Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters.

'hidden'
cov FactorizedCovariance | None

Covariance object for the parameters.

None
parametrization Parametrization

The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.

MaximalUpdate()
device device | None

Device on which to instantiate the parameters.

None
dtype dtype | None

Data type of the parameters.

None

Methods:

Name Description
extra_repr
forward
parameters_and_lrs
reset_parameters

Reset the parameters of the module.

Attributes:

Name Type Description
bias Parameter
in_features
layer_type
out_features
parametrization Parametrization

Parametrization of the module.

params
weight Parameter

bias #

bias: Parameter

in_features #

in_features = in_features

layer_type #

layer_type = layer_type

out_features #

out_features = out_features

parametrization #

parametrization: Parametrization

Parametrization of the module.

params #

params = ParameterDict(mean_param_dict)

weight #

weight: Parameter

extra_repr #

extra_repr() -> str

forward #

forward(
    input: Float[Tensor, "*sample *batch in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch out_feature"]

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"] = "SGD"
) -> list[dict[str, Tensor | float]]

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module.

MultiheadAttention #

MultiheadAttention(
    embed_dim: int,
    num_heads: int,
    dropout: float = 0.0,
    bias: bool = True,
    kdim: int | None = None,
    vdim: int | None = None,
    embed_dim_out: int | None = None,
    out_proj: bool = True,
    cov: (
        FactorizedCovariance
        | dict[FactorizedCovariance]
        | None
    ) = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: BNNMixin, Module

Attention layer (with multiple heads).

Multi-head (self-)attention layer with an optional attention mask, allowing a model to jointly attend to information from different representation subspaces. Consists of num_heads scaled dot-product attention modules, whose outputs are concatenated and then combined into an output sequence via a linear layer.

The module supports nested or padded tensors and is inspired by the following implementation.

Parameters:

Name Type Description Default
embed_dim int

Dimensionality of the inputs and outputs to the layer (i.e. dimensionality of the query embeddings).

required
num_heads int

Number of attention heads.

required
dropout float

Dropout probability; if greater than 0.0, dropout is applied.

0.0
bias bool

Whether to add bias to query, key, value and output projections.

True
kdim int | None

Dimensionality of the key embeddings.

None
vdim int | None

Dimensionality of the value embeddings.

None
embed_dim_out int | None

Dimensionality of the output embeddings. If None, set to embed_dim.

None
out_proj bool

Whether to include the output projection layer.

True
cov FactorizedCovariance | dict[FactorizedCovariance] | None

Covariance structure of the weights. Either a single covariance structure used in all linear projections, or a dictionary with keys k, q, v and out and values containing either covariance structures or None.

None
parametrization Parametrization

The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module.

MaximalUpdate()
device device | None

Device on which to instantiate the parameters.

None
dtype dtype | None

Data type of the parameters.

None

Methods:

Name Description
forward

Computes scaled dot product attention on query, key and value tensors, using an optional attention mask.

parameters_and_lrs

Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.

reset_parameters

Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

Attributes:

Name Type Description
bias
dropout
embed_dim
embed_dim_out
head_dim
k_proj
kdim
num_heads
out_proj
parametrization Parametrization

Parametrization of the module.

q_proj
v_proj
vdim

bias #

bias = bias

dropout #

dropout = dropout

embed_dim #

embed_dim = embed_dim

embed_dim_out #

embed_dim_out = (
    embed_dim_out
    if embed_dim_out is not None
    else embed_dim
)

head_dim #

head_dim = embed_dim // num_heads

k_proj #

k_proj = Linear(
    kdim,
    embed_dim,
    bias=bias,
    cov=cov["k"],
    parametrization=parametrization,
    **factory_kwargs
)

kdim #

kdim = kdim if kdim is not None else embed_dim

num_heads #

num_heads = num_heads

out_proj #

out_proj = (
    Linear(
        embed_dim,
        embed_dim_out,
        bias=bias,
        cov=cov["out"],
        parametrization=parametrization,
        **factory_kwargs
    )
    if out_proj
    else None
)

parametrization #

parametrization: Parametrization

Parametrization of the module.

q_proj #

q_proj = Linear(
    embed_dim,
    embed_dim,
    bias=bias,
    cov=cov["q"],
    parametrization=parametrization,
    **factory_kwargs
)

v_proj #

v_proj = Linear(
    vdim,
    embed_dim,
    bias=bias,
    cov=cov["v"],
    parametrization=parametrization,
    **factory_kwargs
)

vdim #

vdim = vdim if vdim is not None else embed_dim

forward #

forward(
    query: Float[
        Tensor, "*sample batch query_token embed_dim"
    ],
    key: (
        Float[
            Tensor, "*sample batch keyval_token embed_dim_k"
        ]
        | None
    ),
    value: (
        Float[Tensor, "*sample batch token embed_dim_v"]
        | None
    ),
    attn_mask: (
        Float[Tensor, "batch query_token keyval_token"]
        | None
    ) = None,
    is_causal: bool = False,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample batch query_token embed_dim"]

Computes scaled dot product attention on query, key and value tensors, using an optional attention mask.

Parameters:

Name Type Description Default
query Float[Tensor, '*sample batch query_token embed_dim']

Query tensor / embeddings.

required
key Float[Tensor, '*sample batch keyval_token embed_dim_k'] | None

Key tensor / embeddings.

required
value Float[Tensor, '*sample batch token embed_dim_v'] | None

Value tensor / embeddings.

required
attn_mask Float[Tensor, 'batch query_token keyval_token'] | None

Attention mask; shape must be broadcastable to the shape of attention weights. Two types of masks are supported. A boolean mask where a value of True indicates that the element should take part in attention. A float mask of the same type as query, key, value that is added to the attention score.

None
is_causal bool

If set to True, the attention masking is a lower triangular matrix when the mask is a square matrix. When the mask is a non-square matrix, the attention masking has the form of the upper left causal bias due to the alignment (see torch.nn.attention.bias.CausalBias). An error is thrown if both attn_mask and is_causal are set.

False
sample_shape Size

Shape of samples.

Size([])
generator Generator | None

Random number generator.

None
input_contains_samples bool

Whether the input already contains samples. If True, the input is assumed to have len(sample_shape) many leading dimensions containing input samples (typically outputs from previous layers).

False
parameter_samples dict[str, Float[Tensor, '*sample parameter']] | None

Dictionary of parameter samples. Used to pass sampled parameters to the module. Useful to jointly sample parameters of multiple layers.

None
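
The two mask types described for `attn_mask` can be illustrated with a dependency-free sketch of scaled dot-product attention for a single head and a single batch element. This is not the library's implementation: nested lists stand in for tensors, the projections and dropout are omitted, and the function assumes every query has at least one unmasked key.

```python
import math


def scaled_dot_product_attention(query, key, value, attn_mask=None):
    """query: [Q][D], key: [K][D], value: [K][Dv], attn_mask: [Q][K] or None."""
    scale = 1.0 / math.sqrt(len(query[0]))
    output = []
    for qi, q in enumerate(query):
        scores = []
        for ki, k in enumerate(key):
            score = scale * sum(a * b for a, b in zip(q, k))
            if attn_mask is not None:
                m = attn_mask[qi][ki]
                if isinstance(m, bool):
                    # Boolean mask: True means the key takes part in attention.
                    if not m:
                        score = -math.inf
                else:
                    # Float mask: added to the attention score.
                    score += m
            scores.append(score)
        # Numerically stable softmax over the keys.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        output.append(
            [
                sum(w / total * v[d] for w, v in zip(weights, value))
                for d in range(len(value[0]))
            ]
        )
    return output


tokens = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0], [2.0]]
# Lower-triangular boolean mask: query i may attend to keys 0..i.
causal = [[k <= q for k in range(2)] for q in range(2)]
masked = scaled_dot_product_attention(tokens, tokens, values, attn_mask=causal)
```

Here the first query can only see the first key, so `masked[0]` equals the first value row, while the second query mixes both values.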

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"]
) -> list[dict[str, Tensor | float]]

Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.

Parameters:

Name Type Description Default
lr float

The global learning rate.

required
optimizer Literal['SGD', 'Adam']

The optimizer being used.

required

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

This method should be implemented by subclasses to reset the parameters of the module.

Sequential #

Sequential(
    *args: BNNMixin | Module,
    parametrization: Parametrization | None = None
)
Sequential(
    arg: OrderedDict[str, BNNMixin | Module],
    parametrization: Parametrization | None = None,
)
Sequential(
    *args, parametrization: Parametrization | None = None
)

Bases: BNNMixin, Sequential

A sequential container for modules.

Modules will be added to it in the order they are passed in the constructor. Alternatively, an OrderedDict of modules can be passed in. The forward() method of Sequential accepts any input and forwards it to the first module it contains. It then "chains" outputs to inputs sequentially for each subsequent module, finally returning the output of the last module.

The value a Sequential provides over manually calling a sequence of modules is that it allows treating the whole container as a single module, such that performing a transformation on the Sequential applies to each of the modules it stores (each of which is a registered submodule of the Sequential).

Parameters:

Name Type Description Default
*args

Any number of modules to add to the container.

()
parametrization Parametrization | None

The parametrization to use. If None, the parametrization of the modules in the container will be used. If an inferno.bnn.params.Parametrization object is passed, it will be used for all modules in the container.

None
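
The chaining behavior and the effect of the `parametrization` argument can be sketched with a toy container. `TinyLayer` and `TinySequential` are hypothetical stand-ins written for this illustration, and the strings `"SP"` and `"MUP"` are placeholders for parametrization objects; none of this is the library's code.

```python
from collections import OrderedDict


class TinyLayer:
    """Toy layer: wraps a function and carries a parametrization tag."""

    def __init__(self, fn, parametrization="SP"):
        self.fn = fn
        self.parametrization = parametrization

    def __call__(self, x):
        return self.fn(x)


class TinySequential:
    """Toy container mirroring the documented constructor behavior."""

    def __init__(self, *layers, parametrization=None):
        # Accept either positional layers or a single OrderedDict of layers.
        if len(layers) == 1 and isinstance(layers[0], OrderedDict):
            layers = tuple(layers[0].values())
        self.layers = list(layers)
        self.parametrization = parametrization
        if parametrization is not None:
            # A given parametrization overrides that of every child.
            for layer in self.layers:
                layer.parametrization = parametrization

    def __call__(self, x):
        # Chain outputs to inputs in insertion order.
        for layer in self.layers:
            x = layer(x)
        return x


kept = TinySequential(TinyLayer(lambda x: 2 * x), TinyLayer(lambda x: x + 1))
overridden = TinySequential(
    OrderedDict(scale=TinyLayer(lambda x: 2 * x), shift=TinyLayer(lambda x: x + 1)),
    parametrization="MUP",
)
```

Both containers compute `2 * x + 1`. Only `overridden` changes the tags of its children.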

Methods:

Name Description
forward
parameters_and_lrs

Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.

reset_parameters

Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

Attributes:

Name Type Description
parametrization

Parametrization of the module.

parametrization #

parametrization = parametrization

Parametrization of the module.

forward #

forward(
    input: Float[Tensor, "*batch in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch out_feature"]

parameters_and_lrs #

parameters_and_lrs(
    lr: float, optimizer: Literal["SGD", "Adam"]
) -> list[dict[str, Tensor | float]]

Get the parameters of the module and their learning rates for the chosen optimizer and the parametrization of the module.

Parameters:

Name Type Description Default
lr float

The global learning rate.

required
optimizer Literal['SGD', 'Adam']

The optimizer being used.

required

reset_parameters #

reset_parameters() -> None

Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.

This method should be implemented by subclasses to reset the parameters of the module.

SinusoidalPositionalEncoding #

SinusoidalPositionalEncoding(
    embed_dim: int,
    max_seq_len: int = 4096,
    base: int = 10000,
    device: device | None = None,
    dtype: dtype | None = None,
)

Bases: Module

Sinusoidal Positional Encoding.

This module adds fixed sinusoidal positional embeddings (Vaswani et al., 2017; Sec. 3.5) to input embeddings to provide the model with information about the relative or absolute position of tokens in a sequence.

The sinusoidal positional encoding uses sine and cosine functions of different frequencies:

\[ \begin{align*} \operatorname{PE}(\text{pos}, 2i) &= \sin(\text{base}^{-\frac{2i}{\text{embed\_dim}}} \cdot \text{pos}) \\ \operatorname{PE}(\text{pos}, 2i+1) &= \cos(\text{base}^{-\frac{2i}{\text{embed\_dim}}} \cdot \text{pos}) \end{align*} \]

where \(\text{pos}\) is the position index, \(0 \leq i < \frac{\text{embed\_dim}}{2}\) is the embedding dimension index and \(\text{embed\_dim}\) is the embedding dimensionality.

The encoding is designed so that each dimension of the positional encoding corresponds to a sinusoid with wavelengths forming a geometric progression from 2π to base·2π. This allows the model to easily learn to attend by relative positions.
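
The two formulas above can be evaluated directly. The following dependency-free sketch builds the encoding table with the standard library only; `sinusoidal_table` is a hypothetical name, and the module itself presumably precomputes an equivalent tensor up to `max_seq_len`.

```python
import math


def sinusoidal_table(seq_len, embed_dim, base=10000):
    """Return a [seq_len][embed_dim] table of positional encodings."""
    if embed_dim % 2 != 0:
        raise ValueError("embed_dim must be even for sine/cosine pairing")
    table = []
    for pos in range(seq_len):
        row = [0.0] * embed_dim
        for i in range(embed_dim // 2):
            # Angular frequency base^(-2i / embed_dim) times the position.
            angle = pos * base ** (-2 * i / embed_dim)
            row[2 * i] = math.sin(angle)      # even index: sine
            row[2 * i + 1] = math.cos(angle)  # odd index: cosine
        table.append(row)
    return table


pe = sinusoidal_table(seq_len=3, embed_dim=4)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0]
```

At position 0 every sine entry is 0 and every cosine entry is 1, and higher dimension pairs oscillate more slowly as the position grows.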

Notes:

  • The positional encodings are added to the input embeddings, so both must have the same embedding dimension (embed_dim).
  • If the input sequence length exceeds max_seq_len, the encoding will be truncated to match the input length, which may cause issues. Ensure max_seq_len ≥ expected sequence lengths.
  • The encoding is deterministic and does not require training.

Example:

import torch
from inferno.bnn.modules import SinusoidalPositionalEncoding

# Create positional encoding for 512-dim embeddings
pos_enc = SinusoidalPositionalEncoding(embed_dim=512)

# Apply to input embeddings
batch_size, seq_len, embed_dim = 32, 100, 512
input_embeddings = torch.randn(batch_size, seq_len, embed_dim)
output = pos_enc(input_embeddings)
print(output.shape)  # torch.Size([32, 100, 512])

Parameters:

Name Type Description Default
embed_dim int

The embedding dimension. Should be even for proper sine/cosine pairing.

required
max_seq_len int

Maximum sequence length to precompute encodings for.

4096
base int

Base of the angular frequency.

10000
dtype dtype | None

Data type for the positional encodings.

None
device device | None

Device to place the positional encodings on.

None

Methods:

Name Description
forward

Add positional encoding to input embeddings.

Attributes:

Name Type Description
base
max_seq_len

base #

base = base

max_seq_len #

max_seq_len = max_seq_len

forward #

forward(
    x: Float[Tensor, "batch token embed_dim"],
) -> Float[Tensor, "batch token embed_dim"]

Add positional encoding to input embeddings.

Parameters:

Name Type Description Default
x Float[Tensor, 'batch token embed_dim']

Input embeddings.

required

Returns:

Type Description
Float[Tensor, 'batch token embed_dim']

Input embeddings with added positional encoding of the same shape.

batched_forward #

batched_forward(
    obj: Module, num_batch_dims: int
) -> Callable[
    [Float[Tensor, "*sample batch *in_feature"]],
    Float[Tensor, "*sample batch *out_feature"],
]

Call a torch.nn.Module on inputs with arbitrarily many batch dimensions rather than just a single one.

This is useful to extend a torch.nn.Module to work with arbitrarily many batch dimensions, for example arbitrarily many sampling dimensions.

Parameters:

Name Type Description Default
obj Module

The torch.nn.Module to call.

required
num_batch_dims int

The number of batch dimensions.

required
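
One common way to obtain this behavior is to flatten the leading batch dimensions into one, call the module, and restore the original dimensions on the output. The sketch below shows that idea with nested lists in place of tensors. `batched_forward_sketch` is a hypothetical name, and whether the library uses this exact strategy is an assumption; the sketch also assumes all batch dimensions are non-empty.

```python
def batched_forward_sketch(fn, num_batch_dims):
    """Wrap fn, which expects one batch dim, to accept num_batch_dims of them."""

    def wrapped(x):
        # Record the sizes of the leading batch dimensions.
        sizes = []
        level = x
        for _ in range(num_batch_dims):
            sizes.append(len(level))
            level = level[0]
        # Flatten all batch dimensions into a single one.
        flat = x
        for _ in range(num_batch_dims - 1):
            flat = [item for group in flat for item in group]
        out = fn(flat)
        # Restore the batch dimensions, innermost first.
        for size in reversed(sizes[1:]):
            out = [out[i : i + size] for i in range(0, len(out), size)]
        return out

    return wrapped


def row_sums(rows):
    """Toy 'module': maps a batch of feature vectors to their sums."""
    return [sum(row) for row in rows]


x = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]  # shape (2, 3, 2)
y = batched_forward_sketch(row_sums, num_batch_dims=2)(x)
print(y)  # [[3, 7, 11], [15, 19, 23]]
```

With tensors, the same pattern amounts to a reshape before the call and a reshape after it, which keeps the wrapped module unaware of the extra sampling dimensions.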