# modules
Classes:

| Name | Description |
|---|---|
| `BNNMixin` | Abstract mixin class turning a torch module into a Bayesian neural network module. |
| `Conv1d` | Applies a 1D convolution over an input signal composed of several input planes. |
| `Conv2d` | Applies a 2D convolution over an input signal composed of several input planes. |
| `Conv3d` | Applies a 3D convolution over an input signal composed of several input planes. |
| `Linear` | Applies an affine transformation to the input. |
| `MultiheadAttention` | Attention layer (with multiple heads). |
| `Sequential` | A sequential container for modules. |
| `SinusoidalPositionalEncoding` | Sinusoidal Positional Encoding. |
Functions:

| Name | Description |
|---|---|
| `batched_forward` | Call a `torch.nn.Module` on inputs with arbitrarily many batch dimensions rather than just a single one. |
## BNNMixin

```python
BNNMixin(
    parametrization: (
        Parametrization | None
    ) = MaximalUpdate(),
    *args,
    **kwargs
)
```
Bases: ABC
Abstract mixin class turning a torch module into a Bayesian neural network module.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `parametrization` | `Parametrization \| None` | The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module. | `MaximalUpdate()` |
Methods:

| Name | Description |
|---|---|
| `forward` | Forward pass of the module. |
| `parameters_and_lrs` | Get the parameters of the module and their learning rates for the chosen optimizer. |
| `reset_parameters` | Reset the parameters of the module and set the parametrization of all children to the parametrization of the module. |

Attributes:

| Name | Type | Description |
|---|---|---|
| `parametrization` | `Parametrization` | Parametrization of the module. |
### forward

```python
forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch *out_feature"]
```
Forward pass of the module.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input` | `Float[Tensor, '*sample batch *in_feature']` | Input tensor. | required |
| `sample_shape` | `Size` | Shape of samples. | `Size([])` |
| `generator` | `Generator \| None` | Random number generator. | `None` |
| `input_contains_samples` | `bool` | Whether the input already contains samples. If `True`, the input is assumed to have `sample_shape` prepended as additional leading dimensions. | `False` |
| `parameter_samples` | `dict[str, Float[Tensor, '*sample parameter']] \| None` | Dictionary of parameter samples. Used to pass sampled parameters to the module. Useful to jointly sample parameters of multiple layers. | `None` |
### parameters_and_lrs

### reset_parameters
Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.
This method should be implemented by subclasses to reset the parameters of the module.
## Conv1d

```python
Conv1d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_1_t,
    stride: _size_1_t = 1,
    padding: str | _size_1_t = 0,
    dilation: _size_1_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "zeros",
    layer_type: Literal["input", "hidden", "output"] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)
```
Bases: _ConvNd
Applies a 1D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, L)\) and output \((N, C_{\text{out}}, L_{\text{out}})\) can be precisely described as:

\[
\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)
\]

where \(\star\) is the valid cross-correlation operator, \(N\) is a batch size, \(C\) denotes a number of channels, and \(L\) is a length of signal sequence.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `in_channels` | `int` | Number of channels in the input image. | required |
| `out_channels` | `int` | Number of channels produced by the convolution. | required |
| `kernel_size` | `_size_1_t` | Size of the convolving kernel. | required |
| `stride` | `_size_1_t` | Stride of the convolution. | `1` |
| `padding` | `str \| _size_1_t` | Padding added to both sides of the input. | `0` |
| `dilation` | `_size_1_t` | Spacing between kernel elements. | `1` |
| `groups` | `int` | Number of blocked connections from input channels to output channels. | `1` |
| `bias` | `bool` | If `True`, adds a learnable bias to the output. | `True` |
| `padding_mode` | `str` | Padding mode: `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. | `'zeros'` |
| `layer_type` | `Literal['input', 'hidden', 'output']` | Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters. | `'hidden'` |
| `cov` | `FactorizedCovariance \| None` | The covariance of the parameters. | `None` |
| `parametrization` | `Parametrization` | The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module. | `MaximalUpdate()` |
| `device` | `device \| None` | The device on which to place the tensor. | `None` |
| `dtype` | `dtype \| None` | The desired data type of the returned tensor. | `None` |
Methods:

| Name | Description |
|---|---|
| `extra_repr` | |
| `forward` | |
| `parameters_and_lrs` | |
| `reset_parameters` | Reset the parameters of the module. |

Attributes:

| Name | Type | Description |
|---|---|---|
| `bias` | `Parameter` | |
| `dilation` | | |
| `groups` | | |
| `in_channels` | | |
| `kernel_size` | | |
| `layer_type` | | |
| `out_channels` | | |
| `output_padding` | | |
| `padding` | | |
| `padding_mode` | | |
| `parametrization` | `Parametrization` | Parametrization of the module. |
| `params` | | |
| `stride` | | |
| `transposed` | | |
| `weight` | `Parameter` | |
### forward

```python
forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch out_channel *out_feature"]
```
## Conv2d

```python
Conv2d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_2_t,
    stride: _size_2_t = 1,
    padding: str | _size_2_t = 0,
    dilation: _size_2_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "zeros",
    layer_type: Literal["input", "hidden", "output"] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)
```
Bases: _ConvNd
Applies a 2D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, H, W)\) and output \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) can be precisely described as:

\[
\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)
\]

where \(\star\) is the valid 2D cross-correlation operator, \(N\) is a batch size, \(C\) denotes a number of channels, \(H\) is a height of input planes in pixels, and \(W\) is width in pixels.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `in_channels` | `int` | Number of channels in the input image. | required |
| `out_channels` | `int` | Number of channels produced by the convolution. | required |
| `kernel_size` | `_size_2_t` | Size of the convolving kernel. | required |
| `stride` | `_size_2_t` | Stride of the convolution. | `1` |
| `padding` | `str \| _size_2_t` | Padding added to all four sides of the input. | `0` |
| `dilation` | `_size_2_t` | Spacing between kernel elements. | `1` |
| `groups` | `int` | Number of blocked connections from input channels to output channels. | `1` |
| `bias` | `bool` | If `True`, adds a learnable bias to the output. | `True` |
| `padding_mode` | `str` | Padding mode: `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. | `'zeros'` |
| `layer_type` | `Literal['input', 'hidden', 'output']` | Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters. | `'hidden'` |
| `cov` | `FactorizedCovariance \| None` | The covariance of the parameters. | `None` |
| `parametrization` | `Parametrization` | The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module. | `MaximalUpdate()` |
| `device` | `device \| None` | The device on which to place the tensor. | `None` |
| `dtype` | `dtype \| None` | The desired data type of the returned tensor. | `None` |
Methods:

| Name | Description |
|---|---|
| `extra_repr` | |
| `forward` | |
| `parameters_and_lrs` | |
| `reset_parameters` | Reset the parameters of the module. |

Attributes:

| Name | Type | Description |
|---|---|---|
| `bias` | `Parameter` | |
| `dilation` | | |
| `groups` | | |
| `in_channels` | | |
| `kernel_size` | | |
| `layer_type` | | |
| `out_channels` | | |
| `output_padding` | | |
| `padding` | | |
| `padding_mode` | | |
| `parametrization` | `Parametrization` | Parametrization of the module. |
| `params` | | |
| `stride` | | |
| `transposed` | | |
| `weight` | `Parameter` | |
### forward

```python
forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch out_channel *out_feature"]
```
## Conv3d

```python
Conv3d(
    in_channels: int,
    out_channels: int,
    kernel_size: _size_3_t,
    stride: _size_3_t = 1,
    padding: str | _size_3_t = 0,
    dilation: _size_3_t = 1,
    groups: int = 1,
    bias: bool = True,
    padding_mode: str = "zeros",
    layer_type: Literal["input", "hidden", "output"] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)
```
Bases: _ConvNd
Applies a 3D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, D, H, W)\) and output \((N, C_{\text{out}}, D_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) can be precisely described as:

\[
\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)
\]

where \(\star\) is the valid 3D cross-correlation operator.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `in_channels` | `int` | Number of channels in the input image. | required |
| `out_channels` | `int` | Number of channels produced by the convolution. | required |
| `kernel_size` | `_size_3_t` | Size of the convolving kernel. | required |
| `stride` | `_size_3_t` | Stride of the convolution. | `1` |
| `padding` | `str \| _size_3_t` | Padding added to all six sides of the input. | `0` |
| `dilation` | `_size_3_t` | Spacing between kernel elements. | `1` |
| `groups` | `int` | Number of blocked connections from input channels to output channels. | `1` |
| `bias` | `bool` | If `True`, adds a learnable bias to the output. | `True` |
| `padding_mode` | `str` | Padding mode: `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. | `'zeros'` |
| `layer_type` | `Literal['input', 'hidden', 'output']` | Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters. | `'hidden'` |
| `cov` | `FactorizedCovariance \| None` | The covariance of the parameters. | `None` |
| `parametrization` | `Parametrization` | The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module. | `MaximalUpdate()` |
| `device` | `device \| None` | The device on which to place the tensor. | `None` |
| `dtype` | `dtype \| None` | The desired data type of the returned tensor. | `None` |
Methods:

| Name | Description |
|---|---|
| `extra_repr` | |
| `forward` | |
| `parameters_and_lrs` | |
| `reset_parameters` | Reset the parameters of the module. |

Attributes:

| Name | Type | Description |
|---|---|---|
| `bias` | `Parameter` | |
| `dilation` | | |
| `groups` | | |
| `in_channels` | | |
| `kernel_size` | | |
| `layer_type` | | |
| `out_channels` | | |
| `output_padding` | | |
| `padding` | | |
| `padding_mode` | | |
| `parametrization` | `Parametrization` | Parametrization of the module. |
| `params` | | |
| `stride` | | |
| `transposed` | | |
| `weight` | `Parameter` | |
### forward

```python
forward(
    input: Float[Tensor, "*sample batch *in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch out_channel *out_feature"]
```
## Linear

```python
Linear(
    in_features: int,
    out_features: int,
    bias: bool = True,
    layer_type: Literal["input", "hidden", "output"] = "hidden",
    cov: FactorizedCovariance | None = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)
```
Applies an affine transformation to the input.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `in_features` | `int` | Size of each input sample. | required |
| `out_features` | `int` | Size of each output sample. | required |
| `bias` | `bool` | If set to `False`, the layer will not learn an additive bias. | `True` |
| `layer_type` | `Literal['input', 'hidden', 'output']` | Type of the layer. Can be one of "input", "hidden", or "output". Controls the initialization and learning rate scaling of the parameters. | `'hidden'` |
| `cov` | `FactorizedCovariance \| None` | Covariance object for the parameters. | `None` |
| `parametrization` | `Parametrization` | The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module. | `MaximalUpdate()` |
| `device` | `device \| None` | Device on which to instantiate the parameters. | `None` |
| `dtype` | `dtype \| None` | Data type of the parameters. | `None` |
Methods:

| Name | Description |
|---|---|
| `extra_repr` | |
| `forward` | |
| `parameters_and_lrs` | |
| `reset_parameters` | Reset the parameters of the module. |

Attributes:

| Name | Type | Description |
|---|---|---|
| `bias` | `Parameter` | |
| `in_features` | | |
| `layer_type` | | |
| `out_features` | | |
| `parametrization` | `Parametrization` | Parametrization of the module. |
| `params` | | |
| `weight` | `Parameter` | |
### forward

```python
forward(
    input: Float[Tensor, "*sample *batch in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch out_feature"]
```
## MultiheadAttention

```python
MultiheadAttention(
    embed_dim: int,
    num_heads: int,
    dropout: float = 0.0,
    bias: bool = True,
    kdim: int | None = None,
    vdim: int | None = None,
    embed_dim_out: int | None = None,
    out_proj: bool = True,
    cov: (
        FactorizedCovariance
        | dict[FactorizedCovariance]
        | None
    ) = None,
    parametrization: Parametrization = MaximalUpdate(),
    device: device | None = None,
    dtype: dtype | None = None,
)
```
Attention layer (with multiple heads).
Multi-head (self-)attention layer with an optional attention mask, allowing a model to jointly attend to information from different representation subspaces. Consists of `num_heads` scaled dot-product attention modules, whose outputs are concatenated and then combined into an output sequence via a linear layer.
The module supports nested or padded tensors and is inspired by an existing PyTorch implementation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `embed_dim` | `int` | Dimensionality of the inputs and outputs to the layer (i.e. dimensionality of the query embeddings). | required |
| `num_heads` | `int` | Number of attention heads. | required |
| `dropout` | `float` | Dropout probability; if greater than 0.0, dropout is applied. | `0.0` |
| `bias` | `bool` | Whether to add bias to query, key, value and output projections. | `True` |
| `kdim` | `int \| None` | Dimensionality of the key embeddings. | `None` |
| `vdim` | `int \| None` | Dimensionality of the value embeddings. | `None` |
| `embed_dim_out` | `int \| None` | Dimensionality of the output embeddings. If `None`, defaults to `embed_dim`. | `None` |
| `out_proj` | `bool` | Whether to include the output projection layer. | `True` |
| `cov` | `FactorizedCovariance \| dict[FactorizedCovariance] \| None` | Covariance structure of the weights. Either a single covariance structure used in all linear projections, or a dictionary with keys `'q'`, `'k'`, `'v'` and `'out'` mapping each projection to its own covariance. | `None` |
| `parametrization` | `Parametrization` | The parametrization to use. Defines the initialization and learning rate scaling for the parameters of the module. | `MaximalUpdate()` |
| `device` | `device \| None` | Device on which to instantiate the parameters. | `None` |
| `dtype` | `dtype \| None` | Data type of the parameters. | `None` |
|
Methods:
| Name | Description |
|---|---|
forward |
Computes scaled dot product attention on query, key and value tensors, using an optional attention mask. |
parameters_and_lrs |
Get the parameters of the module and their learning rates for the chosen optimizer |
reset_parameters |
Reset the parameters of the module and set the parametrization of all children |
Attributes:
| Name | Type | Description |
|---|---|---|
bias |
|
|
dropout |
|
|
embed_dim |
|
|
embed_dim_out |
|
|
head_dim |
|
|
k_proj |
|
|
kdim |
|
|
num_heads |
|
|
out_proj |
|
|
parametrization |
Parametrization
|
Parametrization of the module. |
q_proj |
|
|
v_proj |
|
|
vdim |
|
### k_proj

```python
k_proj = Linear(
    kdim,
    embed_dim,
    bias=bias,
    cov=cov["k"],
    parametrization=parametrization,
    **factory_kwargs
)
```

### out_proj

```python
out_proj = (
    Linear(
        embed_dim,
        embed_dim_out,
        bias=bias,
        cov=cov["out"],
        parametrization=parametrization,
        **factory_kwargs
    )
    if out_proj
    else None
)
```

### q_proj

```python
q_proj = Linear(
    embed_dim,
    embed_dim,
    bias=bias,
    cov=cov["q"],
    parametrization=parametrization,
    **factory_kwargs
)
```

### v_proj

```python
v_proj = Linear(
    vdim,
    embed_dim,
    bias=bias,
    cov=cov["v"],
    parametrization=parametrization,
    **factory_kwargs
)
```
### forward

```python
forward(
    query: Float[Tensor, "*sample batch query_token embed_dim"],
    key: (
        Float[Tensor, "*sample batch keyval_token embed_dim_k"]
        | None
    ),
    value: (
        Float[Tensor, "*sample batch token embed_dim_v"]
        | None
    ),
    attn_mask: (
        Float[Tensor, "batch query_token keyval_token"]
        | None
    ) = None,
    is_causal: bool = False,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample batch query_token embed_dim"]
```
Computes scaled dot product attention on query, key and value tensors, using an optional attention mask.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `Float[Tensor, '*sample batch query_token embed_dim']` | Query tensor / embeddings. | required |
| `key` | `Float[Tensor, '*sample batch keyval_token embed_dim_k'] \| None` | Key tensor / embeddings. | required |
| `value` | `Float[Tensor, '*sample batch token embed_dim_v'] \| None` | Value tensor / embeddings. | required |
| `attn_mask` | `Float[Tensor, 'batch query_token keyval_token'] \| None` | Attention mask; shape must be broadcastable to the shape of the attention weights. Two types of masks are supported: a boolean mask, where a value of `True` indicates that the element should take part in attention, or a float mask of the same type as query, key and value, which is added to the attention score. | `None` |
| `is_causal` | `bool` | If set to `True`, the attention masking is a lower triangular matrix when the mask is a square matrix. The attention masking has the form of the upper left causal bias due to the alignment (see `torch.nn.attention.bias.CausalBias`) when the mask is a non-square matrix. | `False` |
| `sample_shape` | `Size` | Shape of samples. | `Size([])` |
| `generator` | `Generator \| None` | Random number generator. | `None` |
| `input_contains_samples` | `bool` | Whether the input already contains samples. If `True`, the input is assumed to have `sample_shape` prepended as additional leading dimensions. | `False` |
| `parameter_samples` | `dict[str, Float[Tensor, '*sample parameter']] \| None` | Dictionary of parameter samples. Used to pass sampled parameters to the module. Useful to jointly sample parameters of multiple layers. | `None` |
### parameters_and_lrs

### reset_parameters
Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.
This method should be implemented by subclasses to reset the parameters of the module.
## Sequential

```python
Sequential(
    *args: BNNMixin | Module,
    parametrization: Parametrization | None = None
)
Sequential(
    arg: OrderedDict[str, BNNMixin | Module],
    parametrization: Parametrization | None = None,
)
Sequential(
    *args, parametrization: Parametrization | None = None
)
```
Bases: BNNMixin, Sequential
A sequential container for modules.
Modules will be added to it in the order they are passed in the constructor. Alternatively, an `OrderedDict` of modules can be passed in. The `forward()` method of `Sequential` accepts any input and forwards it to the first module it contains. It then "chains" outputs to inputs sequentially for each subsequent module, finally returning the output of the last module.
The value a `Sequential` provides over manually calling a sequence of modules is that it allows treating the whole container as a single module, such that performing a transformation on the `Sequential` applies to each of the modules it stores (which are each a registered submodule of the `Sequential`).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `*args` | | Any number of modules to add to the container. | `()` |
| `parametrization` | `Parametrization \| None` | The parametrization to use. If `None`, the existing parametrization of the child modules is kept. | `None` |
Methods:

| Name | Description |
|---|---|
| `forward` | |
| `parameters_and_lrs` | Get the parameters of the module and their learning rates for the chosen optimizer. |
| `reset_parameters` | Reset the parameters of the module and set the parametrization of all children to the parametrization of the module. |

Attributes:

| Name | Type | Description |
|---|---|---|
| `parametrization` | `Parametrization` | Parametrization of the module. |
### forward

```python
forward(
    input: Float[Tensor, "*batch in_feature"],
    /,
    sample_shape: Size = Size([]),
    generator: Generator | None = None,
    input_contains_samples: bool = False,
    parameter_samples: (
        dict[str, Float[Tensor, "*sample parameter"]] | None
    ) = None,
) -> Float[Tensor, "*sample *batch out_feature"]
```
### parameters_and_lrs

### reset_parameters
Reset the parameters of the module and set the parametrization of all children to the parametrization of the module.
This method should be implemented by subclasses to reset the parameters of the module.
## SinusoidalPositionalEncoding

```python
SinusoidalPositionalEncoding(
    embed_dim: int,
    max_seq_len: int = 4096,
    base: int = 10000,
    device: device | None = None,
    dtype: dtype | None = None,
)
```
Bases: Module
Sinusoidal Positional Encoding.
This module adds fixed sinusoidal positional embeddings (Vaswani et al., 2017; Sec. 3.5) to input embeddings to provide the model with information about the relative or absolute position of tokens in a sequence.
The sinusoidal positional encoding uses sine and cosine functions of different frequencies:

\[
\begin{aligned}
\text{PE}(\text{pos}, 2i) &= \sin\left(\frac{\text{pos}}{\text{base}^{2i / \text{embed\_dim}}}\right) \\
\text{PE}(\text{pos}, 2i + 1) &= \cos\left(\frac{\text{pos}}{\text{base}^{2i / \text{embed\_dim}}}\right)
\end{aligned}
\]

where \(\text{pos}\) is the position index, \(0 \leq i < \frac{\text{embed\_dim}}{2}\) is the embedding dimension index and \(\text{embed\_dim}\) is the embedding dimensionality.
The encoding is designed so that each dimension of the positional encoding corresponds to a sinusoid, with wavelengths forming a geometric progression from \(2\pi\) to \(\text{base} \cdot 2\pi\). This allows the model to easily learn to attend by relative positions.
Notes:

- The positional encodings are added to the input embeddings, so both must have the same embedding dimension (`embed_dim`).
- If the input sequence length exceeds `max_seq_len`, the encoding will be truncated to match the input length, which may cause issues. Ensure `max_seq_len` ≥ expected sequence lengths.
- The encoding is deterministic and does not require training.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `embed_dim` | `int` | The embedding dimension. Should be even for proper sine/cosine pairing. | required |
| `max_seq_len` | `int` | Maximum sequence length to precompute encodings for. | `4096` |
| `base` | `int` | Base of the angular frequency. | `10000` |
| `dtype` | `dtype \| None` | Data type for the positional encodings. | `None` |
| `device` | `device \| None` | Device to place the positional encodings on. | `None` |
Methods:

| Name | Description |
|---|---|
| `forward` | Add positional encoding to input embeddings. |

Attributes:

| Name | Type | Description |
|---|---|---|
| `base` | | |
| `max_seq_len` | | |
### forward
## batched_forward

```python
batched_forward(
    obj: Module, num_batch_dims: int
) -> Callable[
    [Float[Tensor, "*sample batch *in_feature"]],
    Float[Tensor, "*sample batch *out_feature"],
]
```
Call a `torch.nn.Module` on inputs with arbitrarily many batch dimensions rather than just a single one.
This is useful to extend the functionality of a `torch.nn.Module` to work with arbitrarily many batch dimensions, for example arbitrarily many sampling dimensions.
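The underlying idea, sketched in plain PyTorch (not the library's implementation):

```python
import torch

def call_with_many_batch_dims(module, x, num_batch_dims: int):
    """Flatten all leading batch dims, call the module, restore the shape."""
    batch_shape = x.shape[:num_batch_dims]
    flat = x.reshape(-1, *x.shape[num_batch_dims:])    # merge batch dims
    out = module(flat)
    return out.reshape(*batch_shape, *out.shape[1:])   # split them again

lin = torch.nn.Linear(8, 4)
x = torch.randn(16, 32, 8)    # two batch dimensions
y = call_with_many_batch_dims(lin, x, num_batch_dims=2)
print(y.shape)                # torch.Size([16, 32, 4])
```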
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obj` | `Module` | The `torch.nn.Module` to call. | required |
| `num_batch_dims` | `int` | The number of batch dimensions. | required |