Tabular Data

Classes:

| Name | Description |
| --- | --- |
| ConcreteCompressiveStrength | Concrete compressive strength (1,030 × 8). |
| ParkinsonsTelemonitoring | Parkinson's telemonitoring (5,875 × 21). |
| ProteinStructure | Physicochemical properties of protein tertiary structure (45,730 × 9). |
| RoadNetwork | 3D Road Network (434,874 × 2). |
| WineQuality | Wine quality prediction from physicochemical properties (1,599 × 11 red, 4,898 × 11 white). |

ConcreteCompressiveStrength

ConcreteCompressiveStrength(
    root: str | Path = None,
    transform: Callable | None = Lambda(
        lambda x: (
            x
            - as_tensor(
                [
                    281.1656,
                    73.8955,
                    54.1871,
                    181.5664,
                    6.2031,
                    972.9186,
                    773.5789,
                    45.6621,
                ]
            )
        )
        / as_tensor(
            [
                104.5071,
                86.2791,
                63.9965,
                21.3556,
                5.9735,
                77.7538,
                80.1754,
                63.1699,
            ]
        )
    ),
    target_transform: Callable | None = Lambda(
        lambda y: (y - 35.8178) / 16.7057
    ),
    download: bool = False,
)

Bases: RegressionDataset

Concrete compressive strength (1,030 × 8).

This UCI dataset records the amounts of seven ingredients in each concrete mixture together with the mixture's age, giving 8 features. The regression task is to predict the concrete's compressive strength.

Source: https://archive.ics.uci.edu/dataset/165/concrete+compressive+strength
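Both default transforms are affine standardizations with fixed constants: each of the 8 features gets its own shift and scale, and the target is mapped through (y - 35.8178) / 16.7057. The constants appear to be the dataset's per-column mean and standard deviation, although this page does not say so. A model trained with the default target_transform therefore predicts in standardized units. The sketch below reproduces the target map and its inverse with plain Python floats instead of tensors; `standardize_target` and `unstandardize_target` are hypothetical helpers, not part of the library.

```python
# Constants copied from the default target_transform of ConcreteCompressiveStrength.
TARGET_SHIFT = 35.8178
TARGET_SCALE = 16.7057


def standardize_target(y: float) -> float:
    """Forward map, equivalent to the default target_transform."""
    return (y - TARGET_SHIFT) / TARGET_SCALE


def unstandardize_target(z: float) -> float:
    """Inverse map: standardized prediction -> compressive strength in MPa."""
    return z * TARGET_SCALE + TARGET_SHIFT


if __name__ == "__main__":
    strength_mpa = 52.4  # hypothetical measured strength
    z = standardize_target(strength_mpa)
    print(f"standardized: {z:.4f}")
    print(f"recovered:    {unstandardize_target(z):.4f} MPa")
```

Applying the inverse to model outputs is what lets you report errors or predictions in MPa rather than in standardized units.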

Methods:

| Name | Description |
| --- | --- |
| download | |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| URL | | |
| filepath | str | |
| md5 | | |
| root | | |
| target_transform | | |
| transform | | |

URL

URL = "https://archive.ics.uci.edu/static/public/165/concrete+compressive+strength.zip"

filepath

filepath: str

md5

md5 = '4aaeecaf0bf2eefccb8a4a6d4cc12785'

root

root = Path(root)

target_transform

target_transform = target_transform

transform

transform = transform

download

download() -> None
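download() takes no arguments and returns None. Because each class also stores a URL and an md5 checksum, the downloaded file is presumably verified against that checksum, but this page does not show how. The following is a standard-library sketch of such a check; `file_md5` and `md5_matches` are hypothetical helpers, not library functions.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path: Path, expected: str) -> bool:
    """Compare a file's MD5 digest with an expected hex string (case-insensitive)."""
    return file_md5(path) == expected.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "example.zip"  # hypothetical downloaded file
        archive.write_bytes(b"abc")
        print(md5_matches(archive, "900150983cd24fb0d6963f7d28e17f72"))
```

Reading in chunks keeps memory use flat, which matters for the larger archives such as the road network data.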

ParkinsonsTelemonitoring

ParkinsonsTelemonitoring(
    root: str | Path = None,
    transform: Callable | None = Lambda(
        lambda x: (
            x
            - as_tensor(
                [
                    21.494,
                    64.805,
                    0.31779,
                    92.864,
                    21.296,
                    0.0061538,
                    4.4027e-05,
                    0.0029872,
                    0.0032769,
                    0.0089617,
                    0.034035,
                    0.31096,
                    0.017156,
                    0.020144,
                    0.027481,
                    0.051467,
                    0.03212,
                    21.679,
                    0.54147,
                    0.65324,
                    0.21959,
                ]
            )
        )
        / as_tensor(
            [
                12.372,
                8.8215,
                0.46566,
                53.446,
                8.1293,
                0.0056242,
                3.5983e-05,
                0.0031238,
                0.0037315,
                0.0093715,
                0.025835,
                0.23025,
                0.013237,
                0.016664,
                0.019986,
                0.039711,
                0.059692,
                4.2911,
                0.10099,
                0.070902,
                0.091498,
            ]
        )
    ),
    target_transform: Callable | None = Lambda(
        lambda y: (y - 29.0189) / 10.7003
    ),
    download: bool = False,
)

Bases: RegressionDataset

Parkinson's telemonitoring (5,875 × 21).

This UCI dataset consists of biomedical voice measurements from 42 people with early-stage Parkinson's disease who were recruited to a six-month trial of a telemonitoring device for remote monitoring of symptom progression. The recordings were captured automatically in the patients' homes. The original study used linear and nonlinear regression methods to predict the clinician's Parkinson's disease symptom score on the UPDRS scale.

Source: https://archive.ics.uci.edu/ml/datasets/parkinsons+telemonitoring
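The default target_transform standardizes the UPDRS target through (y - 29.0189) / 10.7003. Because this is an affine map, an error such as RMSE measured on standardized targets converts to UPDRS points by multiplying by the scale 10.7003; the shift cancels in every difference. The sketch below checks this with made-up scores; `rmse` and `standardize` are hypothetical helpers, not library code.

```python
import math

# Constants copied from the default target_transform of ParkinsonsTelemonitoring.
SHIFT = 29.0189
SCALE = 10.7003


def rmse(pred: list[float], true: list[float]) -> float:
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(pred) != len(true) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def standardize(values: list[float]) -> list[float]:
    """Apply the default target map to each value."""
    return [(v - SHIFT) / SCALE for v in values]


if __name__ == "__main__":
    # Hypothetical UPDRS scores and model predictions, in original units.
    true_updrs = [20.0, 31.5, 44.0]
    pred_updrs = [22.0, 30.0, 41.0]

    err_raw = rmse(pred_updrs, true_updrs)
    err_std = rmse(standardize(pred_updrs), standardize(true_updrs))
    # Scaling the standardized error by SCALE recovers the error in UPDRS points.
    print(err_raw, err_std * SCALE)
```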

Methods:

| Name | Description |
| --- | --- |
| download | |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| URL | | |
| filepath | str | |
| md5 | | |
| root | | |
| target_transform | | |
| transform | | |

URL

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/telemonitoring/"

filepath

filepath: str

md5

md5 = 'eba8e7531ac24fbe8473085a0a48e556'

root

root = Path(root)

target_transform

target_transform = target_transform

transform

transform = transform

download

download() -> None

ProteinStructure

ProteinStructure(
    root: str | Path = None,
    transform: Callable | None = Lambda(
        lambda x: (
            x
            - as_tensor(
                [
                    9871.6,
                    3017.4,
                    0.30239,
                    103.49,
                    1368300.0,
                    145.64,
                    3989.8,
                    69.975,
                    34.524,
                ]
            )
        )
        / as_tensor(
            [
                4058.1,
                1464.3,
                0.062886,
                55.425,
                564040.0,
                69.999,
                1993.6,
                56.493,
                5.9798,
            ]
        )
    ),
    target_transform: Callable | None = Lambda(
        lambda y: (y - 7.7485) / 6.1183
    ),
    download: bool = False,
)

Bases: RegressionDataset

Physicochemical properties of protein tertiary structure (45,730 × 9).

This UCI dataset contains physicochemical properties of protein tertiary structures taken from CASP 5-9. It comprises 45,730 decoys described by 9 attributes; the regression target, the RMSD of each decoy, ranges from 0 to 21 angstroms.

Source: https://archive.ics.uci.edu/ml/datasets/Physicochemical+Properties+of+Protein+Tertiary+Structure
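The default constants span several orders of magnitude (a shift of 1,368,300.0 and a scale of 564,040.0 for the fifth feature versus 0.30239 and 0.062886 for the third), which is why every feature is standardized. If you build your own train/test split, you may prefer constants estimated on the training rows only and pass them in through the transform argument, which accepts any callable. The sketch below computes per-column means and standard deviations in plain Python. Two assumptions: whether the library's constants use the population or the sample standard deviation is not stated, so `statistics.pstdev` is a choice, not a match; and the library's callable receives tensors, whereas this sketch works on lists. `column_stats` and `make_standardizer` are hypothetical names.

```python
import statistics
from typing import Callable, Sequence


def column_stats(rows: Sequence[Sequence[float]]) -> tuple[list[float], list[float]]:
    """Per-column mean and population standard deviation of a 2-D table."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def make_standardizer(
    means: Sequence[float], stds: Sequence[float]
) -> Callable[[Sequence[float]], list[float]]:
    """Return a function mapping one row to (x - mean) / std, column by column."""

    def standardize(row: Sequence[float]) -> list[float]:
        return [(x - m) / s for x, m, s in zip(row, means, stds)]

    return standardize


if __name__ == "__main__":
    # Hypothetical training rows with two columns on very different scales.
    train_rows = [[1.2e6, 0.25], [1.4e6, 0.30], [1.6e6, 0.35]]
    means, stds = column_stats(train_rows)
    standardize = make_standardizer(means, stds)
    print(standardize([1.5e6, 0.28]))
```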

Methods:

| Name | Description |
| --- | --- |
| download | |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| URL | | |
| filepath | str | |
| md5 | | |
| root | | |
| target_transform | | |
| transform | | |

URL

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00265/"

filepath

filepath: str

md5

md5 = '2cd0971a73f135ceb6aae74fe724a6f5'

root

root = Path(root)

target_transform

target_transform = target_transform

transform

transform = transform

download

download() -> None

RoadNetwork

RoadNetwork(
    root: str | Path = None,
    transform: Callable | None = Lambda(
        lambda x: (x - as_tensor([9.7318, 57.0838]))
        / as_tensor([0.6273, 0.2895])
    ),
    target_transform: Callable | None = Lambda(
        lambda y: (y - 22.1854) / 18.618
    ),
    download: bool = False,
)

Bases: RegressionDataset

3D Road Network (434,874 × 2).

This UCI dataset contains longitude, latitude and altitude values for a road network in North Jutland, Denmark, covering a region of 185 × 135 km². Elevation values were extracted from a publicly available, massive laser-scan point cloud for Denmark. The regression task is to predict altitude from longitude and latitude.

Source: https://archive.ics.uci.edu/ml/datasets/3D+Road+Network+(North+Jutland,+Denmark)
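With only two features, the default transform is easy to trace by hand: longitude is shifted by 9.7318 and scaled by 0.6273, latitude is shifted by 57.0838 and scaled by 0.2895. The sketch below applies the same arithmetic to a single coordinate pair with plain Python floats; it mirrors the default Lambda but is not the library's code, and `standardize_point` is a hypothetical helper.

```python
# Constants copied from the default transform of RoadNetwork.
FEATURE_SHIFT = (9.7318, 57.0838)  # longitude, latitude
FEATURE_SCALE = (0.6273, 0.2895)


def standardize_point(lon: float, lat: float) -> tuple[float, float]:
    """Map a (longitude, latitude) pair to standardized coordinates."""
    return (
        (lon - FEATURE_SHIFT[0]) / FEATURE_SCALE[0],
        (lat - FEATURE_SHIFT[1]) / FEATURE_SCALE[1],
    )


if __name__ == "__main__":
    lon, lat = 10.0, 57.0  # hypothetical point in North Jutland
    print(standardize_point(lon, lat))
```

A point one scale unit east and one scale unit south of the shift coordinates maps to roughly (1, -1), so standardized inputs stay near unit range across the whole region.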

Methods:

| Name | Description |
| --- | --- |
| download | |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| URL | | |
| filepath | str | |
| md5 | | |
| root | | |
| target_transform | | |
| transform | | |

URL

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00246/"

filepath

filepath: str

md5

md5 = '989a6f4574e09ee6735d8af2e5885cc1'

root

root = Path(root)

target_transform

target_transform = target_transform

transform

transform = transform

download

download() -> None

WineQuality

WineQuality(
    root: str | Path = None,
    transform: Callable | None = Lambda(
        lambda x: (
            x
            - as_tensor(
                [
                    8.3196,
                    0.5278,
                    0.271,
                    2.5388,
                    0.0875,
                    15.8749,
                    46.4678,
                    0.9967,
                    3.3111,
                    0.6581,
                    10.423,
                ]
            )
        )
        / as_tensor(
            [
                1.7411,
                0.17906,
                0.1948,
                1.4099,
                0.047065,
                10.46,
                32.895,
                0.0018873,
                0.15439,
                0.16951,
                1.0657,
            ]
        )
    ),
    target_transform: Callable | None = Lambda(
        lambda y: (y - 5.636) / 0.8076
    ),
    download: bool = False,
    wine_type: Literal["red", "white"] = "red",
)

Bases: RegressionDataset

Wine quality prediction from physicochemical properties (1,599 × 11 red, 4,898 × 11 white).

This UCI dataset contains red and white vinho verde wine samples from the north of Portugal. The goal is to model wine quality based on physicochemical tests.

Source: https://archive.ics.uci.edu/dataset/186/wine+quality
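wine_type chooses between the red and white subsets, and md5 is a dictionary keyed by the same two values, so the checksum to verify presumably depends on the chosen subset. Note also that the default normalization constants appear to be statistics of the red subset (the target shift 5.636 matches the mean red-wine quality score), so with wine_type="white" you may want to supply your own transform and target_transform. The sketch below shows a validated checksum lookup; `checksum_for` is a hypothetical helper, not the library's code.

```python
from typing import Literal, get_args

WineType = Literal["red", "white"]

# Checksums copied from WineQuality.md5.
MD5 = {
    "red": "7d814a1bda02145efe703f4e1c01847a",
    "white": "b56c9a78a7fcad87a58fc586bf5298bc",
}


def checksum_for(wine_type: str) -> str:
    """Return the expected MD5 for a subset, rejecting unknown wine types."""
    allowed = get_args(WineType)  # ('red', 'white')
    if wine_type not in allowed:
        raise ValueError(f"wine_type must be one of {allowed}, got {wine_type!r}")
    return MD5[wine_type]


if __name__ == "__main__":
    print(checksum_for("red"))
```

Deriving the allowed values from the Literal type keeps the runtime check and the type annotation from drifting apart.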

Methods:

| Name | Description |
| --- | --- |
| download | |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| URL | | |
| filepath | str | |
| md5 | | |
| root | | |
| target_transform | | |
| transform | | |
| wine_type | | |

URL

URL = "https://archive.ics.uci.edu/static/public/186/wine+quality.zip"

filepath

filepath: str

md5

md5 = {
    "red": "7d814a1bda02145efe703f4e1c01847a",
    "white": "b56c9a78a7fcad87a58fc586bf5298bc",
}

root

root = Path(root)

target_transform

target_transform = target_transform

transform

transform = transform

wine_type

wine_type = wine_type

download

download() -> None