readData

Load built-in datasets


Syntax

data = readData(dataName,Name,Value)

Description

data = readData(dataName,Name,Value) returns one of the built-in datasets in the selected format. Name and Value specify additional options using one or more name-value pair arguments. For example, users can specify whether the data is stored in a 2D array or a table.

See: Input Arguments, Output Arguments


Input Arguments

dataName - Name of the dataset

Data type: string


Name of one of the built-in datasets of the VBLab package. See the list of VBLab built-in datasets.

| Data Name | Description | Task | Arguments |
|---|---|---|---|
| 'Abalon' | Cross-sectional data | Regression | 'Intercept', 'Normalized', 'Type' |
| 'Cencus' | Cross-sectional data | Classification | 'Intercept', 'Normalized', 'Type' |
| 'DirectMarketing' | Cross-sectional data | Regression | 'Intercept', 'Normalized', 'Type' |
| 'GermanCredit' | Cross-sectional data | Classification | 'Intercept', 'Normalized', 'Type' |
| 'LabourForce' | Cross-sectional data | Classification | 'Intercept', 'Normalized', 'Type' |
| 'RealizedLibrary' | Time-series data | Regression | 'Index', 'Length', 'RealizedMeasure', 'Type' |

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name-value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Type','Matrix','Intercept',true specifies that the output dataset is stored in a 2D array and that a column of ones is added to the data matrix as the intercept.
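The calling pattern described above can be sketched as follows. This is a minimal illustration that assumes the VBLab package is on the MATLAB path; the variable names `vblabAvailable`, `dataMat` and `dataTbl` are chosen here for illustration, and the guard only keeps the snippet from erroring when readData cannot be found.

```matlab
% Assumption: VBLab has been added to the MATLAB path.
vblabAvailable = exist('readData','file') == 2;

if vblabAvailable
    % 2D array with an added intercept column (the example given above)
    dataMat = readData('LabourForce','Type','Matrix','Intercept',true);

    % Same dataset as a table; name-value pairs may come in any order
    dataTbl = readData('LabourForce','Intercept',true,'Type','Table');

    disp(size(dataMat));
    disp(class(dataTbl));
else
    disp('readData not found: add VBLab to the MATLAB path first.');
end
```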

'Intercept' - Adding intercept column

Data Type: true | false


Flag to add a column of ones ($1$) to the data as the intercept.

Note: Only available for cross-sectional (tabular) data.

Default: false

Example: 'Intercept',true
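To make the effect of this flag concrete, the sketch below adds a column of ones to a small stand-in matrix. It does not call readData; the matrix values are made up for illustration, and placing the intercept in the first column is an assumption, since the position is not stated here.

```matlab
% Stand-in data: 3 observations, 2 predictors, response in the last column
data = [0.5 2.0 1;
        1.5 0.3 0;
        2.5 1.1 1];

% Add a column of ones as the intercept
% (putting it first is an assumption of this sketch)
dataWithIntercept = [ones(size(data,1),1), data];

disp(dataWithIntercept);
```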

'Type' - Data format

Data Type: string


Format of the output data. Can be specified as

  • 'Matrix' Data is stored in a 2D array, for cross-sectional data or multivariate time series data, or in a 1D array, for univariate time series. For cross-sectional data, the last column contains the response values.
  • 'Table' Data is stored in a table. For cross-sectional data, the last column contains the response values.

Default: 'Matrix'

Example: 'Type','Table'

'Normalized' - Normalization flag

Data Type: true | false


Flag to normalize numerical variables. Neural-network-based models such as DeepGLM work more efficiently with normalized data.

Note: Only available for cross-sectional (tabular) data.

Default: false

Example: 'Normalized',true
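As a rough illustration of what normalizing numerical variables can look like, the sketch below standardizes each column of a stand-in predictor matrix to zero mean and unit standard deviation. The normalization scheme actually used by readData is not specified here, so z-scoring is an assumption made only for this example, as are the variable names and values.

```matlab
% Stand-in predictors on very different scales
X = [1 200;
     2 400;
     3 600];

% Column-wise z-score (an assumed scheme, not necessarily what readData does)
mu    = mean(X,1);
sigma = std(X,0,1);
Xnorm = (X - mu) ./ sigma;   % implicit expansion requires R2016b or later

disp(Xnorm);
```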

'Index' - Stock index

Data Type: string


Stock index whose returns are read from the 'RealizedLibrary' data. Only available for the 'RealizedLibrary' data.

Default: None

Example: 'Index','SP500'

'Length' - Number of observations of time series

Data Type: positive integer


Number of observations of the time series. Only available for the 'RealizedLibrary' data.

Default: None

Example: 'Length',1000

'RealizedMeasure' - Realized measures of stock return volatility

Data Type: string | cell array of strings


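Realized measures of stock return volatility to read from the 'RealizedLibrary' data. Specify a single measure as a string, or several measures as a cell array of strings.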

Default: None

Example: 'RealizedMeasure','rk_parzen'

Example: 'RealizedMeasure',{'rk_parzen','medrv'}
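The time-series options above can be combined in a single call. The sketch below reuses only the values from the examples on this page ('SP500', 1000, 'rk_parzen', 'medrv') and assumes VBLab is on the MATLAB path; the exact layout of the returned array is not described here, so the snippet only prints its size.

```matlab
% Assumption: VBLab has been added to the MATLAB path.
hasReadData = exist('readData','file') == 2;

if hasReadData
    % Two realized measures for one stock index, limited to 1000 observations
    rv = readData('RealizedLibrary', ...
                  'Index','SP500', ...
                  'Length',1000, ...
                  'RealizedMeasure',{'rk_parzen','medrv'});
    disp(size(rv));
else
    disp('readData not found: add VBLab to the MATLAB path first.');
end
```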


Output Arguments

data - Output dataset

Data type: array | table


The dataset, stored as a 1D or 2D array or as a table.

  • For cross-sectional data, the last column of the 2D array or table contains the response values.
  • For time series data, the output is a 1D (univariate) or 2D (multivariate) array.
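Because the response sits in the last column for cross-sectional data, separating predictors from the response is a matter of indexing. The sketch below shows this for both output formats using a small stand-in matrix instead of a readData call; the values and the variable names `x1`, `x2` and `y` are hypothetical.

```matlab
% Stand-in for a cross-sectional dataset returned with 'Type','Matrix'
data = [0.5 2.0 1;
        1.5 0.3 0;
        2.5 1.1 1];

X = data(:,1:end-1);   % predictors
y = data(:,end);       % response (last column)

% Stand-in for the same dataset returned with 'Type','Table'
tbl  = array2table(data,'VariableNames',{'x1','x2','y'});
Xtbl = tbl(:,1:end-1); % predictors, still a table
ytbl = tbl{:,end};     % response extracted as a numeric column

disp(size(X));
disp(class(Xtbl));
```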