
Built-in datasets

VBLab provides a few built-in datasets (already cleaned, in *.mat format) that can be used for debugging a model or creating simple code examples.

List of datasets

  1. Abalon data
  2. DirectMarketing data
  3. GermanCredit data
  4. LabourForce data
  5. RealizedLibrary

Abalon data

Information

Predicting the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope – a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability) may be required to solve the problem.

The dataset has $4177$ rows and $9$ columns. The last column is used as the dependent variable.

For more information or to download the dataset, please visit the dataset website.

Attribute Information

Given is the attribute name, attribute type, the measurement unit and a brief description.

| Name | Data Type | Measurement Unit | Description |
|---|---|---|---|
| Sex | categorical | | M, F, and I (infant) |
| Length | continuous | mm | Longest shell measurement |
| Diameter | continuous | mm | perpendicular to length |
| Height | continuous | mm | with meat in shell |
| Whole weight | continuous | grams | whole abalone |
| Shucked weight | continuous | grams | weight of meat |
| Viscera weight | continuous | grams | gut weight (after bleeding) |
| Shell weight | continuous | grams | after being dried |
| Rings | integer | | +1.5 gives the age in years |
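
As the Rings row notes, adding 1.5 to the ring count gives the age in years. A minimal Python sketch of that conversion follows; the function name `rings_to_age` and the sample ring counts are illustrative assumptions and come neither from VBLab nor from the dataset itself.

```python
def rings_to_age(rings):
    """Convert an abalone ring count to an age in years (rings + 1.5)."""
    if rings < 0:
        raise ValueError("ring count must be non-negative")
    return rings + 1.5


# Made-up ring counts for illustration only (not rows from the dataset).
example_rings = [7, 10, 15]
ages = [rings_to_age(r) for r in example_rings]
print(ages)
```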

Citation

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Or as a BibTeX entry:

@misc{Dua:2019,
author = "Dua, Dheeru and Graff, Casey",
year = "2019",
title = "{UCI} Machine Learning Repository",
url = "http://archive.ics.uci.edu/ml",
institution = "University of California, Irvine, School of Information and Computer Sciences" }

DirectMarketing data

Information

The data set includes data from a direct marketer who sells his products only via direct mail. He sends catalogs with product characteristics to customers who then order directly from the catalogs. The marketer has developed customer records to learn what makes some customers spend more than others.

The objective is to explain AmountSpent in terms of the provided customer characteristics. The dataset has $1000$ rows and $10$ columns. The last column is used as the dependent variable.

For more information or to download the dataset, please visit the dataset page on the Kaggle website.

Attribute Information

Given is the attribute name, attribute type, values and a brief description.

| Name | Data Type | Values | Description |
|---|---|---|---|
| Age | categorical | old/middle/young | Age of customer |
| Gender | binary | male/female | Customer gender |
| OwnHome | binary | yes/no | Whether customer owns home |
| Married | binary | single/married | Whether customer is married or single |
| Location | binary | far/close | Location in terms of distance to the nearest brick-and-mortar store that sells similar products |
| Salary | continuous | in USD | Annual salary of customer |
| Children | integer | 0–3 | Number of children |
| History | categorical | low/medium/high/NA | History of previous purchase volume. NA means that this customer has not yet purchased |
| Catalogs | integer | | Number of catalogs sent |
| AmountSpent | continuous | in USD | Amount spent by customer |
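
Several of the attributes above are text-valued, so they need a numeric coding before they can enter a regression on AmountSpent. The Python sketch below shows one possible coding: one-hot indicators for the categorical attributes and 0/1 for the binary ones. This coding is an assumption, since the page does not say how the built-in *.mat file stores these columns. The helper `encode_customer` and the example record are hypothetical.

```python
# Levels as listed in the attribute table; the numeric coding is an assumption.
CATEGORICAL_LEVELS = {
    "Age": ("young", "middle", "old"),
    "History": ("NA", "low", "medium", "high"),
}
BINARY_LEVELS = {
    "Gender": ("female", "male"),      # coded 0, 1
    "OwnHome": ("no", "yes"),
    "Married": ("single", "married"),
    "Location": ("close", "far"),
}
NUMERIC_COLUMNS = ("Salary", "Children", "Catalogs")


def encode_customer(record):
    """Turn one customer record (a dict) into a flat list of numeric features."""
    features = []
    # One-hot indicators, one per level, for each categorical attribute.
    for name, levels in CATEGORICAL_LEVELS.items():
        if record[name] not in levels:
            raise ValueError(f"unexpected value for {name}: {record[name]!r}")
        features.extend(1.0 if record[name] == level else 0.0 for level in levels)
    # A single 0/1 value for each binary attribute.
    for name, levels in BINARY_LEVELS.items():
        if record[name] not in levels:
            raise ValueError(f"unexpected value for {name}: {record[name]!r}")
        features.append(float(levels.index(record[name])))
    # Numeric attributes pass through unchanged.
    for name in NUMERIC_COLUMNS:
        features.append(float(record[name]))
    return features


# Made-up customer for illustration only (not a row from the dataset).
customer = {
    "Age": "middle", "Gender": "female", "OwnHome": "yes", "Married": "married",
    "Location": "close", "Salary": 50000, "Children": 1, "History": "NA",
    "Catalogs": 12,
}
print(encode_customer(customer))
```

Treating NA as its own History level keeps customers without a purchase history in the sample instead of dropping them.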

GermanCredit data

Information

This dataset classifies people described by a set of attributes as good or bad credit risks.

The dataset has $1000$ rows and $21$ columns. The last column is used as the dependent variable.

For more information or to download the dataset, please visit the dataset website.

Attribute Information

Given is the attribute name, attribute type, values and a brief description.

| Order | Name | Data Type | Values | Description |
|---|---|---|---|---|
| 1 | status | ordinal | 1-4 | Status of the debtor’s checking account with the bank |
| 2 | duration | integer | | Credit duration in months |
| 3 | credit_history | categorical | 0-4 | History of compliance with previous or concurrent credit contracts |
| 4 | credit_amount | integer | | credit amount in DM$100 |
| 5 | savings | categorical | 0-5 | Debtor’s savings |
| 6 | employment_duration | ordinal | 1-5 | Present employment since |
| 7 | personal_status_sex | categorical | 1-5 | Combined information on sex and marital status |
| 8 | present_residence | ordinal | 1-4 | Length of time (in years) the debtor lives in the present residence |
| 9 | Property | ordinal | 1-4 | The debtor’s most valuable property, i.e. the highest possible code is used |
| 10 | age | integer | | age in years |
| 11 | other_installment_plans | categorical | 1-3 | Installment plans from providers other than the credit-giving bank |
| 12 | number_credits | integer | | Number of credits including the current one the debtor has (or had) at this bank |
| 13 | people_liable | integer | | Number of persons who financially depend on the debtor (i.e., are entitled to maintenance) |
| 14 | telephone | binary | yes/no | Is there a telephone landline registered on the debtor’s name |
| 15 | foreign_worker | binary | yes/no | Is the debtor a foreign worker? |
| 16 | purpose_1 | binary | 0/1 | Purpose for which the credit is needed |
| 17 | purpose_2 | binary | 0/1 | Purpose for which the credit is needed |
| 18 | other_debtors_1 | binary | 0/1 | Is there another debtor or a guarantor for the credit? (none) |
| 19 | other_debtors_2 | binary | 0/1 | Is there another debtor or a guarantor for the credit? (co-applicant) |
| 20 | housing_1 | binary | 0/1 | Type of housing the debtor lives in (rent) |
| 21 | housing_2 | binary | 0/1 | Type of housing the debtor lives in (own) |
| 22 | jobs_1 | binary | 0/1 | Quality of debtor’s job (unemployed/ unskilled - non-resident) |
| 23 | jobs_2 | binary | 0/1 | Quality of debtor’s job (unskilled - resident) |
| 24 | jobs_3 | binary | 0/1 | Quality of debtor’s job (skilled employee / official) |
| 25 | credit_risk | binary | yes/no | Has the credit contract been complied with (good) or not (bad)? |
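
Attributes 16 to 24 read like 0/1 indicator (dummy) columns for multi-level categorical variables. For example, other_debtors_1 flags "none" and other_debtors_2 flags "co-applicant". The Python sketch below maps such a group of indicators back to a single label. It assumes that at most one indicator in a group is set and that all zeros denotes a remaining baseline level; the page does not state either point, and the helper `decode_dummies` is hypothetical.

```python
def decode_dummies(indicators, labels, baseline="baseline"):
    """Map a group of 0/1 indicator values back to one category label.

    Assumption: at most one indicator is 1, and all zeros means the
    level that has no indicator column of its own.
    """
    if len(indicators) != len(labels):
        raise ValueError("need exactly one label per indicator")
    if any(value not in (0, 1) for value in indicators):
        raise ValueError("indicators must be 0 or 1")
    if sum(indicators) > 1:
        raise ValueError("more than one indicator is set")
    for value, label in zip(indicators, labels):
        if value == 1:
            return label
    return baseline


# Labels taken from the descriptions of other_debtors_1 and other_debtors_2.
other_debtors_labels = ["none", "co-applicant"]
print(decode_dummies([1, 0], other_debtors_labels))
print(decode_dummies([0, 0], other_debtors_labels))
```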

Citation

Grömping, U. (2019). South German Credit Data: Correcting a Widely Used Data Set. Report 4/2019, Reports in Mathematics, Physics and Chemistry, Department II, Beuth University of Applied Sciences Berlin.

Or as a BibTeX entry:

@Techreport{Grömping:2019,
title = {South German Credit Data: Correcting a Widely Used Data Set.},
author = {Grömping, U.},
year = {2019},
institution = {Department II, Beuth University of Applied Sciences Berlin}}

LabourForce data

Information

The Labour Force Participation dataset contains information indicating whether or not the participants (women) are currently in the labour force.

The dataset has $753$ rows and $7$ columns. The last column is used as the dependent variable.

Attribute Information

Given is the attribute name, attribute type, and a brief description.

| Name | Data Type | Description |
|---|---|---|
| kidslt6 | integer | # kids < 6 years |
| kidsge6 | integer | # kids 6-18 |
| age | integer | woman’s age in years |
| edu | integer | years of schooling |
| huswage | continuous | husband’s hourly wage |
| log_faminc | continuous | log of family income |
| inlf | binary | =1 if in labor force |
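
Because the last column (inlf) is the dependent variable, a typical first step is to separate it from the six predictors. The Python sketch below does this for rows held as plain lists. The column order follows the table above, the helper `split_features_response` is hypothetical, and the two rows are made-up values, not observations from the dataset.

```python
COLUMNS = ["kidslt6", "kidsge6", "age", "edu", "huswage", "log_faminc", "inlf"]


def split_features_response(rows):
    """Split rows into predictors X (all but the last column) and response y (last column)."""
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} values per row, got {len(row)}")
    X = [list(row[:-1]) for row in rows]
    y = [row[-1] for row in rows]
    return X, y


# Made-up rows for illustration only.
rows = [
    [1, 0, 32, 12, 4.0, 9.7, 1],
    [0, 2, 45, 10, 6.5, 10.1, 0],
]
X, y = split_features_response(rows)
print(X)
print(y)
```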

Citation

Mroz, T. (1987) “The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions”, Econometrica, 55, 765-799.

Or as a BibTeX entry:

@article{Thomas:1987,
 author = {Thomas A. Mroz},
 journal = {Econometrica},
 number = {4},
 pages = {765--799},
 publisher = {[Wiley, Econometric Society]},
 title = {The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions},
 volume = {55},
 year = {1987}
}


RealizedLibrary

Information

The Oxford-Man Institute’s “realised library” contains daily non-parametric measures of how volatile financial assets or indexes were in the past.

For more information or to download the dataset, please visit the dataset website.

Available Assets and realized measures of volatility

List of available assets:

| Symbol | Name | Earliest Available | Latest Available |
|---|---|---|---|
| AEX | AEX index | January 03, 2000 | February 02, 2021 |
| AORD | All Ordinaries | January 04, 2000 | February 02, 2021 |
| BFX | BEL 20 Index | January 03, 2000 | February 02, 2021 |
| BSESN | S&P BSE Sensex | January 03, 2000 | February 02, 2021 |
| BVLG | PSI All-Share Index | October 15, 2012 | February 02, 2021 |
| BVSP | BVSP BOVESPA Index | January 03, 2000 | February 02, 2021 |
| DJI | Dow Jones Industrial Average | January 03, 2000 | February 02, 2021 |
| FCHI | CAC 40 | January 03, 2000 | February 02, 2021 |
| FTMIB | FTSE MIB | June 01, 2009 | February 02, 2021 |
| FTSE | FTSE 100 | January 04, 2000 | February 02, 2021 |
| GDAXI | DAX | January 03, 2000 | February 02, 2021 |
| GSPTSE | S&P/TSX Composite index | May 02, 2002 | February 02, 2021 |
| HSI | HANG SENG Index | January 03, 2000 | February 02, 2021 |
| IBEX | IBEX 35 Index | January 03, 2000 | February 02, 2021 |
| IXIC | Nasdaq 100 | January 03, 2000 | February 02, 2021 |
| KS11 | Korea Composite Stock Price Index (KOSPI) | January 04, 2000 | February 02, 2021 |
| KSE | Karachi SE 100 Index | January 03, 2000 | February 02, 2021 |
| MXX | IPC Mexico | January 03, 2000 | February 02, 2021 |
| N225 | Nikkei 225 | February 02, 2000 | February 02, 2021 |
| NSEI | NIFTY 50 | January 03, 2000 | February 02, 2021 |
| OMXC20 | OMX Copenhagen 20 Index | October 03, 2005 | February 02, 2021 |
| OMXHPI | OMX Helsinki All Share Index | October 03, 2005 | February 02, 2021 |
| OMXSPI | OMX Stockholm All Share Index | October 03, 2005 | February 02, 2021 |
| OSEAX | Oslo Exchange All-share Index | September 03, 2001 | February 02, 2021 |
| RUT | Russell 2000 | January 03, 2000 | February 02, 2021 |
| SMSI | Madrid General Index | July 04, 2005 | February 02, 2021 |
| SPX | S&P 500 Index | January 03, 2000 | February 02, 2021 |
| SSEC | Shanghai Composite Index | January 04, 2000 | February 02, 2021 |
| SSMI | Swiss Stock Market Index | January 04, 2000 | February 02, 2021 |
| STI | Straits Times Index | January 03, 2000 | February 02, 2021 |
| STOXX50E | EURO STOXX 50 | January 03, 2000 | February 02, 2021 |

List of realized measures of daily volatility that can be used to assess the predictive performance of volatility models:

| Code | Description |
|---|---|
| bv | Bipower Variation (5-min) |
| bv_ss | Bipower Variation (5-min Sub-sampled) |
| medrv | Median Realized Variance (5-min) |
| rk_parzen | Realized Kernel Variance (Non-Flat Parzen) |
| rk_th2 | Realized Kernel Variance (Tukey-Hanning(2)) |
| rk_twoscale | Realized Kernel Variance (Two-Scale/Bartlett) |
| rsv | Realized Semi-variance (5-min) |
| rsv_ss | Realized Semi-variance (5-min Sub-sampled) |
| rv10 | Realized Variance (10-min) |
| rv10_ss | Realized Variance (10-min Sub-sampled) |
| rv5 | Realized Variance (5-min) |
| rv5_ss | Realized Variance (5-min Sub-sampled) |
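
One common way to use a realized measure for model assessment is to treat it as a proxy for the true daily variance and score a model's variance forecasts against it. The Python sketch below computes two widely used loss functions, mean squared error and QLIKE. The page does not prescribe these, the function names are hypothetical, and the numbers are made up for illustration rather than taken from the library.

```python
import math


def mse_loss(forecasts, realized):
    """Mean squared error between variance forecasts and a realized measure."""
    if len(forecasts) != len(realized) or not forecasts:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - f) ** 2 for f, r in zip(forecasts, realized)) / len(forecasts)


def qlike_loss(forecasts, realized):
    """QLIKE loss: mean of r/f - log(r/f) - 1, which is zero when f equals r."""
    if len(forecasts) != len(realized) or not forecasts:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(f <= 0 for f in forecasts) or any(r <= 0 for r in realized):
        raise ValueError("QLIKE needs strictly positive forecasts and realized values")
    total = 0.0
    for f, r in zip(forecasts, realized):
        ratio = r / f
        total += ratio - math.log(ratio) - 1.0
    return total / len(forecasts)


# Made-up daily variances for illustration only (e.g. standing in for rv5).
realized_variance = [1.2e-4, 0.8e-4, 2.0e-4]
model_forecasts = [1.0e-4, 1.0e-4, 1.5e-4]
print(mse_loss(model_forecasts, realized_variance))
print(qlike_loss(model_forecasts, realized_variance))
```

Lower values indicate better forecasts under both losses. QLIKE depends only on the ratio of realized to forecast variance, so it is less dominated by high-volatility days than MSE.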

Citation

Researchers may use this library freely without restrictions so long as they quote in any work which uses it:

Heber, Gerd, Asger Lunde, Neil Shephard and Kevin Sheppard (2009) “Oxford-Man Institute’s realized library”, Oxford-Man Institute, University of Oxford

Or as a BibTeX entry:

@misc{Heber:2009,
author = "Heber, Gerd and Lunde, Asger and Shephard, Neil and Sheppard, Kevin",
year = "2009",
title = "Oxford-Man Institute’s realized library",
url = "https://realized.oxford-man.ox.ac.uk/",
institution = "Oxford-Man Institute, University of Oxford" }