
LogisticRegression class Github code

Create Logistic Regression model object


Syntax

Mdl = LogisticRegression(NumFeatures,Name,Value)

Description

Mdl = LogisticRegression(NumFeatures,Name,Value) returns the LogisticRegression model object Mdl given the number of model parameters NumFeatures. Name and Value specify additional options using one or more name-value pair arguments. For example, users can specify different choices of prior for the model parameters.
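A minimal sketch of the call, using only the arguments documented on this page (the number of parameters and the prior and cut-off values are illustrative):

```matlab
% Create a LogisticRegression model with 10 parameters,
% a N(0,5) prior on the coefficients, and a 0.6 cut-off probability
Mdl = LogisticRegression(10,...
                         'Prior',{'Normal',[0,5]},...
                         'CutOff',0.6);
```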

See: Input Arguments, Output Argument, Examples


Input Arguments

NumFeatures - Number of model parameters

Data type: positive integer


Number of model parameters, which is also the number of features (covariates or independent variables) in the data.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Prior',{'Normal',[0,10]},'CutOff',0.6 specifies that the prior distribution of the model parameters is a normal distribution with mean $0$ and variance $10$, and that the cut-off probability is $0.6$.

'Prior' - Prior distribution of model parameters

Data Type: Cell Array


Prior distribution of the model parameters, specified as the comma-separated pair consisting of 'Prior' and a cell array of distribution name and distribution parameters. Distribution parameters are specified as an array.

| Prior Name | Description |
|:-----------|:------------|
| 'Normal' | Normal distribution $\mathcal{N}(\mu,\sigma^2)$ (default) |
| 'Gamma' | Gamma distribution $\Gamma(\alpha,\beta)$ |
| 'Inverse-Gamma' | Inverse-Gamma distribution $\Gamma^{-1}(\alpha,\beta)$ |

Default: {'Normal',[0,1]}

Example: 'Prior',{'Normal',[0,10]}

'CutOff' - Cut-off probability

Data Type: Float


The cut-off probability for class assignment. Must be between 0 and 1.

Default: 0.5

Example: 'CutOff',0.6

'Description' - Model description

Data Type: string


Model description, specified as a string scalar or character vector. Provide additional information about the model.

Default: Empty string

Example: 'Description','Logistic Regression model with gamma prior'


Output Arguments

Mdl - LogisticRegression Object

Data type: LogisticRegression Object


Mdl is an object of the LogisticRegression class with pre-defined properties and methods.

Object Properties

The LogisticRegression object properties include model-specific information, coefficient estimates, and the fitting method.

| Properties | Data type | Description |
|:-----------|:----------|:------------|
| ModelName | string (r) | Name of the model, which is 'LogisticRegression' |
| NumParams | integer (+) | Number of model parameters |
| Cutoff | float | Cut-off probability |
| Post * | struct | Information about the method used to fit the model |
| Coefficient * | cell array (r) | Estimated mean of the model parameters, used for point prediction on new test data |
| CoefficientVar * | cell array (r) | Variance of coefficient estimates |
| LogLikelihood * | double (r) | Log-likelihood of the fitted model |
| FittedValue * | array (r) | Fitted probabilities |
{: .text-center}

Notation:

  • * $\rightarrow$ object properties which are only available when the model is fitted. Default value is None.
  • (+) $\rightarrow$ positive number.
  • (r) $\rightarrow$ read-only properties.
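Before fitting, only the non-starred properties are populated. A small sketch, assuming standard MATLAB dot-access to these properties (the values in the comments follow the table above):

```matlab
Mdl = LogisticRegression(8);   % 8 model parameters
Mdl.ModelName                  % 'LogisticRegression'
Mdl.NumParams                  % 8
Mdl.Cutoff                     % 0.5 (the default)
Mdl.Post                       % None until the model is fitted
```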

Object Methods

Use the object methods to initialize model parameters, predict responses, and visualize predictions.

| Method | Description |
|:-------|:------------|
| vbayesInit | Initialization method of model parameters |
| vbayesPredict | Predict responses of fitted LogisticRegression models |
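As a hedged sketch of the prediction step: the exact signature of vbayesPredict is not documented on this page, so the call pattern below (a fitted model plus a test matrix in the same format as the training data) is an assumption:

```matlab
% Hypothetical usage of vbayesPredict after fitting Mdl
Pred = vbayesPredict(Mdl,...      % Fitted LogisticRegression model
                     data_test);  % Test data, same matrix format as training
```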

Examples

Fit a LogisticRegression model for binary response Github code

Fit a LogisticRegression model to LabourForce data using CGVB

Load the LabourForce data using the readData() function. The data is a matrix whose last column is the response variable. Set the 'Intercept' argument to true to add a column of ones to the data matrix as the intercept.

% Load the LabourForce dataset
labour = readData('LabourForce',...    % Dataset name
                  'Type','Matrix',...  % Store data as a 2D array (default)
                  'Intercept',true);   % Add column of intercept (default)

Create a LogisticRegression model object by specifying the number of parameters as the input argument. Change the variance of the normal prior to $10$.

% Number of input features
n_features = size(labour,2)-1;
% Define a LogisticRegression model object
Mdl = LogisticRegression(n_features,...
                         'Prior',{'Normal',[0,10]});

Run CGVB to obtain VB approximation of the posterior distribution of model parameters.

%% Run Cholesky GVB with random initialization
Post_CGVB = CGVB(Mdl,labour,...
                'LearningRate',0.002,...  % Learning rate
                'NumSample',50,...        % Number of samples to estimate gradient of lowerbound
                'MaxPatience',20,...      % For Early stopping
                'MaxIter',5000,...        % Maximum number of iterations
                'GradWeight1',0.9,...     % Momentum weight 1
                'GradWeight2',0.9,...     % Momentum weight 2
                'WindowSize',10,...       % Smoothing window for lowerbound
                'StepAdaptive',500,...    % For adaptive learning rate
                'GradientMax',10,...      % For gradient clipping
                'LBPlot',false);          % Do not plot the lowerbound at the end

Given the estimation results, we can plot the variational distribution together with the lowerbound to check the performance of the CGVB algorithm.

% Plot variational distributions and lowerbound 
figure
% Extract variational mean and variance
mu_vb     = Post_CGVB.Post.mu;
sigma2_vb = Post_CGVB.Post.sigma2;

% Plot the variational distribution for the first 8 parameters
for i=1:n_features
    subplot(3,3,i)
    vbayesPlot('Density',{'Normal',[mu_vb(i),sigma2_vb(i)]})
    grid on
    title(['\theta_',num2str(i)])
    set(gca,'FontSize',15)
end

% Plot the smoothed lower bound
subplot(3,3,9)
plot(Post_CGVB.Post.LB_smooth,'LineWidth',2)
grid on
title('Lower bound')
set(gca,'FontSize',15)

The lowerbound plot, which increases and then stabilizes over the iterations, indicates that the CGVB algorithm is working properly.


Reference

[1] Tran, M.-N., Nguyen, T.-N., Nott, D., and Kohn, R. (2020). Bayesian deep net GLM and GLMM. Journal of Computational and Graphical Statistics, 29(1):97-113. Read the paper


See Also

DeepGLM $\mid$ rech $\mid$ Custom model $\mid$ CGVB