Training finds parameter values w_{i,j}, c_i, and b_j that minimize the cost. The first component of the cost function is the negative log-likelihood, which can be optimized using the contrastive divergence approximation; the second component is a sparsity regularization term, which can be optimized using gradient descent. Training proceeds layer by layer, as with the standard DBN. However, training of the RBM must take into account the sparsity term. The training algorithm for the RBM repeats the two steps below until convergence:

1. Update the parameters using contrastive divergence, as explained in the previous section.
2. Update the parameters using gradient descent on the sparsity term.

The result of the simulation is that model V2 neurons are obtained that capture some of the biologically observed response properties of V2 neurons in the monkey (Hegde and Van Essen, 2000).
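To make the two-step update concrete, the following is a minimal Python sketch, not the implementation referred to in the text: it assumes a binary RBM with logistic units, a single CD-1 step, and a squared-deviation sparsity penalty that drives the mean hidden activation toward a target rate. The names SparseRBM, cd_step, sparsity_step, p_target, and sparsity_weight are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SparseRBM:
    """Sketch of an RBM trained by alternating a contrastive divergence update
    with a gradient descent step on a sparsity penalty (illustrative only)."""

    def __init__(self, n_visible, n_hidden, lr=0.01, p_target=0.05, sparsity_weight=0.1):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # weights w_{i,j}
        self.c = np.zeros(n_visible)  # visible biases c_i
        self.b = np.zeros(n_hidden)   # hidden biases b_j
        self.lr = lr
        self.p_target = p_target                # desired mean hidden activation
        self.sparsity_weight = sparsity_weight  # weight of the sparsity term

    def cd_step(self, v0):
        # Step 1: approximate the gradient of the negative log-likelihood with CD-1.
        h0 = sigmoid(v0 @ self.W + self.b)                     # P(h = 1 | v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)  # sampled hidden states
        v1 = sigmoid(h0_sample @ self.W.T + self.c)            # reconstruction
        h1 = sigmoid(v1 @ self.W + self.b)                     # P(h = 1 | v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += self.lr * (h0 - h1).mean(axis=0)
        self.c += self.lr * (v0 - v1).mean(axis=0)

    def sparsity_step(self, v0):
        # Step 2: gradient descent on the sparsity regularizer
        # sparsity_weight * sum_j (mean_n P(h_j = 1 | v_n) - p_target)^2.
        h = sigmoid(v0 @ self.W + self.b)
        q = h.mean(axis=0)               # mean activation of each hidden unit
        s = h * (1.0 - h)                # derivative of the logistic function
        err = 2.0 * (q - self.p_target)  # d(penalty)/dq_j
        self.b -= self.lr * self.sparsity_weight * err * s.mean(axis=0)
        self.W -= self.lr * self.sparsity_weight * err * (v0.T @ s) / v0.shape[0]

    def fit(self, data, epochs=100):
        # Repeat the two updates until (approximate) convergence.
        for _ in range(epochs):
            self.cd_step(data)
            self.sparsity_step(data)

Training one layer then amounts to calling fit on (binarized) input patches; the next RBM of the DBN is trained in the same way on the hidden activations of the layer below, following the usual greedy, layer-by-layer scheme.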
In other words, when the distributions that describe the data are Gaussians with a common covariance matrix, the log ratio of the posteriors is a linear function of x plus constants, where “constants” refers to all terms that do not depend on x. Thus, in logistic regression, all we do is to go one step ahead and adopt such a linear model, irrespective of the underlying data distributions. Moreover, even if the data are distributed according to Gaussians, it may still be preferable to adopt the logistic regression formulation instead of that in (7.36). In the latter formulation, the covariance matrix has to be estimated, amounting to O(l^2/2) parameters, whereas the logistic regression formulation involves only l + 1 parameters. That is, once we know about the linear dependence of the log ratio on x, we can use this a priori information to simplify the model. Of course, if the Gaussian assumption is valid and one can obtain good estimates of the covariance matrix, employing this extra information can lead to more efficient estimates, in the sense of lower variance. This is natural, because more information concerning the distribution of the data is exploited. In practice, however, it turns out that logistic regression is, in general, a safer bet compared to linear discriminant analysis (LDA).
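To see why the log ratio is linear under the common-covariance Gaussian assumption, here is a short derivation in generic notation; the class labels ω_1, ω_2 and the parameter names θ, θ_0 are illustrative and need not match the notation used around (7.36). With a shared covariance matrix Σ, the quadratic terms in x cancel:

\[
\ln\frac{P(\omega_1 \mid x)}{P(\omega_2 \mid x)}
 = \ln\frac{\mathcal{N}(x;\mu_1,\Sigma)}{\mathcal{N}(x;\mu_2,\Sigma)} + \ln\frac{P(\omega_1)}{P(\omega_2)}
 = \underbrace{(\mu_1-\mu_2)^T\Sigma^{-1}}_{\theta^T}\,x
   \;\underbrace{-\,\tfrac{1}{2}\left(\mu_1^T\Sigma^{-1}\mu_1-\mu_2^T\Sigma^{-1}\mu_2\right)
   +\ln\frac{P(\omega_1)}{P(\omega_2)}}_{\theta_0\ (\text{all terms independent of } x)}.
\]

Fitting the linear model directly requires only the l components of θ plus θ_0, that is, l + 1 parameters, whereas the Gaussian formulation requires, besides the class means, the l(l+1)/2 free entries of the symmetric covariance matrix, which is the O(l^2/2) count quoted above.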