Binary sentiment classification using neural network (IMDB)
Project description
Input: 25000 highly polar movie reviews. The first 10000 reviews are kept as the development set while the rest is used for training set. There is an additional test set of 25000 reviews.
Goal: Classifying a new review as positive or negative.
Network model: It is shown in the following figure:
Details: Optimization of stochastic gradient descent algorithm is done using RMSProp. Accuracy is the performance metric and binary cross-entropy is the loss function.
The resulting loss and accuracy are shown in the following figures.
Modification:
-
To combat the overfitting problem, L2 regularization is used with weight coefficient \(lambda=0.001\). The entire 25000 reviews are used for training and then the resulting trained model is evaluated on the test set of 25000 reviews. The accuracy for the test set is 0.92 after 4 epochs.
-
As another approach to overcome overfitting problem, dropout regularization is used. The training, development and test tests are defined in the same way as original problem. The probability that each unit is kept is equal to 0.5.
The source code of this project is available in my Github page.