Summary of "Lec 44 Practical session - 2"
This lecture covers the practical aspects of designing and training a neural network, focusing on optimizers, data preprocessing, model building, training, evaluation, and fine-tuning, specifically using a multi-layer perceptron (MLP) for a binary classification problem. The session transitions from theory to hands-on coding in Google Colab.
Main Ideas and Concepts
1. Optimizers in Neural Networks
- Purpose: Optimizers reduce the loss by updating the weights during training.
- Common Optimizers Discussed:
- Adam (Adaptive Moment Estimation): Most commonly used; adapts the learning rate during training.
- Stochastic Gradient Descent (SGD): Can be used with or without momentum.
- RMSProp (Root Mean Square Propagation)
- Adagrad
- Adadelta
- Adamax: Variant of Adam.
- Gradient Descent Variants: Batch Gradient Descent, Mini-batch Gradient Descent, and Stochastic Gradient Descent.
- Batch Size Impact:
- Batch Gradient Descent: Batch size = entire training set.
- Mini-batch Gradient Descent: Batch size = subset (e.g., 32 samples).
- SGD: Batch size = 1 sample.
- The optimizer is a hyperparameter and should be chosen and tuned for the problem at hand.
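For reference, here is a minimal sketch (assuming TensorFlow/Keras, which the session uses for model building) of how these optimizers can be instantiated; the learning-rate values are illustrative defaults, not values from the lecture.

```python
# Minimal sketch of instantiating the optimizers listed above in Keras.
# Learning rates are illustrative defaults, not values from the lecture.
from tensorflow import keras

adam     = keras.optimizers.Adam(learning_rate=0.001)               # adaptive moment estimation
sgd      = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)   # set momentum=0.0 for plain SGD
rmsprop  = keras.optimizers.RMSprop(learning_rate=0.001)
adagrad  = keras.optimizers.Adagrad()
adadelta = keras.optimizers.Adadelta()
adamax   = keras.optimizers.Adamax()

# The chosen optimizer is passed to model.compile(optimizer=...), which is why
# it can be treated as a tunable hyperparameter.
```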
2. Neural Network Design for Different Problems
- Regression Problem:
- Output layer neurons = 1.
- Activation function = Linear.
- Loss functions = Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE).
- Metrics = MSE, MAE, etc.
- Classification Problem:
- Binary Classification:
- Output neurons = 1.
- Activation function = Sigmoid.
- Loss function = Binary Cross-Entropy.
- Metric = Accuracy.
- Multiclass Classification:
- Output neurons = number of classes.
- Activation function = Softmax.
- Loss function = Categorical Cross-Entropy (for one-hot encoded labels) or Sparse Categorical Cross-Entropy (for integer labels).
- Hidden Layers:
- No fixed restriction on number of neurons or layers.
- Activation function typically ReLU (Rectified Linear Unit), with variants like Leaky ReLU or Parametric ReLU.
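As a quick reference, the output-layer and activation pairings above can be expressed in Keras roughly as follows; the five-class count in the multiclass example is illustrative, not from the lecture.

```python
# Sketch of output layers for the three problem types discussed above.
from tensorflow.keras import layers

regression_output = layers.Dense(1, activation="linear")    # 1 neuron, linear; paired with MSE/MAE loss
binary_output     = layers.Dense(1, activation="sigmoid")   # 1 neuron, sigmoid; paired with binary cross-entropy

num_classes = 5                                              # illustrative class count
multiclass_output = layers.Dense(num_classes, activation="softmax")  # softmax; (sparse) categorical cross-entropy
```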
3. Data Preprocessing
- Reading Data:
- Using pandas read_csv for CSV files.
- For Excel files, use read_excel.
- Separating Features and Labels:
- Features assigned to X by dropping the label column.
- Labels assigned to Y.
- Feature Scaling:
- Important to scale features to avoid bias due to differing value ranges.
- MinMaxScaler: Scales features to the range [0, 1] using the formula x_scaled = (x - x_min) / (x_max - x_min).
- StandardScaler: Another scaling method (mean = 0, variance = 1).
- Fit-transform approach: first compute the scaling parameters (fit), then apply the transformation (transform).
- Train-Test Split:
- Using train_test_split from sklearn.
- Typical split: 80% training, 20% testing (can vary).
- Use random_state for reproducibility.
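Putting these preprocessing steps together, a minimal sketch is shown below; the file name data.csv and the label column target are placeholders, not names from the lecture.

```python
# Preprocessing sketch: "data.csv" and the label column "target" are placeholders.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv")          # read_excel would be used for an Excel file

X = df.drop(columns=["target"])       # features: all columns except the label
Y = df["target"]                      # labels

scaler = MinMaxScaler()               # scales each feature to [0, 1]
X_scaled = scaler.fit_transform(X)    # fit computes x_min/x_max, transform applies the formula

# 80/20 split with a fixed random_state for reproducibility
X_train, X_test, Y_train, Y_test = train_test_split(
    X_scaled, Y, test_size=0.2, random_state=42
)
```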
4. Building the Neural Network Model
- Framework: Keras (part of TensorFlow).
- Model Type: Sequential model with fully connected (Dense) layers.
- Input Layer: Input dimension = number of features (e.g., 49).
- Hidden Layers: Example given with two hidden layers, each with 32 neurons and ReLU activation.
- Output Layer: One neuron with sigmoid activation for binary classification.
- Compiling Model:
- Specify optimizer (e.g., Adam).
- Specify loss function (binary cross-entropy for binary classification).
- Specify metrics (accuracy).
- Training Model:
- Use model.fit() with training features and labels.
- Specify the number of epochs (e.g., 15).
- Specify the batch size (e.g., 32).
- Batch size controls how many samples are processed before the weights are updated.
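A minimal Keras sketch of the model described in this section, assuming the X_train and Y_train arrays from the preprocessing step and the 49 input features mentioned above:

```python
# MLP sketch for binary classification, assuming X_train/Y_train from the
# preprocessing step and 49 input features.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(49,)),               # input dimension = number of features
    layers.Dense(32, activation="relu"),    # hidden layer 1
    layers.Dense(32, activation="relu"),    # hidden layer 2
    layers.Dense(1, activation="sigmoid"),  # output layer for binary classification
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.fit(X_train, Y_train, epochs=15, batch_size=32)
```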
5. Model Evaluation and Prediction
- Prediction:
- Use model.predict() on unseen test data (X_test).
- Output values are probabilities between 0 and 1 (due to the sigmoid activation).
- Threshold at 0.5 to convert probabilities to class labels (0 or 1).
- Visualizing Predictions:
- Use np.column_stack to compare predicted labels and true labels side by side.
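A minimal sketch of the prediction and comparison steps, assuming the trained model and the X_test/Y_test split from earlier:

```python
# Prediction sketch, assuming the trained model and X_test/Y_test from earlier.
import numpy as np

probabilities = model.predict(X_test)                          # sigmoid outputs in [0, 1]
predicted_labels = (probabilities > 0.5).astype(int).ravel()   # threshold at 0.5

# Predicted vs. true labels side by side for inspection
comparison = np.column_stack((predicted_labels, np.asarray(Y_test)))
print(comparison[:10])
```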
Category: Educational