Summary of "Lec 44 Practical session - 2"
This lecture covers the practical aspects of designing and training a neural network, focusing on optimizers, data preprocessing, model building, training, evaluation, and fine-tuning, specifically using a multi-layer perceptron (MLP) for a binary classification problem. The session transitions from theory to hands-on coding in Google Colab.
Main Ideas and Concepts
1. Optimizers in Neural Networks
- Purpose: Optimizers reduce the loss by updating the weights during training.
- Common Optimizers Discussed:
- Adam (Adaptive Moment Estimation): Most commonly used; adapts the learning rate during training.
- Stochastic Gradient Descent (SGD): Can be used with or without momentum.
- RMSProp (Root Mean Square Propagation)
- Adagrad
- Adadelta
- Adamax: Variant of Adam.
- Gradient Descent Variants: Batch Gradient Descent, Mini-batch Gradient Descent, and Stochastic Gradient Descent.
- Batch Size Impact:
- Batch Gradient Descent: Batch size = entire training set.
- Mini-batch Gradient Descent: Batch size = subset (e.g., 32 samples).
- SGD: Batch size = 1 sample.
- The optimizer is a hyperparameter and should be chosen and tuned for the problem at hand.
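For reference, here is a minimal sketch (assuming TensorFlow/Keras, which the session uses for model building) of how these optimizers can be instantiated; the learning-rate values are illustrative defaults, not values from the lecture.

```python
# Minimal sketch of instantiating the optimizers listed above in Keras.
# Learning rates are illustrative defaults, not values from the lecture.
from tensorflow import keras

adam     = keras.optimizers.Adam(learning_rate=0.001)               # adaptive moment estimation
sgd      = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)   # set momentum=0.0 for plain SGD
rmsprop  = keras.optimizers.RMSprop(learning_rate=0.001)
adagrad  = keras.optimizers.Adagrad()
adadelta = keras.optimizers.Adadelta()
adamax   = keras.optimizers.Adamax()

# The chosen optimizer is passed to model.compile(optimizer=...), which is why
# it can be treated as a tunable hyperparameter.
```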
2. Neural Network Design for Different Problems
- Regression Problem:
- Output layer neurons = 1.
- Activation function = Linear.
- Loss functions = Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE).
- Metrics = MSE, MAE, etc.
- Classification Problem:
- Binary Classification:
- Output neurons = 1.
- Activation function = Sigmoid.
- Loss function = Binary Cross-Entropy.
- Metric = Accuracy.
- Multiclass Classification:
- Output neurons = number of classes.
- Activation function = Softmax.
- Loss function = Categorical Cross-Entropy (for one-hot encoded labels) or Sparse Categorical Cross-Entropy (for integer labels).
- Hidden Layers:
- No fixed restriction on number of neurons or layers.
- Activation function typically ReLU (Rectified Linear Unit), with variants like Leaky ReLU or Parametric ReLU.
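As a quick reference, the output-layer and activation pairings above can be expressed in Keras roughly as follows; the five-class count in the multiclass example is illustrative, not from the lecture.

```python
# Sketch of output layers for the three problem types discussed above.
from tensorflow.keras import layers

regression_output = layers.Dense(1, activation="linear")    # 1 neuron, linear; paired with MSE/MAE loss
binary_output     = layers.Dense(1, activation="sigmoid")   # 1 neuron, sigmoid; paired with binary cross-entropy

num_classes = 5                                              # illustrative class count
multiclass_output = layers.Dense(num_classes, activation="softmax")  # softmax; (sparse) categorical cross-entropy
```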
3. Data Preprocessing
- Reading Data:
- Using pandas read_csv for CSV files.
- For Excel files, use read_excel.
- Separating Features and Labels:
- Features assigned to X by dropping the label column.
- Labels assigned to Y.
- Feature Scaling:
- Important to scale features to avoid bias due to differing value ranges.
- MinMaxScaler: Scales features to the range [0, 1] using the formula x_scaled = (x - x_min) / (x_max - x_min).
- StandardScaler: Another scaling method (mean = 0, variance = 1).
- Fit-transform approach: first compute the scaling parameters (fit), then apply the transformation (transform).
- Train-Test Split:
- Using train_test_split from sklearn.
- Typical split: 80% training, 20% testing (can vary).
- Use random_state for reproducibility.
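Putting these preprocessing steps together, a minimal sketch is shown below; the file name data.csv and the label column target are placeholders, not names from the lecture.

```python
# Preprocessing sketch: "data.csv" and the label column "target" are placeholders.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv")          # read_excel would be used for an Excel file

X = df.drop(columns=["target"])       # features: all columns except the label
Y = df["target"]                      # labels

scaler = MinMaxScaler()               # scales each feature to [0, 1]
X_scaled = scaler.fit_transform(X)    # fit computes x_min/x_max, transform applies the formula

# 80/20 split with a fixed random_state for reproducibility
X_train, X_test, Y_train, Y_test = train_test_split(
    X_scaled, Y, test_size=0.2, random_state=42
)
```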
4. Building the Neural Network Model
- Framework: Keras (part of TensorFlow).
- Model Type: Sequential model with fully connected (Dense) layers.
- Input Layer: Input dimension = number of features (e.g., 49).
- Hidden Layers: Example given with two hidden layers, each with 32 neurons and ReLU activation.
- Output Layer: One neuron with sigmoid activation for binary classification.
- Compiling Model:
- Specify optimizer (e.g., Adam).
- Specify loss function (binary cross-entropy for binary classification).
- Specify metrics (accuracy).
- Training Model:
- Use model.fit() with training features and labels.
- Specify the number of epochs (e.g., 15).
- Specify the batch size (e.g., 32).
- Batch size controls how many samples are processed before the weights are updated.
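A minimal Keras sketch of the model described in this section, assuming the X_train and Y_train arrays from the preprocessing step and the 49 input features mentioned above:

```python
# MLP sketch for binary classification, assuming X_train/Y_train from the
# preprocessing step and 49 input features.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(49,)),               # input dimension = number of features
    layers.Dense(32, activation="relu"),    # hidden layer 1
    layers.Dense(32, activation="relu"),    # hidden layer 2
    layers.Dense(1, activation="sigmoid"),  # output layer for binary classification
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.fit(X_train, Y_train, epochs=15, batch_size=32)
```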
5. Model Evaluation and Prediction
- Prediction:
- Use model.predict() on unseen test data (X_test).
- Output values are probabilities between 0 and 1 (due to the sigmoid activation).
- Threshold at 0.5 to convert probabilities to class labels (0 or 1).
- Visualizing Predictions:
- Use np.column_stack to compare predicted labels and true labels side by side.
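A minimal sketch of the prediction and comparison steps, assuming the trained model and the X_test/Y_test split from earlier:

```python
# Prediction sketch, assuming the trained model and X_test/Y_test from earlier.
import numpy as np

probabilities = model.predict(X_test)                          # sigmoid outputs in [0, 1]
predicted_labels = (probabilities > 0.5).astype(int).ravel()   # threshold at 0.5

# Predicted vs. true labels side by side for inspection
comparison = np.column_stack((predicted_labels, np.asarray(Y_test)))
print(comparison[:10])
```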
Category: Educational