DL Topology Example
prompt1: I am having trouble understanding the transformation done by each neuron in the deep learning layers and the associated dimensions. Assume there are 5 features, a couple of hidden layers with different numbers of neurons, and the goal is classification into 3 classes. I understand each neuron has an associated weight vector and there is an activation function as well. Assume appropriate ReLU, sigmoid, or similar activation functions at the right places. Consider the first neuron in the first hidden layer: for each input row of features that it processes, what does it emit? Also consider the first neuron in the second hidden layer: what is the dimension of its input, and what is the dimension of what it emits? Does each neuron contribute a single number that gets fed into all neurons in the next layer at a fixed position? Explain with simple working PyTorch code.
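(As a quick sketch of prompt1's single-neuron question, before the full program below, and assuming 5 input features and a first hidden layer of 10 neurons: each neuron holds its own weight vector of length 5 plus a bias, and for each input row it emits a single number, here relu(w·x + b); the 10 numbers emitted by the layer form the length-10 vector that every neuron in the next hidden layer receives as input.)

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 5)        # one input row with 5 features
layer1 = nn.Linear(5, 10)    # first hidden layer: 10 neurons
relu = nn.ReLU()
# The first neuron of the first hidden layer: a weight vector of length 5 plus a bias,
# emitting a single number for this input row.
w0, b0 = layer1.weight[0], layer1.bias[0]
neuron0_out = relu(x[0] @ w0 + b0)
print(neuron0_out.shape)     # torch.Size([]) -- a single scalar
# The whole layer emits one number per neuron, i.e. a length-10 vector per row;
# this vector is the input seen by every neuron in the second hidden layer.
h1 = relu(layer1(x))
print(h1.shape)              # torch.Size([1, 10])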
prompt2: I am confused about the dimensions of deep learning layers, activation functions, logit values, and the softmax function. Just give a trivial example classification program using PyTorch that classifies a given input of features into 3 output categories. Use a hidden layer dimension greater than 3 to make the dimensions clearer.
Certainly! Here's a trivial classification program using PyTorch that classifies input features into 3 categories. It will help clarify the dimensions of the layers, the activation function, the logit values, and the softmax function:
import torch
import torch.nn as nn
import torch.optim as optim
# Define the neural network
class SimpleClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleClassifier, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()  # Rectified Linear Unit
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
# Set dimensions
input_size = 5
hidden_size = 10
num_classes = 3
# Create the model
model = SimpleClassifier(input_size, hidden_size, num_classes)
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Generate some dummy data
X = torch.randn(100, input_size)
y = torch.randint(0, num_classes, (100,))
# Training loop
for epoch in range(100):
    # Forward pass
    outputs = model(X)
    # Compute loss
    loss = criterion(outputs, y)
    # Backward pass and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/100], Loss: {loss.item():.4f}')
# Test the model
with torch.no_grad():
    test_input = torch.randn(1, input_size)
    output = model(test_input)
    _, predicted = torch.max(output, 1)
    print(f"Predicted class: {predicted.item()}")
# Print model architecture
print(model)
# Print shapes of intermediate outputs
x = torch.randn(1, input_size)
fc1_out = model.fc1(x)
relu_out = model.relu(fc1_out)
fc2_out = model.fc2(relu_out)
print(f"Input shape: {x.shape}")
print(f"FC1 output shape: {fc1_out.shape}")
print(f"ReLU output shape: {relu_out.shape}")
print(f"FC2 output shape (logits): {fc2_out.shape}")
# Apply softmax to get probabilities
softmax = nn.Softmax(dim=1)
probabilities = softmax(fc2_out)
print(f"Softmax output shape (probabilities): {probabilities.shape}")
print(f"Probabilities: {probabilities}")
Let's break down the dimensions and concepts:
Input layer:
- Dimension: (1, input_size) = (1, 5) for a single sample

First fully connected layer (fc1):
- Input dimension: (1, 5)
- Output dimension: (1, hidden_size) = (1, 10)

ReLU activation:
- Input and output dimensions: (1, 10)
- Applies the element-wise ReLU function, keeping the shape unchanged

Second fully connected layer (fc2):
- Input dimension: (1, 10)
- Output dimension: (1, num_classes) = (1, 3)
- This layer produces the logits for each class

Logits:
- The output of the last layer (fc2) before applying softmax
- Dimension: (1, 3)
- These are unnormalized scores for each class

Softmax function:
- Input: logits with dimension (1, 3)
- Output: probabilities with dimension (1, 3)
- Converts logits to probabilities that sum to 1
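A side note on the prediction step earlier (a small check added here, reusing the fc2_out and probabilities tensors computed above): softmax is monotonic, so torch.max over the raw logits picks the same class as torch.max over the probabilities; softmax is only needed when you actually want probability values.

print(torch.argmax(fc2_out, dim=1))        # predicted class from the logits
print(torch.argmax(probabilities, dim=1))  # same class index, from the probabilities
print(probabilities.sum(dim=1))            # each row sums to 1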
This example demonstrates how the dimensions change through the network and how the final output is transformed into class probabilities by the softmax function. The CrossEntropyLoss used for training combines a log-softmax and the negative log-likelihood loss internally, so we don't need to apply softmax explicitly during training.
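That last point can be verified directly (a minimal standalone check, not part of the original program): nn.CrossEntropyLoss applied to raw logits gives the same value as a log-softmax followed by nn.NLLLoss.

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 3)            # a batch of 4 rows of raw class scores (logits)
targets = torch.tensor([0, 2, 1, 2])  # true class index for each row

loss_ce = nn.CrossEntropyLoss()(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss_ce, loss_manual))  # True -- CrossEntropyLoss = LogSoftmax + NLLLoss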