DL Topology Example

Synopsis

  • Each neuron produces a single output value and forwards it to every neuron in the next layer.
  • Each neuron uses N weights (plus a bias), where N is the number of neurons in the previous layer.
  • The first layer of an NLP model is usually an embedding layer.
  • The embedding layer is best viewed as a lookup table with vocab_size rows and embedding_dim columns, not as vocab_size neurons. (Clarify: TODO)
  • How the batch size and the number of input words determine the tensor dimensions (see the sketch after this list). (Clarify)
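
A minimal sketch, assuming a toy vocabulary of 100 tokens and an embedding width of 8 (illustrative numbers, not from the notes above), showing how the batch size and the number of input words determine the tensor shapes coming out of an embedding layer:

import torch
import torch.nn as nn

vocab_size = 100     # illustrative vocabulary size
embedding_dim = 8    # illustrative embedding width
batch_size = 4       # rows processed together
seq_len = 6          # number of input words per row

# The embedding layer is a lookup table of shape (vocab_size, embedding_dim)
embedding = nn.Embedding(vocab_size, embedding_dim)

# Token ids: one integer per word, shape (batch_size, seq_len)
token_ids = torch.randint(0, vocab_size, (batch_size, seq_len))

# Each token id is replaced by its embedding row
embedded = embedding(token_ids)
print(embedded.shape)  # torch.Size([4, 6, 8]) = (batch_size, seq_len, embedding_dim)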

Prompt: Explain the topology of embedding and hidden layers

prompt1: I am having trouble understanding the transformation done by each neuron in the deep learning layers and the associated dimensions. Assume there are 5 features, a couple of hidden layers with different numbers of neurons, and the goal is classification into 3 classes. I understand there is a weight matrix associated with each neuron and there is an activation function as well. Assume appropriate ReLU, sigmoid or similar activation functions in the right places. Now consider the first neuron in the first hidden layer: for each input row of features that it processes, what does it emit? Also consider the first neuron in the second hidden layer: what is the dimension of its input and what is the dimension it emits? Does each neuron contribute a single number that gets fed into all neurons in the next layer at that fixed position? Explain with simple working PyTorch code.
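
A minimal sketch, assuming 5 input features and 4 neurons in the first hidden layer (the 4 is an illustrative choice), showing that one neuron emits a single scalar per input row, while the whole layer emits one such scalar per neuron:

import torch
import torch.nn as nn

torch.manual_seed(0)

fc1 = nn.Linear(5, 4)        # 5 input features -> 4 neurons
x = torch.randn(1, 5)        # one input row with 5 features

# The first neuron: dot product of its 5 weights with the input row, plus its bias
w0 = fc1.weight[0]           # shape (5,)
b0 = fc1.bias[0]             # scalar
neuron0 = torch.relu(x[0] @ w0 + b0)            # a single number

layer_out = torch.relu(fc1(x))                  # shape (1, 4): one number per neuron
print(neuron0.item(), layer_out[0, 0].item())   # the two values match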

prompt2: I am confused about the dimensions of deep learning layers, activation functions, logit values and the softmax function. Just give a trivial example classification program using PyTorch that classifies a given input of features into 3 output categories. Use a hidden layer dimension greater than 3 to better clarify the dimensions.

Sonnet Explanation1

Certainly! I'll provide a simple example of a classification program using PyTorch that classifies input features into 3 categories. This example will help clarify the dimensions of deep learning layers, activation functions, logit values, and the softmax function:


import torch
import torch.nn as nn
import torch.optim as optim

# Define the neural network
class SimpleClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleClassifier, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()  # Rectified Linear Unit
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Set dimensions
input_size = 5
hidden_size = 10
num_classes = 3

# Create the model
model = SimpleClassifier(input_size, hidden_size, num_classes)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()  # expects raw logits; applies log-softmax + NLL internally
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Generate some dummy data
X = torch.randn(100, input_size)
y = torch.randint(0, num_classes, (100,))

# Training loop
for epoch in range(100):
    # Forward pass
    outputs = model(X)

    # Compute loss
    loss = criterion(outputs, y)

    # Backward pass and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/100], Loss: {loss.item():.4f}')

# Test the model
with torch.no_grad():
    test_input = torch.randn(1, input_size)
    output = model(test_input)
    _, predicted = torch.max(output, 1)
    print(f"Predicted class: {predicted.item()}")

# Print model architecture
print(model)

# Print shapes of intermediate outputs
x = torch.randn(1, input_size)
fc1_out = model.fc1(x)
relu_out = model.relu(fc1_out)
fc2_out = model.fc2(relu_out)

print(f"Input shape: {x.shape}")
print(f"FC1 output shape: {fc1_out.shape}")
print(f"ReLU output shape: {relu_out.shape}")
print(f"FC2 output shape (logits): {fc2_out.shape}")

# Apply softmax to get probabilities
softmax = nn.Softmax(dim=1)
probabilities = softmax(fc2_out)
print(f"Softmax output shape (probabilities): {probabilities.shape}")
print(f"Probabilities: {probabilities}")

Let's break down the dimensions and concepts:

  • Input layer:

    • Dimension: (batch_size, input_size) = (1, 5)
    • Each input sample has 5 features
  • First fully connected layer (fc1):

    • Input dimension: (1, 5)
    • Output dimension: (1, hidden_size) = (1, 10)
    • This layer transforms the input from 5 features to 10 features
  • ReLU activation:

    • Input and output dimensions: (1, 10)
    • Applies the element-wise ReLU function, keeping the shape unchanged
  • Second fully connected layer (fc2):

    • Input dimension: (1, 10)
    • Output dimension: (1, num_classes) = (1, 3)
    • This layer produces the logits for each class
  • Logits:

    • The output of the last layer (fc2) before applying softmax
    • Dimension: (1, 3)
    • These are unnormalized scores for each class
  • Softmax function:

    • Input: logits with dimension (1, 3)
    • Output: probabilities with dimension (1, 3)
    • Converts logits to probabilities that sum to 1

This example demonstrates how the dimensions change through the network and how the final output is transformed into class probabilities using the softmax function. The CrossEntropyLoss combines a log-softmax operation with the negative log-likelihood loss, so we don't need to apply softmax explicitly during training (a sketch of this equivalence follows below).
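
A minimal sketch with toy logits (not taken from the training run above) verifying both points: softmax just exponentiates and normalizes each row, and CrossEntropyLoss on raw logits matches NLLLoss applied to log-softmax output:

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(2, 3)      # 2 samples, 3 classes (illustrative)
targets = torch.tensor([0, 2])  # true class index per sample

# Softmax by hand: exponentiate, then normalize each row to sum to 1
manual_probs = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)
print(torch.allclose(manual_probs, F.softmax(logits, dim=1)))  # True

# CrossEntropyLoss(logits, y) == NLLLoss(log_softmax(logits), y)
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True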