# Introduction to Neural Networks
## Lesson 4: Building Your First Neural Network

**Evolve AI Institute**

In this notebook, you'll learn to build and train a neural network from scratch to classify iris flowers.

### Learning Goals:
- Understand neural network architecture
- Implement forward and backward propagation
- Train a model and evaluate its performance
- Experiment with hyperparameters

## Part 1: Setup and Imports

First, let's import the libraries we'll need.

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)

print("Libraries imported successfully!")

## Part 2: Load and Explore the Data

We'll use the Iris dataset, a classic machine learning dataset of 150 flowers from 3 iris species, each described by 4 measurements.

In [None]:
# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features: sepal length, sepal width, petal length, petal width
y = iris.target  # Labels: 0=setosa, 1=versicolor, 2=virginica

print(f"Dataset shape: {X.shape}")
print(f"Number of samples: {X.shape[0]}")
print(f"Number of features: {X.shape[1]}")
print(f"Number of classes: {len(np.unique(y))}")
print(f"\nClass distribution: {np.bincount(y)}")
print(f"\nFeature names: {iris.feature_names}")
print(f"Target names: {iris.target_names}")

In [None]:
# Visualize the data
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
fig.suptitle('Iris Dataset Features', fontsize=16)

for idx, feature_idx in enumerate(range(4)):
    ax = axes[idx // 2, idx % 2]
    for target in range(3):
        ax.scatter(X[y == target, feature_idx], 
                  X[y == target, (feature_idx + 1) % 4],
                  label=iris.target_names[target],
                  alpha=0.6)
    ax.set_xlabel(iris.feature_names[feature_idx])
    ax.set_ylabel(iris.feature_names[(feature_idx + 1) % 4])
    ax.legend()

plt.tight_layout()
plt.show()

## Part 3: Data Preprocessing

Neural networks train more reliably when the input features share a similar scale, and the softmax output with cross-entropy loss used later expects one-hot encoded labels.

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set size: {X_train.shape[0]}")
print(f"Testing set size: {X_test.shape[0]}")

In [None]:
# Standardize the features (mean=0, std=1)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Feature scaling applied.")
print(f"Train mean: {X_train_scaled.mean(axis=0)}")
print(f"Train std: {X_train_scaled.std(axis=0)}")

In [None]:
# One-hot encode the labels
encoder = OneHotEncoder(sparse_output=False)
y_train_encoded = encoder.fit_transform(y_train.reshape(-1, 1))
y_test_encoded = encoder.transform(y_test.reshape(-1, 1))

print(f"Original label shape: {y_train.shape}")
print(f"Encoded label shape: {y_train_encoded.shape}")
print(f"\nExample encoding:")
print(f"Original: {y_train[:3]}")
print(f"Encoded:\n{y_train_encoded[:3]}")
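
To make the two preprocessing steps less of a black box, here is a minimal NumPy-only sketch of what standardization and one-hot encoding do. The toy arrays `X_toy` and `y_toy` are made-up illustrative values, not Iris data, and the sketch only mirrors the idea behind `StandardScaler` and `OneHotEncoder` rather than reproducing their full behavior.

```python
import numpy as np

# Toy data: 4 samples, 2 features (illustrative values, not from Iris)
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [4.0, 40.0]])
y_toy = np.array([0, 2, 1, 2])

# Standardize: subtract each column's mean, divide by its standard deviation
mean = X_toy.mean(axis=0)
std = X_toy.std(axis=0)  # population std (ddof=0)
X_toy_scaled = (X_toy - mean) / std

# One-hot encode: pick rows of the identity matrix by class index
num_classes = 3
y_toy_onehot = np.eye(num_classes)[y_toy]

print("Scaled mean:", X_toy_scaled.mean(axis=0))
print("Scaled std:", X_toy_scaled.std(axis=0))
print("One-hot labels:")
print(y_toy_onehot)
```

Note that in the notebook the mean and standard deviation come from the training set only and are then reused for the test set, which is why `fit_transform` is called on `X_train` and plain `transform` on `X_test`.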

## Part 4: Neural Network Architecture

### Student Exercise 1: Design the Network

**TODO:** Fill in the blanks below to define your network architecture.

Questions to consider:
- How many input neurons do we need? (Hint: number of features)
- How many output neurons? (Hint: number of classes)
- How many hidden layers? How many neurons in each?

In [None]:
# Define network architecture
# TODO: Complete these values
input_size = ___  # Number of input features
hidden_size = ___  # Number of neurons in hidden layer (try 10, 20, or 50)
output_size = ___  # Number of output classes
learning_rate = ___  # Learning rate (try 0.01, 0.1, or 0.001)
epochs = ___  # Number of training epochs (try 100, 500, or 1000)

print(f"Network Architecture:")
print(f"Input Layer: {input_size} neurons")
print(f"Hidden Layer: {hidden_size} neurons")
print(f"Output Layer: {output_size} neurons")
print(f"\nTraining Parameters:")
print(f"Learning Rate: {learning_rate}")
print(f"Epochs: {epochs}")

## Part 5: Activation Functions

Activation functions introduce non-linearity into the network.

In [None]:
def sigmoid(x):
    """Sigmoid activation function: maps values to (0, 1)"""
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))  # Clip to prevent overflow

def sigmoid_derivative(a):
    """Derivative of sigmoid, expressed in terms of the sigmoid output a = sigmoid(x)"""
    return a * (1 - a)

def relu(x):
    """ReLU activation function: max(0, x)"""
    return np.maximum(0, x)

def relu_derivative(x):
    """Derivative of ReLU function"""
    return (x > 0).astype(float)

def softmax(x):
    """Softmax activation: converts logits to probabilities"""
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))  # Numerical stability
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# Visualize activation functions
x = np.linspace(-5, 5, 100)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].plot(x, sigmoid(x), label='Sigmoid')
axes[0].plot(x, relu(x), label='ReLU')
axes[0].set_title('Activation Functions')
axes[0].set_xlabel('Input')
axes[0].set_ylabel('Output')
axes[0].legend()
axes[0].grid(True)

axes[1].plot(x, sigmoid_derivative(sigmoid(x)), label='Sigmoid Derivative')
axes[1].plot(x, relu_derivative(x), label='ReLU Derivative')
axes[1].set_title('Activation Function Derivatives')
axes[1].set_xlabel('Input')
axes[1].set_ylabel('Derivative')
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.show()
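
One detail in the cell above is easy to miss: `sigmoid_derivative` is written in terms of the sigmoid *output*, which is why the plot calls `sigmoid_derivative(sigmoid(x))`. The sketch below checks this against a central finite-difference approximation. The step size `h` and the sample points are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # Expects a = sigmoid(x), not x itself
    return a * (1 - a)

x = np.linspace(-5, 5, 11)
h = 1e-5  # small step for the finite-difference approximation

# Central difference: (f(x + h) - f(x - h)) / (2h)
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid_derivative(sigmoid(x))

max_error = np.max(np.abs(numeric - analytic))
print(f"Max difference between numeric and analytic: {max_error:.2e}")

# Passing raw x instead of sigmoid(x) does not match the numeric derivative
wrong = sigmoid_derivative(x)
print("Raw-input version matches:", np.allclose(wrong, numeric))
```

If you switch the hidden layer to sigmoid in Exercise 2, the backward pass would need the cached activation `a1` here, not the pre-activation `z1`.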

## Part 6: Initialize Network Parameters

We need to initialize weights and biases for our network.

In [None]:
# Initialize weights and biases with random values
# Weights: connections between layers
# Biases: threshold adjustments for each neuron

# Input to Hidden layer
W1 = np.random.randn(input_size, hidden_size) * 0.01
b1 = np.zeros((1, hidden_size))

# Hidden to Output layer
W2 = np.random.randn(hidden_size, output_size) * 0.01
b2 = np.zeros((1, output_size))

print("Network parameters initialized!")
print(f"W1 shape: {W1.shape} (Input -> Hidden)")
print(f"b1 shape: {b1.shape}")
print(f"W2 shape: {W2.shape} (Hidden -> Output)")
print(f"b2 shape: {b2.shape}")
print(f"\nTotal trainable parameters: {W1.size + b1.size + W2.size + b2.size}")

## Part 7: Forward Propagation

Forward propagation: passing input through the network to get predictions.

In [None]:
def forward_propagation(X, W1, b1, W2, b2):
    """
    Forward pass through the network
    
    Args:
        X: Input data
        W1, b1: First layer weights and biases
        W2, b2: Second layer weights and biases
    
    Returns:
        Dictionary containing intermediate values and predictions
    """
    # Hidden layer
    z1 = np.dot(X, W1) + b1  # Linear combination
    a1 = relu(z1)  # Activation
    
    # Output layer
    z2 = np.dot(a1, W2) + b2  # Linear combination
    a2 = softmax(z2)  # Softmax for probabilities
    
    # Store values for backpropagation
    cache = {
        'z1': z1,
        'a1': a1,
        'z2': z2,
        'a2': a2
    }
    
    return cache

# Test forward propagation
test_cache = forward_propagation(X_train_scaled[:5], W1, b1, W2, b2)
print("Forward propagation test:")
print(f"Hidden layer output shape: {test_cache['a1'].shape}")
print(f"Output probabilities shape: {test_cache['a2'].shape}")
print(f"\nExample predictions (first 3 samples):")
print(test_cache['a2'][:3])
print(f"\nPredicted classes: {np.argmax(test_cache['a2'][:3], axis=1)}")
print(f"True classes: {y_train[:3]}")

## Part 8: Loss Function

We use cross-entropy loss to measure prediction error.

In [None]:
def cross_entropy_loss(predictions, targets):
    """
    Calculate cross-entropy loss
    
    Args:
        predictions: Predicted probabilities
        targets: True labels (one-hot encoded)
    
    Returns:
        Average loss across all samples
    """
    m = targets.shape[0]
    # Add small epsilon to prevent log(0)
    loss = -np.sum(targets * np.log(predictions + 1e-8)) / m
    return loss

# Test loss calculation
test_loss = cross_entropy_loss(test_cache['a2'], y_train_encoded[:5])
print(f"Example loss (random initialization): {test_loss:.4f}")
print(f"\nNote: Loss should decrease as we train the network!")
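
A small worked example can help calibrate what loss values mean. The sketch below uses made-up probability vectors for two samples and three classes. A model that spreads probability evenly across three classes gets a loss of about ln 3 ≈ 1.0986, which is roughly what to expect from the near-zero initial weights used in this notebook.

```python
import numpy as np

def cross_entropy_loss(predictions, targets):
    m = targets.shape[0]
    return -np.sum(targets * np.log(predictions + 1e-8)) / m

# Two samples: true classes are 0 and 1
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])

# Illustrative predictions (not produced by the notebook's network)
uniform = np.full((2, 3), 1 / 3)                # no preference for any class
confident_right = np.array([[0.98, 0.01, 0.01],
                            [0.01, 0.98, 0.01]])
confident_wrong = np.array([[0.01, 0.98, 0.01],
                            [0.98, 0.01, 0.01]])

loss_uniform = cross_entropy_loss(uniform, targets)
loss_right = cross_entropy_loss(confident_right, targets)
loss_wrong = cross_entropy_loss(confident_wrong, targets)

print(f"Uniform guess:   {loss_uniform:.4f} (ln 3 = {np.log(3):.4f})")
print(f"Confident right: {loss_right:.4f}")
print(f"Confident wrong: {loss_wrong:.4f}")
```

Cross-entropy punishes confident mistakes heavily: the confidently wrong predictions score far worse than the uniform guess.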

## Part 9: Backward Propagation

Backpropagation: calculating gradients to update weights.

In [None]:
def backward_propagation(X, y_true, cache, W1, W2):
    """
    Backward pass to calculate gradients
    
    Args:
        X: Input data
        y_true: True labels (one-hot encoded)
        cache: Dictionary from forward propagation
        W1, W2: Current weights
    
    Returns:
        Dictionary containing gradients for all parameters
    """
    m = X.shape[0]
    
    # Output layer gradients
    dz2 = cache['a2'] - y_true  # Derivative of softmax + cross-entropy
    dW2 = np.dot(cache['a1'].T, dz2) / m
    db2 = np.sum(dz2, axis=0, keepdims=True) / m
    
    # Hidden layer gradients
    da1 = np.dot(dz2, W2.T)
    dz1 = da1 * relu_derivative(cache['z1'])  # Element-wise multiplication
    dW1 = np.dot(X.T, dz1) / m
    db1 = np.sum(dz1, axis=0, keepdims=True) / m
    
    gradients = {
        'dW1': dW1,
        'db1': db1,
        'dW2': dW2,
        'db2': db2
    }
    
    return gradients

# Test backward propagation
test_grads = backward_propagation(X_train_scaled[:5], y_train_encoded[:5], 
                                  test_cache, W1, W2)
print("Backward propagation test:")
print(f"dW1 shape: {test_grads['dW1'].shape}")
print(f"dW2 shape: {test_grads['dW2'].shape}")
print(f"\nGradients calculated successfully!")
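
Matching shapes do not prove the gradients are correct. A common sanity check is to compare the analytic gradient with a numerical one obtained by nudging each weight slightly. The sketch below does this for `dW1` on a small random problem. It is a self-contained illustration that re-implements the same formulas as above; the helper names `loss_fn` and `analytic_dW1`, the array sizes, and the omission of the `1e-8` epsilon in the loss are choices made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

def loss_fn(X, Y, W1, b1, W2, b2):
    # Forward pass followed by cross-entropy (no epsilon, so it matches the analytic gradient exactly)
    a1 = relu(X @ W1 + b1)
    a2 = softmax(a1 @ W2 + b2)
    return -np.sum(Y * np.log(a2)) / X.shape[0]

def analytic_dW1(X, Y, W1, b1, W2, b2):
    # Same backpropagation formulas as in this part of the notebook
    m = X.shape[0]
    z1 = X @ W1 + b1
    a1 = relu(z1)
    a2 = softmax(a1 @ W2 + b2)
    dz2 = a2 - Y
    dz1 = (dz2 @ W2.T) * (z1 > 0)
    return X.T @ dz1 / m

# Small random problem: 5 samples, 4 features, 6 hidden units, 3 classes
X = rng.normal(size=(5, 4))
Y = np.eye(3)[rng.integers(0, 3, size=5)]
W1 = rng.normal(size=(4, 6)) * 0.5
b1 = np.zeros((1, 6))
W2 = rng.normal(size=(6, 3)) * 0.5
b2 = np.zeros((1, 3))

# Numerical gradient: central difference on each entry of W1
h = 1e-5
numeric = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        W_plus = W1.copy()
        W_plus[i, j] += h
        W_minus = W1.copy()
        W_minus[i, j] -= h
        numeric[i, j] = (loss_fn(X, Y, W_plus, b1, W2, b2)
                         - loss_fn(X, Y, W_minus, b1, W2, b2)) / (2 * h)

analytic = analytic_dW1(X, Y, W1, b1, W2, b2)
max_diff = np.max(np.abs(numeric - analytic))
print(f"Max difference between numeric and analytic dW1: {max_diff:.2e}")
```

A tiny difference suggests the backward pass is consistent with the forward pass. A large one usually points to a missing transpose, a missing division by `m`, or the wrong activation derivative.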

## Part 10: Training the Network

Now let's put it all together and train our network!

In [None]:
def train_network(X_train, y_train, X_test, y_test, 
                  W1, b1, W2, b2, learning_rate, epochs):
    """
    Train the neural network
    
    Returns:
        Updated weights, biases, and training history
    """
    train_losses = []
    test_losses = []
    train_accuracies = []
    test_accuracies = []
    
    for epoch in range(epochs):
        # Forward propagation
        train_cache = forward_propagation(X_train, W1, b1, W2, b2)
        
        # Calculate loss
        train_loss = cross_entropy_loss(train_cache['a2'], y_train)
        train_losses.append(train_loss)
        
        # Calculate accuracy
        train_predictions = np.argmax(train_cache['a2'], axis=1)
        train_true = np.argmax(y_train, axis=1)
        train_accuracy = accuracy_score(train_true, train_predictions)
        train_accuracies.append(train_accuracy)
        
        # Backward propagation
        gradients = backward_propagation(X_train, y_train, train_cache, W1, W2)
        
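        # Update parameters (gradient descent)
        # Reassign instead of using -= so the caller's initial arrays are not modified in place
        W1 = W1 - learning_rate * gradients['dW1']
        b1 = b1 - learning_rate * gradients['db1']
        W2 = W2 - learning_rate * gradients['dW2']
        b2 = b2 - learning_rate * gradients['db2']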
        
        # Evaluate on test set
        test_cache = forward_propagation(X_test, W1, b1, W2, b2)
        test_loss = cross_entropy_loss(test_cache['a2'], y_test)
        test_losses.append(test_loss)
        
        test_predictions = np.argmax(test_cache['a2'], axis=1)
        test_true = np.argmax(y_test, axis=1)
        test_accuracy = accuracy_score(test_true, test_predictions)
        test_accuracies.append(test_accuracy)
        
        # Print progress on the first epoch and then every 100 epochs
        if (epoch + 1) % 100 == 0 or epoch == 0:
            print(f"Epoch {epoch + 1}/{epochs}")
            print(f"  Train Loss: {train_loss:.4f} | Train Acc: {train_accuracy:.4f}")
            print(f"  Test Loss: {test_loss:.4f} | Test Acc: {test_accuracy:.4f}")
            print()
    
    history = {
        'train_losses': train_losses,
        'test_losses': test_losses,
        'train_accuracies': train_accuracies,
        'test_accuracies': test_accuracies
    }
    
    return W1, b1, W2, b2, history

# Train the network!
print("Starting training...\n")
W1_trained, b1_trained, W2_trained, b2_trained, history = train_network(
    X_train_scaled, y_train_encoded,
    X_test_scaled, y_test_encoded,
    W1, b1, W2, b2,
    learning_rate, epochs
)
print("Training complete!")

## Part 11: Visualize Training Progress

In [None]:
# Plot training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss plot
axes[0].plot(history['train_losses'], label='Training Loss', linewidth=2)
axes[0].plot(history['test_losses'], label='Testing Loss', linewidth=2)
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].set_title('Training and Testing Loss Over Time')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Accuracy plot
axes[1].plot(history['train_accuracies'], label='Training Accuracy', linewidth=2)
axes[1].plot(history['test_accuracies'], label='Testing Accuracy', linewidth=2)
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].set_title('Training and Testing Accuracy Over Time')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Final Training Accuracy: {history['train_accuracies'][-1]:.4f}")
print(f"Final Testing Accuracy: {history['test_accuracies'][-1]:.4f}")

## Part 12: Evaluate Model Performance

In [None]:
# Make predictions on test set
test_cache = forward_propagation(X_test_scaled, W1_trained, b1_trained, 
                                 W2_trained, b2_trained)
test_predictions = np.argmax(test_cache['a2'], axis=1)
test_true = np.argmax(y_test_encoded, axis=1)

# Confusion matrix
cm = confusion_matrix(test_true, test_predictions)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=iris.target_names,
            yticklabels=iris.target_names)
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

# Print classification report
from sklearn.metrics import classification_report
print("Classification Report:")
print(classification_report(test_true, test_predictions, 
                          target_names=iris.target_names))
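
The numbers in the classification report can be derived directly from the confusion matrix. The sketch below uses a hypothetical 3-class matrix (not actual results from this notebook), laid out the same way as scikit-learn's `confusion_matrix`: rows are true classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical confusion matrix (rows = true class, columns = predicted class)
cm = np.array([[10, 0, 0],
               [0, 9, 1],
               [0, 2, 8]])

true_positives = np.diag(cm)

# Precision: of everything predicted as a class, how much was correct (column sums)
precision = true_positives / cm.sum(axis=0)

# Recall: of everything truly in a class, how much was found (row sums)
recall = true_positives / cm.sum(axis=1)

# F1: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

accuracy = true_positives.sum() / cm.sum()

for k in range(3):
    print(f"Class {k}: precision={precision[k]:.3f} recall={recall[k]:.3f} f1={f1[k]:.3f}")
print(f"Accuracy: {accuracy:.3f}")
```

Reading the matrix by row shows which species get confused with each other, which a single accuracy number hides.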

## Part 13: Student Experiments

### Exercise 2: Experiment with Hyperparameters

Try modifying these parameters and observe the results:

1. **Number of hidden neurons**: Try 5, 10, 20, 50, 100
2. **Learning rate**: Try 0.001, 0.01, 0.1, 1.0
3. **Number of epochs**: Try 100, 500, 1000, 2000
4. **Activation function**: Change ReLU to sigmoid in hidden layer

Record your observations:
- Which settings give the best accuracy?
- Which settings train fastest?
- Do you see any overfitting (train accuracy much higher than test)?

In [None]:
# Experiment area - modify and re-run Part 4 through Part 12
# Copy the code cells above and modify parameters here

# Example experiment:
experiment_hidden_size = 20
experiment_learning_rate = 0.01
experiment_epochs = 500

# Re-initialize and train with new parameters
# (Code here)
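
Each experiment should start from freshly initialized weights rather than from weights left over from a previous run. One way to make that easy is a small helper like the one sketched below. `init_params` is a hypothetical name, not part of the notebook, and it uses NumPy's `default_rng` with an explicit seed instead of the global `np.random.seed` from Part 1, so the exact random values will differ from Part 6.

```python
import numpy as np

def init_params(input_size, hidden_size, output_size, seed=42):
    """Hypothetical helper: return fresh parameters using the same scheme as Part 6."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((input_size, hidden_size)) * 0.01
    b1 = np.zeros((1, hidden_size))
    W2 = rng.standard_normal((hidden_size, output_size)) * 0.01
    b2 = np.zeros((1, output_size))
    return W1, b1, W2, b2

# Compare model sizes for a few hidden layer widths (4 inputs, 3 outputs as in Iris)
param_counts = {}
for hidden in [5, 20, 50]:
    W1, b1, W2, b2 = init_params(4, hidden, 3)
    param_counts[hidden] = W1.size + b1.size + W2.size + b2.size
    print(f"hidden_size={hidden}: {param_counts[hidden]} trainable parameters")
```

Inside the loop, the fresh parameters could then be passed to `train_network` together with the experiment's learning rate and epoch count, and the final accuracies stored for comparison.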

## Part 14: Challenge - Handwritten Digits

For advanced students: Apply your network to scikit-learn's handwritten digits dataset, a small 8x8-pixel relative of the well-known MNIST dataset!

In [None]:
# Load the digits dataset (bundled with scikit-learn, so no download is needed)
# Note: this is not the full MNIST dataset, which uses 28x28 images
from sklearn.datasets import load_digits

digits = load_digits()
X_digits = digits.data
y_digits = digits.target

print(f"Digits dataset shape: {X_digits.shape}")
print(f"Number of classes: {len(np.unique(y_digits))}")

# Visualize some digits
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='gray')
    ax.set_title(f'Label: {y_digits[i]}')
    ax.axis('off')
plt.tight_layout()
plt.show()

# Challenge: Modify the network architecture and training code
# to work with this larger dataset!
# Hints:
# - Input size = 64 (8x8 pixels)
# - Output size = 10 (digits 0-9)
# - You may need more hidden neurons
# - Training might take longer

## Reflection Questions

Answer these questions in your lab notebook:

1. **Architecture**: How does the number of hidden neurons affect model performance? Is more always better?

2. **Learning Rate**: What happens if the learning rate is too high? Too low? How did you find a good value?

3. **Training**: Describe what happens during one training epoch. Use the terms: forward propagation, loss calculation, backward propagation, gradient descent.

4. **Overfitting**: Did you observe any overfitting? What are the signs? How could you prevent it?

5. **Real World**: Where might you apply neural networks in your daily life or community? What problems could they help solve?

6. **Limitations**: What are some limitations of the simple network we built? What would you need for more complex problems?

7. **Ethics**: Neural networks are used in facial recognition, hiring decisions, and criminal justice. What ethical concerns should we consider?

## Next Steps

Congratulations on building your first neural network! Here are some ways to continue learning:

1. **Deep Learning Libraries**: Learn TensorFlow/Keras or PyTorch for more advanced networks
2. **Convolutional Neural Networks**: Specialized for image processing
3. **Recurrent Neural Networks**: For sequential data like text or time series
4. **Transfer Learning**: Use pre-trained models for your own tasks
5. **Projects**: Build a real application - maybe a plant identifier or handwriting recognizer!

### Resources:
- 3Blue1Brown Neural Network series (YouTube)
- Fast.ai Practical Deep Learning course
- TensorFlow tutorials
- Kaggle competitions and datasets