AI & Machine Learning for Beginners
Table of Contents
- Introduction to AI and ML
- Machine Learning Fundamentals
- Supervised Learning
- Unsupervised Learning
- Model Evaluation
- Practical Implementation
- Real-World Applications
- Getting Started
Introduction to AI and ML
Artificial Intelligence (AI) and Machine Learning (ML) are transforming how we solve problems. This course introduces fundamental concepts needed to understand and build ML systems.
AI vs Machine Learning vs Deep Learning
Artificial Intelligence
- Broad field: machines doing tasks requiring human intelligence
- Includes robotics, natural language processing, computer vision
- Includes both rule-based and learning systems
Machine Learning
- Subset of AI: systems that learn from data
- No explicit programming for each case
- Improve performance with more data
Deep Learning
- Subset of ML: neural networks with many layers
- Powers image recognition, language models
- Requires significant computational resources
Why Machine Learning?
Traditional programming is limited:
- Writing rules for every scenario is impractical
- Rules can't adapt to new data
- Hand-coding pattern recognition logic is extremely difficult
Machine learning solves this:
- System learns patterns from examples
- Automatically adapts to new data
- Discovers non-obvious relationships
Machine Learning Fundamentals
The ML Workflow
1. Problem Definition
↓
2. Data Collection
↓
3. Data Preparation
↓
4. Model Selection
↓
5. Training
↓
6. Evaluation
↓
7. Deployment
↓
8. Monitoring
Types of Learning
Supervised Learning
- You provide labeled examples
- System learns to predict based on examples
- Example: Email spam classification
Unsupervised Learning
- Data has no labels
- System finds patterns or structure
- Example: Customer segmentation
Reinforcement Learning
- System learns through trial and error
- Receives rewards for good actions
- Example: Game playing AI
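Supervised and unsupervised learning each get full code examples later in this course. As a taste of reinforcement learning, here is a minimal epsilon-greedy bandit sketch (the payout probabilities and step count are illustrative, not from any real system):

```python
import random

# Two "slot machines" with hidden payout probabilities
true_payouts = [0.3, 0.7]
estimates = [0.0, 0.0]  # learned payout estimates
counts = [0, 0]         # pulls per arm
epsilon = 0.1           # exploration rate

random.seed(42)
for _ in range(5000):
    # Explore occasionally; otherwise exploit the best-looking arm
    if random.random() < epsilon:
        arm = random.randrange(2)
    else:
        arm = max(range(2), key=lambda a: estimates[a])
    reward = 1 if random.random() < true_payouts[arm] else 0
    counts[arm] += 1
    # Incremental average of observed rewards for this arm
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(f"estimated payouts: {estimates}")
```

After enough trials, the agent discovers the better arm purely from reward feedback, with no labeled examples at all.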
Training and Testing
Dataset
↓
├─→ Training Set (70%) → Train Model
│
├─→ Validation Set (15%) → Tune Model
│
└─→ Test Set (15%) → Final Evaluation
Never test on training data!
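The 70/15/15 split in the diagram above can be produced with two calls to `train_test_split`; a minimal sketch using the iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples

# First split off the test set (15% of the whole dataset)
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42
)
# Then split the remainder into train (70%) and validation (15%);
# 0.15 / 0.85 of the remaining data equals 15% of the original
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.15 / 0.85, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
```

The exact percentages are conventional, not fixed rules; with very small datasets, cross-validation (covered later) is often a better use of the data than a dedicated validation set.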
Supervised Learning
Classification vs Regression
Classification
- Predict category (discrete output)
- Example: Dog or Cat?
- Example: Email spam (yes/no)?
Regression
- Predict numerical value (continuous output)
- Example: House price
- Example: Temperature tomorrow
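The examples that follow focus on classification. To see the regression side of the contrast, here is a minimal sketch fitting a line to synthetic data (the true slope and intercept are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y is roughly 3x + 5 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 5 + rng.normal(0, 0.5, size=100)

# Fit a linear model; output is a continuous number, not a category
model = LinearRegression()
model.fit(X, y)

print(f"slope: {model.coef_[0]:.2f}, intercept: {model.intercept_:.2f}")
print(f"prediction at x=4: {model.predict([[4.0]])[0]:.2f}")
```

Because the output is continuous, evaluation uses error metrics like MSE and R² (see Metrics for Regression below) rather than accuracy.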
Decision Trees
Simple, interpretable models:
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load data
iris = load_iris()
X = iris.data
y = iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create and train model
model = DecisionTreeClassifier(max_depth=5)
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
# Evaluate
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2%}")
Linear Models
Quick, efficient baseline models:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
# Prepare data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Logistic Regression (classification)
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)
# Predictions with probabilities
probabilities = model.predict_proba(X_test_scaled)
predictions = model.predict(X_test_scaled)
Support Vector Machines (SVM)
Powerful for classification:
from sklearn.svm import SVC
# Create SVM model
model = SVC(kernel='rbf', C=1.0, gamma='scale')
model.fit(X_train_scaled, y_train)
# Evaluate
accuracy = model.score(X_test_scaled, y_test)
print(f"SVM Accuracy: {accuracy:.2%}")
Neural Networks
Deep learning basics:
import tensorflow as tf
from tensorflow import keras
# Build model (assumes 784 input features, e.g. flattened 28x28 MNIST images)
model = keras.Sequential([
keras.layers.Dense(128, activation='relu', input_shape=(784,)),
keras.layers.Dropout(0.2),
keras.layers.Dense(64, activation='relu'),
keras.layers.Dropout(0.2),
keras.layers.Dense(10, activation='softmax')
])
# Compile model
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
# Train model
history = model.fit(
X_train, y_train,
epochs=10,
batch_size=32,
validation_split=0.2
)
# Evaluate
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy:.2%}")
# Make predictions
predictions = model.predict(X_test)
Unsupervised Learning
Clustering
Grouping similar data points:
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# K-Means clustering
kmeans = KMeans(n_clusters=3, random_state=42)
clusters = kmeans.fit_predict(X)
# Visualize
plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis')
plt.scatter(
kmeans.cluster_centers_[:, 0],
kmeans.cluster_centers_[:, 1],
marker='X',
s=200,
c='red'
)
plt.title('K-Means Clustering')
plt.show()
Dimensionality Reduction
Reducing features while preserving information:
from sklearn.decomposition import PCA
# Principal Component Analysis
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
# Explained variance
print(f"Explained variance ratio: {pca.explained_variance_ratio_}")
# Visualize
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='viridis')
plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.1%})')
plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.1%})')
plt.title('PCA Visualization')
plt.show()
Anomaly Detection
Finding unusual patterns:
from sklearn.ensemble import IsolationForest
# Isolation Forest for anomaly detection
model = IsolationForest(contamination=0.1)
anomalies = model.fit_predict(X)
# -1 indicates anomaly, 1 indicates normal
anomaly_count = (anomalies == -1).sum()
print(f"Anomalies detected: {anomaly_count}")
Model Evaluation
Metrics for Classification
from sklearn.metrics import (
accuracy_score, precision_score, recall_score,
f1_score, confusion_matrix, roc_auc_score, roc_curve
)
y_pred = model.predict(X_test)
# Basic metrics (average='weighted' handles multiclass targets like iris;
# for binary problems the default average='binary' works)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
print(f"Accuracy: {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
print(f"Recall: {recall:.2%}")
print(f"F1 Score: {f1:.2%}")
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)
Metrics for Regression
from sklearn.metrics import (
mean_squared_error, mean_absolute_error, r2_score
)
# Assumes a fitted regression model (e.g. LinearRegression) and numeric targets
y_pred = model.predict(X_test)
# Regression metrics
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = mse ** 0.5
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")
print(f"Mean Absolute Error: {mae:.4f}")
print(f"R² Score: {r2:.4f}")
Cross-Validation
Better evaluation of model performance:
from sklearn.model_selection import cross_val_score, KFold
# K-Fold Cross-Validation
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring='accuracy')
print(f"Cross-validation scores: {scores}")
print(f"Mean CV Score: {scores.mean():.2%}")
print(f"Std Dev: {scores.std():.4f}")
Practical Implementation
Complete ML Pipeline
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
# Load data
df = pd.read_csv('data.csv')
# Handle missing values
df = df.dropna()
# Separate features and target
X = df.drop('target', axis=1)
y = df['target']
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create pipeline
pipeline = Pipeline([
('scaler', StandardScaler()),
('classifier', RandomForestClassifier(n_estimators=100))
])
# Train
pipeline.fit(X_train, y_train)
# Evaluate
predictions = pipeline.predict(X_test)
print(classification_report(y_test, predictions))
Hyperparameter Tuning
Finding optimal parameters:
from sklearn.model_selection import GridSearchCV
# Define parameter grid
param_grid = {
'max_depth': [5, 10, 20],
'min_samples_split': [2, 5, 10],
'min_samples_leaf': [1, 2, 4],
'n_estimators': [50, 100, 200]
}
# Grid search
grid_search = GridSearchCV(
RandomForestClassifier(),
param_grid,
cv=5,
n_jobs=-1
)
grid_search.fit(X_train, y_train)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.2%}")
# Use best model
best_model = grid_search.best_estimator_
accuracy = best_model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2%}")
Real-World Applications
Fraud Detection
# Detect fraudulent transactions
from sklearn.ensemble import IsolationForest
# Train on historical data (mostly non-fraud);
# historical_transactions is a numeric feature matrix
model = IsolationForest(contamination=0.05)
model.fit(historical_transactions)
# Detect fraud in new transactions
predictions = model.predict(new_transactions)
fraud_mask = predictions == -1
flagged_transactions = new_transactions[fraud_mask]
Recommendation Systems
from sklearn.neighbors import NearestNeighbors
# Find similar users
model = NearestNeighbors(n_neighbors=5)
model.fit(user_features)
# Find neighbors for a user (target_user must be 2-D: shape (1, n_features))
distances, indices = model.kneighbors(target_user)
# Recommend items liked by similar users:
# average their rows of the user-by-item ratings matrix
recommendations = items[indices[0]].mean(axis=0)
Sentiment Analysis
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
# Vectorize text
vectorizer = TfidfVectorizer(max_features=1000)
X = vectorizer.fit_transform(texts)
# Train classifier
classifier = MultinomialNB()
classifier.fit(X, sentiments)
# Predict sentiment of new text (reusing the fitted vectorizer)
new_text_vector = vectorizer.transform(new_texts)
predicted_sentiments = classifier.predict(new_text_vector)
Getting Started
Essential Libraries
# Data manipulation
pip install pandas numpy
# Machine Learning
pip install scikit-learn
# Deep Learning
pip install tensorflow keras torch
# Visualization
pip install matplotlib seaborn
# Jupyter for experimentation
pip install jupyter
Learning Path
- Understand the fundamentals: Variables, data types, control flow
- Learn data preprocessing: Cleaning, normalization, feature engineering
- Implement simple models: Decision trees, linear models
- Understand evaluation: Metrics, validation strategies
- Explore complex models: Ensemble methods, neural networks
- Work on projects: Compete on Kaggle, contribute to open source
Resources
- Kaggle: Datasets and competitions to practice
- Papers with Code: Latest research implementations
- Fast.ai: Practical deep learning courses
- Google Colab: Free GPU for experimentation
- Scikit-learn documentation: Comprehensive API reference
Conclusion
Machine learning is a powerful tool for solving complex problems. Master these fundamentals and you'll have the foundation to explore advanced topics like deep learning, reinforcement learning, and specialized applications.
Start small with simple datasets, understand each component deeply, and gradually take on more complex challenges. The field rewards both theoretical understanding and practical experience.