# AI & Machine Learning Projects
Experiments at the boundary of classical algorithms and modern deep learning.

---
## Handwritten Digit Classifier
A convolutional neural network trained on MNIST, built from scratch using **PyTorch** without relying on pretrained weights.

**Architecture:**
```
Input (1×28×28)
→ Conv2d(1, 32, 3) + ReLU + MaxPool
→ Conv2d(32, 64, 3) + ReLU + MaxPool
→ Dropout(0.25)
→ Linear(64×5×5 → 128) + ReLU
→ Dropout(0.5)
→ Linear(128 → 10)
→ LogSoftmax
```
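As a sketch, the stack above maps onto a PyTorch module roughly as follows (`DigitClassifier` is an illustrative name, not taken from the project source):

```python
import torch
import torch.nn as nn

class DigitClassifier(nn.Module):
    """CNN matching the layer list above: two conv blocks, then a
    dropout-regularised MLP head over the flattened 64x5x5 feature map."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.25),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # 28 -> 26 (conv) -> 13 (pool) -> 11 (conv) -> 5 (pool)
            nn.Linear(64 * 5 * 5, 128), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, 10),
            nn.LogSoftmax(dim=1),  # pairs with nn.NLLLoss during training
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```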
**Results:** 99.2 % test accuracy after 10 epochs on a single CPU.

---
## Sentence Similarity Engine
A small semantic search tool that encodes sentences into embeddings and retrieves the most similar entries from a knowledge base.

**Approach:**
- Sentence embeddings via a fine-tuned BERT variant (`sentence-transformers`)
- FAISS index for approximate nearest-neighbour search at scale
- CLI interface — `search.py "your query"`

**Use case:** powering private search over a personal knowledge base of Markdown notes.
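
Under the hood, retrieval reduces to nearest-neighbour search over unit-normalised embeddings. A minimal sketch of that step, with plain NumPy vectors standing in for the `sentence-transformers` embeddings and a brute-force dot product standing in for the FAISS index:

```python
import numpy as np

def top_k(query_vec, index_vecs, k=3):
    """Return indices and cosine scores of the k most similar vectors.

    With unit-normalised vectors, cosine similarity is a plain dot
    product, the same quantity a FAISS inner-product index ranks by.
    """
    q = query_vec / np.linalg.norm(query_vec)
    X = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    scores = X @ q
    best = np.argsort(scores)[::-1][:k]  # highest similarity first
    return best, scores[best]
```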

---
## Reinforcement Learning: Grid World
A from-scratch implementation of Q-Learning and SARSA applied to a configurable grid-world environment.

**Implemented:**
- Tabular Q-learning with ε-greedy exploration
- SARSA (on-policy variant)
- Policy iteration and value iteration for comparison
- Visualiser showing the learned value function as a heatmap

**Written in pure Python + NumPy** — no RL libraries — for learning purposes.
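
The core of tabular Q-learning is only a few lines. A self-contained sketch on a small deterministic grid, with names and hyperparameters chosen for illustration rather than taken from the project:

```python
import numpy as np

def q_learning(size=4, episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a size x size grid.

    Start at (0, 0), goal at the bottom-right corner.
    Actions: 0=up, 1=down, 2=left, 3=right; reward -1 per step, 0 at goal.
    """
    rng = np.random.default_rng(seed)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    Q = np.zeros((size, size, 4))
    goal = (size - 1, size - 1)
    for _ in range(episodes):
        r, c = 0, 0
        while (r, c) != goal:
            if rng.random() < eps:            # ε-greedy exploration
                a = int(rng.integers(4))
            else:
                a = int(np.argmax(Q[r, c]))
            dr, dc = moves[a]
            nr = min(max(r + dr, 0), size - 1)  # clip at the walls
            nc = min(max(c + dc, 0), size - 1)
            reward = 0.0 if (nr, nc) == goal else -1.0
            # Off-policy TD update: bootstrap from the best next action
            Q[r, c, a] += alpha * (reward + gamma * Q[nr, nc].max() - Q[r, c, a])
            r, c = nr, nc
    return Q
```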

---
## Anomaly Detection on Time Series
A pipeline for detecting anomalies in server metric data (CPU, memory, latency).

**Methods compared:**
- Z-score baseline
- Isolation Forest
- LSTM autoencoder (reconstruction error threshold)

**Outcome:** the LSTM autoencoder outperformed the statistical methods by ~18 % in precision on labelled incidents from a personal homelab dataset.
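
For reference, the z-score baseline fits in a few lines. A minimal sketch using a global mean and standard deviation (a rolling window would suit non-stationary server metrics better, but this keeps the baseline honest and simple):

```python
import numpy as np

def zscore_anomalies(series, threshold=3.0):
    """Indices of points more than `threshold` standard deviations
    from the series mean."""
    x = np.asarray(series, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.flatnonzero(np.abs(z) > threshold)
```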

---
*Press Q or ESC to return.*