backend & ml infrastructure · alexandria, egypt
Founding engineer at Nover. I run GPU inference infrastructure on AWS, build distributed systems in Go and Python, and occasionally trace deep learning all the way back to statistical mechanics.
fig. 0 — a live 2D Ising model (Metropolis–Hastings), taken from my Physics-AI Bridge research. Drag the temperature through the critical point Tc ≈ 2.27 and watch order emerge from noise. This isn't a looping gif — it's computing in your browser right now.
§0 — brief
I'm a final-year Electrical Engineering (Electronics & Communications) student at Alexandria University who didn't wait for graduation to build real things.
Right now I'm the founding engineer at Nover, an AI image and video generation startup, where I own the production stack end-to-end — from the web app at nover.studio down to the GPU workers fine-tuning our proprietary diffusion models on H100s. We're raising pre-seed.
Before that, I spent a year and a half as a software engineer at a US neuromodulation research company, building a real-time EEG "digital twin" pipeline — streaming brainwave data from OpenBCI hardware through Hilbert transforms into coupled-oscillator models — and co-authored a research paper on Kuramoto-based EEG synchronization.
On the side, I led Mind Cloud, a 70+ member robotics organization, from a 9th-place finish at the European Rover Challenge to 2nd and 3rd place at UGVC — migrating the entire autonomy stack to ROS 2 along the way.
Outside the day job, I write open-source research: a Physics → AI series tracing modern deep learning back to statistical mechanics, with two published papers so far. I like building things that are hard. I like shipping them even more.
§1 — what I'm building now
Nover is an AI media-generation startup. We fine-tune our own diffusion models, serve them at scale, and wrap the whole thing in a creative tool people actually want to use. "Founding engineer" here means I own everything between the user pressing generate and the pixels showing up on their screen — and a fair bit of what happens before either of those things.
I won't tell you exactly how the pipeline is built — pre-launch, the boring details are a small competitive edge. Instead, here are the three pieces I'm responsible for:
I built the full creative tool — the editor, the canvas, the asset pipeline, account & billing, real-time job state. It's the surface every user touches; it has to feel instant even when the GPUs are saturated.
An autoscaled GPU fleet on AWS that turns prompts into images and video. Job queues, worker orchestration, model storage, global delivery. Infrastructure-as-code so the next engineer can rebuild it from scratch by reading a repo.
Full fine-tunes of our proprietary diffusion models on rented H100s, plus the release infrastructure that ships new checkpoints behind the same API surface — including the throttled, resumable pipeline that handled our waitlist launch.
FROM THE TRENCHES
Our heaviest generation workloads kept hitting a memory-bound failure mode that looked like a GPU problem. It wasn't — the bug lived in the request path. Fixed the path, then upgraded the fleet anyway. The full story comes after launch.
§2 — flagship case study
A fault-tolerant task queue built on the Reliable Queue Pattern: at-least-once delivery through atomic Redis LMOVE operations, multi-tier priority queues, delayed scheduling via sorted sets, batch ingestion, and dead-letter queues for poison messages. CI is green; the README has the architecture diagram in Mermaid.
The interesting part is failure. A lease-based reaper watches for workers that died mid-task and atomically reclaims their work via Lua, and the whole system is instrumented with native Prometheus metrics. With a 30-second lease and 5-second reaper sweep, the reclaim bound is provable:
reclaim_latency_max ≈ LEASE_TTL + REAPER_INTERVAL = 30 s + 5 s = 35 s
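The bound can be checked by brute force. The sketch below is an illustration, not the queue's own code: it assumes the reaper sweeps at exact multiples of the interval and measures latency from the dead worker's last lease renewal. The function name and the millisecond units are hypothetical.

```python
LEASE_TTL_MS = 30_000       # 30-second lease, from the text
REAPER_INTERVAL_MS = 5_000  # 5-second reaper sweep, from the text


def reclaim_latency_ms(last_renewal_ms: int) -> int:
    """Time from a dead worker's last renewal until the reaper reclaims its task.

    Assumes sweeps fire at exact multiples of REAPER_INTERVAL_MS and that a
    sweep reclaims any lease whose expiry time has been reached.
    """
    expiry = last_renewal_ms + LEASE_TTL_MS
    # Ceiling division: the first sweep at or after the lease expiry.
    first_sweep = -(-expiry // REAPER_INTERVAL_MS) * REAPER_INTERVAL_MS
    return first_sweep - last_renewal_ms


# Try every renewal offset within one sweep period (1 ms resolution).
latencies = [reclaim_latency_ms(t) for t in range(REAPER_INTERVAL_MS)]
best, worst = min(latencies), max(latencies)
print(f"best case: {best} ms, worst case: {worst} ms")
```

Under these assumptions the worst case lands just under LEASE_TTL + REAPER_INTERVAL: the lease expires right after a sweep and has to wait for the next one.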
For non-idempotent handlers, producers can attach an idempotency_key; workers use a claim-then-confirm pattern to guarantee at-most-once execution within the configured TTL window. I validated all of this the only way that counts: automated chaos tests that kill workers at random while the queue is under load.
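A claim-then-confirm flow can be sketched with an in-memory store standing in for Redis. Everything here (IdempotencyStore, handle, the injected clock) is hypothetical and only illustrates the idea: the first claim inside the TTL window wins, and a failed handler keeps its claim so the work is never run twice.

```python
import time


class IdempotencyStore:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (state, expires_at)

    def claim(self, key):
        """Return True only for the first caller within the TTL window."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return False  # already claimed or confirmed, and not yet expired
        self._entries[key] = ("claimed", now + self._ttl)
        return True

    def confirm(self, key):
        """Mark the work as done; the key still expires at its original time."""
        _, expires_at = self._entries[key]
        self._entries[key] = ("done", expires_at)


def handle(store, key, handler):
    """Run handler at most once per key within the TTL window."""
    if not store.claim(key):
        return "skipped"
    handler()            # if this raises, the claim stays: no second attempt
    store.confirm(key)
    return "executed"


# Usage with a fake clock so the TTL window can be stepped by hand.
fake_now = [0.0]
store = IdempotencyStore(ttl_seconds=60.0, clock=lambda: fake_now[0])
calls = []
first = handle(store, "charge-42", lambda: calls.append("charged"))
second = handle(store, "charge-42", lambda: calls.append("charged"))
fake_now[0] = 61.0   # past the TTL window, the guarantee no longer applies
third = handle(store, "charge-42", lambda: calls.append("charged"))
```

The third call shows why the guarantee is scoped to the TTL window: once the key expires, the same idempotency key is treated as new work.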
WHAT BROKE
Chaos testing exposed a race between a slow worker renewing its lease and the reaper reclaiming the same task — two consumers, one job. The fix was making check-and-extend a single atomic Lua script on Redis, closing the window entirely.
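The shape of that fix can be modelled without Redis. In this sketch a threading.Lock stands in for the atomicity Redis gives a Lua script (nothing else runs between the check and the write). LeaseTable and its method names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
import threading


class LeaseTable:
    """In-memory model of task leases with an atomic check-and-extend."""

    def __init__(self, lease_ttl):
        self._ttl = lease_ttl
        self._lock = threading.Lock()
        self._leases = {}  # task_id -> (owner, expires_at)

    def acquire(self, task_id, worker_id, now):
        with self._lock:
            lease = self._leases.get(task_id)
            if lease is not None and lease[1] > now:
                return False  # someone else holds a live lease
            self._leases[task_id] = (worker_id, now + self._ttl)
            return True

    def check_and_extend(self, task_id, worker_id, now):
        """Extend only if the caller still owns an unexpired lease."""
        with self._lock:
            lease = self._leases.get(task_id)
            if lease is None or lease[0] != worker_id or lease[1] <= now:
                return False
            self._leases[task_id] = (worker_id, now + self._ttl)
            return True

    def reap(self, now):
        """Remove expired leases and return their task ids for requeueing."""
        with self._lock:
            expired = [t for t, (_, exp) in self._leases.items() if exp <= now]
            for task_id in expired:
                del self._leases[task_id]
            return expired


table = LeaseTable(lease_ttl=30.0)
acquired_a = table.acquire("task-1", "worker-a", now=0.0)
reclaimed = table.reap(now=31.0)                       # worker-a went quiet
acquired_b = table.acquire("task-1", "worker-b", now=31.0)
late_renewal = table.check_and_extend("task-1", "worker-a", now=32.0)
```

The slow worker's renewal is rejected because ownership and expiry are checked in the same critical section that writes the new expiry, so it can never revive a lease the reaper has already handed to someone else.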
§3 — case study
Agent Mesh is an event-driven orchestration system that simulates a software-engineering squad — architect, developer, QA — running as Go worker nodes against a Redis-backed task graph, with a React command center that visualizes the whole thing in real time. The hard part isn't the agents; it's keeping the UI honest when 750 events per second hit the browser.
The naive WebSocket-to-state loop tries to repaint every 1.33 ms; a 60 Hz display can only honor a repaint every 16.67 ms. The result is a "rendering avalanche" — the main thread saturates, layouts thrash, cards teleport across the screen faster than the eye can track. Agent Mesh fixes this with a throttled batching engine: a non-reactive ref buffers incoming updates, then flushes them to React state on a fixed 100 ms cadence.
DESIGN NOTE
Trading up to 100 ms of perceived latency for a render rate capped at 10 flushes per second (versus 750 attempted repaints) is the whole game. The UI feels real-time, the main thread sits 85% idle, and the system can scale ingest without scaling pixels.
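The batching idea is language-agnostic, so here is a small Python simulation rather than the React code itself. BatchBuffer, simulate, and the 20 "card" keys are hypothetical; the 750 events per second and the 100 ms cadence come from the text. Timestamps are integer microseconds to keep the arithmetic exact.

```python
class BatchBuffer:
    """Collects updates outside the render path; the latest update per key wins."""

    def __init__(self):
        self._pending = {}

    def push(self, key, value):
        self._pending[key] = value  # overwrite: only the newest state matters

    def flush(self):
        """Return the buffered updates and start a fresh buffer."""
        batch, self._pending = self._pending, {}
        return batch


def simulate(events_per_sec=750, flush_interval_us=100_000, duration_us=1_000_000):
    """Replay a burst of events and count how many state writes (renders) occur."""
    buffer = BatchBuffer()
    state = {}
    renders = 0
    delivered = 0
    total_events = events_per_sec * duration_us // 1_000_000
    for tick in range(flush_interval_us, duration_us + 1, flush_interval_us):
        # Deliver every event whose timestamp falls before this flush tick.
        while delivered < total_events and delivered * 1_000_000 // events_per_sec < tick:
            buffer.push(f"card-{delivered % 20}", delivered)
            delivered += 1
        batch = buffer.flush()
        if batch:
            state.update(batch)  # one state write per flush, hence one render
            renders += 1
    return renders, delivered, state


renders, delivered, state = simulate()
print(renders, delivered)
```

In this simulation 750 events collapse into 10 state writes, and each card still ends up showing its most recent value.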
On the backend, workers report RSS (resident set size) via gopsutil rather than heap, because heap is noisy. When RSS crosses a soft limit, the worker raises backpressure on the producer until GC reclaims memory. Task transitions go through atomic Redis BLMOVE and ACID PostgreSQL writes; graceful shutdown rides Go's sync.WaitGroup + context cancellation so no task gets half-persisted on SIGTERM.
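The soft-limit rule can be sketched as a small gate fed with RSS samples. This is an illustration in Python, not the Go worker code. BackpressureGate is a hypothetical name, and the separate resume threshold (hysteresis) is an assumption added here so the gate does not flap around the limit; the text only says pressure is held until GC reclaims.

```python
class BackpressureGate:
    """Pause producers above a soft RSS limit; resume once memory drops back."""

    def __init__(self, soft_limit_bytes, resume_below_bytes):
        if resume_below_bytes > soft_limit_bytes:
            raise ValueError("resume threshold must not exceed the soft limit")
        self.soft_limit = soft_limit_bytes
        self.resume_below = resume_below_bytes
        self.paused = False

    def observe(self, rss_bytes):
        """Feed one RSS sample; return True while producers should stay paused."""
        if rss_bytes >= self.soft_limit:
            self.paused = True
        elif rss_bytes <= self.resume_below:
            self.paused = False
        return self.paused


MIB = 1024 * 1024
gate = BackpressureGate(soft_limit_bytes=512 * MIB, resume_below_bytes=384 * MIB)
samples_mib = [300, 520, 450, 380, 400]   # hypothetical RSS readings
decisions = [gate.observe(s * MIB) for s in samples_mib]
```

The 450 MiB sample stays paused even though it is below the soft limit, which is the point of the second threshold.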
metric               no batching        agent mesh
─────────────────────────────────────────────────────
max ui events/sec    ~60 (60 Hz cap)    750+
main thread idle %   <5% (laggy)        ~85% (smooth)
state consistency    fragile            guaranteed
§4 — case study
A 3D kinematic simulation of a THAAD-style terminal missile defense system, written from scratch to close a complete Guidance, Navigation, and Control (GNC) loop: noisy radar → state estimator → launch decision → guided interception → engagement assessment. It's a sandbox for the maths of hitting a bullet with another bullet.
The estimator is a 6-state Kalman filter tracking position and velocity from Gaussian-noise position measurements. Fire-control launches only when the predicted impact point (PIP) is plausible, the track is mature, and the threat is clearly descending. The interceptor is a two-stage rocket with mass depletion, drag, and G-limits, steered by Augmented Proportional Navigation with a gravity bias.
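A constant-velocity Kalman filter can be sketched one axis at a time. This simplification is an assumption: if measurement and process noise are independent per axis, a 6-state position/velocity filter decouples into three 2-state filters like the one below. AxisKalman, the white-noise-acceleration process model, and all numbers in the demo are hypothetical.

```python
import random


class AxisKalman:
    """Constant-velocity Kalman filter for one axis: state is (position, velocity)."""

    def __init__(self, meas_var, accel_var, init_pos, init_var=1e6):
        self.p, self.v = init_pos, 0.0
        # Symmetric 2x2 covariance stored as three numbers.
        self.c_pp, self.c_pv, self.c_vv = init_var, 0.0, init_var
        self.r = meas_var    # position measurement variance
        self.q = accel_var   # variance of the unmodelled acceleration

    def predict(self, dt):
        self.p += self.v * dt
        # P = F P F^T with F = [[1, dt], [0, 1]]
        c_pp = self.c_pp + 2.0 * dt * self.c_pv + dt * dt * self.c_vv
        c_pv = self.c_pv + dt * self.c_vv
        # Add process noise (discrete white-noise acceleration model).
        self.c_pp = c_pp + self.q * dt**4 / 4.0
        self.c_pv = c_pv + self.q * dt**3 / 2.0
        self.c_vv = self.c_vv + self.q * dt**2

    def update(self, z):
        s = self.c_pp + self.r   # innovation variance
        k_p = self.c_pp / s      # gain on position
        k_v = self.c_pv / s      # gain on velocity
        y = z - self.p           # innovation
        self.p += k_p * y
        self.v += k_v * y
        # P = (I - K H) P with H = [1, 0]
        self.c_vv -= k_v * self.c_pv
        self.c_pv *= 1.0 - k_p
        self.c_pp *= 1.0 - k_p


# Track the altitude of a hypothetical descending target from noisy positions.
rng = random.Random(0)
dt, sigma = 0.1, 50.0
true_pos, true_vel = 40_000.0, -300.0
kf = AxisKalman(meas_var=sigma**2, accel_var=1.0,
                init_pos=true_pos + rng.gauss(0.0, sigma))
for _ in range(200):
    true_pos += true_vel * dt
    kf.predict(dt)
    kf.update(true_pos + rng.gauss(0.0, sigma))
print(round(kf.v, 1), round(kf.p - true_pos, 1))
```

The filter never sees velocity directly; it recovers it from the position stream, which is what lets fire-control reason about a predicted impact point.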
DESIGN NOTE — WHY CPA MATTERS
At closing speeds of a few km/s, objects move tens of metres per timestep. A naive distance check between frames misses collisions: the interceptor and threat can "jump over" each other entirely. Continuous collision detection via Closest Point of Approach over each step, t_min = -(Δp·Δv)/|Δv|² with Δp and Δv the relative position and velocity and t_min clamped to the step, gives you the actual minimum miss distance, not a sampling artifact.
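The CPA formula above fits in a few lines. This sketch assumes both bodies move in straight lines at constant velocity within a step; the function name and the scenario numbers are hypothetical.

```python
import math


def closest_approach(p_int, v_int, p_thr, v_thr, dt):
    """Return (t_min, miss_distance) over one step of length dt."""
    dp = [a - b for a, b in zip(p_thr, p_int)]   # relative position
    dv = [a - b for a, b in zip(v_thr, v_int)]   # relative velocity
    dv2 = sum(c * c for c in dv)
    if dv2 == 0.0:
        t_min = 0.0                              # no relative motion
    else:
        t_min = -sum(a * b for a, b in zip(dp, dv)) / dv2
        t_min = max(0.0, min(dt, t_min))         # clamp to this step
    miss = math.sqrt(sum((a + b * t_min) ** 2 for a, b in zip(dp, dv)))
    return t_min, miss


def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Head-on pass at 3 km/s closing speed with a 1 m lateral offset.
dt = 0.1
p_int, v_int = (0.0, 0.0, 0.0), (0.0, 0.0, 1500.0)
p_thr, v_thr = (1.0, 0.0, 100.0), (0.0, 0.0, -1500.0)

start_gap = distance(p_int, p_thr)
end_gap = distance([p + v * dt for p, v in zip(p_int, v_int)],
                   [p + v * dt for p, v in zip(p_thr, v_thr)])
t_min, miss = closest_approach(p_int, v_int, p_thr, v_thr, dt)
```

Sampled only at the frame boundaries, the two bodies are about 100 m apart and then about 200 m apart; the CPA check finds the 1 m pass that happened in between.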
Every run ends with a Battle Damage Assessment table in the terminal alongside an interactive 3D Matplotlib replay with a time slider. A typical kill looks like this:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ BATTLE DAMAGE ASSESSMENT ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ outcome │ KILL │
│ kill alt │ 42.5 km │
│ miss dist │ 0.12 m │
│ closing vel │ 3 100 m/s │
│ max g-load │ 12.4 G │
└──────────────────────────┘
possible outcomes: KILL · GROUND · TIMEOUT
§5 — case study
A streaming pipeline that watches BTC trade on two exchanges at once and spots the moments they disagree. WebSocket feeds from Coinbase and Binance land in Kafka; Flink tumbling windows align the two streams in time and compute the spread; an alerter pushes live notifications to Discord while a Streamlit dashboard charts spreads and alert history.
The system is decomposed into four services — producer, stream processor, alerter, dashboard — each independently deployable and orchestrated with Docker Compose, so any piece can fail or restart without taking down the pipeline.
DESIGN NOTE
Two exchanges tick at different rates, so comparing "latest price vs latest price" lies to you. Tumbling windows force both streams onto the same clock before the spread is computed — the alignment, not the math, is the hard part.
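The alignment step can be illustrated without Flink. This sketch buckets ticks into fixed windows by integer division and uses the last price in each window as that window's value. That aggregation choice, the function name, and the sample prices are assumptions, not the pipeline's actual logic.

```python
def tumbling_spreads(ticks_a, ticks_b, window_ms):
    """Align two tick streams on fixed windows and compute the spread per window.

    Each tick is (timestamp_ms, price). Windows that are missing either
    exchange are skipped rather than compared against stale data.
    """
    def last_price_per_window(ticks):
        closes = {}
        for ts, price in sorted(ticks):
            closes[ts // window_ms] = price   # later ticks overwrite earlier ones
        return closes

    a = last_price_per_window(ticks_a)
    b = last_price_per_window(ticks_b)
    return {w * window_ms: a[w] - b[w] for w in sorted(a.keys() & b.keys())}


# Hypothetical ticks: exchange A is chatty, exchange B is sparse.
exchange_a = [(100, 50000.0), (900, 50010.0), (1500, 50020.0)]
exchange_b = [(400, 50005.0), (2100, 50030.0)]
spreads = tumbling_spreads(exchange_a, exchange_b, window_ms=1000)
print(spreads)
```

Only the first window has data from both sides, so it is the only one that yields a spread; a "latest vs latest" comparison would have paired the 1500 ms price with the 400 ms one.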
§6 — case study
A high-performance packet sniffer in modern C++. NetProbe captures live traffic through Npcap, performs deep inspection of TCP/IP headers, and extracts TLS SNI — so you can see which hosts a machine is talking to even when the payload is encrypted — rendering everything in a real-time Dear ImGui interface.
Packets arrive in bursts far faster than any UI can draw. The architecture is a multi-threaded producer–consumer pipeline using std::jthread: a capture thread feeds a lock-guarded buffer, parser workers drain it, and the render thread never blocks on the network.
DESIGN NOTE
The rule that shaped everything: the thread that draws pixels must never wait on the thread that reads the wire. Decoupling them is the difference between a tool engineers trust during a traffic spike and one that freezes exactly when it matters.
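The same decoupling can be shown with Python's standard threading and queue modules. This is a sketch of the pattern, not NetProbe's C++ code: the "packets" are plain byte strings, the parser extracts two toy fields, and all names are hypothetical.

```python
import queue
import threading

SENTINEL = None  # tells a parser worker to stop


def capture(raw_packets, buffer, n_parsers):
    """Producer: push packets into the bounded buffer, then signal shutdown."""
    for pkt in raw_packets:
        buffer.put(pkt)           # a full buffer stalls capture, never the UI
    for _ in range(n_parsers):
        buffer.put(SENTINEL)


def parse(buffer, parsed):
    """Consumer: drain the buffer and publish parsed records."""
    while True:
        pkt = buffer.get()
        if pkt is SENTINEL:
            break
        parsed.put({"length": len(pkt), "first_byte": pkt[0]})


def run_pipeline(raw_packets, n_parsers=2):
    buffer = queue.Queue(maxsize=64)   # lock-guarded, bounded hand-off
    parsed = queue.Queue()
    threads = [threading.Thread(target=capture, args=(raw_packets, buffer, n_parsers))]
    threads += [threading.Thread(target=parse, args=(buffer, parsed))
                for _ in range(n_parsers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Render side: get_nowait returns immediately, so drawing never waits.
    frames = []
    while True:
        try:
            frames.append(parsed.get_nowait())
        except queue.Empty:
            break
    return frames


packets = [bytes([i % 256]) * (i + 1) for i in range(200)]
frames = run_pipeline(packets)
```

In a live tool the render loop would call get_nowait once per frame while capture keeps running; here the threads are joined first only to make the example deterministic.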
§7 — research
Deep learning didn't appear from nowhere — a surprising amount of it is statistical mechanics wearing a different hat. Physics-AI Bridge traces that lineage explicitly, with working code and a published paper at every phase: from the 2D Ising model through Hopfield networks toward Boltzmann machines, Neural Network Gaussian Processes, and ultimately the connection to free quantum field theory.
Phase 1 is a fully vectorized Ising simulation — the one running at the top of this page — that reproduces the critical temperature to within 0.05% of Onsager's exact solution, runs at 60 FPS on 65,536 spins, and matches critical exponents to within 4.5% of theory. Phase 2 maps Ising dynamics onto Hopfield networks as learnable energy models. The five stored patterns spell N · O · V · E · R — yes, after the startup — and we recover 68% strict recall at 25% corruption with a 9.65-unit energy gap between stored attractors and spurious states. A 500-trial spurious-states sweep empirically confirmed the predicted Z₂ symmetry to a perfect 201/201 split.
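For reference, the Metropolis rule behind Phase 1 looks like this in its simplest form. This is a plain single-spin-flip sketch in pure Python, not the vectorized simulation described above; lattice size, sweep count, and seed are arbitrary choices, with J = k_B = 1.

```python
import math
import random

# Onsager's exact critical temperature for the 2D square lattice (J = k_B = 1).
T_C = 2.0 / math.log(1.0 + math.sqrt(2.0))


def metropolis_sweep(spins, size, temperature, rng):
    """One sweep: size*size single-spin updates with periodic boundaries."""
    for _ in range(size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        s = spins[i][j]
        neighbours = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                      + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
        delta_e = 2 * s * neighbours   # energy change if spin (i, j) flips
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i][j] = -s


def magnetisation(spins, size):
    """Absolute magnetisation per spin, between 0 and 1."""
    return abs(sum(map(sum, spins))) / (size * size)


def run(temperature, size=16, sweeps=200, seed=0):
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]   # start fully ordered
    for _ in range(sweeps):
        metropolis_sweep(spins, size, temperature, rng)
    return magnetisation(spins, size)


m_cold = run(temperature=1.0)   # well below T_C: order survives
m_hot = run(temperature=5.0)    # well above T_C: order melts
print(f"T_C = {T_C:.4f}, |m| cold = {m_cold:.3f}, |m| hot = {m_hot:.3f}")
```

Below T_C the lattice stays almost fully magnetised; above it the magnetisation collapses toward zero, which is the transition the temperature slider at the top of the page walks through.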
You can play with Phase 2 right here. The Hopfield network below has the five NOVER patterns baked into its weights. Click cells to corrupt the current state, or hit corrupt 30%, then press recall and watch the synchronous update converge toward the nearest stored attractor:
fig. 6 — five 7×9 attractors (NOVER) stored via Hebbian learning in a 63-neuron network. Synchronous updates roll the state downhill in energy until it settles — usually into one of the stored letters, occasionally into a "spurious state" the math also predicts.
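A toy version of the same mechanism, using two hypothetical 8-neuron patterns instead of the 63-neuron NOVER letters: Hebbian weights, a synchronous update rule, and the energy function. One caveat worth knowing is that synchronous updates can, in general, oscillate between two states rather than settle, which is why the loop below has a step cap.

```python
def train_hebbian(patterns):
    """Hebbian weights W = (1/N) * sum of outer products, with zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def energy(w, s):
    """Hopfield energy E = -1/2 * sum_ij W_ij s_i s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def recall(w, state, max_steps=20):
    """Synchronous updates until the state stops changing (or the cap is hit)."""
    s = list(state)
    for _ in range(max_steps):
        nxt = [1 if sum(wij * sj for wij, sj in zip(row, s)) >= 0 else -1
               for row in w]
        if nxt == s:
            break
        s = nxt
    return s


# Two orthogonal toy patterns (hypothetical, not the NOVER glyphs).
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hebbian([p1, p2])

corrupted = list(p1)
corrupted[0] = -corrupted[0]          # flip one bit
recovered = recall(w, corrupted)
print(recovered == p1, energy(w, corrupted), energy(w, recovered))
```

The corrupted state sits higher in energy than the stored pattern, and the update rule pulls it back down into that attractor.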
Both phases ship with peer-reviewable PDFs in the repo.
WHY IT MATTERS
Energy landscapes, temperature, phase transitions — these aren't metaphors in machine learning, they're the actual math. Understanding a model as a physical system is the difference between tuning hyperparameters by superstition and knowing why they work. Phases 2.5 → 4 (Boltzmann machines, NNGPs, the QFT connection) are on the roadmap.
§8 — case study
CerebralFlow is a framework I built for constructing "digital twins" of brain dynamics — pipelines that take real physiological signals, extract a functional network from them, validate that network against random null models, and then drive a generative simulation that reproduces the observed activity.
The signal side uses Hilbert transforms to extract instantaneous phase and intrinsic frequency per channel. The connectivity side uses Phase Lag Index and weighted PLI to estimate coupling while suppressing volume-conduction artifacts. The validation side generates phase-shuffled surrogates and runs a significance test — so you know whether the connectivity you measured is real structure or noise dressed up as structure.
DESIGN NOTE
The unglamorous part of neural-signal work is that correlation between two phase-locked sine waves isn't connectivity — volume conduction will fake any structure you want. PLI throws away the zero-lag component on principle. Surrogate testing then proves the remainder is non-random. Without these two steps the simulation is decorative; with them, it's a hypothesis.
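The zero-lag rejection is easy to see in code. This sketch computes the plain Phase Lag Index from phases that are assumed to be already extracted (the Hilbert step is skipped, and weighted PLI is not shown); the function name and the synthetic 10 Hz phases are hypothetical.

```python
import math


def phase_lag_index(phases_a, phases_b):
    """PLI = |mean(sign(sin(phi_a - phi_b)))| over time samples."""
    def sign(x):
        return (x > 0) - (x < 0)

    signs = [sign(math.sin(a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(signs)) / len(signs)


# Synthetic instantaneous phases for a 10 Hz rhythm sampled at 256 Hz.
base = [2.0 * math.pi * 10.0 * (k / 256.0) for k in range(256)]
lagged = [p - math.pi / 4.0 for p in base]   # consistent non-zero lag

pli_lagged = phase_lag_index(base, lagged)
pli_zero_lag = phase_lag_index(base, base)   # what volume conduction looks like
print(pli_lagged, pli_zero_lag)
```

Two perfectly locked channels with zero lag score 0, while the same lock with a consistent lag scores 1, so instantaneous mixing cannot pass for coupling.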
The simulation engine is a time-varying Kuramoto network — a coupled-oscillator model whose generative dynamics, calibrated to the data, can be perturbed to ask "what happens if we stimulate node X?" That's the digital-twin promise: you simulate before you intervene.
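The core of a Kuramoto simulation is a single update rule. The sketch below uses forward Euler with a fixed coupling matrix, which is a simplification of the time-varying network described above; the five oscillators, the 10 Hz frequency, and the coupling strength are hypothetical.

```python
import math


def kuramoto_step(theta, omega, coupling, dt):
    """One Euler step of d(theta_i)/dt = omega_i + sum_j K_ij sin(theta_j - theta_i)."""
    n = len(theta)
    return [
        theta[i] + dt * (omega[i] + sum(coupling[i][j] * math.sin(theta[j] - theta[i])
                                        for j in range(n)))
        for i in range(n)
    ]


def order_parameter(theta):
    """Kuramoto order parameter r in [0, 1]; r = 1 means full phase synchrony."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)


n = 5
theta = [0.5 * i for i in range(n)]            # phases spread over 2 radians
omega = [2.0 * math.pi * 10.0] * n             # identical 10 Hz oscillators
k = 2.0
coupling = [[0.0 if i == j else k / n for j in range(n)] for i in range(n)]

r_start = order_parameter(theta)
for _ in range(5000):                          # 5 simulated seconds at dt = 1 ms
    theta = kuramoto_step(theta, omega, coupling, dt=0.001)
r_end = order_parameter(theta)
print(round(r_start, 3), round(r_end, 3))
```

Asking "what happens if we stimulate node X?" then amounts to perturbing one entry of theta or omega, or one row of the coupling matrix, and watching how the order parameter responds.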
step 1 signal data inversion
extracted phases shape: (10, 1024)
estimated frequencies: mean = 10.12 Hz
step 1.5 statistical validation (surrogates)
generating 20 phase-shuffled surrogates...
observed mean connectivity: 0.4215
surrogate mean (N=20): 0.4208
z-score: 0.15 · p-value: 0.4500
result is NOT statistically significant
(expected — input was random noise)
step 1.6 advanced connectivity (PLI)
PLI matrix mean: 0.3805
step 9 closed-loop optimization
final error: 1.0520
pipeline completed successfully.
§9 — leadership
Mind Cloud is the autonomous robotics organization I led at Alexandria University — 70+ student engineers across mechanical, electrical, software, and AI subteams, all converging on a competition rover. My job was technical direction, sub-team coordination, and personally owning the autonomy stack.
The migration was the big one: I rebuilt the rover's autonomy on ROS 2 from scratch — modular packages for LiDAR processing, TF2-based coordinate-frame state management, and a containerized deployment model so the whole stack came up reproducibly on any team member's machine. Before the migration, we were debugging the build system. After, we were debugging the rover.
LESSON LEARNED
Coordinating 70+ people who all want to be the smartest person in the room is its own engineering problem. The architecture decisions that mattered most weren't algorithmic — they were the ones that let four subteams work in parallel without stepping on each other's branches at 2am the night before a comp.
The competition results, in chronological order: 9th place at the European Rover Challenge, then 2nd and 3rd place at UGVC.
§10 — devops
Every project I ship rides a fully automated deployment pipeline. Code goes from PR to production without anyone logging into a server — linting, testing, building container images, and rolling them out through blue-green deploys that can be rolled back with a single commit revert.
The Nover stack is a good example: a GitHub Actions workflow runs the test suite, builds a Docker image tagged with the commit SHA, pushes it to ECR, and triggers a rolling update on the GPU fleet — all gated by branch protection and required status checks. Infrastructure is Terraform, version-controlled alongside the application code so the provisioning history is as auditable as the feature history.
For side projects and open-source repos, I use lighter-weight pipelines: GitHub Actions for CI (lint, test, build), Docker Compose for local-to-staging parity, and Nginx with Let's Encrypt for TLS termination on self-hosted services. The principle is the same everywhere: if a human has to remember a deploy step, the pipeline is broken.
PHILOSOPHY
A deploy should be boring. If your heart rate goes up when you push to main, your pipeline is telling you something. Automate the fear away — tests, canary checks, instant rollback — until shipping is just another commit.
§10 — also built
§11 — toolbox
python · go · c++ · java
typescript · sql · c · matlab
fastapi · spring boot · redis
kafka · flink · postgres · docker · prometheus
aws · terraform · github actions · jenkins
pytorch · comfyui · lora fine-tuning
diffusion inference · h100 · hugging face
numpy · scipy ·
kuramoto · hilbert · kalman
ros 2 · tf2 · slam · lidar
std::jthread · npcap · pcap
embedded C · esp8266
github actions · docker compose · terraform
jenkins · nginx · let's encrypt
blue-green deploys · rollback automation
§12 — contact
Building something interesting? I'd love to hear about it.