Yehia Gewily | Software & Systems Engineer

§0 — brief

Hey, I'm Yehia.

I'm a final-year Electrical Engineering (Electronics & Communications) student at Alexandria University who didn't wait for graduation to build real things.

Right now I'm the founding engineer at Nover, an AI image and video generation startup, where I own the production stack end-to-end — from the web app at nover.studio down to the GPU workers fine-tuning our proprietary diffusion models on H100s. We're raising pre-seed.

Before that, I spent a year and a half as a software engineer at a US neuromodulation research company, building a real-time EEG "digital twin" pipeline — streaming brainwave data from OpenBCI hardware through Hilbert transforms into coupled-oscillator models — and co-authored a research paper on Kuramoto-based EEG synchronization.

On the side, I led Mind Cloud, a 70+ member robotics organization, from a 9th-place finish at the European Rover Challenge to 2nd and 3rd place at UGVC — migrating the entire autonomy stack to ROS 2 along the way.

Outside the day job, I write open-source research: a Physics → AI series tracing modern deep learning back to statistical mechanics, with two published papers so far. I like building things that are hard. I like shipping them even more.

status: open to remote software & ML-infrastructure roles
currently: founding engineer @ nover
based in: alexandria, egypt (utc+2)
email: yehiasaidgewily@gmail.com
elsewhere: github · linkedin

§1 — what I'm building now

Founding engineer at Nover

ai image & video generation · 2025–present — nover.studio ↗

Nover is an AI media-generation startup. We fine-tune our own diffusion models, serve them at scale, and wrap the whole thing in a creative tool people actually want to use. "Founding engineer" here means I own everything between the user pressing generate and the pixels showing up on their screen — and a fair bit of what happens before either of those things.

I won't tell you exactly how the pipeline is built — pre-launch, the boring details are a small competitive edge. Instead, here are the three pieces I'm responsible for:

01 · Product

The web app at nover.studio

I built the full creative tool — the editor, the canvas, the asset pipeline, account & billing, real-time job state. It's the surface every user touches; it has to feel instant even when the GPUs are saturated.

02 · Infrastructure

GPU inference at production scale

An autoscaled GPU fleet on AWS that turns prompts into images and video. Job queues, worker orchestration, model storage, global delivery. Infrastructure-as-code so the next engineer can rebuild it from scratch by reading a repo.

03 · ML & Models

Fine-tuning on H100s

Full fine-tunes of our proprietary diffusion models on rented H100s, plus the release infrastructure that ships new checkpoints behind the same API surface — including the throttled, resumable pipeline that handled our waitlist launch.

FROM THE TRENCHES

Our heaviest generation workloads kept hitting a memory-bound failure mode that looked like a GPU problem. It wasn't — the bug lived in the request path. Fixed the path, then upgraded the fleet anyway. The full story comes after launch.

24/7

production gpu fleet

pre-seed

and building

§2 — flagship case study

Go distributed queue

go · redis · docker · prometheus — 2025 — repo ↗

A fault-tolerant task queue built on the Reliable Queue Pattern: at-least-once delivery through atomic Redis LMOVE operations, multi-tier priority queues, delayed scheduling via sorted sets, batch ingestion, and dead-letter queues for poison messages. CI is green; the README has the architecture diagram in Mermaid.

The interesting part is failure. A lease-based reaper watches for workers that died mid-task and atomically reclaims their work via Lua, and the whole system is instrumented with native Prometheus metrics. With a 30-second lease and 5-second reaper sweep, the reclaim bound is provable:

reclaim_latency_max ≤ LEASE_TTL + REAPER_INTERVAL = 30 s + 5 s = 35 s
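
The bound comes from the worst case: a worker dies right after taking or renewing its lease, and the lease then expires just after a reaper sweep. A minimal sketch of that arithmetic, written in Python rather than the project's Go and Redis; `reclaim_latency` and `reaper_phase` are illustrative names, and the sweep schedule is an assumption:

```python
import math

LEASE_TTL = 30.0        # seconds a lease stays valid without renewal
REAPER_INTERVAL = 5.0   # seconds between reaper sweeps


def reclaim_latency(death_time: float, last_renewal: float,
                    reaper_phase: float = 0.0) -> float:
    """Seconds between a worker dying and the reaper reclaiming its task.

    Assumes sweeps run at reaper_phase + k * REAPER_INTERVAL and that a
    sweep reclaims every lease whose expiry is at or before the sweep time.
    """
    expiry = last_renewal + LEASE_TTL
    # Index of the first sweep at or after the lease expiry.
    k = math.ceil((expiry - reaper_phase) / REAPER_INTERVAL)
    sweep_time = reaper_phase + k * REAPER_INTERVAL
    return sweep_time - death_time
```

Sweeping `last_renewal` over a fine grid with `death_time == last_renewal` keeps the result under 35 seconds, approaching it when the lease expires just after a sweep.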

For non-idempotent handlers, producers can attach an idempotency_key — workers use a claim-then-confirm pattern to guarantee at-most-once execution within the configured TTL window. I validated all of this the only way that counts: automated chaos tests that kill workers at random while the queue is under load.
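
A claim-then-confirm flow can be sketched as follows. This is an in-memory model under stated assumptions, not the repo's code: `IdempotencyStore` and `handle_once` are hypothetical names, and the dict stands in for Redis, where the claim would need to be a single atomic operation (for example `SET key value NX EX ttl`).

```python
import time
from typing import Callable, Dict, Tuple


class IdempotencyStore:
    """Single-threaded model of an idempotency-key store with a TTL."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (state, expires_at)

    def claim(self, key: str) -> bool:
        """Return True only for the first caller inside the TTL window."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return False  # already claimed or confirmed, and not yet expired
        self._entries[key] = ("claimed", now + self._ttl)
        return True

    def confirm(self, key: str) -> None:
        """Mark the key as done and restart its TTL."""
        self._entries[key] = ("done", self._clock() + self._ttl)


def handle_once(store: IdempotencyStore, key: str,
                handler: Callable[[], None]) -> bool:
    """Run handler at most once per key within the TTL window."""
    if not store.claim(key):
        return False      # duplicate delivery: skip the side effect
    handler()
    store.confirm(key)
    return True
```

Claiming before running the handler is what makes this at-most-once: a worker that crashes between claim and confirm leaves the key claimed, so a redelivery inside the TTL is skipped rather than re-executed.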

WHAT BROKE

Chaos testing exposed a race between a slow worker renewing its lease and the reaper reclaiming the same task — two consumers, one job. The fix was making check-and-extend a single atomic Lua script on Redis, closing the window entirely.
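
The shape of that fix can be modelled in a few lines. In this sketch a `threading.Lock` plays the role that Lua script atomicity plays on Redis; `LeaseTable` and its method names are assumptions for illustration, not the project's API.

```python
import threading
from typing import Dict, Tuple


class LeaseTable:
    """In-memory stand-in for task leases; every method is one atomic step."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._leases: Dict[str, Tuple[str, float]] = {}  # task_id -> (owner, expires_at)

    def acquire(self, task_id: str, owner: str, now: float, ttl: float) -> None:
        with self._lock:
            self._leases[task_id] = (owner, now + ttl)

    def extend(self, task_id: str, owner: str, now: float, ttl: float) -> bool:
        """Renew only if `owner` still holds an unexpired lease.

        The check and the write happen under one lock, so the reaper cannot
        reclaim the task between them.
        """
        with self._lock:
            lease = self._leases.get(task_id)
            if lease is None or lease[0] != owner or lease[1] <= now:
                return False
            self._leases[task_id] = (owner, now + ttl)
            return True

    def reclaim_expired(self, task_id: str, now: float) -> bool:
        """Reaper side: drop the lease only if it has really expired."""
        with self._lock:
            lease = self._leases.get(task_id)
            if lease is None or lease[1] > now:
                return False
            del self._leases[task_id]
            return True
```

A slow worker whose `extend` returns False knows it has lost the task and should stop, which is what prevents two consumers from finishing the same job.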

35s

max reclaim after worker death

≥1

delivery guarantee

CI ✓

github actions green

fig. 1 — reliable queue pattern with lease reaper and dead-letter path for poison messages.

§3 — case study

Agent Mesh: orchestrating squad-mode AI at 750+ RPS

go 1.23 · react · redis · postgres · websockets — 2025 — repo ↗

Agent Mesh is an event-driven orchestration system that simulates a software-engineering squad — architect, developer, QA — running as Go worker nodes against a Redis-backed task graph, with a React command center that visualizes the whole thing in real time. The hard part isn't the agents; it's keeping the UI honest when 750 events per second hit the browser.

The naive WebSocket-to-state loop tries to repaint every 1.33 ms; a 60 Hz display can only honor a repaint every 16.67 ms. The result is a "rendering avalanche" — the main thread saturates, layouts thrash, cards teleport across the screen faster than the eye can track. Agent Mesh fixes this with a throttled batching engine: a non-reactive ref buffers incoming updates, then flushes them to React state on a fixed 100 ms cadence.
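
The batching idea is independent of React, so here is a language-neutral model of it in Python. It is a sketch, not the project's code: `BatchingBuffer`, `tick`, and `simulate` are illustrative names, the dict stands in for the non-reactive ref, and `on_flush` stands in for the React state update.

```python
from typing import Any, Callable, Dict, Tuple


class BatchingBuffer:
    """Buffers updates and hands them to the renderer on a fixed cadence."""

    def __init__(self, flush_interval_ms: int,
                 on_flush: Callable[[Dict[str, Any]], None]) -> None:
        self._interval = flush_interval_ms
        self._on_flush = on_flush
        self._pending: Dict[str, Any] = {}
        self._last_flush = 0

    def push(self, key: str, value: Any) -> None:
        # Latest value per key wins; pushing never triggers a render.
        self._pending[key] = value

    def tick(self, now_ms: int) -> bool:
        """Called by a timer; flushes at most once per interval."""
        if now_ms - self._last_flush < self._interval or not self._pending:
            return False
        batch, self._pending = self._pending, {}
        self._last_flush = now_ms
        self._on_flush(batch)
        return True


def simulate(events_per_sec: int = 750, duration_ms: int = 1000,
             flush_ms: int = 100) -> Tuple[int, int]:
    """Return (events received, renders performed) over the simulated run."""
    renders = []
    buf = BatchingBuffer(flush_ms, renders.append)
    sent = 0
    for now in range(1, duration_ms + 1):
        # Deliver every event that is due by this millisecond.
        while sent < events_per_sec * now // 1000:
            buf.push(f"task-{sent % 50}", sent)
            sent += 1
        buf.tick(now)
    return sent, len(renders)
```

With the defaults, `simulate()` takes in 750 events and performs 10 renders, one per 100 ms window.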

DESIGN NOTE

Trading up to 100 ms of perceived latency for a 75× drop in render frequency (750 state updates per second down to 10 flushes per second) is the whole game. The UI feels real-time, the main thread sits 85% idle, and the system can scale ingest without scaling pixels.

On the backend, workers report RSS (resident set size) via gopsutil rather than heap size, because heap is noisy. When RSS crosses a soft limit, the worker raises backpressure on the producer until GC reclaims memory. Task transitions go through atomic Redis BLMOVE and ACID PostgreSQL writes; graceful shutdown rides Go's sync.WaitGroup + context cancellation so no task gets half-persisted on SIGTERM.
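
One way to picture the backpressure signal is a small gate with hysteresis. This is a sketch under assumptions: the RSS value is passed in rather than read from the OS, `BackpressureGate` is a hypothetical name, and the separate resume threshold (`resume_ratio`) is an addition for illustration that the project may not use.

```python
class BackpressureGate:
    """Turns a stream of RSS samples into a pause/resume signal."""

    def __init__(self, soft_limit_bytes: int, resume_ratio: float = 0.8) -> None:
        self.soft_limit = soft_limit_bytes
        # Resume below the limit, not at it, so the gate does not flap.
        self.resume_at = int(soft_limit_bytes * resume_ratio)
        self.paused = False

    def update(self, rss_bytes: int) -> bool:
        """Feed one RSS sample; returns True while producers should pause."""
        if not self.paused and rss_bytes >= self.soft_limit:
            self.paused = True
        elif self.paused and rss_bytes <= self.resume_at:
            self.paused = False
        return self.paused
```

The gap between the two thresholds keeps a worker hovering near the limit from toggling the producer on and off with every sample.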

THROUGHPUT · WITH vs WITHOUT BATCHING

metric                  no batching        agent mesh
─────────────────────────────────────────────────────
max ui events/sec       ~60 (60Hz cap)     750+
main thread idle %      <5% (laggy)        ~85% (smooth)
state consistency       fragile            guaranteed

fig. 2 — producer → redis → worker pool with RSS-driven backpressure feeding back into ingest.

§4 — case study

Hitting a bullet with a bullet

python · numpy · kalman filtering · proportional navigation — 2025 — repo ↗

A 3D kinematic simulation of a THAAD-style terminal missile defense system, written from scratch to close a complete Guidance, Navigation, and Control (GNC) loop: noisy radar → state estimator → launch decision → guided interception → engagement assessment. It's a sandbox for the maths of hitting a bullet with another bullet.

The estimator is a 6-state Kalman filter tracking position and velocity from Gaussian-noise position measurements. Fire-control launches only when the predicted impact point (PIP) is plausible, the track is mature, and the threat is clearly descending. The interceptor is a two-stage rocket with mass depletion, drag, and G-limits, steered by Augmented Proportional Navigation with a gravity bias.
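
For intuition, a constant-velocity Kalman filter on one axis fits in a few lines of plain Python. This is a sketch, not the simulator's estimator: it assumes the three axes are independent, so a 6-state filter reduces to three copies of this 2-state one, and `AxisKF`, `q`, and `r` are illustrative names for the filter, the process-noise intensity, and the measurement variance.

```python
from dataclasses import dataclass


@dataclass
class AxisKF:
    """Position/velocity Kalman filter for a single axis."""
    p: float = 0.0      # position estimate
    v: float = 0.0      # velocity estimate
    p00: float = 1e6    # covariance entries (symmetric 2x2)
    p01: float = 0.0
    p11: float = 1e6

    def predict(self, dt: float, q: float) -> None:
        """Propagate with F = [[1, dt], [0, 1]] and white-noise acceleration."""
        self.p += self.v * dt
        # P = F P F^T
        p00 = self.p00 + dt * (2.0 * self.p01 + dt * self.p11)
        p01 = self.p01 + dt * self.p11
        # Add process noise Q.
        self.p00 = p00 + q * dt ** 3 / 3.0
        self.p01 = p01 + q * dt ** 2 / 2.0
        self.p11 = self.p11 + q * dt

    def update(self, z: float, r: float) -> None:
        """Fuse a position measurement z with variance r (H = [1, 0])."""
        s = self.p00 + r                       # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s    # Kalman gain
        y = z - self.p                         # innovation
        self.p += k0 * y
        self.v += k1 * y
        # P = (I - K H) P
        p00, p01, p11 = self.p00, self.p01, self.p11
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p11 = p11 - k1 * p01
```

Feeding it position-only measurements of a target moving at constant speed recovers the velocity, which is the quantity the fire-control logic needs in order to predict an impact point.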

DESIGN NOTE — WHY CPA MATTERS

At closing speeds of a few km/s, objects move tens of metres per timestep. A naive distance check between frames misses collisions — the interceptor and threat can "jump over" each other entirely. Continuous collision detection via Closest Point of Approach over each step (t_min = -(Δp·Δv)/|Δv|²) gives you the actual minimum miss distance, not a sampling artifact.
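
The CPA formula translates directly into code. The sketch below assumes both bodies move at constant velocity within the step and clamps t_min to the step, so the result is the closest approach inside [0, dt]; `cpa_miss_distance` is an illustrative name, not the simulator's function.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def cpa_miss_distance(p_int: Vec3, v_int: Vec3,
                      p_tgt: Vec3, v_tgt: Vec3,
                      dt: float) -> Tuple[float, float]:
    """Return (t_min, miss distance) for one timestep of length dt."""
    dp = tuple(t - i for t, i in zip(p_tgt, p_int))   # relative position
    dv = tuple(t - i for t, i in zip(v_tgt, v_int))   # relative velocity
    dv2 = sum(c * c for c in dv)
    if dv2 == 0.0:
        t_min = 0.0                                   # no relative motion
    else:
        t_min = -sum(a * b for a, b in zip(dp, dv)) / dv2
        t_min = max(0.0, min(dt, t_min))              # stay inside the step
    closest = tuple(a + b * t_min for a, b in zip(dp, dv))
    return t_min, math.sqrt(sum(c * c for c in closest))
```

For an interceptor at 3000 m/s passing a stationary point 15 m ahead and 0.1 m off-axis with dt = 0.01 s, both frame endpoints are about 15 m from the target, while the CPA check reports the 0.1 m pass at t = 0.005 s.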

Every run ends with a Battle Damage Assessment table in the terminal alongside an interactive 3D Matplotlib replay with a time slider. A typical kill looks like this:

BATTLE DAMAGE ASSESSMENT · engineering mode

┏━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ BATTLE DAMAGE ASSESSMENT ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ outcome      │ KILL       │
│ kill alt     │ 42.5 km    │
│ miss dist    │ 0.12 m     │
│ closing vel  │ 3 100 m/s  │
│ max g-load   │ 12.4 G     │
└──────────────────────────┘

possible outcomes: KILL · GROUND · TIMEOUT

fig. 3 — full GNC loop: noisy detection → Kalman estimation → launch decision → APN guidance → CPA scoring.

§5 — case study

Real-time crypto arbitrage engine

python · kafka · apache flink · streamlit · docker — 2025 — repo ↗

A streaming pipeline that watches BTC trade on two exchanges at once and spots the moments they disagree. WebSocket feeds from Coinbase and Binance land in Kafka; Flink tumbling windows align the two streams in time and compute the spread; an alerter pushes live notifications to Discord while a Streamlit dashboard charts spreads and alert history.

The system is decomposed into four services — producer, stream processor, alerter, dashboard — each independently deployable and orchestrated with Docker Compose, so any piece can fail or restart without taking down the pipeline.

DESIGN NOTE

Two exchanges tick at different rates, so comparing "latest price vs latest price" lies to you. Tumbling windows force both streams onto the same clock before the spread is computed — the alignment, not the math, is the hard part.
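
A batch model of that alignment, in plain Python rather than Flink, looks like this. It is a sketch under assumptions: `windowed_spreads` is an illustrative name, the window length is a placeholder, and taking the last price per exchange per window is one possible aggregation, not necessarily the one the pipeline uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Tick = Tuple[str, float, float]  # (exchange, event time in seconds, price)


def windowed_spreads(ticks: Iterable[Tick], window_s: float = 1.0,
                     a: str = "coinbase", b: str = "binance"
                     ) -> List[Tuple[float, float]]:
    """Return (window start, price_a - price_b) for windows with both feeds."""
    # window index -> exchange -> (timestamp, price) of the latest tick seen
    last: Dict[int, Dict[str, Tuple[float, float]]] = defaultdict(dict)
    for exchange, ts, price in ticks:
        w = int(ts // window_s)
        prev = last[w].get(exchange)
        if prev is None or ts >= prev[0]:
            last[w][exchange] = (ts, price)
    spreads = []
    for w in sorted(last):
        if a in last[w] and b in last[w]:   # skip windows missing one side
            spreads.append((w * window_s, last[w][a][1] - last[w][b][1]))
    return spreads
```

Windows where only one exchange ticked produce no spread at all, which is the point: a missing comparison is better than a stale one.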

fig. 4 — two exchange feeds aligned in Flink windows; alerts fan out to Discord and the live dashboard.

§6 — case study

NetProbe: real-time traffic analyzer

c++20 · npcap · dear imgui — 2025 — repo ↗

A high-performance packet sniffer in modern C++. NetProbe captures live traffic through Npcap, performs deep inspection of TCP/IP headers, and extracts TLS SNI — so you can see which hosts a machine is talking to even when the payload is encrypted — rendering everything in a real-time Dear ImGui interface.

Packets arrive in bursts far faster than any UI can draw. The architecture is a multi-threaded producer–consumer pipeline using std::jthread: a capture thread feeds a lock-guarded buffer, parser workers drain it, and the render thread never blocks on the network.
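
The same three-role shape can be modelled with Python's standard library, with `queue.Queue` acting as the lock-guarded buffer. This is a sketch of the pattern only: NetProbe itself is C++ with std::jthread, and `run_pipeline`, the stand-in "parsing", and the 1 ms frame pause are assumptions for illustration.

```python
import queue
import threading
import time
from typing import List, Sequence, Tuple


def run_pipeline(packets: Sequence[bytes], n_parsers: int = 2,
                 capacity: int = 1024) -> List[Tuple[int, bytes]]:
    """Capture -> parse -> render, with a render loop that never blocks."""
    raw: queue.Queue = queue.Queue(maxsize=capacity)   # bounded capture buffer
    parsed: queue.Queue = queue.Queue()
    stop = object()                                    # shutdown sentinel

    def capture() -> None:
        for pkt in packets:
            raw.put(pkt)
        for _ in range(n_parsers):
            raw.put(stop)                              # one sentinel per parser

    def parse() -> None:
        while True:
            pkt = raw.get()
            if pkt is stop:
                break
            parsed.put((len(pkt), pkt[:1]))            # stand-in for header parsing

    threads = [threading.Thread(target=capture)]
    threads += [threading.Thread(target=parse) for _ in range(n_parsers)]
    for t in threads:
        t.start()

    rendered: List[Tuple[int, bytes]] = []
    # "Render loop": polls for results and never waits on capture or parsing.
    while any(t.is_alive() for t in threads) or not parsed.empty():
        try:
            rendered.append(parsed.get_nowait())
        except queue.Empty:
            time.sleep(0.001)   # nothing new this frame; a UI would redraw old state
    for t in threads:
        t.join()
    return rendered
```

The only blocking calls are on the capture and parser threads; the loop that stands in for rendering uses `get_nowait`, so a burst on the wire can delay results but never stall a frame.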

DESIGN NOTE

The rule that shaped everything: the thread that draws pixels must never wait on the thread that reads the wire. Decoupling them is the difference between a tool engineers trust during a traffic spike and one that freezes exactly when it matters.

fig. 5 — producer–consumer pipeline decoupling packet ingestion from the 60 fps render loop.

§7 — research

Physics → AI: the bridge

python · numpy · streamlit — 2025, with omar hosney (cto, nover) — repo ↗

Deep learning didn't appear from nowhere — a surprising amount of it is statistical mechanics wearing a different hat. Physics-AI Bridge traces that lineage explicitly, with working code and a published paper at every phase: from the 2D Ising model through Hopfield networks toward Boltzmann machines, Neural Network Gaussian Processes, and ultimately the connection to free quantum field theory.

Phase 1 is a fully vectorized Ising simulation — the one running at the top of this page — that reproduces the critical temperature to within 0.05% of Onsager's exact solution, runs at 60 FPS on 65,536 spins, and matches critical exponents to within 4.5% of theory. Phase 2 maps Ising dynamics onto Hopfield networks as learnable energy models. The five stored patterns spell N · O · V · E · R — yes, after the startup — and we recover 68% strict recall at 25% corruption with a 9.65-unit energy gap between stored attractors and spurious states. A 500-trial spurious-states sweep empirically confirmed the predicted Z₂ symmetry to a perfect 201/201 split.
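
For reference, Onsager's exact critical temperature for the square-lattice Ising model is T_c = 2J / (k_B ln(1 + √2)). The snippet below evaluates it in units where J = k_B = 1 (an assumption about the simulation's units) and shows what a 0.05% tolerance means numerically; `relative_error` is an illustrative helper, not code from the repo.

```python
import math

# Onsager's exact result for the 2D square-lattice Ising model, J = k_B = 1.
T_C_ONSAGER = 2.0 / math.log(1.0 + math.sqrt(2.0))   # about 2.269185


def relative_error(tc_measured: float) -> float:
    """Relative deviation of a measured critical temperature from Onsager's."""
    return abs(tc_measured - T_C_ONSAGER) / T_C_ONSAGER
```

A 0.05% tolerance corresponds to a measured T_c within roughly ±0.0011 of 2.2692.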

You can play with Phase 2 right here. The Hopfield network below has the five NOVER patterns baked into its weights. Click cells to corrupt the current state, or hit corrupt 30%, then press recall and watch the synchronous update converge toward the nearest stored attractor:

fig. 6 — five 7×9 attractors (NOVER) stored via Hebbian learning in a 63-neuron network. Synchronous updates roll the state downhill in energy until it settles — usually into one of the stored letters, occasionally into a "spurious state" the math also predicts.
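
The mechanics in the caption (Hebbian storage, synchronous recall, an energy function) can be sketched in plain Python. This is a toy version under assumptions: function names are illustrative, the example in the note uses two 8-neuron patterns rather than the five 63-neuron letters, and the step cap is there because synchronous updates can in principle oscillate between two states instead of settling.

```python
from typing import List, Sequence


def train_hebbian(patterns: Sequence[Sequence[int]]) -> List[List[float]]:
    """Hebbian weights w_ij = (1/n) * sum over patterns of p_i * p_j, zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def energy(w: List[List[float]], s: Sequence[int]) -> float:
    """Hopfield energy E = -1/2 * sum_ij w_ij s_i s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def recall(w: List[List[float]], state: Sequence[int],
           max_steps: int = 20) -> List[int]:
    """Synchronous updates until a fixed point or the step cap is reached."""
    s = list(state)
    n = len(s)
    for _ in range(max_steps):
        nxt = [1 if sum(w[i][j] * s[j] for j in range(n)) >= 0 else -1
               for i in range(n)]
        if nxt == s:
            break
        s = nxt
    return s
```

Storing two orthogonal patterns, [1,1,1,1,-1,-1,-1,-1] and [1,-1,1,-1,1,-1,1,-1], and flipping the first bit of the first one gives a state with energy -1.5; one synchronous update restores the stored pattern at energy -3.0.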

Both phases ship with peer-reviewable PDFs in the repo:

Phase 1 paper ↗ Computational 2D Ising Simulation — Tc verified to 0.05% vs Onsager exact. pdf
Phase 2 paper ↗ Hopfield Networks as Learnable Ising Models — Ising/Hopfield isomorphism, Z₂ symmetry. pdf · latex source in repo

WHY IT MATTERS

Energy landscapes, temperature, phase transitions — these aren't metaphors in machine learning, they're the actual math. Understanding a model as a physical system is the difference between tuning hyperparameters by superstition and knowing why they work. Phases 2.5 → 4 (Boltzmann machines, NNGPs, the QFT connection) are on the roadmap.

§8 — case study

CerebralFlow: digital twins for brain dynamics

python · scipy · numpy · kuramoto models — 2025 — repo ↗

CerebralFlow is a framework I built for constructing "digital twins" of brain dynamics — pipelines that take real physiological signals, extract a functional network from them, validate that network against random null models, and then drive a generative simulation that reproduces the observed activity.

The signal side uses Hilbert transforms to extract instantaneous phase and intrinsic frequency per channel. The connectivity side uses Phase Lag Index and weighted PLI to estimate coupling while suppressing volume-conduction artifacts. The validation side generates phase-shuffled surrogates and runs a significance test — so you know whether the connectivity you measured is real structure or noise dressed up as structure.
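
The connectivity and validation steps can be sketched on phase series that are already extracted (the Hilbert step is taken as given). This is not CerebralFlow's code: `pli` and `surrogate_test` are illustrative names, shuffling one channel's phases in time is one simple surrogate scheme chosen here as an assumption, and the (count + 1) / (N + 1) p-value is a common convention rather than the framework's confirmed formula.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def pli(phase_a: Sequence[float], phase_b: Sequence[float]) -> float:
    """Phase Lag Index: |mean of sign(sin(phase difference))|.

    Zero-lag coupling contributes sign(0) = 0, which is how PLI discounts
    volume conduction.
    """
    signs = []
    for a, b in zip(phase_a, phase_b):
        d = math.sin(a - b)
        signs.append((d > 0) - (d < 0))
    return abs(sum(signs)) / len(signs)


def surrogate_test(phase_a: Sequence[float], phase_b: Sequence[float],
                   n_surrogates: int = 20, seed: int = 0
                   ) -> Tuple[float, float, float]:
    """Return (observed PLI, z-score, empirical p-value) against shuffled nulls."""
    rng = random.Random(seed)
    observed = pli(phase_a, phase_b)
    null = []
    for _ in range(n_surrogates):
        shuffled = list(phase_b)
        rng.shuffle(shuffled)            # destroys the temporal alignment
        null.append(pli(phase_a, shuffled))
    mu = statistics.mean(null)
    sigma = statistics.pstdev(null)
    if sigma > 0:
        z = (observed - mu) / sigma
    else:                                # degenerate null distribution
        z = 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    p = (1 + sum(v >= observed for v in null)) / (n_surrogates + 1)
    return observed, z, p
```

Two channels with a constant quarter-cycle lag give a PLI of 1.0 and a small p-value, while a channel compared with an exact copy of itself gives a PLI of 0.0, since all of its coupling sits at zero lag.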

DESIGN NOTE

The unglamorous part of neural-signal work is that correlation between two phase-locked sine waves isn't connectivity: volume conduction will fake any structure you want. PLI throws away the zero-lag component on principle. Surrogate testing then checks whether the remainder is distinguishable from chance. Without these two steps the simulation is decorative; with them, it's a hypothesis.

The simulation engine is a time-varying Kuramoto network — a coupled-oscillator model whose generative dynamics, calibrated to the data, can be perturbed to ask "what happens if we stimulate node X?" That's the digital-twin promise: you simulate before you intervene.
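
At its core a Kuramoto network is one update rule, dθ_i/dt = ω_i + Σ_j K_ij sin(θ_j − θ_i). The sketch below integrates it with a plain Euler step and a fixed coupling matrix; the time-varying coupling and data calibration described above are left out, and all function names are illustrative rather than CerebralFlow's API.

```python
import cmath
import math
from typing import List, Sequence


def kuramoto_step(theta: Sequence[float], omega: Sequence[float],
                  coupling: Sequence[Sequence[float]], dt: float) -> List[float]:
    """One Euler step of the Kuramoto model; phases are wrapped to [0, 2*pi)."""
    n = len(theta)
    nxt = []
    for i in range(n):
        pull = sum(coupling[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
        nxt.append((theta[i] + dt * (omega[i] + pull)) % (2.0 * math.pi))
    return nxt


def order_parameter(theta: Sequence[float]) -> float:
    """Synchrony r in [0, 1]: magnitude of the mean unit phasor."""
    return abs(sum(cmath.exp(1j * t) for t in theta)) / len(theta)


def simulate(theta0: Sequence[float], omega: Sequence[float],
             coupling: Sequence[Sequence[float]], dt: float, steps: int) -> List[float]:
    """Integrate for `steps` Euler steps and return the final phases."""
    theta = list(theta0)
    for _ in range(steps):
        theta = kuramoto_step(theta, omega, coupling, dt)
    return theta
```

Four identical oscillators with all-to-all coupling of 0.5 and initial phases spread over 1.5 rad start at r ≈ 0.85 and pull together to r ≈ 1. A "stimulate node X" experiment amounts to changing that node's ω or its row of the coupling matrix and comparing the resulting r.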

example pipeline output

step 1   signal data inversion
  extracted phases shape: (10, 1024)
  estimated frequencies:   mean = 10.12 Hz

step 1.5 statistical validation (surrogates)
  generating 20 phase-shuffled surrogates...
  observed mean connectivity: 0.4215
  surrogate mean (N=20):      0.4208
  z-score: 0.15  ·  p-value: 0.4500
  result is NOT statistically significant
   (expected — input was random noise)

step 1.6 advanced connectivity (PLI)
  PLI matrix mean: 0.3805

step 9   closed-loop optimization
  final error: 1.0520

pipeline completed successfully.

fig. 7 — signals become connectivity become a generative model — validated against random nulls before anything is simulated.

§9 — leadership

Mind Cloud: 70+ engineers, one rover, ROS 2

ros 2 · docker · slam · tf2 — 2022–2024 · alexandria university

Mind Cloud is the autonomous robotics organization I led at Alexandria University — 70+ student engineers across mechanical, electrical, software, and AI subteams, all converging on a competition rover. My job was technical direction, sub-team coordination, and personally owning the autonomy stack.

The migration was the big one: I rebuilt the rover's autonomy on ROS 2 from scratch — modular packages for LiDAR processing, TF2-based coordinate-frame state management, and a containerized deployment model so the whole stack came up reproducibly on any team member's machine. Before the migration, we were debugging the build system. After, we were debugging the rover.

LESSON LEARNED

Coordinating 70+ people who all want to be the smartest person in the room is its own engineering problem. The architecture decisions that mattered most weren't algorithmic — they were the ones that let four subteams work in parallel without stepping on each other's branches at 2am the night before a comp.

The competition results, in chronological order:

9th

european rover challenge — first attempt

2nd

UGVC after the ROS 2 rebuild

3rd

UGVC — repeat podium finish

fig. 8 — sensor stack → TF2 unified frames → planner → motors, all containerized for reproducibility.

§10 — devops

CI/CD & infrastructure automation

github actions · docker · terraform · nginx · aws — 2024–present

Every project I ship rides a fully automated deployment pipeline. Code goes from PR to production without anyone logging into a server — linting, testing, building container images, and rolling them out through blue-green deploys that can be rolled back with a single commit revert.

The Nover stack is a good example: a GitHub Actions workflow runs the test suite, builds a Docker image tagged with the commit SHA, pushes it to ECR, and triggers a rolling update on the GPU fleet — all gated by branch protection and required status checks. Infrastructure is Terraform, version-controlled alongside the application code so the provisioning history is as auditable as the feature history.

For side projects and open-source repos, I use lighter-weight pipelines: GitHub Actions for CI (lint, test, build), Docker Compose for local-to-staging parity, and Nginx with Let's Encrypt for TLS termination on self-hosted services. The principle is the same everywhere: if a human has to remember a deploy step, the pipeline is broken.

PHILOSOPHY

A deploy should be boring. If your heart rate goes up when you push to main, your pipeline is telling you something. Automate the fear away — tests, canary checks, instant rollback — until shipping is just another commit.

0

manual deploy steps

<5 min

PR to production

IaC

terraform all the way down

fig. 9 — git push → CI → container build → blue-green deploy, with one-commit rollback.

§11 — also built

More from the workshop

Intelligent diagnostic system
Graduation project: telehealth platform streaming ESP8266 wearable data into Spring Boot with an ML diagnostic pipeline.
spring boot · flutter

Fourier Epicycle Machine
Interactive p5.js visualization: draw any path, watch a chain of rotating epicycles reconstruct it from its discrete Fourier transform.
p5.js · dft

dot-spend
Cross-platform minimalist CLI expense tracker in Python with ASCII analytics, CSV export, and Linux status-bar integration.
python · cli

ML from scratch
12+ core ML algorithms reimplemented in pure NumPy: no sklearn, just the math and code.
python · numpy

Design patterns in modern C++
Behavioural, creational, and structural patterns alongside SOLID examples, all buildable with CMake.
c++ · cmake

Pintos kernel
Full OS coursework implementation: priority-donation scheduling, MLFQS, 13 syscalls. 100% on the official grading suite.
c · x86

§12 — toolbox

What I work with

LANGUAGES

python · go · c++ · java
typescript · sql · c · matlab

BACKEND & INFRA

fastapi · spring boot · redis
kafka · flink · postgres · docker · prometheus
aws · terraform · github actions · jenkins

ML & AI SYSTEMS

pytorch · comfyui · lora fine-tuning
diffusion inference · h100 · hugging face
numpy · scipy · kuramoto · hilbert · kalman

ROBOTICS & LOW-LEVEL

ros 2 · tf2 · slam · lidar
std::jthread · npcap · pcap
embedded C · esp8266

CI/CD & DEVOPS

github actions · docker compose · terraform
jenkins · nginx · let's encrypt
blue-green deploys · rollback automation