> hello

I'm Anusha_

I build inference systems, ML infrastructure, and backend products.

I like taking systems from first sketch to production-ready shape: I've benchmarked vLLM serving on H100s, implemented distributed transformer parallelism from scratch, and built clinical ML pipelines over messy real-world data.

63 req/s: peak throughput serving a multimodal model

8 GPUs: distributed training benchmarks

110M+: ICU patient-hour records modeled

> projects

Applied ML systems, distributed training, and backend projects with measurable outcomes.

resume ->

5D Parallelism for Transformer Training

complete

2.10M tokens/sec on 8-GPU data-parallel training runs

Built the parallel training stack for a GPT-style model as the capstone of a hands-on distributed systems workshop. Implemented data, tensor, pipeline, context, and expert parallelism from scratch, then benchmarked throughput, memory, and convergence across 1–8 GPU configurations.
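To make the data-parallel idea concrete, here is a minimal single-process sketch (not the project's PyTorch code): each simulated rank computes the gradient on its own shard of the batch, and averaging those local gradients, which is what a mean all-reduce does in real multi-GPU training, recovers the full-batch gradient. The toy scalar model, the helper names (`grad_mse`, `shard`, `data_parallel_grad`), and the equal-shard assumption are all illustrative.

```python
# Toy single-process simulation of data parallelism (illustrative only):
# each "rank" gets an equal shard of the batch, computes a local gradient,
# and a simulated all-reduce averages the local gradients.

def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(seq, world_size):
    """Split seq into world_size equal, contiguous shards."""
    assert len(seq) % world_size == 0, "toy sketch assumes equal shards"
    size = len(seq) // world_size
    return [seq[r * size:(r + 1) * size] for r in range(world_size)]


def data_parallel_grad(w, xs, ys, world_size):
    """Local gradient per simulated rank, then a mean 'all-reduce'."""
    local_grads = [
        grad_mse(w, x_shard, y_shard)
        for x_shard, y_shard in zip(shard(xs, world_size), shard(ys, world_size))
    ]
    return sum(local_grads) / world_size


# Usage: with equal shards and a mean loss, both gradients match.
xs = [float(i) for i in range(8)]
ys = [3.0 * x + 1.0 for x in xs]
full = grad_mse(0.5, xs, ys)
dp = data_parallel_grad(0.5, xs, ys, world_size=4)
print(full, dp)  # -94.5 -94.5
```

In PyTorch, `torch.nn.parallel.DistributedDataParallel` performs this averaging with an all-reduce during the backward pass; the sketch only shows why the math works when shards are equal-sized and the loss is a mean.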

PyTorch · Distributed Training · CUDA · Python · ML Systems · Infrastructure · Transformers

Scaling a Multimodal Tutor Model on Modal

complete

63.05 req/s peak text throughput; 916 ms p95 TTFT on mixed multimodal traffic

Benchmarked Qwen3-VL-4B-Instruct on Modal with vLLM, comparing H100 replica scaling, tensor parallelism, concurrency limits, and mixed multimodal traffic to find a low-latency, high-throughput serving configuration.
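The headline numbers are request throughput and p95 time to first token (TTFT). A minimal sketch of how such metrics can be derived from per-request timestamps follows; the `RequestRecord` schema, the run-window definition, and the nearest-rank percentile are assumptions for illustration, and the project's own harness may define them differently (for example, with interpolated percentiles).

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """One benchmark request (hypothetical schema); times in seconds on one clock."""
    start: float        # request sent
    first_token: float  # first streamed token received
    end: float          # response completed


def percentile_nearest_rank(values, pct):
    """Smallest observed value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(records):
    """Completed requests per second over the run window, plus p95 TTFT in ms."""
    window_s = max(r.end for r in records) - min(r.start for r in records)
    if window_s <= 0:
        raise ValueError("run window must be positive")
    ttft_ms = [(r.first_token - r.start) * 1000 for r in records]
    return {
        "req_per_s": len(records) / window_s,
        "p95_ttft_ms": percentile_nearest_rank(ttft_ms, 95),
    }


# Usage with synthetic timestamps (not real benchmark data).
records = [RequestRecord(start=0.0, first_token=i / 8, end=10.0) for i in range(1, 21)]
print(summarize(records))  # {'req_per_s': 2.0, 'p95_ttft_ms': 2375.0}
```

Nearest-rank is used here because it always returns an observed latency; interpolating methods can give slightly different p95 values on small samples.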

Modal · vLLM · Inference · H100 · Multimodal · Qwen · Benchmarking

ICU Deterioration Warning System

complete

110M+ hourly ICU records; 0.995 AUROC deterioration prediction

Built an ICU early deterioration warning system on 110M+ hourly records from 50,920 patients (MIMIC-IV), using XGBoost with 12-hour rolling-window features to predict vasopressor initiation, intubation, CRRT, or death within 12 hours.
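A minimal sketch of the two pieces described above, assuming one row per patient-hour: 12-hour rolling features over a single vital sign, and a label that is 1 when any deterioration event falls in the following 12 hours. The function names, the missing-value handling (`None` is skipped), and the exact window boundaries are hypothetical choices, not the project's pipeline.

```python
from collections import deque

WINDOW_HOURS = 12   # feature lookback, in hourly rows
HORIZON_HOURS = 12  # label horizon, in hours


def rolling_features(values, window=WINDOW_HOURS):
    """Per hour t: mean/max/min over the last `window` hours up to and including t.

    `None` marks a missing hourly measurement and is skipped, not imputed.
    """
    buf = deque(maxlen=window)  # oldest hour drops out automatically
    feats = []
    for v in values:
        buf.append(v)
        seen = [x for x in buf if x is not None]
        if seen:
            feats.append({"mean": sum(seen) / len(seen), "max": max(seen),
                          "min": min(seen), "n_obs": len(seen)})
        else:
            feats.append({"mean": None, "max": None, "min": None, "n_obs": 0})
    return feats


def horizon_labels(event_hours, n_hours, horizon=HORIZON_HOURS):
    """label[t] = 1 if any event occurs in hours (t, t + horizon], else 0."""
    events = set(event_hours)
    return [int(any(h in events for h in range(t + 1, t + horizon + 1)))
            for t in range(n_hours)]


# Usage on a tiny synthetic series (not MIMIC-IV data).
heart_rate = [88.0, None, 95.0, 110.0]
print(rolling_features(heart_rate, window=2)[-1])
# {'mean': 102.5, 'max': 110.0, 'min': 95.0, 'n_obs': 2}
print(horizon_labels(event_hours=[5], n_hours=8, horizon=2))
# [0, 0, 0, 1, 1, 0, 0, 0]
```

Keeping the feature window at or before hour t and the label window strictly after t avoids leaking the outcome into the features; a real pipeline also has to decide how to treat hours after the first event.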

MIMIC · scikit-learn · Python