High-Performance Computing with MPI
- Built a parallel histogram generator in C/MPI, distributing 10M+ data points across processes on an HPC cluster to measure scalability and speedup.
- Optimized data distribution with MPI_Scatter and histogram aggregation with MPI_Reduce, achieving ~3.6× speedup on 8 processes, and analyzed communication-to-computation trade-offs.
- Authored a detailed performance analysis of runtime, parallel efficiency, and bin-size scaling, with results consistent with Amdahl's Law and real-world HPC bottlenecks.