Public CV
Saif Ryan Gangaram
Data Scientist 2 at Los Alamos National Laboratory and Computer Science Ph.D. student at the University of New Mexico focused on high performance computing, AI/ML, Big Data, signal processing, and reliable scientific software.
Contact
- Albuquerque, New Mexico
- saifryangangaram@gmail.com
- linkedin.com/in/vrgangaram
- github.com/srgangaram-swe
Education
University of New Mexico, Department of Computer Science
Ph.D. in Computer Science, expected 2028
Focus: high performance computing, Big Data, AI/ML, signal processing, scientific data systems
Research advisor: Dr. Amanda Bienz
University at Buffalo, Institute for AI and Data Science
Master of Professional Studies in Data Sciences and Applications, 2023
University at Buffalo, College of Arts and Sciences
Bachelor of Arts in Mathematics, Computing and Applied Mathematics, 2021
Research Interests
- High performance and distributed scientific computing
- Big Data systems, scalable analytics, and terabyte-scale processing
- Signal processing for large sensor and experimental datasets
- Machine learning for anomaly detection, forecasting, and segmentation
- Reliable data pipelines, validation, and reproducible analysis
- Numerical methods, time series analysis, and statistical modeling
- Human-centered research software and technical communication
Professional Profile
My work sits at the intersection of scientific computing, Big Data, data-intensive software engineering, high performance computing, applied machine learning, and signal analysis. I have built production-oriented research tools for large sensor and experimental datasets, database-backed analysis interfaces, HPC-enabled workflows, streaming inference pipelines, generative AI and deep learning models, and validation tooling for reproducible technical analysis. My modeling work spans supervised and unsupervised learning, clustering, regression, and large-scale training on terabyte-class datasets, with models reaching millions of parameters.
Experience
Data Scientist 2, Los Alamos National Laboratory
2026 - Present
- Develop data acquisition, analysis, automation, and validation workflows for engineering test and scientific computing environments.
- Apply statistical modeling, time-frequency analysis, denoising, stationarity testing, and signal characterization to high-volume technical datasets.
- Build Python, C++, Dask, Slurm, NetCDF, and ML-enabled workflows for scalable analysis, batch execution, microbatched inference, and reproducible deployment.
- Develop Big Data, generative AI, supervised, and unsupervised modeling workflows for clustering, regression, anomaly detection, and deep learning on terabyte-scale datasets.
- Harden large-file processing pipelines with chunked execution, manifest tracking, integrity validation, runtime instrumentation, and atomic output publishing.
- Optimize inference and data movement for large collections of multi-terabyte technical files where throughput and low-latency operation are core algorithmic requirements.
- Improve cross-environment portability through local/HPC execution paths, environment-driven configuration, and reproducible workflow documentation.
Software Engineer, Automation and AI/ML, Space Dynamics Laboratory
2023 - 2026
- Developed scientific data pipelines, database-backed web tools, and automation workflows for aerospace and remote-sensing research settings.
- Integrated PostgreSQL, SQLite, HDF5, NetCDF, Python, C#, C++, Fortran, MATLAB, Flask, JavaScript, and dashboard tooling into analysis workflows.
- Built searchable data interfaces with visualization, selection, export, compression, schema inspection, and reproducible model-input generation features.
- Applied machine learning, anomaly detection, clustering, regression, simulation, image processing, and verification methods to large sensor, atmospheric, and telemetry-style datasets.
- Designed multithreaded communication and automation components using REST, TCP/UDP interfaces, synchronization primitives, and testable configuration workflows.
- Collaborated with multidisciplinary teams using Git, CI/CD practices, technical documentation, requirements analysis, integration testing, and iterative stakeholder feedback.
Teaching Assistant, University at Buffalo
2019 - 2020
- Supported computer science labs and recitations for more than 45 students.
- Assisted with grading, exam administration, student mentoring, and occasional lecture support.
Selected Projects
Hyperion: Self-Hosted Agentic AI Operations Console
Developed a local-first, open-source agentic harness using Deno, TypeScript, vanilla JavaScript, and WebSocket, with no frontend framework, bundler, or required cloud dependency beyond model APIs. Completed Phase 2 with parallel OpenAI and Anthropic agent sessions, token-by-token streaming, live tool and error events, file context injection, persistent cross-session memory, AI-assisted email drafting with tone control, and tmux session management with command suggestions based on live pane output. Added a full mock mode so the interface remains explorable without API keys. Phase 3 is planned to add multi-user authentication, CalDAV integration, MCP server support, and persistent sessions.
Conductor: Agentic AI Dashboard for Software Development
Developing a personal agentic AI dashboard for software development and workflow optimization. The project focuses on coordinating development tasks, surfacing project state, supporting debugging and documentation workflows, and helping developers move from intent to verified implementation.
Parallel Principal Component Analysis with MPI
Designed distributed PCA with rank-local sharding, global mean and covariance reductions, eigenpair broadcast, local projection, optional whitening, metadata logging, and scalability benchmarks.
Large-File Scientific ML Inference
Built overlap-aware slab processing and microbatching for large scientific data, improving memory behavior, output validation, manifest tracking, and reproducible inference with Dask, PyTorch, and NetCDF-oriented data products. The workflow is designed for fast inference across large collections of multi-terabyte files, including workloads that may require near-real-time throughput.
Signal Processing and Workflow Reliability
Developed public-safe signal characterization workflows using spectral analysis, spectrograms, power estimates, stationarity tests, runtime tracing, and validation reports to improve confidence in noisy scientific data products.
Real-Time Human Emotion Recognition
Developed a CNN-based computer vision pipeline using public image datasets, augmentation, regularization, feature extraction, and real-time video inference.
Financial Risk Modeling
Developed probability-of-default models with gradient boosting, neural networks, feature engineering, Bayesian hyperparameter optimization, stress testing, and error analysis.
Scientific Database and Web Tooling
Designed database-backed research tools for search, plotting, export, schema inspection, data selection, and reproducible modeling workflows.
Quantitative Finance and Portfolio Optimization
Built a Python-based financial analysis system using portfolio objects, numerical analysis, and optimization methods to evaluate performance metrics and risk-aware allocation strategies.
Epidemiological SIR and Network Modeling
Implemented SIR and network-based epidemic models using public health datasets, numerical integration, and error analysis to study spread dynamics at regional scales.
Recognition and Presentations
- Received multiple Space Dynamics Laboratory recognitions for scientific data tooling, documentation, usability, technical delivery, and communication.
- Presented anomaly-detection methodology for telemetry-style data at a public technical conference in 2025.
- Selected to present technical program achievements to a broad engineering audience of roughly 1,000 invited personnel.
Technical Skills
Languages: Python, C++, C#, SQL, Bash, R, Java, JavaScript, HTML, CSS, Julia, MATLAB, Scala, Fortran, TypeScript, Rust
ML and Data Science: PyTorch, TensorFlow, Keras, Scikit-learn, NumPy, Pandas, generative AI, supervised learning, unsupervised learning, clustering, regression, CNNs, U-Net-style segmentation, LSTMs, Transformers, anomaly detection, forecasting
HPC, Big Data, and Systems: MPI, Dask, Slurm, distributed workflows, scalable analytics, multithreading, synchronization primitives, socket programming, Docker, Linux, Windows, Git, CI/CD
Data and Visualization: PostgreSQL, SQLite, BigQuery, HDF5, NetCDF, TDMS-style scientific data, terabyte-scale datasets, distributed data processing, Matplotlib, Seaborn, Tableau, Grafana, technical reporting
Analysis: Statistical inference, regression, Bayesian analysis, time series, signal processing, spectral methods, numerical computation, validation, error analysis
Software Practices: Requirements analysis, integration testing, workflow hardening, documentation, reproducible batch execution, LLM-assisted development with human validation
Public Information Note
This public CV intentionally omits sensitive access details, restricted system names, operational parameters, non-public datasets, and program-specific details.