University of New Mexico Computer Science

Saif Ryan Gangaram

Data Scientist 2 at LANL and UNM Computer Science Ph.D. student focused on HPC, AI/ML, Big Data, signal processing, and reliable scientific software.

Current Role: Data Scientist 2, LANL
Research Areas: HPC, Big Data, AI/ML, signal processing
Ph.D. Program: CS, UNM

About

Scientific computing for large, noisy, consequential data.

I am a Computer Science Ph.D. student at the University of New Mexico with a background in mathematics, data science, software engineering, and applied machine learning. My work focuses on building computational tools that make complex sensor, simulation, and experimental data easier to validate, analyze, and use responsibly.

I am advised by Dr. Amanda Bienz at UNM. I currently work as a Data Scientist 2 at Los Alamos National Laboratory. Previously, I was a Software Engineer in Automation and AI/ML at Space Dynamics Laboratory and a Teaching Assistant at the University at Buffalo.

I am especially interested in scalable algorithms, high-throughput data systems, time-frequency analysis, robust ML inference workflows, and the engineering practices that turn research code into dependable software. This public site intentionally keeps project descriptions technical, high level, and suitable for open audiences.

Research

Research Themes

High Performance Scientific Computing

Parallel algorithms, distributed workflows, MPI-based numerical methods, Dask pipelines, and batch execution on HPC systems.

Big Data and Scalable Analytics

Big Data methods for large scientific datasets, distributed processing, scalable storage formats, and analytics workflows for terabyte-scale data collections.

Signal Processing and Time Series Analysis

Denoising, spectral analysis, stationarity testing, transient detection, and quality checks for high-volume sensor data.

AI/ML for Scientific Data

Generative AI, supervised and unsupervised learning, clustering, regression, deep learning, anomaly detection, segmentation, forecasting, and validation for large scientific datasets.

Reliable Data Systems

Reproducible ingestion, schema design, workflow hardening, data validation, and visualization for large experimental collections.

Selected Work

Projects

Open Source | Deno, TypeScript, WebSocket

Hyperion

Phase 2 complete: a self-hosted, local-first agentic AI operations console that runs OpenAI and Anthropic agents in parallel, streams tokens and tool events live, and provides file context, persistent memory, terminal control, AI-assisted email drafting, and a no-key mock mode.

Personal Project | Agentic AI, Developer Tools

Conductor

Developing an agentic AI dashboard for software development and workflow optimization, with a focus on coordinating tasks, surfacing project state, supporting debugging workflows, and improving how developers move from intent to verified changes.

UNM | MPI, NumPy, mpi4py

Parallel Principal Component Analysis

Implemented distributed PCA with rank-local data sharding, collective covariance aggregation, eigenpair broadcast, optional standardization, and scalability logging for large datasets.
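
As a rough illustration of the covariance-aggregation idea, the sketch below uses plain NumPy and a list of in-memory shards as a stand-in for MPI ranks. It is not the project's code: the function names `local_stats` and `pca_from_shards` are hypothetical, and in an mpi4py setting the three sums would presumably be collective reductions across ranks rather than Python `sum` calls. It also assumes no column is constant when `standardize=True`.

```python
import numpy as np

def local_stats(shard):
    """Per-shard sufficient statistics: row count, column sums, and X^T X."""
    return shard.shape[0], shard.sum(axis=0), shard.T @ shard

def pca_from_shards(shards, n_components, standardize=False):
    # Stand-in for a collective reduction: add each statistic over all shards.
    stats = [local_stats(s) for s in shards]
    n = sum(s[0] for s in stats)
    col_sum = sum(s[1] for s in stats)
    gram = sum(s[2] for s in stats)

    mean = col_sum / n
    # Covariance from aggregated moments: (X^T X - n * mean mean^T) / (n - 1).
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)

    if standardize:
        # Rescale to a correlation matrix (assumes every column has variance > 0).
        std = np.sqrt(np.diag(cov))
        cov = cov / np.outer(std, std)

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order], mean

# Illustrative data only: 1000 correlated samples split into 4 shards.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))
shards = np.array_split(data, 4)
eigvals, components, mean = pca_from_shards(shards, n_components=2)
```

Because only the row count, column sums, and a d-by-d Gram matrix leave each shard, the amount of data exchanged depends on the number of features and not on the number of rows.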

Scientific ML | Python, Dask, PyTorch

Large-File Inference Workflow

Built a slabbed and microbatched inference path for large scientific files, replacing dense full-file processing with overlap-aware chunking, manifest tracking, and fast validation outputs for high-throughput and near-real-time workloads.
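
A minimal sketch of overlap-aware chunking, under stated assumptions: the data is a 1-D NumPy array, the model returns one output per input sample, and the overlap is at least as wide as the model's one-sided context. The names `slab_bounds`, `run_slabbed`, and `smooth` are hypothetical, the moving-average "model" is only a placeholder, and microbatching and manifest tracking are left out.

```python
import numpy as np

def slab_bounds(n, slab, overlap):
    """Yield (read_start, read_stop, keep_start, keep_stop) index tuples.

    Each slab is read with `overlap` extra samples on both sides (clipped at
    the array edges), but only the core [keep_start, keep_stop) is kept, so
    every sample is written exactly once.
    """
    for keep_start in range(0, n, slab):
        keep_stop = min(keep_start + slab, n)
        yield (max(keep_start - overlap, 0), min(keep_stop + overlap, n),
               keep_start, keep_stop)

def run_slabbed(signal, model, slab, overlap):
    """Apply `model` slab by slab and stitch the core regions together."""
    out = np.empty_like(signal)
    for r0, r1, k0, k1 in slab_bounds(len(signal), slab, overlap):
        pred = model(signal[r0:r1])
        # Drop the overlap margins before writing into the output.
        out[k0:k1] = pred[k0 - r0 : k1 - r0]
    return out

def smooth(x):
    # Placeholder model: 5-point moving average (needs 2 samples of context).
    return np.convolve(x, np.ones(5) / 5.0, mode="same")

# Illustrative data only.
signal = np.random.default_rng(0).normal(size=1000)
chunked = run_slabbed(signal, smooth, slab=128, overlap=2)
dense = smooth(signal)
```

With an overlap of 2 samples, the stitched output matches the dense full-array result here, since every kept sample sees the same neighbors it would in a single pass.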

Graduate Project | Computer Vision

Real-Time Emotion Recognition

Trained and evaluated CNN-based models using large public image datasets, applying augmentation, regularization, feature extraction, and real-time video processing techniques.

Graduate Project | Data Mining

Financial Risk Modeling

Developed supervised learning pipelines for probability-of-default estimation with feature engineering, hyperparameter tuning, stress testing, and error analysis.

Software Engineering | Databases

Scientific Data Access Tools

Designed database-backed analysis tools for large scientific collections, with searchable metadata, plotting, export workflows, and reproducible input generation for modeling studies.

AI/ML | Image and Signal Data

Segmentation and Detection Models

Applied CNNs, U-Net style segmentation, clustering, and classical image processing to improve detection and denoising in scientific imagery and signal-derived products.

Scientific Software | Workflow Reliability

Reproducible Pipeline Hardening

Built instrumentation, validation checks, runtime tracing, and portable execution paths for scientific data workflows, improving reproducibility across local, Linux, and HPC-style environments.

Background

Experience and Education

2026 - Present

Data Scientist 2, Los Alamos National Laboratory

Develops data acquisition, analysis, automation, signal processing, supervised and unsupervised ML, generative AI, and HPC-integrated workflows for engineering test and validation environments using public-safe scientific computing methods.

2023 - 2026

Software Engineer, Space Dynamics Laboratory

Built data pipelines, database-backed web tools, scientific analysis workflows, and automation software for aerospace and remote-sensing research settings.

2024 - 2025

Technical Recognition and Presentation

Received internal recognition for user-centered engineering, data tooling, documentation, and technical communication; presented telemetry anomaly-detection methods at a public technical venue.

2019 - 2020

Teaching Assistant, University at Buffalo

Supported lab and recitation instruction, grading, exam administration, and student mentoring for undergraduate computer science coursework.

University of New Mexico

Ph.D. in Computer Science, expected 2028; advisor: Dr. Amanda Bienz

University at Buffalo

M.P.S. in Data Sciences and Applications, 2023

University at Buffalo

B.A. in Mathematics, Computing and Applied Mathematics, 2021

Methods

Technical Areas

Python, C++, C#, SQL, MATLAB, Julia, PyTorch, TensorFlow, Scikit-learn, NumPy, Pandas, Dask, Slurm, MPI, PostgreSQL, SQLite, NetCDF, HDF5, Docker, Linux, Git, Time Series, Spectral Analysis, Visualization

Contact

Open to research conversations and technical collaboration.

For public research, academic, or professional communication, email is the best way to reach me.

Public Information Note

The descriptions on this site are intentionally limited to public-safe topics. They omit sensitive access details, restricted system names, operational parameters, non-public datasets, and program-specific details.