Open to Data Science, Analysis, AI/ML engineering and applied research roles — let's talk

Data Scientist · AI Builder · Researcher

Hi, I'm Siddhi
I build AI that actually knows you.

Data scientist and ML researcher with 4 peer-reviewed publications. Currently building MindMirror, a personal AI that learns not just what you know, but how you think.

4 Publications
3 Research venues
MS Computer Science
Curiosity

I'm a data scientist and AI builder with an MS in Computer Science (Data Science specialization) from Seattle University and a BE in Computer Engineering from the University of Pune. My research lives at the intersection of data quality and model behavior — the idea that better AI starts with better data, not just bigger models.

As part of a research team at Seattle University, I contributed to work on synthetic data generation for class-imbalanced medical datasets, published across IEEE Access, DaWaK, and DASFAA. I'm currently building MindMirror, a personal AI memory system that captures how you think, not just what you know.

"I believe the next breakthrough in AI isn't a bigger model — it's a more personal one."

Outside of building, I create content on self-discovery and unlearning on Instagram, and I'm learning German — because apparently one language wasn't enough of a challenge.

Python · SQL · Claude API · RAG & Vector DBs · LLM Applications · AWS · GCP · Power BI · Streamlit · ChromaDB · LangChain · PyTorch · Scikit-learn · Flask

May 2025 — Present

Data Analyst

Boston Financial Advisory Group

Building analytical systems and data pipelines for financial advisory workflows.

Feb. 2023 — Apr. 2025

Data Science Graduate Researcher

Seattle University

Contributed to the SDGnE research team's work on synthetic data generation for class-imbalanced medical datasets, published across IEEE Access, DaWaK, and DASFAA. Developed hands-on ML expertise on Jetstream2/NVIDIA A100 GPU infrastructure.

Jan. 2020 — Jan. 2022

Software Engineer

M.B.B. Consulting

Architected REST APIs and a Flask backend connecting client applications to PostgreSQL at 99.9% uptime, cutting quote generation time by 50%. Automated ETL pipelines processing 1M+ records with Python, SQL, and AWS — and ran A/B tests that contributed to $40K in projected annual gains.

🔨

Building

MindMirror: A personal AI memory system

🧠

Current obsession

Can an AI learn not what you know, but how you think?

📖

Currently reading

The Mountain Is You by Brianna Wiest

🌍

Learning

German

🎯

Working toward

AI/ML engineering & applied research roles

✍️

Writing

First post — building MindMirror in public

Building now

MindMirror

A personal AI system that learns how you think, not just what you know. Uses RAG and structured personal context to make every interaction feel like working with someone who has known you for months — without re-explaining yourself every time.
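To make the retrieval idea concrete, here is a minimal sketch of the "retrieve, then prompt" step behind a RAG setup like this. It is an illustration under stated assumptions, not MindMirror's actual code: the `embed`, `retrieve`, and `build_prompt` helpers and the `NOTES` entries are hypothetical, and a toy bag-of-words similarity stands in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, notes: list[str], k: int = 2) -> list[str]:
    """Return the k stored notes most similar to the query."""
    q = embed(query)
    ranked = sorted(notes, key=lambda n: cosine(q, embed(n)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, notes: list[str], k: int = 2) -> str:
    """Prepend the retrieved personal context to the user's question."""
    context = "\n".join(f"- {n}" for n in retrieve(query, notes, k))
    return f"Personal context:\n{context}\n\nQuestion: {query}"

# Hypothetical personal notes, for illustration only.
NOTES = [
    "Prefers short bullet summaries over long paragraphs.",
    "Is learning German and practices in the morning.",
    "Thinks through problems by sketching diagrams first.",
]

if __name__ == "__main__":
    print(build_prompt("How should I practice German today?", NOTES, k=1))
```

The prompt built this way would then be sent to the language model, so the answer is grounded in stored personal context rather than in the question alone.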

Claude API · ChromaDB · RAG · Python · Streamlit

Published · Open source

SDGnE Python Package

An open-source Python package stemming from the SDGnE research project. It lets users generate synthetic data with our algorithm for rare-event and imbalanced classification tasks. Published research, usable tool.
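For readers new to synthetic oversampling, the general idea can be sketched as below. This is a generic SMOTE-style interpolation written for illustration, not the SDGnE algorithm or the package's API: the `oversample_minority` function, its parameters, and the sample points are all hypothetical.

```python
import random

def oversample_minority(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a sampled
    minority point and one of its k nearest minority neighbours.
    Assumes at least two minority points of equal dimension."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` (excluding itself) by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical 2-D minority-class samples.
minority = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = oversample_minority(minority, n_new=5)

if __name__ == "__main__":
    for point in new_points:
        print(point)
```

Each synthetic point lies on the line segment between two real minority samples, so the new data stays inside the region the minority class already occupies.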

Python · Scikit-learn · Synthetic Data · Data-centric AI
View docs ↗

Personal project

Trail Recommendation AI Agent

An end-to-end AI agent that monitors calendar events, retrieves real-time weather data, and reasons across a personal trail database to deliver context-aware hiking recommendations — demonstrating full agent orchestration with tool use, memory, and multi-API reasoning.

n8n · ChatGPT API · Calendar API · Weather API · LLM Agents

Personal project

RAG Chatbot with Agentic Pipeline

A context-aware RAG chatbot built with LangChain and a LLaMA3 model fine-tuned with LoRA. Focused on production readiness: evaluating outputs critically, not just getting something that runs.

Python · LangChain · LLaMA3 · Pinecone · Streamlit

Research · Published

WalkExplorer

A cloud-hosted multimodal AI tool on GCP using CLIP transformers and OpenStreetMap data to assess urban walkability. Benchmarked against human ratings with automated test validation — published at DASFAA 2026.

GCP · Python · CLIP · Multimodal AI · OSM
Read paper ↗

Learning project

Transformer LLM from Scratch

Trained a character-level GPT on the Shakespeare dataset using nanoGPT and the AdamW optimizer. Reached a validation loss of ~1.8. Built to understand transformer architecture and the underlying ML math from first principles, not just to call an API.
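Character-level tokenization is simple enough to show in a few lines. The sketch below follows the common pattern of mapping each distinct character to an integer id; the sample `text` is a stand-in for the full Shakespeare corpus, and the variable names are illustrative rather than taken from the project's code.

```python
# Stand-in for the training corpus (assumption: the real run used the full Shakespeare text).
text = "To be, or not to be"

# The vocabulary is the sorted set of distinct characters in the corpus.
chars = sorted(set(text))
vocab_size = len(chars)

# Lookup tables between characters and integer ids.
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s: str) -> list[int]:
    """Map a string to a list of integer token ids, one per character."""
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    """Map a list of token ids back to the original string."""
    return "".join(itos[i] for i in ids)

if __name__ == "__main__":
    ids = encode(text)
    print(vocab_size, ids)
    print(decode(ids))
```

With this scheme the vocabulary stays tiny, at the cost of longer sequences than a subword tokenizer would produce.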

PyTorch · Python · NLP · Transformers

Books, papers, essays — things that have shaped how I think about AI, building, and being human. Updated whenever something genuinely moves me.

Currently reading

Book

The Mountain Is You

Brianna Wiest

On self-sabotage and why we get in our own way — uncomfortably relevant.

I write about building AI products, thinking in public, and the honest reality of transitioning into ML. No tutorials — just observations from someone figuring it out in real time.

Coming soon

First post dropping soon — follow on X to get notified.

Follow along

I'm currently open to AI/ML engineering, applied research, and data science roles. If you're building something interesting or just want to talk AI — I'd love to hear from you.