Nazih Kalo

Data Scientist / Engineer focused on building high-quality data products and ML systems.

About

8+ years building analytics & ML platforms from 0→1. Scaled experimentation, real-time analytics, and recommender systems powering millions of users. Deeply interested in reinforcement learning, crypto, and decentralized technologies—always learning something new to stay sharp.

Work Experience

Phantom
SF / NYC
Data Science
ML
Python
dbt

Aug 2023 - Present

Staff Data Scientist

Established company-wide data quality KPIs (coverage, freshness, accuracy, drift) across user-generated content and behavioral signals; implemented automated validation and anomaly detection with dbt + Python
Built near-real-time analytics pipelines (ClickHouse + Snowflake), reducing data latency from minutes → seconds and enabling faster feedback loops for ML-driven rankings
Scaled experimentation from 0 → 100s of A/Bs per quarter, introducing guardrails, selective holdouts, and post-launch monitoring tied to data quality changes

CyberConnect
San Francisco
Data Engineering
ML
Python
Spark

May 2022 - Aug 2023

Head of Data

Built the data platform 0 → 1 using Airflow, Databricks, and dbt, including dataset validation and monitoring for ML consumption
Shipped a recommendation system to 100k+ users, owning offline metrics (precision/recall/coverage) and experimentation
Designed dataset audits and bias analyses across chains, regions, and user cohorts to ensure fair exposure and robust model behavior
Implemented embedding-based deduplication and similarity search (Pinecone) to improve training data diversity and reduce noise
Translated ambiguous product goals into measurable data quality and ML performance roadmaps

Scale AI
San Francisco
Data Engineering
Product
Python

Sep 2020 - May 2022

Product/Data Analyst & Data Engineer

Owned pipelines for large-scale data extraction and labeling programs supporting Fortune-500 ML systems
Led ground-truth quality initiatives: error analysis, annotator agreement metrics, guideline revisions, and automated QA checks
Reduced LiDAR and video labeling time by 34% by improving ML pre-labels, redesigning pipelines, and enabling lower-spec devices
Ran cost-quality tradeoff experiments on labeling workflows, reducing variance by ~50% while maintaining accuracy
Built monitoring for label latency, coverage gaps, and drift, enabling faster iteration with ML teams

Hive AI
San Francisco
ML
Product

Jun 2020 - Sep 2020

Product Analyst

Owned dataset → training → deployment lifecycle for ML moderation models
Increased model F1 by 24% via targeted error mining, label audits, and human-in-the-loop reviews
Defined production SLAs and post-launch monitoring for model and data quality

Apple
Cupertino
Data Analysis
Operations

2018

Operations Intern

Built data pipelines integrating internal and vendor data, reducing spend-forecast latency from 168 to 24 hours
Managed data for a $50M budget for iPhone XR dev builds and identified $1M in fraudulent invoices through analysis

Education

University of Chicago

2019 - 2020

MSc Data Science

University of California, Berkeley

2014 - 2017

B.A. Economics

Skills

Python
SQL
dbt
Spark
Airflow/Dagster
AWS
GCP
Machine Learning
NLP
Data Engineering
GraphQL
React/TypeScript

Side projects

NFT Recommendation Engine

A recommendation engine built using NFT trading history data & collaborative filtering

Python
ML
Collaborative Filtering

XMTP Chat Integration

Private peer-to-peer chat app combining CyberConnect's social network with XMTP's messaging protocol

React
TypeScript
Web3

Steel Defect Detection

Detecting defects on a steel manufacturing line using computer vision

Python
Computer Vision
Deep Learning
