Nazih Kalo's Resume

Nazih Kalo

Data Scientist / Engineer focused on building high-quality data products and ML systems.

Brooklyn, NY

About

8+ years building analytics & ML platforms from 0→1. Scaled experimentation, real-time analytics, and recommender systems powering millions of users. Deeply interested in reinforcement learning, crypto, and decentralized technologies, and always learning something new to stay sharp.

Work Experience

Phantom

Aug 2023 - Present

Staff Data Scientist

  • Established company-wide data quality KPIs (coverage, freshness, accuracy, drift) across user-generated content and behavioral signals; implemented automated validation and anomaly detection with dbt + Python
  • Built near-real-time analytics pipelines (ClickHouse + Snowflake), reducing data latency from minutes to seconds and enabling faster feedback loops for ML-driven rankings
  • Scaled experimentation from 0 to 100s of A/B tests per quarter, introducing guardrails, selective holdouts, and post-launch monitoring tied to data quality changes
  • SF / NYC
  • Data Science
  • ML
  • Python
  • dbt

CyberConnect

May 2022 - Aug 2023

Head of Data

  • Built data platform 0 → 1 using Airflow, Databricks, & dbt; added dataset validation and monitoring for ML consumption
  • Shipped a recommendation system to 100k+ users, owning offline metrics (precision/recall/coverage) & experimentation
  • Designed dataset audits and bias analyses across chains, regions, and user cohorts to ensure fair exposure and robust model behavior
  • Implemented embedding-based deduplication and similarity search (Pinecone) to improve training data diversity and reduce noise
  • Translated ambiguous product goals into measurable data quality and ML performance roadmaps
  • San Francisco
  • Data Engineering
  • ML
  • Python
  • Spark

Scale AI

Sep 2020 - May 2022

Product/Data Analyst & Data Engineer

  • Owned pipelines for large-scale data extraction and labeling programs supporting Fortune-500 ML systems
  • Led ground-truth quality initiatives: error analysis, annotator agreement metrics, guideline revisions, and automated QA checks
  • Reduced LiDAR and video labeling time by 34% by improving ML pre-labels, redesigning pipelines, and enabling lower-spec devices
  • Ran cost-quality tradeoff experiments on labeling workflows, reducing variance by ~50% while maintaining accuracy
  • Built monitoring for label latency, coverage gaps, and drift, enabling faster iteration with ML teams
  • San Francisco
  • Data Engineering
  • Product
  • Python

Hive AI

Jun 2020 - Sep 2020

Product Analyst

  • Owned dataset → training → deployment lifecycle for ML moderation models
  • Increased model F1 by 24% via targeted error mining, label audits, and human-in-the-loop reviews
  • Defined production SLAs and post-launch monitoring for model and data quality
  • San Francisco
  • ML
  • Product

Apple

2018

Operations Intern

  • Built data pipelines integrating internal & vendor data to reduce spend-forecast latency from 168 to 24 hours
  • Managed data for a $50M budget for iPhone XR dev builds and identified $1M in fraudulent invoices through analysis
  • Cupertino
  • Data Analysis
  • Operations

Education

University of Chicago

2019 - 2020
MSc Data Science

University of California, Berkeley

2014 - 2017
B.A. Economics

Skills

  • Python
  • SQL
  • dbt
  • Spark
  • Airflow/Dagster
  • AWS
  • GCP
  • Machine Learning
  • NLP
  • Data Engineering
  • GraphQL
  • React/TypeScript