Nazih Kalo's Resume

Nazih Kalo

Data Scientist / Engineer focused on building high-quality data products and ML systems.

Brooklyn, NY

About

8+ years building analytics & ML platforms from 0→1. Scaled experimentation, real-time analytics, and recommender systems powering millions of users. Deeply interested in reinforcement learning, crypto, and decentralized technologies—always learning something new to stay sharp.

Work Experience

Phantom

Aug 2023 - Present

Staff Data Scientist

  • Led 0→1 build of a multi-agent “Data Science Agent” (Google ADK) that plans, executes, and cross-validates analytical reasoning across Snowflake, Amplitude, Sigma, and codebase sources; implemented automated QA loops, source bake-offs, and cost guardrails to improve answer reliability and reduce hallucinations
  • Built near-real-time analytics pipelines (ClickHouse + Snowflake) powering Trending Tokens and Top Apps lists for 15M+ MAU, cutting data latency from minutes to seconds and enabling 8 product teams to ship without data bottlenecks
  • Built experimentation platform from 0→100s of A/Bs per quarter; designed guardrails, selective holdouts, and bias-mitigation frameworks to ensure causal validity and prevent metric contamination (42% win rate)
  • Data Lead: mentored 7 DS/DEs, defined platform roadmap, SLAs, and data quality bars (tests/lineage/CDC)
  • Owned Series C ($150M) metrics/diligence: built market-share analytics across DEX volume, TVL, and user flows vs dozens of competitors; informed investor narrative and supported valuation by quantifying share + growth cohorts
  • SF / NYC
  • Data Science
  • ML
  • Python
  • dbt

CyberConnect

May 2022 - Aug 2023

Head of Data

  • Built data platform 0→1 using Airflow, Databricks, and dbt; added dataset validation and monitoring for ML consumption
  • Shipped recommendation system to 100k+ users, owning offline metrics (precision/recall/coverage) and experimentation
  • Implemented embedding-based deduplication and similarity search (Pinecone) to improve training data diversity and reduce noise
  • Translated ambiguous product goals into measurable data quality and ML performance roadmaps
  • San Francisco
  • Data Engineering
  • ML
  • Python
  • Spark

Scale AI

Sep 2020 - May 2022

Product/Data Analyst & Data Engineer

  • Owned pipelines for large-scale data extraction and labeling programs supporting Fortune-500 ML systems
  • Led ground-truth quality initiatives: error analysis, annotator agreement metrics, guideline revisions, and automated QA checks
  • Reduced LiDAR and video labeling time by 34% by improving ML pre-labels, redesigning pipelines, and enabling lower-spec devices
  • Ran cost-quality tradeoff experiments on labeling workflows, reducing variance by ~50% while maintaining accuracy
  • Built monitoring for label latency, coverage gaps, and drift, enabling faster iteration with ML teams
  • San Francisco
  • Data Engineering
  • Product
  • Python

Hive AI

Jun 2020 - Sep 2020

Product Analyst

  • Owned dataset → training → deployment lifecycle for ML moderation models
  • Increased model F1 by 24% via targeted error mining, label audits, and human-in-the-loop reviews
  • Defined production SLAs and post-launch monitoring for model and data quality
  • San Francisco
  • ML
  • Product

Apple

2018

Operations Intern

  • Built data pipelines integrating internal and vendor data to reduce spend-forecast latency from 168 to 24 hours
  • Managed data for a $50M budget for iPhone XR dev builds and identified $1M in fraudulent invoices through analysis
  • Cupertino
  • Data Analysis
  • Operations

Education

University of Chicago

2019 - 2020
MSc Data Science

University of California, Berkeley

2014 - 2017
B.A. Economics

Skills

  • Python
  • SQL
  • dbt
  • Spark
  • Airflow/Dagster
  • AWS
  • GCP
  • Machine Learning
  • NLP
  • Data Engineering
  • GraphQL
  • React/TypeScript