Valedictorian, Faculty of IT, IUH (2025)

Cao Phan Khanh Duy

Data Scientist / AI Engineer

Data Scientist with ~2 years of experience building machine learning systems, product analytics pipelines, and predictive models for AI products and marketing optimization. Experienced in developing end-to-end ML workflows from large-scale data processing to model deployment, generating measurable business impact across SaaS products and data-driven platforms.

Research Papers: 10+
GPA: 3.77 / 4.00
Faculty Rank: #1

Experience

Data Scientist (Full-time)

MISA Corporation

12/2025 – Present

Hanoi, Vietnam

  • Customer Retention Modeling: Developed and deployed ML models to predict customer churn and identify upsell/cross-sell opportunities across 18 SaaS products; built a dynamic SQL feature-generation framework for large-scale datasets (up to 2B records), improving accuracy by 5–20% and reducing development time by ~50%
  • Marketing Analytics & Lead Forecasting: Built a centralized pipeline integrating the Google Ads and TikTok Ads APIs; developed predictive models for lead and sales-qualified-lead (SQL) forecasting, along with Apache Superset dashboards; contributed to 20B VND in annual marketing cost savings
  • Product Analytics: Conducted statistical analysis of user behavior to identify aha moments and habit-forming patterns across 18 SaaS products; generated insights that guided feature prioritization and were applied to optimize UX across two AI platforms
Machine Learning, SQL, Apache Superset, Google Ads API, TikTok Ads API, Predictive Modeling

Data Scientist (Full-time)

Nam Viet Media (MS Digital)

02/2025 – 11/2025

Ho Chi Minh City, Vietnam

  • Music Recommendation System: Built a personalized music recommendation system for MuVi.vn (30K+ MAU) by combining user behavior signals and audio features
  • Automated KPI Pipelines: Developed automated KPI evaluation pipelines for product analytics (retention, conversion, session depth), reducing manual reporting time by ~90% and enabling real-time stakeholder dashboards
  • AI Content Automation: Designed an AI-powered content automation system for multi-platform publishing, combining an automated ingestion pipeline, generative AI APIs for video synthesis, LLM-based content generation, and Playwright browser automation; achieved ~300M VND/year in operational cost savings with 5× higher content output
Recommendation Systems, Generative AI, Playwright, LLM, Multimodal AI, Automation

Research Intern

CIS Lab – National Chung Cheng University

07/2024 – 12/2024

Chiayi, Taiwan

  • Audio Anti-spoofing (Q2): Developed multi-channel models using STFT, CQT, and MFCC features with split-attention architectures; conducted experimental design and quantitative evaluation
  • Identity Cues (Vision): Implemented skeleton-based gait recognition achieving 99% accuracy; evaluated pipeline reproducibility in a multinational research team
  • Built experiment pipelines in PyTorch and TensorFlow, managing GPU workloads and ensuring experiment reproducibility
PyTorch, TensorFlow, Audio Processing, Computer Vision, Deep Learning

Education

Bachelor of Engineering – Data Science

Industrial University of Ho Chi Minh City

08/2021 – 11/2025

Ho Chi Minh City, Vietnam

Valedictorian

October 2025 Ceremony – Ranked 1st overall among all university graduates

GPA: 3.77 / 4.00
Faculty of IT: #1
Scholarships
  • Ranked 1st in the Faculty of IT (4,000+ students)
  • Thesis: Proposed a Quantum-Inspired Algorithm – Grade 10/10
  • 6× merit-based scholarships (Top 5% over consecutive terms)

Patents & Publications

First author of 10+ publications including Q1/Q2 journals and conferences

Q2, Published

VDD: Voice deepfake detection with three-channel acoustic representations and advanced split-attention networks

K.-D. Cao-Phan*, Q. T. D. Dai, and V.-L. Nguyen

Signal, Image and Video Processing, vol. 19, p. 537, 2025

0.33% EER on ASVspoof 2019; 118K params

DOI: 10.1007/s11760-025-04126-3
Conference, Q2, Published

MBAAF: Multi-Branch Lightweight Architecture for Audio Spoofing Detection with Temporal Gating and CBAM-Based Attention Fusion

K.-D. Cao-Phan* and P. D. Thi

The 18th Multi-disciplinary International Conference on Artificial Intelligence (MIWAI 2025), Lecture Notes in Computer Science (LNCS), 2025

0.15% EER on ASVspoof 2019; 0.092% on ITW; 135K params

DOI: 10.1007/978-981-95-4957-3_30
Conference, Published

Application of temporal association rules to trading in the Vietnamese stock market

K.-D. Cao-Phan*, C. K. Nguyen, T. B. T. Phan, N. A. Nguyen, T. T. Truong, and V. H. Nguyen

Young Scientists Conference, vol. 6, Industrial University of Ho Chi Minh City, 2024

Team Leader; First Prize; Association Rules + K-means/GA/Percentile; Stock Price Prediction
Conference, Published

Sentiment analysis for Vietnamese book reviews using deep learning approaches

K.-D. Cao-Phan*, T. D. Le, and M. T. Kieu

Young Scientists Conference, vol. 6, Industrial University of Ho Chi Minh City, 2024

Team Leader; Real Data Collection (>12K); LSTM + Transformer
Conference, Q4, Accepted

Aspect-Based Sentiment Analysis for Stock Price Movement Prediction

K. N. Dang, K. C. Nguyen, L. V. Truong, K.-D. Cao-Phan*, and T. M. Pham

The 14th International Symposium on Information and Communication Technology (SOICT), CCIS, 2025

Multi-aspect Analysis; Stock Prediction

DOI: 10.13140/RG.2.2.27273.92004
Q1

HQSMA: A quantum-enhanced hybrid attention mechanism for efficient anti-spoofing in automatic speaker verification

K.-D. Cao-Phan* and T. P. Dang

Expert Systems with Applications, 2025

Thesis; Quantum AI; Perfect Score (10/10); 0.16% EER on ASVspoof 2019
Q1

RIGID: Real-time indexing of humans via gait identification and detection

X. H. T. Dao, K.-D. Cao-Phan (co-first author), and V.-L. Nguyen

Multimedia Tools and Applications, 2025

Co-first Author; 96.42% Accuracy; Outperforms GaitNet, GaitPart, and GaitSet
Q1

PBA-Net: A Dual-Branch Architecture with Positional Bias Attention and Multi-Scale CNN for Vietnamese Aspect-Based Sentiment Analysis

K.-D. Cao-Phan*, D. T. Le, and K. V. Cao

Engineering Applications of Artificial Intelligence, 2025

SOTA on UIT-ABSA Dataset; Restaurant & Hotel Domains; Robust under Limited Data

DOI: 10.2139/ssrn.5507982
Q1

ViGSA: A Multi-Task Aspect-Based Sentiment Analysis Model with Auxiliary Embedding and Global Sentiment Integration for Vietnamese Restaurant Reviews

D. X. Tran, K. V. Cao*, T. Nguyen-Huu, H.-T. D. Xuan, H. Nguyen-Viet, and K.-D. Cao-Phan*

Expert Systems with Applications, 2025

SOTA on VLSP 2018; Document Level

DOI: 10.2139/ssrn.5608837
Q2

Fast and Accurate Meat Freshness Classification Using Depthwise Separable Convolution and SPPF

K.-D. Cao-Phan*, H.-K. Dang†, H.-T. Tran-Quynh, M.-P. Lam-Doan, H.-T. Luu, and H.-Q. Bui*

Food Analytical Methods, 2026

98% Accuracy; 2–3 ms inference; 224×224 images

DOI: 10.21203/rs.3.rs-8630971/v1
Q1, Ongoing

MobileNetV2-13 with MiniCSAM: A Lightweight Attention-Enhanced Model for Meat Freshness Assessment

K.-D. Cao-Phan*, D. H. Khang, T. H. H. Tran, D. N. M. Khang, and H. Q. Bui

Smart Agricultural Technology, 2025

>99% Accuracy; ~550K params

Featured Projects

06/2024 – 12/2024

Financial Signal Generator for Vietnamese Markets

Research project commissioned by the State Bank of Vietnam and conducted in collaboration with Ho Chi Minh City University of Banking. Built a hybrid forecasting system combining macroeconomic signals (FOMC minutes, interest rates) and market sentiment data (35K+ reviews) to predict short-term movements in the VND exchange rate and stock trends.

  • Hybrid forecasting with XGBoost, RNN, and Transformer models
  • Macroeconomic signal extraction from FOMC minutes
  • Sentiment analysis on 35K+ market reviews
Python, XGBoost, RNN, Transformer, BeautifulSoup, MLflow, GCP
12/2024 – 04/2025

Vietnamese Aspect-based Sentiment Analysis

Led model development in a research collaboration with Nguyen Tat Thanh University, DayOne AI Lab, and the Industrial University of Ho Chi Minh City. Built an end-to-end NLP system for Vietnamese review analysis (restaurants, hotels, e-commerce).

  • Multi-task Transformer with auxiliary embeddings
  • Dual-branch CNN with positional attention
  • Achieved SOTA performance on UIT-ABSA and VLSP benchmarks
Python, PyTorch, Hugging Face, PhoBERT, InfoXLM, FastAPI, PostgreSQL

Skills

Research

Experimental design, model evaluation, error analysis

Programming

Python, SQL

Frameworks

PyTorch, TensorFlow, scikit-learn, Hugging Face, PennyLane, Playwright, Selenium

Data & Pipelines

Apache Beam, Spark, PostgreSQL, ClickHouse, Airflow, n8n

MLOps / Deployment

Docker, monitoring & drift detection

Honors, Awards & Activities

Vietnam University Student Math Olympiad

Bronze Medal, Algebra Division

Vietnam Mathematical Society (VMS)

2023, 2025

Twice awarded a bronze medal in Algebra at the national level

Excellent Student of the Year

Faculty of Information Technology

Industrial University of Ho Chi Minh City

2023

Selected as one of the two top students in the faculty for outstanding academic performance

Satellite Application Proposal Presenter

International Research Collaboration

IUH (Vietnam) – Hannam University (Korea)

2024

Proposed novel satellite application ideas with an international team

Finalist – DIVE2025

Data Insight Visualization Event

Busan, Korea

2025

Represented Vietnam (top 10) and presented an interactive data visualization to international judges

Presenter – 3MT Competition

Three Minute Thesis

APSIPA ASC – Asia-Pacific Signal and Information Processing Association

2025

Delivered a three-minute presentation summarizing complex research for a broad audience

Get In Touch

I'm always open to discussing new opportunities, collaborations, or research ideas