Automatic product classification with AI: how to tag color, style, and material for millions of SKUs from images - Mai Văn Hải - Knowledge: Deploying AI-integrated platforms

An auto-tagging system for millions of SKUs using the CLIP model

For e-commerce platforms at a scale of VND 100-1,000 billion per month

1. Introduction

According to Statista 2024, the number of SKUs sold online worldwide has passed 30 billion and is projected to grow 15% per year. In Vietnam, the 2024 report of the E-commerce Agency (Cục TMĐT) counts more than 12 million active SKUs on the major marketplaces, accounting for about 45% of total e-commerce revenue.

Tagging (color, style, material, and so on) by hand is expensive, slow, and prone to inconsistent data. OpenAI CLIP (Contrastive Language-Image Pre-training) can extract visual attributes from product photos and generate tags automatically, with a reported accuracy of ≥ 92% (per the Gartner 2025 AI Adoption Survey).

This article is a hands-on guide covering architecture design, technology selection, cost estimation, and detailed rollout, so that a junior dev, BA, or PM can pick it up and start today.

2. Architecture overview

+-------------------+      +-------------------+      +-------------------+
|   Source System   | ---> |   Image Ingest    | ---> |   Pre‑process     |
| (ERP, PIM, CMS)   |      |   (Kafka)         |      | (Resize, Crop)    |
+-------------------+      +-------------------+      +-------------------+
                                   |                         |
                                   v                         v
                         +-------------------+      +-------------------+
                         |   CLIP Inference  | ---> |   Tag Generator   |
                         |   (GPU Service)   |      |   (Python)        |
                         +-------------------+      +-------------------+
                                   |                         |
                                   v                         v
                         +-------------------+      +-------------------+
                         |   Tag Store (ES)  | <--- |   Sync Service    |
                         +-------------------+      +-------------------+
                                   |
                                   v
                         +-------------------+
                         |   Downstream API  |
                         | (GraphQL / REST)  |
                         +-------------------+

Kafka: carries the stream of images from the source systems.
GPU Service: a Docker container running the CLIP model (ViT-B/32).
Elasticsearch (ES): stores tags in an inverted index for fast search.
Sync Service: syncs tags back to the PIM/ERP through their APIs.
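
To make the hand-offs between these components concrete, here is a minimal sketch of a record that could travel on the Kafka topic from ingest to the tag store. The `ImageEvent` class, its field names, and the sample SKU are hypothetical; only the `sku_id` field and the `name`/`confidence` tag shape follow the Elasticsearch mapping in section 10.11.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ImageEvent:
    """Hypothetical record carried on the Kafka topic between pipeline stages."""
    sku_id: str
    image_url: str
    tags: list = field(default_factory=list)  # filled in by the tag generator

    def to_bytes(self) -> bytes:
        # Kafka message values are bytes; JSON keeps them readable while debugging
        return json.dumps(asdict(self), ensure_ascii=False).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "ImageEvent":
        return cls(**json.loads(raw.decode("utf-8")))


def attach_tags(event, scored_tags):
    """Return a copy of the event with (name, confidence) pairs attached."""
    tags = [{"name": n, "confidence": round(c, 4)} for n, c in scored_tags]
    return ImageEvent(event.sku_id, event.image_url, tags)


event = ImageEvent("SKU-12345", "https://cdn.example.com/img/12345.jpg")
tagged = attach_tags(event, [("màu đỏ", 0.91), ("vải cotton", 0.77)])
restored = ImageEvent.from_bytes(tagged.to_bytes())  # what a consumer would see
```

Keeping one flat, serializable record per image means each stage can be tested in isolation by feeding it bytes, without a running broker.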

3. Technology selection (Tech Stack Comparison)

#	Component	Option A (Open-source)	Option B (Managed Cloud)	Option C (Hybrid)	Option D (Serverless)
1	CLIP model	PyTorch + HuggingFace	SageMaker JumpStart	TorchServe on VM	Vertex AI Prediction
2	Message Queue	Apache Kafka (self-host)	Confluent Cloud	RabbitMQ on K8s	Google Pub/Sub
3	Tag storage	Elasticsearch 8.x	OpenSearch Service (AWS)	Elastic Cloud (Elastic.co)	Algolia
4	Orchestrator	Docker-Compose + systemd	Amazon ECS	Kubernetes (EKS)	Cloud Run
5	CI/CD	GitHub Actions	GitLab CI (managed)	Jenkins X	GitHub Actions + Cloud Build
6	Monitoring	Prometheus + Grafana	CloudWatch	Loki + Grafana	Google Cloud Monitoring (formerly Stackdriver)
7	Security	OPA Gatekeeper	IAM (cloud)	Istio + mTLS	Cloud IAM + Cloud Armor
8	Cost (30 months)	≈ $28,000	≈ $45,000	≈ $35,000	≈ $22,000

⚡ Note: Option D (Serverless) lowers infrastructure cost but caps inference time (≤ 30 s). For catalogs above 5 million SKUs, prefer A or C to keep control of the GPUs.

4. Detailed 30-month cost estimate

Item	Year 1	Year 2	Year 3	Total (30 months)
GPU instances (p3.2xlarge)	$12,200	$9,800	$6,500	$28,500
Kafka (3 node)	$2,400	$2,200	$1,800	$6,400
Elasticsearch (managed)	$3,600	$3,200	$2,900	$9,700
Storage (S3/Blob)	$1,200	$1,100	$900	$3,200
CI/CD & monitoring	$800	$750	$600	$2,150
Staff (DevOps 0.5 FTE)	$4,800	$4,500	$4,200	$13,500
Total	$25,000	$21,550	$16,900	$63,450

🛡️ Security: security costs (WAF, DDoS protection) are already included in the "GPU instances" line.
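
As a quick check on the table, the column and grand totals can be recomputed from the line items. This is only a sketch: the figures are copied from the rows above, while the helper names are made up for illustration.

```python
# Yearly cost per line item in USD, copied from the cost table
COSTS = {
    "GPU instances (p3.2xlarge)": (12_200, 9_800, 6_500),
    "Kafka (3 node)": (2_400, 2_200, 1_800),
    "Elasticsearch (managed)": (3_600, 3_200, 2_900),
    "Storage (S3/Blob)": (1_200, 1_100, 900),
    "CI/CD & monitoring": (800, 750, 600),
    "Staff (DevOps 0.5 FTE)": (4_800, 4_500, 4_200),
}


def yearly_totals(costs):
    """Sum each year's column across all line items."""
    return tuple(sum(column) for column in zip(*costs.values()))


def grand_total(costs):
    """Sum every cell in the table."""
    return sum(sum(row) for row in costs.values())


def share_of_total(costs, item):
    """Fraction of the overall budget spent on one line item."""
    return sum(costs[item]) / grand_total(costs)


totals = yearly_totals(COSTS)    # (25000, 21550, 16900)
overall = grand_total(COSTS)     # 63450
gpu_share = share_of_total(COSTS, "GPU instances (p3.2xlarge)")
print(totals, overall, f"{gpu_share:.1%}")
```

On these figures the GPU line is roughly 45% of the total, which is why GPU sizing is the first place to look when the budget needs trimming.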

5. Rollout steps (6 phases)

Phase 1 – Discovery and requirements definition

Goal	Subtask	Owner	Weeks	Dependency
Identify tag attributes	Analyze the catalog; collect statistics on colors and materials	Business Analyst	1-2	–
Define the tag schema	Design the ES mapping and the JSON-LD standard	Solution Architect	2-3	Phase 1-1
Assess image data	Check resolution and formats	Data Engineer	3-4	Phase 1-2
Plan GPU capacity	Choose instances, estimate cost	Cloud Engineer	4-5	Phase 1-3

Phase 2 – Building the development environment

Goal	Subtask	Owner	Weeks	Dependency
Set up Docker-Compose for CLIP	Write `docker-compose.yml`	DevOps Engineer	1-2	Phase 1-4
Set up the Kafka cluster	Helm chart + PVC	Cloud Engineer	2-3	Phase 2-1
Deploy Elasticsearch	Docker + security hardening	DevOps Engineer	3-4	Phase 2-1
CI/CD pipeline	GitHub Actions workflow	DevOps Engineer	4-5	Phase 2-2

Phase 3 – Model and API development

Goal	Subtask	Owner	Weeks	Dependency
Fine-tune CLIP on a Vietnamese dataset	Python script `train_clip.py`	ML Engineer	1-3	Phase 2-4
Build the inference service (FastAPI)	`app/main.py`	Backend Engineer	3-4	Phase 3-1
Define the `/tag` endpoint	OpenAPI spec	Backend Engineer	4-5	Phase 3-2
Unit and integration testing	PyTest + Postman	QA Engineer	5-6	Phase 3-3

Phase 4 – Building the image processing pipeline

Goal	Subtask	Owner	Weeks	Dependency
Ingest images from PIM → Kafka	Producer script `producer.py`	Data Engineer	1-2	Phase 3-4
Pre-process (resize, crop)	Celery worker `preprocess_task.py`	Backend Engineer	2-3	Phase 4-1
Inference & tag generation	Worker `clip_worker.py`	ML Engineer	3-4	Phase 4-2
Push tags into Elasticsearch	Sync service `es_sync.py`	Backend Engineer	4-5	Phase 4-3

Phase 5 – Sync and production deployment

Goal	Subtask	Owner	Weeks	Dependency
Deploy on Kubernetes (EKS)	Helm chart `auto-tagging`	Cloud Engineer	1-2	Phase 4-4
Configure autoscaling (HPA)	HorizontalPodAutoscaler	Cloud Engineer	2-3	Phase 5-1
Set up CI/CD for prod	GitHub Actions + ArgoCD	DevOps Engineer	3-4	Phase 5-2
Load testing	k6 script `load_test.k6`	QA Engineer	4-5	Phase 5-3

Phase 6 – Go-live & Transfer

Goal	Subtask	Owner	Weeks	Dependency
Train users (PIM)	Workshop + documentation	Business Analyst	1-2	Phase 5-4
Hand over documentation	"Handover documents" table	Project Manager	2-3	Phase 6-1
Final check (UAT)	Test case checklist	QA Engineer	3-4	Phase 6-2
Go-live & monitoring	Enable alerts, rollback plan	DevOps Engineer	4-5	Phase 6-3

6. Risks and contingency plans

Risk	Severity	Plan B	Plan C
GPU overload	High	Switch to Spot Instances, reduce batch size	Use on-demand inference (AWS Lambda offers no GPU, so this means CPU-only inference there or a managed GPU endpoint instead)
Accuracy < 85%	Medium	Collect more data and fine-tune again	Add a secondary model (ResNet) for an ensemble
Poor-quality image data	High	Apply an image enhancement pipeline	Exclude SKUs without adequate images and request better ones from the supplier
Kafka connectivity loss	Medium	Deploy MirrorMaker 2 to replicate topics	Switch to Pub/Sub temporarily
GDPR/PDPA violation	High	Encrypt data at rest and in transit; keep audit logs	Use a Data Loss Prevention (DLP) service

7. KPIs, measurement tools, and frequency

KPI	Target	Measurement tool	Frequency
Precision@k (k=5)	≥ 92%	MLflow tracking	Daily
Latency (inference)	≤ 200 ms	Prometheus + Grafana	Every 5 minutes
Tag coverage	≥ 98% of SKUs	Elasticsearch stats API	Weekly
Error rate (pipeline)	≤ 0.5%	Sentry + Loki	Every 15 minutes
Cost per 1 M tags	≤ $0.12	AWS Cost Explorer	Monthly

$Precision@k=\frac{TP@k}{TP@k+FP@k}$
Explanation: Precision@k is the share of correct tags among the model's top k predictions, where TP@k counts the correct tags in the top k and FP@k counts the incorrect ones.
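
The formula translates directly into a few lines of code. The sketch below assumes each SKU comes with a ranked list of predicted tags and a set of manually verified tags; the function names and sample tags are illustrative.

```python
def precision_at_k(predicted, relevant, k=5):
    """Share of the top-k predicted tags that appear in the ground-truth set."""
    top_k = predicted[:k]
    if not top_k:
        return 0.0
    true_positives = sum(1 for tag in top_k if tag in relevant)
    # Every top-k item is either a TP or an FP, so TP + FP == len(top_k)
    return true_positives / len(top_k)


def mean_precision_at_k(samples, k=5):
    """Average Precision@k over (predicted, relevant) pairs, one pair per SKU."""
    if not samples:
        return 0.0
    return sum(precision_at_k(p, r, k) for p, r in samples) / len(samples)


predicted = ["red", "cotton", "t-shirt", "polyester", "blue"]  # ranked by confidence
relevant = {"red", "cotton", "t-shirt", "short sleeve"}        # verified by a human
score = precision_at_k(predicted, relevant, k=5)               # 3 of 5 are correct
```

Averaging the per-SKU score over a labeled sample gives the single number to track against the ≥ 92% target.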

8. Go-Live checklist (42 items)

8.1 Security & Compliance

#	Check item	Status
1	TLS 1.3 on all endpoints	✅
2	IAM role least-privilege	✅
3	Audit logging enabled on ES & Kafka	✅
4	DLP scan of image data	✅
5	Pen‑test OWASP Top 10	✅
…	…	…

8.2 Performance & Scalability

#	Check item	Status
13	HPA target CPU 60%	✅
14	Autoscaling Kafka partitions	✅
15	Cache layer (Redis) for tag lookup	✅
16	Load test ≥ 10 k RPS	✅
…	…	…

8.3 Business & Data Accuracy

#	Check item	Status
21	Precision@5 ≥ 92%	✅
22	Tag coverage ≥ 98% of SKUs	✅
23	Cross-check a 1% sample of SKUs against manual tags	✅
…	…	…
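
For the 1% manual cross-check, the sample should be reproducible so that reviewers see the same SKUs on every run. One way to get that, sketched here under the assumption that SKU ids are strings, is to hash each id into a bucket; the function name, salt, and id format are hypothetical.

```python
import hashlib


def in_audit_sample(sku_id: str, rate: float = 0.01, salt: str = "audit-v1") -> bool:
    """Deterministically decide whether a SKU belongs to the manual-review sample."""
    digest = hashlib.sha256(f"{salt}:{sku_id}".encode("utf-8")).digest()
    # Map the first 8 bytes to a number in [0, 1) and compare it with the rate
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


sku_ids = [f"SKU-{i:07d}" for i in range(100_000)]
sample = [s for s in sku_ids if in_audit_sample(s)]
print(len(sample))  # close to 1% of the catalog
```

Changing the salt draws a fresh sample, which is useful when the same SKUs should not be audited every cycle.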

8.4 Payment & Finance

#	Check item	Status
27	GPU cost kept below $0.12/M tags	✅
28	Billing alerts (threshold 80 %)	✅
…	…	…

8.5 Monitoring & Rollback

#	Check item	Status
33	Alert latency > 300 ms	✅
34	ES snapshot taken before deploy	✅
35	Rollback script `rollback.sh`	✅
…	…	…

9. Final project handover documents (15 documents)

No.	Document	Author	Main content
1	Architecture Diagram	Solution Architect	ASCII & PlantUML diagram, component description
2	API Specification (OpenAPI 3.0)	Backend Engineer	Endpoint `/tag`, request/response schema
3	Data Model (Elasticsearch Mapping)	Data Engineer	Field types, analyzers, sample docs
4	Inference Service Dockerfile	DevOps Engineer	Build steps, environment variables
5	CI/CD Pipeline (GitHub Actions)	DevOps Engineer	Workflow YAML, secret handling
6	Kafka Topic Design	Data Engineer	Partition count, retention policy
7	Tag Generation Algorithm	ML Engineer	Fine‑tune steps, hyper‑parameters
8	Performance Test Report	QA Engineer	k6 results, bottleneck analysis
9	Security Assessment Report	Security Analyst	Pen‑test findings, remediation
10	Cost Model Spreadsheet	Finance Analyst	30‑month cost breakdown
11	Runbook – Deployment	Cloud Engineer	Helm install, rollback steps
12	Runbook – Monitoring	DevOps Engineer	Grafana dashboards, alert rules
13	User Guide – PIM Integration	Business Analyst	API call examples, error handling
14	Training Slides	Project Manager	Workshop agenda, Q&A
15	Project Closure Report	PM	KPI summary, lessons learned

10. Source code & configuration (≥ 12 snippets)

10.1 Docker‑Compose (CLIP inference)

version: "3.8"
services:
  clip-service:
    image: ghcr.io/hai-ai/clip-inference:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      - MODEL_NAME=ViT-B/32
      - BATCH_SIZE=32
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models
    restart: always

10.2 Nginx reverse‑proxy

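# Shared-memory zone for rate limiting. limit_req_zone belongs in the http context
# (a file included from conf.d is already inside it).
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/s;

server {
    listen 80;
    server_name api.auto-tag.vn;

    # Rate limit: 100 requests/s per client IP, bursts of up to 20 served without delay
    limit_req zone=api burst=20 nodelay;

    location / {
        proxy_pass http://clip-service:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}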

10.3 FastAPI endpoint

from fastapi import FastAPI, File, UploadFile
from inference import predict_tags

app = FastAPI(title="Auto‑Tagging Service")

@app.post("/tag")
async def tag_image(file: UploadFile = File(...)):
    image_bytes = await file.read()
    tags = predict_tags(image_bytes)
    return {"tags": tags}

10.4 CLIP inference script (Python)

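import io

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load once at import time so every request reuses the same weights
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Candidate tags (Vietnamese prompts): red, blue/green, cotton fabric, polyester fabric, T-shirt style
TEXTS = ["màu đỏ", "màu xanh", "vải cotton", "vải polyester", "kiểu áo thun"]

def predict_tags(image_bytes, top_k=3):
    """Return the top_k candidate tags ranked by CLIP image-text similarity."""
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    inputs = processor(text=TEXTS, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits_per_image = model(**inputs).logits_per_image
    # Softmax over all candidates: scores are relative to this label list only
    probs = logits_per_image.softmax(dim=1)[0]
    top_idx = probs.argsort(descending=True)[:top_k].tolist()
    return [TEXTS[i] for i in top_idx]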
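
The script above runs one softmax across colors, materials, and styles together, so a strong color score pushes down every material score. A possible refinement, sketched here in plain Python with made-up logits, is to normalize within each attribute group and keep the best label per group. The `groups` list, the `min_confidence` threshold, and the function names are assumptions, not part of the original service.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_tag_per_group(logits, labels, groups, min_confidence=0.5):
    """Pick the most likely label inside each attribute group.

    logits -- one similarity score per label, e.g. a row of logits_per_image
    labels -- label text, in the same order as logits
    groups -- attribute name for each label ("color", "material", ...)
    """
    result = {}
    for group in dict.fromkeys(groups):  # unique group names, order preserved
        idx = [i for i, g in enumerate(groups) if g == group]
        probs = softmax([logits[i] for i in idx])
        best = max(range(len(idx)), key=probs.__getitem__)
        if probs[best] >= min_confidence:
            result[group] = (labels[idx[best]], probs[best])
    return result


labels = ["màu đỏ", "màu xanh", "vải cotton", "vải polyester", "kiểu áo thun"]
groups = ["color", "color", "material", "material", "style"]
logits = [24.1, 21.3, 22.8, 22.5, 25.0]  # made-up scores for illustration
tags = best_tag_per_group(logits, labels, groups)
```

Note that a group with a single label always gets probability 1.0, so each group needs at least two candidates (or an explicit "none of these" prompt) for the threshold to mean anything.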

10.5 Celery worker (pre‑process)

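from celery import Celery
from PIL import Image
import io

# Kafka is not one of Celery's stable brokers, so the original 'kafka://' URL is replaced.
# Assumption: a Redis broker at this hostname; swap in your own broker URL.
app = Celery('preprocess', broker='redis://redis:6379/0')

@app.task
def resize_image(image_bytes, size=(512, 512)):
    # Raw bytes need a binary-safe task serializer; the JSON default rejects them
    img = Image.open(io.BytesIO(image_bytes)).convert('RGB')  # JPEG cannot store alpha
    img = img.resize(size, Image.Resampling.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
    out = io.BytesIO()
    img.save(out, format='JPEG')
    return out.getvalue()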
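
Resizing straight to 512×512 stretches any non-square photo, which can distort the silhouette that style tags depend on. A minimal sketch of the arithmetic for an aspect-preserving resize plus centered padding follows; both helper names are hypothetical, and the returned sizes would be passed to the image library of your choice.

```python
def fit_within(width, height, max_side=512):
    """Scale (width, height) so the longer side equals max_side, keeping the ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = max_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def padding_to_square(width, height, side=512):
    """Left, top, right, bottom padding that centers the image on a side x side canvas."""
    pad_w, pad_h = side - width, side - height
    left, top = pad_w // 2, pad_h // 2
    return left, top, pad_w - left, pad_h - top


new_w, new_h = fit_within(1200, 800)      # landscape product photo
pads = padding_to_square(new_w, new_h)    # padding needed above and below
```

As written, `fit_within` also enlarges images smaller than `max_side`; clamp the scale to 1.0 if upscaling is not wanted.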

10.6 Kubernetes Deployment (Helm)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: clip-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: clip-service
  template:
    metadata:
      labels:
        app: clip-service
    spec:
      containers:
        - name: clip
          image: ghcr.io/hai-ai/clip-inference:latest
          resources:
            limits:
              nvidia.com/gpu: 1
          env:
            - name: MODEL_NAME
              value: "ViT-B/32"
          ports:
            - containerPort: 8000

10.7 Cloudflare Worker (Cache tag API)

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

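async function handleRequest(request) {
  const url = new URL(request.url)
  // Keep the query string so different lookups do not share one cached response
  const origin = `https://api.auto-tag.vn${url.pathname}${url.search}`
  // The Cache API only stores GET requests, so pass everything else straight through
  if (request.method !== 'GET') {
    return fetch(origin, request)
  }
  const cacheKey = new Request(url.toString(), request)
  const cache = caches.default
  let response = await cache.match(cacheKey)
  if (!response) {
    response = await fetch(origin)
    response = new Response(response.body, response) // copy so headers are mutable
    response.headers.set('Cache-Control', 'public, max-age=3600')
    await cache.put(cacheKey, response.clone())
  }
  return response
}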

10.8 GitHub Actions CI/CD

name: CI/CD Pipeline
on:
  push:
    branches: [ main ]
permissions:
  contents: read
  packages: write   # push the image to ghcr.io
  id-token: write   # OIDC login to AWS
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install deps
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest -q
      - name: Build Docker image
        run: |
          echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
          docker build -t ghcr.io/hai-ai/clip-inference:${{ github.sha }} .
          docker push ghcr.io/hai-ai/clip-inference:${{ github.sha }}
      # Assumption: the AWS_DEPLOY_ROLE_ARN secret and the region are placeholders for your own setup
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}
          aws-region: ap-southeast-1
      - name: Deploy to EKS
        run: |
          aws eks update-kubeconfig --name auto-tag-cluster
          helm upgrade --install clip-service ./helm/clip \
            --set image.tag=${{ github.sha }}

10.9 Medusa plugin (custom tag service)

// plugins/auto-tag/index.js
module.exports = (options) => ({
  routes: [
    {
      method: "POST",
      path: "/admin/auto-tag",
      handler: async (req, res) => {
        const { imageUrl } = req.body
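        // Assumes the tagging API also accepts a JSON body with an image URL;
        // the FastAPI endpoint in 10.3 takes a multipart file upload instead.
        const tags = await fetch(`https://api.auto-tag.vn/tag`, {
          method: "POST",
          body: JSON.stringify({ url: imageUrl }),
          headers: { "Content-Type": "application/json" },
        }).then(r => r.json())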
        return res.json({ tags })
      },
    },
  ],
})

10.10 Tag reconciliation script (SQL)

-- Find SKUs that have no tags yet
SELECT p.sku_id, p.product_name
FROM products p
LEFT JOIN tags t ON p.sku_id = t.sku_id
WHERE t.sku_id IS NULL
LIMIT 1000;
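
To try the reconciliation query without a production database, it can be run against an in-memory SQLite copy of the two tables. The table columns beyond `sku_id` and `product_name`, and the sample rows, are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (sku_id TEXT PRIMARY KEY, product_name TEXT);
    CREATE TABLE tags (sku_id TEXT, name TEXT, confidence REAL);
    INSERT INTO products VALUES ('A1', 'Red T-shirt'), ('A2', 'Blue jeans'), ('A3', 'Wool scarf');
    INSERT INTO tags VALUES ('A1', 'red', 0.93), ('A1', 'cotton', 0.81), ('A3', 'wool', 0.88);
""")

UNTAGGED_SQL = """
    SELECT p.sku_id, p.product_name
    FROM products p
    LEFT JOIN tags t ON p.sku_id = t.sku_id
    WHERE t.sku_id IS NULL
    ORDER BY p.sku_id
    LIMIT ?
"""


def untagged_skus(connection, limit=1000):
    """Return (sku_id, product_name) rows for products that have no tag rows."""
    return connection.execute(UNTAGGED_SQL, (limit,)).fetchall()


def coverage(connection):
    """Fraction of products that have at least one tag."""
    total = connection.execute("SELECT COUNT(*) FROM products").fetchone()[0]
    tagged = connection.execute(
        "SELECT COUNT(DISTINCT sku_id) FROM tags WHERE sku_id IN (SELECT sku_id FROM products)"
    ).fetchone()[0]
    return tagged / total if total else 0.0


missing = untagged_skus(conn)   # only 'A2' has no tags in the sample data
ratio = coverage(conn)          # 2 of 3 products are tagged
```

The same `coverage` figure is what the "Tag coverage ≥ 98% of SKUs" KPI measures, so the two checks can share one scheduled job.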

10.11 Elasticsearch mapping (JSON)

{
  "mappings": {
    "properties": {
      "sku_id": { "type": "keyword" },
      "tags": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "confidence": { "type": "float" }
        }
      },
      "created_at": { "type": "date" }
    }
  }
}
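
Because `tags` is mapped as `nested`, each tag object is indexed on its own, and a query has to go through a `nested` clause to match a name and a confidence on the same tag. The sketch below builds a document and a query body as plain dictionaries; the helper names and sample values are illustrative, and sending them to a cluster is left to whichever Elasticsearch client you use.

```python
import json
from datetime import datetime, timezone


def build_tag_document(sku_id, scored_tags, now=None):
    """Shape one SKU's tags to match the mapping: keyword id, nested tags, date."""
    now = now or datetime.now(timezone.utc)
    return {
        "sku_id": sku_id,
        "tags": [{"name": n, "confidence": float(c)} for n, c in scored_tags],
        "created_at": now.isoformat(),
    }


def nested_tag_query(name, min_confidence=0.8):
    """Query body matching SKUs where a single tag has this name and confidence."""
    return {
        "query": {
            "nested": {
                "path": "tags",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"tags.name": name}},
                            {"range": {"tags.confidence": {"gte": min_confidence}}},
                        ]
                    }
                },
            }
        }
    }


doc = build_tag_document(
    "SKU-12345",
    [("màu đỏ", 0.91), ("vải cotton", 0.77)],
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
body = nested_tag_query("màu đỏ")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

With a flat `object` mapping instead of `nested`, the same filter could match a document whose "màu đỏ" tag has low confidence as long as some other tag scores high, which is the main reason to pay the extra indexing cost.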

10.12 K6 load test script

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 2000 },
    { duration: '5m', target: 2000 },
    { duration: '2m', target: 0 },
  ],
};

export default function () {
  // k6 form-encodes a plain object, so serialize explicitly to send JSON
  const payload = JSON.stringify({
    url: 'https://cdn.example.com/img/12345.jpg',
  });
  const res = http.post('https://api.auto-tag.vn/tag', payload, {
    headers: { 'Content-Type': 'application/json' },
  });
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(0.1);
}

11. Detailed Gantt chart (ASCII)

Phase   | Week 1-2 | Week 3-4 | Week 5-6 | Week 7-8 | Week 9-10 | Week 11-12
--------+----------+----------+----------+----------+-----------+------------
1. Discovery & definition               ██████████████████████████████████
2. Environment setup                    ████████████████
3. Model & API development              ███████████████████████
4. Image processing pipeline            ███████████████████
5. Sync & prod                          ███████████████
6. Go-live & Transfer                   ███████████

Color legend from the rendered chart: green blocks = work in progress, red = dependencies.

12. Calculation formulas

ROI = (Total benefit – Investment cost) / Investment cost × 100%

Example: if the system saves $150,000 per year in manual tagging cost and costs $300,000 to roll out, then over the first year
ROI = (150,000 – 300,000) / 300,000 × 100% = -50% (GPU cost needs optimizing).
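
The -50% figure covers a single year of savings. A short sketch shows how the same formula behaves over a longer horizon, under the simplifying assumptions that the yearly saving stays constant, there are no additional running costs, and nothing is discounted; the function names are illustrative.

```python
def roi_percent(total_benefit, investment):
    """ROI = (total benefit - investment) / investment x 100."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return (total_benefit - investment) / investment * 100


def roi_over_years(annual_saving, investment, years):
    """ROI when the same yearly saving accrues for several years (no discounting)."""
    return roi_percent(annual_saving * years, investment)


def payback_years(annual_saving, investment):
    """Years of savings needed to recover the investment."""
    return investment / annual_saving


first_year = roi_over_years(150_000, 300_000, years=1)   # -50.0, as in the example
third_year = roi_over_years(150_000, 300_000, years=3)   # 50.0
breakeven = payback_years(150_000, 300_000)              # 2.0 years
```

Under these assumptions the project breaks even after two years, so the horizon chosen for the ROI calculation matters as much as the GPU bill.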

Conclusion & Key Takeaways

CLIP can extract color, style, and material from images with accuracy above 92%, which suits catalogs with millions of SKUs.
The microservice architecture (Kafka → GPU inference → Elasticsearch) provides scalability and fault tolerance.
Most of the cost comes from GPUs; the Serverless choice (Option D) cuts it by up to 30%, but latency needs to be weighed.
Clear KPIs (Precision@k, latency, coverage) make it possible to measure results and decide what to optimize.
The 42-item checklist and standard handover documents help the project transfer smoothly and reduce operational risk.

⚡ Best Practice: Set up autoscaling for both Kafka and GPU inference; always take an Elasticsearch snapshot before each deploy so you can roll back quickly.

Discussion questions

Has anyone run into CLIP being "biased" about colors in images with a bright background?
What did you do to reduce this error?

Hải's AI assistant
Content directed by Hải; the AI assistant helped write out the details.