ML System Design Example:

Faire is deploying a real-time ML inference service to power personalized product recommendations. The system must handle high request volume with low latency and serve multiple models efficiently. Design the architecture for this real-time inference service.

You will be evaluated based on the following topics:

  1. Model deployment: How will models be deployed, updated, and versioned?
  2. Scalability: How will the system handle increased traffic and concurrent requests?
  3. Latency optimization: How will you ensure low-latency inference?
  4. Feature retrieval: How will real-time and offline features be fetched efficiently? (See the request-path sketch after this list.)
  5. Logging: How will you track requests, errors, and latencies?
  6. Monitoring: How will you track model or feature drift? (See the drift-monitoring sketch after this list.)
  7. Failure handling: What happens if the model server fails or responses are delayed?
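
As one possible shape for an answer, the sketch below traces a single request through the service: feature retrieval from an online store, a versioned model lookup, a latency-budget check, a cached fallback on failure, and per-request logging. All names here (the feature-store and model-registry clients, `fallback_items`, the 50 ms budget) are illustrative assumptions, not actual Faire components.

```python
import logging
import time


class RecommendationHandler:
    """Minimal sketch of the per-request inference path.

    feature_store and model_registry are hypothetical client interfaces,
    assumed for illustration only.
    """

    def __init__(self, feature_store, model_registry, fallback_items, budget_ms=50):
        self.feature_store = feature_store    # online store for real-time + synced offline features
        self.model_registry = model_registry  # maps (name, version) -> loaded model
        self.fallback_items = fallback_items  # cached popular items for graceful degradation
        self.budget_ms = budget_ms            # per-request latency budget (assumed value)
        self.log = logging.getLogger("inference")

    def recommend(self, user_id, model_name="recs", model_version="latest", k=20):
        start = time.monotonic()
        items = self.fallback_items[:k]
        try:
            # Feature retrieval: real-time features come from the online store;
            # offline (batch) features are precomputed and synced into the same store.
            features = self.feature_store.get_features(user_id)

            # Deadline check: if feature retrieval already exceeded the budget,
            # degrade instead of adding model latency on top.
            if (time.monotonic() - start) * 1000 > self.budget_ms:
                raise TimeoutError("feature retrieval exceeded latency budget")

            # Versioned model lookup: deploys and rollbacks become registry
            # metadata changes rather than code changes.
            model = self.model_registry.get(model_name, model_version)

            # Inference: model.predict is assumed to return {item_id: score}.
            scores = model.predict(features)
            items = sorted(scores, key=scores.get, reverse=True)[:k]
        except Exception as exc:
            # Failure handling: fall back to cached popular items rather than erroring.
            self.log.warning("inference failed for user=%s: %s", user_id, exc)
        finally:
            # Logging: latency and outcome for every request.
            latency_ms = (time.monotonic() - start) * 1000
            self.log.info("user=%s model=%s/%s latency_ms=%.1f",
                          user_id, model_name, model_version, latency_ms)
        return items
```

Scalability in this shape would come from running many stateless replicas of the handler behind a load balancer, while a separate control plane (the model registry plus rollout configuration) would handle canary releases and rollbacks.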

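For the monitoring topic, one common per-feature drift signal is the Population Stability Index (PSI) between the training distribution and a recent window of serving traffic. The sketch below is a generic, minimal implementation; the bin count and the thresholds in the docstring are conventional rules of thumb, not Faire-specific settings.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10):
    """PSI between a training (expected) and a recent serving (actual) sample
    of one feature. Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate
    shift, > 0.25 significant drift."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Bin edges come from the reference (training) distribution.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))

    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    # Clip serving values into the reference range so every point lands in a bin.
    actual_pct = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)[0] / len(actual)

    # Avoid division by zero / log(0) in sparse bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))
```

In practice this would run per feature on a schedule (e.g., hourly) over logged inference requests, alerting when the PSI for any feature crosses the chosen threshold.
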
Rubric:

Model Deployment

Scalability