ML System Design Example:

Faire is deploying a real-time ML inference service to power personalized product recommendations. The system must handle high request volume with low latency and serve multiple models efficiently. Design the architecture for this real-time inference service.

You will be evaluated based on the following topics:

  1. Model deployment: How will models be deployed, updated, and versioned?
  2. Scalability: How will the system handle increased traffic and concurrent requests?
  3. Latency optimization: How will you ensure low-latency inference?
  4. Feature retrieval: How will real-time and offline features be fetched efficiently? (See the request-path sketch after this list.)
  5. Logging: How will you track requests, errors, and latencies?
  6. Monitoring: How will you track model or feature drift? (See the drift-monitoring sketch after this list.)
  7. Failure handling: What happens if the model server fails or responses are delayed?
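
As one possible shape for an answer, the sketch below traces a single request through the service: feature retrieval from an online store, a versioned model lookup, a latency-budget check, a cached fallback on failure, and per-request logging. All names here (the feature-store and model-registry clients, `fallback_items`, the 50 ms budget) are illustrative assumptions, not actual Faire components.

```python
import logging
import time


class RecommendationHandler:
    """Minimal sketch of the per-request inference path.

    feature_store and model_registry are hypothetical client interfaces,
    assumed for illustration only.
    """

    def __init__(self, feature_store, model_registry, fallback_items, budget_ms=50):
        self.feature_store = feature_store    # online store for real-time + synced offline features
        self.model_registry = model_registry  # maps (name, version) -> loaded model
        self.fallback_items = fallback_items  # cached popular items for graceful degradation
        self.budget_ms = budget_ms            # per-request latency budget (assumed value)
        self.log = logging.getLogger("inference")

    def recommend(self, user_id, model_name="recs", model_version="latest", k=20):
        start = time.monotonic()
        items = self.fallback_items[:k]
        try:
            # Feature retrieval: real-time features come from the online store;
            # offline (batch) features are precomputed and synced into the same store.
            features = self.feature_store.get_features(user_id)

            # Deadline check: if feature retrieval already exceeded the budget,
            # degrade instead of adding model latency on top.
            if (time.monotonic() - start) * 1000 > self.budget_ms:
                raise TimeoutError("feature retrieval exceeded latency budget")

            # Versioned model lookup: deploys and rollbacks become registry
            # metadata changes rather than code changes.
            model = self.model_registry.get(model_name, model_version)

            # Inference: model.predict is assumed to return {item_id: score}.
            scores = model.predict(features)
            items = sorted(scores, key=scores.get, reverse=True)[:k]
        except Exception as exc:
            # Failure handling: fall back to cached popular items rather than erroring.
            self.log.warning("inference failed for user=%s: %s", user_id, exc)
        finally:
            # Logging: latency and outcome for every request.
            latency_ms = (time.monotonic() - start) * 1000
            self.log.info("user=%s model=%s/%s latency_ms=%.1f",
                          user_id, model_name, model_version, latency_ms)
        return items
```

Scalability in this shape would come from running many stateless replicas of the handler behind a load balancer, while a separate control plane (the model registry plus rollout configuration) would handle canary releases and rollbacks.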

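For the monitoring topic, one common per-feature drift signal is the Population Stability Index (PSI) between the training distribution and a recent window of serving traffic. The sketch below is a generic, minimal implementation; the bin count and the thresholds in the docstring are conventional rules of thumb, not Faire-specific settings.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10):
    """PSI between a training (expected) and a recent serving (actual) sample
    of one feature. Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate
    shift, > 0.25 significant drift."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Bin edges come from the reference (training) distribution.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))

    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    # Clip serving values into the reference range so every point lands in a bin.
    actual_pct = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)[0] / len(actual)

    # Avoid division by zero / log(0) in sparse bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))
```

In practice this would run per feature on a schedule (e.g., hourly) over logged inference requests, alerting when the PSI for any feature crosses the chosen threshold.
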
Rubric:

Model Deployment

Scalability