Day 10/21
- **MLOps with K8s - TWIML (Page 16/31)**
- Steps to consider: data acquisition, preprocessing, experiment management, model development, deployment, and monitoring (reporting).
- ML at scale: focus on eliminating the *incidental* complexity.
- *incidental* complexity of machine learning ⇒ getting access to data, setting up servers, and scaling model training and inference.
- As opposed to its *intrinsic* complexity ⇒ identifying the right model, selecting the right features, and tuning the model’s performance to meet business goals.
- Key requirements
- **Multi-tenancy:** Dedicating a group of hardware to a specific team is inefficient; instead, create a shared environment for concurrent projects.
- **Elasticity:** The hardware should expand/shrink based on the requirements of the workload.
- **Immediacy:** Data scientists should have self-service access to resources.
- **Programmability:** APIs to enable automated provisioning and maximise utilisation.
- The cloud does meet the above requirements; however, latency and economics can be optimised significantly on-prem. If you want to know more about a hybrid approach, watch “How Dukaan moved from cloud to on-prem” - Asli Engineering [link](https://www.youtube.com/watch?v=vFxQyZX84Ro)
- Containers and K8s
- K8s hierarchy - a declarative system
- Cluster: master (control plane) → multiple worker nodes, each running a kubelet (agent)
- Kubeflow is one of the options that uses K8s to deliver MLOps capabilities.
- Other solutions: [TWIML Solutions Guide](https://twimlai.com/solutions/)
- Containers (and the data inside them) are, in general, ephemeral in nature
- Volumes (available as long as the pod exists), Persistent Volumes (lifecycle managed by the cluster, independent of any single pod)
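To make the Volume vs Persistent Volume note above concrete, here is a minimal sketch that builds two manifests as plain Python dicts and prints them as JSON (which `kubectl apply -f` accepts alongside YAML). The names `model-store`, `train-job`, the image `example.com/trainer:latest`, the mount paths, and the `1Gi` size are all made up for illustration; the ebook does not prescribe them.

```python
import json


def pvc_manifest(name: str, size: str) -> dict:
    """PersistentVolumeClaim: requests storage whose lifecycle is managed by the cluster."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


def training_pod_manifest(name: str, image: str, claim_name: str) -> dict:
    """Pod with one pod-scoped volume (emptyDir) and one claim-backed volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,  # hypothetical image name
                    "volumeMounts": [
                        {"name": "scratch", "mountPath": "/scratch"},
                        {"name": "models", "mountPath": "/models"},
                    ],
                }
            ],
            "volumes": [
                # emptyDir is deleted together with the pod.
                {"name": "scratch", "emptyDir": {}},
                # Data behind the claim outlives this pod.
                {"name": "models", "persistentVolumeClaim": {"claimName": claim_name}},
            ],
        },
    }


if __name__ == "__main__":
    pvc = pvc_manifest("model-store", "1Gi")
    pod = training_pod_manifest("train-job", "example.com/trainer:latest", "model-store")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [pvc, pod]}, indent=2))
```

Anything written to `/scratch` disappears with the pod, while a model saved under `/models` would still be there for a later inference pod that mounts the same claim.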
![Untitled](https://prod-files-secure.s3.us-west-2.amazonaws.com/d2df9e4d-9311-4c0c-9701-1e0536a3aba8/d49fc8f2-e8f8-4d09-aafc-dc57f87b24ea/Untitled.png)
pg 17
- CSI (Container Storage Interface) and other extension points: custom resources, operators, scheduler extensions, CNI (Container Network Interface), device plugins
- **Exercise idea:** Containerise the training and inference part of a simple machine learning use case and orchestrate the process using K8s.
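A possible starting point for the exercise above: a deliberately tiny "model" (one-variable least squares, standard library only) split into a training step that writes an artifact and an inference step that reads it. This is just a sketch of the two pieces one might later put into separate containers; the function names, the JSON model format, and the sample data are my own assumptions, not something from the ebook.

```python
import json
import os
import tempfile
from pathlib import Path


def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_model(model, path):
    """Training container: persist the fitted parameters as JSON."""
    Path(path).write_text(json.dumps(model))


def load_model(path):
    """Inference container: read the parameters written by training."""
    return json.loads(Path(path).read_text())


def predict(model, x):
    return model["slope"] * x + model["intercept"]


if __name__ == "__main__":
    # A temp directory stands in for a volume shared by both containers.
    with tempfile.TemporaryDirectory() as shared_dir:
        model_path = os.path.join(shared_dir, "model.json")
        save_model(train([1, 2, 3, 4], [3, 5, 7, 9]), model_path)
        print(predict(load_model(model_path), 10))
```

In a K8s version of this, the training step could run as a one-off pod that writes `model.json` to shared storage, and the inference step as a separate pod that loads it on startup.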
**Extras:**
1. Read: What if the load balancer goes down? [Saurabh Dashora on X](https://twitter.com/ProgressiveCod2/status/1735561521869283339)
    1. Remove the single point of failure using a floating IP and active-passive switchover.
2. Completed AWS LI assessment: [LinkedIn](https://www.linkedin.com/skill-assessments/Amazon%20Web%20Services%20(AWS)/quiz-intro/)
**Retro**
- Progress >>>> Feelings: You don’t have to “feel like” doing the thing. If you know deep down that it is good for you in the short or long run, “just embrace the pain” and do it anyway, or else you will have to endure the pain of regret. Choose your pain wisely.