Day 10/21

- **MLOps with K8s - TWIML (Page 16/31)**
- Steps to consider: data acquisition, preprocessing, experiment management, model development, deployment, and monitoring (reporting).
- ML at scale, focus on eliminating the *incidental* complexity
- *incidental* complexity of machine learning ⇒ getting access to data, setting up servers, and scaling model training and inference.
- As opposed to its *intrinsic* complexity ⇒ identifying the right model, selecting the right features, and tuning the model’s performance to meet business goals.

- Key requirements
- **Multi-tenancy:** Dedicating a group of hardware to a specific team is inefficient; instead, create a shared environment for concurrent projects.
- **Elasticity:** The hardware should expand/shrink based on workload requirements.
- **Immediacy:** Data scientists should have self-service access to the resources.
- **Programmability:** APIs to enable automated provisioning and maximise utilisation.
- The cloud does meet the above requirements; however, latency and economics can be optimised significantly on-prem. To learn more about a hybrid approach, watch “How Dukaan moved from cloud to on-prem” by Asli Engineering: [link](https://www.youtube.com/watch?v=vFxQyZX84Ro)
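
To make the elasticity requirement concrete, here is a minimal sketch of a proportional scaling rule. It follows the same shape as the formula documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × current metric / target metric)). The function name, parameters, and default bounds are my own assumptions for illustration, not something from the ebook.

```python
import math


def desired_replicas(current_replicas: int, current_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Proportional scaling rule (hypothetical helper).

    Grows or shrinks the worker count so that per-worker load moves toward
    the target, then clamps the result to [min_replicas, max_replicas].
    """
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    raw = math.ceil(current_replicas * current_load / target_load)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Load above target: expand. Load below target: shrink.
    print(desired_replicas(4, current_load=90.0, target_load=60.0))
    print(desired_replicas(4, current_load=15.0, target_load=60.0))
```

The clamp matters on shared (multi-tenant) hardware: without an upper bound, one busy project could starve the others.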

- Container and K8s
- K8s hierarchy - a declarative system
- Cluster: master → multiple workers (nodes), each running a kubelet (agent)
- Kubeflow is one of the options that utilises K8s to deliver MLOps capabilities.
- Other solutions: [TWIML Solutions Guide](https://twimlai.com/solutions/)
- Containers are, in general, ephemeral in nature
- Volumes (available as long as the pod exists), Persistent Volumes (lifecycle managed by the cluster)
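
A small sketch of what "declarative" plus the two storage lifetimes could look like in practice: you describe the desired objects and the cluster works toward that state. The manifests are built as plain Python dicts and printed as JSON (K8s accepts JSON manifests as well as YAML). All names here (`train-job`, `scratch`, `model-store`, the image tag, `to_manifest`) are hypothetical placeholders, and the storage size is an arbitrary assumption.

```python
import json

# Claim on cluster-managed storage: outlives any single pod.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "model-store"},  # hypothetical name
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},  # assumed size
    },
}

# Pod with one pod-scoped volume (emptyDir) and one persistent volume (via the claim).
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "train-job"},  # hypothetical name
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "trainer",
            "image": "example.com/ml/trainer:0.1",  # placeholder image
            "volumeMounts": [
                {"name": "scratch", "mountPath": "/tmp/scratch"},
                {"name": "model-store", "mountPath": "/models"},
            ],
        }],
        "volumes": [
            # emptyDir is deleted together with the pod.
            {"name": "scratch", "emptyDir": {}},
            # Data written here survives the pod.
            {"name": "model-store",
             "persistentVolumeClaim": {"claimName": "model-store"}},
        ],
    },
}


def to_manifest(*objects):
    """Wrap the objects in a v1 List and serialise them as JSON."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": list(objects)}, indent=2)


if __name__ == "__main__":
    print(to_manifest(pvc, pod))
```

Scratch data goes to the `emptyDir` mount; anything that must survive the pod (such as a trained model) goes to the claim-backed mount.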

![Untitled](https://prod-files-secure.s3.us-west-2.amazonaws.com/d2df9e4d-9311-4c0c-9701-1e0536a3aba8/d49fc8f2-e8f8-4d09-aafc-dc57f87b24ea/Untitled.png)

pg 17

- CSI and others: custom resources, operators, scheduler extensions, CNI (Container Network Interface), device plugins
- **Exercise idea:** Containerise the training and inference part of a simple machine learning use case and orchestrate the process using K8s.
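
A possible starting point for the exercise idea above: one script with a `train` mode and a `predict` mode, so the same container image could run as a training job and as an inference step. This is only a sketch under my own assumptions: the toy data, the JSON model file, and every function name are made up for illustration, and a real use case would swap in an actual dataset and model.

```python
import json
import sys
from pathlib import Path


def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_model(model, path):
    """Persist the model as JSON (in a pod, this path would sit on a persistent volume)."""
    Path(path).write_text(json.dumps(model))


def load_model(path):
    return json.loads(Path(path).read_text())


def predict(model, x):
    return model["slope"] * x + model["intercept"]


def main(argv):
    usage = "usage: app.py train MODEL_PATH | app.py predict MODEL_PATH X"
    if len(argv) < 3:
        print(usage)
        return
    mode, model_path = argv[1], argv[2]
    if mode == "train":
        xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # toy data, y = 2x + 1
        save_model(train(xs, ys), model_path)
        print(f"model written to {model_path}")
    elif mode == "predict" and len(argv) > 3:
        print(predict(load_model(model_path), float(argv[3])))
    else:
        print(usage)


if __name__ == "__main__":
    main(sys.argv)
```

Training and inference only share the model file, so each can run in its own container as long as both mount the same storage.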


**Extras:**

1. Read: What if the load balancer goes down? [Saurabh Dashora on X](https://twitter.com/ProgressiveCod2/status/1735561521869283339)
   - Remove the single point of failure using a floating IP and active-passive switchover.
2. Completed the AWS LinkedIn skill assessment: [LinkedIn](https://www.linkedin.com/skill-assessments/Amazon%20Web%20Services%20(AWS)/quiz-intro/)
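
To picture the floating IP and active-passive switchover from the load balancer note, here is a toy in-memory simulation. It does not touch any real network; the class names, the `health_check` method, and the documentation-range address are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancer:
    name: str
    healthy: bool = True


class FloatingIP:
    """A virtual IP that points at exactly one load balancer at a time."""

    def __init__(self, address, active, passive):
        self.address = address
        self.active = active
        self.passive = passive

    def health_check(self):
        """Swap roles if the active node is down and the standby is healthy."""
        if not self.active.healthy and self.passive.healthy:
            self.active, self.passive = self.passive, self.active
            return True  # a switchover happened
        return False

    def route(self, request):
        if not self.active.healthy:
            raise RuntimeError("no healthy load balancer behind " + self.address)
        return f"{request} -> {self.active.name} via {self.address}"


if __name__ == "__main__":
    lb_a, lb_b = LoadBalancer("lb-a"), LoadBalancer("lb-b")
    vip = FloatingIP("203.0.113.10", active=lb_a, passive=lb_b)
    print(vip.route("GET /"))
    lb_a.healthy = False   # simulate the active load balancer going down
    vip.health_check()     # the standby takes over the same address
    print(vip.route("GET /"))
```

Clients keep talking to the same address throughout; only the node behind it changes.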

**Retro**

- Progress >>>> Feelings: You don’t have to “feel like” doing the thing. If you know deep down that it is good for you in the long or short run, “just embrace the pain” and do it anyway, or else you will have to endure the pain of regret. Choose your pain wisely.