I’m currently employed as a Data Scientist for a small team within a multinational corporation. Despite this, our deployment process for Data Science models and applications remains sluggish and rudimentary.
The previous Data Scientist set up a basic Streamlit instance on a server, which we’re still using. However, as we scale up significantly, I’m finding it increasingly difficult to maintain the code and deploy apps.
To illustrate, I’m manually transferring my code to the “production” environment (a single running instance of Streamlit) as we don’t even utilize Git.
The higher-ups are seeking viable solutions for scaling up, but whenever I suggest leveraging cloud services (although I’m not well-versed in this area, I recognize it as a potential solution), they seem to sidestep my recommendations.
I’m really in need of guidance as I feel somewhat out of my depth here. Any advice would be greatly appreciated.
Ugh, data science deployment struggles! I feel you. I’m a data scientist at a startup, and while we’re growing fast, our deployment process is still kind of clunky.
Cloud services can be amazing for data science deployment, but there is a learning curve. Here’s what I’d suggest:
Gather data: Show your managers some concrete examples of how cloud platforms like AWS SageMaker or Google Cloud AI Platform can streamline deployment and handle scaling. Focus on the benefits like faster iteration times, easier collaboration, and potentially reduced costs in the long run.
Start small: Maybe propose a pilot project where you deploy a simple model on a cloud platform. This can give you a chance to learn the ropes and showcase the benefits to your team.
Focus on the ROI (Return on Investment): Frame your suggestions around how cloud deployment can save time, improve efficiency, and ultimately help the company achieve its goals.
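To make the ROI argument concrete for managers, a back-of-the-envelope calculation can help. The sketch below is only an illustration: the function name and every number in it are hypothetical placeholders, so substitute your own estimates of hours saved, hourly cost, one-off setup cost, and extra monthly cloud spend.

```python
def simple_roi(hours_saved_per_month: float, hourly_cost: float,
               extra_monthly_spend: float, setup_cost: float = 0.0,
               months: int = 12) -> float:
    """Return ROI as a fraction: (benefit - cost) / cost over `months`.

    All inputs are your own estimates; nothing here is tied to a real
    cloud provider's pricing.
    """
    benefit = hours_saved_per_month * hourly_cost * months
    cost = setup_cost + extra_monthly_spend * months
    if cost <= 0:
        raise ValueError("total cost must be positive to compute ROI")
    return (benefit - cost) / cost

# Hypothetical figures: 20 h/month saved at 60/h, 500/month extra spend,
# 3000 one-off migration effort, evaluated over one year.
print(f"{simple_roi(20, 60, 500, setup_cost=3000):.0%}")  # prints "60%"
```

Including the one-off setup cost matters: it is usually what makes the first year look worse than later years, and managers will ask about it.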
Make a cross-team plan that includes security, cloud, and DS/ML. I have experience with serverless deployment plans, for instance. In those, the MLOps pipeline uses SageMaker, whereas the CI/CD pipeline uses Terraform (for security) and Azure DevOps.
You’re not stuck! Focus on the pain points - no version control, manual work. Suggest cloud services like AWS SageMaker to automate and scale things.
Start small with a test project to show the benefits. Offer to learn about the cloud with your boss - it shows initiative! Cloud services can be cost-effective too. Find someone who knows the cloud to be your partner in proposing this switch. Present a simple plan focused on the problems and the cloud’s solutions.
To be fair, it sounds like you’re a little in over your head, and you’ve grown enough that you’d need a DevOps person / full-stack developer to help you set up a proper deployment pipeline.
What you describe that you’re doing now is definitely not sustainable in the long run, and you’ll end up with all kinds of issues.
You can also try to go it alone. If so, I would start by setting up Git, GitHub/GitLab, and automated deployment.
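Even a small deployment script beats copying files by hand. Below is a minimal sketch, assuming the Streamlit app is served from a `current` directory on a POSIX (Linux) server: each deploy copies the code into its own `releases/<version>` folder and then atomically repoints a `current` symlink, so rolling back is just repointing the link. The directory layout and the `deploy` function are hypothetical; in practice a CI job would typically call something like this after a push to the main branch, and restarting the Streamlit process is not shown.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Optional


def deploy(src: Path, root: Path, version: Optional[str] = None) -> Path:
    """Copy `src` into root/releases/<version> and point root/current at it."""
    version = version or time.strftime("%Y%m%d-%H%M%S")
    release = root / "releases" / version
    # copytree creates missing parent dirs and fails if this version exists,
    # so an already-deployed release is never silently overwritten.
    shutil.copytree(src, release)

    # Build the new symlink under a temporary name first...
    tmp_link = root / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(release.resolve(), target_is_directory=True)

    # ...then rename it over `current`; on POSIX this swap is atomic,
    # so the app never sees a half-copied directory.
    os.replace(tmp_link, root / "current")
    return release
```

Usage (hypothetical paths): `deploy(Path("./my_app"), Path("/srv/streamlit"), "v1")`, with Streamlit started from `/srv/streamlit/current`. Keeping old releases on disk is what makes rollbacks cheap.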
You’ll also probably want to set up a dev and a staging environment pretty quickly.
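One common way to keep dev, staging, and production apart is to ship the same code everywhere and pick the settings with an environment variable. A minimal sketch, assuming a hypothetical `APP_ENV` variable and placeholder paths that you would replace with your own:

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    name: str
    data_path: str
    debug: bool


# Hypothetical per-environment settings; staging deliberately mirrors prod.
SETTINGS = {
    "dev": Settings("dev", "data/sample.csv", debug=True),
    "staging": Settings("staging", "/srv/staging/data.csv", debug=False),
    "prod": Settings("prod", "/srv/prod/data.csv", debug=False),
}


def load_settings(env: Optional[str] = None) -> Settings:
    """Return settings for `env`, falling back to $APP_ENV, then 'dev'."""
    env = env or os.environ.get("APP_ENV", "dev")
    try:
        return SETTINGS[env]
    except KeyError:
        raise ValueError(
            f"unknown environment {env!r}; expected one of {sorted(SETTINGS)}"
        ) from None
```

The app then calls `load_settings()` once at startup, and each server only differs in the value of `APP_ENV`, so code tested on staging is exactly what runs in production.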