Nick Elprin is CEO of Domino Data Lab, provider of the enterprise data science management platform trusted by over 20% of the Fortune 100.
Two tectonic forces in enterprise infrastructure are on a high-speed collision course: the rise of data science and machine learning, and increasingly complex security threats. The convergence of these trends has created a critical moment for CIOs. Those who seize the opportunity will set their organizations on a course for growth and innovation, cementing the CIO’s role as a critical and strategic leader. Those who don’t will watch as innovation stagnates while new security and operational risks engulf the organization.
To understand the interplay of these two trends, we’ll start by looking at how data science and machine learning have taken root in most organizations and how this work differs from software engineering and previous generations of analytics work.
Data science as a discrete discipline started out much like most new technology trends: in silos. Functional departments hired data scientists to improve their specific processes (e.g., sales teams to automate forecasting or product teams to automate customer insights). As these data science islands appeared, each stood up its own “shadow IT” composed of the preferred hardware and software for its data scientists.
Compared with software engineering, business intelligence or descriptive analytics work, data science requires much more powerful and flexible hardware and software. Machine learning algorithms require massive compute resources (hundreds of cores, GPUs), and data scientists use open-source tools (Python, R) that have hundreds of packages that are updated weekly. Without centralized and mature platforms, data scientists run their tools of choice on their desktops or laptops, department-specific cloud resources or shared servers. As a result, most organizations now have a “wild west” of data science systems and tools throughout the enterprise.
The Impact Of Shadow IT
As machine learning moves out of the “innovation lab” and predictive models drive more critical production processes, this “wild west” of data science systems creates significant operational risk. In one case I learned of, an airline didn’t know how to update its pricing models after the Covid-19 outbreak because the data scientists who developed those models were no longer with the company. The materials and process needed to update the models were spread across dozens of systems, and the original code wouldn’t run because it depended on older, unknown versions of specific Python packages.
This operational risk would be bad enough on its own, but it’s joined by the rise of sophisticated security threats that exploit technologies data scientists often use. Recently, a security researcher breached 35 tech companies, including some of the largest in the world, by publishing malicious packages to PyPI, a repository for Python packages that many data scientists depend on daily.
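The “unknown versions” problem in the airline example is the kind of gap that a recorded environment snapshot can help close. The following is a minimal sketch, not a description of any particular platform: it uses Python’s standard `importlib.metadata` module to list installed packages, and the function names (`snapshot_environment`, `to_requirements`, `changed_packages`) are hypothetical names chosen for illustration.

```python
from importlib import metadata


def snapshot_environment():
    """Return {package name: installed version} for the running interpreter."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with unreadable metadata
            versions[name] = dist.version
    return versions


def to_requirements(versions):
    """Format a snapshot as sorted 'name==version' lines (requirements.txt style)."""
    return [f"{name}=={versions[name]}" for name in sorted(versions, key=str.lower)]


def changed_packages(recorded, current):
    """List packages whose recorded version is missing or different in `current`."""
    return sorted(
        name for name, version in recorded.items() if current.get(name) != version
    )


if __name__ == "__main__":
    # Print the pinned versions so they can be stored next to the model code.
    for line in to_requirements(snapshot_environment()):
        print(line)
```

Saving this output alongside a model when it is trained, and comparing it with the environment at retraining time via `changed_packages`, would at least tell a new team which package versions the original code expected.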
In a similar vein, it has become a best practice among data science teams to leverage containers and Kubernetes to productionize predictive models. Yet the security surface area of those technologies is broad and rapidly evolving, while most IT organizations are early in developing the expertise to secure and manage them. Container and Kubernetes security has become its own cottage industry, with specialized tools and firms. To use a metaphor, data scientists are packaging their work in a new type of explosive that only a small number of expert chemists know how to handle safely.
So data science is becoming more critical and integrated with production systems at most companies. At the same time, data scientists’ work is built upon decentralized, diverse and ungoverned software and infrastructure — much of which leverages technologies that are rife with novel security threats. This is a recipe for catastrophe in any enterprise.
From Chaos To Order
CIOs need to understand that shifting from being data-driven to model-driven requires a complete rewiring of their businesses, and that it is a shift they must make. It’s very similar to what happened at many Fortune 500 companies that were late to the game implementing SaaS or public cloud; business units went rogue using their own cloud-based software, creating a tech sprawl that left their organizations less competitive. Let’s not repeat the mistakes of the past. The longer CIOs wait, the greater the chance that more nimble companies are going to pass them by.
While this transformation is sure to ruffle some feathers, the strategic CIO will position it as a “win-win-win” that pleases all stakeholders. IT wins by assuming governance of the entire data science function, which mitigates operational and security risks. Data scientists win because they gain the opportunity to collaborate as silo-based model development becomes a thing of the past. Business leaders win because more productive data scientists drive more business impact from data science investment.
The CIO is the most natural, if not the most business-critical, leader in an enterprise to bring these benefits together. As the person accountable for auditability, governance and compliance, the CIO must have visibility into all facets of business operations, including product creation and pricing automation. CIOs cannot relinquish control of IT to the business units to piece together their own solutions.
Finally, it’s worth noting that Covid-19 has increased the urgency of this opportunity. The pandemic has caused such a punctuated change in people’s behavior that real-world data has dramatically diverged from patterns in historical datasets, invalidating the assumptions in many existing predictive models. This phenomenon is called “drift,” and it has created an urgent imperative to retrain models and redo historical data science analyses. That means a lot of data science work must be done quickly — and that is hard or impossible to do if past work is in the morass of the “wild west” of data science spread across diffuse and ungoverned systems and tools.
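To make “drift” concrete, here is a minimal sketch of one common way to quantify it, the population stability index (PSI), which compares how a variable is distributed in historical data versus recent data. This is an illustration under stated assumptions rather than a recommended monitoring setup: the function name, the equal-width binning, the synthetic data and the 0.1 and 0.25 cutoffs (widely used rules of thumb, not fixed standards) are all choices made for this example.

```python
import math
import random


def population_stability_index(reference, current, bins=10):
    """PSI between two numeric samples, using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Synthetic stand-ins for a model input before and after a behavior shift.
rng = random.Random(0)
historical = [rng.gauss(100, 15) for _ in range(5000)]
similar = [rng.gauss(100, 15) for _ in range(5000)]
shifted = [rng.gauss(60, 25) for _ in range(5000)]

psi_similar = population_stability_index(historical, similar)
psi_shifted = population_stability_index(historical, shifted)
print(f"PSI, similar period: {psi_similar:.4f}")
print(f"PSI, shifted period: {psi_shifted:.4f}")
```

Under the rule of thumb assumed here, a PSI below 0.1 suggests the data still looks like the history a model was trained on, while a value above 0.25 suggests the model’s assumptions deserve a fresh look.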
However, risk and opportunity often go hand-in-hand. CIOs should take action now to elevate data science and machine learning as a core organizational capability. Doing so can help unleash business value by making data scientists more productive, promote well-tuned, secure business operations and cement IT’s role as a strategic driver of digital transformation.