
Bridging the Gap between Data Scientists and IT

Nov 12, 2018

The Golden Gate Bridge was built to connect San Francisco and Marin County with a suspension span over a mile long.  It was constructed during the Great Depression, taking more than four years and over 1.2 million rivets to complete.

When I talk to data scientists and IT teams in the same room, I often feel that the distance between the two teams is much greater than the mile the Golden Gate Bridge covers.  They have very different expertise and vocabulary, making effective communication difficult.  Yet, as I mentioned in a previous blog, the speed of execution in delivering machine learning projects has a huge financial impact.  Data scientists work mostly with data pipelines, while IT teams focus on the infrastructure, so naturally there are communication gaps between the two.  Recently, when I showed data scientists and IT teams that Cisco Validated Designs can provide a scalable architecture supporting data pipelines, the dynamics in the room changed from indifference to tight collaboration.

What is a Data Pipeline?

First, let's talk about data pipelines.  As an example, the diagram below shows a data pipeline for a single data source. The data scientist has to collect, clean, and correlate the data before sending it for machine learning training.  Eventually, a good model emerges, and new data can be fed to that model to produce inference results.

Data Pipeline for Single Data Source
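
To make those stages concrete, below is a minimal Python sketch of a single-source pipeline built with pandas and scikit-learn.  The source file, column names, and choice of model are illustrative assumptions, not items taken from the diagram.

    # Minimal sketch of a single-source data pipeline (illustrative only).
    # File name, column names, and the model choice are assumptions.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Collect: pull raw records from the data source.
    raw = pd.read_csv("sensor_readings.csv")      # hypothetical source file

    # Clean: drop incomplete rows and extreme outliers.
    clean = raw.dropna()
    low, high = clean["reading"].quantile([0.01, 0.99])
    clean = clean[clean["reading"].between(low, high)].copy()

    # Correlate: derive features that relate fields to each other.
    clean["reading_per_device"] = clean.groupby("device_id")["reading"].transform("mean")

    # Train: fit a model on the prepared features.
    X_train, X_test, y_train, y_test = train_test_split(
        clean[["reading", "reading_per_device"]], clean["failure"], test_size=0.2)
    model = RandomForestClassifier().fit(X_train, y_train)

    # Infer: score held-out data with the trained model.
    print("holdout accuracy:", model.score(X_test, y_test))

In production each stage would typically run as its own scheduled job, but the flow of collect, clean, correlate, train, and infer stays the same.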

But increasingly, customers are working with much more sophisticated data pipelines.  As shown in the diagram below, data scientists often work with multiple data sources, each with its own pipeline for collection, cleansing, and correlation.  The inferred data from each of these sources is then fed to a second stage of learning that gathers the information from all of the sources.  This type of data pipeline with multiple data sources can be found in many verticals, from security to targeted marketing campaigns.  On any given day, a data scientist can be focused on issues in any part of the pipeline.  He or she is probably also wondering whether additional data sources, such as structured data, live news feeds, geolocation data, and others, should be added to the mix to gain deeper insight into the business problem at hand.  This focus on the data and its operations leaves data scientists little time to think about the infrastructure needed to run the pipelines.

Data Pipeline for Multiple Data Sources
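
The second stage can be as simple as another model trained on the first-stage inferences.  The short Python sketch below stacks the outputs of three per-source models into one second-stage learner; the synthetic data and the choice of logistic regression are assumptions made purely for illustration.

    # Minimal sketch of a two-stage pipeline over multiple data sources (illustrative only).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # One shared set of entities and labels, observed through three different sources.
    X_all, y = make_classification(n_samples=500, n_features=15, random_state=0)
    sources = np.array_split(X_all, 3, axis=1)   # each "source" sees 5 of the 15 features

    def source_pipeline(X_source, y):
        """Stand-in for one source's collect/clean/correlate/train stages."""
        model = LogisticRegression(max_iter=1000).fit(X_source, y)
        # The per-source "inferred data" is the model's predicted probability.
        return model.predict_proba(X_source)[:, 1]

    # Stage 1: each data source runs its own pipeline and produces inferences.
    stage_one = np.column_stack([source_pipeline(X_s, y) for X_s in sources])

    # Stage 2: a second model learns from the combined per-source inferences.
    second_stage = LogisticRegression().fit(stage_one, y)
    print("second-stage training accuracy:", second_stage.score(stage_one, y))

A real deployment would train the second stage on held-out first-stage predictions to avoid leakage, but the shape of the pipeline is the same.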

What Tools are Available to Implement Data Pipelines?

There are many tools available to implement data pipelines, and each data scientist will have preferences depending on familiarity, data type, and software package.  A small sample of tools is listed in the diagram below.  At the end of the day, these sophisticated data pipelines are still software running on servers: it's not rocket science.

Data Pipeline Software Tools
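
As one example of how such tools are used (Apache Airflow is a common choice for orchestration, though it may or may not be among the tools shown above), the stages of a pipeline can be written as ordinary Python tasks and scheduled as a DAG.  The sketch below uses Airflow 1.x-style imports, and the task bodies are placeholders.

    # Illustrative Airflow DAG wiring the pipeline stages together (Airflow 1.x style).
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    def collect():   print("pull raw data from the source")
    def clean():     print("drop bad records and normalize fields")
    def correlate(): print("join sources and derive features")
    def train():     print("hand prepared data to the training cluster")

    with DAG("example_data_pipeline",
             start_date=datetime(2018, 11, 1),
             schedule_interval="@daily") as dag:
        steps = [PythonOperator(task_id=name, python_callable=fn)
                 for name, fn in [("collect", collect), ("clean", clean),
                                  ("correlate", correlate), ("train", train)]]
        # Run the stages strictly in order: collect -> clean -> correlate -> train.
        for upstream, downstream in zip(steps, steps[1:]):
            upstream >> downstream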

Infrastructure Solutions for Data Pipeline

As Cisco works with customers across different verticals, some patterns have started to emerge. While not all customers will have the same architecture, we see that many end up with a combination of:

  • Data Ingestion Infrastructure: For storing, staging, and streaming the information before it moves to the next part of the data pipeline.  Often this is a Hadoop layer where the data sits close to the compute, enabling highly parallel processing (a minimal sketch of this staging step follows the diagram below).
  • Compute-Intensive Infrastructure:  For compute-intensive workloads like machine learning and deep learning training, a dedicated cluster can be used to accelerate processing.
  • Storage Infrastructure:  At some point, the raw and processed data needs to be stored.  A dedicated storage infrastructure makes it easier to provide scalable capacity with proper backup and ease of management.

Infrastructure Solutions for the Data Pipeline
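
To show how these layers hand work to one another, here is a small PySpark sketch of an ingestion-layer job: it reads raw events from HDFS, cleanses and correlates them, and stages a curated dataset for the compute-intensive and storage tiers.  The paths and column names are hypothetical, not taken from a specific CVD.

    # Illustrative PySpark job for the data ingestion layer.
    # HDFS paths and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ingest-and-stage").getOrCreate()

    # Read raw events that landed on the Hadoop layer, close to the compute.
    raw = spark.read.json("hdfs:///data/raw/events/")

    # Basic cleansing and correlation before staging.
    curated = (raw
               .dropna(subset=["user_id", "event_type"])
               .withColumn("event_date", F.to_date("timestamp"))
               .groupBy("user_id", "event_date")
               .agg(F.count("*").alias("event_count")))

    # Stage the curated dataset for the compute-intensive (training) cluster
    # and the dedicated storage tier.
    curated.write.mode("overwrite").parquet("hdfs:///data/curated/daily_activity/")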

While not every customer will have separate clusters for each of the functions cited above, we find that most do implement the functions themselves, such as data ingestion and storage, in some form.  For example, the Cisco Validated Design with Cloudera Data Science Workbench incorporates data ingestion and storage as part of a Hadoop cluster that includes Apache Spark and the Hadoop Distributed File System (HDFS).  In addition, Cloudera Data Science Workbench creates a compute-intensive cluster capable of using servers with GPUs running deep learning frameworks such as TensorFlow.  Check out the CVDs in the Cisco Data Center Design Zone for more details on each of the clusters mentioned above.
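
On the compute-intensive side, the workload is ultimately a deep learning framework training on GPU servers.  The sketch below assumes TensorFlow 2.x and uses synthetic data as a stand-in for whatever the ingestion layer has staged; it simply confirms the GPUs are visible and runs a short Keras training loop.

    # Minimal TensorFlow/Keras sketch for the compute-intensive cluster (assumes TF 2.x).
    # The model and synthetic data are placeholders, not the CVD's actual workload.
    import numpy as np
    import tensorflow as tf

    # Confirm the framework can see the server's GPUs (e.g. Tesla V100s).
    print("GPUs visible:", tf.config.list_physical_devices("GPU"))

    # Placeholder training data standing in for the curated features.
    X = np.random.rand(10000, 32).astype("float32")
    y = (X.sum(axis=1) > 16).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Keras places the computation on available GPUs automatically.
    model.fit(X, y, epochs=3, batch_size=256, verbose=2)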

Building a Bridge between Data Science and IT through the Artificial Intelligence and Machine Learning Partner Ecosystem

Cisco continues to work with machine learning ecosystem partners to help bridge the gap between data scientists and IT.  In fact, Cisco is delighted to see Google adding Kubeflow Pipelines to the Kubeflow open source project.  This latest contribution expands Kubeflow's capability to compose data pipelines from reusable components, accelerating the work of data scientists.  Cisco is also contributing code to the Kubeflow project, helping ensure a consistent hybrid cloud architecture for machine learning.
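
For a sense of what composing a pipeline from reusable components looks like, here is a short sketch using the Kubeflow Pipelines SDK (the v1-era kfp interface); the container images and arguments are placeholders, not components that ship with Kubeflow.

    # Minimal Kubeflow Pipelines sketch (kfp v1 SDK; images and arguments are placeholders).
    import kfp
    from kfp import dsl

    @dsl.pipeline(name="example-ml-pipeline",
                  description="Reusable collect -> clean -> train steps")
    def example_pipeline(data_path: str = "/mnt/data/raw"):
        collect = dsl.ContainerOp(name="collect", image="example/collect:latest",
                                  arguments=["--out", data_path])
        clean = dsl.ContainerOp(name="clean", image="example/clean:latest",
                                arguments=["--in", data_path])
        train = dsl.ContainerOp(name="train", image="example/train:latest",
                                arguments=["--in", data_path])
        clean.after(collect)   # reusable steps wired into an ordered pipeline
        train.after(clean)

    # Compile to a package that can be uploaded to a Kubeflow Pipelines deployment.
    kfp.compiler.Compiler().compile(example_pipeline, "example_pipeline.yaml")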

Cisco is also participating in NVIDIA's new NGC-Ready program, ensuring that Cisco servers, such as the recently announced UCS C480 ML, can take advantage of the NGC container registry and its large repository of pre-built and optimized containers with the latest machine learning and deep learning frameworks.

"Powerful software benefits from powerful systems," said Kari Briski, Sr. Director, Accelerated Computing Software and AI Product, NVIDIA. "With NGC-Ready, users of the Cisco UCS C480 ML, with its 8 NVIDIA Tesla V100 GPUs interconnected with NVLink, can leverage NGC containers to create an ideal solution for large-scale deep learning workloads."

In working with the artificial intelligence and machine learning ecosystem, Cisco is bridging the gap between data scientists and IT.  At the end of the day, Cisco's goal is to accelerate your machine learning deployment and refine the mining of your data.

Where are you in your journey to adopt artificial intelligence and machine learning?  Check out some of the Cisco Validated Designs to help you accelerate.  Reach out to your Cisco account team, and we can have a deeper conversation.  If you are attending SC18 in Dallas, Texas, stop by the Cisco booth, #2803.
