Use other services for machine learning

  • 4/11/2018

Skill 4.2: Streamline development by using existing resources

As in most jobs, a data science project involves many things that can be reused: tool installation, data preprocessing, model architectures, the way results are stored, and more. Reusing these pieces saves you time while developing data science solutions.

In this Skill you see how Cortana Intelligence Gallery makes it easier for you to access and publish pre-built machine learning solutions. The gallery offers access to Azure Machine Learning experiments, Jupyter Notebooks, projects (usually links to GitHub projects), complete solutions (solutions may contain other Azure elements beyond Azure Machine Learning experiments), tutorials, and custom Azure Machine Learning modules. In addition, all of these items can be grouped into collections, making it easy to access related resources.

You can upload your work to the gallery, which makes deploying it in another subscription trivial. You can also make your work public and share it with other users. Just as you can publish solutions that have worked well for you, you can benefit from the solutions that other users have uploaded. Cortana Intelligence Gallery is, therefore, a powerful community-driven tool.

In addition, in this Skill you see in detail the main advantages of using Azure virtual machines such as the Data Science Virtual Machine or the Deep Learning Virtual Machine (already used in Skill 4.1).

Clone template experiments from Cortana Intelligence Gallery

Throughout the book, references have been made to example experiments that you can use in your learning process. In this section you see how to clone an experiment into your current subscription. The example is a digit recognition experiment that uses convolutional networks defined with Net#.

In order to do so, go to the Cortana Intelligence Gallery page (https://gallery.cortanaintelligence.com). The home page shows you the latest news and the most popular resources (see Figure 4-14).

FIGURE 4-14 Cortana Intelligence Gallery home page

Use the search bar and enter Neural Networks, and then press Enter, or click the magnifying glass icon. A list of experiments and some other resources will appear. Note that on the left pane you have a series of filters that allow you to filter results by categories, tags, algorithms used, or programming language used, to name a few examples. Click the Neural Network: Convolution And Pooling Deep Net experiment (see Figure 4-15).

FIGURE 4-15 Neural networks search results

The new page that appears (see Figure 4-16) shows you detailed information about the experiment. The explanation usually includes screenshots that help you better understand the experiment and show you how to interpret the results correctly.

Related experiments are listed on the right, so it is easy to find similar experiments if you want to try more models. Also on the right you find the button that allows you to clone this experiment in your workspace. Click that button.

FIGURE 4-16 Description of a neural network experiment that uses a convolution network with maxpooling

Clicking this button takes you to Azure Machine Learning Studio, where you can select the subscription in which you want to clone the experiment. When you have selected the desired subscription, click the check mark button (see Figure 4-17).

FIGURE 4-17 Copy experiment from Gallery window that appears in Azure Machine Learning Studio when you open a gallery experiment

After a short loading time, a new experiment opens and you can start working with it.

Another option for cloning the experiment is to do it directly from Azure Machine Learning Studio. Click ‘+ New’ in the lower left corner and search again for ‘neural networks’. Figure 4-18 shows the search result. When you place the cursor over an experiment, a green button labeled Open In Studio appears. Click it to create a new experiment directly from Azure Machine Learning Studio.

FIGURE 4-18 Neural networks experiments samples from Azure Machine Learning

Using gallery experiments saves you time in different ways:

  • Experiments come prepared for execution, so you do not have to spend time arranging modules on the Azure Machine Learning canvas and connecting them to each other.

  • You can quickly explore different architectures without having to develop them yourself. Cloning an experiment is a good way to start; if it does not work as expected, you can always modify the base experiment.

  • Copied experiments also come with their datasets and pre-trained models, so in some cases it is not necessary to re-run training.

In addition to saving you time, the gallery is a useful resource for learning from examples developed by other experts in the field.

Do not hesitate to contribute by publishing an experiment to the gallery using the Publish To Gallery button at the bottom of the editing view of an Azure Machine Learning experiment (see Figure 4-19). Before publishing, you must provide a name and a description of the experiment. Be sure to write a good description in order to let gallery users know before they clone the experiment how your experiment works and what it can provide.

FIGURE 4-19 Publish To Gallery button

Use Cortana Intelligence Quick Start to deploy resources

Cloning experiments is fine when everything you need to implement your solution is within an Azure Machine Learning experiment. Often, however, you need to deploy other Azure elements that handle certain parts of the process, such as data acquisition and preprocessing, or storing results in a SQL Database with Analysis Services on top of it. Other times no Azure Machine Learning experiment is needed at all, as with the solution you used in Skill 3.4 to build a recommendation system. For these cases, a Cortana Intelligence Solution is a prebuilt machine learning solution that you can easily deploy in your subscription.

Go to the front page of the Cortana Intelligence site (see Figure 4-14) and click the Solutions tab. You see a list of solutions that you can easily deploy in your subscription (see Figure 4-20).

FIGURE 4-20 Cortana Intelligence Solutions

In the ribbon under the header you find a link to the deployed solutions. If you have followed Skill 3.4, you see the recommender solution listed there (see Figure 4-21).

FIGURE 4-21 List of deployed solutions

Return to the solutions page shown in Figure 4-20, and click the See All button that is on the right side. This takes you to the same page where you searched for experiments with Neural Networks in the previous section, but with the difference that this time the results are filtered by solutions (see Figure 4-22).

FIGURE 4-22 Searching for solutions in the Cortana Intelligence Gallery

If you are looking for the recommendation solution you deployed in Skill 3.4, you will not find it, because it belongs to the Tutorial category and not the Solutions category. Regardless of whether it is a tutorial or a solution, the principle is the same: the Azure architecture is deployed in just a few clicks.

Scroll down and search for the Demand Forecasting For Shipping And Distribution solution. Click the solution title and you will see a detailed description, similar to the experiments (see Figure 4-23).

FIGURE 4-23 Demand Forecasting For Shipping And Distribution Solution

Scroll down and, in the right column, you see the list of elements that the solution uses. The Demand Forecasting solution deploys a large number of Azure elements (see Figure 4-24).

FIGURE 4-24 Services used by the Demand Forecasting For Shipping And Distribution solution

Clicking Deploy takes you to a deployment wizard (see Figure 4-25) very similar to the one seen in Skill 3.4: Consume exemplar Cognitive Services APIs. For a complete deployment example, review the recommendations system deployment carried out in that Skill.

FIGURE 4-25 Solutions Deployment Wizard

As with the gallery’s experiments, Cortana Intelligence Solutions save you time. They also let you use architectures designed by experts, which helps you get good predictions, performance, and scalability. They are also good resources for training and education.

Use a data science VM for streamlined development

The Data Science Virtual Machine (DSVM) is a virtual machine image that aims to save you time by providing a predefined environment for doing data science. Azure Virtual Machines minimize hardware maintenance and allow you to scale resources easily. DSVMs are currently available on Windows Server, Ubuntu, and CentOS. A special type of DSVM is the Deep Learning Virtual Machine (DLVM), which, in addition to the data science tools, contains tools specifically designed for deep learning and is configured to use GPUs.

The prerequisite for creating Azure Virtual Machines is a subscription. Keep in mind that the number of cores of the machine you want to create must not exceed the limit set by your subscription. In Skill 4.1, you saw how to create and use a DLVM. You will also create a DSVM in the first section of Skill 4.4. Because the book already gives two examples of creating virtual machines, this section looks instead at different use cases and the preinstalled tools.
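The core limit mentioned above comes down to simple arithmetic, which the following minimal sketch illustrates. The function name and the sample numbers are hypothetical; the actual usage and limit values for your subscription would have to be looked up separately, for example in the Azure portal.

```python
def fits_core_quota(requested_cores, cores_in_use, core_limit):
    """Return True if a new VM with `requested_cores` stays within the quota.

    All three values are plain integers that you supply yourself; this
    helper is illustrative and does not query Azure.
    """
    if min(requested_cores, cores_in_use, core_limit) < 0:
        raise ValueError("core counts must be non-negative")
    return cores_in_use + requested_cores <= core_limit


if __name__ == "__main__":
    # Hypothetical subscription: 4 cores already in use, limit of 10 cores.
    print(fits_core_quota(6, 4, 10))  # a 6-core machine still fits
    print(fits_core_quota(8, 4, 10))  # an 8-core machine would exceed the limit
```

If the check fails, you can choose a smaller machine size, free cores by removing unused machines, or request a higher limit for the subscription.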

You have already seen some of the advantages provided by these machines, but here you find a more detailed list of different scenarios in which a DSVM or a DLVM can be useful:

  • Preconfigured workspace With zero configuration, you get a cloud computer ready to solve data science problems. If you are working in a team, the work is easier because everyone uses the same tool versions, which is often not the case with personal desktop computers.

  • Training If you are giving a course and want to give exercises to the students, the environment of a DSVM is ideal because it leads to predictable results. Machines are so easy to deploy that it is convenient to create one per student.

  • Scalability Usually, while defining the models and performing small tests, a powerful machine is not necessary, but when working with a lot of data, it is normal to increase the size of the machine during preprocessing and model training. Azure virtual machines can be easily scaled, so you do not need to create a new, more capable machine for heavy tasks; you can resize the one you already have.

  • Quick experiments Sometimes you may want to test new tools, make demos, or replicate published experiments. For all of those tasks, a DSVM is useful.

  • Deep learning You have already seen the facilities that these machines provide for deep learning. In addition to all the pre-installed tools, mounting a DSVM image on an N-series machine gives you GPU acceleration, which is needed to train deep models in a reasonable time. Use DLVMs to write applications capable of performing difficult tasks such as understanding images, videos, or text. Deep learning shines in these kinds of tasks, sometimes even surpassing human performance.
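As an illustration of the scalability scenario above, the following minimal sketch builds the Azure CLI command that resizes an existing virtual machine. It assumes the Azure CLI (az) is installed and logged in; the resource group name, machine name, and target size are hypothetical placeholders, and the sizes actually available depend on your region and subscription.

```python
def build_resize_command(resource_group, vm_name, new_size):
    """Return the `az vm resize` argument list for an existing VM.

    The command is only built here, not executed, so the sketch can be
    inspected safely before anything is changed in the subscription.
    """
    return [
        "az", "vm", "resize",
        "--resource-group", resource_group,
        "--name", vm_name,
        "--size", new_size,
    ]


if __name__ == "__main__":
    # Hypothetical names: replace them with your own resource group and VM.
    command = build_resize_command("my-dsvm-rg", "my-dsvm", "Standard_DS3_v2")
    print(" ".join(command))
    # To run it for real, pass the list to subprocess.run(command, check=True).
```

Resizing typically restarts the machine, so save your work before you run the command.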

As mentioned above, there are three different images of the DSVM depending on the operating system used. Below are some of the applications and tools preinstalled in the Windows version:

  • Microsoft ML Server Developer Edition You use this powerful tool in Skill 4.4.

  • Anaconda Python Two distributions are available, one for Python 2.7 and one for Python 3.5. Anaconda is a Python distribution in which most of the packages used for data science are pre-installed, including numpy, pandas, and sklearn, to name a few. Jupyter Notebooks also come with the Anaconda distributions.

  • JuliaPro A complete distribution of the Julia programming language, including scientific and data science libraries.

  • Azure Machine Learning Workbench A desktop application plus command-line tools that allow you to manage machine learning solutions through the entire data science life cycle.

  • Visual Studio Community Edition and other IDEs and code editors.

  • Power BI Desktop.

  • SQL Server 2017 Developer Edition.

  • Spark Standalone instance for local development and testing (includes a PySpark kernel for Jupyter Notebooks).

  • Git Git Bash and Visual Studio Team Services.

  • Other machine learning and data analytics tools Such as XGBoost, TensorFlow, Keras, CNTK, MXNet, Weka, Vowpal Wabbit, Apache Drill, and Rattle.
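To confirm which of the Python packages named above are present in a given environment, a quick check such as the following sketch can help. The helper name and the package list are illustrative assumptions; what you actually find depends on the image and the Python environment you run it in.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names of some packages mentioned in the list above.
    wanted = ["numpy", "pandas", "sklearn", "xgboost", "tensorflow", "keras"]
    for name, available in check_packages(wanted).items():
        print(name, "available" if available else "missing")
```

The check only looks up each package without importing it, so it runs quickly even for large libraries.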

Notice that not all of the tools listed here are available in every operating system. For instance, Power BI is only available in Windows Server. The Linux versions of the DSVM are more oriented toward deep learning. They include libraries such as Caffe, Caffe2, Torch, Theano, and NVIDIA DIGITS, which the Windows version of the DSVM does not have.

In addition to individual virtual machines, Azure allows you to deploy large clusters (computer groups working in a coordinated way). In the next Skill you review how to manage clusters.