Use other services for machine learning

  • 4/11/2018

Thought experiment answers

This section contains the answers to the thought experiment.

  1. ReLU, because it does not suffer from the vanishing gradient problem the way sigmoid and tanh do. Sigmoid is the default activation function in neural networks created with Net#, and you can switch it to tanh. However, ReLU cannot be used in Net#, which is one of the reasons Net# is not suitable for building deep neural networks.

  2. Although we consider images to be unstructured data because they are not in the form of a table, such as the database tables we are used to handling, they are structured in a certain way. An image has pixels arranged in two dimensions: width and height. The proximity of the pixels helps us to identify what appears in the image. Tabular data is very well structured, but not in the way that the pixels of an image are: neighboring columns in a table do not provide extra information, so applying deep neural networks with convolutions (CNNs) brings no advantage.

    For tabular data, fully connected neural networks work well but do not usually outperform an SVM, a random forest, or XGBoost. Because they are also computationally expensive to train and difficult to fine-tune, they end up being used less for this type of problem.

    So, task B is best suited for deep learning.

  3. The correct order is A, C, B.

  4. See Skill 4.3 Perform data science at scale by using HDInsight.

  5. You must register the DataFrame as a temporary table (in Spark 2.0 and later, registerTempTable is deprecated in favor of createOrReplaceTempView):

    df.registerTempTable("MyTable")

    Then run the SELECT against that table:

    %%sql
    SELECT * FROM MyTable

  6. See Skill 4.4 Perform database analytics by using SQL Server R Services on Azure.
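
To make the reasoning in answer 1 concrete, the following is a minimal sketch rather than a real training run. It assumes a hypothetical chain of 20 layers in which every weight is 1 and every unit sees the same pre-activation value, so the backpropagated gradient is simply the product of the local derivatives. The function names are illustrative only. Note that ReLU keeps the gradient intact only for positive inputs; for negative inputs its derivative is 0.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def chained_gradient(grad_fn, pre_activation, depth):
    # Simplifying assumption: all weights are 1 and every layer sees the same
    # pre-activation, so the chain rule reduces to one derivative raised to `depth`.
    return grad_fn(pre_activation) ** depth


depth = 20
sigmoid_chain = chained_gradient(sigmoid_grad, 0.0, depth)  # best case for sigmoid
relu_chain = chained_gradient(relu_grad, 1.0, depth)        # active ReLU units

print("sigmoid:", sigmoid_chain)
print("relu:   ", relu_chain)
```

Even in the best case for the sigmoid, the gradient that reaches the first layer shrinks by a factor of four per layer, while the active ReLU path passes it through unchanged.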
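
The point in answer 2 about pixel proximity can be sketched with a tiny hand-written filter. This is an illustration under stated assumptions, not how a CNN library is implemented: the 4x6 "image", the 1x2 edge kernel, and the helper names are all made up for the example. The same filter is applied to the image and to a copy whose columns have been reordered, which would be a harmless operation on a database table.

```python
def correlate2d_valid(image, kernel):
    # Slide the kernel over the image without padding (the operation CNN layers
    # call "convolution") and return the resulting feature map.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def reorder_columns(grid, order):
    # Rearranging columns does not change the meaning of a table,
    # but it destroys the spatial layout of an image.
    return [[row[c] for c in order] for row in grid]


def total_response(feature_map):
    return sum(abs(v) for row in feature_map for v in row)


# Hypothetical image: dark left half, bright right half (one vertical edge).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1]]  # responds to a change between horizontal neighbors

original_map = correlate2d_valid(image, edge_kernel)
shuffled_map = correlate2d_valid(
    reorder_columns(image, [0, 3, 1, 4, 2, 5]), edge_kernel
)

print(total_response(original_map))
print(total_response(shuffled_map))
```

The filter fires once per row on the original image, exactly at the edge, and fires everywhere once the columns are interleaved. A convolution therefore only helps when neighboring positions are related, which is true of pixels but not of table columns.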