Microsoft Azure Machine Learning Studio meets Titanic
In a previous post we ran a small experiment with AWS SageMaker, using the Titanic dataset. Now we’ll do the same with Azure services, specifically Machine Learning Studio.
For the few who have not read the previous article (shame on you :), here is a quick recap of the problem we want to solve.
The Titanic dataset is a classic classification problem: it contains several pieces of information about each passenger (age, sex, ticket class and so on) plus the target value, which says whether the passenger survived (yes/no).
The goal is to train a model that predicts whether a given passenger survived. The data is not ready “as is”, so we’ll need to preprocess it a bit.
Microsoft Azure Machine Learning Studio
Microsoft Azure Machine Learning Studio is a “collaborative, drag-and-drop tool you can use to build, test, and deploy predictive analytics solutions on your data” (quoting Microsoft).
Basically, it is complementary to Machine Learning Service, where you work in Jupyter notebooks hosted by Azure or in Visual Studio Code with the dedicated extension.
As you can see from this infographic, the range of available algorithms is quite extensive, so let’s put some of them to work on our Titanic problem!
First, let’s upload the files to the platform, so we can use them as project assets.
Now let’s create a new project and add the Titanic files as associated datasets.
We could start a notebook, but we want to use the visual tool, so let’s create a new experiment from scratch!
This is the graphical interface, and it seems quite intuitive to use. First, though, let’s recap the plan. As mentioned, we’ll do only some basic preprocessing, just to see the tool in action, without worrying too much about the results. Then we’ll try a couple of different two-class classification algorithms and see how their results can be compared directly within the same experiment!
Let’s add the train dataset and take a look at the data. To do this, drag and drop it from “Saved Datasets” in the left menu, then right-click on it to open the context menu and choose Visualize.
Wow! For every column, you get statistics and can draw some plots (depending on the data type) to visualize and better understand the data. You can also compare columns to see how they relate to each other, which amounts to a bit of multivariate analysis. Really cool.
Let’s drop PassengerId, Name and Ticket. To do so we use “Select Columns in Dataset”. Once that is done, we can run the task and get a resulting dataset, which can be visualized just like in the previous step.
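For readers who prefer code, here is roughly what that column selection would look like outside Studio. It is only a sketch with pandas: the toy rows below are made up for illustration and merely mimic some of the columns of the Titanic training file.

```python
import pandas as pd

# Toy rows mimicking some Titanic columns; the values are illustrative only.
train = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, None],
    "Ticket": ["T1", "T2", "T3"],
    "Fare": [7.25, 71.28, 7.92],
})

# Drop the identifier-like columns, keeping everything else untouched.
reduced = train.drop(columns=["PassengerId", "Name", "Ticket"])

print(list(reduced.columns))
```

Note that `drop` returns a new DataFrame, so the original `train` is left as it was, much like each Studio step produces a new output dataset.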
We can see that Age has 177 missing values and that the categorical features need to be transformed. Let’s fix both by adding the appropriate steps and running the experiment again.
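As a rough code equivalent of those two cleaning steps, here is a small pandas sketch. Two assumptions are mine, not part of the experiment above: missing ages are filled with the column median, and categorical columns are one-hot encoded. The rows are again toy data.

```python
import pandas as pd

# Toy data with one missing Age and two categorical columns (illustrative values).
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
    "Age": [22.0, None, 26.0, 35.0],
})

# Assumption: replace missing ages with the median of the known ages.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Assumption: turn each categorical column into one indicator column per category.
encoded = pd.get_dummies(df, columns=["Sex", "Embarked"])

print(encoded.columns.tolist())
```

The median is just one reasonable choice; the mean or a fixed value would work the same way with `fillna`.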
Now the data is ready to feed some models, so let’s do it!
We split the data 70–30, meaning 70% of the data is used for training and 30% for testing. Then we pick a Logistic Regression algorithm, train the model, score it on the test data and look at the results. Something like this…
Just run… et voilà, super easy to do. Time to check some results by visualizing the output of the Evaluate Model step.
Awesome! All the metrics are clearly shown at the bottom (Confusion Matrix, Accuracy, Precision, Recall, F1 score, AUC), you can change the threshold, and there are some useful graphs to visualize the performance.
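The split, training and evaluation steps above could be approximated in code with scikit-learn. This is a sketch under clear assumptions: the data is synthetic (generated with `make_classification` as a stand-in for the preprocessed Titanic features), so the numbers will not match what Studio reports, and `report` is a helper name I made up.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for the preprocessed Titanic features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# 70% of the rows for training, 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Scored probability of the positive class for each test row.
scores = model.predict_proba(X_test)[:, 1]


def report(threshold):
    """Compute the threshold-dependent metrics (hypothetical helper)."""
    predicted = (scores >= threshold).astype(int)
    return {
        "confusion_matrix": confusion_matrix(y_test, predicted).tolist(),
        "accuracy": accuracy_score(y_test, predicted),
        "precision": precision_score(y_test, predicted, zero_division=0),
        "recall": recall_score(y_test, predicted, zero_division=0),
        "f1": f1_score(y_test, predicted, zero_division=0),
    }


# AUC is computed from the raw scores, so it does not depend on the threshold.
auc = roc_auc_score(y_test, scores)
default_report = report(0.5)
strict_report = report(0.7)

print("AUC:", round(auc, 3))
print("threshold 0.5:", default_report)
print("threshold 0.7:", strict_report)
```

Raising the threshold can only shrink the set of rows predicted as positive, so recall never goes up, while precision usually (but not always) improves. That is the trade-off the threshold control lets you explore.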
Now let’s add another classification algorithm (“Support Vector Machine”), using the same inputs for training and testing, and find out whether it performs better.
Well, it seems SVM performs just a tiny bit better in terms of F1 score, and the results are easy to see and compare.
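The same side-by-side comparison can be sketched in scikit-learn by training both models on one shared split. Again the data is synthetic, so which model wins here says nothing about the Titanic result above, and using `SVC` with a linear kernel is my assumption about a comparable setup, not something taken from Studio.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: synthetic stand-in data, split 70/30 once and shared by both models.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="linear"),  # assumption: linear kernel
}

# Train each candidate on the same training rows and score it on the same test rows.
f1_by_model = {}
for name, estimator in candidates.items():
    estimator.fit(X_train, y_train)
    f1_by_model[name] = f1_score(y_test, estimator.predict(X_test))

# Print the models from best to worst F1 score.
for name, score in sorted(f1_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1 = {score:.3f}")
```

Keeping the split identical for both models is what makes the comparison fair, and it is the same idea as wiring both training branches to the same split output in the experiment.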
Azure Machine Learning Studio is a really impressive tool. The drag-and-drop interface reminds me a lot of Orange (an open source machine learning and data visualization tool; I’ll write a dedicated post on it in the future), but here you can also tap into the whole Azure infrastructure and its services. Or you can just play with the data and do some fast prototyping, without the hassle of installing or configuring anything (you don’t need an Azure account to use Machine Learning Studio, just a Microsoft one).
Moreover, you can add custom scripts in Python or R so there is room to customize the flow if needed.
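To give an idea of what such a custom Python step might look like, here is a small sketch. As far as I know, Studio’s Python script module calls an entry point named `azureml_main` that receives up to two pandas DataFrames and returns a sequence containing a DataFrame, but treat that contract as an assumption and check the documentation. The `FamilySize` feature is a hypothetical example, not something used in the experiment above.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    """Assumed entry point: takes up to two DataFrames, returns a tuple with one."""
    out = dataframe1.copy()
    # Hypothetical feature: siblings/spouses + parents/children + the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    return out,


# Local check with toy rows (illustrative values only).
sample = pd.DataFrame({"SibSp": [1, 0], "Parch": [0, 2]})
(result,) = azureml_main(sample)
print(result)
```

Since the function only needs pandas, a script like this can be tested on your own machine before pasting it into the visual flow.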
Finally, the documentation is thorough and there are plenty of examples to try in the so-called “AI Gallery”.
Hope you enjoyed this little introduction to ML Studio. See you in the next episode!