€1,445.62 – €2,062.50

Machine Learning and Data Science in R on Microsoft ML and SQL Server

Event Information

Date and Time

Location

Online (virtual event)

TBA

Oman

Refund Policy

Refunds up to 7 days before event

Event description

A very intensive, hands-on, 5-day course designed for those who want to learn more in-depth machine learning and data science using R.

As you study the free, open source R, you will also learn how to easily make R incredibly fast, scalable, and enterprise-ready with Microsoft ML Server, SQL Server ML Services, and RStudio.


What Will You Learn?

  • Building and deploying machine learning models using the open source R programming language, including data preparation, visualisation, and stringent model validation.

  • High-performance ML using the newest version of Microsoft ML Server and SQL Server 2019 with R and RStudio.

  • Deployment to production with nanosecond-scale performance.

  • Successful data science project formulation and delivery.


Target Audience

Analysts, budding and current data scientists, data engineers, DBAs, BI developers, programmers, power users, predictive modellers, forecasters, consultants, AI engineers, and anyone interested in using ML for AI.


Prerequisites

General ability to work with data in any form: using spreadsheets, tables, or databases. Prior knowledge of any programming language is helpful. However, if you are prepared to work harder by asking Rafal questions and doing a little additional homework during the week, you can use this course to learn R as your very first programming language.

This course will teach you machine learning and data science using R and Microsoft technologies: you do not need to know them before attending.


Format

50% lectures, 25% demos, 25% lab tutorials.

You are encouraged to follow the demos on your machine, and you will be challenged to find answers to a few larger problems during the tutorials. We will provide you with all the necessary data sets including structured R Markdown notebooks containing labs.

While both the demos and the tutorials are a hands-on part of the course, if you prefer not to practice, you are welcome to use that time for additional Q&A, or to work with your own data.

As each training centre is different, you will receive an email, two weeks before the course starts, explaining how to prepare your computer for the course, unless the centre is providing one for you. In any case, preparation is easy, because we will use an Azure virtual machine that has been fully preconfigured with all the necessary software for the course. If you follow our preparation guide, you can take this VM together with the course data for your own future learning and reference.



Course Description

Above all, this course will teach you modern R: currently, the most powerful language explicitly designed for advanced analytics, statistical learning, data science, and cutting-edge general-purpose machine learning. While Python is more popular as a universal programming language, also widely used for image and text analysis using deep learning, R is a clear leader in data science.

You will learn how to do machine learning in R, especially on the classical data sets that you often encounter in business use. Even though such data might come from a data lake, typically you will find plenty of it in a data warehouse or a relational database, or you can acquire it from transactional business application files, or from devices such as healthcare equipment, point-of-sale devices, or manufacturing and transportation machinery. Above all, R is great for exploratory analysis of data and it can help you draw meaningful conclusions from real-world experiments, such as A/B marketing tests or product trials.

This course will teach you the foundations of hypothesis testing in order to be able to draw such conclusions with a high dose of confidence.

Microsoft Machine Learning Server and Microsoft SQL Server 2019/2017 Machine Learning Services support both R and Python in a number of proprietary, high-performance, scalable, enterprise-ready, easy-to-use packages and libraries, notably RevoScale and MicrosoftML.

You will learn how to use them during this course. You will also learn how to do almost everything using the most popular algorithms provided by open source R packages and functions, such as rpart, kmeansruns, fpc, cluster, clusplot, ts, xts, e1071, caret, glm, and, for extra help, rattle, qdapTools, MLmetrics, and miscTools.

You will learn how to prepare and visualise data both by using open source packages, mainly dplyr and ggplot2, and other parts of the tidyverse meta-package, like readr, readxl, and lubridate, and how to do it more directly in SQL Server, benefiting from its performance and scalability. We will even combine the power of R with Power BI, to create informative visualisations that are otherwise impossible to do in Power BI alone.



While learning about the data science process and hypothesis testing, you will discover that some complex business questions can be answered using simpler statistical techniques, such as tests of significant differences between sets of data, or visualisations like notched box plots.

We will refresh your knowledge of rudimentary statistical concepts that are necessary for machine learning and data science, like knowing the difference between ordinal, interval and ratio data, and thus why it does not make sense to calculate a mean star rating, while a median is possible.
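As a small illustration of the star-rating point above, the sketch below compares the median and the mean of a made-up set of ratings. It is written in Python for brevity (the course itself uses R), and the ratings are hypothetical, not course data.

```python
from statistics import mean, median

# Hypothetical star ratings (1 to 5) for a single product.
# Star ratings are ordinal: the order is meaningful, the spacing is not.
ratings = [5, 5, 4, 1, 1, 5, 2]

# The median depends only on the rank order of the values,
# so it is a valid summary of ordinal data.
median_rating = median(ratings)

# The mean assumes that the step from 1 to 2 stars is the same size
# as the step from 4 to 5 stars, which an ordinal scale does not promise.
mean_rating = mean(ratings)

print("median:", median_rating)
print("mean:", round(mean_rating, 2))
```

The mean can still be computed, as the code shows, but interpreting it requires an equal-spacing assumption that ordinal data does not support.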

A little time has been allocated for the discussion of p-values, confidence intervals, and the differences between Bayesian and frequentist interpretations of your results. Bear in mind that this is not a course about statistics, but a little working knowledge is a must in our industry, and it will make the rest of the course easier to follow.

Early in the course, you will learn all the fundamentals of machine learning—no prior knowledge is necessary. You will study: data preparation and relevant structures, algorithm classes and their applications, model evaluation and validation, including all the common performance metrics such as precision and recall. At the heart of this course, however, you will gain an intimate understanding of how some of the most important algorithms work and how to prepare data to make the algorithms give you the most they can.



Starting with clustering, you will learn about k-means, k-medians, spherical k-means and expectation-maximisation. You will find out how to prepare non-numerical and even some numerical data using popular R functions such as mtabulate for these algorithms.
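Distance-based clustering algorithms need numeric input, so categorical values are usually turned into columns of counts first. The sketch below shows that idea in plain Python (the course itself uses R). The `tabulate` helper and the basket data are hypothetical and only approximate what a tabulating function such as mtabulate is used for.

```python
def tabulate(rows):
    """Convert lists of category labels into a table of counts.

    Returns the sorted column names and one row of counts per input row.
    This is an illustrative helper, not a port of any R function.
    """
    columns = sorted({item for row in rows for item in row})
    table = [[row.count(column) for column in columns] for row in rows]
    return columns, table


# Hypothetical shopping baskets: non-numerical data, one list per customer.
baskets = [["tea", "milk"], ["coffee"], ["tea", "tea", "sugar"]]

columns, table = tabulate(baskets)
print(columns)
for counts in table:
    print(counts)
```

Each customer is now a numeric vector, which is the form a distance-based algorithm such as k-means expects.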

Other than using clustering for segmentation, we will also study its use for anomaly detection. We will expand on that subject using other, specialised techniques, such as a One Class SVM and PCA-Based Anomaly Detection, permitting you to predict anomalies, such as fraud.



We dedicate a full day to building classifiers. You will understand the differences between the most important decision tree algorithms: plain, forests and boosting, and you will study both simpler and more complex neural networks, and how they relate to regressions. We will also cover the widely used logistic regression algorithm, which, despite its name, is a classifier. Later in the course you will meet the large family of regression techniques, starting with classic linear regression, through GLM, the generalised linear model, to non-linear ML regressions. We will also have some time to cover the remaining big applications of machine and statistical learning, notably forecasting with time series, and, briefly, recommendation engines.
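To see why logistic regression is used as a classifier, consider the minimal sketch below, written in Python (the course itself uses R). The weights, bias, and threshold are hypothetical values chosen for illustration, not a fitted model.

```python
import math


def predict_probability(weights, bias, features):
    """Pass a linear score through the logistic (sigmoid) function."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def classify(weights, bias, features, threshold=0.5):
    """Turn the predicted probability into a class label, 0 or 1."""
    probability = predict_probability(weights, bias, features)
    return 1 if probability >= threshold else 0


# Hypothetical coefficients, as if already estimated from training data.
weights = [0.8, -0.5]
bias = -1.0

print(classify(weights, bias, [3.0, 1.0]))
print(classify(weights, bias, [0.5, 2.0]))
```

The regression part produces a probability between 0 and 1, and the threshold converts it into a class. Moving the threshold trades precision against recall.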

When deploying models to production, the benefits of using ML Server and SQL ML Services will impress you. After seeing how to do it using open source R, we will culminate with an extremely fast in-database deployment using the T-SQL PREDICT function, and the related real-time sp_rxPredict, which returns predictions on a nanosecond scale! You will also see how to deploy your models using web services, interacting via Azure if needed.

Please note, however, that this course does not focus on Azure ML, even though we will briefly discuss how to combine those technologies (please also see our other course by Rafal that focuses on Azure ML).

Every day we will work using RStudio, the most popular, and free, R IDE, which is recommended by Microsoft for building R applications on top of SQL and ML Servers. All of our work will follow the modern principles of reproducible research: you will learn how to set up notebooks, manage packages and their dependencies, including versioning using snapshots, how to save your work, how to manage change using Git, and how to collaborate.

At the end of the course you will keep your own R notebook containing almost 1000 lines of code and results! You are also welcome to keep all data sets that you use during the course labs and tutorials. You will notice that throughout the week you understand and write better and more advanced R, whilst experiencing, first-hand, many of its real-world applications.

Model validity is the most important aspect of any machine learning project. A lot of time has been dedicated to explaining it in detail: many validity metrics, such as precision, recall, AUC, F1 score, and accuracy (which is rarely a good metric), and the many charts we use to analyse models, especially the confusion matrix, lift/gain charts, ROC curve, precision-recall curve, profit and cost chart, calibration charts, scatter plots, and others used for regression evaluation, like histograms of residuals, QQ-Norm plot of residuals, scale-location, Cook’s distance and many others. You will learn how to create those plots using R, and with the help of other tools.
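The remark that accuracy is rarely a good metric can be shown with a few lines of arithmetic. The sketch below is in Python (the course itself uses R). The `classification_metrics` helper and the fraud counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from the four cells of a confusion matrix.

    Ratios with a zero denominator are reported as 0.0 by convention here.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical case: 1000 transactions, 10 of them fraudulent,
# and a useless model that never flags anything as fraud.
metrics = classification_metrics(tp=0, fp=0, fn=10, tn=990)
for name, value in metrics.items():
    print(name, value)
```

The model scores 99% accuracy while catching no fraud at all: its recall is zero. On imbalanced data, precision, recall, and F1 are therefore more informative.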

At the end of the course you will know when you can trust your models, and you will be able to explain your work to others, especially your project sponsors who rarely are machine learning experts.

Above all, this course will not only teach you the technology and how to use it, but, much more importantly, you will understand how ML works, how to avoid common mistakes, such as overfitting/overtraining, how to balance model accuracy against its reliability (the bias-variance trade-off), and how to relate key ML performance metrics to your business goals, making your bosses and clients happy with your progress and results. You will gain clarity on how to start your data science projects and how to finish them. You will know how to express the business need in terms of testable hypotheses, which will guide model building and selection. You will understand what types of work are suited to ML, and which are unlikely to deliver results. You will discover what makes good first projects in your own area of specialisation.

These are the key benefits of studying machine learning with Rafal Lukawiecki: an industry veteran who has been practising ML, data mining, statistical learning, and data science with his customers for well over a decade, and who studied artificial intelligence at Imperial College in the ‘90s under the guidance of the leaders and the inventors of this area of industry and science.

