Applied Machine Learning - Microsoft Certificate in Data Science 9a

1. Time Series and Forecasting

Introduction to Time Series

Finance / stock / currency exchange rate / sales forecast / temperature / heart rate / Semicon ET and inline long-term trends / …

The Nature of Time Series Data

Time Series vs. Random or Independent Noise

Autocorrelation: the value at one time point is correlated with the values at the following time points

Autocorrelation

Regular Reporting: some algorithms can only work with regularly reported (evenly spaced) data

Regular vs. Irregular Reporting

Decomposition – STL Package (Investigating)

Components of Time Series Data

A time series is decomposed into a general trend, a seasonal / periodic component, and a remainder.

STL Package Procedure (a code sketch follows the list)
  1. Start with time series data
  2. Use Loess to find a general trend
  3. Use moving average smoothing to find the fine-grained trend
  4. Get the seasonal / periodic component
  5. Get the final trend by smoothing the non-seasonal trend with Loess
  6. Get the remainder
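
A minimal sketch of this decomposition using statsmodels' STL implementation; the monthly series here is synthetic and period=12 is an assumption for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Illustrative monthly series: trend + yearly seasonality + noise (stands in for real data)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
values = 0.5 * np.arange(96) + 10 * np.sin(2 * np.pi * np.arange(96) / 12) + np.random.normal(size=96)
series = pd.Series(values, index=idx)

# STL = Seasonal-Trend decomposition using Loess; period=12 assumes monthly data
result = STL(series, period=12).fit()

trend = result.trend        # smoothed general trend
seasonal = result.seasonal  # seasonal / periodic component
remainder = result.resid    # what is left after removing trend and seasonality

result.plot()               # quick visual check of all components
```
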
Lowess / Loess Regression

General trend for smoothing time series data

Idea: fit local polynomial models and merge them together; the local windows provide flexibility, and the polynomials keep the result smooth

Step 1: Define the window width m, and do a local regression with the m nearest neighbors

Step 2: Choose a weight function that gives higher weights to points nearer the center of the window

Step 3: Fit a weighted quadratic (local Taylor polynomial) regression using the weights from Step 2

Step 4: Substitute each observed y_i with the fitted value from the local regression evaluated at x = x_i

Step 5: Repeat the above for each x_i, then connect the fitted points to get the general trend
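
A minimal sketch of Loess smoothing with statsmodels' lowess; the frac parameter plays the role of the window width m from Step 1, and the data is synthetic:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic noisy series for illustration
x = np.arange(200)
y = np.sin(x / 20) + np.random.normal(scale=0.3, size=200)

# frac = fraction of points used in each local regression (the window width)
smoothed = lowess(y, x, frac=0.2)   # returns (x, fitted y) pairs sorted by x

trend = smoothed[:, 1]              # the general trend estimated point by point
```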

Adjusting Window: a wider window gives a smoother trend, while a narrower window follows the data more closely

Moving Average Smoothing

Fine-grained trend for smoothing time series data with clear periodicity (after extracting the general trend)

Procedure of Moving Average Smoothing
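
A minimal sketch of moving average smoothing with pandas; the series and the 12-point window are assumptions for a monthly example:

```python
import numpy as np
import pandas as pd

# Illustrative monthly series (trend + seasonality + noise)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
series = pd.Series(0.5 * np.arange(96) + 10 * np.sin(2 * np.pi * np.arange(96) / 12)
                   + np.random.normal(size=96), index=idx)

# Centered 12-point moving average smooths out the yearly seasonality
moving_avg = series.rolling(window=12, center=True).mean()

# Subtracting it leaves the seasonal component plus noise
detrended = series - moving_avg
```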

Stationary Remainder / Time Series

Second-order stationarity conditions:

  1. Constant Mean
  2. Constant Variance
  3. An autocovariance that does not depend on time

Technique 1: Boxplots of data points binned into a higher-level grouping (e.g., by month), to check for a constant mean and variance across bins (see the sketch below)

Technique 2: Boxplots of data points binned into a higher-level grouping
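
A small sketch of the binned-boxplot check; the remainder series here is random noise standing in for an STL residual:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative remainder series with a daily DatetimeIndex (stand-in for the STL residual)
idx = pd.date_range("2020-01-01", periods=730, freq="D")
remainder = pd.Series(np.random.normal(size=len(idx)), index=idx)

df = remainder.to_frame(name="remainder")
df["month"] = df.index.month

# One boxplot per month: a stationary remainder shows stable medians and spreads across bins
df.boxplot(column="remainder", by="month")
plt.show()
```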

Autocorrelation and Autocovariance

Same idea as correlation (normalized covariance), which describes the (linear) relationship between feature X and feature Y, but here applied to a series and a lagged copy of itself

Auto = self; ACF = Autocorrelation Function

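A minimal sketch of computing the autocovariance and ACF with statsmodels on an illustrative series:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, acovf
from statsmodels.graphics.tsaplots import plot_acf

# Illustrative series (a random walk)
x = np.random.normal(size=500).cumsum()

autocov = acovf(x, nlag=20)   # autocovariance up to lag 20
autocorr = acf(x, nlags=20)   # normalized version (ACF), values in [-1, 1]

plot_acf(x, lags=20)          # correlogram with confidence bands
```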

Working with Time Series

Introduction to models for different types of time series data so that we can do forecasting

The remainder can also contain a time series pattern, which needs to be carefully modeled and removed; the residue left over (the prediction error) should then be normally distributed over time

Note that a successful STL decomposition should look like the following:

  1. the histogram of the remainder is close to a normal distribution
  2. the boxplot of the remainder at the seasonal level (e.g., by month) is stable

Moving Average Models MA(q)

Example: Microsoft announces one news item every day, and its stock is affected by today's news and the news from the last 2 days

A moving average model has only a short memory of the previous noise terms

ACF: sharp cut-off after lag q; it can identify whether your data can be modeled as MA(q), and with what order
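
A minimal sketch of fitting an MA(2) model; statsmodels' ARIMA class with order (0, 0, q) is used here, and the data is simulated:

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

# Simulate an MA(2) process: x_t = e_t + 0.6*e_{t-1} + 0.3*e_{t-2}
ma_process = ArmaProcess(ar=[1], ma=[1, 0.6, 0.3])
x = ma_process.generate_sample(nsample=500)

# MA(q) is ARIMA with order (p=0, d=0, q=2); the ACF of x cuts off after lag 2
model = ARIMA(x, order=(0, 0, 2)).fit()
print(model.summary())
```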

Autoregression Models AR(p)

Today's value is a combination of the last few days' values plus some noise

ACF: exponential decay; cannot identify the order

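A minimal sketch of fitting an AR(p) model with statsmodels' AutoReg on a simulated AR(2) series:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Simulate an AR(2) process: x_t = 0.6*x_{t-1} + 0.2*x_{t-2} + e_t
x = np.zeros(500)
e = np.random.normal(size=500)
for t in range(2, 500):
    x[t] = 0.6 * x[t - 1] + 0.2 * x[t - 2] + e[t]

# Fit an AR(p) model with p = 2 lags
model = AutoReg(x, lags=2).fit()
print(model.params)
```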

Partial Autocorrelation

The correlation at a given lag that is not accounted for by all of the lags in between


[Figure: Comparison between ACF and PACF of AR(1)]
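
A small sketch reproducing that comparison with statsmodels' plotting helpers on a simulated AR(1) series:

```python
import numpy as np
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Simulated AR(1) series: x_t = 0.7 * x_{t-1} + e_t
e = np.random.normal(size=500)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + e[t]

plot_acf(x, lags=20)    # exponential decay: the order is not visible
plot_pacf(x, lags=20)   # sharp cut-off after lag 1 reveals the AR order p = 1
```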

Auto-Regressive Moving Average Model ARMA(p,q)

Used when both the ACF and the PACF show slow decay


Auto-Regressive Integrated Moving Average Model ARIMA(p,d,q)

Differencing

Non-stationary time series can have stationary differences


Higher-order trends can be turned into stationary series through repeated differencing
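
A tiny sketch of differencing with pandas on a synthetic random walk:

```python
import numpy as np
import pandas as pd

# A random walk is non-stationary, but its first difference is stationary noise
series = pd.Series(np.random.normal(size=500).cumsum())

first_diff = series.diff().dropna()          # removes a linear (first-order) trend
second_diff = series.diff().diff().dropna()  # repeated differencing handles higher-order trends
```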

Model Details

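A minimal sketch of fitting and forecasting with statsmodels' ARIMA; the series is synthetic and the order (1, 1, 1) is a placeholder choice, with d handling the differencing internally:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Illustrative non-stationary series (random walk with drift)
series = pd.Series(np.random.normal(loc=0.1, size=300).cumsum())

# order=(p, d, q): p AR terms, d differences, q MA terms
model = ARIMA(series, order=(1, 1, 1)).fit()

forecast = model.forecast(steps=12)   # forecast the next 12 periods
print(forecast)
```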

Exponentially Weighted Moving Average Model (EWMA) / Simple Exponential Smoothing Model (SES)

Most widely used for business applications / forecasting

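A minimal sketch of EWMA smoothing with pandas and SES forecasting with statsmodels; the series and alpha = 0.2 are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

series = pd.Series(20 + np.random.normal(size=100).cumsum())

# EWMA: recent observations get exponentially larger weights (alpha = smoothing level)
ewma = series.ewm(alpha=0.2, adjust=False).mean()

# SES fits the same idea as a forecasting model
fit = SimpleExpSmoothing(series).fit(smoothing_level=0.2, optimized=False)
print(fit.forecast(6))   # flat forecast of the next 6 periods
```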

Forecasting in Context


Reference

Time Series Analysis (TSA) in Python - Linear Models to GARCH

2. Spatial Data Analysis

Mobile marketing / smart watch data / oil exploration / real estate pricing / transportation network / crime data …

Introduction to Spatial Data

Types of Spatial Data

  • Points (location only)
  • Polygons
  • Pixels / Raster (location + count/density shown as colors)

Types of Distance

  • Euclidean distance (physical distance; use a built-in tool to calculate it, since the Earth is round)
  • Driving / Walking distance
  • Distance adapted to the local area (e.g., within the same building)

Distance Matrix

Visualize the relationships between different features and overlay multiple features in one plot in various ways, such as bubble size or fill color

Kernel Density Estimation KDE

  • Go-to method for density / event rate estimation
  • “Nonparametric”, meaning that a kernel (a bump) is placed on each data point

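A minimal sketch of KDE on 2D point locations with scipy's gaussian_kde; the coordinates are synthetic stand-ins for event locations:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative event locations (longitude, latitude)
lon = np.random.normal(0, 1, size=300)
lat = np.random.normal(0, 1, size=300)

# gaussian_kde places a Gaussian "bump" on each point; input shape is (n_dims, n_points)
kde = gaussian_kde(np.vstack([lon, lat]))

# Evaluate the estimated density at arbitrary locations: here (0, 0) and (0.5, 0.5)
density = kde(np.array([[0.0, 0.5], [0.0, 0.5]]))
print(density)
```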

K-Nearest Neighbour

Localized technique of probability estimation

  • Classification by majority vote of the K nearest neighbors
  • Regression by averaging the neighbors' values
  • Take care:
    • scale sensitive: consider normalization
    • selection of K and the weighting of distance
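
A minimal sketch of KNN classification with scikit-learn; scaling is applied first because KNN is scale sensitive, and weights="distance" gives nearer points more influence (the iris dataset is just an illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Normalize, then vote among the 5 nearest neighbors, weighted by distance
knn = make_pipeline(StandardScaler(),
                    KNeighborsClassifier(n_neighbors=5, weights="distance"))
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```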

Working with Spatial Data

Spatial Poisson Processes

Probability estimation of the occurrence count in an area over a period, based on the Poisson distribution (a discrete probability distribution)

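A tiny sketch of evaluating Poisson probabilities with scipy; the rate of 3.2 events per week is an assumed example:

```python
from scipy.stats import poisson

# Assumed example: an area averages 3.2 events per week (the rate lambda)
lam = 3.2

p_exactly_5 = poisson.pmf(5, lam)        # P(exactly 5 events in a week)
p_at_most_2 = poisson.cdf(2, lam)        # P(2 or fewer events)
p_more_than_7 = 1 - poisson.cdf(7, lam)  # P(more than 7 events)
print(p_exactly_5, p_at_most_2, p_more_than_7)
```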

Variogram

Estimates how the (label) covariance between samples changes with spatial separation, which plays the same role as the ACF and PACF do for time series. Input data is labeled. Considers the overall data in the dataset.

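A rough numpy sketch of an empirical semivariogram: half the average squared difference of the label, binned by pairwise distance (the coordinates, labels, and binning choices here are illustrative assumptions):

```python
import numpy as np
from scipy.spatial.distance import pdist

# Illustrative labeled samples: coordinates and a measured value at each location
coords = np.random.uniform(0, 10, size=(200, 2))
values = np.sin(coords[:, 0]) + np.random.normal(scale=0.2, size=200)

dists = pdist(coords)                               # pairwise distances between samples
sq_diffs = pdist(values[:, None], "sqeuclidean")    # squared differences of the label

# Bin pairs by distance and average half the squared difference in each bin
bins = np.linspace(0, dists.max(), 15)
which = np.digitize(dists, bins)
semivariance = np.array([0.5 * sq_diffs[which == b].mean() for b in range(1, len(bins))])
```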

Kriging / Gaussian Process / Spatial Regression

Overall technique of probability estimation based on the variogram, which provides the covariance function k

k can be modeled by an arbitrary covariance function chosen at the variogram stage

Interpolation method for estimating the property at unsampled locations, so the complete map can be obtained

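A minimal sketch using scikit-learn's GaussianProcessRegressor, where an RBF-plus-noise kernel stands in for the variogram-derived covariance k (data and kernel choice are illustrative assumptions, not the course's exact method):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Sampled locations and measured values (illustrative)
coords = np.random.uniform(0, 10, size=(100, 2))
values = np.sin(coords[:, 0]) + np.cos(coords[:, 1])

# The kernel plays the role of the covariance k from the variogram stage
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(), normalize_y=True)
gp.fit(coords, values)

# Interpolate onto a grid of unsampled locations to get the complete map
gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])
pred, std = gp.predict(grid, return_std=True)   # mean estimate and its uncertainty
```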

Spatial Data in Context


3. Text Analytics

Summarizing text, comparing texts, or classifying them

Introduction to Text Analytics


Word Frequency

  • Frequency plot
  • Cumulative plot to examine the cleaned-up dataset

Stemming

Only for English

connection, connected, connective, connecting -> connect

  • Porter’s Algorithm
    • V is one or more vowels (A, E, I, O, U)
    • C is one or more consonants
    • All words are of the following form
      • [C](VC){m}[V], where parts in brackets are optional and {m} means the VC group repeats m times
    • For each word, we check whether it obeys a condition, and shorten or lengthen it accordingly
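
A tiny sketch using NLTK's PorterStemmer, which implements Porter's algorithm:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["connection", "connected", "connective", "connecting"]
print([stemmer.stem(w) for w in words])   # all reduce to "connect"
```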

Feature Hashing (Dimensionality Reduction)

Fast and space-efficient way of vectorizing features, by applying a hash function to the features and using their hash values as indices directly.

Also called hashing (kernel) trick.

Wiki: Feature hashing
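
A minimal sketch with scikit-learn's HashingVectorizer; the documents and the 1024-dimensional output space are arbitrary choices for illustration:

```python
from sklearn.feature_extraction.text import HashingVectorizer

docs = ["the cat sat on the mat", "the dog ate my homework"]

# Each word is hashed straight to a column index; no vocabulary needs to be stored
vectorizer = HashingVectorizer(n_features=2**10)   # 1024-dimensional output space
X = vectorizer.transform(docs)                     # sparse matrix, shape (2, 1024)
print(X.shape)
```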

Working with Text

Calculating Word Importance by TF-IDF

TF = Term Frequency (the number of times you see a word); IDF = Inverse Document Frequency

TF-IDF is a key factor used in search engine ranking

  • TF-IDF is high when
    • the term appears many times in few documents
  • TF-IDF is low when
    • the term appears in almost all documents
    • the term does not appear often
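
A minimal sketch with scikit-learn's TfidfVectorizer (the toy documents are illustrative, and get_feature_names_out assumes a recent scikit-learn):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)   # rows = documents, columns = terms, values = TF-IDF

# Terms that appear in almost every document ("the", "on") get low weights
print(dict(zip(vectorizer.get_feature_names_out(), X.toarray()[0])))
```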

Introduction to Natural Language Processing


Text Analytics in Context


4. Image Analysis

Photographs / Security cameras / Check readers / Medical images / Artwork analysis …

Introduction to Image Analysis

Read / Plot Image

  • the misc module from scipy (e.g., scipy.misc.imread, which outputs a numpy array with rows and columns matching the image size; deprecated in newer SciPy versions)
  • imshow from matplotlib.pyplot
  • glob.glob for reading multiple images
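
A small sketch of reading and plotting images; matplotlib's imread is used here since scipy.misc.imread is deprecated, and a sample image is written to disk first so there is something to read:

```python
import glob
import matplotlib.pyplot as plt
from skimage import data

# Write a sample image to disk (stand-in for real image files)
plt.imsave("example.png", data.camera(), cmap="gray")

# imread returns a numpy array whose rows and columns match the image size
img = plt.imread("example.png")
print(img.shape)

plt.imshow(img, cmap="gray")
plt.show()

# glob.glob for reading multiple images at once
images = [plt.imread(path) for path in glob.glob("*.png")]
```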

Image Properties

  • Examine the distribution of gray scale
    • Histogram (ideal image has nearly uniform distribution)
    • CDF (ideal image has a straight line)
  • Adaptive Histogram Equalization to improve contrast
    • The histogram equalization algorithm attempts to adjust the pixel values in the image to create a more uniform distribution
    • exposure.equalize_adapthist from skimage
      • [Figure: Before and After Equalization]
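
A minimal sketch of the histogram/CDF checks and adaptive equalization with skimage, using its built-in camera test image:

```python
import matplotlib.pyplot as plt
from skimage import data, exposure

img = data.camera()   # built-in grayscale test image

# Examine the gray-scale distribution: histogram and CDF
hist, bins = exposure.histogram(img)
cdf, cdf_bins = exposure.cumulative_distribution(img)

# Adaptive histogram equalization to improve contrast (output is float in [0, 1])
equalized = exposure.equalize_adapthist(img, clip_limit=0.03)

fig, axes = plt.subplots(1, 2)
axes[0].imshow(img, cmap="gray")
axes[1].imshow(equalized, cmap="gray")
plt.show()
```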

Image Manipulation

Blurring and Denoising

Pre-whitening together with denoising can improve the Sobel edge detection result -> clearer edges

The reason may be that it suppresses and removes the unnecessary / meaningless portion of the image, which also happens in time series analysis when computing the cross-correlation function between two series

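A small sketch of blurring and denoising with skimage before edge detection (Gaussian blur and total-variation denoising are used here as representative choices, not necessarily the course's exact filters):

```python
from skimage import data, filters, restoration

img = data.camera()

blurred = filters.gaussian(img, sigma=2)                       # Gaussian blur
denoised = restoration.denoise_tv_chambolle(img, weight=0.1)   # total-variation denoising

# A cleaner image usually gives a clearer Sobel edge map
edges = filters.sobel(denoised)
```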

Working with Images

Feature Extraction

Sobel Edge Detection

Detecting edges by looking for single-direction gradients within a selected area (see also the Viola and Jones method in Course 7)

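A minimal sketch of Sobel edge detection with skimage, including the single-direction variants:

```python
from skimage import data, filters

img = data.camera()

edges = filters.sobel(img)               # gradient magnitude (both directions)
horizontal_edges = filters.sobel_h(img)  # single-direction gradient: horizontal edges
vertical_edges = filters.sobel_v(img)    # single-direction gradient: vertical edges
```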

Segmentation

Remove noise or unwanted portions

Simplest way -> thresholding (remove the points under or over a threshold)
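
A tiny sketch of threshold segmentation; Otsu's method is used here only to pick the cut-off automatically, and a manually chosen constant works just as well:

```python
from skimage import data, filters

img = data.camera()

t = filters.threshold_otsu(img)   # automatically chosen threshold (a fixed value also works)
mask = img > t                    # keep points above the threshold, drop the rest
segmented = img * mask            # zero out the unwanted portion
```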

Harris Corner Detection

Compute the Q matrix in the error function E, which represents an ellipse, and detect a corner when Q has 2 large eigenvalues, which corresponds to small principal axes of the ellipse (the intensity changes sharply in both directions)

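A minimal sketch with skimage's Harris corner functions on a built-in checkerboard image:

```python
from skimage import data
from skimage.feature import corner_harris, corner_peaks

img = data.checkerboard()   # built-in image with many corners

response = corner_harris(img)                     # Harris response: large where both eigenvalues are large
corners = corner_peaks(response, min_distance=5)  # (row, col) coordinates of detected corners
print(corners[:5])
```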

Introduction to Mathematical Morphology

Dilation and Erosion

Fill or remove the center pixel of a specific shape (structuring element) to perform dilation or erosion


Opening and Closing

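A small sketch of dilation, erosion, opening, and closing with skimage.morphology on a tiny binary image (the square-with-a-hole input is purely illustrative):

```python
import numpy as np
from skimage.morphology import dilation, erosion, opening, closing, disk

# Small binary image (1 = foreground); disk(1) is the structuring element (the "specific shape")
img = np.zeros((20, 20), dtype=np.uint8)
img[5:15, 5:15] = 1
img[8, 8] = 0              # a hole inside the square
selem = disk(1)

dilated = dilation(img, selem)   # grows the foreground, fills the hole
eroded = erosion(img, selem)     # shrinks the foreground
opened = opening(img, selem)     # erosion then dilation: removes small specks
closed = closing(img, selem)     # dilation then erosion: fills small holes
```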

Image Analysis In Context
