Applied Machine Learning - Microsoft Certificate in Data Science 9a
1. Time Series and Forecasting
Introduction to Time Series
Finance / stock / currency exchange rate / sales forecast / temperature / heart rate / Semicon ET and inline long-term trend / …
The Nature of Time Series Data
Time Series vs. Random or Independent Noise
Autocorrelation: the value at time t is correlated with the values at the following times (t+1, t+2, …)
Autocorrelation
Regular Reporting: Some algorithms can only work with regular reporting
Regular vs. Irregular Reporting
Decomposition – STL Package (Investigating)
STL Package Procedure
- Start with time series data
- Use Loess to find a general trend
- Use Moving Average Smoothing to find fine-grained trend
- Get seasonal / periodic component
- Get final trend by smoothing the nonseasonal trend with Loess
- Get remainder
Lowess / Loess Regression
General trend for smoothing time series data
Idea: fit local polynomial models and merge them together (local for flexibility, polynomial for smoothness)
Step 1: Define the window width m, and do local regression with m nearest neighbors
Step 2: Choose a weight function giving higher weights to nearer points to center
Step 3: Do quadratic Taylor’s polynomial regression considering the weights from Step 2
Step 4: Substitute y_i with the fitted value ŷ_i, which is calculated from the regression at x = x_i
Step 5: Repeat the above for each x_i, then connect the points to get the general trend
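The five steps above can be sketched in plain Python. This is a simplified illustration, not the STL package's implementation: the function name `loess` is hypothetical, the tricube weight function is an assumed (though common) choice for Step 2, and a local linear fit is used instead of the quadratic fit of Step 3 to keep the example short.

```python
def loess(xs, ys, m):
    """Local linear Loess sketch: one weighted least-squares line per point."""
    fitted = []
    for x0 in xs:
        # Step 1: the m nearest neighbours of x0 form the local window.
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:m]
        width = max(abs(xs[i] - x0) for i in nearest) or 1.0
        sw = swx = swy = swxx = swxy = 0.0
        for i in nearest:
            dx = xs[i] - x0  # centre on x0 so the intercept is the fit at x0
            # Step 2: tricube weight (assumed choice), larger near the centre.
            w = (1.0 - (abs(dx) / width) ** 3) ** 3
            sw += w
            swx += w * dx
            swy += w * ys[i]
            swxx += w * dx * dx
            swxy += w * dx * ys[i]
        # Step 3 (simplified): weighted linear regression inside the window.
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)  # degenerate window: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denom
            # Step 4: the fitted value at x0 replaces the observed value.
            fitted.append((swy - slope * swx) / sw)
    # Step 5: the list of fitted values, in order, is the general trend.
    return fitted


xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
trend = loess(xs, ys, m=7)
```

On exactly linear data the local fits reproduce the input, which is a quick sanity check; on noisy data a larger m gives a smoother trend.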
Moving Average Smoothing
Fine-grained trend for smoothing time series data with clear periodicity (after extracting the general trend)
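A minimal sketch of moving average smoothing, assuming a centered window with an odd length equal to the period. The helper name `moving_average` and the toy series are illustrative only.

```python
def moving_average(series, window):
    """Centered moving average; `window` is assumed to be odd."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]


# A period-3 seasonal pattern (+1, 0, -1) on top of a linear trend.
series = [t + s for t, s in zip(range(12), [1.0, 0.0, -1.0] * 4)]
smoothed = moving_average(series, window=3)
```

Because each window covers exactly one full period, the seasonal component averages out and only the trend remains.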
Stationary Remainder / Time Series
Second-order stationarity conditions:
- Constant Mean
- Constant Variance
- An autocovariance that does not depend on time
Technique 1: Boxplots of the data points binned at a higher level of the time hierarchy (e.g., by month)
Technique 2: Boxplots of the data points binned at a higher level of the time hierarchy
Autocorrelation and Autocovariance
Same as Correlation (normalized Covariance), which describes the (linear) relationship between Feature X and Feature Y, except that the series is compared with a lagged copy of itself
Auto = self
ACF = Autocorrelation Function
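As a rough illustration, the sample autocovariance and autocorrelation at a given lag can be computed as below. The function names are hypothetical, and the estimator divides by n (a common convention, assumed here).

```python
def autocovariance(series, lag):
    """Sample autocovariance of `series` at a non-negative integer lag."""
    n = len(series)
    mean = sum(series) / n
    return sum((series[t] - mean) * (series[t + lag] - mean)
               for t in range(n - lag)) / n


def autocorrelation(series, lag):
    """Autocovariance normalized by the lag-0 autocovariance (the variance)."""
    return autocovariance(series, lag) / autocovariance(series, 0)


# A series with period 4 is strongly correlated with itself 4 steps later.
periodic = [0.0, 1.0, 0.0, -1.0] * 25
acf = [autocorrelation(periodic, k) for k in range(5)]
```

For this toy series the ACF is 1 at lag 0, zero at lag 1, strongly negative at lag 2 (half a period) and strongly positive at lag 4 (a full period).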
Working with Time Series
Introduction of models for modeling different types of time series data so that we can do forecasting
The remainder can still contain a time series pattern, which needs to be carefully modeled and removed; the residual left afterwards (the prediction error) should be normally distributed over time
Note that a successful STL decomposition should show the following:
- histogram of remainder is close to normal distribution
- boxplot of remainder at seasonal level (like month) is stable
Moving Average Models MA(q)
Example: Microsoft announces one piece of news every day, and its stock is affected by today’s news and the news from the previous 2 days
The model has only a short memory of previous noise terms
ACF: sharp cutoff after lag q; this can identify whether your data can be modeled as MA(q), and with what order
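The sharp cutoff can be seen from the theoretical ACF of an MA(q) process. The sketch below assumes the usual form x_t = e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q} with white noise e_t; the function name `ma_acf` and the coefficients are illustrative.

```python
def ma_acf(thetas, max_lag):
    """Theoretical ACF of an MA(q) process with coefficients `thetas`."""
    coeffs = [1.0] + list(thetas)  # the coefficient of e_t is 1
    q = len(thetas)
    variance = sum(c * c for c in coeffs)
    acf = []
    for k in range(max_lag + 1):
        if k > q:
            acf.append(0.0)  # no shared noise terms beyond lag q
        else:
            shared = sum(coeffs[i] * coeffs[i + k] for i in range(q - k + 1))
            acf.append(shared / variance)
    return acf


# MA(2): today's value depends on today's noise and the last 2 days' noise.
acf = ma_acf([0.6, 0.3], max_lag=5)
```

The values at lags 1 and 2 are nonzero, and every lag above q = 2 is exactly zero.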
Autoregression Models AR(p)
Today’s value is slightly different from a combination of the last p days’ values
ACF: exponential decay; cannot identify the order
Partial Autocorrelation
The correlation at a given lag that is not accounted for by all of the lags in between
Auto-Regressive Moving Average Model ARMA(p,q)
Used when both the ACF and the PACF show slow decay
Auto-Regressive Integrated Moving Average Model ARIMA(p,d,q)
Differencing
Non-stationary time series can have stationary differences
Higher order trends can be turned into stationary models through repeated differencing
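A small sketch of repeated differencing on a noise-free quadratic trend; the helper name `difference` is hypothetical.

```python
def difference(series, order=1):
    """Apply first differencing `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


quadratic = [t * t for t in range(8)]    # 0, 1, 4, 9, ... (quadratic trend)
once = difference(quadratic)             # still trending: 1, 3, 5, ...
twice = difference(quadratic, order=2)   # constant: 2, 2, 2, ...
```

One difference turns the quadratic trend into a linear one, and a second difference removes the trend entirely, which is the role of d in ARIMA.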
Model Details
Exponentially Weighted Moving Averages Model EWMA / Simple Exponential Smoothing Model SES
Most widely used for business applications / forecasting
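A minimal sketch of simple exponential smoothing, assuming the standard recursion level_t = alpha * x_t + (1 - alpha) * level_{t-1} and initializing the level with the first observation (one of several possible initializations). The function name and data are illustrative.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation."""
    level = series[0]  # assumed initialization: first observation
    levels = [level]
    for x in series[1:]:
        # Recent observations get weight alpha; older ones decay geometrically.
        level = alpha * x + (1 - alpha) * level
        levels.append(level)
    return levels


levels = simple_exponential_smoothing([10.0, 12.0, 11.0, 13.0], alpha=0.5)
forecast = levels[-1]  # the one-step-ahead forecast is the last level
```

A larger alpha reacts faster to recent changes; a smaller alpha smooths more heavily.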
Forecasting in Context
Reference
Time Series Analysis (TSA) in Python - Linear Models to GARCH
2. Spatial Data Analysis
Mobile marketing / smart watch data / oil exploration / real estate pricing / transportation network / crimes data …
Introduction to Spatial Data
Types of Spatial Data
- Points (location only)
- Polygons
- Pixels / Raster (location + count/density shown as colors)
Types of Distance
- Euclidean distance (physical distance; use built-in tool to calculate since earth is round)
- Driving / Walking distance
- Adapted to the local area (like same building)
Distance Matrix
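For the physical distance on a round earth, one common choice is the haversine (great-circle) formula. The sketch below is an illustration rather than the built-in tool the notes refer to; the function name and the mean earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# A quarter of the way around the equator.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
```

Driving or walking distance would instead need a road network, which this formula does not model.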
Visualize the relationship between different features and overlay multiple features in one plot in various ways, such as bubble size or fill color
Kernel Density Estimation KDE
- Go-to method for density / event rate estimation
- “Nonparametric”, meaning that no distribution shape is assumed; instead there is a bump (kernel) on each point
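The "bump on each point" idea can be sketched in one dimension with Gaussian bumps. The kernel choice, the bandwidth value, and the name `gaussian_kde` are assumptions made for illustration.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a density function made of one Gaussian bump per point."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        bumps = (math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return sum(bumps) / norm

    return density


# Three events close together and one far away.
density = gaussian_kde([1.0, 1.2, 0.8, 5.0], bandwidth=0.5)
```

The estimated density is highest where the bumps overlap (around 1.0) and it integrates to 1, so it can be read as an event rate per unit length.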
K-Nearest Neighbour
Localized technique of probability estimation
- Classification by majority vote
- Regression by averaging the neighbours’ values
- Take care
- scale sensitive: consider normalization
- selection of K and weight of distance
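A minimal K-nearest-neighbour sketch covering both uses listed above. It assumes unweighted Euclidean distance on features that are already on comparable scales; the function names and toy data are illustrative.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k training items closest to `query`."""
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]


def knn_classify(train, query, k):
    """Classification: majority vote among the k nearest labels."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k):
    """Regression: average of the k nearest target values."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / k


labeled = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
           ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
priced = [((0, 0), 100.0), ((0, 1), 110.0), ((1, 0), 120.0), ((5, 5), 300.0)]

label = knn_classify(labeled, (0.5, 0.5), k=3)
price = knn_regress(priced, (0.2, 0.2), k=3)
```

Weighting the votes by inverse distance is a common extension and is not shown here.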
Working with Spatial Data
Spatial Poisson Processes
Probability estimation of occurrence count in an area in a period, based on Poisson distribution, which is a discrete probability distribution
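As a small worked example, the Poisson probability of observing k events when the expected count is `rate` can be computed directly. Here `rate` is assumed to already combine the event intensity with the area and the period.

```python
import math


def poisson_pmf(k, rate):
    """P(N = k) for a Poisson count with expected value `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


# Expected 2 events in the area over the period.
p_none = poisson_pmf(0, 2.0)
p_at_least_one = 1.0 - p_none
```

The probabilities over all k sum to 1, and the chance of at least one event is one minus the chance of none.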
Variogram
Estimate the (label) covariance between samples as a function of their spatial separation, just like the ACF and PACF do for time series
Input data is labeled
Consider overall data in dataset
- Reference
- Semi-Variogram: Nugget, Range and Sill
- Estimation and Modeling of Spatial Correlations (about the second-order stationary assumption)
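A rough sketch of an empirical semivariogram, assuming the classical estimator gamma(h) = sum of (z_i - z_j)^2 over pairs at separation h, divided by 2 N(h). For brevity the positions are one-dimensional and separations are grouped into bins of width `bin_width`; the function name and data are illustrative.

```python
from collections import defaultdict


def empirical_semivariogram(positions, values, bin_width):
    """Map each separation bin to half the mean squared value difference."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            lag_bin = int(abs(positions[i] - positions[j]) // bin_width)
            sums[lag_bin] += (values[i] - values[j]) ** 2
            counts[lag_bin] += 1
    return {b: sums[b] / (2 * counts[b]) for b in sorted(counts)}


positions = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
gamma = empirical_semivariogram(positions, values, bin_width=1.0)
```

Nearby samples have small semivariance and distant samples have larger semivariance; fitting a model curve to these points is what yields the nugget, range and sill.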
Kriging / Gaussian Process / Spatial Regression
Overall technique of probability estimation based on Variogram providing the covariance k
k can be modelled by arbitrary covariance function in Variogram stage
Interpolation method for estimating the property of unsampled location, so can get the complete map
Spatial Data in Context
3. Text Analytics
Summary of text / compare between text or classification
Introduction to Text Analytics
Word Frequency
- Frequency plot
- Cumulative plot to examine the cleaned up dataset
Stemming
Porter’s Algorithm, described below, targets English only
connection, connected, connective, connecting –> connect
- Porter’s Algorithm
- V is one or more vowels (A, E, I, O, U)
- C is one or more consonants
- All words are of the following form
- [C](VC){m}[V], where brackets mark optional parts and (VC){m} means VC repeated m times
- For each word, we check whether it obeys a condition, and shorten or lengthen it accordingly
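The word form used by Porter's Algorithm can be illustrated by computing m, the number of VC repetitions in a word. This sketch uses the simplified vowel set from the notes (A, E, I, O, U) and ignores the special handling of "y"; the function names are hypothetical.

```python
VOWELS = set("aeiou")  # simplified: special treatment of 'y' is not modelled


def cv_pattern(word):
    """Collapse a word into alternating runs, e.g. 'trouble' -> 'CVCV'."""
    pattern = ""
    for ch in word.lower():
        kind = "V" if ch in VOWELS else "C"
        if not pattern or pattern[-1] != kind:
            pattern += kind
    return pattern


def measure(word):
    """Number m of VC pairs when the word is written as [C](VC){m}[V]."""
    return cv_pattern(word).count("VC")


measures = {w: measure(w) for w in ["tree", "trouble", "oats", "troubles"]}
```

The stemming rules are then conditioned on m, for example only stripping a suffix when the remaining stem has m greater than some threshold.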
Feature Hashing (Dimensionality Reduction)
Fast and space-efficient way of vectorizing features, by applying a hash function to the features and using their hash values as indices directly.
Also called hashing (kernel) trick.
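A minimal sketch of feature hashing for a bag of words. It uses MD5 from `hashlib` only to get a hash that is stable across runs (an assumption of this example; production implementations typically use a faster hash and a sign bit to reduce collision bias), and the function name is hypothetical.

```python
import hashlib


def hash_features(tokens, n_buckets):
    """Count tokens into a fixed-length vector indexed by hash value."""
    vector = [0] * n_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % n_buckets] += 1  # the hash is the index
    return vector


vec = hash_features("the cat sat on the mat".split(), n_buckets=8)
```

No vocabulary needs to be stored, and the vector length is fixed up front; the price is that different words can collide in the same bucket.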
Working with Text
Calculating Word Importance by TF-IDF
TF = Term Frequency (the number of times you see a word in a document)
IDF = Inverse Document Frequency (terms found in fewer documents get a higher weight)
TF-IDF is a key factor used in search engines
- TF-IDF is high when
- the term appears many times in few documents
- TF-IDF is low when
- the term appears in almost all documents
- the term does not appear often
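A small sketch of TF-IDF under one common definition, raw term count times log(N / document frequency). Many variants exist (smoothed IDF, log-scaled TF), so treat the exact formula, the function name and the toy corpus as assumptions.

```python
import math


def tf_idf(term, doc, corpus):
    """Raw term count in `doc` times log(N / number of docs with the term)."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if df else 0.0


corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the stock price fell".split(),
]
score_the = tf_idf("the", corpus[0], corpus)  # appears in every document
score_cat = tf_idf("cat", corpus[0], corpus)  # appears in two documents
score_mat = tf_idf("mat", corpus[0], corpus)  # appears in one document
```

The word "the" is frequent but appears everywhere, so its score is zero, while "mat" is rare across documents and scores highest.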
Introduction to Natural Language Processing
Text Analytics in Context
4. Image Analysis
Photographs / Security cameras / Check reader / Medical images / Art work analysis …
Introduction to Image Analysis
Read / Plot Image
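- misc image-reading function from scipy (outputs a numpy array whose rows and columns match the image size); note that the scipy.misc image functions have been removed from recent SciPy releases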
- imshow from matplotlib.pyplot
- glob.glob for multiple images reading
Image Properties
- Examine the distribution of gray scale
- Histogram (ideal image has nearly uniform distribution)
- CDF (ideal image has a straight line)
- Adaptive Histogram Equalization to improve contrast
- The histogram equalization algorithm attempts to adjust the pixel values in the image to create a more uniform distribution
- exposure.equalize_adapthist from skimage
- Before and After Equalization
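The idea behind equalization can be sketched with plain global histogram equalization on a flat list of gray levels. This is not the adaptive variant in skimage (which applies a contrast-limited version of the idea on local tiles); the function name and the 8-pixel "image" are illustrative.

```python
def equalize(pixels, levels=256):
    """Map each gray level through the scaled CDF of the image."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = len(pixels)
    # A uniform histogram corresponds to a straight-line CDF.
    return [round((levels - 1) * cdf[p] / total) for p in pixels]


dark = [10, 10, 11, 11, 12, 12, 13, 13]  # low-contrast, dark image
flat = equalize(dark)
```

The four nearly identical dark levels are spread across the full 0 to 255 range while keeping their order, which is what improves contrast.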
Image Manipulation
- Resize by misc.imresize from scipy (removed from recent SciPy releases)
- Rotate by rotate from scipy.ndimage (the older scipy.ndimage.interpolation path is deprecated)
Blurring and Denoising
Pre-whitening together with Denoising can improve the sobel edge detection result –> clearer edge
The reason may be that it covers and removes the unnecessary / meaningless portion of the image; the same happens in time series analysis when computing the cross-correlation function of two series
- Pre-whitening to add noise
- Denoising by gaussian_filter / median_filter from scipy.ndimage.filters
Working with Images
Feature Extraction
Sobel Edge Detection
Detect edges by looking for single-direction gradients within the selected area
Viola and Jones Method in Course 7
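A minimal Sobel sketch on a nested-list image, assuming the standard 3x3 horizontal and vertical kernels and leaving border pixels at zero. The function name and the toy image are illustrative.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal gradient
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # vertical gradient


def sobel_magnitude(image):
    """Gradient magnitude at each interior pixel; borders stay 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out[r][c] = math.sqrt(gx * gx + gy * gy)
    return out


# Left half dark, right half bright: a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
edges = sobel_magnitude(image)
```

Only the two columns next to the brightness jump get a large response; flat regions on either side stay at zero.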
Segmentation
Remove noise or unwanted portion
Simplest way –> threshold (remove the points under or over the threshold)
Harris Corner Detection
Compute the Q matrix in the E function, which represents an ellipse, and detect a corner when Q has 2 large eigenvalues, which correspond to short principal axes of the ellipse
Introduction to Mathematical Morphology
Dilation and Erosion
Fill (dilation) or remove (erosion) the center pixel under a structuring element of a specific shape
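A short sketch of binary dilation and erosion, assuming a 3x3 square structuring element that is clipped at the image border. The function names and the 5x5 test image are illustrative.

```python
def _neighbourhood(image, r, c):
    """Pixel values under a 3x3 square centred at (r, c), clipped at borders."""
    rows, cols = len(image), len(image[0])
    return [image[i][j]
            for i in range(max(r - 1, 0), min(r + 2, rows))
            for j in range(max(c - 1, 0), min(c + 2, cols))]


def dilate(image):
    """Fill the centre pixel if any pixel under the element is set."""
    return [[max(_neighbourhood(image, r, c)) for c in range(len(image[0]))]
            for r in range(len(image))]


def erode(image):
    """Keep the centre pixel only if every pixel under the element is set."""
    return [[min(_neighbourhood(image, r, c)) for c in range(len(image[0]))]
            for r in range(len(image))]


dot = [[0] * 5 for _ in range(5)]
dot[2][2] = 1                  # a single foreground pixel
grown = dilate(dot)            # becomes a 3x3 block
restored = erode(grown)        # shrinks back to the single pixel
```

Dilation grows foreground regions and erosion shrinks them, so eroding an isolated pixel removes it entirely, which is one way to clean up speckle noise.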