Querying Data with Transact-SQL - Microsoft Certificate in Data Science 2
1. Introduction to Transact-SQL
-
What is Transact-SQL
- Structured Query Language (SQL)
- T-SQL is for SQL server and Azure SQL Database
- SQL is declarative, not procedural – describe what you want, don’t specify steps
-
Relational Databases (like the relationship used in Power Pivot between different tables)
-
Entities are represented as relations (tables), in which their attributes are represented as domains (columns)
- Most relational databases are normalized, with relationships defined between tables through primary and foreign keys
-
- Schemas and Object Names
-
Schemas are namespaces for database objects (tables)
-
Fully-qualified names ([ ] means optional)
-
[server_name.][database_name.][schema_name.]object_name
- in most cases we abbreviate into
schema_name.object_name
-
-
- Types
- Data Manipulation Language (DML) – for querying and modifying data (mainly focus on)
- Data Definition Language (DDL) – for defining database objects
- Data Control Language (DCL) – for assigning security permissions
- Data Manipulation Language (DML) – for querying and modifying data (mainly focus on)
The SELECT Statement
-
Sequence of Statement Running
- Main logic here is that grab data from certain source, and do conditioning, filtering and sorting
-
Basic Examples
-
-- All columns (not recommended) SELECT * FROM Production.Product;
-
-- Specific columns SELECT Name, ListPrice FROM Production.Product;
-
-- Expressions and Aliases SELECT Name AS Product, ListPrice * 0.9 AS SalePrice FROM Production.Product
-
Working with Data Types
- Implicit Conversion: compatible data types can be automatically converted between subtypes
- Explicit Conversion: requires an explicit conversion function
CAST
CONVERT
/PARSE
has options to formatting dataTRY_CAST
/TRY_CONVERT
/TRY_PARSE
return NULL instead of stop due to errorSTR
number to string
Working with NULLs
-
NULL represents a missing or unknown value
-
ANSI behavior for NULL values
- The result of any expression containing a NULL value is NULL
2 + NULL = NULL
'MyString:' + NULL = NULL
- Equality comparisons always return false for NULL values
NULL = NULL
returns false (unknown cannot be equal to unknown)NULL IS NULL
returns true (unknown is unknown)
- The result of any expression containing a NULL value is NULL
-
ISNULL
column / variable, value-
Returns value if the column or variable is NULL
-
-- If cannot CAST AS integer , return 0 instead of NULL -- NULL numbers = 0 SELECT Name, ISNULL(TRY_CAST(Size AS integer),0) AS NumericSize FROM SalesLT.Product;
-
-- If Color/Size is NULL, return blank instead of NULL -- NULL strings = blank string SELECT ProductionNumber, ISNULL(Color,'') + ', ' + ISNULL(Size,'') AS NumericSize FROM SalesLT.Product;
-
NULLIF
column/variable, value-
-- If Color is Multi, make it as NULL is output -- Multi color = NULL SELECT Name, NULLIF(Color, 'Multi') AS SingleColor FROM SalesLT.Product;
-
-
CASE
, likeif-else
in python-
/* Select Name, and then go to the other column: if SellEndDate is NULL, then output On Sale (can use ISNULL function), otherwise Discontinued, in SalesStatus column */ SELECT Name, CASE WHEN SellEndDate IS NULL THEN 'On Sale' ELSE 'Discontinued' END AS SalesStatus FROM SalesLT.Product;
-
/* If Size is number, output number; if Size is NULL, output n/a */ SELECT Name, CASE Size WHEN 'S' THEN 'Small' WHEN 'M' THEN 'Medium' WHEN 'L' THEN 'Large' WHEN 'XL' THEN 'Extra-Large' ELSE ISNULL(Size, 'n/a') END AS ProductSize FROM SalesLT.Product;
-
-
2. Querying Tables with SELECT
Removing Duplicates
-
SELECT ALL
– return all rows -
SELECT DISTINCT
– return unique only -
SELECT DISTINCT Color FROM Production.Product;
Sorting Resulting
-
/* ORDER BY Can use unselected columns for sorting */ SELECT ProductCategory AS Category, ProductName FROM Production.Product ORDER BY Category, Price DESC;
-
/* TOP -- limiting sorted results SELECT TOP (N) | TOP (N) Percent SELECT TOP (N) WITH TIES */ SELECT TOP 100 Name, ListPrice FROM SalesLT.Product ORDER BY ListPrice DESC;
-
/* OFFSET and FETCH -- paging through result like bbs pages (dependent on ORDER BY clause) */ SELECT Name, ListPrice FROM SalesLT.Product ORDER BY ProductNumber OFFSET 10 ROWS FETCH FIRST 10 ROWS ONLY;
Filtering and Predicates
-
LIKE
with glob-
-- List info about products taht have a product number beginning FR SELECT productnumber, Name, ListPrice FROM SalesLT.Product WHERE ProductNumber LIKE 'FR%'; -- % for no matter length
-
-- Filter the previous query to ensure that the product number contains two sets of two digits SELECT Name, ListPrice FROM SalesLT.Product WHERE ProductNumber LIKE 'FR-_[0-9][0-9]_-[0-9][0-9]'; -- _ for any one character
-
-
Combining Multiple Conditions
-
-- Find products that have a category ID of 5,6,7 and have a sell end date SELECT ProductCategoryID, Name, ListPrice, SellEndDate FROM SalesLT.Product WHERE ProductCategoryID IN (5,6,7) AND SellEndDate IS NULL;
-
3. Querying Multiple Tables with Joins (Columns)
-
Introductions to Joins – combine rows from multiple tables by specifying matching criteria
-
Usually based on on primary key - foreign key relationships (like VLOOKUP)
- Collection of tables of Employee and SalesOrder
-
-- JOIN statement in FROM clause -- SQL will join first, and then select from new table SELECT ... FROM Table1 JOIN Table2 ON <on_predicate>;
-
Inner Joins
Default of JOIN statement, starting with Cartesian product then adds filter
-
-- Normal Inner Join SELECT p.Name As ProductName, c.Name AS Category FROM SalesLT.Product AS p JOIN SalesLT.ProductCategory As c ON p.ProductCategoryID = c.ProductCategoryID;
-
-- Join more than 2 tables SELECT * FROM SalesLT.SalesOrderHeader AS oh JOIN SalesLT.SalesOrderDetail AS od ON od.SalesOrderID = oh.SalesOrderID JOIN SalesLT.Product AS p ON od.ProductID = p.ProductID
-
-- Multiple join predicates SELECT * FROM SalesLT.SalesOrderHeader AS oh JOIN SalesLT.SalesOrderDetail AS od ON od.SalesOrderID = oh.SalesOrderID JOIN SalesLT.Product AS p ON od.ProductID = p.ProductID AND od.UnitPrice < p.ListPrice -- Like what does WHERE do as a filtering
Outer Joins
Starting with Cartesian product, filtered, adds back non-matching rows (with NULL placeholders)
- Preserved one table –> matches from the other –> additional rows (NULL) added to results for non-matched rows
- Return all employees, and sales info of those has order amount; employee has no order amount will has NULL in that column
-
-- Left outer join SELECT * FROM SalesLT.Product AS p LEFT JOIN SalesLT.SalesOrderDetail AS od ON p.ProductID = od.ProductID LEFT JOIN SalesLT.SalesOrderHeader AS oh ON od.SalesOrderID = oh.SalesOrderID
-
-- Outer join combined with inner join SELECT * FROM SalesLT.Product AS p LEFT JOIN SalesLT.SalesOrderDetail AS od ON p.ProductID = od.ProductID LEFT JOIN SalesLT.SalesOrderHeader AS oh ON od.SalesOrderID = oh.SalesOrderID INNER JOIN SalesLT.ProductCategory AS c ON p.ProductCategoryID = c.ProductCategoryID
Cross Joins
Cartesian product only, which outputs all possible combinations
Self Joins
Compare rows in same table to each other – create one more instances as same as the original table using alias, then left join