DAX introduction
This article shows basic differences between SQL and DAX.
Using the DAX studio query builder, you will soon find out DAX is not that difficult, it is just different than SQL.
Simpliest Query
You need to use EVALUATE
keyword to return data. In it’s simpliest form, you just tell DAX what table you want to query.
SELECT * FROM DIM.AIRLINES
EVALUATE ('DIM AIRLINES')
Count Rows
It is easy to retrieve number of rows of a table, just use COUNTROWS
function.
SELECT COUNT(*) as [Number of Rows]
FROM DIM.AIRLINES
EVALUATE
ROW (
"Number of Rows",
COUNTROWS ( 'DIM AIRLINES' )
)
Filtering
If you want only a part of the records in a table, use FILTER
function:
SELECT *
FROM DIM.AIRLINES
WHERE COUNTRY_OF_ORIGIN = 'Italy'
EVALUATE
FILTER (
'DIM AIRLINES',
[COUNTRY_OF_ORIGIN] = "Italy"
)
If you need more conditions, just use &&
for AND and ||
for OR. And as many parentheses as you wish.
Grouping / Distinct values
Use SUMMARIZECOLUMNS
to get distinct values:
SELECT AIRLINE_TYPE, COUNTRY_OF_ORIGIN
FROM DIM.AIRLINES
GROUP BY AIRLINE_TYPE, COUNTRY_OF_ORIGIN
EVALUATE
SUMMARIZECOLUMNS(
'DIM AIRLINES'[AIRLINE_TYPE],
'DIM AIRLINES'[COUNTRY_OF_ORIGIN]
)
Measures
Nice, distinct values, but how do I add a measure value to those? Like this:
SELECT a.AIRLINE_NAME,
AVG(d.DELAY_MINUTES) AS [AvgDelay]
FROM DIM.AIRLINES AS a
JOIN FACT.DEPARTURES AS d
ON a.AIRLINE_ID = d.AIRLINE_ID
GROUP BY a.AIRLINE_NAME;
EVALUATE
SUMMARIZECOLUMNS (
'DIM AIRLINES'[AIRLINE_NAME],
"AVG Delay in Minutes", [AVG Delay in Minutes]
)
Notice the difference between SQL and DAX. You need to join two tables in SQL, in DAX it is much easier to retrieve the data. The query uses defined relationships from the model.
Sometimes you may need a value of a measure, without any context / filters. Use ROW
function to display it. This DAX query returns value of Total Revenue measure:
EVALUATE
ROW ( "Total Revenue", [Total Revenue] )
Sorting
When comparing data, sets match
expectation needs sorted data. This is very similar, just use ORDER BY
as in SQL. You can use any column name(s).
SELECT a.AIRLINE_NAME,
AVG(d.DELAY_MINUTES) AS [AvgDelay]
FROM DIM.AIRLINES AS a
JOIN FACT.DEPARTURES AS d
ON a.AIRLINE_ID = d.AIRLINE_ID
GROUP BY a.AIRLINE_NAME
ORDER BY AvgDelay DESC;
EVALUATE
SUMMARIZECOLUMNS (
'DIM AIRLINES'[AIRLINE_NAME],
"AVG Delay in Minutes", [AVG Delay in Minutes]
)
ORDER BY [AVG Delay in Minutes] DESC
Limit Number of Rows
Here is an example that gets top three airlines with biggest average delay.
SELECT TOP (3)
a.AIRLINE_NAME,
AVG(d.DELAY_MINUTES * 1.0) AS [Avg Delay in Minutes]
FROM DIM.AIRLINES AS a
JOIN FACT.DEPARTURES AS d
ON d.AIRLINE_ID = a.AIRLINE_ID
GROUP BY a.AIRLINE_NAME
ORDER BY [Avg Delay in Minutes] DESC;
EVALUATE
TOPN (
3,
SUMMARIZECOLUMNS (
'DIM AIRLINES'[AIRLINE_NAME],
"AVG Delay in Minutes", [AVG Delay in Minutes]
),
[AVG Delay in Minutes],
DESC
)
ORDER BY [AVG Delay in Minutes] DESC
Trick
This example shows two tricks. First one uses function SWITCH
- it compares hardcoded value TRUE
with two conditions and returns text.
The second returns hardcoded value within expected range. Both queries are DAX.
EVALUATE
ROW (
"Revenue Actual",
SWITCH (
TRUE,
[Total Revenue] > 500000000, "to high",
[Total Revenue] < 100000000, "to low",
"within expected range"
)
)
EVALUATE
ROW (
"Revenue Expected",
"within expected range"
)
You can use similar solutions as a fallback if you have troubles to express your expectation using built-in expectations in CAT. Issue similar two queires and use sets match
expectation.