An Example of Using LEAD and LAG Functions from SQL Server 2012

I recently encountered a technical interview question and it was something like this:

A company has its door access logging system and the company policy dictates that every employee’s card-in/card-out records shall reflect his/her true presence in or absence from office. But somehow, some employees skip cardings or tailgate behind others. Therefore, the logging system may have messy records with multiple “card-in’s” before one “card-out” or records that cannot be paired. Your task is to write a script to calculate the exact amount of time that the employee is in office.

As I’ve dwelled too long on the database administration side, I actually had no freaking clue how to write an effcient script without resorting to the RBAR appoach, so I acknowledged my defeat presented my hnoest thoughts and admitted that I would need Google for a lot more help 🙂

Now I am back at my shabby cozy little cubicle from my potential employer’s shiny high-end downtown office building, I did start Googling and found out the two newly-introduced functions LEAD and LAG from SQL Server 2012 might just be the solution that I was looking for.

First of all, for the sake of explanation, I fabricate some fake entry logs with courtesy again to two of the most prominent NBA’s superstars:

[sourcecode language=”sql”]
USE TestDB
GO

CREATE TABLE CardingLogs
(
CardingTime DATETIME, –time holder get carded by the system
Name VARCHAR(50), –name of the card holder
InOut VARCHAR(4) –with ‘in’ meaning card-in and ‘out’ meaning card-out
)
GO

INSERT CardingLogs VALUES
(‘2014-03-30T07:30:00’, ‘Lebron James’, ‘in’),
(‘2014-03-30T07:45:00’, ‘Lebron James’, ‘in’),
(‘2014-03-30T08:00:00’, ‘Lebron James’, ‘in’),
(‘2014-03-30T09:00:00’, ‘Lebron James’, ‘out’),
(‘2014-03-30T10:00:00’, ‘Lebron James’, ‘out’),
(‘2014-03-30T11:00:00’, ‘Lebron James’, ‘out’),
(‘2014-03-30T12:00:00’, ‘Lebron James’, ‘out’),
(‘2014-03-30T14:00:00’, ‘Lebron James’, ‘in’),
(‘2014-03-30T15:00:00’, ‘Lebron James’, ‘in’),
(‘2014-03-30T16:00:00’, ‘Lebron James’, ‘out’),
(‘2014-03-30T17:00:00’, ‘Lebron James’, ‘in’)
GO

INSERT CardingLogs VALUES
(‘2014-03-30T07:30:00’, ‘Dwyane Wade’, ‘out’),
(‘2014-03-30T07:45:00’, ‘Dwyane Wade’, ‘in’),
(‘2014-03-30T08:00:00’, ‘Dwyane Wade’, ‘out’),
(‘2014-03-30T09:00:00’, ‘Dwyane Wade’, ‘in’),
(‘2014-03-30T10:00:00’, ‘Dwyane Wade’, ‘out’),
(‘2014-03-30T11:00:00’, ‘Dwyane Wade’, ‘in’),
(‘2014-03-30T12:00:00’, ‘Dwyane Wade’, ‘out’),
(‘2014-03-30T14:00:00’, ‘Dwyane Wade’, ‘out’),
(‘2014-03-30T15:00:00’, ‘Dwyane Wade’, ‘in’),
(‘2014-03-30T16:00:00’, ‘Dwyane Wade’, ‘out’),
(‘2014-03-30T17:00:00’, ‘Dwyane Wade’, ‘out’)
GO
[/sourcecode]

And then if you’d like to see what’s inside the table with SELECT * FROM CardingLogs, you’ll get the below output:

As illustrated by the red arrow, the programming logic is to find if the current ‘out’ record is ‘leading’ an immediate ‘in’ record behind; or put it another way, if the current ‘in’ is ‘lagged’ by an immediate ‘out’ record in front of itself. And hence the function LEAD() and LAG() come in handy. My approach would be use the lag function to mark all the ‘out’ rows (in green) which can be described as the first ‘out’ after an ‘in’ (or a series of ‘in’s’). Below query

[sourcecode language=”sql”]
SELECT *,
IsFO =
CASE
WHEN (InOut = ‘out’ AND LAG(InOut, 1, 0) OVER (PARTITION BY Name ORDER BY CardingTime ASC) = ‘in’) THEN 1
ELSE 0
END
FROM CardingLogs
[/sourcecode]

shows the “marked” rows as follows:
MarkedRows1

Now what we need to do is to calculate the time diffierences between all rows marked with ‘1’ and their immediate preceeding rows marked with ‘0’ as illustrated by the green box in the above resultset. And which function can you think of that can do this job with a breeze? You betcha it is the LAG() again! And the final script with a little help from our buddy common table expression is like this:

[sourcecode language=”sql”]
WITH TMP AS
(
SELECT DATEDIFF(MINUTE, LAG(CardingTime, 1, 0) OVER(PARTITION BY Name ORDER BY CardingTime ASC), CardingTime) AS TimeInOffice,
*,
IsFO =
CASE
WHEN (InOut = ‘out’ AND LAG(InOut, 1, 0) OVER (PARTITION BY Name ORDER BY CardingTime ASC) = ‘in’) THEN 1
ELSE 0
END
FROM CardingLogs
)
SELECT Name, SUM(TimeInOffice) AS TotalTimeInOffice FROM tmp
WHERE IsFO = 1
GROUP BY Name
[/sourcecode]

The final resultset:

CardingResultSet

And if you’d like a better understanding of LEAD and LAG function, do not hesitate to consult Microsoft’s official explanation of them here.


Posted

in

,

by

Tags:

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *