a god of the gaps

by Jan Theodore Galkowski

This little study was prompted by a question posed by baroldgene on the Ars Technica forums. He said:
I have a database full of start times and end times for shifts at a computer lab. I want to calculate dead time (time with no shifts scheduled). I could easily do this with looping PHP and SQL statements but this causes lots of overhead and we already have functions that take 10 minutes or longer to run, so I'm trying to do it in as few SQL queries as possible.

I've thought of trying to get an entire schedule out and process it in PHP, but I'm not sure if thats better than using a complex SQL query. To make matters more complicated their can be shifts that overlap. (this is due to the idea that there are multiple positions in a lab. i.e. we can man 2 positions and the schedules don't necessarily line up).

So does anyone have any interesting thoughts/ideas on how I could process this information? I was trying to use a complex SQL query, but I can't seem to find a good way to get all the information that I want out.


Now, baroldgene did not, at first, describe the schemas used, so i made some assumptions and, using MySQL as the realization of a relational database, began with:


CREATE TABLE shifts (
id INT NOT NULL,
whenCreated TIMESTAMP,
startOfShift TIMESTAMP,
endOfShift TIMESTAMP
) ;


That first "whenCreated" TIMESTAMP is an attempt to ground MySQL's penchant for making first TIMESTAMP fields auto-updating, to preserve the actual carriers of information, "startOfShift" and "endOfShift". The "id" field is chosen is a shift designator, and is tied to, say, an individual doing a job. Shifts may overlap. A shift in progress is designated by making "endOfShift" be NULL. Shifts in progress are not considered by this code. the proper thing to do if they were is to insert a placeholder TIMPSTAMP in their "endOfShift" fields when the calculation is run, probably the value provided by "NOW()".

After data is loaded, it's advisable to add:


ALTER TABLE shifts ADD PRIMARY KEY( id ) ;
ALTER TABLE shifts ADD INDEX ( startOfShift, endOfShift ) ;
ANALYZE TABLE shifts ;
OPTIMIZE TABLE shifts ;


For purposes of illustration, I'm using this data set for testing:

[Test data for gaps determination]

The code itself is:


DROP TABLE IF EXISTS times ;
CREATE TABLE times (
id INT,
whenCreated TIMESTAMP NOT NULL,
eventTime TIMESTAMP NOT NULL,
openOrClosed INT NOT NULL DEFAULT 0
) ;

INSERT INTO times ( id, eventTime, openOrClosed )
SELECT id, DATE_ADD( startOfShift, INTERVAL 1 SECOND), 1 FROM shifts
WHERE endOfShift IS NOT NULL
ORDER BY startOfShift ;

INSERT INTO times ( id, eventTime, openOrClosed )
SELECT id, endOfShift, -1 FROM shifts
WHERE endOfShift IS NOT NULL
ORDER BY endOfShift ;

ALTER TABLE times ADD PRIMARY KEY (eventTime, openOrClosed, id) ;

ANALYZE TABLE times ;
OPTIMIZE TABLE times ;

DROP TABLE IF EXISTS summations ;

CREATE TABLE summations AS
SELECT A.id, A.eventTime, A.openOrClosed, SUM(B.openOrClosed) AS nestingDepth
FROM times A, times B
WHERE B.eventTime <= A.eventTime
GROUP BY A.eventTime
ORDER BY A.eventTime ;

ALTER TABLE summations ADD PRIMARY KEY (eventTime) ;

ANALYZE TABLE summations ;
OPTIMIZE TABLE summations ;

CREATE TABLE gaps AS
SELECT A.id AS startingID, B.id AS endingID, A.eventTime AS gapStart, DATE_SUB(B.eventTime, INTERVAL 1 SECOND) as gapEnd
FROM summations A, summations B
WHERE A.nestingDepth = 0
AND 1 < (UNIX_TIMESTAMP(B.eventTime) - UNIX_TIMESTAMP(A.eventTime))
AND B.eventTime = (SELECT MIN(C.eventTime) FROM summations C
WHERE A.eventTime < C.eventTime)
ORDER BY A.eventTime ;

ALTER TABLE gaps ADD PRIMARY KEY (gapStart) ;


A "times" table is built, having all the "startOfShift" and "endOfShift" times dumped into it, with "openOrClosed" indicating whether the time is the opening of a shift or its close. The procedure works only if times are distinct, so an offset of one second is added to shift starts to assure they are, as one person might begin a shift the same moment another finished.

The "times" table for the sample is shown below:

[Test data for gaps determination]

From "times", a "summations" table is built which features a "nestingDepth". Because shifts can overlap, even in ways which parenthesized computational expressions cannot, it is import to constrain gap seeking to places where all shifts are accounted for, that is, where "nestingDepth" is zero. The production of "nestingDepth" is done using a GROUP BY sum.

The "summations" table for the sample is shown below:

[Test data for gaps determination]

Finally, a "gaps" table is produced giving the starting time and ending time of any gaps in coverage of shifts. The "gaps" table resulting from the sample is:

[Resulting gaps]

Click to return to the main page.

Last changed on 17th May 2007, The Smalltalk Idiom