Skip to content

The Dummy Programmer

Stories of daily programming

  • Home
  • My other dummy projects
  • Games
    • Space Y: An Asteroids clone
  • Services
    • What’s my user agent
    • What’s my IP address
  • About
  • Toggle search form

Merge two or more historical tables with Sql Server T-SQL – Part 2

Posted on April 9, 2024April 9, 2024 By No Comments on Merge two or more historical tables with Sql Server T-SQL – Part 2

In the previous post we saw how to merge two or more historical tables with a T-Sql script. We will now propose a different solution to the problem that also takes into account the presence of gaps in the validity periods of the records.

We’ll start from the same test database we used for the previous post, so you can go and get the same SQL script to create the test tables we need.

Another solution…

First of all, in this solution, we are going to put all start dates and all end dates in a row. Below is the query that performs this task:

WITH DATES
AS
(
	SELECT E.ID AS EmployeeID, E.HireDate AS DATE_CHANGE
	FROM Employees E
	UNION
	SELECT E.ID, DATEADD(DD, 1, E.EndDate)
	FROM Employees E
	WHERE E.EndDate IS NOT NULL
	UNION
	SELECT HC.EmployeeID, HC.FromDate
	FROM HistEmpCompany HC
	UNION
	SELECT HC.EmployeeID, DATEADD(DD, 1, HC.ToDate)
	FROM HistEmpCompany HC
	WHERE HC.ToDate IS NOT NULL
	UNION
	SELECT HD.EmployeeID, HD.FromDate
	FROM HistEmpDept HD
	UNION
	SELECT HD.EmployeeID, DATEADD(DD, 1, HD.ToDate)
	FROM HistEmpDept HD
	WHERE HD.ToDate IS NOT NULL
)
SELECT *
INTO #DATES
FROM DATES;

In this way we have a list of dates when something, in this step we don’t know what, has been changed. Note the call to DATEADD in the queries that extract the end date of the validity periods: this is necessary because something could change also the day after the day following the end of a period. This could create duplicates rows if adjacent periods exist, but the UNION operator will remove them.

The next step involves assigning an ID to each row of the table. The ID will be unique within an employee, so we may find the same ID across different employees. This ID will be used to assign each start date an end date, calculated as the start date minus one day of the following period.

Here is the code for this second step:

WITH DATES_WITH_ID
AS
(
	SELECT	ROW_NUMBER() 
                     OVER(PARTITION BY D.EmployeeID ORDER BY D.DATE_CHANGE) AS ROW_ID, 
		D.EmployeeID,
		D.DATE_CHANGE
	FROM #DATES D
)
SELECT *
INTO #DATES_WITH_ID
FROM DATES_WITH_ID

And now we are ready to rebuild our validity periods:

SELECT T1.ROW_ID, 
       T1.EmployeeID, 
       T1.DATE_CHANGE AS DATE_START, 
       DATEADD(DD, -1, T2.DATE_CHANGE) AS DATE_END
INTO #PERIODS
FROM #DATES_WITH_ID T1
LEFT JOIN #DATES_WITH_ID T2 ON T1.EmployeeID = T2.EmployeeID
				AND T1.ROW_ID = (T2.ROW_ID - 1)

If you execute the following query:

SELECT *
FROM #PERIODS
ORDER BY EmployeeID, DATE_START

and look at the table #PERIODS, you will see the following:

The content of table #PERIODS

As said before, we have built a table with all the employee periods. Now for each period we must take the values we need from historical tables. To do this simply take the value at the beginning of the period, as in the following query:

DECLARE @MAX_DATE	DATETIME = '99991231'

SELECT E.ID, T.EmployeeID, 
       E.LastName, E.LastName, HC.CompanyName, HD.DeptName, 
       T.DATE_START, T.DATE_END
FROM #PERIODS T
LEFT JOIN Employees E		ON T.EmployeeID = E.ID 
          AND T.DATE_START BETWEEN E.HireDate AND ISNULL(E.EndDate, @MAX_DATE)
LEFT JOIN HistEmpCompany HC	ON T.EmployeeID = HC.EmployeeID 
          AND  T.DATE_START BETWEEN HC.FromDate AND ISNULL(HC.ToDate, @MAX_DATE)
LEFT JOIN HistEmpDept HD	ON T.EmployeeID = HD.EmployeeID 
          AND  T.DATE_START BETWEEN HD.FromDate AND ISNULL(HD.ToDate, @MAX_DATE)
ORDER BY T.EmployeeID, T.DATE_START

This is the final result:

First query output

Let’s analyze the result

Let’s look at some special cases in our final result. Take for example row number 2 where we have a NULL in the company field in period 2005-05-01 to 2006-04-30. This is correct, since if you look at the historical table HistEmpCompany you will see that has a gap for employee Davolio from 2005-05-01 to 2006-04-30.

If you look carefully, you will see that there are rows with the ID, Last name, First name fields NULL. Why this? We might consider this an anomaly in source data. Employees King and Fuller were hired on 2016-05-09 and 2005-02-01 respectively but their historical tables start from dates that are prior to the their hire date. This explains their gaps in the columns containing ID, Last name and Full name.

I also want you to point out line 15: the employee King quit on 2023-12-31 but his company’s history is still open (that is has NULL in the field ToDate). Row 15 correctly report this since it has all fields but company set to NULL. The field “company” correctly contains the value “Subcompany 1”.

Just one last thing before we close: execute this update to set the end date of the last record in company history for employee King:

UPDATE HistEmpCompany SET ToDate = '20231231' WHERE EmployeeID = 2 AND CompanyName = 'SubCompany2'

Now if you run the script again you will see the following output:

Query output with data modified

At row 15, apart for date start equals to 2024-01-01, we now have all NULL values. This is correct because all histories for that employee are closed as of 2023-12-31 and therefore there is no data to show. If you want you can get rid of this kind of record adding adding one more step to the query.

This is the final complete version of the script containing this last step:

DROP TABLE IF EXISTS #DATES;
DROP TABLE IF EXISTS #DATES_WITH_ID;
DROP TABLE IF EXISTS #PERIODS;

WITH DATES
AS
(
	SELECT E.ID AS EmployeeID, E.HireDate AS DATE_CHANGE
	FROM Employees E
	UNION
	SELECT E.ID, DATEADD(DD, 1, E.EndDate)
	FROM Employees E
	WHERE E.EndDate IS NOT NULL
	UNION
	SELECT HC.EmployeeID, HC.FromDate
	FROM HistEmpCompany HC
	UNION
	SELECT HC.EmployeeID, DATEADD(DD, 1, HC.ToDate)
	FROM HistEmpCompany HC
	WHERE HC.ToDate IS NOT NULL
	UNION
	SELECT HD.EmployeeID, HD.FromDate
	FROM HistEmpDept HD
	UNION
	SELECT HD.EmployeeID, DATEADD(DD, 1, HD.ToDate)
	FROM HistEmpDept HD
	WHERE HD.ToDate IS NOT NULL
)
SELECT *
INTO #DATES
FROM DATES;

WITH DATES_WITH_ID
AS
(
	SELECT	ROW_NUMBER() OVER(PARTITION BY D.EmployeeID 
                                  ORDER BY D.DATE_CHANGE) AS ROW_ID, 
			D.EmployeeID,
			D.DATE_CHANGE
	FROM #DATES D
)
SELECT *
INTO #DATES_WITH_ID
FROM DATES_WITH_ID

SELECT T1.ROW_ID, T1.EmployeeID, 
       T1.DATE_CHANGE AS DATE_START, DATEADD(DD, -1, T2.DATE_CHANGE) AS DATE_END
INTO #PERIODS
FROM #DATES_WITH_ID T1
LEFT JOIN #DATES_WITH_ID T2 ON T1.EmployeeID = T2.EmployeeID
				AND T1.ROW_ID = (T2.ROW_ID - 1)

SELECT P.EmployeeID, MAX(DATE_START) MAX_START_DATE
INTO #MAX_LAST_START_DATES
FROM #PERIODS P
GROUP BY P.EmployeeID


DECLARE @MAX_DATE	DATETIME = '99991231'

SELECT E.ID, T.EmployeeID, 
       E.LastName, E.LastName, 
       HC.CompanyName, HD.DeptName, 
       T.DATE_START, T.DATE_END
FROM #PERIODS T
LEFT JOIN Employees E ON T.EmployeeID = E.ID 
          AND T.DATE_START BETWEEN E.HireDate AND ISNULL(E.EndDate, @MAX_DATE)
LEFT JOIN HistEmpCompany HC ON T.EmployeeID = HC.EmployeeID 
          AND  T.DATE_START BETWEEN HC.FromDate AND ISNULL(HC.ToDate, @MAX_DATE)
LEFT JOIN HistEmpDept HD ON T.EmployeeID = HD.EmployeeID 
          AND  T.DATE_START BETWEEN HD.FromDate AND ISNULL(HD.ToDate, @MAX_DATE)
WHERE (E.ID IS NOT NULL OR HC.CompanyName IS NOT NULL OR HD.DeptName IS NOT NULL)
		OR EXISTS(SELECT * 
                          FROM #MAX_LAST_START_DATES X 
                          WHERE T.DATE_START < X.MAX_START_DATE)
ORDER BY T.EmployeeID, T.DATE_START

And this is the final result:

The final outcome

And at this point we can close…
I hope this long post will be useful to you and you can take inspiration to solve your problems related to historical data!

Sql Server

Post navigation

Previous Post: Merge two or more historical tables with Sql Server T-SQL – Part 1

Related Posts

Sql Server error: A severe error occurred on the current command.  The results, if any, should be discarded. Sql Server
SQL Server and SARGables predicates Sql Server
Load a huge amount of data in a Sql Server table BCP
Sql Server error using a database with a CLR assembly CLR
Improving SQL Server cursor performance Sql Server
Sql Server error: Cannot construct data type date, some of the arguments have values which are not valid. Sql Server

Leave a Reply Cancel reply

Please do not insert personal or sensitive data in the comment.


Search

Related posts

  • Merge two or more historical tables with Sql Server…
  • Get all referencing tables of a table in Sql Server
  • Delete all rows from all tables in a Sql Server database
  • Get SQL Server version in T-SQL

Categories

  • .NET 6 (1)
  • .NET 7 (1)
  • AJAX (1)
  • Android (2)
  • Apache (4)
  • ASP.NET (9)
  • ASP.NET MVC (3)
  • Avalonia UI (1)
  • BCP (1)
  • Bitlocker (2)
  • C# (14)
  • CentOS (4)
  • ClosedXML (1)
  • CLR (1)
  • DNS (1)
  • Encryption (3)
  • Excel (2)
  • FuelPHP (3)
  • Games (2)
  • Google Chrome (1)
  • GSuite (1)
  • HTML (1)
  • Imagick (2)
  • Javascript (1)
  • Kindle (1)
  • LAMP (3)
  • Linux (7)
  • MariaDB (2)
  • Mathematics (2)
  • MySql (4)
  • NPOI (1)
  • Office 365 (1)
  • Perl (1)
  • PHP (6)
  • Programming (1)
  • Remote desktop (1)
  • SFTP (2)
  • Sockets (3)
  • Sql Server (20)
  • Sql Server 2016 (14)
  • Sql Server 2019 (1)
  • SSL (4)
  • Task scheduler (1)
  • Telerik ASP.NET AJAX (2)
  • The Dummy Programmer Chat (2)
  • Threading (5)
  • Tools (1)
  • TPL (3)
  • TypeScript (3)
  • Ubuntu (4)
  • Virtualization software (3)
  • Visual Studio (1)
  • Visual Studio Code (2)
  • VueJS (1)
  • Web fonts (1)
  • Web programming (6)
  • Windows (12)
  • Windows 10 (15)
  • Windows Forms (1)
  • Windows Server (6)

Copyright © 2024 The Dummy Programmer | Privacy Policy | Terms of use |

Powered by PressBook Masonry Dark

Manage Cookie Consent
This site doesn’t collect user personal data and doesn’t install profiling or analytical cookies, either its own or from third parties. Read our privacy policy for more info.
Functional Always active
The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
Preferences
The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
Statistics
The technical storage or access that is used exclusively for statistical purposes. The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
Marketing
The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.
Manage options Manage services Manage {vendor_count} vendors Read more about these purposes
View preferences
{title} {title} {title}