Month: April 2023

Design to Purge Data Based on Business Requirements – Keeping Data Safe and Secure

The primary difference between purging and deleting has to do with whether or not the data is gone for good. When you purge data, it means there is no way to recover it. If something called a soft delete is enabled, it means that the data can be recovered during a preconfigured timeframe. After that timeframe, the data will be purged. Soft‐deleted data continues to consume storage space in your database or on your datastore, like an Azure storage container. The storage consumption is only freed when the data is purged. Like all scenarios related to retention and data deletion discussed up to now, you need to first decide which data has a sensitivity level that must adhere to a retention policy. Once you determine which data must be deleted, you need to determine at what age the data should be removed. After identifying those two pieces of information, you might consider deleting the data from your database using the DELETE SQL command. The following command removes all the data from the SUBJECTS table where the CREATE_DATE value is 3 months old from the current date:

 DELETE FROM SUBJECTS WHERE CREATE_DATE < DATEADD(month, -3, GETDATE())

When the amount of data is large, this kind of query can have a significant impact on performance. The impact can result in latency experienced by other data clients inserting, updating, or reading data from the same database. A very fast procedure for removing data is to place the data onto a partition that is defined by the column that defines the lifecycle of the data, for example, using the CREATE_DATE in the SUBJECTS table as the basis for a partition. When the data on that partition has breached the retention threshold, remove the partition, and the data is removed. Another approach is to select the data you want to keep, use the result to insert it into another table, and then switch the tables. This is achieved using CTAS, which was introduced in Chapter 2, “CREATE DATABASE dbName; GO,” along with the partitioning concept mentioned previously. The following SQL snippet is an example of how to achieve the purging of data without using the DELETE SQL command:

 SELECT * INTO SUBJECTS_NEW FROM SUBJECTS 
 WHERE CREATE_DATE> DATEADD(month, -3, GETDATE())
 RENAME OBJECT SUBJECTS TO SUBJECTS_OLD
 RENAME OBJECT SUBJECTS_NEW TO SUBJECTS
 DROP TABLE SUBJECTS_OLD

The SELECT statement retrieves the data with a creation date that is not older than 3 months and places the data into a new table. The existing primary table named SUBJECTS is renamed by appending _OLD to the end. Then the newly populated table that was appended with _NEW is renamed to the primary table name of SUBJECTS. Lastly, the table containing data that is older than 3 months is dropped, resulting in its deletion.