Category: Design for Data Privacy

Design Data Encryption for Data at Rest and in Transit – Keeping Data Safe and Secure

Encryption is a deeply mathematical concept. The internals are outside the scope of this book, but in simple terms, when data is encrypted it looks like a jumble of scrambled letters and numbers that have no value on their own. The following is an example of the word csharpguitar encrypted using the key created in Exercise 8.1:

 p0syrFCPufrCr+9dN7krpFe7wuwIeVwQNFtySX0qaX3UcqzlRifuNdnaxiTu1XgZoKwKmeu6LTfrH
 rGQHq4lDClbo/KoqjgSm+0d0Ap/y2HR34TFgoxTeN0KVCoVKAtu35jZ52xeZgj1eYZ9dww2n6psGG
 nMRlux/z3ZDvm4qlvrv55eAoSawbCGWOql3mhdfHFZZxLBCN2eZzvBpaTSNaramME54ELMr6ScIJI
 ITq6XJYTFH8BGvPaqhfTTO4MbizwenpijIFZvdn3bzQGbnPElht0j+EQ7aLvWOOyzJjlKcR8MN4jO
 oYNULCZTBi/BVvlhYpUsKxxN+YW27POMAw==

There is no realistic method for anyone, or any computer, to revert that set of characters back into the original word without the private key. That is the power of encryption implemented using public and private keys. Only by having access to the private key can one make sense of that character sequence, for example, by decrypting it with the az keyvault key decrypt Azure CLI command or a REST API call that has access to the private key. This leads well into two very important security concepts that pertain greatly to the storage of data on Azure: encryption-at-rest and encryption-in-transit.

Data stored in an Azure storage account is encrypted by default; no action is required on your part to encrypt the data stored in a container. It is encrypted even when it is not being used, which is where the name encryption-at-rest comes from. The data is simply stored, idle, doing nothing, but it is secured by encryption. This kind of protection is intended to defend against a bad actor gaining access to the physical hard drive that contains the data. When bad actors attempt to access the data, they will see only the scrambled characters. If they do not have the associated keys, which should be accessible only from a key vault, there is no chance of decrypting the data. Therefore, your data is safe, even while it is resting and not being used. Back in Exercise 3.1, where you created an Azure storage account and an ADLS container, there was a tab named Encryption. That tab includes two radio buttons, as shown in Figure 8.12. The default was to use a Microsoft-managed key (MMK) for the encryption-at-rest operation; the other option is named customer-managed key (CMK). If you select CMK, you can reference a key you have created in an Azure Key Vault to use as the default encryption key.

CMK-based storage account encryption is intended for customers who need the maximum amount of security due to compliance or regulations. Also notice the Enable Infrastructure Encryption check box. When this box is selected, the data stored in the account is doubly encrypted. Double encryption is available for both data at rest and data in transit: instead of being encrypted with just one key, the data is encrypted with two separate keys, the second of which is applied at the infrastructure level. This protects against scenarios in which one of the encryption keys or algorithms is compromised. When Enable Infrastructure Encryption is selected and one of the encryption keys is compromised, your data is still encrypted with 256-bit AES encryption by the other key, so the data remains safe. Another common encryption technology on the Azure platform, targeted at databases, is Transparent Data Encryption (TDE). TDE protects data at rest on Azure SQL databases, Azure SQL data warehouses, and Azure Synapse Analytics SQL pools. The entire database, its data files, and its backups are encrypted using an AES encryption algorithm by default, and, as with Azure Storage, the encryption key can be managed by the customer or by Microsoft and stored in an Azure key vault.

FIGURE 8.12 Azure storage account encryption type
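
Returning to TDE: if you want to confirm or control it yourself, a T-SQL check similar to the following sketch works on Azure SQL Database. The database name is hypothetical, and TDE is already enabled by default on new Azure SQL databases:

 -- Hypothetical database name; adjust to your environment
 ALTER DATABASE [brainjammer] SET ENCRYPTION ON;

 -- is_encrypted = 1 indicates that TDE is active for the database
 SELECT [name], is_encrypted FROM sys.databases;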

The opposite of resting is active, which here means data being retrieved by some remote consumer. As data moves from the location where it is stored to the consumer, it is vulnerable to traffic capture. This is where the concept of encryption-in-transit comes into scope. You encrypt data in transit by using TLS 1.2 or later; TLS 1.2 is currently the most widely supported secure version. As previously mentioned, TLS is achieved by using an X.509 certificate in combination with the HTTP protocol, which together constitute HTTPS. Consider the following common Azure product endpoints (a representative sample):
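
 https://<account>.blob.core.windows.net    (Azure Blob Storage)
 https://<account>.dfs.core.windows.net     (Azure Data Lake Storage Gen2)
 https://<server>.database.windows.net      (Azure SQL)
 https://<vault>.vault.azure.net            (Azure Key Vault)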

In all cases, the transfer of data happens over HTTPS, meaning the data is encrypted while in transit between the service that hosts it and the consumer who is authorized to retrieve it. When working with Linux, the protocol to use is Secure Shell (SSH), which ensures the encryption of data in transit; HTTPS is also a supported protocol. An additional encryption concept should be mentioned here: encryption-in-use. This concept is implemented using a feature named Always Encrypted and is focused on protecting sensitive data stored in specific columns of a database. Identification numbers, credit card numbers, PII, and need-to-know data are examples of data that typically reside in the columns of a database. This kind of encryption, which is handled client-side, is intended to prevent DBAs or administrators from viewing sensitive information when there is no business justification for doing so.
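
To give a feel for how Always Encrypted looks in practice, the following is a minimal sketch of a table definition with an encrypted column. The table, column, and column encryption key name (CEK_Auto1) are hypothetical, and creating the key itself is a separate step:

 CREATE TABLE dbo.SensitiveSubjects
 (
     ID INT IDENTITY(1,1) PRIMARY KEY,
     -- Character columns must use a BIN2 collation when encrypted
     CREDITCARD VARCHAR(19) COLLATE Latin1_General_BIN2
         ENCRYPTED WITH (
             COLUMN_ENCRYPTION_KEY = CEK_Auto1,
             ENCRYPTION_TYPE = DETERMINISTIC,
             ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
         ) NOT NULL
 );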

The final topic to discuss in this section is the WITH ENCRYPTION SQL clause. In Exercise 2.3 you created a view using a statement similar to the following:

 CREATE VIEW [views].[PowThetaClassicalMusic]

In Exercise 5.1 you created a stored procedure using the following command:

 CREATE PROCEDURE brainwaves.uspCreateAndPopulateFactReading

Each of those statements can be modified by placing WITH ENCRYPTION directly after the object name in the CREATE statement, like the following:

 CREATE VIEW [views].[PowThetaClassicalMusic] WITH ENCRYPTION
 CREATE PROCEDURE brainwaves.uspCreateAndPopulateFactReading WITH ENCRYPTION

If you then attempt to view the text of the view or stored procedure, you will not see it; instead, you will see a message explaining that it is encrypted. WITH ENCRYPTION provides a relatively low level of security, as a technically savvy individual can decrypt the object definition. However, it is quick and simple to implement, making it worthy of consideration.
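
You can observe this behavior with standard T-SQL metadata queries like the following; the exact wording of the returned message may vary:

 -- Returns NULL when the module was created WITH ENCRYPTION
 SELECT OBJECT_DEFINITION(
     OBJECT_ID('brainwaves.uspCreateAndPopulateFactReading'));

 -- Reports that the text for the object is encrypted
 EXEC sp_helptext 'brainwaves.uspCreateAndPopulateFactReading';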

Create a Microsoft Purview Account – Keeping Data Safe and Secure

  1. Log in to the Azure portal at https://portal.azure.com ➢ enter Purview in the search box in the upper middle of the browser ➢ select Microsoft Purview account ➢ select the + Create menu option ➢ select the subscription ➢ select the resource group ➢ enter a Microsoft Purview account name (I used brainjammer) ➢ select a region ➢ leave the managed resources as the default ➢ navigate through the other tabs ➢ leave the defaults ➢ click the Review + Create button ➢ and then click Create.
  2. Once provisioning is complete, navigate to the Microsoft Purview Overview blade ➢ select the Open link to open the Microsoft Purview Governance Portal ➢ select the Data Map hub ➢ select Collections ➢ and then select the Role Assignments tab, as shown in Figure 8.7. Make sure your account is within the Collection Admins group; if not, add it.

FIGURE 8.7 Microsoft Purview default root collection

  3. Click the + Add a Collection menu button ➢ enter Data Engineering in the Display Name text box ➢ enter your account into the Collection Admins group ➢ click the Create button ➢ and then select the root collection (for example, brainjammer) to do the same again, but this time enter R&D in the Display Name text box.
  4. Select the Sources navigation link ➢ click the Register menu button ➢ select the Azure Data Lake Storage Gen2 resource ➢ click Continue ➢ enter a name (I used ADLS‐csharpguitar) ➢ select the subscription that contains the ADLS container you created in Exercise 3.1 ➢ select the storage account name ➢ select Data Engineering from the Select a Collection drop‐down list ➢ click the Register button ➢ click the Register button again ➢ select Azure Synapse Analytics ➢ click Continue ➢ enter a name (I used ASA‐csharpguitar) ➢ select the subscription that contains the Azure Synapse Analytics workspace you created in Exercise 3.3 ➢ select R&D from the Select a Collection drop‐down list ➢ and then click Register. The result should resemble Figure 8.8.

FIGURE 8.8 Microsoft Purview Map view

  5. Navigate to the Azure Key Vault you created in Exercise 8.1 ➢ select Access Policies ➢ click the + Create menu button ➢ check the Get and List operations in the Secret Permissions / Secret Management Operations section ➢ click Next ➢ search for and select the Microsoft Purview account name you just provisioned ➢ click the Next button twice ➢ and then click Create.

The additional registration of the Power BI workspace and an Azure SQL database is for effect only at this point. Feel free to register additional or different resources to your collections. The provisioning of the account was straightforward. You were again confronted with the concept of a managed resource group, which you experienced in Exercise 3.3. As a reminder, this resource group contains Azure products required by the provisioned resource; in this case, a required Azure storage account was placed into the managed resource group. In Exercise 8.2 you configured two collections, Data Engineering and R&D. The Data Engineering collection has the Power BI workspace and the ADLS container associated with it, while the R&D collection has the Azure Synapse Analytics workspace and an Azure SQL database. The structure of the collection hierarchy and its associated sources provides context for the policies, compliance, and governance constraints placed on them. Sorting out which datastores belong to each collection gives you the means to set policies on those resources based on the individuals affiliated with those groups. You did not perform a scan in Exercise 8.2 because scanning requires managed identities, which are discussed in the “Implement a Data Auditing Strategy” section.

Create an Azure Key Vault Resource – Keeping Data Safe and Secure

  1. Log in to the Azure portal at https://portal.azure.com ➢ click the menu button on the upper left of the browser ➢ click + Create a Resource ➢ select Security from the Categories section ➢ select Key Vault ➢ select the subscription ➢ select the resource group ➢ enter a key vault name ➢ select a region ➢ and then select a pricing tier (I used Standard). Leave the remaining options as the defaults.
  2. Click the Next button ➢ leave the defaults on the Access Policy tab ➢ select the check box next to your user identity in the Access Policies section ➢ click the Edit button ➢ observe the default Key, Secret, and Certificate permissions ➢ click Next ➢ leave the defaults on the Networking tab ➢ click the Review + Create button ➢ and then click Create.
  3. Once the key vault is provisioned, navigate to it ➢ select Keys from the navigation menu ➢ select the + Generate/import menu link ➢ and then enter a name (I used brainjammerKey). The configuration should resemble Figure 8.2. The Elliptic Curve Name radio buttons show the available algorithms.

FIGURE 8.2 Creating an Azure Key Vault key

  4. Click the Create button ➢ select the Secrets navigation item ➢ click the + Generate/Import menu option ➢ enter a name (I used azureSynapseSQLPool) ➢ and then enter a secret value (I used the password of my Azure Synapse Analytics dedicated SQL pool). The configuration should resemble Figure 8.3.

FIGURE 8.3 Creating an Azure Key Vault secret

  5. Click the Create button ➢ select the Certificates navigation item ➢ click the + Generate/Import menu option ➢ enter a certificate name (I used brainjammerCertificate) ➢ and then enter a subject value (I used “CN=brainjammer.net”). The configuration should resemble Figure 8.4.

FIGURE 8.4 Creating an Azure Key Vault certificate

  6. Click Create.

Exercise 8.1 is straightforward in that you should recognize most of the options and understand what they mean. A few features and concepts, however, are worthy of discussion. On the Access Policy tab, you likely noticed the option to manage access by either Key Vault access policy or Azure role-based access control (RBAC). A Key Vault access policy enables you to grant service principals, users, applications, or user groups access to specific operations on the keys, secrets, and certificates hosted in the key vault, such as the operations shown in Figure 8.5, which is similar to what you saw in step 2 of Exercise 8.1 when viewing the default permissions.

FIGURE 8.5 Vault access policy operations

With the RBAC approach, you grant a user or group access to the key vault using a role. There are numerous built-in key vault roles, such as Key Vault Administrator, Key Vault Reader, and Key Vault Secrets User. If none of the built-in roles meets your requirements, you can create a custom role using a JSON role definition similar to the following minimal sketch, in which the role name, description, and assignable scope are placeholders to adapt:
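
 {
     "Name": "Key Vault Secrets Reader (Custom)",
     "IsCustom": true,
     "Description": "Can read secret contents and metadata.",
     "Actions": [],
     "NotActions": [],
     "DataActions": [
         "Microsoft.KeyVault/vaults/secrets/getSecret/action",
         "Microsoft.KeyVault/vaults/secrets/readMetadata/action"
     ],
     "NotDataActions": [],
     "AssignableScopes": [ "/subscriptions/<subscription-id>" ]
 }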

Design Row‐Level and Column‐Level Security – Keeping Data Safe and Secure

In a relational database, a table is made up of rows of data, and each row can have many columns. Once your data has been ingested and transformed and is ready for consumption, you may need to apply some additional security at the row or column level. Row-level security (RLS) is very similar to a filter commonly implemented using a WHERE clause. A WHERE clause works fine as long as no one has direct access to the data and therefore cannot run queries that circumvent that level of protection. If you need to restrict access at the row level and you have clients that connect directly to your database, then you need to apply RLS. Implementing RLS requires what is called a FILTER PREDICATE, which is applied using the CREATE SECURITY POLICY statement. Consider a scenario where you have implemented a global brain wave repository that allows anyone to upload their brain wave readings to your datastore for processing and data analytics. Take it one step further and provide the option for those individuals to perform analytics on their own scenarios. As the platform owner, you would want to see all readings from everyone who has uploaded data; however, you would want to restrict each individual's access to only their own data. RLS is a means for achieving just that. Consider the following SQL statement:
 CREATE SECURITY POLICY BrainwavesFilter_ext
 ADD FILTER PREDICATE Security.fn_securitypredicate(brainwaveProducer)
 ON dbo.Brainwaves_ext
 WITH (STATE = ON);

The statement creates a policy named BrainwavesFilter_ext and adds a filter predicate based on a value stored in the table named Brainwaves_ext. The value in the brainwaveProducer column is the user account ID of the person who uploaded the brain waves. When CRUD queries are executed against the Brainwaves_ext table, the policy uses the predicate to ensure that each user can read and manipulate only the rows they uploaded.
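
The exercise does not list fn_securitypredicate itself, but a minimal sketch of such an inline table-valued function, assuming the brainwaveProducer value matches the caller's database user name, looks like this:

 CREATE FUNCTION Security.fn_securitypredicate(@brainwaveProducer AS NVARCHAR(128))
 RETURNS TABLE
 WITH SCHEMABINDING
 AS
 RETURN
     -- Returning a row grants access; returning no row filters the data row out
     SELECT 1 AS fn_securitypredicate_result
     WHERE @brainwaveProducer = USER_NAME();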

Another feature that you can apply to the data is called column-level security. Recall that a few columns in the SUBJECTS table are worthy of a Confidential sensitivity level. Columns like USERNAME, ZIPCODE, EMAIL, and BIRTHDATE contain data that should not be publicly accessible or generally available. If you have a user named brainjammer in your Azure Synapse Analytics dedicated SQL pool and do not want that user to have access to those columns, you can execute the following command, which grants SELECT on only the nonsensitive columns:

 GRANT SELECT ON SUBJECTS
   (ID, FIRSTNAME, LASTNAME, COUNTRY, CREATE_DATE) TO brainjammer;

If that user then attempts the following SELECT statement, an error would be rendered stating that access is denied:

 SELECT * FROM SUBJECTS
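
By contrast, a query that references only the granted columns succeeds. The error text shown in the comment below is approximate:

 -- Succeeds: every referenced column was granted
 SELECT ID, FIRSTNAME, LASTNAME, COUNTRY FROM SUBJECTS;

 -- SELECT * fails with an error similar to:
 -- The SELECT permission was denied on the column 'USERNAME' of the object 'SUBJECTS'.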

In the “Implement Row-Level and Column-Level Security” section, you will perform an exercise and experience column-level security firsthand.