Design a Data Masking Strategy – Keeping Data Safe and Secure
A mask is an object that partially conceals what is behind it. From a data perspective, a mask would conceal a particular piece of the data but not all of it. Consider, for example, email addresses, names, credit card numbers, and telephone numbers. Those classifications of data can be helpful if there is ever a need to validate a person’s identity. However, you would not want all of the data rendered in a query; instead, you can show only the last four digits of the credit card number or the first letter of an email address and the top level domain value like .com, .net, or .org, like the following:
There is a built‐in capability for this masking in the Azure portal related to an Azure Synapse Analytics dedicated SQL pool. As shown in Figure 8.13 navigating to the Dynamic Data Masking blade renders masking capabilities.
The feature will automatically scan your tables and find columns that may contain data that would benefit from masking. You apply the mask by selecting the Add Mask button, selecting the mask, and saving it. Then, when a user who is not in the excluded list, as shown in Figure 8.13, accesses the data, the mask is applied to the resulting dataset. Finally, the primary objective of masking is to conceal enough of the data in a column so that it can be used but not exploited. That partial data visibility demonstrates the difference between masking and encryption. When data in a column is encrypted, none of it is readable, whereas a mask can be configured to allow partial data recognition.
FIGURE 8.13 Dynamic Data Masking dedicated SQL pool
Design Access Control for Azure Data Lake Storage Gen2
There are four authorization methods for Azure storage accounts. The method you have been using in most scenarios up to now has been through access keys. An access key resembles the following:
4jRwk0Ho7LB+si85ax…yuZP+AKrr1FbWbQ==
The access key is used in combination with the protocol and storage account name to build the connection string. Clients can then use this to access the storage account and the data within it. The connections string resembles the following:
DefaultEndpointsProtocol=https;AccountName=<name>;AccountKey=<account-key>
On numerous occasions you have created linked services in the Azure Synapse Analytics workspace. When configuring a linked service for an Azure storage account, you might remember seeing that shown in Figure 8.14, which requests the information required to build the Azure storage account connection string.
FIGURE 8.14 ADLS access control access keys
Notice that the options request the authentication type, which is set to access key, sometimes also referred to as an account key, to be used as part of a connection string, followed by the storage account name and the storage account key. Those values are enough for the client—in this case, an Azure Synapse Analytics linked service—to successfully make the connection to the storage account. HTTPS is used as default, which enforces data encryption‐in‐transit; therefore, an authentication type is not requested. Another authorization method similar to access keys is called shared access signature (SAS) authorization. This authorization method gives you a bit more control over what services, resources, and actions a client can access on the data stored in the account. Figure 8.15 shows the Shared Access Signature blade in the Azure portal for Azure storage accounts.
When you use either an access key or a SAS URL, any client with that token will get access to your storage account. There is no identity associated with either of those authorization methods; therefore, protecting that token key is very important. This is a reason that offering the retrieval of the account key from an Azure key vault is also an option, as you saw in Figure 8.14. Storing the access key and/or the SAS URL in an Azure key vault would remove the need to store the key within the realm of an Azure Synapse Analytics workspace. Although this is safe, reducing the number of clients who have possession of your authorization keys is a good design. Any entity that needs these keys can be granted access to the Azure key vault and the keys for making the connection to your storage account. The other two remaining authorization methods are RBAC and ACL, which are covered in the following sections. As an introduction to those sections, Table 8.2 provides some details about both the Azure RBAC and ACL authorization methods.
TABLE 8.2 Azure storage account authorization methods
Method | Scope | Require an identity | Granularity level |
Azure RBAC | Storage account, container | Yes | High |
ACL | File, directory | Yes | Low |
FIGURE 8.15 ADLS Access control shared access signature
Authorization placed upon an Azure storage account using an RBAC is achieved using a role assignment at the storage account or container level. ACLs are implemented by assigning read, write, or delete permissions on a file or directory. As shown in Figure 8.16, if the identity of the person or service performing the operation is associated with an RBAC group with the assignment allowing file deletion, then that access is granted, regardless of the ACL permission.
However, if the identity associated with the group that is assigned RBAC permissions does not have the authorization to perform a delete but does have the ACL permission, then the file can be deleted. The following sections describe these authentication methods in more detail.