
Implement Data Masking – Keeping Data Safe and Secure

  1. Log in to the Azure portal at https://portal.azure.com ➢ navigate to the Azure Synapse Analytics workspace you created in Exercise 3.3 ➢ select the SQL Pools navigation menu link ➢ select your dedicated SQL pool ➢ start the SQL pool ➢ after the SQL pool is running, select the Dynamic Data Masking blade ➢ select the + Add Mask menu item ➢ make the configurations as shown in Figure 8.39 ➢ and then click the Add button.

FIGURE 8.39 Implement data masking and masking rule.

  2. Click the Save button ➢ log in to the dedicated SQL pool using Azure Data Studio, with the credentials created in Exercise 8.7 ➢ and then execute the following:
    SELECT ID, FIRSTNAME, LASTNAME, EMAIL, COUNTRY, CREATE_DATE FROM dbo.SUBJECTS
  3. Notice that the data in the EMAIL column has the mask configured in step 1 and illustrated in Figure 8.39. Stop the dedicated SQL pool.

It is possible to configure a custom mask instead of using the preconfigured masks. This requires that you provide the number of starting characters to show, followed by the padding string (something like xxxxx.), followed by the number of ending characters to display. Using a prefix and suffix value of three with that padding on the EMAIL column would produce benxxxxx.net, for example, which is a bit more useful than what the default mask provides.
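The same kind of custom mask can also be applied with T‐SQL rather than through the portal. The following is a minimal sketch, assuming the SUBJECTS table from the earlier exercises; partial() takes the number of leading characters to expose, the padding string, and the number of trailing characters to expose:

 -- Sketch: expose 3 leading and 3 trailing characters, pad the middle
 ALTER TABLE dbo.SUBJECTS
 ALTER COLUMN EMAIL ADD MASKED WITH (FUNCTION = 'partial(3, "xxxxx.", 3)');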

Manage Identities, Keys, and Secrets Across Different Data Platform Technologies

Protecting credentials has historically been very challenging. Developers and data engineers need to access data, and that data is protected by an ID and password. As the complexity and size of your organization grows, it is easy to lose control over who has what credentials. Add that loss of control to the potential impact changing a password can have on a production environment. This scenario is commonly referred to as credential leakage. An initial solution to credential leakage was to store connection details in a managed credential store, something like Azure Key Vault. However, access to the credential store also requires credentials, so you are back in the same place as before the implementation of the credential store. The ultimate solution is to use a combination of Azure Key Vault and managed identities. Instead of using a credential to make a connection to a storage account or a database from application code, you instead reference the Azure Key Vault endpoint. An Azure Key Vault secret endpoint resembles the following:

https://<accountName>.vault.azure.net/secrets/<secretName>/5db1a9b5…

The code that uses that endpoint must implement the DefaultAzureCredential class from the Azure Identity library. The library works with all popular programming languages: .NET, Python, Go, Java, etc. Passing a new DefaultAzureCredential instance to the SecretClient class enables the client to acquire the managed identity credential, which is a token. The client then has all the attributes necessary to retrieve a secret from the Azure Key Vault endpoint. The following C# code performs this activity:

 var kvUri = "https://" + accountName + ".vault.azure.net";
 var client = new SecretClient(new Uri(kvUri), new DefaultAzureCredential());

You can use the client to get a secret by using the following C# syntax:

 var secret = await client.GetSecretAsync(secretName);

Now you know how a managed identity can avoid credential leakage, but you might be wondering what exactly managed identities are and what important aspects you must know in order to implement them safely and securely. Table 8.3 compares the two types of managed identities: system‐assigned and user‐assigned.

TABLE 8.3 Managed identity types

Characteristic | System‐assigned managed identity | User‐assigned managed identity
Provisioning | Azure resources receive an identity by default, where supported. | Created manually
Removal | The identity is deleted when the associated Azure resource is deleted. | Deleted manually
Sharing | The identity cannot be shared among Azure resources. | Can be shared

A system‐assigned managed identity is created during the provisioning of the Azure resource. For example, an Azure Synapse Analytics workspace and a Microsoft Purview account both have a system‐assigned identity by default. Azure products that are generally used to make connections to other Azure products or features have this managed identity created by default. In contrast, an Azure storage account receives data but does not commonly push data out to other systems, which is what would require an identity. This is why you see no managed identities for Azure storage accounts. A system‐assigned managed identity can be used only by the Azure resource to which it is bound, and it is deleted when that Azure resource is deleted. A user‐assigned managed identity is a separate resource with its own lifecycle and can be shared across Azure products. Perform Exercise 8.9, where you create a user‐assigned managed identity.
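Once a managed identity exists, you grant it access to a data platform resource the same way you would any other Azure AD identity. As a hedged example, the following T‐SQL, run against a dedicated SQL pool, creates a database user for a managed identity and adds it to a read‐only role; the identity name is hypothetical and must match the identity's Azure AD display name:

 -- Sketch: create a database user for a managed identity and grant read access
 CREATE USER [my-user-assigned-identity] FROM EXTERNAL PROVIDER;
 EXEC sp_addrolemember 'db_datareader', 'my-user-assigned-identity';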

Configure and Perform a Data Asset Scan Using Microsoft Purview – Keeping Data Safe and Secure-2

If you have not created the SUBJECTS table on your dedicated SQL pool, create the table using the SUBJECTS.sql file located in the Chapter08 directory on GitHub.

The first action you took after accessing the Azure portal was to add the Microsoft Purview account identity to the Reader role of the Azure Synapse Analytics workspace. Note that adding this role assignment at the workspace level results in the Reader permissions being granted to all resources that exist in the workspace. This is a good level of access for Microsoft Purview to perform proper governance and auditing activities. It is also possible to grant this level of access specifically to a SQL or Spark pool using the same approach via the Access control (IAM) role assignments feature while those analytics pools are in focus. Next, you navigated to the Manage hub in the Azure Synapse Analytics workspace and connected the Microsoft Purview account to the workspace. This provided easy access to the Microsoft Purview Governance portal.

 Until you configure the new credential, as shown in Figure 8.21, you may receive a Failed to load serverless databases from Synapse workspace error message. Once you select the new credential (for example, sqladminuser), the error will go away. In this example, the username and password are the same for both the serverless and dedicated SQL pools.

Once in the Microsoft Purview Governance portal, you registered a source named ASA‐csharpguitar into the R&D parent collection. After the source that targets your Azure Synapse Analytics workspace was registered, you began configuring an asset scan. A credential that can access both the serverless and dedicated SQL pools is required at this point. Selecting + New from the Credential drop‐down list box provided the option to create one. You added an Azure Key Vault connection that targets the secret created in Exercise 8.1. The secret contains the password of your dedicated SQL pool, which, in this example, is the same as the password for the built‐in serverless SQL pool. Once the credential was configured and selected from the Credential drop‐down list box, you were able to select the dedicated SQL pool as the target data source of the scan.

When you selected to use the System Default scan rule set, you chose to use all the supported classification rules. While configuring the scan, you might have noticed the View Details link below that value. Clicking the View Details link currently renders a list of 208 classification rules grouped together with names such as Government, Financial, Base, Personal, Security, and Miscellaneous. You also have the option to create a custom rule that allows you to include your own additional set of items to scan for. The Security scan checks for passwords that match common patterns; the Government scan checks for values that match an ID; and the Personal scan checks for birth dates, email addresses, and phone numbers, for example. If you didn’t look at that, go back and check it out for the full set of attributes that are searched for when running an asset scan. The next window gives you the option to schedule the audit scan weekly or monthly. In a live scenario, where you have a lot of activity on your data sources, this would be a good idea. Lastly, you ran the scan, viewed the results shown in Figure 8.22, and then stopped the dedicated SQL pool. In Exercise 8.5 you will use those results to classify and curate the data existing in the SUBJECTS table.

Azure Synapse Analytics includes an Auditing feature for dedicated SQL pools. Complete Exercise 8.4 to configure and implement Auditing on an Azure Synapse Analytics dedicated SQL pool.

Configure and Perform a Data Asset Scan Using Microsoft Purview – Keeping Data Safe and Secure-1

  1. Log in to the Azure portal at https://portal.azure.com ➢ navigate to the Azure Synapse Analytics workspace you created in Exercise 3.3 ➢ select the Access Control (IAM) navigation menu item ➢ click the + Add menu button ➢ select Add Role Assignment from the drop‐down list ➢ select Reader from the Role list ➢ click Next ➢ select the Managed Identity radio button ➢ click the + Select member link ➢ select Microsoft Purview Account from the Managed Identity drop‐down list box ➢ select the Microsoft Purview account you created in Exercise 8.2 ➢ click the Select button ➢ click the Review + Assign button ➢ navigate to the Overview blade ➢ click the Open link in the Open Synapse Studio tile ➢ select the Manage hub ➢ select SQL Pools from the menu list ➢ and then start the dedicated SQL pool.
  2. Select the Manage hub ➢ select the Microsoft Purview item in the External Connections section ➢ click the Connect to a Purview Account button ➢ select the Purview account created in Exercise 8.2 (for example, brainjammer) ➢ click Apply ➢ and then select the link to your Microsoft Purview account tab, as shown in Figure 8.20.

FIGURE 8.20 Connecting Microsoft Purview to Azure Synapse Analytics workspace

  3. Select the Data Map hub ➢ select Sources from the navigation menu ➢ select the Register menu item ➢ select Azure Synapse Analytics from the Register Source window ➢ click Continue ➢ enter a name (I used ASA‐csharpguitar) ➢ select the workspace you configured in step 2 from the Workspace Name drop‐down list box ➢ select the R&D collection from the Select a Collection drop‐down list box ➢ and then click Register.
  4. Select the View Details link on the just registered source in the Map view ➢ select the New Scan menu item ➢ enter a name (I used ScanDedicatedSQLPool) ➢ select + New from the Credential drop‐down list box ➢ enter a name (I used sqladminuser) ➢ enter the user ID/name of your Azure Synapse Analytics dedicated SQL pool (I used sqladminuser) ➢ select + New from the Key Vault Connection drop‐down list box ➢ enter a name (I used brainjammerKV) ➢ select the Key Vault you created in Exercise 8.1 ➢ click Create ➢ enter the Azure Key Vault secret name that stores your Azure Synapse Analytics password (I used azureSynapseSQLPool) ➢ click Create ➢ select your dedicated SQL pool from the SQL Database drop‐down list box ➢ select the Test Connection link ➢ and then click Continue. The configuration resembles Figure 8.21.

FIGURE 8.21 Configuring scanning in Microsoft Purview

  5. Click the Continue button to perform the scan using the default scan rule set ➢ select the Once radio button ➢ click Continue ➢ and then click the Save and Run button. When the scan is complete, you will see something like Figure 8.22.

FIGURE 8.22 The result of a Microsoft Purview scan

  6. Stop your Azure Synapse Analytics dedicated SQL pool.

Implement Column‐level Security – Keeping Data Safe and Secure

  1. Log in to the Azure portal at https://portal.azure.com ➢ navigate to the Azure Synapse Analytics workspace you created in Exercise 3.3 ➢ select the SQL Pools navigation menu link ➢ select your dedicated SQL pool ➢ start the SQL pool ➢ after the SQL pool is running, select the Overview blade ➢ and then click the Open link in the Open Synapse Studio tile.
  2. Navigate to the Data hub ➢ select the ellipse (…) to the right of your dedicated SQL pool ➢ select New SQL Script from the pop‐up menu ➢ select Empty Script ➢ select master from the Use Database drop‐down list box ➢ and then execute the following:
    CREATE LOGIN brainjammer WITH PASSWORD = 'csh@rpgu1tar'
  3. Select your dedicated SQL pool from the Use Database drop‐down list box, and then execute the following. The SQL is located in the column‐level.sql file in the Chapter08/Ch08Ex07 directory on GitHub.
    CREATE USER brainjammer FROM LOGIN brainjammer WITH DEFAULT_SCHEMA = dbo
    EXEC sp_addrolemember 'db_datareader', 'brainjammer'
    DENY SELECT ON dbo.SUBJECTS TO brainjammer
    GRANT SELECT ON SUBJECTS
    (ID, FIRSTNAME, LASTNAME, EMAIL, COUNTRY, CREATE_DATE) TO brainjammer
  4. Log in to the dedicated SQL pool using Azure Data Studio with the credentials created in step 2; executing the following, for example, will give you the error shown in Figure 8.37.
    SELECT * FROM [dbo].[SUBJECTS]

FIGURE 8.37 Column‐level security enforcement exception

  5. Stop the dedicated SQL pool.

Before you can create a user for a specific database, the login must first exist on the master database. Executing the following SQL command while targeting the master database hosted on your dedicated SQL pool accomplishes this:
CREATE LOGIN brainjammer WITH PASSWORD = 'csh@rpgu1tar'

Redirecting focus back to the SQLPool database, you then create a user who uses the login credentials created on the master database. You accomplished this with the following SQL statement. In addition to the user account creation, the default schema is set to dbo, which is where the SUBJECTS table is located.
CREATE USER brainjammer FROM LOGIN brainjammer WITH DEFAULT_SCHEMA = dbo

Adding the user account to a read‐only group, as follows, provides read access permissions to all the tables on that database:
EXEC sp_addrolemember 'db_datareader', 'brainjammer'

To view all roles and the members of those roles, execute the following SQL command. This SQL text is located in the databaseRole.sql file in the Chapter08/Ch08Ex07 directory on GitHub. Figure 8.38 shows the result of the query.
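The databaseRole.sql file is not reproduced here verbatim; a query along the following lines, which joins the system catalog views, produces the same kind of result:

 -- Sketch: list each database role and its members
 SELECT r.name AS RoleName, m.name AS MemberName
 FROM sys.database_role_members AS drm
 JOIN sys.database_principals AS r ON drm.role_principal_id = r.principal_id
 JOIN sys.database_principals AS m ON drm.member_principal_id = m.principal_id;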

FIGURE 8.38 Role and membership details on a dedicated SQL pool database

When a user is granted read access to a table, the access applies by default to all the columns in that table. In order to remove access to all the columns of the SUBJECTS table, you first execute the following SQL statement, which removes SELECT access to the table:
DENY SELECT ON dbo.SUBJECTS TO brainjammer

Then you use the following SQL command to grant access only to the columns that the user should be permitted to retrieve:
GRANT SELECT ON SUBJECTS
(ID, FIRSTNAME, LASTNAME, EMAIL, COUNTRY, CREATE_DATE) TO brainjammer

From that point on, when the user, brainjammer, accesses the SQLPool database and retrieves data from the SUBJECTS table, only the columns that have been granted permission can be retrieved, as the following SQL query demonstrates:
SELECT ID, FIRSTNAME, LASTNAME, EMAIL, COUNTRY, CREATE_DATE FROM dbo.SUBJECTS

Always shut down the dedicated SQL pool when not in use, to reduce costs.

Implement Data Masking

The primary difference between column‐level security and dynamic data masking is that column‐level security prevents the retrieval of the column completely. For example, if a user does not have permission on a column named BIRTHDATE and they attempt to retrieve it, a permission denied exception is rendered. Data masking, however, does allow the retrieval of the column data, although the data is partially concealed. Consider, for example, the EMAIL column from Exercise 8.7, which contains an individual’s email address. This is not data you would want everyone to have access to, for numerous reasons. Suppose, for example, someone writes a program that is intended to notify your customers of downtime or outage events, and there are hundreds of thousands of customers in your database. During testing, the data is selected and emails are sent to all your customers. Placing a mask on the EMAIL column would change the retrieved value from the full address to a masked form such as aXX@XXXX.com, which exposes only the first letter and a constant suffix. Doing so would prevent sending all your customers an email in error. At the same time, however, you might want to see part of the column value for validation, perhaps the last four digits of a credit card number or identification number. Complete Exercise 8.8, where you will configure a dynamic data mask on the EMAIL column of the SUBJECTS table.
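A mask like the one configured through the portal in Exercise 8.8 can also be applied, and selectively bypassed, with T‐SQL. The following is a minimal sketch, assuming the SUBJECTS table and the brainjammer user from Exercise 8.7:

 -- Sketch: apply the built-in email masking function to the EMAIL column
 ALTER TABLE dbo.SUBJECTS
 ALTER COLUMN EMAIL ADD MASKED WITH (FUNCTION = 'email()');
 -- Users granted UNMASK see the original, unmasked values
 GRANT UNMASK TO brainjammer;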

Design a Data Masking Strategy – Keeping Data Safe and Secure

A mask is an object that partially conceals what is behind it. From a data perspective, a mask would conceal a particular piece of the data but not all of it. Consider, for example, email addresses, names, credit card numbers, and telephone numbers. Those classifications of data can be helpful if there is ever a need to validate a person’s identity. However, you would not want all of the data rendered in a query; instead, you can show only the last four digits of the credit card number or the first letter of an email address and the top level domain value like .com, .net, or .org, like the following:

 bXXXXXXXX@XXXXXXXX.net

There is a built‐in capability for this masking in the Azure portal for an Azure Synapse Analytics dedicated SQL pool. As shown in Figure 8.13, navigating to the Dynamic Data Masking blade renders the masking capabilities.

The feature will automatically scan your tables and find columns that may contain data that would benefit from masking. You apply the mask by selecting the Add Mask button, selecting the mask, and saving it. Then, when a user who is not in the excluded list, as shown in Figure 8.13, accesses the data, the mask is applied to the resulting dataset. Finally, the primary objective of masking is to conceal enough of the data in a column so that it can be used but not exploited. That partial data visibility demonstrates the difference between masking and encryption. When data in a column is encrypted, none of it is readable, whereas a mask can be configured to allow partial data recognition.

FIGURE 8.13 Dynamic Data Masking dedicated SQL pool
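A custom mask along the lines just described, one that exposes only the last four digits of a credit card number, could be defined with T‐SQL like the following sketch; the PAYMENTS table and CREDIT_CARD column are hypothetical and not part of the chapter exercises:

 -- Sketch: expose only the last four digits and pad the rest with a constant string
 ALTER TABLE dbo.PAYMENTS
 ALTER COLUMN CREDIT_CARD ADD MASKED WITH (FUNCTION = 'partial(0, "XXXX-XXXX-XXXX-", 4)');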

Design Access Control for Azure Data Lake Storage Gen2

There are four authorization methods for Azure storage accounts. The method you have been using in most scenarios up to now has been through access keys. An access key resembles the following:

 4jRwk0Ho7LB+si85ax…yuZP+AKrr1FbWbQ==

The access key is used in combination with the protocol and storage account name to build the connection string. Clients can then use this to access the storage account and the data within it. The connection string resembles the following:

 DefaultEndpointsProtocol=https;AccountName=<name>;AccountKey=<account-key>

On numerous occasions you have created linked services in the Azure Synapse Analytics workspace. When configuring a linked service for an Azure storage account, you might remember seeing the options shown in Figure 8.14, which request the information required to build the Azure storage account connection string.

FIGURE 8.14 ADLS access control access keys

Notice that the options request the authentication type, which is set to an access key, sometimes also referred to as an account key, to be used as part of a connection string, followed by the storage account name and the storage account key. Those values are enough for the client—in this case, an Azure Synapse Analytics linked service—to successfully make the connection to the storage account. HTTPS is used by default, which enforces encryption of data in transit; therefore, the endpoint protocol is not requested. Another authorization method similar to access keys is called shared access signature (SAS) authorization. This authorization method gives you a bit more control over which services, resources, and actions a client can access on the data stored in the account. Figure 8.15 shows the Shared Access Signature blade in the Azure portal for Azure storage accounts.

When you use either an access key or a SAS URL, any client with that token gets access to your storage account. There is no identity associated with either of those authorization methods; therefore, protecting the token is very important. This is a reason that retrieving the account key from an Azure key vault is also offered as an option, as you saw in Figure 8.14. Storing the access key and/or the SAS URL in an Azure key vault removes the need to store the key within the Azure Synapse Analytics workspace itself. Reducing the number of clients that hold your authorization keys is good design; any entity that needs the keys can instead be granted access to the Azure key vault and retrieve them there when making the connection to your storage account. The two remaining authorization methods are Azure RBAC and ACLs, which are covered in the following sections. As an introduction to those sections, Table 8.2 provides some details about both authorization methods.

TABLE 8.2 Azure storage account authorization methods

Method | Scope | Requires an identity | Granularity level
Azure RBAC | Storage account, container | Yes | High
ACL | File, directory | Yes | Low

FIGURE 8.15 ADLS Access control shared access signature

Authorization on an Azure storage account using Azure RBAC is achieved through a role assignment at the storage account or container level. ACLs are implemented by assigning read, write, or execute permissions on a file or directory. As shown in Figure 8.16, if the identity of the person or service performing the operation is associated with an RBAC role assignment that allows file deletion, then the access is granted, regardless of the ACL permission.

However, if the RBAC role assigned to the identity does not authorize the delete but the identity does have the corresponding ACL permission, then the file can still be deleted. The following sections describe these authorization methods in more detail.

Optimize Pipelines for Analytical or Transactional Purposes – Design and Implement a Data Stream Processing Solution

There are numerous approaches for optimizing data stream pipelines. Two such approaches are parallelization and compute resource management. You have learned that setting a partition key on the data message results in the splitting of messages across multiple nodes. By doing this, your data streams are processed in parallel. You can see how this looks in Figure 7.31 and the discussion around it. Managing the compute resources available for processing the data stream has a big impact on the speed and availability of your data to downstream consumers. As shown in Figure 7.48, the increase in the watermark delay was caused by the CPU utilization on the processor nodes approaching 100 percent.

FIGURE 7.48 Azure Stream Analytics job metrics, CPU at 99 percent utilization

When the CPU is under such pressure, event messages get queued, which causes a delay in processing and providing output. Increasing the number of SUs allocated to the Azure Stream Analytics job will have a positive impact on the pipeline, making it more optimal for analytical and transactional purposes.
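As a sketch of the parallelization approach, a Stream Analytics query can read from and write to partitioned inputs and outputs so that each partition is processed independently. The input and output names here are hypothetical; with compatibility level 1.2 the PARTITION BY clause is no longer required because partition alignment is handled automatically:

 -- Sketch: process each input partition independently
 SELECT *
 INTO powerBIOutput
 FROM brainwavesInput PARTITION BY PartitionId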

Scale Resources

As mentioned in the previous section, adding more resources to the Azure Stream Analytics Job will result in faster processing of the data stream. This assumes that you have noticed some processing delays and see that the current level of allocated resources is not sufficient. To increase the amount of compute power allocated to an Azure Stream Analytics job, select the Scale navigation link for the given job, as shown in Figure 7.49.

FIGURE 7.49 Azure Stream Analytics job scaling

Once the additional compute power is allocated and you start the job, instead of seeing 3, as shown in Figure 7.18, you will see the number defined in the Manual Scale Streaming Unit configuration. In this case, the number of SUs would be 60.

Design a Data Auditing Strategy – Keeping Data Safe and Secure

When you audit something, it means that you analyze it and gather data about the data from the results. Many times the findings result in actions necessary to resolve inefficient or incorrect scenarios. What you analyze, what you are looking for, and what kind of analysis you need to perform are based on the requirements of the object being audited. The data management perspective (refer to Figure 5.41) includes disciplines such as quality, governance, and security. Each of those is a good example of a scenario to approach when creating a data auditing strategy. From a data quality perspective, you have been exposed to cleansing, deduplicating, and handling missing data using the MAR, MCAR, and MNAR principles, as discussed in Chapter 5, “Transform, Manage, and Prepare Data,” and Chapter 6, “Create and Manage Batch Processing and Pipelines.” This chapter focuses on the governance and security of data and how you can design and implement strategies around those topics.

Governance encompasses a wide range of scenarios. You can optimize the scope of governance by identifying what is important to you, your business, and your customers. The necessary aspects of data governance include maintaining an inventory of data storage, enforcing policies, and knowing who is accessing what data and how often. The Azure platform provides products to achieve these aspects of data governance (refer to Figure 1.10). Microsoft Purview, for example, is used to discover and catalog your cloud‐based and on‐premises data estate. Azure Policy gives administrators the ability to control who provisions cloud resources and how, with Azure Blueprints helping to enforce that compliance. Compliance is a significant area of focus concerning data privacy, especially when it comes to PII, its physical location, and how long it can be persisted before purging. In addition to those products, you can find auditing capabilities built into products like Azure Synapse Analytics and Azure Databricks. When auditing is enabled on those two products specifically, failed and successful login attempts, SQL queries, and stored procedures are logged by default. The audit logs are stored in a Log Analytics workspace for analysis, and alerts can be configured in Azure Monitor when certain behaviors or activities are recognized. Auditing, when enabled, is applied across the entire workspace and can be extended to log any action performed that affects the workspace.

Microsoft Azure provides policy guidelines for many compliance standards, including ISO, GDPR, PCI DSS, SOX, HIPAA, and FISMA, to name just a few of the most common standards. From a security perspective, you have seen the layered approach (refer to Figure 8.1) and have learned about some of the information protection layer features, with details about other layers coming later. Data sensitivity levels, RBAC, data encryption, Log Analytics, and Azure Monitor are all tools for protecting, securing, and monitoring your data hosted on the Azure platform.

Microsoft Purview

Microsoft Purview is especially useful for automatically discovering, classifying, and mapping your data estate. You can use it to catalog your data across multiple cloud providers and on‐premises datastores. You can also use it to discover, monitor, and enforce policies, and classify sensitive data types. Purview consists of four components: a data map, a data catalog, data estate insights, and data sharing. A data map graphically displays your datastores along with their relationships across your data estate. A data catalog provides the means for browsing your data assets, which is helpful with data discovery and classification. Data estate insights present an overview of all your data resources and are helpful for discovering where your data is and what kind of data you have. Finally, data sharing provides the necessary features to securely share your data internally and with business customers. To get some hands‐on experience with Microsoft Purview, complete Exercise 8.2, where you will provision a Microsoft Purview account.

Design to Purge Data Based on Business Requirements – Keeping Data Safe and Secure

The primary difference between purging and deleting has to do with whether or not the data is gone for good. When you purge data, it means there is no way to recover it. If something called a soft delete is enabled, it means that the data can be recovered during a preconfigured timeframe. After that timeframe, the data will be purged. Soft‐deleted data continues to consume storage space in your database or on your datastore, like an Azure storage container. The storage consumption is only freed when the data is purged. Like all scenarios related to retention and data deletion discussed up to now, you need to first decide which data has a sensitivity level that must adhere to a retention policy. Once you determine which data must be deleted, you need to determine at what age the data should be removed. After identifying those two pieces of information, you might consider deleting the data from your database using the DELETE SQL command. The following command removes all the data from the SUBJECTS table where the CREATE_DATE value is more than 3 months before the current date:

 DELETE FROM SUBJECTS WHERE CREATE_DATE < DATEADD(month, -3, GETDATE())

When the amount of data is large, this kind of query can have a significant impact on performance. The impact can result in latency experienced by other data clients inserting, updating, or reading data from the same database. A very fast procedure for removing data is to place the data onto a partition defined by the column that determines the lifecycle of the data, for example, using the CREATE_DATE in the SUBJECTS table as the basis for a partition. When the data on that partition has breached the retention threshold, remove the partition, and the data is removed. Another approach is to select the data you want to keep, use the result to insert it into another table, and then switch the tables. This is achieved using CTAS, which was introduced in Chapter 2, “CREATE DATABASE dbName; GO,” along with the partitioning concept mentioned previously. The following SQL snippet is an example of how to achieve the purging of data without using the DELETE SQL command:

 SELECT * INTO SUBJECTS_NEW FROM SUBJECTS 
 WHERE CREATE_DATE > DATEADD(month, -3, GETDATE())
 RENAME OBJECT SUBJECTS TO SUBJECTS_OLD
 RENAME OBJECT SUBJECTS_NEW TO SUBJECTS
 DROP TABLE SUBJECTS_OLD

The SELECT statement retrieves the data with a creation date that is not older than 3 months and places the data into a new table. The existing primary table named SUBJECTS is renamed by appending _OLD to the end. Then the newly populated table that was appended with _NEW is renamed to the primary table name of SUBJECTS. Lastly, the table containing data that is older than 3 months is dropped, resulting in its deletion.
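The partition‐based removal mentioned earlier can be sketched as follows. This assumes that SUBJECTS is partitioned on CREATE_DATE and that SUBJECTS_STAGE is an empty table with an identical structure and partition scheme; both are illustrative assumptions rather than part of the chapter exercises:

 -- Sketch: switch the expired partition out of SUBJECTS, then drop the staging table
 ALTER TABLE dbo.SUBJECTS SWITCH PARTITION 1 TO dbo.SUBJECTS_STAGE PARTITION 1;
 DROP TABLE dbo.SUBJECTS_STAGE;

Because the switch is a metadata‐only operation, it completes almost instantly regardless of how many rows the partition contains.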

Azure Role‐Based Access Control – Keeping Data Safe and Secure

RBAC has been discussed in Chapter 1, “Gaining the Azure Data Engineer Associate Certification,” Chapter 3, “Data Sources and Ingestion,” and Chapter 5. What RBAC is, its purpose, and how it controls access to Azure resources should be somewhat clear at this point. This section goes into a bit more detail in the context of an Azure storage account that includes an ADLS container. The implementation of RBAC assignments for a specific resource is performed on the Access Control (IAM) blade for the given resource, as you have seen previously in Figure 1.28 and Figure 5.47. Figure 8.17 provides an example of the permissions related to an Azure storage account.

FIGURE 8.16 RBAC and ACL permission evaluation

FIGURE 8.17 RBAC Access Control (IAM) Azure storage account

Notice that the Role Assignments tab is selected and that there are numerous roles and users within them. The roles highlighted are Owner, Storage Blob Data Contributor, and Storage Blob Data Reader. An account that is part of the Owner role has full access to manage all resources within the subscription. In addition to that, members of the Owner role can assign roles to other members. Notice that the Owner role does not grant access to the data layer. A user in the Owner role will not be able to read the data in the storage account with just the Owner permission, although it can look that way in the portal if the Account Key Authentication option is enabled, since the owner does have access to those keys. The permissions section of the JSON role definition resembles the following, which supports the preceding statement about full access to all resources:

 "permissions": [{ "actions": [ "*" …]

The Storage Blob Data Contributor role concentrates specifically on the storage accounts in the subscription. The following permissions are granted for this role. Notice that members of this role can delete, read, and write blobs to all the containers within the storage account. You might notice the account of type app named csharpguitar. That is the identity linked to the Azure Synapse Analytics workspace and is the way in which it has been granted permission to the ADLS container used for the workspace.

 "permissions": [{ "actions": [
  "Microsoft.Storage/storageAccounts/blobServices/containers/delete",
  "Microsoft.Storage/storageAccounts/blobServices/containers/read",
  "Microsoft.Storage/storageAccounts/blobServices/containers/write",
  "Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action" …]

The ability to obtain a user delegation key, which is used to create SAS tokens for accessing data within the container, is also granted to the Storage Blob Data Contributor role. This is achieved via the generateUserDelegationKey permission. The permissions for the Storage Blob Data Reader role are as follows.

 "permissions": [{ "actions": [
  "Microsoft.Storage/storageAccounts/blobServices/containers/read",
  "Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action" …]

Looking at the permission details, it is easy to see the difference between the Storage Blob Data Contributor role and Storage Blob Data Reader role. There are no permissions for delete or write operations on any container in the storage account for the reader. As illustrated in more granular detail by Figure 8.18, when the member attempts to delete a blob, the platform checks the RBAC role. If the role has permission to delete the blob, then the operation is allowed. If the role does not have the delete permission but the member does have the ACL permission, then the operation will be performed. Without an ACL delete permission, the operation is denied due to access restriction.

FIGURE 8.18 RBAC role and ACL permission evaluation

Remember that if a built‐in RBAC role does not meet your requirements, you can create a custom RBAC role to include or exclude permissions. As shown in Figure 8.17, a group named BRAINJAMMER has been added to a custom RBAC role named Storage Account Custom. The group is an Azure Active Directory security group that contains members who are also part of the Azure Active Directory tenant. Those members receive the permissions associated with the RBAC custom permission list. As you know, adding individual members to RBAC roles at the resource level is the most inefficient approach. Instead, you should create groups that effectively describe the role, add members to it, and then add the group to the RBAC role. You will create this group in Exercise 8.12.

POSIX‐like Access Control Lists – Keeping Data Safe and Secure

There was an extensive discussion about ACLs in Chapter 1. Two concepts need to be summarized again: the types of ACLs and the permission levels. The two kinds of ACLs are Access and Default, and the three permission levels are Execute (X), Read (R), and Write (W). Access ACLs control the access to both directories and files, as shown in Table 8.2. Default ACLs are templates that determine the access ACLs for child items created below a directory with applied default ACLs. Files do not have default ACLs, and changing the ACL of a parent does not affect the default or access ACLs of the child items. The feature to configure and manage ACLs is available on the Manage ACL blade for a given ADLS container, as shown in Figure 8.19.

FIGURE 8.19 The Manage ACL blade

Figure 8.19 shows two tabs. The Access Permissions tab is in focus. The Azure AD security group BRAINJAMMER has been granted Access‐Read and Access‐Execute permissions, which means the content within the ADLS container directory can be listed. Listing the contents of a directory requires both Read (R) and Execute (X) permissions. The individual who was added to the Storage Blob Data Reader RBAC role shown in Figure 8.17 has been granted Access‐Write and Access‐Execute. Write (W) and Execute (X) ACL permissions are required to create items in the targeted directory. The other tab, Default Permissions, is where you can configure permissions that will be applied to the child items created below the root directory, which is the one in focus, as shown under the Set and Manage Permissions For heading. Now that you have some insights into data security concepts, features, and products, continue to the next section, where you will implement some of what you just learned.

Implement Data Security

Until your security objectives and requirements are finalized, you should not proceed with any form of implementation. You need to first know specifically what your goal is before you begin taking action. If your business model requires that your data complies with industry regulations, the requirements to meet those rules must be part of your design. Remember that Microsoft Purview and Azure Policy are helpful tools for guiding you through regulatory compliance. Those tools are also helpful for discovering your data sources and determining which sensitivity levels they require. Those sensitivity levels provide guidance into the level of security to apply to the dataset. After completing those steps, use something like the layered security diagram shown in Figure 8.1 as a guide for implementing security. The following sections cover the information protection, access management, and network security layers. The threat protection layer, which includes features like anomaly activity detection and malware detection, is best designed and implemented by security professionals. Note, however, that Microsoft Defender for Cloud is specifically designed for securing, detecting, alerting, and responding to bad actors and malicious activities, preventing them from doing harm.