Configure and Perform a Data Asset Scan Using Microsoft Purview – Keeping Data Safe and Secure-2
If you have not created the SUBJECTS table on your dedicated SQL pool, create the table using the SUBJECTS.sql file located in the Chapter08 directory on GitHub.
The first action you took after accessing the Azure portal was to add the Microsoft Purview account identity to the Reader role of the Azure Synapse Analytics workspace. Note that adding this role assignment at the workspace level results in the Reader permissions being granted on all resources that exist in the workspace. This read-only level of access is appropriate for Microsoft Purview to perform governance and auditing activities. It is also possible to grant this access to a specific SQL or Spark pool by using the same approach, via the Access control (IAM) role assignments feature, while that analytics pool is in focus. Next, you navigated to the Manage hub in the Azure Synapse Analytics workspace and connected the Microsoft Purview account to the workspace. This provided easy access to the Microsoft Purview Governance portal.
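The inheritance behavior described above can be pictured with a minimal sketch. It assumes only that Azure resource IDs are hierarchical paths, so a pool's ID begins with its workspace's ID. The function name role_applies and every ID below (subscription, resource group, workspace, and pool names) are hypothetical and exist for illustration only; this is not how Azure evaluates access internally.

```python
def role_applies(assignment_scope: str, resource_id: str) -> bool:
    """Return True if a role assigned at assignment_scope is inherited by resource_id.

    Simplified model: a resource inherits an assignment when it is the
    scope itself or sits somewhere below the scope in the ID hierarchy.
    """
    scope = assignment_scope.rstrip("/").lower()
    resource = resource_id.rstrip("/").lower()
    return resource == scope or resource.startswith(scope + "/")


# Hypothetical resource IDs, for illustration only.
WORKSPACE = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/example-rg"
    "/providers/Microsoft.Synapse/workspaces/example-workspace"
)
SQL_POOL = WORKSPACE + "/sqlPools/examplesqlpool"
SPARK_POOL = WORKSPACE + "/bigDataPools/examplesparkpool"

if __name__ == "__main__":
    # Reader assigned on the workspace reaches both pools.
    print(role_applies(WORKSPACE, SQL_POOL))
    print(role_applies(WORKSPACE, SPARK_POOL))
    # Reader assigned on the SQL pool only does not reach the Spark pool.
    print(role_applies(SQL_POOL, SPARK_POOL))
```

In this model, assigning the role at the workspace scope covers every pool beneath it, whereas assigning it while a single pool is in focus limits the access to that pool.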
Until you configure the new credential, as shown in Figure 8.21, you may receive a "Failed to load serverless databases from Synapse workspace" error message. Once you select the new credential (for example, sqladminuser), the error no longer appears. In this example, the username and password are the same for both the serverless and dedicated SQL pools.
Once in the Microsoft Purview Governance portal, you registered a collection named ASA-csharpguitar under the R&D parent collection. With the collection that targets your Azure Synapse Analytics workspace in place, you began configuring an asset scan. A credential that can access both the serverless and dedicated SQL pools is required at this point. Selecting the + New item from the Credential drop-down list box provided the option to create one. You added an Azure Key Vault connection that targets the secret created in Exercise 8.1. The secret contains the password for your dedicated SQL pool, which in this example is the same as the password for the built-in serverless SQL pool. Once the credential was configured and selected from the Credential drop-down list box, you were able to select the dedicated SQL pool as the target data source of the scan.
By selecting the System Default scan rule set, you chose to apply all the supported classification rules. While configuring the scan, you might have noticed the View Details link below that value. Clicking the View Details link currently renders a list of 208 classification rules grouped under names such as Government, Financial, Base, Personal, Security, and Miscellaneous. You also have the option to create a custom rule that adds your own set of items to scan for. For example, the Security rules check for passwords that match common patterns; the Government rules check for values that match an ID; and the Personal rules check for birth dates, email addresses, and phone numbers. If you did not review that list, go back and check it for the full set of attributes that are searched for when running an asset scan. The next window gives you the option to schedule the scan to run weekly or monthly. In a live scenario, where there is a lot of activity on your data sources, scheduling a recurring scan would be a good idea. Lastly, you ran the scan, viewed the results shown in Figure 8.22, and then stopped the dedicated SQL pool. In Exercise 8.5 you will use those results to classify and curate the data existing in the SUBJECTS table.
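To make the idea of pattern-based classification rules more concrete, the following is a minimal sketch of how a rule might flag a column of values. Everything in it is an assumption made for illustration: the two regular expressions, the labels, the 60 percent threshold, and the function name classify_column are hypothetical and are not the rules or logic that Microsoft Purview actually ships.

```python
import re

# Hypothetical, simplified patterns; not the rules Microsoft Purview uses.
PATTERNS = {
    "Email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "Birth date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def classify_column(values, threshold=0.6):
    """Return labels whose pattern fully matches at least `threshold`
    of the non-empty values in the column (threshold is an assumption)."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return []
    labels = []
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= threshold:
            labels.append(label)
    return labels


if __name__ == "__main__":
    # Made-up sample values, not data from the SUBJECTS table.
    print(classify_column(["a@example.com", "b@example.org", "n/a"]))
    print(classify_column(["1990-01-31", "1985-12-02"]))
    print(classify_column(["blue", "green"]))
```

A custom rule can be thought of in the same way: an additional pattern and label added alongside the built-in ones.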
Azure Synapse Analytics includes an Auditing feature for dedicated SQL pools. Complete Exercise 8.4 to configure and implement Auditing on an Azure Synapse Analytics dedicated SQL pool.