
Create an Azure Key Vault Resource – Keeping Data Safe and Secure-2

The Networking tab enables you to configure a private endpoint or the binding to a virtual network (VNet). The provisioning and binding of Azure resources to a VNet and private endpoints are discussed and performed in a later section. After provisioning the key vault, you created a key, a secret, and a certificate. The keys in an Azure key vault are those used in asymmetric/public‐key cryptography. As shown in Figure 8.2, two types of keys are available with the Standard tier: Rivest‐Shamir‐Adleman (RSA) and elliptic curve (EC). Both types are asymmetric, meaning each consists of a public and a private key pair used to encrypt and decrypt data. The private key must be protected, because any client can use the public key to encrypt, but only the private key can decrypt. Each key type offers multiple strengths of encryption; RSA, for example, supports 2,048, 3,072, and 4,096 bits. The higher the number, the stronger the security, but higher RSA key sizes come with caveats concerning speed and compatibility: larger keys take more time to decrypt, and not all platforms support them. Therefore, you need to consider which level of security best fits your use case and security compliance requirements.
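
For example, creating keys of each type with the Azure CLI might look like the following sketch, where the vault name is a placeholder and the EC key name is hypothetical:

 # RSA key, 3,072-bit (2048, 3072, and 4096 are the supported sizes)
 az keyvault key create --vault-name <your-vault-name> \
   --name brainjammerKey --kty RSA --size 3072
 # EC key on the P-256 curve (this key name is hypothetical)
 az keyvault key create --vault-name <your-vault-name> \
   --name brainjammerKeyEC --kty EC --curve P-256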

A secret is something like a password, a connection string, or any text up to 25 KB that needs protection. Connection strings and passwords are commonly stored in configuration files or hard‐coded into application code. This is not secure, because anyone who has access to the code or to the server hosting the configuration file has access to the credentials, and therefore to the resources they protect. Instead, applications can be coded to retrieve the secret from a key vault and then use it to make the required connections. In a production environment, the request for the secret can be authenticated using a managed identity or a service principal.
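
For example, an application or administrator can retrieve the secret created in Exercise 8.1 with a single Azure CLI call. This is a sketch; the vault name is a placeholder:

 # Returns only the secret's value, suitable for piping into a script
 az keyvault secret show --vault-name <your-vault-name> \
   --name azureSynapseSQLPool --query value --output tsv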

Secrets stored in a key vault are encrypted when added and decrypted when retrieved. The certificate support in Azure Key Vault provides management of x509 certificates. If you have ever worked on an Internet application, you have likely used an x509 certificate. When this type of certificate is applied to the HTTP protocol, it secures the communication between the entities engaged in the conversation; the result is HTTPS, with the underlying encryption commonly referred to as Transport Layer Security (TLS). In addition to securing communication over the Internet, certificates can also be used for authentication and for signing software. Consider the certificate you created in Exercise 8.1. When you click the certificate, you will notice it has a Certificate Version that resembles a GUID without the dashes. Associated with that Certificate Version is a Certificate Identifier, a URL that gives you access to the certificate details. When you enter the following command using the Azure CLI, you will see information such as the base‐64–encoded certificate, the link to the private key, and the link to the public key, similar to that shown in Figure 8.6.
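
The command in question resembles the following sketch, which assumes your vault name and the certificate name from Exercise 8.1:

 # Output includes the base-64 cer value plus the kid and sid URLs
 az keyvault certificate show --vault-name <your-vault-name> \
   --name brainjammerCertificate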

FIGURE 8.6 Azure Key Vault x509 certificate details

The links to the private and public keys can be used to retrieve their details in a similar fashion. The first Azure CLI command that follows uses the kid attribute to retrieve the key; note that the vault returns only the public portion of the key pair, as the private key itself never leaves the vault. The second uses the sid attribute to retrieve the secret, which contains the full certificate, including the private key when the certificate policy marks it as exportable.
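
A sketch of those two commands, substituting the kid and sid values returned by the previous call:

 # kid: the key object (only the public portion is returned)
 az keyvault key show --id <kid-value>
 # sid: the secret containing the certificate material
 az keyvault secret show --id <sid-value>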

The ability to list, get, and use keys, secrets, and certificates is controlled by the permissions you set up while creating the key vault. Put some thought into who and what gets which kind of permissions to these resources.

Summary – Design and Implement a Data Stream Processing Solution

This chapter focused on the design and development of a stream processing solution. You learned about data stream producers, which are commonly IoT devices that send event messages to an ingestion endpoint hosted in the cloud. You learned about stream processing products that read, transform, and write the data stream to a location where consumers can access it. Where the data lands depends on whether the data insights are required in real time or in near real time. Both scenarios flow through the speed layer: real‐time insights flow directly into a consumer like Power BI, while near real‐time data streams flow into the serving layer. Once in the serving layer, the data can be further transformed by batch processing prior to consumption. In addition to the time demands on your streaming solution, other considerations, such as the data stream format, programming paradigm, programming language, and product interoperability, are all important when designing your data streaming solution.

Azure Stream Analytics can process data streams in parallel. Performing work in parallel increases the speed at which the transformation completes, which results in faster access to business insights. This is achieved using partition keys, which give the platform the information it needs to group related data together and process each group on a dedicated partition. The concept of time is very important in data stream solutions. Arrival time, event time, checkpoints, and watermarks all play important roles when interruptions to the data stream occur. You learned that when an OS upgrade, node exception, or product upgrade happens, the platform uses these time management properties to get your stream back on track without losing any data. Replaying a data stream is possible only if you have created or stored the data required to replay it; the streaming platform has no built‐in data archival feature that does this for you.
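
As a refresher, the partitioning is declared in the query itself. Here is a sketch of a Stream Analytics query, assuming an input whose events carry a PartitionId column; under compatibility level 1.2 partition alignment is largely automatic, so the explicit clause matters mostly for earlier levels:

 -- Process each input partition independently and in parallel
 SELECT *
 INTO [output]
 FROM [input] PARTITION BY PartitionId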

There are many metrics you can use to monitor the performance of your Azure Stream Analytics job. For example, the Resource Utilization, Event Counts, and Watermark Delay metrics can help you determine why the stream results are not being processed as expected, or at all. Diagnostic settings, alerts, and Activity logs can also help determine why your stream processing is not achieving the expected results. Once you determine the cause of the problem, you can increase capacity by scaling, configure the error policy, or change the query to fix a bug.

Design Security for Data Policies and Standards – Keeping Data Safe and Secure

When you begin thinking about security, many of the requirements will be driven by the kind of data you have. Is the data personally identifiable information (PII), such as a name, email address, or physical address, or does the data consist of historical THETA brain wave readings? In both scenarios you would want some form of protection from bad actors who could destroy or steal the data. The amount of protection you need should be identified by a data security standard. The popular Payment Card Industry (PCI) standards define the kinds of security mechanisms required of companies that want to transmit, process, and store payment information. Standards such as these can be a helpful baseline for defining your own data security standards, based on the type of data your data analytics solution ingests, transforms, and exposes. For example, a standard might identify the minimum version of TLS that consumers must use when consuming your data. Further examples are that all data columns must be associated with a sensitivity level, or that all data containing PII must be purged after 120 days.

A data security policy contains the data security standards component along with numerous other sections that pertain to security. A data security policy can be used to identify security roles and responsibilities throughout your company so that it is clear who is responsible for what. For example, which aspects of data security belong to a certified Data Engineer Associate? At what point does data security merge into the role of a Security Engineer? A data security policy also defines the procedures to follow when there are security violations or incidents. How to classify data, how to manage access to data, the encryption of data, and the management and disposal of data are all components that make up a data security policy. Some basic security principles, such as granting no access to customer data by default and always granting the lowest level of privilege required to complete the task, are good policies to abide by. Most of these security aspects are covered in more detail later in this chapter. After completing this chapter, you will be able to contribute to the creation of a data security policy and strategy.

As you begin the effort to describe and design your data security model, consider approaching it from a layered perspective. Figure 8.1 represents a layered security model. The first layer focuses on network security. In this chapter you will learn about virtual networks (VNets), network security groups (NSGs), firewalls, and private endpoints, each of which provides security at the networking layer.

FIGURE 8.1 Layered security

The next layer is the access management layer, which has to do with authentication and authorization: the former confirms you are who you say you are, and the latter validates that you are allowed to access the resource. Common tools on Azure to validate that a person is who they claim to be include Azure Active Directory (Azure AD), SQL authentication, and Windows Authentication (Kerberos), which is in preview at the time of writing. Managing access to resources after successful authentication is implemented through role assignments; a common tool for this on Azure is role‐based access control (RBAC). Many additional products, features, and concepts apply within this area, such as managed identities, Azure Key Vault, service principals, access control lists (ACLs), single sign‐on (SSO), and the least privilege principle. Each of these is described in more detail in the following sections.
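
For example, granting access with RBAC amounts to assigning a role to an identity at a scope. In this hedged Azure CLI sketch, the user, role, and scope are all placeholders:

 # Assign a built-in role to a user at resource-group scope
 az role assignment create \
   --assignee user@example.com \
   --role "Storage Blob Data Reader" \
   --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"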

The kind of business a company performs dictates the kind of data that is collected and stored. Companies that work with governments or financial institutions have a higher probability of attempted data theft than companies that measure brain waves, for example. The threat of a security breach is greater for companies with high‐value data, which means they need to take greater actions to prevent malicious behavior. To start with, performing vulnerability assessments and attack simulations helps find the places in your security posture that have known weaknesses. In parallel, enabling threat detection, virus scanning, and the logging needed for audits and traceability reduces the likelihood of long‐term and serious outages caused by exploitation. Microsoft Defender for Cloud can be used as the hub for viewing and analyzing your security logs.

The last layer of security, information protection, is applied to the data itself. This layer includes concepts such as data encryption, which is typically applied while the data is not being used (encryption‐at‐rest) and while the data is moving from one location to another (encryption‐in‐transit). Data masking, the labeling of sensitive information, and logging who is accessing the data and how often are additional techniques for protecting your data at this layer.
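
Dynamic data masking, for instance, is declared directly on a column. The following T‐SQL sketch assumes the SUBJECTS table referenced later in this chapter:

 -- Mask the EMAIL column for users lacking the UNMASK permission
 ALTER TABLE dbo.SUBJECTS
   ALTER COLUMN EMAIL ADD MASKED WITH (FUNCTION = 'email()');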

Table 8.1 summarizes the security‐related capabilities of various Azure products.

TABLE 8.1 Azure data product security support

 Feature             | Azure SQL Database | Azure Synapse Analytics | Azure Data Explorer | Azure Databricks  | Azure Cosmos DB
 Authentication      | SQL / Azure AD     | SQL / Azure AD          | Azure AD            | Tokens / Azure AD | DB users / Azure AD
 Dynamic masking     | Yes                | Yes                     | Yes                 | Yes               | Yes
 Encryption‐at‐rest  | Yes                | Yes                     | Yes                 | Yes               | Yes
 Row‐level security  | Yes                | Yes                     | No                  | Yes               | No
 Firewall            | Yes                | Yes                     | Yes                 | Yes               | Yes

Azure data products enable you to configure each layer of the security model, and the Azure platform provides many more features and capabilities to help monitor, manage, and maintain the security component of your data analytics solution. The remainder of this chapter provides details about these features and capabilities. But before you continue, complete Exercise 8.1, where you will provision an Azure Key Vault resource. Azure Key Vault is a solution that helps you securely store secrets, keys, and certificates. It comes in two tiers, Standard and Premium, where the primary difference has to do with hardware security module (HSM) protected keys. An HSM is a physical device dedicated to performing encryption, key management, authentication, and more; it provides the highest level of security and performance and is often required to meet compliance regulations. HSM‐protected keys are available in the Premium tier only, while the Standard tier employs a software‐based key protection method. This product plays a very significant role in security, so learning some details about it before you continue will increase your comprehension and broaden your perspective.
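
In Exercise 8.1 you select the tier on the Basics tab; the Azure CLI equivalent is the --sku parameter, as in this sketch with placeholder names:

 # Use --sku premium instead to get HSM-protected keys
 az keyvault create --name <your-vault-name> \
   --resource-group <your-resource-group> \
   --location <region> --sku standard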

Create a Microsoft Purview Account – Keeping Data Safe and Secure

  1. Log in to the Azure portal at https://portal.azure.com ➢ enter Purview in the search box in the upper middle of the browser ➢ select Microsoft Purview account ➢ select the + Create menu option ➢ select the subscription ➢ select the resource group ➢ enter a Microsoft Purview account name (I used brainjammer) ➢ select a region ➢ leave the managed resources as the default ➢ navigate through the other tabs ➢ leave the defaults ➢ click the Review + Create button ➢ and then click Create.
  2. Once provisioning is complete, navigate to the Microsoft Purview Overview blade ➢ select the Open link to open the Microsoft Purview Governance Portal ➢ select the Data Map hub ➢ select Collections ➢ and then select the Role Assignments tab, as shown in Figure 8.7. Make sure your account is within the Collection Admins group; if not, add it.

FIGURE 8.7 Microsoft Purview default root collection

  3. Click the + Add a Collection menu button ➢ enter Data Engineering in the Display Name text box ➢ enter your account into the Collection Admins group ➢ click the Create button ➢ and then select the root collection (for example, brainjammer) to do the same again, but this time enter R&D in the Display Name text box.
  4. Select the Sources navigation link ➢ click the Register menu button ➢ select the Azure Data Lake Storage Gen2 resource ➢ click Continue ➢ enter a name (I used ADLS‐csharpguitar) ➢ select the subscription that contains the ADLS container you created in Exercise 3.1 ➢ select the storage account name ➢ select Data Engineering from the Select a Collection drop‐down list ➢ click the Register button ➢ click the Register button again ➢ select Azure Synapse Analytics ➢ click Continue ➢ enter a name (I used ASA‐csharpguitar) ➢ select the subscription that contains the Azure Synapse Analytics workspace you created in Exercise 3.3 ➢ select R&D from the Select a Collection drop‐down list ➢ and then click Register. The result should resemble Figure 8.8.

FIGURE 8.8 Microsoft Purview Map view

  5. Navigate to the Azure Key Vault you created in Exercise 8.1 ➢ select Access Policies ➢ click the + Create menu button ➢ check the Get and List operations in the Secret permissions / Secret Management Operations section ➢ click Next ➢ search for and select the Microsoft Purview account name you just provisioned ➢ click the Next button twice ➢ and then click Create.
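
The same access policy can be applied with the Azure CLI. In this sketch, the object ID is that of the managed identity belonging to your Microsoft Purview account:

 # Grant the Purview managed identity Get and List on secrets
 az keyvault set-policy --name <your-vault-name> \
   --object-id <purview-managed-identity-object-id> \
   --secret-permissions get list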

The additional registrations of a Power BI workspace and an Azure SQL database are for effect only at this point. Feel free to register additional or different resources to your collections. The provisioning of the account was straightforward. You were again confronted with the concept of a managed resource group, which you experienced in Exercise 3.3. As a reminder, this resource group contains Azure products required by the provisioned resource; in this case, a required Azure storage account was placed into the managed resource group. In Exercise 8.2 you configured two collections, Data Engineering and R&D. The Data Engineering collection has the Power BI workspace and the ADLS container associated with it, while the R&D collection has the Azure Synapse Analytics workspace and the Azure SQL database. The structure of the collection hierarchy and its associated sources provides context for the policies, compliance, and governance constraints placed on them. Sorting out which datastores belong to each collection gives you the means to set policies on those resources based on the individuals affiliated with those groups. You did not perform any scanning in Exercise 8.2 because scanning requires a managed identity, which is discussed in the “Implement a Data Auditing Strategy” section.

Create an Azure Key Vault Resource – Keeping Data Safe and Secure-1

  1. Log in to the Azure portal at https://portal.azure.com ➢ click the menu button on the upper left of the browser ➢ click + Create a Resource ➢ select Security from the Categories section ➢ select Key Vault ➢ select the subscription ➢ select the resource group ➢ enter a key vault name ➢ select a region ➢ and then select a pricing tier (I used Standard). Leave the remaining options as the defaults.
  2. Click the Next button ➢ leave the defaults on the Access Policy tab ➢ select the check box next to your user identity in the Access Policies section ➢ click the Edit button ➢ observe the default Key, Secret, and Certificate permissions ➢ click Next ➢ leave the defaults on the Networking tab ➢ click the Review + Create button ➢ and then click Create.
  3. Once the key vault is provisioned, navigate to it ➢ select Keys from the navigation menu ➢ select the + Generate/import menu link ➢ and then enter a name (I used brainjammerKey). The configuration should resemble Figure 8.2. The Elliptic Curve Name radio buttons show the available algorithms.

FIGURE 8.2 Creating an Azure Key Vault key

  4. Click the Create button ➢ select the Secrets navigation item ➢ click the + Generate/Import menu option ➢ enter a name (I used azureSynapseSQLPool) ➢ and then enter a secret value (I used the password of my Azure Synapse Analytics dedicated SQL pool). The configuration should resemble Figure 8.3.

FIGURE 8.3 Creating an Azure Key Vault secret

  5. Click the Create button ➢ select the Certificates navigation item ➢ click the + Generate/Import menu option ➢ enter a certificate name (I used brainjammerCertificate) ➢ and then enter a subject value (I used “CN=brainjammer.net”). The configuration should resemble Figure 8.4.

FIGURE 8.4 Creating an Azure Key Vault certificate

  6. Click Create.

Exercise 8.1 is straightforward in that you should recognize most of the options and understand what they mean. A few features and concepts, however, are worthy of discussion. On the Access Policy tab, you likely noticed the option to manage access by either Key Vault access policy or Azure role‐based access control (RBAC). The Key Vault access policy enables you to grant service principals, users, applications, or user groups access to specific operations on the keys, secrets, and certificates hosted in the key vault—for example, those shown in Figure 8.5. Figure 8.5 is similar to what you saw in step 2 of Exercise 8.1 when viewing the default permissions.

FIGURE 8.5 Vault access policy operations

With the RBAC approach, you grant a user or group access to the key vault using a role. There are numerous built‐in key vault roles, such as Key Vault Administrator, Key Vault Reader, and Key Vault Secrets User. If none of the built‐in roles meets your requirements, you can create a custom role using a JSON role definition document.
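
A minimal sketch of such a custom role definition follows; the role name, description, permitted actions, and assignable scope are illustrative placeholders, not a prescription:

 {
   "Name": "Key Vault Secrets Auditor",
   "IsCustom": true,
   "Description": "Can read secret metadata but not secret values.",
   "Actions": [ "Microsoft.KeyVault/vaults/read" ],
   "NotActions": [],
   "DataActions": [ "Microsoft.KeyVault/vaults/secrets/readMetadata/action" ],
   "NotDataActions": [],
   "AssignableScopes": [ "/subscriptions/<subscription-id>" ]
 }

Once saved to a file, the definition can be registered with az role definition create --role-definition @role.json and then assigned like any built‐in role.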

Design Row‐Level and Column‐Level Security – Keeping Data Safe and Secure

In a relational database, a table is made up of rows of data, and each row can have many columns. Once your data has been ingested and transformed and is ready for consumption, you may need to apply some additional security at the row or column level. Row‐level security (RLS) is very similar to a filter commonly implemented using the WHERE clause. That approach works fine as long as no one has direct access to the data and can run queries that circumvent this level of protection. If you need to restrict access at the row level and you have clients that connect directly to your database, then you need to apply RLS. Implementing RLS requires what is called a FILTER PREDICATE, which is applied using the CREATE SECURITY POLICY statement. Consider a scenario where you have implemented a global brain wave repository that allows anyone to upload their brain wave readings to your datastore for processing and data analytics. Take it one step further and provide the option for those individuals to perform analytics on their own scenarios. As the platform owner, you would want to see the readings from everyone who has uploaded data; however, you would want to restrict individuals’ access to only their own data. RLS is a means for achieving just that. Consider the following SQL statement:
 CREATE SECURITY POLICY BrainwavesFilter_ext
 ADD FILTER PREDICATE Security.fn_securitypredicate(brainwaveProducer)
 ON dbo.Brainwaves_ext
 WITH (STATE = ON);

The statement creates a policy named BrainwavesFilter_ext and adds a filter predicate based on a value stored in the dbo.Brainwaves_ext table. The value in the brainwaveProducer column is the user account ID of the person who uploaded the brain waves. When queries are executed against the Brainwaves_ext table, the policy applies the predicate so that each user can select, update, or delete only the rows they uploaded.
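
The policy references Security.fn_securitypredicate, which is not defined above. The following is a minimal sketch of such an inline table‐valued function, assuming the brainwaveProducer column stores the uploader's database user name:

 CREATE SCHEMA Security;
 GO
 -- Returns a row (granting access) only when the row's producer
 -- matches the database user executing the query
 CREATE FUNCTION Security.fn_securitypredicate(@brainwaveProducer AS nvarchar(128))
     RETURNS TABLE
 WITH SCHEMABINDING
 AS
     RETURN SELECT 1 AS fn_securitypredicate_result
     WHERE @brainwaveProducer = USER_NAME();
 GO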

Another feature that you can apply to the data is called column‐level security. Recall from the SUBJECTS table that a few columns are worthy of a Confidential sensitivity level. Columns like USERNAME, ZIPCODE, EMAIL, and BIRTHDATE contain data that should not be accessible to the public or generally available. If you have a user in your Azure Synapse Analytics dedicated SQL pool named brainjammer and do not want that user to have access to those columns, you can execute the following command, which grants SELECT on only the permitted columns:

 GRANT SELECT ON SUBJECTS
   (ID, FIRSTNAME, LASTNAME, COUNTRY, CREATE_DATE) TO brainjammer;

If that user then attempts the following SELECT statement, an error would be rendered stating that access is denied:

 SELECT * FROM SUBJECTS

In the “Implement Row‐Level and Column‐Level Security” section you will perform an exercise and experience column‐level security firsthand.