Saturday, September 12, 2020

Dynamic Data Masking feature of Snowflake


Data security in the cloud is a concern for many organizations as they move their data there. With new and ever-growing regulations and laws on data privacy and protection, organizations face an increasingly challenging landscape when it comes to protecting their customers' data.

In July 2020, Snowflake announced the availability of a new set of column-level security features called Dynamic Data Masking. It lets organizations control the exposure of Personally Identifiable Information (PII) and Protected Health Information (PHI). A number of security products, such as SecuPi, provide these capabilities as part of their services. Now Snowflake offers its own built-in feature, with several advantages over third-party products. The feature offers:

  • Ease of use
  • No additional performance hit
  • Seamless integration with the rest of Snowflake's features and roles
  • No additional cost beyond the Snowflake edition that includes it (Enterprise Edition or higher)
Dynamic Data Masking is one of the new features introduced in Snowflake to address data security and privacy with respect to both external and internal users.

Creating an effective data protection policy

To create an effective data masking and protection policy, we need to orchestrate policies through roles rather than individual users. The following steps show how to do this.



Create Role

The following commands create a user and a role. You could also manage these by

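-- Run as a role that can create users and roles (for example SECURITYADMIN).
-- Create the role first so the user's default role points at an existing role.
create role masking_admin;
-- The password must be a quoted string; replace the placeholder with one that meets Snowflake's password rules.
create user PII_USER password = '<strong_password>' default_role = masking_admin must_change_password = FALSE;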

Assign Role

Now grant the masking role to the user who will create the masking policies and define how data should be presented to different roles or users.

grant role masking_admin to user PII_USER;
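Because the policies in this post check the active role with current_role() rather than the user name, giving someone access is a matter of granting a role, not editing a policy. The sketch below is a toy Python model of that idea, not Snowflake's API: it assumes each session has a single active role chosen from the roles granted to the user, and it ignores role hierarchies. `GRANTS`, `Session`, `grant_role`, and the second user `ANALYST_USER` are hypothetical names made up for illustration.

```python
# Toy model of role-based access, not Snowflake's API: visibility depends on the
# session's active role, so access changes by granting roles, not by editing policies.
from dataclasses import dataclass

# Hypothetical grant table; PII_USER mirrors "grant role masking_admin to user PII_USER".
# Every Snowflake user implicitly has the PUBLIC role.
GRANTS: dict[str, set[str]] = {
    "PII_USER": {"PUBLIC", "MASKING_ADMIN"},
    "ANALYST_USER": {"PUBLIC"},  # hypothetical user without the masking role
}

# Roles that the policies in this post allow to see raw values.
UNMASKED_ROLES = {"MASKING_ADMIN"}


@dataclass
class Session:
    user: str
    current_role: str = "PUBLIC"

    def use_role(self, role: str) -> None:
        """Like USE ROLE: only a role granted to the user can become active."""
        role = role.upper()  # unquoted Snowflake identifiers resolve to upper case
        if role not in GRANTS.get(self.user, set()):
            raise PermissionError(f"role {role} is not granted to user {self.user}")
        self.current_role = role

    def sees_unmasked(self) -> bool:
        # Decided by the active role only; no user name appears in the rule.
        return self.current_role in UNMASKED_ROLES


def grant_role(role: str, user: str) -> None:
    """Like GRANT ROLE ... TO USER ...: extends access without touching any policy."""
    GRANTS.setdefault(user, {"PUBLIC"}).add(role.upper())


if __name__ == "__main__":
    admin = Session("PII_USER")
    admin.use_role("masking_admin")
    print(admin.sees_unmasked())  # True

    analyst = Session("ANALYST_USER")
    try:
        analyst.use_role("masking_admin")
    except PermissionError as exc:
        print(exc)  # role MASKING_ADMIN is not granted to user ANALYST_USER
```

With this shape, letting another user see raw PII is a single grant; the masking rule itself never changes. That is the practical reason to name roles, not users, inside policy bodies.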

Apply Policy

You can grant masking-policy privileges at different object levels. In this example, I grant the APPLY MASKING POLICY privilege on the account and the CREATE MASKING POLICY privilege on a schema.

grant apply masking policy on account to role masking_admin;

Create Policy

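grant create masking policy on schema STAGE to role masking_admin;
-- The role also needs USAGE on schema STAGE and on its parent database to work with objects there.
grant usage on schema STAGE to role masking_admin;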

Apply policy to columns

Now we create a masking policy, a schema-level object that works much like a function (UDF) and implements the masking logic. In the first example, the policy returns the original Social Security number if the current role is MASKING_ADMIN and returns masked data for every other role, including ACCOUNTADMIN.
In the second example, which masks email IDs, MASKING_ADMIN sees the full address. If the current role is ACCOUNTADMIN, the email ID is partially masked: the domain part is returned as is, and everything before the @ is replaced using regexp_replace. All other roles see a fully masked value.
Finally, we apply these masking policies to individual columns as needed. A single policy can be applied to many columns (thousands, if needed), and each of those columns will be masked for different roles exactly as the policy defines.
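-- Create a policy that returns the real SSN only to MASKING_ADMIN.
-- current_role() returns the active role name in upper case, so compare against upper-case names.
create or replace masking policy STAGE.SSN_Policy as (SSN string) returns string ->
  case
    when current_role() in ('MASKING_ADMIN') then SSN
    else '**Masked SSN**'
  end;

-- Associate it with the SSN column (same schema-qualified table as the query below)
alter table STAGE.CUSTOMER modify column ssn set masking policy STAGE.SSN_Policy;

-- The result depends on the role that runs the query
select * from STAGE.CUSTOMER;

-- MASKING_ADMIN sees the full email, ACCOUNTADMIN sees only the domain, everyone else sees a mask
create or replace masking policy STAGE.Email_Policy as (email string) returns string ->
  case
    when current_role() in ('MASKING_ADMIN') then email
    when current_role() in ('ACCOUNTADMIN') then regexp_replace(email, '.+\@', '*****@')
    else '**Masked Email**'
  end;

-- Associate it with the email column
alter table STAGE.CUSTOMER modify column emailid set masking policy STAGE.Email_Policy;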
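To see what each role gets back from the two policies above, their CASE logic and the column assignments can be mirrored in plain Python. This is a minimal local model, not how Snowflake executes policies (Snowflake evaluates them inside the query engine at query time). It assumes Python's `re.sub` behaves like Snowflake's `regexp_replace` for this simple pattern: `.+@` is greedy, so everything up to the last `@` is replaced. The sample row and the `COLUMN_POLICIES` mapping are hypothetical.

```python
# Local model of STAGE.SSN_Policy and STAGE.Email_Policy; not Snowflake's implementation.
import re
from typing import Callable

# (value, current_role) -> value returned to the query
Policy = Callable[[str, str], str]


def ssn_policy(ssn: str, current_role: str) -> str:
    # Mirrors STAGE.SSN_Policy: only MASKING_ADMIN sees the real value.
    if current_role in ("MASKING_ADMIN",):
        return ssn
    return "**Masked SSN**"


def email_policy(email: str, current_role: str) -> str:
    # Mirrors STAGE.Email_Policy.
    if current_role in ("MASKING_ADMIN",):
        return email
    if current_role in ("ACCOUNTADMIN",):
        # Greedy '.+@' consumes everything up to the last '@', keeping the domain.
        return re.sub(r".+@", "*****@", email)
    return "**Masked Email**"


# Hypothetical stand-in for ALTER TABLE ... MODIFY COLUMN ... SET MASKING POLICY.
# The same policy function could be attached to any number of columns.
COLUMN_POLICIES: dict[str, Policy] = {
    "SSN": ssn_policy,
    "EMAILID": email_policy,
}


def select_row(row: dict[str, str], current_role: str) -> dict[str, str]:
    """Apply each column's policy (if it has one) to a row, as SELECT * would."""
    return {
        col: COLUMN_POLICIES[col](val, current_role) if col in COLUMN_POLICIES else val
        for col, val in row.items()
    }


if __name__ == "__main__":
    sample = {"NAME": "Jane Doe", "SSN": "123-45-6789", "EMAILID": "jane.doe@example.com"}
    for role in ("MASKING_ADMIN", "ACCOUNTADMIN", "SYSADMIN"):
        print(role, select_row(sample, role))
```

Note that ACCOUNTADMIN gets the partially masked email but the fully masked SSN: being the most privileged role in the account does not bypass a masking policy; it sees only what the policy returns for it.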

When implemented with the right set of policies, the visibility of your data can be controlled at a granular level.

Check out my YouTube video highlighting the feature in detail with a live demo.



3rd Party tools

There are a number of third-party security offerings, such as SecuPi, that provide overarching security suites covering many other tools in your IT ecosystem. Dynamic Data Masking, by contrast, is built into Snowflake: it requires no additional installation and does not add the performance impact or security concerns that come with routing data through an external product.

Conclusion

Dynamic Data Masking is another great Snowflake feature, one that reinforces its position as a leader in addressing every aspect of cloud data warehousing needs.