Saturday, September 12, 2020

Dynamic Data Masking feature of Snowflake


         Data security over cloud is a concern for many organizations while moving their data to cloud. With newer and ever growing regulations and laws on data privacy and protections, it is ever more challenging landscape for organizations to protect their customer's data.

        In July 2020 Snowflake announced availability of new set of column level security features available called dynamic data masking. This lets organizations control the exposure of Personally Identifiable Information (PII), Protected Health Information (PHI). There are a number of security products such as SecuPi provide these features as their product services, Now snowflake offers its own built in feature with many advantages compared to third party products. This features offers

  • Ease of use
  • No additional performance hit
  • Seamless integration with rest of snowflake's features & roles
  • No additional cost
Dynamic Data Masking is one the new feature introduced in Snowflake to cater to data security and privacy both from external and internal users.

Creating an effective data protection policy

In order to create an effective data masking and protection policy we need to orchestrate policies through roles instead of individual users. Following steps can be created to do the same.



Create Role

Commands to create a user, a role. You could also manage these by

Create user PII_USER password=piiuser default_role = masking_admin must_change_password = FALSE;
Create Role masking_admin;

Assign Role

Now assign the data masking role to a user who can mask the data and define all rules around how data should be presented to different roles or users.

grant role masking_admin to user PII_USER;

Apply Policy

You can assign masking policies at DB object levels, in this example I am applying to account and a schema.

grant apply masking policy on account to role masking_admin;

Create Policy

grant create masking policy on schema STAGE to role masking_admin;

Apply policy to columns

Now we create a masking policy for function or UDF that implements masking of data. In this first example we return the original Social security number if the user's role is Masking_Admin, returns a masked data for everyone other role including account admins.
In the second example of masking email ids, if the user belongs to ACCOUNTADMIN role, email id is partially masked and domain name part is returned as is and rest of the details of email id are masked with a function.
Finally we need to apply these masking policies to individual columns as needed. You can apply these policies to thousands of columns and they will all mask the data for different roles as expected 
--Create a new policy
create or replace masking policy STAGE.SSN_Policy as (SSN string) returns string ->
case when current_role() in ('MASKING_ADMIN') then
SSN
ELSE
'**Masked SSN**'
END;
---Associate it
alter table customer modify column ssn set masking policy Stage.SSN_Policy;
SELECT * FROm stage.CUSTOMER;
create or replace masking policy STAGE.Email_Policy as (email string) returns string ->
case when current_role() in ('MASKING_ADMIN') then email
when current_role() in ('ACCOUNTADMIN') then regexp_replace(email,'.+\@','*****@')
else '**Masked Email**'
END;
alter table customer modify column emailid set masking policy Stage.Email_Policy;

When implemented with right set of policies, visibility of your data can be controlled at a granular level.

Check out my YouTube video highlighting the feature in detail with a live demo



3rd Party tools

There are a number third party security offerings such as SecuPi that provide overarching security suites that cater to many other tools of your IT ecosystem, but Dynamic data masking feature is a built in feature that does not have any negative impact or security concerns or additional installation requirements.

Conclusion

Another great feature by snowflake that sets is a leader that addresses every aspect of cloud data ware housing needs.

Saturday, March 16, 2019

Snowflake Data Sharing - Complete video tutorial

Snowflake's data sharing feature lets you share your data with other account holders. Since data is shared as readonly, consumers do not incur any charges but they only pay for the compute resource they use.
snowflake-data-sharing

Checkout the detailed instructions video on my channel


LInk to this video - https://www.youtube.com/watch?v=GcrEK-VFEB8&list=PLUxVA4dlJgcz90Vbufh6o6p8Rrew-lnE3

Snowflake Secure Views

Snowflake's secure views let you abstract the logic and hide the underlying tables structures and their relations. These secure views help you manage data security and privacy.
snowflake-secure_views

In this video i reviewed all the details related to secure views and demo of using them.



Commands used in the video to demonstrate snowflake's secure views. Checkout the detailed video on my channel - https://www.youtube.com/watch?v=CRZQjxP10G0&list=PLUxVA4dlJgcz90Vbufh6o6p8Rrew-lnE3&index=3


--LIST PATIENTS
SELECT * FROM PATIENTS;


CREATE OR REPLACE SECURE VIEW ORG_PATIENTS AS
SELECT * FROM PATIENTS ;


SELECT * FROM ORG_PATIENTS ;

SHOW VIEWS;


SELECT FULLNAME,CITY,USER_ACCESS.ORG_ID,USER_ACCESS.ACCOUNT_ID FROM patients 
INNER JOIN USER_ACCESS ON PATIENTS.ORG_ID=USER_ACCESS.ORG_ID;

insert into USER_ACCESS values ('Baylor Hospital','');
insert into USER_ACCESS values ('Mayo Clinic','');




create table USER_ACCESS
(
    ORG_ID VARCHAR(50),
    ACCOUNT_ID varchar(50)
);

insert into USER_ACCESS values ('Baylor Hospital','MN30689');
insert into USER_ACCESS values ('Mayo Clinic','PO51568');

SELECT * FROM USER_ACCESS; 

TRUNCATE  TABLE USER_ACCESS SET ACCOUNT_ID='PO51568' WHERE 
ORG_ID='Mayo Clinic';

SELECT FULLNAME,CITY,USER_ACCESS.ORG_ID,USER_ACCESS.ACCOUNT_ID FROM patients 
INNER JOIN USER_ACCESS ON PATIENTS.ORG_ID=USER_ACCESS.ORG_ID;


select CURRENT_ACCOUNT();
select CURRENT_ROLE ();


--CREATE A SECURE VIEW
CREATE OR REPLACE SECURE VIEW ORG_PATIENTS AS
SELECT FULLNAME,CITY,USER_ACCESS.ORG_ID FROM patients 
INNER JOIN USER_ACCESS ON PATIENTS.ORG_ID=USER_ACCESS.ORG_ID
WHERE ACCOUNT_ID=CURRENT_ACCOUNT();



select * from ORG_PATIENTS;

SHOW VIEWS;
describe view ORG_PATIENTS;

describe view mycustomerview;
describe view ORG_PATIENTS;


select get_ddl('view', 'ORG_PATIENTS') view_defn;






Snowflake Time Travel - A complete guide



In the recent years IT Roles such as Database administrators are becoming  less and less relevant as most cloud based databases and datawarehouses are managed by IT engineers needing very little or no administration.

Snowflake already has very little administration. You do not need to do typical DBA tasks like index tables, partitioning or deal with space related issues. Time travel is another feature that lets developers self serve dba related task of reverting or restoring the database after you do issue some inadvertent commands.




In this detailed hands on video I explained about parameters that affect time travel and how to utilize this feature to revert the accidental or intentional database changes. 
Here are the list of commands used in the video

-- DATA_RETENTION_TIME_IN_DAYS

show parameters;
show parameters in database SALES;
show parameters in warehouse XMALLFORFINANCE;
show parameters in account;
show parameters like '%DATA%';
show parameters like 'DATA%' in account;
--CHANGES DATA RETENTION FOR THE DATABASEshow parameters in database SALES;alter DATABASE SALES set DATA_RETENTION_TIME_IN_DAYS = 1 ;

--CHANGES DATA RETENTION FOR THE WHOLE ACCOUNTshow parameters like 'DATA%' in account;alter ACCOUNT set DATA_RETENTION_TIME_IN_DAYS = 1 ;

select * from PATIENTS;
INSERT INTO PATIENTS values  (5,'TEST PATIENT','AMSTERDAM','Mayo Clinic');select * from PATIENTS at(offset => -60*5);
delete from PATIENTS where id=5;drop table sales.snow.PATIENTS ;
--Over a dayselect * from PATIENTS at(offset => -60*500000);

select * from PATIENTS at(timestamp => 'Sat, 16 Mar 2019 16:20:00 -0700'::timestamp);

select * from user_access;
--Before changes make by a specific queryselect * from patients before(statement => '025e545d-fc23-4e8d-9ac5-335943a1bec2');
create table restored_table clone user_access  at(timestamp => 'Mon, 09 May 2019 01:01:00 +0300'::timestamp);    create table restored_table clone user_access  at(offset => -60*10);    select * from restored_table;  drop table restored_table;    --Schema as existed 1 hour before  create schema restored_schema clone snow at(offset => -3600);  drop schema restored_schema;    create database restored_db clone sales  before(statement => '025e545d-fc23-4e8d-9ac5-335943a1bec2');  drop database restored_db;  show tables history  in sales.snow;show tables history  in sales;---Restoring objects
undrop table patients;
undrop schema snow;
undrop database sales;



More details are available in snowflake's documentation page - https://docs.snowflake.net/manuals/user-guide/data-time-travel.html

Check out my youtube channel for more videos - https://www.youtube.com/channel/UCT3bqK2QL93j-IFYFYbvjWQ

Thursday, April 12, 2018

Amazon S3 (Simple Storage Service) - object storage service

One of the Amazon's core service offering was highly scalable S3 storage. In this video i explained about its quick overview and hands on explanation most of its features including  Security, versioning, static website hosting, accelerated transfers, event handling, tags etc.
I covered this topic as I intend to do more videos in future that integrate with such services of cloud computing. Let me know your feedback.

Wednesday, May 03, 2017

Creating Azure Sql server database

Microsoft's products have always been known for having very user-friendly interfaces. Azure is not different. Seemingly complicated PaaS, IaS and SaaS are all available under one Microsoft Azure umbrella. In this video of BigData Technologies, I explore creating a new cloud computing based Azure Sql server database and then connect it to most common tools SSMS and Visual studio.



Code used in the video

Create table  cpt_codes(cptcode int, cpt_desc varchar(2000));

insert into cpt_codes values (62320,'Injection(s), of diagnostic or therapeutic substance(s) (e.g.,anesthetic, antispasmodic, opioid, steroid, other solution),not including neurolytic substances, including needle orcatheter placement, interlaminar epidural or subarachnoid,cervical or thoracic; without imaging guidance');


select * from cpt_codes;


As always, please feel free to comment on this blog or the video, and I will be happy to answer.

Sunday, February 05, 2017

MongoDB - A NoSQL, Json, RockMongo, RoboMongo

MongoDB's popularity has been soaring in the recent years. The first video explores the details of Mongo DB, a sample document/table is created and we also see how to insert a sample Json data and extracting the it.
The second video explores a few popular tools, their installation, and usage. RockMongo and RoboMongo are explored as an example.

Extracting HL7 data using Talend and storing in Cassandra

HL7 is a set of international standards used by clinical and healthcare providers to exchange information. This video explores using this HL7 data, extracting a few sample fields from it and then saving them into one of the popular NoSql database Cassandra.

Route53 of Amazon webservices is used for all hosting/Routing related needs. This video explores various options for setting up your domain name servers, how to redirect it to AWS and other settings inside AWS to make it redirect to your content on AWS. 
In this example i am demonstrating how to redirect your domain to a simple html page stored on AWS, process is the same even if you are redirecting to any app hosted on Amazon web servervices.

Tuesday, January 17, 2017

Talend Project - Child jobs, Functions and Variables

This concluding part of Talend project video explores creating sub/child jobs, how to pass data between the two and different strategies to share the data.
Code in subroutines
// Code from https://www.youtube.com/watch?v=a7-HUU4js9E
package routines;

public class formatEmails {

    public static String formatEmail(char oldRating, char newRating, String typeOfRating) {
       String fEmailText="";
    if (newRating>oldRating) {
    fEmailText=fEmailText+""+typeOfRating +" rating reduced from " +Character.toString(oldRating) +" to "+Character.toString(newRating)  +"
" ;
        }
    if (newRating
    fEmailText=fEmailText+""+typeOfRating +" rating improved from " +Character.toString(oldRating) +" to "+Character.toString(newRating)  +"
" ;
        }
    return fEmailText;
    }
}


It also explores subroutines/functions that helps you reduce the code and modularizes it. 



Check out the detailed video - https://www.youtube.com/watch?v=a7-HUU4js9E

tjavarow component code
/* -- Code from https://www.youtube.com/watch?v=a7-HUU4js9E ---- */ 
String wholepage;  
String ratings;
wholepage=input_row.document.toString(); 
int pos=wholepage.indexOf("composite_val"); 
ratings=wholepage.substring(pos,pos+250).replaceAll("[\\[\\]\"]", "").replaceAll(" \n", " ").replaceAll(" composite_val_vgm","");    String allratingsonly="";
String[] splitratings = ratings.split("composite_val>"); 
int i=0;
context.EmailText=context.EmailText+"

Ratings for : " +context.stock +"
" ;


 for (String eachratingrow : splitratings) 
  {    
   if (eachratingrow.length()>0)  
    { 
  
   if (i==0){  
     output_row.z_growth_rating=Character.toString(eachratingrow.charAt(0));   
        context.EmailText=context.EmailText+formatEmails.formatEmail(input_row.growth_rating.charAt(0), eachratingrow.charAt(0), "Growth");
   }    
   if (i==1)
   {
    output_row.z_momentum_rating=Character.toString(eachratingrow.charAt(0));   
       context.EmailText=context.EmailText+formatEmails.formatEmail(input_row.momentum_rating.charAt(0), eachratingrow.charAt(0), "Momentum");  
     }  
   if (i==2)
  
   output_row.z_value_rating=Character.toString(eachratingrow.charAt(0));   
   context.EmailText=context.EmailText+formatEmails.formatEmail(input_row.value_rating.charAt(0), eachratingrow.charAt(0), "Value"); 
   }     
   
   if (i==3)
   {   
    output_row.z_vgm_rating=Character.toString(eachratingrow.charAt(0));  
    context.EmailText=context.EmailText+formatEmails.formatEmail(input_row.vgm_rating.charAt(0), eachratingrow.charAt(0), "VGM"); 
    }  
    i++;
      
      }
  } 
output_row.EmailText=context.EmailText;
  /* - End of Code from https://www.youtube.com/watch?v=a7-HUU4js9E --*/

Saturday, January 07, 2017

Talend and Cassandra - Storing data and retrieving

This video explores saving data in Cassandra NoSql database. This video is a continuation of previous one. Apart from saving the data, it also discusses about quick tips on strategies for creating, storing and retrieving data in Cassandra.

Code used in the video

Select * from ss.stockratings where stock='TSLA' order by ratingtime desc limit 1;


Insert into ss.ratings(stock ,ratingtime,id,growth_rating,momentum_rating, value_rating,vgm_rating) values ('TSLA',dateof(now()),now(),'A','D','A','F');

CREATE TABLE ss.stockratings (
 id uuid,
 stock text,
 ratingtime timestamp,
 growth_rating text,
 momentum_rating text,
 value_rating text,
 vgm_rating text,
 PRIMARY KEY ( stock, ratingtime,id)
);