Data governance is a sine qua non for protecting your data in the cloud. It is particularly important in the cloud service delivery model, which is philosophically different from the traditional IT product delivery model.
In a product delivery model, it is difficult for a corporate IT group to quantify asset value and the data security value at risk over time, because of changes in staff, business conditions, IT infrastructure, network connectivity and software applications.
In a service delivery model, payment is made for services consumed on a variable basis, as a function of the volume of transactions, storage or compute cycles. Data security and compliance requirements can be negotiated into the cloud service provider's service level agreement (SLA). Because security is built into the service, quantifying the costs of security countermeasures becomes relatively straightforward, and practical threat analysis models become more accessible than ever.
However, this leaves open the critical questions of data asset value and data governance. We believe that data governance is a primary requirement for moving your data to the cloud, and a central data security countermeasure in a cloud customer's security and compliance portfolio.
With a growing number of low-priced, high-performance SaaS, PaaS and IaaS cloud service offerings, it is vital that organizations start formalizing their approach to data governance. Data governance means defining data ownership, data access controls, data traceability and regulatory compliance, for example the handling of PHI (protected health information, as defined for HIPAA compliance).
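To make those four elements concrete, here is a minimal Python sketch of a per-asset governance record. The class name `DataAssetPolicy`, its fields and the sample values are hypothetical illustrations, not a standard schema: `owner` captures ownership, `allowed_roles` the access controls, `access_log` the traceability, and `regulations` the compliance scope.

```python
from dataclasses import dataclass, field


@dataclass
class DataAssetPolicy:
    """Hypothetical governance record for a single data asset."""
    name: str
    owner: str                # data ownership: who is accountable for the asset
    allowed_roles: set[str]   # access control: roles permitted to read the asset
    regulations: set[str] = field(default_factory=set)  # compliance scope, e.g. HIPAA
    # Traceability: (user, role, decision) for every access request.
    access_log: list[tuple[str, str, str]] = field(default_factory=list)

    def request_access(self, user: str, role: str) -> bool:
        """Grant access only to allowed roles, and record every decision."""
        granted = role in self.allowed_roles
        self.access_log.append((user, role, "granted" if granted else "denied"))
        return granted


phi = DataAssetPolicy(
    name="patient_records",
    owner="clinical-operations",
    allowed_roles={"physician", "nurse"},
    regulations={"HIPAA"},
)
print(phi.request_access("dr_lee", "physician"))      # True
print(phi.request_access("vendor_42", "contractor"))  # False
print(phi.access_log)
```

Logging denied requests as well as granted ones is a deliberate choice here: for traceability, a refused attempt by a contractor can be as informative as a successful read.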
To build an effective data governance strategy for the cloud, start by asking and answering 10 questions – striking the right balance between common sense and data security requirements:
- What is your most valuable data?
- How is that data currently stored – file servers, database servers, document management systems?
- How should that data be maintained and secured?
- Who should have access to that data?
- Who really has access to that data?
- When was the last time you examined your data security/encryption policies?
- What do your programmers know about data security in the cloud?
- Who can modify your data (including business partners and contractors)?
- If your data were leaked to unauthorized parties, how much would the damage cost the business?
- If you had a data breach, how long would it take you to detect the data loss?
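The questions "Who should have access to that data?" and "Who really has access to that data?" lend themselves to a simple, periodic reconciliation. The sketch below assumes you can export both answers as sets of user IDs, for example from an approved-access register and from the storage system's actual grants; the function name `access_review` and the sample IDs are hypothetical.

```python
def access_review(intended: set[str], actual: set[str]) -> dict[str, list[str]]:
    """Reconcile intended access with actual grants for one data asset."""
    return {
        # Accounts holding access that nobody approved: revoke or justify.
        "excess": sorted(actual - intended),
        # Approved users who lack access: likely a provisioning gap.
        "missing": sorted(intended - actual),
    }


# Hypothetical exports: the approved list versus what the system really grants.
should_have = {"alice", "bob", "dr_lee"}
really_has = {"alice", "bob", "old_contractor", "partner_svc"}

report = access_review(should_have, really_has)
print(report)  # {'excess': ['old_contractor', 'partner_svc'], 'missing': ['dr_lee']}
```

The "excess" list is usually the one that matters for security: forgotten contractor and partner accounts are exactly the insiders the questions above ask about.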
A frequent question from clients regarding data governance strategy in the cloud is “what kind of data should be retained in local IT infrastructure?”
The stock response is that sensitive data should obviously remain in local storage. Instead, consider the cost/benefit of storing that data with an infrastructure cloud service provider, so that those sensitive data assets are not disclosed to trusted insiders, contractors and business partners.
Using a cloud service provider to store sensitive data may actually reduce the threat surface rather than increase it, and can give you more control by centralizing and standardizing data storage as part of your overall data governance strategy.
You can specify robust data security controls in an RFP and negotiate them into a commercial contract with a cloud service provider, something you cannot easily do with employees.
A second frequently asked question regarding data governance in the cloud is “How can we protect our unstructured data from a data breach?”
The answer is that it depends on your business and your application software.
Although analysts such as Gartner have asserted that over 80% of enterprise data is stored in unstructured files such as Microsoft Office documents, this clearly depends heavily on the kind of business you're in. Arguably, none of the big data breaches happened because people stole Excel files.
If anything, the database threat surface is growing rapidly. Telecom and cellular service providers hold far more data (CDRs, customer service records, etc.) in structured databases than in Office files, and with more smartphones, Android tablets and Chrome OS devices, this will grow even more. As hospitals move to EMR (electronic medical records), the same will soon be true of the entire health care system, where almost all sensitive data is stored in structured databases such as Oracle, Microsoft SQL Server, MySQL or PostgreSQL.
Then there is the rapidly growing use of the MapReduce/JSON database technology used by Facebook and Digg: CouchDB (with 10 million installations) and MongoDB, which connect directly to Web applications. These NoSQL databases may be vulnerable to some of the traditional injection attacks that involve string concatenation. Developers are well advised to use native APIs for building safe queries, and to patch frequently: the technology is developing rapidly, and with so many eyeballs on it, vulnerabilities are being discovered and patched quickly. Note the proactive approach the Apache Software Foundation is taking towards CouchDB security, including a recent (Feb 1, 2011) release that fixed a CouchDB cross-site scripting vulnerability.
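The risk of string concatenation is easy to see in a small Python sketch. It does not call a real database driver: it uses a toy in-memory collection and a hypothetical matcher that imitates only plain equality and MongoDB's `$ne` ("not equal") operator. The unsafe builder splices user input into a JSON query string, so crafted input can inject an operator; the safe builder constructs the query as a native data structure and rejects non-string input, which is the spirit of using native APIs to build queries.

```python
import json

# Toy "collection" (hypothetical data). Real systems store salted password
# hashes, never plaintext passwords.
USERS = [{"user": "alice", "password": "s3cret"}]


def matches(doc: dict, query: dict) -> bool:
    """Toy matcher: plain values mean equality; {"$ne": v} means 'not equal to v'."""
    for key, cond in query.items():
        value = doc.get(key)
        if isinstance(cond, dict) and "$ne" in cond:
            if value == cond["$ne"]:
                return False
        elif value != cond:
            return False
    return True


def unsafe_query(user: str, password: str) -> dict:
    # UNSAFE: user input is concatenated into a JSON query string.
    return json.loads('{"user": "' + user + '", "password": "' + password + '"}')


def safe_query(user: str, password: str) -> dict:
    # SAFE: the query is built as a native dict, so input stays a literal value.
    if not isinstance(user, str) or not isinstance(password, str):
        raise TypeError("user and password must be strings")
    return {"user": user, "password": password}


def login(user: str, password: str, build) -> bool:
    query = build(user, password)
    return any(matches(doc, query) for doc in USERS)


# Crafted input: closes the string, then redefines "password" as an operator.
# Python's json module keeps the last value when a key is duplicated.
ATTACK = 'x", "password": {"$ne": "x"}, "user": "alice'

print(login("alice", ATTACK, unsafe_query))  # True: authentication bypassed
print(login("alice", ATTACK, safe_query))    # False: treated as a literal password
```

Escaping the input would also close this particular hole, but building queries as native structures removes the need to get escaping right in every code path.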
So, consider these issues when building your data governance strategy for the cloud, and start by asking and answering the 10 key questions for cloud data security.