As I mentioned in an earlier blog post, among the sessions on big data at RSA Conference China was Samir Saklikar’s presentation on Embedding Security and Trust Primitives in Map Reduce. Samir is in the RSA Office of the CTO and has been focused on big data security for more than a year, exploring the security and privacy issues for big data, the application of current security technology to those requirements, and the definition of new capabilities that would significantly help in addressing them.
Samir’s presentation was a great follow-on to the session that Branden Williams and I did. It framed the security requirements for big data, then drilled down into a new opportunity to significantly improve the security and privacy of data analytics. Samir’s proposal is to introduce security introspection capabilities into Hadoop Map/Reduce, so that critical security properties, such as legitimate access to information, can be evaluated during the analysis process.
In particular, Samir proposes an extensible introspection framework that provides the ability to interject hooks and callbacks at various points within the Map/Reduce processing flow. These callbacks would support both blocking and non-blocking introspection of security properties. The introspection points would reflect the typical flow for Map/Reduce, in which a master program creates and coordinates various worker threads, some of which are assigned to Map tasks and the others to Reduce tasks. The Map tasks each work on an individual chunk of the input data, processing it to produce intermediate results that are written to local disks. The Reduce tasks work on the collected and sorted intermediate results to produce the final outputs.
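To make the idea of blocking and non-blocking callbacks concrete, here is a minimal sketch in Python. It is not Samir’s design and not Hadoop’s API: the names HookRegistry, IntrospectionDenied, word_count, and the hook point strings are all hypothetical, and the "job" is a toy single-process word count rather than a distributed one. The assumption is that a blocking callback can veto a step by returning False, while a non-blocking callback only observes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class IntrospectionDenied(Exception):
    """Raised when a blocking callback vetoes the current step."""


class HookRegistry:
    """Hypothetical registry of callbacks keyed by introspection point name."""

    def __init__(self):
        self._blocking = defaultdict(list)
        self._non_blocking = defaultdict(list)
        self._pool = ThreadPoolExecutor(max_workers=2)

    def register(self, point, callback, blocking=True):
        target = self._blocking if blocking else self._non_blocking
        target[point].append(callback)

    def fire(self, point, context):
        # Non-blocking callbacks run in the background and cannot stop the job.
        for callback in self._non_blocking[point]:
            self._pool.submit(callback, point, dict(context))
        # Blocking callbacks run inline; returning False vetoes the step.
        for callback in self._blocking[point]:
            if callback(point, context) is False:
                raise IntrospectionDenied(f"denied at {point}")

    def close(self):
        # Wait for outstanding non-blocking callbacks to finish.
        self._pool.shutdown(wait=True)


def word_count(chunks, hooks, user):
    """Toy single-process Map/Reduce that fires hooks along the way."""
    hooks.fire("job_start", {"user": user})
    intermediate = defaultdict(list)
    for chunk_id, chunk in enumerate(chunks):
        hooks.fire("map_input", {"user": user, "chunk_id": chunk_id})
        for word in chunk.split():  # Map: emit (word, 1)
            intermediate[word].append(1)
    hooks.fire("reduce_input", {"user": user, "keys": len(intermediate)})
    # Reduce: sum the counts collected for each word.
    totals = {word: sum(ones) for word, ones in intermediate.items()}
    hooks.fire("job_end", {"user": user})
    return totals
```

Registering an access check as a blocking callback on "map_input" would stop the job before any unauthorized chunk is processed, while an audit logger registered with blocking=False would record the same event without slowing the Map task down.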
The figure below from Samir’s presentation shows the introspection points within the Map/Reduce processing flow that Samir proposes: program initiation and end; thread creation and deletion; processing input; writing intermediate data; reading intermediate data; and writing final results to output files.
(diagram adapted from Google’s original paper on Map Reduce)
This granular approach provides a great deal of flexibility in three respects: where to interject introspection capabilities, what security properties to check or security operations to perform, and what actions to take as a result of the security introspection.
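One way to picture those three degrees of freedom is as a small policy table, sketched below in Python. The introspection points are taken from the list above; everything else (the Point, Action, and Rule names, the evaluate function, and the two example rules with their "readers" and "fields" context keys) is a hypothetical illustration, not part of the proposal.

```python
from enum import Enum, IntEnum
from typing import Callable, NamedTuple


class Point(Enum):
    """The introspection points listed above."""
    PROGRAM_START = "program initiation"
    PROGRAM_END = "program end"
    THREAD_CREATE = "thread creation"
    THREAD_DELETE = "thread deletion"
    PROCESS_INPUT = "processing input"
    WRITE_INTERMEDIATE = "writing intermediate data"
    READ_INTERMEDIATE = "reading intermediate data"
    WRITE_OUTPUT = "writing final results"


class Action(IntEnum):
    """Hypothetical responses, ordered from least to most strict."""
    ALLOW = 0
    AUDIT = 1
    ABORT = 2


class Rule(NamedTuple):
    point: Point                    # where to introspect
    check: Callable[[dict], bool]   # what property to verify
    on_failure: Action              # what to do if the check fails


def evaluate(rules, point, context):
    """Return the strictest action among rules that fail at this point."""
    outcome = Action.ALLOW
    for rule in rules:
        if rule.point is point and not rule.check(context):
            outcome = max(outcome, rule.on_failure)
    return outcome


# Assumed example policy: the context keys are invented for illustration.
EXAMPLE_RULES = [
    Rule(Point.PROCESS_INPUT,
         lambda ctx: ctx.get("user") in ctx.get("readers", ()),
         Action.ABORT),
    Rule(Point.WRITE_OUTPUT,
         lambda ctx: "ssn" not in ctx.get("fields", ()),
         Action.AUDIT),
]
```

With this policy, a user outside the "readers" list would cause an ABORT when input is processed, whereas an output containing an "ssn" field would only be flagged for AUDIT. Changing where, what, or how strictly to check means editing a rule rather than the analytics code.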
Samir’s presentation has much more detail about the proposed introspection model, as well as lots of great information on other aspects of achieving effective security and privacy for big data. Samir and Dennis Moreau will be exploring some of these questions further in their session on big data security at RSA Conference Europe in mid-October. But this particular idea of security introspection in Map/Reduce, a simple and elegant approach to addressing security and privacy during the analytics process, is one that deserves to be known and discussed widely. It points the way toward a major improvement in Hadoop and an important step forward in securing big data.