Thursday, July 8, 2010

Management Thoughts

I get asked about my management and leadership style often. It’s an important question, one that often gets short shrift in Silicon Valley, where engineering execs are hired for their apparent technical competence (and, when fired, it’s always for other reasons: they exercise poor leadership, miss deadlines, or simply don’t get the job done).

I’ve had a long-standing interest in questions of motivation, morale, and teamwork: How can a manager inspire a team to a higher level of performance? What motivates people to "be extraordinary" when that is what's required to meet a significant challenge? Many books have been written on the subject, and below I discuss some influences on my approach to management and leadership.

The Classics
My influences start with Peter F. Drucker, whose timeless principles still apply in today’s environment. Drucker especially emphasized the value of a company’s human capital and the need to maximize those assets through effective management. He was writing about such core concepts as metrics and management by objectives over fifty years ago. His foundational principles include fairness and deep respect for everyone in an organization.

Drucker never experienced the tsunami of information today’s managers must cope with. Stephen R. Covey had a lot to say about instilling a discipline – one’s habits – into the practice of management, and how important it is to understand the context within which we work (especially to begin with the end in mind!). But what I like most about Covey is that he brings ethics and principle-centered leadership into the equation. After all, we spend most of our waking hours working; and if we can’t feel good about our work beyond bringing home a paycheck, then we lose balance in our lives. (There’s a lot of that going around these days.)

There’s a category of books about leadership and high performance drawn from sports, music and other domains that recognize superior achievement, and I think it’s relevant to business. It’s somewhat ironic that ideas for instilling unselfishness, making sacrifices to achieve a higher goal, and reaching a mental state in which the highest performance is possible (often referred to as “getting into the zone”) seem to get more attention outside the business world. But there may be parallels in how, for example, Bill Walsh instilled his “Standard of Performance” in a team widely regarded as one of the worst in sports, so that they became perennial champions. Maybe business leaders would benefit from learning how Michael Tilson Thomas inspires the players of the San Francisco Symphony to perform better than they ever had under his predecessors, even though it’s mostly the same musicians as before. And when Bill Russell describes how he got into a zone and outperformed his competitors to the tune of 11 NBA championships, aren’t there lessons to be learned from that?

Jim Collins and Level 5 Leadership
Jim Collins is another significant influence. I got a lot out of BHAG (big hairy audacious goals) and "try a lot of stuff and keep what works" in Built to Last; but what inspires me the most is his research on characteristics of an effective leader, summarized in this article in Harvard Business Review, Level 5 Leadership: The Triumph of Humility and Fierce Resolve. Collins identifies a paradoxical combination of personal attributes of the most successful executives: a deep personal humility (giving credit to others for successes, having a calm demeanor); and an intense personal will (unwavering resolve, utterly intolerant of mediocrity). You can think of this balance as the yin and yang of level 5 leadership:




My takeaway is: Get my ego out of the equation, and always set the bar high. The worst thing a leader can do to a team is expect too little. In my opinion, the right formula is this: Start by building a world-class team, communicate the vision, remove roadblocks, maintain high professional standards – and expect great things from the team.

Servant Leadership
Over 40 years ago Robert Greenleaf, founder of the Greenleaf Center for Servant Leadership, published essays introducing what he called servant leadership, a practical philosophy that replaces traditional autocratic leadership with a holistic, ethical approach. Here’s how Greenleaf defined servant leadership:

"The servant-leader is servant first… It begins with the natural feeling that one wants to serve, to serve first. Then conscious choice brings one to aspire to lead. That person is sharply different from one who is leader first, perhaps because of the need to assuage an unusual power drive or to acquire material possessions…The leader-first and the servant-first are two extreme types. Between them there are shadings and blends that are part of the infinite variety of human nature."

Servant leadership applies at the team or departmental level, too. Treating other departments as customers creates the right mind-set for macro-level servant leadership. When this is reciprocated by other departments, great things can happen.

Robert Greenleaf died in 1990, but servant leadership has experienced something of a revival of late, in part due to its association with agile software development methodologies as I’ll explain in greater detail below.

Theory U
C. Otto Scharmer, author of Theory U: Leading From the Future as It Emerges, is a senior lecturer at MIT and a consultant who has helped develop leadership programs for companies such as Google, HP, Daimler, PricewaterhouseCoopers and Fujitsu (as well as non-profits including the World Wildlife Fund and the African Public Health Leadership Initiative). In his book Scharmer synthesizes and distills ideas from thought leaders in the management sciences, psychologists, sociologists, captains of industry, and other thoughtful sources on the nature of thinking, social dynamics, motivation, and communication. Scharmer’s work was influenced by, among others, Rudolf Steiner (whom I have read extensively). Because of the depth and breadth of his work, Scharmer is not a particularly easy read, but one is well rewarded for the effort invested.

Scharmer’s main thesis is that we have a “blind spot” in our understanding of leadership and transformational change. Scharmer calls it the “invisible dimension” of leadership, even though it is our source dimension, as shown in the figure below:


Scharmer describes a conversation he had with the CEO of a major financial services company. After years of organizational learning projects and facilitated corporate change, the CEO had concluded that the probability of success of a major project depends on the inner condition of the people involved: the inner place from which they operate, the source from which all of their actions originate. How can that be improved?

We know a great deal about what leaders do and how they do it. But we know very little about the inner place, the source from which they operate. In professional sports this is an area that gets a great deal of attention. But not so much in the business world.

Think of this example: If we try to understand how Leonardo da Vinci created a masterpiece such as the Mona Lisa, we can study it in comparison to all other paintings to see if we can apprehend its most salient qualities (the results, or the “what” of his work). Alternatively, we might hope to observe him at work while he’s painting (the process, or the “how”). But what about when he is staring at a blank canvas, just as he’s beginning to paint? What is the source of his art? There’s no question it’s an inner quality, a source dimension. What can we learn about that?

At its core, leadership is about shaping how individuals working as a group attend to a challenge or problem. This is key: Albert Einstein advises us that “no problem can be solved from the same level of consciousness that created it.” For groups to reach their highest potential, then, leaders must create an environment in which teams can collectively attain that higher level of consciousness and direct it towards the challenge at hand. Theory U is about helping teams do exactly that, enabling them to co-inspire each other to higher levels of performance.

What is Theory U? It’s a process that ultimately enables a team to co-evolve – grow as a group – to achieve the highest collective performance possible. It begins with how we communicate. Typical business conversations are conducted in “downloading” mode: habits of thought are re-confirmed; people talk nice, but generally stick to a predefined script (I knew you were going to say that). The level of communication needs to progress from downloading to factual, object-focused talking, and then to the next level: empathic listening based on inquiry and deep attention. Ultimately that enables the progression through the U-Process, as depicted below:



When one considers the subtitle of the book – leading from the future as it emerges – it’s easy to imagine it as a new-age inspired manifesto rather than the thoughtful, practical source of leadership wisdom that it is. However, the concepts of co-inspiring, co-creating and co-evolving – when applied in an environment of a smart, well-motivated team – can represent the source of a true business advantage.



My Management Style
In describing my management style I am specifically discussing how I apply many of the principles above to the task of being an engineering executive.

Outside of the technical domain, my primary areas of focus are people and process.

People
The first question of leadership is not what, or how, but who. Jim Collins says about his research into great leaders:

We expected that great leaders would start with vision and strategy. Instead, they attended to people first. They got the right people on the bus, moved the wrong people off, ushered the right people to the right seats – and then figured out where to drive it.

My mandate is to put the best team on the field that I can. Building a world-class team is necessary (but not sufficient) for building a world-class company. In fairness to investors, executives and the rest of the company, we set a high standard and ensure that everyone is performing up to that standard. If they can’t – or won’t – then it’s not a good fit for them and we will replace them with someone who can and will meet our Standard of Performance.

People are my first priority, always. Human capital is the company’s most valuable asset and should be treated accordingly – by demanding a high level of contribution, of course, but also by treating people respectfully, fairly and honestly. People won’t be motivated unless they are honored as professionals and as human beings worthy of our highest respect. That helps foster the type of company culture that I want to work in, and one that other high performers gravitate towards as well. (One way to foster this company-wide is to encourage your team to treat other departments as customers.)

But it’s called “work” for a reason, and my attitude towards the people on my team is to ensure we’re performing at as high a level as is reasonably possible. To this end, I focus on the “four C’s”: Communication; Commitment; Competition; and Customers.

Communication: It’s easy to say “communicate more” or “there’s never too much communication,” but the truth is more complicated. If one person “communicates” an essential fact to another, but it’s buried on page 27 of a 40-page document and it hasn’t been highlighted or singled out, then effective communication probably hasn’t taken place. On the other hand, if someone is communicating a point of view and the listener is in “downloading” mode, then effective communication probably hasn’t taken place either. The only measure of communication is its effectiveness. Communicating well is not just a matter of style; it requires thought and effort. We always try to establish processes that support required communications, but in dynamic environments more is required than can be anticipated and defined by standard procedures.

Good communication is hard. Since we mostly work on teams and each team member has access to different information sources, one needs to consider context and other perspectives to determine what information should be communicated. The mode of communication is important as well: much if not most technical information should be communicated in writing, and email is the typical mode (although wikis, blogs and forum discussions may be more effective at times). Sometimes a phone call or face-to-face interactive discussion is the better option, especially if there’s unexpected information, a sensitive topic to discuss, or immediate feedback is required.

Many processes are established to further effective communications, including: status meetings, daily Scrum meetings, distribution lists for key events and updates, product backlog updates and transparent access to development status artifacts, use of and notification rules for tracking databases, performance reviews, and so forth.

A quick word about performance reviews: I’m not a big fan, but they’re mostly necessary – especially when they are tied to annual salary increases.  The main problem I have with reviews is the notion of providing feedback on an annual basis; that’s not enough of that kind of communication, and it comes too late. Instead of feedback I prefer to focus on feedforward: It’s more valuable to an employee to be told in advance what is required -- and in near real time how well the objectives are being met -- than to wait until year-end to find out what should have been done.

Commitment: In general, we make two types of commitments to each other: informal, routine commitments (such as “I’ll send you that report tomorrow”); and more critical, “Capital C” Commitments (such as “this will be released on July 21”). The distinction is important: As we work together, we rely on each other for various tasks, reports, deliverables, pieces of information or other workplace artifacts. Lowercase c commitments are subject to the usual unpredictable events and stresses of the work environment: Of course we intend to meet these commitments, and in most cases we do. But it’s not the end of the world if an unanticipated complication or higher priority interrupt inhibits our ability to meet that commitment (although we should always communicate a change in status if there’s the expectation of a deliverable).

“Capital C” Commitments are different. When such Commitments are made, it must also be made clear (to both parties) that this is critically important. There is a legitimate and urgent business need for such Commitments. When unanticipated complications arise or we get interrupted, we are expected to overcome the impact of such inevitable instances of Murphy’s Law. Failure is not an option. We make personal sacrifices, we call upon additional resources if possible (including our own reserves), we work nights and weekends, we do whatever it takes to meet such Commitments. We are measured by our ability to meet and honor such Commitments.

There’s another category of “Capital C” Commitment, and that is honoring our understandings. In this case, the Commitment is not related to an event or tangible deliverable, but to how we work together, or how we treat each other. For example, I may Commit to directors or managers who report to me that I will not “go around” them in communicating with their staff except under urgent circumstances when they are not available, and that if I do I will always give them a heads up. Many such examples could be given; those Commitments need to be taken just as seriously.

Competition: It’s a tough world, and we have competition: competition for customers; competition for limited venture capital; competition for the best and brightest people; and competition internally for budget dollars, support resources, and so forth. We don’t often deal with competition directly, but it’s a motivating force for much of what we do. Competition is behind why we have a limited budget, why we have a tough deadline, why we add a customer-requested feature late in the release cycle, and why we work so hard.

Competition presents a unique challenge to the development organization: achieving defensible product differentiation. What gets much of our attention in building products and services is the set of prioritized requirements – referred to as the product backlog in many agile methodologies – that represents the combined wisdom of customers, sales & marketing, industry analysts and thought leaders, and relevant standards. But great products typically boast innovative concepts, breakthrough technology, or some form of engineering-conceived special sauce. Engineering needs to fully understand the competitive landscape from a product technology point of view, and should consistently challenge itself to innovate in ways that deliver a measurable and relevant benefit to the customer. That’s a tough standard, but it’s what we like about competition – it brings out the best in us.

Customers: Peter Drucker stated that the purpose of business is to create customers. I had to think about that for a while, but it’s a perfect way of expressing a company’s fundamental objective. And although we expect to do so with great products, positioning, sales, support, operations and financing, there’s one other thing that’s almost always required: good customer references. But why aim so low? The objective, in my opinion, should always be to have great customer references – outstanding references, rave references – from customers whose trust and respect you’ve earned to the degree that they will go out of their way to tell others about it, emphatically and with conviction. The kind of reference that, if a prospect hears it, will almost certainly turn that prospect into a customer. But such a reference can’t be asked for; it must be earned.

Ironically, the customers from whom I've received the highest praise in the past were those where something went wrong, sometimes really, embarrassingly wrong. But that’s not what they seemed to remember: What they wanted to talk about – for years, in some cases – was the amount and level of support they got when things went wrong. What we heard from these customers was that they expect things to go wrong from time to time, sometimes seriously so. What they don’t expect is for the vendor to pull out all the stops in order to fix whatever’s wrong – in some cases, whether it was our product at fault or not.

Communication; Commitment; Competition; and Customers. If it sounds like a formula, it’s not. People, especially technical professionals, are complex, multi-dimensional and sometimes capable of extraordinary feats. Software development is a group endeavor and therefore a social process. Smart people thrive in an environment where they are surrounded by other smart people with a common goal, where their contributions are honored, where they’re treated with respect, and where they can have fun while they’re accomplishing great things.


Process
Processes should be defined that expedite team efforts rather than getting in the way. There are many aspects to this, but here I focus on agile software development methodologies.

Agile is the term used to refer to a class of development methodologies that are incremental and iterative. Scrum is by far the most popular agile method; others include XP (eXtreme Programming); Unified Process (UP, Agile UP, or Rational UP); Evo; Feature-Driven Development (FDD); Test-Driven Development (TDD); Dynamic Systems Development Method (DSDM); Crystal; and others. There is no “best” approach other than what works for a given team; in many cases, a hybrid is used (for example, Scrum + XP is widely used).

I have used Scrum in multiple settings, and have been impressed by its positive impact – especially where requirements are uncertain. For larger-scale developments I would be tempted to blend in aspects of Agile UP, since it incorporates more SDLC artifacts into the process. In either case I think it’s important to incorporate test-driven development (TDD) concepts into the agile process.

Why do organizations adopt agile development methods? Generally, it’s for improved visibility into project status for stakeholders, better adaptability in a dynamic environment, greater business value delivered sooner in the development process, and overall risk reduction, according to VersionOne, as shown in the chart below:


While these are great reasons, I would say the biggest benefit is that engineers love it compared to traditional approaches, primarily because they have much greater control over what they do and how they do it. And that’s a Good Thing: There’s no question in my mind that the best outcomes result from a team of well-motivated engineers who feel in control of their environment. But another key reason engineers love agile is that it enables them to be heroes in customers’ eyes, since customers prize responsiveness more than anything, and agile allows development teams to be much more responsive to customer needs.

With Scrum, developers are fundamentally empowered to do their jobs in a team that collaborates and makes its own technical decisions. According to Ken Schwaber, one of Scrum’s founders and author of Agile Software Development With Scrum, Scrum teams are “self-managing, self-organizing, and cross-functional” and therefore control their own destiny.

But isn’t that like anarchy? Not at all. The Team (capital T for a Scrum Team) is self-directed, but works from a prioritized Product Backlog that represents the interests of all stakeholders, including executive management. Scrum calls for three roles that work collaboratively to ensure alignment with business objectives and transparency to all stakeholders: Product Owner; ScrumMaster; and Team (typically 6-9 people, but this can vary). The Product Owner represents customers’ requirements as well as the requirements of other business stakeholders, and determines the highest-priority features for each sprint (the Sprint Backlog). On any given sprint it’s the Product Owner’s responsibility to ensure that the team is working on the highest-priority items at that time. This is sometimes referred to as “just in time” requirements, and because requirements are dealt with on a just-in-time basis, feature creep and requirements uncertainty are minimized.

The agile project manager for Scrum is referred to as the ScrumMaster. He or she is responsible for ensuring that the Scrum process is followed, and for removing roadblocks as identified by Scrum Team members. Because the nature of agile project management is one of facilitation rather than top-down control, the ideal characteristics of an agile project management role such as ScrumMaster are those of a servant leader. Servant leadership, described as an executive model for large companies in the latter decades of the previous century, has now emerged as the preferred management style for agile software development.

The Scrum process involves one or more time-boxed sprints to a release (typical sprints are 4 weeks or 30 days). Ideally, each sprint should result in potentially releasable code – functionally complete (for the backlog items included in the sprint), refactored if necessary, tested, and documented. Within each sprint, there are daily scrum meetings (sometimes referred to as “stand-ups” – meetings so short and focused that they can be done standing up). At the end of a sprint, a review meeting takes place which usually includes a demo to the Product Owner and any interested stakeholders: One of the tenets of the Agile Manifesto is “working software is the primary measure of progress”.

The following diagram from TargetProcess shows the overall flow:




A variety of Scrum-specific project management tools exist, including from companies such as VersionOne, Atlassian, and Agilebuddy. These tools generally include dashboards for “at a glance” project status as well as burndown charts, defect tracking, velocity trends, analytics and a variety of reports.
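To make the burndown idea concrete, here is a minimal Python sketch of how a burndown value could be computed from a sprint backlog; the story items, point values, and field names are made up for illustration and are not tied to any particular tool.

    from dataclasses import dataclass

    @dataclass
    class BacklogItem:
        title: str
        points: int          # story points estimated by the Team
        done: bool = False   # marked done when it meets the definition of done

    # Hypothetical sprint backlog selected by the Product Owner for one sprint
    sprint_backlog = [
        BacklogItem("OAuth login", 8),
        BacklogItem("Password reset flow", 5),
        BacklogItem("Audit log export", 13),
    ]

    def remaining_points(backlog):
        """Open story points -- the value plotted each day on a burndown chart."""
        return sum(item.points for item in backlog if not item.done)

    # After a daily scrum, the Team updates item status and records the new total
    sprint_backlog[1].done = True
    print(remaining_points(sprint_backlog))   # 21 points remaining

Plotting that number against the sprint calendar gives the downward-sloping burndown line these tools display.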

Scrum – or for that matter, any agile methodology – can’t be plugged into an organizational environment as a cookbook formula. The amount of what Craig Larman calls "ceremony" in Agile and Iterative Development – what we typically think of as structure (development phases), deliverables (artifacts of the development process), and process (workflow and authorizations) – is unique to each development effort, based on dozens of factors. This requires wisdom and judgment, and, like the software being developed, should evolve incrementally and iteratively in every organization.


Summary
There is no single “best” management style; what works for any executive is a function of his or her values, character, and inner condition. In this post I have outlined some of the practices and leadership themes that have worked for me in over 20 years of executive experience.

Tuesday, July 6, 2010

Bartók

Up until a year or so ago, I didn't know much about Béla Bartók other than what I had learned from doing crossword puzzles. Apparently the letters B-E-L-A are in pretty big demand. For example: Elba (clue "ere I saw..."). Or: Able (clue "...was I"). I had seen crossword puzzle clues such as "Hungarian Bartók", or "20th century composer Bartók" so I knew that there was a Bela (actually, Béla) Bartók who was a Hungarian composer in the 1900s.


I noticed about a year ago that I enjoyed music with "Hungarian" in the title by Haydn, Schubert, Brahms and others. But I wasn't sure what it meant, and I got curious. If I were to listen to Schubert's "German Dances" (D.820) followed by his "Hungarian Melody" (D.817), what precisely would I hear in the latter that made it Hungarian?


Questions such as that will ultimately lead one to Béla Bartók. Born in 1881, Bartók gained early fame as a virtuoso concert pianist in the center of western music -- Vienna. Home to Haydn, Mozart, Beethoven, Schubert, Brahms and Strauss, Vienna was the top destination for serious musicians. Bartók was awarded a scholarship in Vienna by the Emperor, but he shocked the music establishment by choosing instead to attend the Academy of Music in Budapest.


Bartók's intense interest in authentic Hungarian folk music is what kept him in Hungary. Working closely with his life-long friend and collaborator Zoltán Kodály, Bartók sought to establish a truly Hungarian national style. To do so, they decided to collect, catalog, and analyze authentic Hungarian folk music. How did they know it was authentic? They went out into rural Hungary, into the small towns and villages, and asked people to perform for their recording device. (Bartók was a quiet, reserved man, of an urban bent, usually fastidiously dressed. That he went out into these remote towns and "let his hair down" in order to earn the trust of the villagers is testament to the strength of his interest.)


Bartók held this interest in folk music his entire life; it expanded from its initial focus on Hungarian peasant music to include Romanian, Slovak, Lithuanian, Polish, Russian, Turkish and even Arabic music. One of Kodály and Bartók's initial findings: The so-called Hungarian music of earlier composers, written in the verbunkos style, was actually Gypsy music. Authentic folk music was far older, and was often played on native instruments that Bartók had never seen before. He found a similarity to ancient Greek music in these folk songs, in part because they are largely based on the pentatonic scale.


This was no passing interest. Bartók collected over 13,000 Hungarian folk songs in his lifetime; including the other ethnic strains, he (and Kodály) amassed over 20,000 folk songs. In doing so, what had previously been a strong nationalistic interest turned into a passion for music of the people, music that might bring nations together rather than drive them apart. His later compositional style was referred to as "Synthesis of East and West".


Bartók and Kodály were among the most significant early figures in the field of ethnomusicology, the study of the social and cultural aspects of music and dance in local and global contexts. Russian composers, led by Rimsky-Korsakov, were trying to define and promote a true understanding of their own native music at about the same time. One can hear the deep echoes of native lands in their music.


Here is what Bartók had to say about how he incorporated folk and peasant music into his compositions:
The question is, what are the ways in which peasant music is taken over and becomes transmuted into modern music? We may, for instance, take over a peasant melody unchanged or only slightly varied, write an accompaniment to it and possibly some opening and concluding phrases. This kind of work would show a certain analogy with Bach’s treatment of chorales. Another method is the following: the composer does not make use of a real peasant melody but invents his own imitation of such melodies. There is no true difference between this method and the one described above. There is yet a third way... Neither peasant melodies nor imitations of peasant melodies can be found in his music, but it is pervaded by the atmosphere of peasant music. In this case we may say, he has completely absorbed the idiom of peasant music which has become his musical mother tongue. 
Bartók's post-romantic music doesn't appeal to everyone, especially to those not fortunate enough to have Hungarian blood flowing through their veins. But his unique blend of native folk music, rooted to the ancient lands, combined with the modern sound of the 20th century that was to bring unprecedented horror and dislocation, strikes a deep chord in our modern sensibilities.

Tuesday, June 22, 2010

Top Threats to Cloud Computing

The CSA (Cloud Security Alliance) recently issued a report called Top Threats to Cloud Computing in which they identify and discuss seven general threat areas:
  • Abuse and Nefarious Use of Cloud Computing
  • Insecure Application Programming Interfaces
  • Malicious Insiders
  • Shared Technology Vulnerabilities
  • Data Loss/Leakage
  • Account, Service and Traffic Hijacking
  • Unknown Risk Profile
No priority is implied in the ordering of the top threats; the advisory committee felt that further research and greater industry participation would be required to rank the threats. My view is that ranking is less important than applying a risk management discipline to the specific requirements of an organization considering cloud services.


As we consider the seven threats individually, we should keep in mind that the CSA considers this document as a first deliverable that will be updated regularly to reflect expert consensus on probable threats to cloud services:


Abuse and Nefarious Use of Cloud Computing
Because the Cloud Service Providers' (CSPs') business model is based on rapid scalability, they have emphasized ease of adoption. In most cases, anyone with a valid credit card can register for and begin using cloud services in a matter of minutes. In other words, an attacker can materialize inside your CSP's infrastructure at any time, including on the same physical hardware your cloud-based application is running on, and you need to be prepared. The best policy is one of calculated paranoia: Assume your virtual environment includes all of your competitors as well as hackers, botnets, malicious users, clueless resource hogs, and other "nefarious users." Although as a user of cloud services you need to employ a layered defense strategy to protect critical resources, you also need to rely on your CSP's onboarding and technical surveillance processes: How effective is the CSP's registration and validation process for screening new users, and how well does its monitoring of internal traffic work?

Insecure Application Programming Interfaces

The same investor and market pressures that motivate CSPs to streamline the onboarding process also apply to how they support the configuration and use of their services by large numbers of users. The more these services can be enabled in a frictionless manner, the more profitable the CSP will be. Therefore, it's worth focusing on the APIs provided by CSPs for managing, deploying and maintaining cloud services. As the report points out, the "security and availability of general cloud services is dependent on the security of these basic APIs." Furthermore,
"From authentication and access control to encryption and activity monitoring, these interfaces must be designed to protect against both accidental and malicious attempts to circumvent policy."
One key question to ask: Does the CSP require use of X.509 certificates to access APIs? Besides being used to support the TLS protocol and WS-Security extensions to SOAP, X.509 certificates are used for code signing -- critical for secure use of APIs.


It's essential that users understand the security model of the CSP's APIs, especially to ensure that strong authentication and access controls are implemented.
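As an illustration of the kind of strong authentication worth asking about, here is a minimal Python sketch of calling a CSP management API over TLS with server-certificate validation and an X.509 client certificate; the endpoint, file paths, and response fields are hypothetical, not any particular provider's API.

    import requests

    API_BASE = "https://api.example-csp.com/v1"   # hypothetical CSP endpoint

    session = requests.Session()
    # Validate the CSP's server certificate against a known CA bundle
    session.verify = "/etc/ssl/certs/csp-ca-bundle.pem"
    # Present an X.509 client certificate for API authentication
    session.cert = ("client-cert.pem", "client-key.pem")

    resp = session.get(f"{API_BASE}/instances", timeout=10)
    resp.raise_for_status()

    for instance in resp.json():
        print(instance["id"], instance["state"])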

Malicious Insiders

The threat from malicious CSP insiders is one that organizations have always faced, except that the threat was (and still is!) from someone they know rather than someone they don't know. An organization should compare its own policy regarding insiders with that of the CSP, ensuring that controls such as the following exist:

  • State of the art intrusion detection systems
  • Background check on new hires (where permitted by law)
  • Authorized staff must pass two-factor authentication 
  • Immediate deprovisioning of administrators who no longer have a business need
  • Extensive background check of staff with potential access to customer data
  • All admin access logged and audited, with suspicious actions raising a real-time alarm
Organizations should require transparency into CSP security and HR practices as well as all compliance reporting, and should reference controls such as those listed above in any legal agreement with the CSP.
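Here is a minimal sketch of how the last control above (logged, audited admin access with alerting on suspicious actions) might be checked against an audit log the CSP makes available; the log format, field names, and list of suspicious actions are assumptions for illustration only.

    import json

    # Hypothetical set of admin actions that should raise an alarm
    SUSPICIOUS_ACTIONS = {"export_customer_data", "disable_logging", "create_admin_user"}

    def scan_admin_log(path):
        """Read a JSON-lines admin audit log and return suspicious entries."""
        alerts = []
        with open(path) as log:
            for line in log:
                event = json.loads(line)
                if event.get("action") in SUSPICIOUS_ACTIONS:
                    alerts.append(event)
        return alerts

    for event in scan_admin_log("admin_audit.jsonl"):
        print(f"ALERT: {event['admin']} performed {event['action']} at {event['timestamp']}")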


Shared Technology Vulnerabilities

The foundation of the cloud service provider's business model is the sharing of computing resources: CPU; memory; persistent storage; caches; and so forth. This sharing results in a multi-tenant environment, where great trust is placed in all virtualization technologies -- especially the hypervisors that enable sharing of server hardware. Hypervisors must effectively isolate multiple guest operating systems while ensuring security and fairness. The CSA paper lists five remediation tactics for shared technology vulnerabilities, but the fact that they're generic recommendations (implement security best practices..., etc.) serves to reinforce the point that at the end of the day, we need to be able to rely on the assumption that the CSP employs a secure hypervisor.


One potentially useful resource is a recently-released vSphere Security Hardening Guide from VMware. Overall, the guide contains more than 100 guidelines in a standardized format, with formally defined sections, templates, and reference codes that are in alignment with formats used by NIST, CIS, and others. The guide itself is split into the following major sections:
  • Introduction
  • Virtual Machines
  • Host
  • vNetwork
  • vCenter
  • Console OS (for ESX)
While the document is mostly applicable to CSPs using VMware, many of the guidelines are generic and might apply to other hypervisors. In evaluating CSPs for shared technology vulnerabilities, it would be worthwhile to ask the CSP how they've incorporated applicable recommendations from the hardening guide into their environment.

Data Loss/Leakage

The concept of defense in depth, or a layered security strategy, comes into play when we consider the threat of data loss or leakage. All of the above threat vectors can result in data loss or leakage. Data encryption, then, becomes the last line of defense against the data loss threat.


While encryption is easy enough conceptually, in practice it's a challenge -- especially in a multi-tenant environment. The authors of Cloud Security and Privacy dedicated an entire chapter to Data Security and Storage (as I previously discussed here). In particular, the authors warn of CSPs that use a single key to encrypt all customer data, rather than a separate key for each account (see pg. 69). Best practices for key management are provided in NIST's 800-57 "Recommendation for Key Management"; your CSP should comply or have an equivalent guideline that they use.
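To illustrate the per-account key point, here is a minimal Python sketch (using the cryptography library's Fernet construction) of keeping a separate encryption key per tenant rather than one key for all customer data; the tenant names and in-memory key store are placeholders, and real key storage and rotation would follow guidance such as NIST SP 800-57.

    from cryptography.fernet import Fernet

    # Hypothetical per-tenant key store; in practice keys would live in an HSM or KMS
    tenant_keys = {
        "tenant-a": Fernet.generate_key(),
        "tenant-b": Fernet.generate_key(),
    }

    def encrypt_for_tenant(tenant_id, plaintext):
        return Fernet(tenant_keys[tenant_id]).encrypt(plaintext)

    def decrypt_for_tenant(tenant_id, ciphertext):
        return Fernet(tenant_keys[tenant_id]).decrypt(ciphertext)

    token = encrypt_for_tenant("tenant-a", b"cardholder record")
    # Compromise of tenant-b's key does not expose tenant-a's data
    print(decrypt_for_tenant("tenant-a", token))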


Of course you should know whether your CSP uses standard encryption algorithms, what the key length is, and whether the protocols employed ensure data integrity as well as data confidentiality. And since encrypted data at rest can't be operated on without being decrypted, you'll want to know whether memory, caches and temporary storage that have held unencrypted data are wiped afterward. The same set of questions (and answers) applies to the issue of data migration and to the processes by which failed or obsolete storage devices are decommissioned.


Many regulatory frameworks focus on protecting against data loss and leakage. If you need to comply with PCI DSS or any other set of financial controls you will need to ensure adequate threat protection that includes encryption of data at rest.

Account, Service and Traffic Hijacking

In the online payment space there's a segment called "card not present" (CNP). That's analogous to cloud computing, where service is provided to a "user not present". All of the threats in an enterprise environment -- including phishing, fraud, shared or stolen credentials and weak authentication methods -- become magnified in the cloud. The remediation suggestions are fairly obvious: prohibit sharing of credentials; leverage strong two-factor authentication where possible; employ proactive monitoring to detect unauthorized activity; and understand CSP security policies and SLAs. I would add to the CSA's recommendations that organizations should routinely check for excessive access rights to ensure there are no unused (and unmonitored) accounts that would be vulnerable to hijacking.

Unknown Risk Profile

CSPs, hypervisor vendors, other cloud technology providers, application developers, security experts, and customers are all pushing the envelope when it comes to cloud services. The compelling economics of cloud services are driving adoption rates higher than is typical for new technologies. Altogether, this adds an element of technical uncertainty to the question of what the top threats to cloud computing are.


In general, a strategy of pragmatic paranoia is recommended. Be on the alert for the unexpected. Review logs, set up monitoring and alerting systems where practical, and re-evaluate the security implications of your cloud service periodically. Most importantly, select a CSP you can trust and back it up with a strong agreement specifying all areas of concern and including SLAs -- with penalties for non-compliance. 

Monday, June 14, 2010

The Growth of Web Services

eBay published their first web API in 2000. It took another 8 years to get to 1,000 APIs on the web; it only took 18 months to get to the next 1,000 APIs.

ProgrammableWeb was founded in 2005, when they tallied 105 APIs. The current count is 2,016 and the rate of new APIs is doubling year over year.


What segments account for these APIs? Social networking sites are high on the list, followed by mapping, financial, reference and shopping. The single most popular API is Google Maps, used in 1,978 mashups.


Even more dramatic are the stats for how often APIs are called. Here's the Internet's new billionaire club:








74% of the APIs are REST and 15% are SOAP; the remainder includes JavaScript, XML-RPC and AtomPub. Over the past two years the use of REST APIs has increased as an overall percentage of net APIs, mostly at the expense of SOAP. Another trend is the increasing use of JSON; 45% of all new APIs support JSON. And on the authentication front, OAuth continues to pick up steam as over 80 APIs now have OAuth support.
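The pattern behind those numbers is easy to sketch in Python: a REST call over HTTPS, a JSON response, and an OAuth token carried with the request. The endpoint, token, and response fields below are placeholders, and the OAuth flow that grants the token is not shown.

    import requests

    ACCESS_TOKEN = "example-oauth-token"          # obtained via an OAuth flow, not shown
    url = "https://api.example.com/v1/photos"     # hypothetical REST resource

    resp = requests.get(
        url,
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}",
                 "Accept": "application/json"},
        params={"user": "alice", "per_page": 10},
        timeout=10,
    )
    resp.raise_for_status()

    for photo in resp.json().get("items", []):
        print(photo["id"], photo["title"])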


The web is evolving from providing access to information, to providing access to services, to providing access to complex services -- also known as mashups. The popular and somewhat trivial example is the number of sites that call Google Maps API to show a map to their location. Links to Flickr, YouTube and Twitter are also popular. But what is the real business potential of these complex services?


APIs enable further leverage in systems development; that's why we can think of the web as a platform. Object-oriented software development is giving way to service-oriented architecture (SOA), which allows interfaces to be specified and their web services to be made available to any system with web access. This allows development organizations to focus on their core competencies and leverage web services for the rest.


An example of how this is playing out is in monetizing the web. A new generation of web-based services has emerged, and many of these services are based on subscription revenue models rather than single transactions (aka shopping carts). Subscription billing is hard: while it's tempting for the many new developments in digital publishing, gaming, telecommunications, health care, consumer electronics and renewable energy to include a do-it-yourself billing system, there's no need to. Companies such as Zuora, Vindicia and Aria Systems provide sophisticated billing systems through APIs, offering advanced functionality such as currency conversion, tax calculation, invoicing, fraud control, collections, reporting and analytics for a fraction of the time and expense it would take to self-develop such capabilities. As we evolve towards a subscription economy with a variety of payment models, APIs providing web billing services will be leveraged to ensure secure, reliable billing.

Tuesday, June 8, 2010

User Activity Monitoring

Gartner recommends that organizations implement user activity monitoring as part of a strategy to manage external and internal threats, and for regulatory compliance. Gartner suggests integrating Identity and Access Management (IAM) capabilities with a SIEM system to achieve user activity monitoring, but other approaches work as well if not better as I explain below.


Why is user activity monitoring needed? Since all major regulatory frameworks -- including SOX, PCI DSS, GLBA, and HIPAA -- require least-privilege access controls, thousands of companies are obligated to prevent excessive access rights and yet, according to Deloitte, have failed to do so adequately. The reason this is a hard problem has to do with the dynamic nature of the enterprise -- especially in an economic downturn -- with layoffs, restructurings, aggressive use of contractors and other service providers, along with the need for federated identity and access management as enterprises collaborate.

Conventional wisdom holds that the best practice for resolving this issue is to adopt an IAM system with role-based access control (RBAC) capabilities. Unfortunately, such systems provide no user activity monitoring or other assessment mechanisms and as a result are notoriously ineffective. While these systems ensure that only authorized users may log in to critical resources, they fail to consistently determine which users should be authorized to access those resources. As a result, as reported by a Dartmouth field study and by IDC, over-entitlement is the norm. In many organizations over 50% of access rights are dormant, representing a huge security vulnerability as well as a significant compliance exposure.

This is where user activity monitoring comes in. Organizations can assess user privileges, or entitlements, through user activity monitoring in order to identify excess entitlements. That few organizations do so is indicated by the high rate of audit findings for such access controls. Two additional methods of implementing user activity monitoring, besides the SIEM+IAM integration suggested by Gartner, are network-based activity monitoring and log-based activity monitoring.

Many organizations collect NetFlow data for IP traffic analysis, and analyze this data for user activity monitoring. While NetFlow shows source and destination IP addresses and port numbers, it doesn't show authenticated user names or application names (applications can in many cases be deduced from the destination IP address and port number, but it's practically impossible to link a source IP address to a user name). NetFlow is therefore inadequate in most cases for tracking user access to audited applications.

Some organizations have adopted a network-based user activity monitoring system which goes beyond NetFlow to record, not just source and destination IP addresses, but authenticated user names and which application was accessed. While far superior to a NetFlow-only approach, network based activity monitoring has several challenges:

  • Span port scarcity - span ports are used for a variety of applications, and without a network monitoring system such as one from Gigamon span port availability could be a constraint;
  • Span port data loss - most switches are vulnerable to packet loss on their span ports during peak traffic bursts. Even a data loss rate of under 1% can render such a solution inadequate for forensic purposes;
  • Application-side scalability - network activity monitoring requires a probe on every ingress span into the application infrastructure;
  • User-side scalability - a probe must be placed in every subnet with its own AD or other authorization system, which can make for a very expensive deployment in a distributed environment or one with many remote offices;
  • Encryption - as the percentage of encrypted sessions inside the data center increases, it leaves a larger blind spot for network-based approaches;
  • Technical challenges with today's DPI silicon in monitoring 10G links - the latest generation of network processors with DPI (deep packet inspection) capabilities can monitor 4-5 Gbps, far short of the 20 Gbps required for full-duplex traffic monitoring of a 10G link; and
  • No visibility to access from behind the monitored span port - network activity monitoring is blind to local access, e.g. from the application server's console port. It also can't see application-to-application access.
Despite these challenges, enterprises are deploying network-based access activity monitoring systems because they otherwise have no effective solution for preventing excessive access rights.

An alternate approach to network-based access activity monitoring is log-based user activity monitoring, also known as Identity and Access Assessment (IdAA), which does not suffer from the limitations and constraints listed above. Cloud Compliance, my prior company, read log files for audited applications in order to prevent excessive access rights and other access audit violations. The log-based approach precludes the need for hardware to be deployed, is scalable, detects 100% of access activity (regardless of encryption, 10G links, and source of access) and, when deployed as a SaaS solution, eliminates the need for installation, software maintenance, and a large upfront capital outlay.
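As a rough illustration of the log-based approach, the following Python sketch compares an entitlement list with an application access log to flag entitlements that have gone unused; the file formats, field names, and 90-day dormancy window are assumptions for illustration, not a description of any product.

    import csv
    from datetime import datetime, timedelta

    DORMANCY_WINDOW = timedelta(days=90)
    now = datetime(2010, 6, 8)

    # Who is entitled to what (user, application) -- e.g. exported from an IAM system
    entitled = set()
    with open("entitlements.csv") as f:            # columns: user,application
        for row in csv.DictReader(f):
            entitled.add((row["user"], row["application"]))

    # When each entitlement was last exercised, taken from application access logs
    last_seen = {}
    with open("access_log.csv") as f:              # columns: timestamp,user,application
        for row in csv.DictReader(f):
            key = (row["user"], row["application"])
            ts = datetime.fromisoformat(row["timestamp"])
            last_seen[key] = max(last_seen.get(key, ts), ts)

    for user, app in sorted(entitled):
        if now - last_seen.get((user, app), datetime.min) > DORMANCY_WINDOW:
            print(f"Dormant entitlement: {user} on {app}")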

Tuesday, June 1, 2010

Visualizing Security Metrics






This is the third and final post discussing Security Metrics: Replacing Fear, Uncertainty and Doubt by Andrew Jaquith. As I noted, Jaquith makes some intriguing and vital points about the need for "good" metrics and "serious analytic scrutiny" to inform executive decision-making on issues of security, compliance, and risk governance. This is an especially important topic today, with organizations everywhere trying to figure out how to stay secure and improve compliance while cutting their expense budget.

Most organizations, when considering appropriate investment levels to deal with risk, are not lacking for data. But lots of data does not equate to relevant information required for sound decision-making. Jaquith's point is that information in the form of metrics -- good metrics, which he defines -- is lacking in many enterprises.

But once good metrics have been defined, how are they communicated to stakeholders? Jaquith dedicates an entire chapter to visualization. He starts by listing his six design principles for visualization of metrics:

  1. It is about the data, not the design (resist urges to "dress up" the data)
  2. Just say no to three-dimensional graphics and cutesy chart junk (it obscures your data)
  3. Don't go off to meet the wizard (or talking paperclips)
  4. Erase, erase, erase (removing tick marks and grid lines results in a crisp chart with few distracting lines)
  5. Reconsider Technicolor (default colors are far too saturated, and should be muted. Consider a monochromatic palette)
  6. Label honestly and without contortions (pick a meaningful title, label units of measure, don't abbreviate to the point where the meaning is not clear)
Like me, Jaquith is an admirer of Edward Tufte, author of several books about information visualization including the classic The Visual Display of Quantitative Information (1983, Cheshire, CT: Graphics Press). According to Tufte, a key to effective visual displays is understanding the goal of your presentation. In Tufte's own words:
At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution.
Hence, we have small multiples as a visualization strategy. Here's an example:



From this display, one can look at different categories (in this case, departments) to view comparative performance over time. One can readily imagine security/compliance applications for this approach, such as dormant accounts by resource, or excessive access rights by department.
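Here is a minimal matplotlib sketch of a small-multiples display for one such metric, dormant accounts by department over six months; the departments and counts are invented purely to show the layout, and the styling follows the "erase" and muted-color principles listed above.

    import matplotlib.pyplot as plt

    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
    dormant_accounts = {                       # invented example data
        "Finance":     [12, 11,  9,  9,  7,  6],
        "Engineering": [30, 28, 25, 20, 18, 15],
        "Sales":       [18, 19, 21, 22, 24, 25],
        "Support":     [ 8,  7,  7,  6,  5,  5],
    }

    fig, axes = plt.subplots(1, len(dormant_accounts), sharey=True, figsize=(10, 2))
    for ax, (dept, counts) in zip(axes, dormant_accounts.items()):
        ax.plot(range(len(months)), counts, color="gray")   # muted color, no chart junk
        ax.set_title(dept, fontsize=9)
        ax.set_xticks([])                                   # erase, erase, erase
        for side in ("top", "right"):
            ax.spines[side].set_visible(False)
    axes[0].set_ylabel("Dormant accounts")
    fig.suptitle("Dormant accounts by department, Jan-Jun")
    plt.tight_layout()
    plt.show()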

In his book Beautiful Evidence (2006, Cheshire, CT: Graphics Press) Tufte introduces a refinement to this concept called the sparkline, which he defines as "small, intense, simple datawords". The example Tufte uses to explain the sparkline concept is a patient's medical data, taken from Beautiful Evidence:





Besides Tufte's small multiples and sparklines, Jaquith's visualization suggestions include indexed and quartile time series charts, bivariate charts, period-share charts, treemaps, and Pareto charts. The key point is that there's not a single graphic approach that works in all cases; one needs to determine the essence of what is being conveyed. The audience almost always consists of busy people, often executives, who need to have information presented clearly and in context. It doesn't do anyone any good to be able to point out after a security event that the "smoking gun" data had been seen, but it was either lost in the noise of too much data, or its significance was not clear.

P.S. It's not necessarily relevant to this post, but my favorite graphical display of quantitative information is an advertisement for one of Tufte's books that regularly appears in Scientific American and The Economist: