rama

May 07, 2012
 

“Until you can measure something and express it in numbers, you have only the beginning of understanding.” – Lord Kelvin.

There are a lot of things we want to measure, even if they can’t be objectively measured. For example, we measure customer satisfaction, even though there are no scientific units for it. We measure popularity with proxy metrics like the number of Google hits. Measuring anything objectively is itself a challenge.

My first company, Savera Systems Incorporated, had stellar advisors. One of them, Jeff Ullman, told us the following: “Whatever you measure, you improve that metric”. I observed that to be correct. It is our tendency to improve whatever we measure. But it is not the complete story.

image

Let us say your CEO laid out the vision for the company: “becoming a leader in the online sales of hair tonic”. The goal could be “increase sales by 100% in North America”. Now that we measure the sales, do they automatically increase? Unfortunately, no.

There was a series of tests conducted by education psychologists in 2005. In some cases, they paid students money for getting good grades. In other cases, instead of grades, they measured the number of days absent, number of assignments turned in, number of books read, etc. What they realized was that there was no effect on the grades when they paid for the grades. But there was a positive correlation when they paid for attending classes and reading books.

image

When we measure people on what they control, that has a positive impact. When we measure them on what they can’t control, there is no impact. Attending classes is something that the students controlled; getting grades was not something they controlled. So, we convert what we want to measure to proxy metrics.

So, let us say that we measure what they can control using these proxy metrics. Surely that helps our cause, right?

In the last few years, we realized that measuring company performance without regard to the constraints (legal and ethical) has led to long-term damage to companies and the world. Suppose we pay students for grades: what prevents them from cheating? Unless we impose the constraints, the metrics will become meaningless.

image

Let us say that we impose the proper constraints. Will that be sufficient? Again, no. The problem is that sometimes, by the time we get the measurements, it is already too late. For example, let us say that we are measuring sales performance to increase revenue. By the time we measure the performance, we have lost the ability to control the outcome. There is no chance for course correction.

image

So, what we do is measure the leading indicators. These indicators portend what is to come. For example, building permits for new construction are a leading indicator for the economy. In contrast, the average prime rate is a lagging indicator. The indicators that move with the main metric are called coincident indicators, which we don’t have to bother about now.

The issue with a leading indicator is that it is not always a good indicator. Leading indicators are often not well-defined; their correlation with the main indicator is not well-understood; and there is a chance that the correlation could vary. Still, with all that uncertainty, leading indicators offer a better chance of reaching our goals than coincident indicators.

We are not done yet. Suppose we are measuring the school performance by attendance. The principal incentivizes the students for attending school. Then, even the people who are not interested in school attend it. In fact, it may be possible that they disrupt the school so much that the school performance may go down.

Or, consider the case of paying for the number of bugs fixed. This payment may lead to a perverse incentive for introducing trivial bugs or even breaking down a large bug into several sub-bugs. In fact, it is fairly common in the medical industry to break down a single problem into a series of several ailments, each of which is separately treated and billed.

image

That leads us to the selection of several metrics that cover the desired outcome. These metrics collectively support the desired outcome without leading to unintended consequences. Since there can be many such metrics, we can stipulate the following rules:

  1. The metrics should be independent: Think of them as vectors that are orthogonal. Otherwise, these metrics end up duplicating the effect. This is easy enough to validate, either empirically or sometimes even by modeling the problem.
  2. The metrics should be few in number: If we measure too many, the complexity of measuring overwhelms people’s ability to understand what is being measured. This is often the reason people show little enthusiasm for metrics.
  3. The metrics should cover the original desired outcome: How can we be sure that we got all the metrics covered? That is a difficult problem to solve. It is more of a craft than a science. I suppose we can choose a large enough number of metrics, but that messes up the earlier rule of having few metrics.
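
To make the independence rule a little more concrete, here is a minimal sketch in Python of the empirical check. It treats each metric's history as a vector, centers it, and computes the cosine between every pair; a value near 1 or -1 suggests that two metrics are duplicating each other. The metric names, the sample numbers, and the 0.8 threshold are all made up for illustration.

```python
import math


def centered(values):
    """Subtract the mean so the vector describes movement, not level."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def overlapping_pairs(metrics, threshold=0.8):
    """Return pairs of metrics that move together too closely to be independent."""
    names = sorted(metrics)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine(centered(metrics[first]), centered(metrics[second]))
            if abs(score) >= threshold:
                pairs.append((first, second, round(score, 2)))
    return pairs


# Hypothetical weekly measurements for three candidate metrics.
weekly = {
    "site_visits": [100, 120, 140, 160, 180, 200],
    "page_views": [310, 365, 420, 490, 545, 600],
    "repeat_orders": [12, 9, 14, 8, 13, 10],
}

for first, second, score in overlapping_pairs(weekly):
    print(first, "and", second, "overlap with score", score)
```

In this made-up data, only site visits and page views are flagged as overlapping, so one of them could be dropped to keep the number of metrics small.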

Note for enterprise architects

When enterprise architects plan the IT activities for an organization, they create a strategy, a plan to realize the strategy, and a program to execute the plan. Unless they create the metrics at every stage, they will not have traceability for the entire program.

image

There are several artifacts that we use in this process:

  1. A model to translate the metrics from one stage to another.
  2. A correlation mechanism to establish the relationship between the metrics of one stage and those of the next.

Without going into full details, here are some tips for this methodology:

  1. Goals to KRIs (Key Result Indicators): While the organization’s goals can be nebulous, KRIs have to be precise. They need to establish what we are measuring, how we are measuring it, and who is responsible for each KRI.
  2. KRIs to program metrics: While KRIs are precise to measure, they tend to be lagging or coincident indicators. What we need are leading indicators. At a program level, we can establish leading indicators simply by identifying the indicators that we can measure while the program is in progress. For instance, when we construct a program, we typically break it down into multiple, simultaneous projects. By establishing metrics that can be measured throughout the life cycle, we are creating the simplest leading indicators.
  3. Establishing code level metrics: Eventually, we should focus on automated creation of metrics. By incorporating the measurement into the code, we ease the process. For instance, if we are trying to improve customer satisfaction, we might measure the number of interactions that a customer had to have to resolve the issue. Or the duration of the open issue. Or the number of exceptions that service reps had to make. Of course, the earlier part of the essay described the process of choosing the metrics: for completeness, independence, and with constraints.
  4. Establishing the feedback loop: As long as we have a model that correlates the events at each stage, we can keep refining based on the information from the ground. The idea is that we end up with a manageable number of metrics that can predict the final outcome reliably.
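
As an illustration of the code-level metrics in tip 3, here is a minimal Python sketch of an application recording its own customer-satisfaction proxies: the number of interactions needed to resolve an issue, and how long the issue stayed open. The class and method names (`IssueMetrics`, `record_interaction`, and so on) are hypothetical and do not come from any real help-desk product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class Issue:
    opened_at: datetime
    closed_at: Optional[datetime] = None
    interactions: int = 0


class IssueMetrics:
    """Hypothetical in-memory recorder; a real system would persist these events."""

    def __init__(self) -> None:
        self._issues: Dict[str, Issue] = {}

    def open_issue(self, issue_id: str, when: datetime) -> None:
        self._issues[issue_id] = Issue(opened_at=when)

    def record_interaction(self, issue_id: str) -> None:
        self._issues[issue_id].interactions += 1

    def close_issue(self, issue_id: str, when: datetime) -> None:
        self._issues[issue_id].closed_at = when

    def summary(self) -> Dict[str, float]:
        """Average the proxies over resolved issues only."""
        resolved = [i for i in self._issues.values() if i.closed_at is not None]
        if not resolved:
            return {"resolved": 0, "avg_interactions": 0.0, "avg_hours_open": 0.0}
        hours = [(i.closed_at - i.opened_at).total_seconds() / 3600 for i in resolved]
        return {
            "resolved": len(resolved),
            "avg_interactions": sum(i.interactions for i in resolved) / len(resolved),
            "avg_hours_open": sum(hours) / len(hours),
        }


metrics = IssueMetrics()
start = datetime(2012, 5, 7, 9, 0)

metrics.open_issue("A", start)
for _ in range(3):
    metrics.record_interaction("A")
metrics.close_issue("A", start + timedelta(hours=6))

metrics.open_issue("B", start)
metrics.record_interaction("B")
metrics.close_issue("B", start + timedelta(hours=2))

metrics.open_issue("C", start)  # still open, so it is left out of the averages

print(metrics.summary())
```

The point of the sketch is that the numbers fall out of normal operation; nobody has to fill in a report for the metric to exist.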

I have not found a good book on this subject that provides a good mixture of management processes, Enterprise architecture processes, mathematical models, and behavioral psychology. I wrote this piece entirely from my experience. If anybody knows some good resources on this topic, please inform me via the comments.

 Posted by at 9:59 am
May 02, 2012
 

A long time ago, I was blissfully unaware of IT technology. I designed programming languages, developed compilers, architected new database technologies, and dabbled in operating systems. Life revolved around all the things I learnt in school and in the labs: computers, algorithms, applied science.

Somehow, after I ended up on this side of the fence, all that changed. Now the work is usually implementing a packaged application, rolling out an ERP, or supporting a large number of IT applications. It is challenging – it brings in knowledge of finance, business, planning, organizational behavior, and a whole lot of other things to make IT work well. Even though I never signed up for doing an “IT” job, I drifted into it and found myself enjoying connecting what I learnt and what I see.

Recently, I have been seeing a great deal of change in IT. It is very similar to the changes in IT from the late 90’s, when the internet hit the scene. Yet, it is much more – this time, there is a fundamental shift in the way IT is being done. Perhaps these words are too easily thrown around. Let me make my case.

Traditional IT technologies

First let us take a look at the traditional IT technology:

image

This is what IT has been doing in the enterprises. Once some need in the business is well-established, IT builds an application (preferably customizes a packaged application or builds on top of some vendor technology product) and operates it in its data center using its standard rollout and support processes.

Influences on IT technologies

Think about the history of modern organizations. Just when organizations couldn’t grow profitably, computers made the growth possible with information technology. The constraining factor in growth was information management (gathering, organizing, retrieving, and analyzing). Once tamed, it allowed organizations to grow and take advantage of their size.

IT, of course, helped that growth. However, now, it is holding back the enterprise. CIOs have very short tenures. They preside over moribund enterprises. The users see that their IT is offering poor services at a higher price, with the illusion of security and control. [Think Gmail and your organization’s mail. Who offers more storage? Who offers better searches, better backup, and better access?]

There is a specific way of understanding the impact on IT technology from three directions (the idea for this classification came from Betsy Burton – a shout out for her classification):

image

Each of these technologies (from business, consumer and operations) impacts the IT technologies in a specific way (and they influence one another, and in particular, the business technology as well).

Business Technology

When the Wall Street Journal starts writing about technology for business people, you know that something is up. These days, you can’t turn a page without seeing something about big data and how business is trying to make sense of this data. Traditionally, IT did not deal with this problem. This esoteric field of decision support systems is managed by a closed group of business experts.

Normally, this class of applications undermines an IT department. If a marketing program needs to be run, if sales data needs to be analyzed, if a partner needs to be engaged, business reaches out to its own resources instead of the IT department. There are specific reasons why:

image

There are several kinds of applications that business uses that IT has reclaimed over time. For some applications, IT does not have a good hold yet. Here are some sample applications:

image

A trend from the last few years has been “Business aligned IT”, where these kinds of business applications have been a large focus. BPM, CRM, and e-commerce applications have already become an integral part of the business systems. Some of the new kinds of applications, such as big data and advanced analytics, are only being talked about. The marketing applications are still too much in flux to become a part of IT yet.

Consumer Technology

Several times, consumer technology has made a big impact on IT. Remember the PC? The internet? All of those started as consumer technologies. Of course, IT was reluctant to embrace those technologies at first. But as the IT users were using those tools at home, IT itself had to offer them as a part of its portfolio.

The current consumer technologies that are impacting the IT are many:

image

These are not just application categories, but aspects of applications. Take Instagram, for instance: it has mobile computing; it uses social networking; it stores the images in the cloud; it is a customer centric application. Or, take Quora: it is a social networking application; it is for knowledge management; it is collaborative.

As mentioned earlier, while the main purpose of IT was information management, over time the bottleneck ended up being the way people work together as a team. While computers are good at information processing, they are not so good at capturing information from people and letting them work as a team. Scaling of human productivity was aided by different techniques over time: using energy in the industrial revolution (steam engine); using the assembly line in manufacturing. The current techniques let corporations harness the power of large crowds, which is exactly what consumer IT offers.

There is another impact (consumerisation of IT) – where consumer devices end up being used in IT (personal computers, cell phones, and smart devices). That has a profound impact on one of the core functions of IT (desktop support etc).

Operational Technology

I like to tell the tale of how electricity distribution happened. Can you believe that, at one time, factories ran their own power sources? Once a reliable grid came up, it was far cheaper to produce and distribute energy. Perhaps future generations will tell the tale the same way: can you believe that companies ran their own datacenters?

Cloud and related technologies are impacting IT in a transformational way. There are several themes to this transformation:

image

One of the more exciting things to come out of operational technologies is “devops”, which integrates continuous development with an operational view of the applications. Even if companies are not moving to the cloud, these best practices transform IT.

Summary

IT is trying to address the needs of business by offering more and more applications built on business technology. To get better adaptability and to increase cooperation, it is adopting consumer technology. To reduce costs and focus on its core competency, it is looking to operational technology such as devops and cloud.

 Posted by at 8:40 pm
Apr 29, 2012
 

People say that history is written by the victors. Actually, history is written by the scribes. I suppose they can portray themselves as the victors – which makes the sentence true, in a way.

What I am leading to is this: if you are the note taker in any meeting, you control the outcome. Nobody remembers what happened in the meeting after two months. If you are the person who took the notes, you have the ability to change the perception of the meeting – without falsifying the information. You can create the right perception of the outcomes, action items and so on.

My pet peeve in most meetings is that people do not take notes. They rely on faulty memories to recreate the moments later on. They miss the essential points. They miss the nuances. They only remember what they think happened. It is like an Agatha Christie novel – nobody will remember all the conversations.

Here is my prescription for those problems, especially in the meetings where the agenda is fluid. There are two problems in meetings, especially, exploratory meetings:

  • Mismatch in the understanding: The meeting may not offer us a chance to understand and restate the problems as we see them.
  • Lack of follow up: While we understand what is said then and there, as soon as we walk out of the meeting, it is jumbled up. Even action items may not help, as the context is lost.

I find it useful to use note taking as a way to take care of these problems. Here are a few simple note taking best practices:

Always take notes on a computer

In this day and age, you don’t want to use a notebook and then transcribe. If you are anything like me, you will never get around to writing up and enhancing the notes. In fact, the more you practice taking notes on the computer, the better you will become. Imagine the effect you can create by sending out the notes and your thoughts immediately after the meeting!

You can use any tool for taking notes, but I prefer FreeMind (or XMind) as it is free. It is a mind-mapping tool that provides a hierarchical view of the information. It is especially useful when the information is hierarchical instead of linear. For instance, if you are listing out all the team members with their roles and other details, this information is not linear (you don’t care in which order you get to them), but hierarchical (you list the name, and under the name you list the role and other details).

Start with a simple template

As I said, I prefer the mind map tool for this job. Before the meeting, I jot down the various aspects of the meeting that I am supposed to gather information about. (If I am providing the information, that is a different kind of meeting).

Take this example: Suppose we are having a meeting about a project we want to start. You are meeting the customer as a potential project manager. A possible information template could be:

mindmap-sample

Of course, your headings could change. The beauty of such an arrangement is that you can see the items of similar importance at the same font size and same distance from the center (Radial Hierarchy).

Keep the hierarchy in mind – use it to guide your questions

As the meeting progresses, you find yourself doing the following:

  1. As you get information about any topic, you will put it under that topic. Good – this is the way to go.
  2. You are getting details about a topic, but the context is not there:  What it means is that you are at level 1 and you are getting details about level 3. You will establish the context at level 2. Example: You may be given details about the project users, without categorizing the details. That means you will supply the categorization (like business users, consumers etc.).
  3. You are getting new top level topics: That may mean one of two things: you may not have thought about the meeting enough to create a good starting point. Or, the meeting is going off course. In the first case, you add the topic to your mind map. In the second case, you nudge the meeting to the right topic: (“Before getting to those details, can you please tell me who are the people involved so far in the project?”).

If all things go well, you will get a rich description of the meeting. And, you will look like a genius for providing a structure and context to the information.

One advantage of a mind-map tool is this: By looking at the picture, I can tell if we did a good job on information gathering or not. If we are too deep in one topic, the image shows the imbalance. If we did not cover a topic, or covered a level 2 topic excessively, the picture clearly shows it.
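
If you keep the notes as a nested structure, the same balance check can be done in code. Here is a minimal Python sketch that represents a mind map as nested dictionaries and reports, for each top-level topic, how deep it goes and how many notes sit under it. The topic names are invented for illustration and are not taken from the template above.

```python
def depth(subtree):
    """Number of levels below a node; a node with no children has depth 0."""
    if not subtree:
        return 0
    return 1 + max(depth(child) for child in subtree.values())


def note_count(subtree):
    """Total number of nodes recorded anywhere below a node."""
    return sum(1 + note_count(child) for child in subtree.values())


def balance_report(mind_map):
    """Summarize each top-level topic so gaps and lopsided branches stand out."""
    return {
        topic: {"depth": depth(subtree), "notes": note_count(subtree)}
        for topic, subtree in mind_map.items()
    }


def uncovered_topics(mind_map):
    """Top-level topics that received no notes at all."""
    return [topic for topic, subtree in mind_map.items() if not subtree]


# Hypothetical notes from a project kickoff meeting.
meeting = {
    "Users": {
        "Business users": {"Finance team": {}, "Sales team": {}},
        "Consumers": {},
    },
    "People involved": {"Sponsor": {}},
    "Timeline": {},
}

for topic, stats in balance_report(meeting).items():
    print(topic, stats)
print("Not covered:", uncovered_topics(meeting))
```

Here "Users" carries four notes over two levels while "Timeline" has none, which is the kind of imbalance the picture would show at a glance.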

Publish it immediately

You think you will refine the mind-map, create a document and then publish it. Trust me, you won’t get around to it. I suggest you correct the typos, enter any contact information, clear out any questionable material, pay attention to the action items, and if needed, add your perspectives (make sure that is marked as your take on the meeting), and then publish it immediately. I publish my mind-map meeting minutes in less than 30 minutes after the meeting, in general.

Summary: Always take notes if you are participating in an exploratory meeting. Use a computer and a program to take the notes. Prepare an outline and use it to guide the meeting. Publish the finished outline.

Next steps

Here are several ways you can innovate on the basic theme:

  • You can create standard templates for different kinds of meetings. For instance, pre-sales, sales, new customer, existing customer, project status – meetings for all these different contexts can be driven effectively with customized templates for mindmap.
  • For different kinds of meetings (say you are attending a conference) you would use different tools. For instance, if it is merely a reporting need, I resort to Word or a simple text editor.

Always try to see how you can capture the notes on the computer itself, as it reduces the barriers to making the notes available digitally.

 Posted by at 9:22 am
Apr 23, 2012
 

I am a developer with questionable credentials in aesthetics, with one saving grace – I recognize a good design when I see one. In developing web sites, I have explored a few options for web design and benefited from some of them. A few of my readers have asked me questions about these options and I decided to write a note as a response. I am also structuring the note as a lesson plan so that any newbie can benefit from it. (If I had time, I could have created a course at Udemy).

Objectives

I am assuming that these are the objectives that you have.

  • You want to create good looking websites.
  • You do not have access to designers from the beginning. You may be able to rope in UX designers later in the game, but to start with you still want a good looking site.
  • You prefer tried and well understood interfaces instead of creating a unique look and feel.
  • You don’t mind learning a few new technologies to go with your code-fu.

Prerequisites

  • You should be able to develop websites – you should know the basics of HTTP, HTML and so on.
  • You should know a programming language to program the web (I assume you are a J2EE developer or a PHP/Python/Ruby developer).

Motivation

Let us say that you are a developer out of college. You went through your courses and graduated with a good bit of CS knowledge. You are working for a large company. You are developing a web application that is heavy on the backend, with a lot of potential to save the company some money through process optimization. This application is for internal employees.

After the first prototype, you are asked to show the prototype to the big boss. You sweat it out, create an end-to-end scenario, and show it to him. Unfortunately, the first thing that catches his attention is the color scheme or the misplaced logos. Then, the complex flows and intricate forms. From your point of view, you keep thinking “why is he focused on these? Doesn’t he know that we will get a UI designer and fix it before the release?”

At the same time, a college intern working for the summer creates a WordPress site. Naturally, it doesn’t have the end-to-end scenario that you have in mind, but it has a lot of other niceties that the big boss cares about: colors, nice flow, ability to create a social group, interactions and so on. The big boss asks you why you can’t put together a professional demo as good as the intern’s. Can’t a team beat a single intern?

You look around and you realize that the web has an amazing number of websites that are well-designed (at any rate, better than yours). Seemingly, they are done on shoestring budgets, without using costly skills. Your manager looks at this market and sees that typically this process uses the following kinds of people:

  • UX designers: These people design the pages in traditional media like PSD (Photoshop files).
  • HTML designers: They convert PSD to well-designed HTML

From there onwards, the developers can use the HTML templates and code to their specs. It turns out that both of these are commodity skills (you can get a 10-page site done for under $2000, considering each page to be a template).

But then, the manager recognizes these issues:

  • He needs these skills throughout the project. Outsourcing doesn’t work as well.
  • There is a considerable amount of interaction. Unless somebody in the core team understands these skills, the team ends up paying for the mismatched execution.

So, realizing that he needs some of these skills in-house, the manager asks you to look into it. Your next task is to understand enough to create a good looking site, involve the designers when you need them, and get the most out of them. You also decide to use a couple of junior developers in this process.

Where do you start?

Skills you need to learn

To develop good websites you need to know the basics of:

  • HTML5: This skill helps you to design clean HTML sites.
  • CSS: This skill lets you design good looking websites.
  • JavaScript (JS): This skill lets you program the web to be interactive.

Of course, knowing these alone will not help you make good looking websites. You want to understand the reusable patterns, standard usages, and the conventions. Thankfully, several frameworks help you to learn those:

  • JQuery: It lets you control the look and content of the webpage easily from JS.
  • Sass + Compass + ZURB Foundation: The first two technologies help you to tame CSS – not that they are important, but if you want to use some advanced frameworks you might need them. ZURB Foundation helps you to create good looking web page templates from some simple specs. Very popular with the Rails crowd.
  • Less + JQuery + Twitter Bootstrap: Less is an alternative to Sass which lets you generate CSS from simple macros. Twitter Bootstrap is an amazing template for developing websites.

At the end of all this, you will easily put together pages like:

image

and

image

without having to worry about a lot of details.

Learning the pre-requisites

Currently, the best source to learn (by practice) is Codecademy. If you don’t skip your lessons and practice all the projects, you will not only learn, but also retain the knowledge about JS, CSS, and HTML. Once you get past that, here are some specific resources for each of the topics:

HTML5

  • Paul Irish’s amazing HTML5 Boilerplate. Not that you want to start your web pages from there, but you will understand the anatomy of an HTML5 page.
  • http://www.htmlfivewow.com – be sure to use it from a modern browser like Chrome or Firefox 11 or IE 10.

Pointers about learning HTML5: Don’t spend too much time on it. You will learn as you practice, once you get the hang of basics.

Exercises

For practice, do the following:

  • Create a slide deck using the template provided in HTML5wow.
  • Take your company webpage and redo it in HTML5. You don’t have to adjust CSS or anything yet. Just restructure it to use HTML5 Boilerplate.

JavaScript

If you practiced using Codecademy, that teaches you enough about the language basics: primitives, loops, functions, prototypes, objects and so on. Here are some additional resources:

Exercises

For practice, do the following:

  • Solve 8-queens problem in JS. Print the results to HTML screen. For additional marks, learn how to display it properly on a chess board.
  • Solve the same problem, using underscore.js as the utility library.

Learning the frameworks

JQuery:

The power of JS is that it can be used to manipulate DOM (a structured representation of HTML) in the browser. JQuery makes this manipulation simple and succinct. Here are the resources to learn JQuery:

Still, the best way to learn JQuery is to develop a site with it, which is what you will do next.

Exercises:
  • Take your corporate website (or any generic site, that is not interactive). Redo it in JQuery in the following way:
    • Assume that the users are logging in (so, you have their username).
    • Get their twitter feed via Ajax and display it in a different tab.
    • Implement the navigation, screen design and such in JQuery (and JQuery UI).
  • Do the 8-queens problem. This time, as the search is happening, animate the board slowly to show the inner workings of the trial-and-backtrack model.

Twitter Bootstrap

You can learn Twitter Bootstrap by playing with it. Still, the following are good resources:

If you went through the earlier lessons, you do not need to know much about bootstrap. In fact, you don’t even need to learn JQuery to start with bootstrap. So, if you want to start with bootstrap, you can do it along with JS learning.

Exercises:
  • Take your corporate website. Redo it with bootstrap.
  • Take plus.google.com – do it with bootstrap and see how it looks.
  • http://ajkochanowicz.github.com/Kickstrap/ – try this with the corporate site. In particular, use the icon fonts.

Note: Any webpage you develop using this framework renders well on mobile devices. It follows the principles of “responsive UI”.

Final test

If you made it so far, you know:

  • basics of HTML5 and CSS
  • Good knowledge of JS and JQuery.
  • Good website starter toolkit (including less, coffeescript(!) and so on).

To test yourself, consider developing one full website using the following technologies:

  • Use bootstrap for the page layout and the page design.
  • Use JQuery for the DOM manipulation
  • Use underscore.js for JS utility library
  • Use Require.js to manage the JS inclusion.
  • If you are serious about it, you can try to use backbone.js as the MVC framework on the browser.
  • You will probably need some server-side services to implement any decent website.

For problems, you can pick any of these:

  • A reporting system where people enter reports on their activities. Different people get different views. Different reports are generated for mailing periodically.
  • A chess game web site where two people play against each other. Make any assumptions that you need to, to make the system simple.

 Posted by at 4:34 pm
Apr 10, 2012
 

Ok, I can’t tell you how to do that, but here is how Instagram did it. I am not going to talk about how they understood their customers and how they created something customers loved. That is all marketese – I can’t tell you how to replicate it.

What I can tell you is that they have only three engineers supporting all their webops, taking care of billions of photos, terabytes of data, and millions of users. The numbers are mind-boggling. If these numbers were thrown at any enterprise CIO, they would come back with a budget for 100 people and a 3-year plan to implement a program to manage the data.

Here are a few simple things they did right:

  • They only focused on essentials – they did not focus on keeping anything in-house that did not belong there. Yes, they are entirely cloud based. They are heavy users of Amazon EC2.
  • They used open source extensively and hacked it when needed.
    • They use Ubuntu 11.04 on EC2
    • Django for app server (stateless web – means horizontal scaling).
    • Stripped down web server (normally it is Apache + mod_wsgi for Python, but they needed a low-CPU web server and therefore used ‘Green Unicorn’ (a Python WSGI HTTP Server)).
    • PostgreSQL for database (sharded cluster with 12 replicas in different zones)
    • Amazon S3 for photo storage
    • Amazon CloudFront for CDN
    • Redis as in-memory storage for feeds
    • Memcache for caching web service support (not sure why they did not use Redis here also – most likely the software already works with memcache).
    • Apache Solr for searching (with JSON interface)
    • Twisted for pushing billions of notifications
  • Good focus on DevOps
    • They used nginx for load balancing (see my proposal for earlier).
    • They used Amazon Elastic Load balancer (though, they could do without it).
    • Munin for monitoring
    • Outsourced services for incident notifications(Pingdom for monitoring and PagerDuty for incidents)
    • Sentry for App server reporting in real-time
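
To show why a stateless app tier and a sharded database scale horizontally, here is a minimal Python sketch of routing a user to a database shard. This is not Instagram's actual scheme; the shard count, the server names, and the function names are all assumptions made for illustration.

```python
LOGICAL_SHARDS = 1024  # assumed: many small logical shards, fixed up front
SERVERS = ["db-a", "db-b", "db-c", "db-d"]  # hypothetical physical servers


def logical_shard(user_id: int) -> int:
    """Every app server computes the same shard for the same user."""
    return user_id % LOGICAL_SHARDS


def server_for(user_id: int, servers=SERVERS) -> str:
    """Map a logical shard to the physical server that currently hosts it."""
    shards_per_server = LOGICAL_SHARDS // len(servers)
    index = min(logical_shard(user_id) // shards_per_server, len(servers) - 1)
    return servers[index]


for user_id in (5, 300, 1023, 1029):
    print(user_id, "->", server_for(user_id))
```

Because the routing is a pure function of the user id, any app server can handle any request without shared session state, and capacity can be added by moving whole logical shards to new machines.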

     

Slide21

The picture is a rough approximation (most of the information is taken from the wonderful site: http://instagram-engineering.tumblr.com)

What lessons lie here for us poor enterprise developers, who are stuck using Java and forced to use in-house resources that are neither flexible nor scalable? Unfortunately, we will have to wait until the IT people take their cold, dead fingers off the inflexible IT.

Nevertheless, here is what an architect could do:

  1. Architect the systems in such a way that part of the resources (data, especially) lies outside the enterprise.
  2. Use open standards like REST and JSON to quickly pull together different systems
  3. Focus on DevOps from the beginning. Assume that your application needs to be maintained.
  4. Keep a consistent set of tools (most of the tools used in Instagram are popular in Python community)
  5. Most importantly, focus on getting the job done!
 Posted by at 11:20 am
Mar 092012
 

Last weekend, I was optimizing a site for a customer, to demonstrate what we can do. I thought I would write about some of the basic tools that I used, since most junior developers do not understand the full potential of these tools.

The two tools I reach for to test web applications are Firefox and Chrome. Chrome has good basic tools for performance testing, but Firefox shines with its many plugins.

Start by downloading and installing the basic Firefox. Then, install the following plugins:

Firebug and friends

By far the most useful plugins are based on Firebug. Here are the ones I use:

  1. Firebug: The coolest plugin. You can inspect any element of the web page and understand what it is, what style is applied, how the layout works, etc. You can even edit the page dynamically to see how it looks before making any HTML changes back on the server.
  2. Friends of Firebug: These are mostly cosmetic plugins that depend on Firebug.
    1. friendly bug
    2. firebug autocompleter
    3. Firecookie
    4. Firefinder
    5. Fireflow
    6. FireRainbow
  3. YSlow: Explains why a web page is slow
  4. Page Speed: Google’s version of the why-is-it-slow plugin

Here are the typical use cases:

Say, you want to see what styling a particular element has (for example, you want to see what font is being used). Here is what you can do:

image

Select the element and choose “Inspect element”. You get to see the HTML in one pane and style in another. Notice how the style part shows which css has impacted the style and how. If you want to see what is being overridden, this is the best tool.

Or, if you want to examine how the HTML corresponds to the elements of the page, fire up Firebug and you will see the details in a separate pane. You can select an element and see the style, computed style, layout, and the DOM.

image

Here is another little trick. You can edit any of the HTML or CSS you see and instantly figure out the impact on the page.

Suppose you want to know why a page is slow. All you need is to run the Yslow plugin (which makes use of Firebug). Here is what you see:

image

You will see several suggestions to improve the performance of the site. Most of these suggestions are valid – but, it takes some expertise to interpret the results.

A similar view is provided by Google’s pagespeed:

image

I am not showing all the details, needless to say. Except for a couple of differences, it is similar to Yslow.

Developer Toolbar

While Firebug is interactive and nice, there is an older plugin that fulfills different needs (Web Developer). Here is what it can do.

  1. You can disable the elements of the page selectively – Javascript, CSS, etc. That tells you more about how the page looks.
  2. You can manage cookies (there are other plugins that specialize in cookie management – but this one suffices for most needs).
  3. You can see CSS about an element, edit it, and disable it etc. It is similar to firebug, but a little simpler to use.
  4. You can manage the forms – you can see each element description, layout, make the hidden fields visible, make the password fields readable etc.
  5. You can check out the images – sizes, positions, alt tags etc. Useful to see those web bugs.
  6. You can get various pieces of information about the page: layout, block details etc. It actually overlays the information on the page making it easy to understand for the novices.
  7. You can validate the page against several rules: Sec 508 compliance etc. You can validate HTML, CSS, links, images etc.

It is a neat little toolbar that is easy to use (firebug is more developer friendly; this one is user friendly).

HTTP protocol analyzers

Suppose you want to know all the requests going out from a page (especially these days, with all the AJAX calls). You can see all those requests using these tools. I use the following:

  1. Live HTTP headers
  2. HTTPFox

I like HTTPFox for the better control. Here is what a Facebook request might look like (View –> HTTPFox is the way to invoke it):

image

You can see the summary and the details of each request. This is a wonderful way of seeing how the CDN, caching, and cookies are working.
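To make the "what do I look for in those headers" part concrete, here is a minimal sketch in Python. It is not part of HTTPFox or any plugin; the function name `summarize_caching` and the sample header values are made up for illustration. Only the header names (Cache-Control, ETag, Last-Modified, Set-Cookie) are standard HTTP.

```python
def summarize_caching(headers):
    """Pick out the response headers that explain caching and cookie behaviour."""
    # HTTP header names are case-insensitive, so normalise them first.
    lower = {name.lower(): value for name, value in headers.items()}

    # Cache-Control is a comma-separated list of directives.
    raw = lower.get("cache-control", "")
    directives = [d.strip().lower() for d in raw.split(",") if d.strip()]

    max_age = None
    for directive in directives:
        if directive.startswith("max-age="):
            value = directive.split("=", 1)[1]
            if value.isdigit():
                max_age = int(value)

    return {
        "no_store": "no-store" in directives,          # never cached
        "max_age_seconds": max_age,                    # freshness lifetime, if any
        "has_validator": "etag" in lower or "last-modified" in lower,
        "sets_cookie": "set-cookie" in lower,
    }


# Hypothetical headers, as you might copy them from a protocol analyzer.
sample = {
    "Cache-Control": "public, max-age=3600",
    "ETag": '"abc123"',
    "Content-Type": "image/png",
}
summary = summarize_caching(sample)
print(summary)
```

A static image served through a CDN would typically show a long max-age and no cookie; if you see a cookie being set on every image request, that is worth a second look.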

Misc

Here are miscellaneous ones that I use:

  1. FireShot: You don’t need the Pro version. The free version is good enough for us. And, it lets us annotate the screenshot too!
  2. LightShot: This is a simple tool – not a commercial one.
Things Indian

These two plugins are in no way needed for development. However, I use them often to help read and write Indian languages. Considering that they use something I invented 20 years back (a Telugu transliteration scheme called RTS), I am particularly gratified. Here they are:

  1. Indic Input Extension – it lets you type Indian languages in English in normal keyboard.
  2. Padma – it lets you transform a page written in romanized Telugu to Telugu script.
Summary

With these plugins, we can analyze a page, optimize a page, and transform a page. I have set up a collection at https://addons.mozilla.org/en-US/firefox/collections/kramarao/deve/ so that you can download all of them in one go.

 Posted by at 10:28 am
Dec 202011
 

Consider a simple web program. For the sake of convenience, we will assume it is written in Java. In its simplest form, it most likely looks like the following:

simple-db-backed-site

What are the implicit assumptions we are making?

  1. The database is supporting the reads and writes into the system.
  2. If other systems need this information, they access it from the database.
  3. Any writes happen directly to the database.
  4. The data fits in the database – both structure wise and size wise.

Now, let us see what the issues with these assumptions are. Turns out that we can relax or change each of these constraints to get a different kind of system. For now, let us take a specific path so that I can illustrate a point. I will cover the rest of the roads in other posts.

So, consider the case where the database is not very convenient to access. That is, structurally, the data, while it fits well in the RDBMS (as relational tables), needs to be viewed in other forms, most notably as objects. And, of course, going to the database for each and every access is costly.

Putting it another way, you need an object interface to the database to handle the data well.

Enter ORMs (Object Relational Mappers). A typical layout would be:

simple-website-with-orm

What does this mean?

  1. The Java applications handle the data through POJOs (plain old Java objects).
  2. The ORM translates between the POJOs and the relational tables of the RDBMS, using JDBC for communication.

Some popular choices for ORM are Hibernate and OpenJPA.

It all looks hunky-dory until you start seeing a performance hit. You realize that for a simple object lookup, you end up doing a bunch of queries. Of course, you want to do some caching to take care of this problem. Now, the picture looks like this:

simple-website-with-orm-cache

Summary: ORM could benefit from a cache.
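Here is a minimal sketch of what such a cache does, written in Python for brevity rather than Java. It is a read-through cache under simplifying assumptions: the `ReadThroughCache` class and the `load_customer` function are hypothetical stand-ins (the loader plays the role of the ORM lookup), and there is no eviction or expiry.

```python
class ReadThroughCache:
    """Serve repeated lookups from memory; go to the loader only on a miss."""

    def __init__(self, loader):
        self._loader = loader      # function that fetches a value by key
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._loader(key)  # the expensive trip to the database
        self._entries[key] = value
        return value

    def invalidate(self, key):
        # Call this after a write so that stale data is not served.
        self._entries.pop(key, None)


queries = []  # records every simulated trip to the database


def load_customer(customer_id):
    # Hypothetical stand-in for an ORM lookup that issues several queries.
    queries.append(customer_id)
    return {"id": customer_id, "name": "customer-%d" % customer_id}


cache = ReadThroughCache(load_customer)
first = cache.get(42)    # miss: goes to the "database"
second = cache.get(42)   # hit: served from memory
print(len(queries), cache.hits, cache.misses)
```

The hard part in practice is not the lookup but the invalidation: every write path has to remember to call something like `invalidate`, which is one reason people reach for ready-made cache libraries.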

Now let us look into this cache a little more deeply. If all we are looking for is to speed up the ORM, then we are fine with handcrafting a cache solution. It turns out that there are a lot of choices for a cache, as caching is a general-purpose paradigm.

Popular choices for a cache include Ehcache.

Now, an interesting phenomenon is this: instead of a cache, we can even use a main-memory database! After all, it is tuned to live in main memory, offers fast access, and is well debugged.

If we have a main memory database as a cache, the picture would look like this:

simple-website-wth-orm-mmdb

Now, look at the picture: you have a main-memory DB and you have an RDBMS. You have your ORM syncing between them, a job that it never signed up for. In fact, if you have a main-memory database, can you get away with eliminating the ORM completely, giving the following picture?

simple-website-with-mmdb

Of course, it is possible only if the following hold:

  1. The main-memory database (MMDB) should support all the data structurally.
  2. The MMDB should support all the data size-wise as well.
  3. Syncing between the MMDB and the RDBMS should be possible.
  4. The MMDB, in conjunction with the RDBMS, should support transactions (ACID properties and perhaps rollbacks).
  5. The Java program should be able to access the MMDB in a way that fits how Java programming is done.

Let us look at each of these areas and see how the picture changes:

MMDB and data structures

A standard RDBMS supports only tables. If you want to represent a matrix or a sparse array, you know the lengths you have to go to in order to represent it in an RDBMS. So, before you conclude that RDBMSs are structurally superior, you should pause and think about that.

Enough dissing of RDBMSs. Do MMDBs support different structures? There is good news and bad news. Perhaps the bad news is good news.

The bad news is this: there is no one ubiquitous way in which all MMDBs represent data. That means you cannot assume that the data is going to be in tables, neatly described by metadata. The side effect is that there is no way to standardize access to MMDBs. After all, you are storing different structures.

The good news, as a corollary, is this: you can pick and choose the DB that is most suited to your purpose. That means your program’s view of the data and your MMDB’s view of the data do not differ. That is a good state to be in.

There are a few choices that MMDB’s offer. Without being comprehensive, let me give you a couple of choices.

Key, value databases

kv-db

Here, you can think of the whole database as a giant table. (Think BigTable; in fact, Google’s BigTable is a good example.) You access your data in the MMDB just as you would access a hash table.

A few things to observe:

  1. You can store anything that you want in the KV store. In this picture, I showed the values stored as strings, but you can use something other than plain strings, say XML or serialized objects.
  2. You can retrieve only based on the key.
  3. It is not a general-purpose database – you cannot do general-purpose queries. The storage types are quite primitive.

There are several examples of this kind of database: Memcached, Redis, and so on.
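The observations above can be sketched with a plain Python dictionary. This is only an illustration of the access pattern, not the API of Memcached or Redis; the `KeyValueStore` class and the `customer:42` key naming are assumptions made for the example.

```python
import json


class KeyValueStore:
    """A toy in-memory key-value store: opaque values, lookup by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store sees only a string; the structure is the caller's business.
        self._data[key] = json.dumps(value)

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)


store = KeyValueStore()
store.put("customer:42", {"name": "Asha", "city": "Hyderabad"})

customer = store.get("customer:42")   # retrieval works only through the key
missing = store.get("customer:99")    # unknown key: nothing comes back
print(customer, missing)
```

Notice what is absent: there is no way to ask for "all customers in Hyderabad" short of fetching every value and decoding it yourself.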

Document style databases:

One of the problems with a KV-style store is the opaqueness of the values. That is, the value is treated as just a value – you cannot peer into it, query it, or see it as a complex structure. Document-style databases address that problem.

Let me illustrate it in the context of MongoDB, which is a document-style database:

mongodb-schema

Key points to note are:

  1. You are using a very structured way of storing data. In fact, you are using JSON to store the data, which is nice to use.
  2. You don’t have a fixed schema. You can keep any data associated with the object.
  3. I did not show you the queries – you will see that you can query based on just about any field.

This way of storing data is amazingly simple. Consider the alternative of storing it in a relational database. You will either store metadata in the database or create a large schema. Either one will have problems. [This topic deserves a note of its own – someday.]
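To give a feel for schema-free documents and querying on any field, here is a small sketch in plain Python. It does not use MongoDB or its query language; the `find` helper and the sample documents are hypothetical and only mimic the idea.

```python
def find(collection, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [
        doc
        for doc in collection
        if all(field in doc and doc[field] == wanted
               for field, wanted in criteria.items())
    ]


# No fixed schema: each document carries whatever fields it needs.
people = [
    {"name": "Asha", "city": "Hyderabad", "languages": ["Telugu", "English"]},
    {"name": "Ben", "city": "Austin"},
    {"name": "Chitra", "city": "Hyderabad", "employer": "Acme"},
]

in_hyderabad = find(people, city="Hyderabad")   # query on one field
at_acme = find(people, employer="Acme")         # query on a field most documents lack
print([p["name"] for p in in_hyderabad], [p["name"] for p in at_acme])
```

A real document database adds indexes so that such queries do not have to scan every document, but the programming model is essentially this.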

If you notice we are conflating two issues:

  1. Performance that comes from keeping the data in main memory.
  2. Supporting the right kind of data structures in the database.

For example, we can even support relational data in main memory. A properly tuned Oracle keeps a lot of table data pinned in memory.

So, for our needs, we can treat the structure of the data as primary.

Distributing data: key to scalability

So far, we have been assuming that all the data resides in memory. What if the data doesn’t fit in the memory of one machine? For instance, the JVM has well-known memory limitations: on 32-bit machines it can address at best 4GB, and in reality considerably less; on 64-bit machines with a 64-bit JVM, there are too many issues with maximizing the heap. So, we may want to keep the database out of the JVM process.

There are other reasons to keep the database outside the process as well. For instance, what if you want to use the same DB from multiple apps? What if we want to back up and synchronize the database independently?

The best way to do that would be:

mapper

That is, you partition the data and go to a different data source (node) for each query. The mapper determines where to go for which data – think of it as a hash function.

For example, let us say all the customer information is partitioned based on the first letter of their name. You don’t need a mapper explicitly – each application can map the query to the node and get the information via a web service or something equally simple.
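As a rough sketch of the idea, here are two mappers in Python: one partitions by the first letter of the name, as in the example, and one uses a hash of the whole name. The node names, the four-node setup, and both function names are assumptions for illustration; a real system would also have to handle nodes joining and leaving.

```python
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical data nodes


def node_by_first_letter(name):
    """Split the alphabet into equal ranges, one range per node."""
    first = name[:1].upper()
    if not ("A" <= first <= "Z"):
        return NODES[-1]  # anything that is not an ASCII letter goes to the last node
    index = ord(first) - ord("A")
    return NODES[index * len(NODES) // 26]


def node_by_hash(name):
    """Spread names by a hash of the whole name instead of its first letter."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


names = ["Asha", "Anil", "Arun", "Ben", "Maya", "Quentin"]
load = Counter(node_by_first_letter(n) for n in names)
print(load)
print([node_by_hash(n) for n in names])
```

Any application can compute the node locally with such a function, so no separate mapper service is needed. The first-letter scheme is easy to reason about but piles common initials onto one node, while the hash spreads names more evenly at the cost of losing range queries such as "all names starting with A".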

Of course, with this kind of partitioning a few questions will come up:

  1. What is the effective way of partitioning the data? (think: There are not that many people whose name starts with Q).
  2. How do we create partition tolerance? What if one of the nodes goes down? Can we have a different node answer the same query? If we have a different server, how does that synchronize with the original one?
  3. How do we support transactions? Can we support Atomicity? Consistency? Isolation? Durability?

Turns out that there is a theorem (CAP theorem) that shows that you can’t have everything: consistency of the data across all the nodes, availability of the data even if some nodes may fail, and partition tolerance (failure in syncing the data between nodes will not cause problems for the systems, in the short term). The real theorem is more complex – read it up if you want to really know what it says.

So, what do people do? Actually, the compromise in the quality of the database still supports interesting real-life situations. For example, let us say that you uploaded a photo from your home and your friend in India doesn’t see it for 2 minutes. Would you be upset? Or, you commented on some thread and somebody else doesn’t see it for a while. Or, your bulletin board occasionally doesn’t respond, showing a page that says “Please visit in a little while”?

All of these are examples of specific situations that, even with the limitations of CAP, can be addressed usefully with these kinds of databases.

What is the lesson for us?

Here is what we have seen so far:

For some specific situations, NoSQL databases can offer a simpler development model and higher performance for web development.

If you are developing some expertise for yourself or your group, these are the areas that could help:

  1. Understand and classify the situations for which NoSQL databases work well. Create a flow chart for choosing the right technology components suitable for the problem context.
  2. Most enterprises are nervous about the risk – which NoSQL DB is going to survive? They recall the days of betting on Sybase only to ask for migration funding years later. So, create a risk mitigation strategy:
    1. By creating an API layer that is more semantic in nature and is implemented on more than one NoSQL db.
    2. By creating a way of syncing the data with the backend Oracle or DB2.
  3. Tune the performance by adjusting the level of C or A or P that we need. In fact, you will find that different databases make different CAP trade-offs. This knowledge is useful for recommending and tuning a NoSQL database for a specific need.

In the later posts, I will drill down into these topics to see how we can put together the right solutions for the problems that we face.

Nov 072011
 

I am not the one to say follow the crowds. I am all for listening to a different drummer. But, in our field, we rely on so much publicly available code, knowledge, and support, it makes sense to understand where the world is.

Consider programming languages.

image

(Courtesy: TIOBE).

How should we interpret this list? It is compiled from several different web properties, with some weighting: how many searches, how many pages, how many blogs, how many tweets, etc. Obviously, it doesn’t tell the whole story. Still, you know where the buzz is. It means either that a lot of innovation is happening or that there are a lot of issues with that language for which people seek help. Btw, I can’t explain why Ruby and Python are on a lower trajectory. Perhaps some consolidation of the information is going on.

Let us take a look at another site: www.indeed.com which measures the job trends (not absolute numbers). You can go to http://www.indeed.com/jobtrends and try it yourself.

Here is the graph for Java, C#, Objective C, JavaScript, and Python.

image

This shows the growth on an absolute scale. That may be unfair to languages that start from a smaller base. Let us take a look at the relative trends:

image

What can you conclude from the relative trends? That iOS is fuelling Objective-C growth. Still, based on my experience, these relative growth stories are short-lived.

So, what are the top trends in Job market these days? According to indeed.com, these are the top trends:

  1. HTML5
  2. Mobile app
  3. Android
  4. Twitter
  5. jQuery
  6. Facebook
  7. Social Media
  8. iPhone
  9. Cloud Computing
  10. Virtualization

Of course, as usual, this should be understood in context. It is a trend, not the absolute need. That is, today there may be 5 Twitter developers and they may need 10 tomorrow. On the other hand, they may have 100,000 Java developers and need 120,000 tomorrow. So, Twitter has a 100% growth rate while Java has only 20%. Where would you specialize if you want job security? Java, of course, considering that they need 20,000 more people vs. 5 more people for Twitter.
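The arithmetic is simple enough to check in a few lines of Python. The numbers are the made-up ones from the paragraph above, and the `growth` helper is just for illustration.

```python
def growth(current, needed):
    """Return (absolute number of new positions, relative growth in percent)."""
    added = needed - current
    return added, 100.0 * added / current


twitter_added, twitter_pct = growth(5, 10)
java_added, java_pct = growth(100_000, 120_000)

print("Twitter: +%d jobs, %.0f%% growth" % (twitter_added, twitter_pct))
print("Java:    +%d jobs, %.0f%% growth" % (java_added, java_pct))
```

The relative number makes Twitter look five times hotter; the absolute number says Java has four thousand times as many openings.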

Coming back, let us see how to understand these trends. The way I read it is this: these are the skills that are gaining popularity for a market reason. Therefore, we have to understand why the market is going that way so that we can see how we can solve those problems.

Let us focus only on web development for a second and see what the story is there. Take a look at the site www.builtwith.com. They have a wonderful trends page at http://trends.builtwith.com as well as a report for 2011.

Quoting from that report, here are a few things:

I am a fan of jQuery Mobile. I know it is not as good as Sencha Touch, but it is getting there. jQTouch is popular because of its small size – but that advantage will not last too long.

If you are a young developer who is looking to specialize in an area, don’t worry about any of these: Become a world class expert – that is good enough. If you are a manager and want to play it safe, you can use these trends to formulate your staffing strategy.

 Posted by at 11:51 am
Nov 022011
 

I keep reviewing many documents at several stages. In more than one document, people assume that multi-tenancy is not only desired, but also mandatory for applications to move to the cloud. People, that is a wrong notion. We are not in the dark ages any more. If anything, the multi-tenancy idea is old and should be mothballed (except in specific cases).

Let me make the case for you.

Let us take a look at the original server applications: your sendmail, your Apache server, your FTP server, etc. Each one of them supported some sort of multi-tenancy. Technically, we can assign multiple “names” to a single computer and make each of the services respond to each name in its own way. That is a virtual server, composed of all these virtual services. Of course, there is no partitioning of the computer.

What is wrong with it? Nothing, if all you want is to provide the services to different virtual servers. In fact, you can take a look at Virtualmin, a program that automatically creates the virtual services on various application servers such as a mail server, web server, DB server, and so on.

image

Let us look at the downside. You can configure the server with disk space and some other quotas. But, for the most part, sharing the service is really a cooperative plan. If one of the virtual servers takes up all the computing or networking resources, then the other servers will starve. Yes, some applications do offer support for fair scheduling, but it is awfully difficult to do that.

Then there are other issues: security, for one thing. Any bug in the app may leak information to unintended parties. If one of the virtual servers wants a new version of the server software, no such luck.

When we say an application is a multi-tenant, this is the property we are referring to. That is, a single instance (or a set of instances) serving multiple clients, especially, each client getting their own virtual service.

Considering the difficulty of building a multi-tenant application, there is no reason to put that much effort into making an application multi-tenant. A better option is to create a virtual appliance. It is simpler to manage, maintain, and secure such an appliance.

image

For instance, these are the appliances http://www.turnkeylinux.org provides. All free and easy to use. And, there is no need for these applications to be multi-tenant. If you need to support one more client, just fire up one more virtual appliance. (An appliance is a virtual server, that is stripped down to minimal installation so that all it does well is just run that application). A simple virtual appliance can range from 20 MB(!) to 250 MB.

Summary: Don’t worry about multi-tenancy, unless you are building blazingly fast, performance-oriented, or special-purpose software like an enterprise database. Otherwise, you are best served by a virtual appliance. This is the trend; get with it!

 Posted by at 11:46 pm
Oct 172011
 

I grew up with Unix (RIP, dmr!). For a long time, my idea of computing was playing with Unix. In fact, I still say that Linux taught me more about the practical aspects of programming than any of my courses.

Two years back, I became involved in a large project where I was closely working with Infra and ops teams. It was a very valuable experience where I could see the impact of each architectural and programming decision we take. I knew all that before, of course, but that project made it much more vivid.

SNAGHTML37e6549

Since then, I have been spearheading a movement in my organization about nextgen application platform which brings the operational and infrastructure view to development. Virtualization, of course, is the key technology that can deliver the synergy.

The big revelation for me is how much infrastructure knowledge is required to engineer a proper system. We think of machines as an abstraction, where we put the code in and it runs magically. We ignore the true costs of deploying and running the code. In fact, the wall between development and operations is what causes a lot of application instability and inflexibility.

Incidentally, if you look at most modern computer companies (like Google, Amazon, Facebook), they embrace this philosophy of erasing the boundaries. The devops people straddle the multiple universes – see the picture below:

From Wikipedia

What is fascinating to me is this: how trends become part of the collective consciousness. One day the word doesn’t exist, and then it seems to be part of every conversation. I think devops is becoming one such word.

To be concrete, what is devops? Well, there is no single definition. The goal, though, is clear: to align development and operations so that they can support business agility. How does devops do it? Through a set of practices addressing the following issues:

  1. Last mile problems in development: Deployment, testing, configurations, roll outs, integration …
  2. Non-functional problems in applications: HA, DR, performance, caching…
  3. Operational aspects of applications: Log file management, log correlation, upgrades, roll backs …

I may be missing some, but by and large, this is the main focus.

What are the tools that support it? Well, as it is a nascent field, there are lot of tools that are being built to support devops. I can only list out a few:

  1. Configuration management: Puppet/Chef – so that we can automate most of the configurations, including those of applications. Extensible and declarative. It is instructive that these tools came out of people that worked at cloud companies like Amazon.
  2. Source code control systems: Git and GitHub seem to be the way most of these systems are developed.
  3. Monitoring tools: Hyperic etc.
  4. […]

Looks like I can’t enumerate these ones as they keep growing. Just Google and follow the list.

Meanwhile, this is what I can suggest for any developer/sysadmin who wants to become good at devops thing:

  1. Learn a modern scripting language, or even two: Python and Ruby are the popular ones.
  2. Learn the existing tools: Start with Git. Use Puppet and so on. Play with a hypervisor.
  3. Learn standard datacenter components: Learn about storage (Standard SAN). Learn about network.
  4. Learn some standard deployment architectures: Say, using Varnish, HAProxy, and caching. Or, site-to-site replication. Or, database backups.
  5. Learn some web servers: I would suggest in-depth understanding of one web server, say apache, would help a great deal.
  6. Learn about VMs: Not merely VMware based VM’s, but JVM as well. Understand a little basics on how to monitor, manage, and optimize.

I am also new at devops – but most of the skills needed for devops are what an average Unix programmer used to learn in the 90’s. It is just old wine in a new bottle. Still vintage!

 Posted by at 1:06 pm