Feb 18, 2014

I attended Strata (Feb 11-13) in Santa Clara, CA, a big data conference, last week. Over the years it has grown big; this year it could be said to have gone mainstream, with a lot of novices around. I wanted to note my impressions for those who would have liked to attend.

Exhibitor details

The conference exhibitors can be grouped as follows:


As you can see, Hadoop is the big elephant in the room.

Big picture view

Most of the companies, alas, are not used to the enterprise world. They are from the valley, not from the plains where much of this technology could be used profitably. Even in innovation there are only a few real participants; most of the energy goes into minute increments in the usability of the technology. Only a few companies are addressing the challenge of bringing big data to mainstream companies that have already invested in a plethora of data technologies.

The established players like Teradata and Greenplum would like you to see big data as a standard way of operating alongside their technologies. They position big data as relevant in places, and they provide mechanisms to use it in conjunction with their products: they build connectors, and they provide seamless access to big data from their own ecosystems.


[From Teradata website.]

As you can see, Teradata’s world center is solidly its existing database product(s).

The newcomers like Cloudera would like to upend the equation. They compare the data warehouse to a big DSLR camera and big data to a smartphone. Which gets used more? While the data warehouse is perfect for some uses, it is costly and cumbersome, and it doesn't get used in most places. Big data, by contrast, is easy, with a lot of advances in the pipeline to make it easier still. Their view is this:


[From Cloudera presentation at Strata 2014].

Historically, in place of the EDH, all you had was some sort of staging area for ETL or ELT work. Now they want to enhance it to include a lot more: "modern" analytics, exploratory analytics, and learning systems.

These are fundamentally different views. While both camps see big data systems co-existing with the data warehouse, the new companies see them taking on an increasing role, providing ETL, analytics, and other services. The old players see big data as an augmentation of the warehouse for unstructured or large data volumes.

As an aside, at least Cloudera presented its vision clearly. Teradata, on the other hand, came in with marketese that offers no real information about its perspective. I had to comb through several pages to understand its positioning.

A big disappointment is Pivotal. They have ceded leadership in these matters to other companies. Given their leadership in Java, I expected them to extend MapReduce in multiple directions; that job was taken up by the Berkeley folks with Spark and other tools. With their lead in Greenplum HD, I thought they would define the next-generation data warehouse. They have a concept called the data lake, but it is merely a concept: nobody at the booth could articulate what it is, how it can be constructed, how it differs, or why it is interesting.

Big data analytics and learning systems

Historically, the analytics field has been dominated by descriptive analytics. The initial phase of predictive analytics focused on getting the right kind of data (for instance, TIBCO harped on real-time information to predict events quickly). Now that we have big data, the problem is not so much getting the right data as computing on it fast. And not just computing fast, but having the right statistical models to evaluate correlations, causation, and the rest.


[From Wikipedia on Bigdata]

These topics are very difficult for most computer programmers to grasp. Just as we needed an understanding of algorithms to program in the early days, we now need knowledge of these techniques to analyze big data. And just as libraries that codified the algorithms made them accessible to any programmer (think of when you had to program the data structure for an associative array yourself), a new crop of companies is creating systems that make analytics accessible to programmers.

SQL in many bottles

A big problem with most big data systems is the lack of relational structure. Big data proponents may rail against the confines of relational structures, but they are not going to fight SQL itself. A lot of third-party systems assume SQL-like capabilities in the backend, and a lot of people are familiar with SQL. SQL is remarkably succinct and expressive for many natural operations on data.

A distinct trend is to slap an SQL interface onto non-SQL data. Presto does SQL on big data; Impala does SQL on Hadoop; Pivotal has HAWQ; Hortonworks has Stinger. Several of them modify SQL slightly to make it work with reasonable semantics.
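The attraction is easy to demonstrate: even a modest analytical question is one declarative statement. The dialects of these engines differ slightly, so here sqlite3 stands in for a big data SQL engine, with a made-up table:

```python
import sqlite3

# sqlite3 stands in for a big data SQL engine; table and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (page TEXT, visits INTEGER)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)",
                 [("home", 120), ("pricing", 45), ("home", 80), ("docs", 30)])

# One declarative statement answers "total visits per page, highest first."
rows = conn.execute(
    "SELECT page, SUM(visits) AS total FROM clicks "
    "GROUP BY page ORDER BY total DESC").fetchall()
print(rows)  # [('home', 200), ('pricing', 45), ('docs', 30)]
```

The same query, hand-written over raw files in MapReduce, would run to dozens of lines.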


The big data conference is big on visualization. The key insight is that visualization is not merely something that enhances analytics or insight; it is itself a facet of analytics, itself an insight. Proper visualization is key to many other initiatives:

  1. Design time tools for various activities, including data transformation.
  2. Monitoring tools on the web
  3. Analytics visualization
  4. Interactive and exploratory analytics

The big story is D3.js. How a purely technical library like D3.js became the de facto visualization library is something we will revisit another day.



I am disappointed with the state of big data. A lot of companies are chasing the technology end of big data, with minute segmentation. The real challenge is adoption in the enterprise, where the endless details of big data and too many choices increase the complexity of solutions. These companies are not able to tell businesses why and how they should use big data. Instead, they collude with analysts, the media, and a few well-publicized cases to drum up hype.

Still, big data is real. It will grow up. It will reduce the cost of data so dramatically that it supports new ways of doing old things. And with the right confluence of statistics and machine learning, we will see the fruits of big data in every industry: that is, doing new things in entirely new ways.

Nov 04, 2013

The usability of banking took a big leap with the invention of the branch office. As early as the 12th century, the Templars ran a network of branches, taking banking to where it was needed, for instance to the Middle East and England. They made the movement of funds, currency conversion, and ransom payments go smoothly in those days.

In more recent times, a prominent feature of a Wild West town was an imposing bank building. The branch office provided a lifeline of credit to local citizens. Along with the railroad, post office, church, school, and newspaper, the bank branch provided the underpinnings of civilization. The building was meant to convey stability, a much-needed quality, as fly-by-night operators running off with depositors' money were common in those days. Bank of America is said to have spearheaded the growth of satellite branches in the US.

Copyright: www.jaxhistory.com

In the 20th century, with the advent of the telephone, traditional banking was extended slightly. Unlike before, you did not need to go to the bank for certain kinds of transactions. You could call and enquire about the status of a transaction, or even initiate certain transactions. Still, if you needed cash, you went to the bank.

Credit cards changed the situation quite drastically, starting in the '60s. You could charge the card for various purchases. In a way, it created an alternative to traditional cash: you are not using cash but, in a fashion, a letter of credit that the bank gave you in its place.

ATMs changed even that. You can get cash when you need it: in a mall, in a supermarket, at a train station, even in a casino. ATMs truly untethered us from the branch office.

Internet: how it changed banking

Considering that we can now carry on transactions without a branch office, do we really need branch offices at all? That is the natural question to ask, looking at the trends.

As soon as the internet became reliable, traditional banks took a particular approach. They did not see it as a replacement for existing channels, but as yet another channel for serving customers. They created websites, exposing their online transactional and query systems to consumers. As technology, and the adoption of technology, improved, they improved the websites. They even added mobile apps.


Today, a stated policy of innovation in banking might go something like:

  1. Enabling users to conduct a lot more transactions on the website.
  2. Enabling mobile users (that means mobile apps) to conduct transactions.
  3. Offering a lot of analytical tools: transaction analysis, planning.
  4. Gamification to get users to behave in certain ways, for instance improving savings rates or planning properly.
  5. Adding new products such as group banking.

In most situations, banks see these efforts as augmenting their traditional channels. In fact, the biggest effort these days is reconciling the different channels. Integration of data (for example, seeing the same balance in the iPhone app and at the ATM) and integration of processes (for example, starting a wire transfer online and finishing it at the branch) are among the challenges in this channel-unification effort.

Modern banks have taken a different route. Since they never established branch offices, they bypass that infrastructure and make it a virtue. They offer better interest rates, more usable applications, and better customer service. For example, check http://www.bankrate.com/compare-rates.aspx to see the best rates; they are offered by banks with no local branches. Bank Simple, which aims to offer superior technology service, gained more than $1B in deposits within a year of opening, without any track record.


[Simple.com’s mobile application].

Surprisingly, a bank's ability to attract customers is directly proportional to the number of branch offices it has in the neighborhood. [See: http://www.economist.com/node/21554746]. However, with changing demographics, wider adoption of technology, and pressure from other industries, the situation is changing.

Web 2.0: how it will change banking

Whether banks view internet applications as just another channel or as the primary channel, the focus has always been on improving their own applications: websites, mobile applications, internal applications. Yet the biggest financial innovation of the early internet, PayPal, does none of that.

[Using PayPal’s payment services: A workflow from PayPal developer site].

Technology-wise, PayPal succeeded by taking the ball to where the game is, instead of insisting that people come to its playground. It integrated into many online storefronts. It is almost as if it set up ATMs all over the internet, right at the moment of purchase.

When we look at other industries, we see the same trend. Instead of assuming the burden of developing every application consumers want, they let others develop the apps. With extreme segmentation, they allow multiple groups to develop for and serve whichever segments those groups seek. In fact, several companies use APIs as a way to increase awareness among internal departments, external partners, and potential employees. They embrace this to such an extent that they even hold hackathons to create apps.

In the mid-'90s, I read a paper called "It's bits, stupid," a take-off on Clinton's "It's the economy, stupid." The point was that telephone companies controlled telephone applications from beginning to end. Want to introduce three-way calling? You need to change the switch code, change the telephone handsets, and so on. Want call hunting? Again, you need to change code in the switch.

Compare that with the internet, which was only interested in pushing bits; building the actual apps was left to the ecosystem. The web, VOIP, Google Hangouts: all these resulted from that openness. To think that SS7 could have been TCP/IP, or could have assumed the same openness as TCP/IP, is unimaginable these days.

In fact, even in the staid old world of telephony, one of the most successful companies at creating an ecosystem is Twilio. Using its APIs, people have crafted applications ranging from customer-service apps to SMS apps to helpdesk apps.

[Call analysis built on top of Twilio's APIs. Copyright: Twilio.]

If banks are to embrace this way of participating in a larger ecosystem, they need to change the way they develop applications. They could take cues from successful companies like Twitter and Facebook. Twitter built its entire business through APIs, allowing users to share stories and comment from within other applications; so did Facebook. Let us see how companies are embracing this philosophy of separating core APIs from apps.

API economy

When we look at companies that are successful at fostering an ecosystem where others can participate in developing applications, we find the following:

  1. They make it easy for others to use the API’s.
  2. The standard, routine, core portion of the logic is managed by the company; customization and specialization are delegated to the ecosystem.
  3. They allow the users to integrate into their workflows and ways of working.

Even if companies are not interested in exposing APIs to the general public, they are interested in going this route for an internal audience at least. For one thing, in several large companies different groups behave as perfect strangers, so all the standard techniques for getting developers to adopt your platform and APIs apply. For another, the technical and engineering advantages increasingly favor this approach.

[How Netflix develops even internal apps using REST API’s. Copyright: Netflix].

We can analyze the API economy through two sets of trends:

Banking trends

For banks, APIs offer an interesting mix of privacy, convenience, security, and trust. For instance, PayPal offers privacy (the merchant need not know my credit card number) and trust (I can rely on PayPal to pay out and handle dispute management). Stripe, the most popular with new web companies, offers both, without the merchant carrying the burden of tracking payments or the regulatory compliance of storing card numbers.

The tug-of-war we see these days is between trust and privacy. A lot of people hate PayPal because they do not trust its track record as an arbitrator; that is, it protects privacy even at the expense of trust. Cash, for example, offers a good balance of trust and privacy, but it is not convenient. Bitcoin offers near-perfect anonymity, with somewhat less trust. Banks offer a great deal of trust, but somewhat less anonymity.

[Does popularity = trust? At least in Bitcoin case, it seems to be so.]

The current generation is losing its trust in governments. With the rise of citizen journalism, governments are seen as cynical at best and corrupt at worst. Banks, aligned with governments through fiscal policy, are tainted by the same guilt. While current business does not suffer, and even future commercial and high-net-worth business may not suffer, individuals may eventually find alternatives to banking.

Hopefully, with the right APIs, banks will relinquish some of the power for which they are blamed. If all I am doing is facilitating a payment, then I cannot be held responsible for the application built on it, correct? While the laws catch up to the creative frenzy of the internet, banks can focus on providing safe, proven, trusted, and secure services.

Incidentally, banks already offer APIs, whether in proper technical form or not. They work with tax-prep software to supply tax details. They work with aggregators like mint.com, sigfig.com, and yodlee.com, which pull user account details for analytics. Most of these aggregators built their own solutions to get account details from the banks, and a lot of those solutions are brittle for lack of support from the banks.

Mint.com example

[Mint.com pulled information from two accounts here, E*Trade and Fidelity, and shows the analysis.]

Technical trends

Loosely speaking, APIs are SOA made easy for app development. Most modern APIs are simply JSON over HTTP. Typically, they are used directly from the web by:

  • including the JS library
  • calling the API (over HTTP)
  • parsing and displaying the result (sometimes the JS library includes standard display components as well)

For instance, consider this API for Stripe, a payment company:

<form action="" method="POST">
  <script src="https://checkout.stripe.com/v2/checkout.js" class="stripe-button"
          data-key="YOUR_PUBLISHABLE_KEY"
          data-name="Demo Site"
          data-description="2 widgets ($20.00)"
          data-amount="2000">
  </script>
</form>

Here, we included the Stripe checkout.js library and passed all the needed information with that call. The result should look like this:


In this scenario, the credit card number doesn't even touch the local system. That means PCI compliance does not apply to this site; the credit card information is handled entirely by Stripe.

Architecturally, applications are converging to this broad pattern:


In this picture, the backend services are exposed via APIs. With the rise of HTML5 and front-end MVC, the architecture will look like this:


What it means is this: the APIs can be consumed directly by a browser-based application. We do not really need server-side page creation at all. For instance, I could develop a static shopping-mall application, with the ability to track users, send mail, take payments, and integrate with a warehouse, all from within the browser, without writing any server-side code!
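The JSON-over-HTTP pattern is small enough to sketch end to end. This toy example, in Python for brevity, plays both roles: a hypothetical /balance endpoint, and a client that calls it and parses the result, which is exactly what a browser app does with its HTTP calls (the endpoint, fields, and values are all made up):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class BalanceAPI(BaseHTTPRequestHandler):
    def do_GET(self):
        # A hypothetical banking resource; a real API adds auth, versioning, etc.
        body = json.dumps({"account": "demo", "balance": 42.50}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), BalanceAPI)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "app" side: call the API over HTTP and parse the JSON result.
with urlopen(f"http://127.0.0.1:{server.server_port}/balance") as resp:
    data = json.loads(resp.read())
print(data["balance"])  # 42.5

server.shutdown()
```

Everything the client needs is in the JSON payload; no server-side page rendering is involved.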

This paradigm is becoming so successful that several companies now cater to developing, documenting, managing, and delivering APIs:

  1. Apigee: an API management and strategy company. They have raised close to $107 million so far. Their strategy focuses especially on mobile application development on top of APIs.
  2. Mashery: competition to Apigee. They only (!) raised $35 million, but they have been at this game far longer.
  3. Layer 7: they are extending their SOA governance to API management and governance.
  4. Apiary: this company offers services to collaboratively design and develop APIs. They generate documentation and test services from the API description. They have a nice site, http://apiblueprint.org/, that describes API development and offers several services free.
  5. Apiphany: acquired by Microsoft, this company will serve API management within the Azure family.

There are several other companies that have entered this already crowded market. If history is any indication, eventually, the technologies, tools, and skills that these companies are developing will become available for enterprises at competitive prices.

Other industries: how they embrace APIs

These API management companies provide only a limited perspective on API development. To truly embrace API-based technologies and solution design, we should look at the current generation of technology companies. The website http://leanstack.io/ describes how cutting-edge technology solutions are built using APIs offered by other companies. For instance, the highly successful Pinterest uses the following services:


As you can see, several of these cloud services are available as APIs to integrate into applications. Google Analytics lets apps track users. Qubole provides big data services. SendGrid lets apps send mail.

In the current crop of companies, there are several services cheap enough and modern enough for banks to integrate into their applications. They can reduce the effort of developing comprehensive solutions and increase customer satisfaction. For example, RightSignature offers an easy way to get documents signed, with integration via APIs. HubSpot provides APIs for its inbound marketing services. Qualaroo lets you design, target, and host surveys for your users easily. Spnnakr lets you offer segmented pricing.


Banking is evolving. By focusing on its essential services, it can foster new innovations from the community of users and companies. Technology today embraces APIs as the way to integrate services from different providers into new consumer applications. Banks may not be able to create such an ecosystem by themselves, but they can participate in existing ones. By building the right technology support via APIs, banks can offer solutions that meet the needs of a diverse audience with differing demands on privacy, convenience, security, and trust.

Oct 21, 2013

This post has nothing to do with whether "Obamacare" is good or bad. It is only a discussion of the technology stack and its details.

At $634M, it is one of the costlier government projects. At launch, it failed for many users. Even now, an estimated 5M lines of code must change to fix the system. What is going on?

Viewed from one angle, the problem is simple: let users discover and onboard to a particular plan, and cater to a large number of users. Since the users have no other options, you can dictate terms (you could say they need to download a specific browser version to use the site). Looks easy enough.

On the other hand, the problem is complex. There are any number of plans, several different exchanges, and complex eligibility criteria. There are many agencies, databases, and vendors involved, and the integration is bound to be complex. So, overall, it is, well, complex.

To top it off, it is overseen by a government agency. Such agencies are good at sticking to rules, procurement, checklists, and so on. If they check for the wrong things, the site can meet all the requirements and still fail.

Tech stack details:

The tech stack is modern. They rely on JavaScript in the browser.

Performance issues

Summary: they are making a lot of rookie mistakes in optimizing the page for performance. With minimal effort, they could take care of most of the issues.

  1. They use 62 JS files on each page. They need fewer files, minified, to cut round trips. With 62 files, and without expires headers at that, we are looking at about 15 round trips, which means around 5 seconds of loading time by itself (assuming 0.3 s per round trip, including processing).
  2. The page is heavy! The JavaScript is 2 MB, the CSS is 0.5 MB, and images are 0.25 MB. Overall, the site downloads 2.75 MB just to start working.
  3. For a returning user, the situation is only marginally better. The browser still makes 85 round trips (the number of components), though it only needs to download 0.5 MB.

If experienced folks had developed this site, they could easily reduce the load time to under 1 second, a five-fold improvement.
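A back-of-the-envelope model of the numbers above makes the fix obvious (same assumptions: 62 JS files, roughly 4 concurrent connections, 0.3 s per round trip; the 3-file bundle target is an illustrative choice):

```python
import math

# First-visit load model: 62 JS files, ~4 concurrent connections,
# 0.3 s per round trip.
js_files, concurrent, rtt = 62, 4, 0.3

round_trips = math.ceil(js_files / concurrent)   # 16 (the post rounds to ~15)
load_time = round_trips * rtt                    # ~4.8 s on JS files alone

# Bundle and minify down to, say, 3 files:
bundled_time = math.ceil(3 / concurrent) * rtt   # one round trip, 0.3 s
print(f"{load_time:.1f} s vs {bundled_time:.1f} s")
```

Bundling alone removes most of the round trips before any server-side work is needed.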

Code quality issues

First the tech stack details.

  1. Bootstrap
  2. JQuery
  3. Backbone
  4. Underscore
  5. JSON
  6. JQuery UI (Why?)
  7. Parts of Script.aculo.us

Their stack is not to blame, and they make heavy use of APIs. They use Bootstrap (version 2.3), jQuery, Backbone, Underscore, and JSON. I think Backbone is too complex a technology (I am partial to Angular, and to a lot of other modern JS frameworks), while the rest are simple enough. In the hands of novice developers, Backbone can get very convoluted. In fact, the same can be said of JS.

Let us take a look at code quality (from what we can see in the JS files):

  1. Backbone is complex for this kind of app. For average developers especially, it tends to be difficult to use.
  2. Check out the file https://www.healthcare.gov/marketplace/global/en_US/registration.js to see how the code is laid out. They are not doing template-driven or metadata-driven development; there is too much hard-coded stuff. And look for "lorem ipsum" while you are at it (it indicates poor test coverage, or unnecessary code). (This file may be auto-generated...)
  3. Too many technologies: this suggests sub-contracting with no architectural oversight. For instance, if you are using Bootstrap, you might as well stick to it instead of pulling in jQuery UI. Also, components like the carousel are built into Bootstrap; why have separate ones?
  4. A lot of the JS seems to have been generated by MDA, which may account for some of the mess. Check out the MDA tool: http://www.andromda.org/index.html

At this point, I don't have much information. The GitHub pages seem to have some code, but it may not be what is used here. The GitHub version uses a static HTML generator (a good strategy), but that is not what the current website does. (The GitHub code seems to have been removed now.)

Overall, it looks like a high concept with reasonable execution, bad integration, and terrible architectural and design oversight.

I will return with more, when I get some time to analyze the site.

 Posted at 9:58 am
Aug 21, 2013

We don't talk numbers enough in the software business. You walk into a store to buy a gallon of milk and you know the price. You ask a software consultant how much a solution costs and he says, "it depends." You cannot even get him to state his assumptions and give a reasonable price.

At IITM, I heard a story about a German professor and his style of evaluating exam papers. In one question, a student made a mistake in an order-of-magnitude calculation; his answer was accurate except for a trailing zero. The student got a "0".

Now, in general, if a student understood what he needed to do and applied the formula, a simple calculation mistake would still earn partial credit in other courses, so getting zero marks was a shocker. But what the German professor wanted was for students to develop a good feel for the answer. For instance, if I calculate the weight (er, mass) of a person and it comes to 700 kg, that should ring some warning bells, right?

In my job, a lack of basic feel for numbers unfortunately holds people back. This ignorance shows up in places like sizing hardware, designing solutions, and creating budgets.

My inspiration for this post is the wonderful book "Programming Pearls," which covers exactly these kinds of back-of-the-envelope calculations.

Network Latencies

Suppose you are hosting a web application in San Francisco and your users are in Delhi. What is the minimum latency you can expect?

The distance is around 7,500 miles, or 12,000 km. Light travels 186,000 miles per second, so a straight shot takes around 0.04 seconds. But light travels about 30% slower in fiber than in vacuum, and the signal does not travel in a straight line; it passes through many segments, relays, and other elements that delay it. All in all, say the effective one-way time is 0.15 seconds. Any acknowledgement requires a round trip, which makes it 0.3 seconds. A simple web page has about 15 components (images, fonts, CSS, and JS). That means around 4 round trips (most browsers fetch four components at a time).

So, on speed-of-light grounds alone, the network is going to take up 1.2 seconds. That is the number you should add to whatever you measure on your laptop.
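The arithmetic above fits in a few lines (same assumptions: 0.15 s effective one-way time, 15 components, four concurrent fetches):

```python
import math

MILES = 7500
LIGHT_MPS = 186_000                     # miles per second, in vacuum

one_way_vacuum = MILES / LIGHT_MPS      # ~0.04 s, straight line in vacuum
one_way_effective = 0.15                # fiber slowdown, routing, relays (estimate above)
round_trip = 2 * one_way_effective      # 0.3 s

components = 15                         # images, fonts, CSS, JS
concurrent = 4                          # typical browser connection limit
round_trips = math.ceil(components / concurrent)   # 4

total = round_trips * round_trip        # ~1.2 s network floor
print(f"{total:.1f} s")
```

No amount of server tuning removes this floor; only fewer round trips (or a closer server) does.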

Developing Java code

I met a seasoned Java developer who has been building systems for years and is well versed in optimization. We were reviewing some code, and I noticed he was optimizing to create fewer objects.

How much time does it take to create an object in Java? How much with initialization from a String? How long does it take to concatenate strings? To append to a string? Or to parse an XML string of 1 KB?

The usual answer from most people is "it depends." Of course it depends on the JVM, the OS, the machine, and many other factors. My response was, "Pick your choice: your laptop, whatever JVM and OS you use, while the system is doing whatever it normally does." It is surprising how many people have no idea about the performance of their favorite language!

In fact, most of us who work with computers should have an idea about the following numbers.

Numbers everyone should know about computers

In a famous talk, Jeff Dean lays out the standard numbers every engineer should know:


These may appear simple enough, but the implications for developing code are enormous. Consider some sample cases:

  1. Suppose you want to design an authentication system for all of Google's users (ignore the security side of the question; think of it only as a lookup problem). How would you do it?
  2. If you want to design a quote server for the stock market, how would you speed it up?
  3. You are developing an e-commerce site for a medium retailer with 10,000 SKUs (stock keeping units). What options would you consider for speeding up the application?
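To make such estimates concrete, here is a sketch using a few commonly cited figures from that talk (approximate, circa 2009, and hardware-dependent), applied to question 3 with an assumed 1 KB per SKU:

```python
# A few of the commonly cited figures from the talk (circa 2009);
# actual values vary with hardware and year.
NS = 1e-9
latency = {
    "main memory reference":              100 * NS,
    "read 1 MB sequentially from memory": 250_000 * NS,
    "disk seek":                          10_000_000 * NS,
    "read 1 MB sequentially from disk":   30_000_000 * NS,
}

# Question 3, back of the envelope: 10,000 SKUs at an assumed 1 KB each is ~10 MB.
catalogue_mb = 10
from_memory = catalogue_mb * latency["read 1 MB sequentially from memory"]
from_disk = latency["disk seek"] + catalogue_mb * latency["read 1 MB sequentially from disk"]
print(f"memory: {from_memory * 1000:.1f} ms, disk: {from_disk * 1000:.1f} ms")
# memory: 2.5 ms, disk: 310.0 ms -- caching the whole catalogue in RAM wins easily
```

Two orders of magnitude between memory and disk is exactly the kind of feel for numbers the German professor wanted.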

Naturally, these kinds of questions lead to other numbers. What is the cost of the solution?

Cost of computing, storage, and network

There are two ways to go about constructing infrastructure: the LexisNexis way or the Google way. Remember that LexisNexis is a search company that lets people search legal, scientific, and specialty data. Its approach is to build robust, fail-safe, humongous machines that serve the need. At the opposite end of the spectrum is Google, which uses white-box machines with stripped-down parts. I suppose it gives a new meaning to the phrase "lean, mean machine." (Incidentally, HDFS and the like take a similar approach.)

Our situation lies more towards Google. Let us look at the price of some machines.

  1. A two-processor, 32-thread blade server with 256 GB of RAM is around $22,000. You can run 32 average virtual machines on this beast; even for high-performance machines, you can run at least 8.
  2. If you go for slightly lower-end machines (because the architecture takes care of robustness), you have other choices. For instance, you can do away with ECC memory, power management, KVM-over-IP, and the like for simpler needs (example: setting up internal POCs and labs). In that case, you can get a 64 GB machine with a 512 GB SSD and 5 TB of storage for around $3,000.

So you have some rough numbers to play with if you are building your own setup. What about the cost of the cloud? There, pricing is usage-based and can get complex. Let us take a 64 GB machine with 8 hyper-threads. If it runs most of the time, what does it cost?

Amazon's cost tends to be on the high side. You pay by the hour, around $1.30 per hour, which is roughly $1,000 per month. If you know your usage patterns well, you may optimize it down to, say, $500 per month.

Or, you could use one of my favorite hosting sites: OVH. There, a server like the above, would cost around $150 per month. Most of the others fall somewhere in between.

Now, do a small experiment in understanding the costs of the solutions: Say, to create an e-commerce site that keeps the entire catalogue cached in memory, what is the cost of the solution?

To truly understand the cost of a solution, you also need to factor in people cost: the effort to develop the solution, then operate and support it.

TCO: Total cost of ownership

To understand the cost of operations and support, here is a rule of thumb: if you are automating tasks and reducing human intervention, the cost of design and development can range from $50 to $250 per hour. The variation is due to location, system complexity, effort-estimation variation, technology choices, and a few other details.

A few details worth noting:

  1. You can get a good idea of salary ranges and skills by looking at sites like dice.salary.com. Try indeed.com for a more detailed look, based on the history of postings.
  2. To get the loaded labor cost of the solution, multiply the base cost per hour by 2. For instance, if you need to pay a $100K salary, the base rate is $50 per hour (2,000 hours per year), so the loaded cost is around $100 per hour.
  3. By choosing a good offshore partner, you can get the ops cost as low as $15 to $30 per hour.
  4. The cost of good design and architecture – as they say, is priceless!
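The loaded-cost rule of thumb above reduces to simple arithmetic. A minimal sketch; the 2,000 hours per year and the 2x loading factor come straight from the text, while the salaries are just examples:

```python
# Loaded labor cost, per the rule of thumb in the text.
HOURS_PER_YEAR = 2000   # standard working hours assumed above
LOADING_FACTOR = 2      # rule of thumb: loaded cost = 2x base hourly rate

def loaded_hourly_cost(annual_salary):
    """Fully loaded hourly cost of an employee (benefits, overhead, etc.)."""
    base_rate = annual_salary / HOURS_PER_YEAR
    return base_rate * LOADING_FACTOR

# A $100K salary -> $50/hr base rate -> $100/hr loaded cost.
print(loaded_hourly_cost(100_000))  # 100.0
```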

As for technology choices: which technology gets you a low TCO? What kind of numbers can you use in your back-of-the-envelope calculations?

That will be the topic for some other day.

 Posted by at 4:51 pm
Aug 082013

Suppose you have developed a website that you want to show to others. You have it running on your laptop. What choices do you have?

You can set up a LiveMeeting: this is what we seem to do in most companies.

  • That means you are going to drive the demo. Unless you are physically there, the audience cannot access it.
  • If the audience wants to try out the website, you need to hand over control.

This is not a problem with LiveMeeting, per se. Other paid tools like WebEx, and free ones like join.me or TeamViewer, suffer from the same issues.

After all, your goal is not just to demo the site yourself, but to let others play with it.

Here is one alternative, using ngrok:


The beauty of this software is that it needs no installation. It is just one executable; you can even carry it on a thumb drive! It is around 1.7 MB zipped, and 6 MB as an executable.

The next step is to run it. Suppose I have a server running on port 8000. Here is the way I expose it to the world:


Now, what you can see is this: your port 8000 is proxied through the open web under the name http://65dc9ab7.ngrok.com.

Before going there, here is how the website looks on the local page:


Notice that the host is localhost. If your friends try that on their machines, they will not see these files.

Now, let us go to http://65dc9ab7.ngrok.com, the address ngrok gave us, from some other machine and see:


See! Identical!!

Here are some other fun things to do:

Suppose your users are playing around on the site. You can see all their interactions from http://localhost:4040/

Let us see how that looks. I am going to access 1.html, 2.html and a1.html via the ngrok proxy.


That is, you get to see who is accessing your site, where they are coming from, and what is happening. You can replay the entire web interaction too! This can be a simple, yet powerful, debugging tool.

Now, let us look at other use cases.

Serving files from your machine

Suppose you have a bunch of files that you want to share with others. You can do the following:

  1. Install Python (3.3.x will do nicely).
  2. Run the following command from the folder you want to share files from: python -m http.server. You can optionally specify a port number as well.
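If you prefer to start that file server from code rather than the command line (for instance, to let the OS pick a free port for ngrok to proxy), Python's standard library makes it a few lines. A minimal sketch, requiring Python 3.7+ for the `directory` argument:

```python
import threading
from functools import partial
from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler

def serve_directory(directory, port=0):
    """Serve `directory` over HTTP in a background thread.

    port=0 asks the OS for a free port. Returns (server, actual_port).
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]

if __name__ == "__main__":
    server, port = serve_directory(".")
    print(f"Serving on http://127.0.0.1:{port}/ (point ngrok at port {port})")
```

Call `server.shutdown()` when you are done sharing.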

Logging into your machine from outside

Now, suppose you got a Unix machine behind the firewall and you want to be able to access it. Here is what you can do.

  1. Install shell in a box: https://code.google.com/p/shellinabox/
  2. Run the shell – by default on port 4200.
  3. Now, run ngrok to proxy 4200.
  4. Presto! Instant access from the internet to login to your Unix box.

Caution: Your acceptable user policies may vary. Please verify with your system admins to find out if you are allowed to run any of these programs.

 Posted by at 12:04 pm
Jul 282013

I used to run 4 VMs on my machine for my application testing. Once I discovered containers, I now run close to 40 or 50 on the very same machine. And, as opposed to the 5 minutes it takes to start a VM, I can start a container in under a second. Would you like to know how?

Virtualization: Using virtual machines

I have been using computers since 1982 or so. I came to Unix around 1984 and used it constantly until 1997. After that, I had to use Windows, because Linux did not have the tools needed for regular office work. Still, because I needed Linux, I turned to an early-stage company called VMware.

Now, virtualization is a multi-billion dollar industry. Just about every datacenter has virtualized its infrastructure: CPUs, storage, and network. To support machine-level virtualization, technology developers are redefining the application stacks as well.

Let us see how virtualization normally looks.


This kind of virtualization offers a lot of advantages:

  • You get your own machine, where you can install your own OS. While this level of indirection is costly, over time several advances have helped:
    • x86/x64 hardware-level virtualization: you can run a guest OS at near-native speeds.
    • Software such as VMware Tools: these tools proxy the guest OS requests (I/O, network) directly to the host OS.
  • You get root privileges, so you can install whatever you want, and offer services just like a physical machine would.
  • Backup, restore, migrate, and other management facilities are easy to handle with standard tools. There are tools to do the same on physical machines as well, but they are expensive, and not so easily used in a self-service model.

But, let us look at the disadvantages also:

  • It is expensive: if you just want to run a single application, it looks ridiculous to run an entire OS for it. We made tremendous progress in developing multi-user OSes; why are we taking a step back toward single-user OSes?
  • It is difficult to manage: it may be easier than managing a physical machine, but compared to running an app, it is a lot more complex. Imagine: you not only need to run the application, but the OS as well.

To put it differently, let us take each of the advantages and see how they are meaningless in a lot of situations:

  • What if we don’t need to run different OSes? The original reason for running different OSes was to test apps on them (Windows came in many different flavors, one for each language, and with different patch levels).
    • Now, client apps run on the web, a uniform platform. So there is no need to test web apps on multiple OSes.
    • Server apps can and do depend not on the OS, but on specific packages. For instance, an application may run on any version of Linux, as long as Python 2.7 is present with some specific packages.

Virtualization: Multi-user operating systems

We have a perfect system to run such applications: Unix. It is a proven multi-user OS, where each user can run their own apps!


What is wrong with this picture? What if we want to run services? For example, what if I want to run my own mail service? Web service? Without being root, we cannot do that.

While this looks like a big problem, you can solve it easily. It turns out that most of the services you may want to run on your machine (mail, web, ftp, etc.) can easily be set up as virtual services on the same machine. For instance, Apache can be set up to serve many named virtual hosts. If we can give each user control over their own virtual services, we are all set.

Virtualization: At application level

There are several companies that did exactly that. In the early days of the web, this was how they provided users their own services on a machine. In effect, users were sharing their servers: mail, web, ftp, etc. Even today, most static web hosting and prepackaged web apps run that way. There is even a popular web application, Webmin (and Virtualmin), that lets you manage these virtual services.


What is wrong with this picture? For the most part, for fixed needs and a fixed set of services, it works fine. Where it breaks down is the following:

  • No resource limit enforcement: since we are virtualizing at the application level, we have to depend on the good graces of the application to enforce resource limits. If your neighbor subscribes to a lot of mailing lists, your mail response slows down. If you hog the CPU of the web server, your neighbors suffer.
  • Difficulty of billing: since metering is difficult, we can only do flat billing. The pricing does not depend on resource consumption.
  • Unsupported apps: if the application you are interested in does not support this kind of virtualization, you cannot get the service. Of course, you have the choice of running the application in your own user space, with all the restrictions that come with it (example: no access to some ranges of ports).
  • Lack of security: I can see what apps all the other users are running! Even if Unix itself is secure, not all apps may be. I may be able to peek into temp files, or even into the memory of other apps.

So, is there another option? Can we provide a level of control where users can run their own services?

We can do that if the OS itself can be virtualized. That is, it should provide complete control to the users, without the costs of a VM. Can it be done?

Virtualization: At OS level (Containers)

In the early days, there were VPSes (virtual private servers), which provided a limited amount of control. Over the years, this kind of OS support has become more sophisticated. In fact, there are several options now that elevate the virtual private server into a container, a very lightweight alternative to VMs.


If you want to see the results of such a setup, please see:

In the beginning, there was chroot, which creates a “jail” for applications so that they cannot see outside a given folder and its subfolders; it remains a popular technique for sandboxing applications. Later, features like cgroups were added to the Linux kernel to limit, account for, and isolate the resource usage of process groups. That is, we can designate a sandbox and its subprocesses as a process group and manage them that way. Many more improvements, which we will describe below, make full-scale virtualization possible.

Now, there are several popular choices for running these kinds of sandboxes or containers: Solaris-based SmartOS (Zones), FreeBSD jails, Linux’s LXC and VServer, commercial offerings like Virtuozzo, and so on. Remember that the container cannot change the kernel of the host OS. It can, however, overwrite any libraries (see the discussion of the union file system later).

In the Linux-based open source world, two are gaining in popularity: OpenShift and Docker. It is the latter that I am fascinated with. Ever since dotCloud open-sourced it, there has been a lot of enthusiasm about the project. We are seeing a lot of tools, usability enhancements, and special-purpose containers.

I am a happy user of Docker. Most of what you want to know about Docker can be found at docker.io. I encourage you to play with it – all you need is a Linux machine (even a VM will do).

Technical details: How containers work

Warning: this is somewhat technical; unless you are familiar with the way an OS works, you may not find it interesting. Here are the core features of the technology (most of this information is taken from “PaaS Under the Hood”, by dotCloud).


Namespaces isolate the resources of processes from each other (pid, net, ipc, mnt, and uts).

  • pid isolation means that processes residing in a namespace do not even see other processes.
  • The net namespace means that each container can bind to whatever port it wishes. That means port 80 is available to all containers! Of course, to make it accessible from outside, we need to do a little mapping (more on that later). Naturally, each container can have its own routing table and iptables configuration.
  • ipc isolation means that processes in a namespace do not even see other processes for IPC. This increases security and privacy.
  • mnt isolation is something like chroot. In a namespace, we can have completely independent mount points. The processes see only that file system.
  • The uts namespace lets each container have its own hostname.

Control groups (cgroups)

Control groups, originally contributed by Google, let us manage resource allocation for groups of processes. We can do accounting and resource limiting at the group level. We can set the amount of RAM, swap space, cache, CPU, etc. for each group. We can also bind a group to a core, a feature useful in multicore systems! Naturally, we can also limit the number of I/O operations, or bytes read and written.

If we map a container to a namespace and the namespace to a control group, we are all set in terms of isolation and resource management.
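Real cgroup limits are configured through the kernel's cgroup filesystem and need root, so here is a sketch of the same idea using the classic per-process setrlimit call, which Python exposes in its resource module (Unix only). This is a simpler cousin of cgroups, not cgroups themselves, and the 1 GB cap is arbitrary:

```python
import os
import resource

def run_with_memory_cap(limit_bytes):
    """Fork a child, cap its address space, and try to allocate past the cap.

    Returns the child's exit code: 42 if the limit was enforced, 0 if not.
    setrlimit is per-process; cgroups apply the same idea to whole groups.
    """
    pid = os.fork()
    if pid == 0:  # child process
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
        try:
            _ = bytearray(2 * limit_bytes)  # deliberately exceeds the cap
            os._exit(0)   # allocation unexpectedly succeeded
        except MemoryError:
            os._exit(42)  # the kernel refused the allocation: limit enforced
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

print(run_with_memory_cap(1 << 30))  # 42 on Linux: the 2 GB allocation is refused
```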

AUFS (Another Union File System)

Imagine the scenario: you are running in a container. You want to use the base OS facilities (kernel, libraries), except for one package, which you want to upgrade. How do you deal with that?

In a layered file system, you create only the files you want to change. These files supersede the files in the base file system. Naturally, other containers see only the base file system, and they too can selectively overwrite files in their own file system space. All this looks and feels natural: everybody is under the illusion of owning the file system completely.
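The lookup rule of a union file system ("check the container's writable layer first, then fall back to the base image") behaves just like Python's collections.ChainMap. Here is a toy model, with made-up paths and package names standing in for real files:

```python
from collections import ChainMap

# The read-only base image, shared by every container.
base_image = {
    "/bin/sh": "busybox v1.0",
    "/usr/lib/libssl.so": "openssl 1.0.0",
}

# Each container gets its own thin writable layer on top of the base.
# ChainMap looks up in the first dict, then falls through to the second;
# writes always land in the first dict, mimicking a copy-on-write layer.
container_a = ChainMap({}, base_image)
container_b = ChainMap({}, base_image)

# Container A "upgrades" one package: the write lands in A's own layer...
container_a["/usr/lib/libssl.so"] = "openssl 1.0.1 (patched)"

# ...superseding the base copy for A, while B still sees the original.
print(container_a["/usr/lib/libssl.so"])  # openssl 1.0.1 (patched)
print(container_b["/usr/lib/libssl.so"])  # openssl 1.0.0
print(container_a["/bin/sh"])             # busybox v1.0 (falls through to base)
```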


The benefits are several: storage savings, fast deployments, fast backups, better memory usage, easier upgrades, easy standardization, and ultimate control.


grsecurity, a large set of security patches rolled into one, offers additional hardening:

  • Protection against buffer overflow attacks
  • Separation of executable code and writable parts of the code
  • Randomizing the address space
  • Auditing suspicious activity

While none of these is revolutionary, taken together these steps offer the required security between containers.

Distributed routing

Let us suppose each person runs their own Apache on port 80. How do they expose that service to outsiders? Remember that with VMs, you either get your own IP, or you hide behind a NAT. If you have your own IP, you get to control your own ports.

In the world of containers, this kind of port-level magic is a bit more complex. In the end, though, you can set up a bridge between the OS and the container so that the container can share the same network interface, using a different IP (perhaps granted by a DHCP server, or set manually).

A more complex case: if you are running hundreds of containers, is there a way to offer better throughput for service requests, specifically for web applications? That question can be handled by using standard HTTP routers (like nginx).
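At its heart, the routing layer described above is a lookup from an incoming hostname to one of several container ports, plus some load spreading. Here is a toy sketch of what an HTTP router such as nginx does for us; the hostnames and ports are made up:

```python
import itertools

# Hypothetical routing table: each site is backed by several containers.
# Inside its namespace each container bound "its own" port 80, which the
# bridge mapped to a distinct host port.
routes = {
    "shop.example.com": [8001, 8002, 8003],
    "blog.example.com": [8004],
}

# One round-robin cycle per site spreads requests across its containers.
_backends = {host: itertools.cycle(ports) for host, ports in routes.items()}

def pick_backend(host):
    """Return the host port of the next container that should serve `host`."""
    return next(_backends[host])

print(pick_backend("shop.example.com"))  # 8001
print(pick_backend("shop.example.com"))  # 8002
print(pick_backend("blog.example.com"))  # 8004
```

A real router also proxies the request body, rewrites headers, and retries on dead backends; this only shows the dispatch decision.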


If you have followed me so far, you have learnt:

  1. What containers are
  2. How they logically relate to VMs
  3. How they logically extend Unix
  4. What the choices are
  5. What the technical underpinnings behind containers are
  6. How to get started with containers

Go forth and practice!

Jan 092013

A famous saying goes that you should hire people who are “smart” and “get things done”. I cannot tell you whether people get things done because of innate capability or intelligence. But, over time, I realized that most intelligent people I come across have similar qualities. I struggled to characterize them until I came across this list in the book “Gödel, Escher, Bach: An Eternal Golden Braid”. Quoting verbatim, here they are:

  1. to respond to situations very flexibly;
  2. to take advantage of fortuitous circumstances;
  3. to make sense out of ambiguous or contradictory messages;
  4. to recognize the relative importance of different elements of a situation;
  5. to find similarities between situations despite differences which may separate them;
  6. to draw distinctions between situations despite similarities which may link them;
  7. to synthesize new concepts by taking old concepts and putting them together in new ways;
  8. to come up with ideas which are novel.


I must say that this list is very satisfying. After coming across it, I started using it in my daily life. Am I exhibiting these qualities? Am I evaluating people based on these qualities? Do they correlate with the other ways I evaluate the effectiveness of people?

I use this list in multiple ways. I use it to evaluate and improve myself. I use it to evaluate people that potentially work with me. I use it in my activities as an architect, as a technical evangelist, as a coder, as a pre-sales principal, and as a strategist.

(©: New Yorker)

To respond to situations very flexibly

I find that several people who excelled at school fall short later in life. People prepare well for well-defined, idealized scenarios. When the situation differs from what they learnt, they cannot respond flexibly.

For instance, let us see how it works in a presales situation. You are prepared for a standard scenario that a customer might face – say, how automation of processes saves money. You go to a customer for whom automation costs a lot more than the savings that would accrue. If you forget why you were proposing the solution (to save costs), you end up needlessly pushing automation, when all they want is a more efficient way to execute their processes.

In my experience, flexibility demonstrates an ability to understand the big picture and the context of the problem, which comes only with a deep understanding of the subject or solution you are presenting. I find that the following helps me greatly:

  • Understand the history: history offers a way of understanding subjects and how they came about. If we don’t understand why people are using certain tools or methods, we may not be able to offer a better way. For instance, the history of operating systems explains why we treated files differently from network connections. Then we might start thinking: why not treat them the same way? We start seeing parallels from history and apply them as the situation demands.
  • Have a core set of logic tools: while this rule may appear generic, I find that most of the flexibility in our thinking comes from understanding what is essential and what is not. My set of tools comes from model theory, in particular higher-order models. I start seeing things from the perspective of completeness and consistency, even when the situation presents itself in a new guise.

To take advantage of fortuitous circumstances

I am not sure I agree that this is an essential quality of intelligence. But still, most successful people have the ability to see opportunities that others don’t. I think it takes courage, self-confidence, desire, and a deep sense of conviction, none of which are necessarily attributes of intelligence.

I am also fairly sure that I am not so good in this area. I do not know how one can “improve” here. I suppose this is some innate capability people have, either by nature or nurture.

To make sense out of ambiguous or contradictory messages

I read science fiction stories where species communicate with precision the exact emotion or facts. I read about conlangs (constructed languages) that are incredibly precise. But in reality, we deal with a lot of ambiguity, imprecision, and sometimes deliberate obfuscation.

Most of us deal with this ambiguity daily. We find an organization pursuing multiple unclear and conflicting paths. We find people making contradictory statements. We may not see consistency between word and action.

It may be fodder for sitcoms or science fiction humor: the fish-out-of-water story of foreigners or extraterrestrials not possessing the verbal intelligence of the natives. The consequences for an organization are more disastrous: infighting, multiple directions, and inaction. It forces leaders to constantly explain to people what to do.

While it is easy to dismiss this as an innate capability, I think there is a way to develop it, in particular through the following skills, which strongly correlate with it:

  • Reading widely helps: in particular, I think reading good fiction, poetry, and even nonfiction helps us understand ambiguity and imprecision in language. Reading the classics has an advantage, as they have been interpreted widely, from different perspectives.
  • Writing well helps: the right kind of writing can cut through ambiguity. When I don’t understand a subject well, I try writing about it, just as a way of clarifying it for myself.
  • Understanding logic helps: in particular, I go back to my first discipline, model theory and higher-order logic, to understand a model in terms of consistency and completeness.

(©: New Yorker)

To recognize the relative importance of different elements of a situation

One of the big milestones in growing up is understanding cause and effect. That feedback loop helps us know the consequences of our actions. The same knowledge, when applied, gives us a way of assessing the importance of different elements of a situation.

Fortunately, this is easy enough to deal with. At the risk of generalizing and simplifying, the best way is through metrics and goals. If you know the end goal, and the impact of each task on your ability to reach it, you know how to prioritize. For a full discussion of the topic, see: http://bit.ly/J8b3BW.

To find similarities between situations despite differences which may separate them

Abstracting the core essential elements from different situations is a key human trait. The right abstractions can help people understand the core of a situation. For instance, if we want to solve a problem, the right abstraction lets us bring in tools and processes from similar solutions.

Even in presales, customers would love to know how you solved similar problems for others. Everybody knows that an exactly matching problem is difficult to find. Finding a useful abstraction is key to identifying a set of similar problems whose solutions can be applied effectively to the problem at hand.

(©: xkcd.com)

To draw distinctions between situations despite similarities which may link them

The flip side of abstraction is reification, or concretization. Every problem is different. Only when we understand the differences between two problems can we understand what kind of customization this one needs. It may require only a simple change of tactic; still, being aware of that need is the key to looking for the change.

For instance, most solutions transplanted from one culture do not work in another. The cultural context, which makes a situation vastly different, requires a change in tactics and strategy. Without that sensitivity, people misunderstand the situation. (“Iraq is just like Germany after the Second World War – all it needs is democracy.” See how that worked out.)
(“Space-time is like some simple and familiar system which is both intuitively understandable and precisely analogous, and if I were Richard Feynman I’d be able to come up with it.” © xkcd.com)

To synthesize new concepts by taking old concepts and putting them together in new ways

There is a saying: “There is nothing new under the sun.” But then, we see new creations of human ingenuity every year. Most of them are novel applications of existing ideas. In my own field of programming languages, I see ideas from the 80s becoming popular now. When I see CPS (continuation-passing style) in JavaScript, I am reminded of homework assignments from my student days.

Creating new concepts from old ones takes several pieces of the puzzle:

  • A good understanding of history: why the problem is important, what techniques were tried, what was the genesis of the final solution – all of these are important when we need to put together a new solution based on old concepts.
  • A good understanding of how situations have changed: for instance, when we were building main-memory databases in the 90s, general machines had a maximum of 4 GB. These days, I have 64 GB on my desktop machine. Naturally, the assumptions have changed, opening up solutions that were not considered before. For example, the changing economics of printing changed the way books are published.
  • A good understanding of the old concepts: unfortunately, learning, especially in computing, has become more and more about understanding details instead of concepts. Details tie us down to an existing way of thinking. Understanding the core concepts (why, how, and what) lets us put them together in new ways.

Copied from interwebs

To come up with ideas which are novel.

I suppose this is what is classically considered intelligence, and genius. I do not have any suggestions on how one goes about improving in this area. I find that I am a pastiche kind of person: I take a concept and apply it in a different area. Personally, I do not think I have any ideas that can be called truly novel.

Concluding remarks

Looking at the core traits of intelligence, I can only conclude that some of them are hard to acquire. I do not know how one can train oneself to come up with new ideas. But as for the other traits, there is hope that we can practice deliberately to hone those skills. Fortunately, the world is large; there are a lot of ideas coming from different places. All you need is to pick a set of tools that are good enough, master them, and apply them consistently to learn these skills.

Happy practicing!

 Posted by at 10:00 pm
Nov 052012

An interesting article appeared yesterday in the NYTimes, titled “A Capitalist’s Dilemma, Whoever Wins on Tuesday”. While the title appears to be topical about the elections, it is broader, and applicable to the modern times we live in. In fact, it resonates well with what I have been saying all along.

Long-time readers of my blog know that I have been a proponent of innovation along three layers: systems of record, systems of change, and systems of innovation. The first group focuses on efficiencies, the second on better outcomes, and the third on disruptive changes. Clayton Christensen (most famous for his book “The Innovator’s Dilemma”) seems to say something similar, but from an economic perspective.

In his worldview, as a finance professor, he sees three uses for investments:


In fact, this thought process is not new. In classical Marxism, the concept of an inherent contradiction in capitalism alludes to something similar: as capital gets accumulated, it leads to efficiencies, which reduce the need for people, causing a depression in demand, and so on. Modern economists (example: Galbraith) focused on how to break this depression cycle by creating disruptive industries. Fortunately, the US managed to come out of the Depression by empowering innovations.

The current situation is this: there is a lot of capital available. In fact, tons of it. The reason it is not being used is neither regulation (which is what the right-wing economists want us to believe), nor the lack of demand (which is actually true – but stimulating demand cannot be more than a short-term measure). [I may be out of my depth as an economist here – I am only rationalizing and could be entirely wrong, but it makes for an interesting narrative.]

The problem is a systemic issue. See how we measure ROI the same way across all industries. In India, the ROI on real estate has sucked away investment that might otherwise have created a positive effect for real estate itself. Instead, it became a short-term “slash and burn” operation that extracted ROI from real estate at the cost of empowering innovations.

According to Clayton Christensen, the problem is the way we have been measuring financial returns: ROCE (return on capital employed), RONA (return on net assets), and IRR (internal rate of return). Managers can reduce the investment and still show a good ratio, or invest only in quick wins and get good metrics.

So, what is the prescription? According to the professor:

  1. Deal with the abundance of capital by investing in the right skills.
  2. Change the metrics, so that we use the capital for the right tasks.
  3. Change capital gains taxes to promote the right investments.
  4. Change the politics to focus on empowering innovations.

Of course, I can see the following criticism:

  1. The right wing will see it as the government meddling: picking winners and losers and shaping behavior.
  2. The left wing will see it as a violation of the social contract, which is to focus on helping people in need, the poor and the elderly.

Recently, there have been growing calls for investment in infrastructure, however muted they may be. Let us see how it goes after these elections.

 Posted by at 11:18 am
Oct 252012

There was an interesting thread on reddit about money. It is not about investing or the moral values of money, but the fundamental concepts behind money. If you have not read it, I urge you to go and read it. I will wait.

Over the weekend, I read the book “The Ascent of Money” by Niall Ferguson. As a Scot, he has a unique historical perspective that comes from his local history, including Adam Smith. As a person from the erstwhile empire, he understands the role of capital in creating an empire. I did not like the book much, but would still recommend it for some amusing anecdotes, to be complemented with other, more serious reading.

When I see a lot of mainstream news (http://dealbook.nytimes.com/2012/10/22/in-london-nimble-start-ups-offer-alternatives-to-stodgy-banks/), I think there is a chance that banking is ready to be transformed. For a long time, banking has been impervious to change. Sure, there have been periodic busts and booms (remember the S&L crisis? The home mortgage collapse?), but the same banks have come back again, with very minor changes in the way business is done. This is entirely unlike other industries, which have gone through rapid changes (when was the last time you went to a travel agent, or referred to the yellow pages?).

Of course, there have been some changes. The rise of discount brokerages, and PayPal as a payment mechanism, are a few of them.

But all that seems to be changing. While there are no fundamental drivers, the momentum of change seems to have picked up. The three trends I see are:

  1. Make it cheaper
  2. Make it convenient
  3. Disrupt fundamentally

Making it cheaper

A lot of banking services are costly. For example, if I have to wire some money, it costs $35. If I want to send an overnight check, it costs me $20. That is in a highly competitive market like the US, let alone places like Australia.

Of course, they pay me 0.5% interest on the money I keep with them. If I borrow, they lend at 5% or even more.

Any such highly inefficient market lends itself to disruption. Unfortunately, two things are helping financial institutions keep their edge. One is regulation: for good reasons, it is very difficult to start a bank or an insurance company. The second is the large amount of capital needed to start an institution. It takes credibility, history, and deep pockets to start one.

Let us see how this market is being changed, through some examples:

  1. Dwolla.com: payment network disruption. What it does: it facilitates person-to-person or person-to-small-bank payments at a very reasonable cost. It still uses ACH (I haven’t understood the details yet). It costs just 25 cents to receive any amount of money. Sending is always free (as is receiving under $10).
    For more details on how it uses ACH, see: http://www.finovate.com/spring12vid/dwolla.html
  2. Lending Club and Prosper: they make P2P lending possible. Compared to banks, it is riskier, but gets you around a 10% rate. Historically, banks carried similar risks, which were addressed by the FDIC. I suppose someday this kind of lending will be well graded, bonded, insured, and all that.
  3. GoCardless: a London-based company that allows small businesses to set up monthly payments to suppliers at a fraction of the cost that banks charge.
  4. Wonga.com: short-term loans (think serving the bottom of the pyramid, or the long tail).
  5. TransferWise: acts as an intermediary for money transfers, in particular into European currencies.

If you notice the trends, making it cheaper does two things:

  1. Identify and reduce marketplace inefficiencies: the inefficiencies were not attacked before for multiple reasons. Maybe it was the long tail of the market; maybe the consumers were not ready; maybe the capital was not available. But now, the situation seems to have changed.
  2. Remove the middlemen and provide only the needed services: on the face of it, this looks bad. Are we getting fewer services? But then, a lot of those services are as useless as a fax machine. Why do we need heavy investment in local branches? Why do we need high-priced advisors, when most research is available for free?

That is how a lot of companies are changing business processes and practices to alter banking services.

Making it easier

The other class of changes in banking services is not so fundamental; yet these changes impact a far larger number of users. They do so by combining disparate processes, tools, and opportunities (a horizontal integration) to make things convenient for users. For example, what if you could deposit a check from your smartphone? Of course, it is not a fundamental change, but it is still very convenient. Now multiply that kind of convenience severalfold and you start seeing a lot of change.

Some of the examples include:

  1. Stripe.com: Credit card payment disruption. This site makes accepting credit card payments over the web very simple. It is not like Dwolla (which sets up its own ACH network, etc.); it merely makes things convenient for programmers.
  2. Square: Location-based payments.
  3. Google Wallet: Coupons + Google services + payments.
  4. www.simple.com: A new kind of bank: they offer the experience and keep the money at a traditional bank of their choice. In effect, they are creating a new value-added intermediary.

As you can see, a lot of this ease of use is achieved by adding a new kind of technology. It often adds a new intermediary, but it does so by cutting away at the traditional players.

Disrupting fundamentally

It is in this context that I am referring to Niall Ferguson’s book. The history of money is replete with examples of places where different kinds of money have been created. While I am looking to see how the state’s monopoly on money can be broken, there are interesting trends:

1. Bitcoin: A fundamentally new way that money is created and used as a currency. While there have been some problems, the idea is refreshingly new and takes advantage of the concepts of leverage, currency, and notes.

2. <???>: I see a lot of ways of bypassing money fundamentally, but none of them seems big enough. Please leave any observations in the comments.
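To give a flavor of why Bitcoin is fundamentally different, here is a toy sketch (not real Bitcoin, which adds proof-of-work, signatures, and a peer-to-peer network) of its core idea: a ledger in which each entry commits to the hash of the previous entry, so history cannot be quietly rewritten.

```python
import hashlib

def entry_hash(prev_hash: str, payload: str) -> str:
    """Hash of a ledger entry, chained to the previous entry's hash."""
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

GENESIS = "0" * 64
chain = []
prev = GENESIS
for tx in ["alice->bob:5", "bob->carol:2"]:
    prev = entry_hash(prev, tx)
    chain.append((tx, prev))

# Anyone can verify the ledger by recomputing each hash in order;
# altering an old transaction breaks every hash after it.
prev = GENESIS
for tx, h in chain:
    assert entry_hash(prev, tx) == h
    prev = h
print("ledger verified")
```

Because no single institution holds the authoritative copy, trust comes from the math rather than from a bank, which is what makes this a challenge to the state's monopoly on money.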

As a famous net.person said, software is eating the world. I hope that it eats the financial institutions too. And we should understand the process so that we can participate in it.

What does this mean for us software developers, architects, and managers?

  1. We should understand the ways the industry is being disrupted.
  2. We should create propositions around how these institutions can address these threats. The easiest is to focus on the “making it easier” category. On “making it cheaper,” we do not have the mandate to propose new business offerings.
  3. We should create some POV papers so that we can start having the right conversations with customers.
 Posted by at 12:35 pm
Jul 10 2012

When I joined computer science in 1982, I asked a senior of mine what computers do (they were quite novel at the time). He looked at me pitifully and told me that they sort records. Eventually, by the time I graduated from IIT Madras, I had learnt about compilers, operating systems, databases, programming languages, and a lot of other hard-core computer science subjects. And I thought that was what people used computers for. Once I entered the real world, I knew my friend had been much closer to reality than what I learnt in school.

After working with computers for 20 years, when I landed in IT accidentally, it fascinated me with its infinite possibilities. In a sense, IT is still figuring out what to do with computers. We can say that originally companies used computers to automate exactly what they were already doing. Only recently have they been figuring out disruptive models, perhaps as the use of computers in social and mobile settings becomes ubiquitous.

I have been trying to understand this evolution of IT and how to explain it well to others. I have been talking to several people from various industries to get this perspective. Hopefully you, the reader, will appreciate it.

What IT wants

What IT wants is simple: it wants to support running the business. It does so by using systems. It runs the systems on computers hosted in data centers and on user desktops. It may also run the communication systems and other facilities.

In the beginning, it was not clear what this support meant. Obviously, a business does a lot of things. It is not even clear that there are similarities between different businesses, right? So, what does it mean to support the business?

It turns out that businesses are not all that dissimilar. Thanks to the rise of the modern enterprise, there are standard parts of a business: HR, supply chain, inventory management, finance, customer relationship management, etc. In fact, these “parts of business” are so standard that people study them in college during their MBA.

Now, can IT do all these things? And if it can automate them, what do you need people for? Actually, in the beginning, the range of problems that IT was solving was narrow. For concreteness’ sake, let us consider the insurance industry.


[If I have made a mistake in the actual details, please overlook it.] The gist is that the state of the art permitted only some portions of the business operations to be automated. Moreover, some automatable tasks were not automated, either because IT did not know they could be or because the problems were not modeled that way.

In the beginning: Rise of ERPs

In the late 80’s and early 90’s, there was a lot of hype around object-oriented analysis and development. People had access to all the newfangled technologies: databases, compilers, cheap computing power, and networks. Add to this heady mix the new methodology of OO. It promised that we could uncover the “what” of IT: what needs to be built?

By the mid-90’s, disillusionment had set in. Most projects stalled or failed miserably. IT departments realized that they could not build software merely because they had the tools. Just then, ERP became very popular. Here is what ERP vendors (for instance, SAP) offered:

  1. Standard programs that suit most needs
  2. Standard processes for most of the common tasks in different industries
  3. Common data models for most industries

Companies started signing up aggressively. They reasoned, rightly, that these ERP solutions solved many problems in their IT: they now knew what to automate and how to automate it. Standardization provided all the efficiencies they ever wanted. SAP went from industry to industry, functional area to functional area, and offered a reasonable model to support business operations.


That is, there was now a comprehensive platform to run businesses using a set of modules or components from one vendor. That is obviously a big step for an IT department in fulfilling its mission.

With the internet: Integration and differentiation

An interesting thing happened along the way in the late 90’s: the internet. It suddenly placed a few more expectations on IT, such as e-commerce, intranets, and internet-based customer programs. Unlike the standard HR or finance modules, these programs had no standard models. The scope, the operations, the execution: none of it was standardized.

What this meant was that these programs were difficult to implement in ERP systems. Remember that ERP systems create standard models (data, process, application) for users. With the internet evolving at breakneck speed, companies scrambled to implement them in whatever technologies they could: Perl, PHP, C, Java, or whatever.

Why were they implementing these new systems? Couldn’t they have done it in ERP? The answer is complex: you could say that the ERP systems did not have the technical capabilities that the new applications demanded. That was true, but only in the beginning. To understand the main reason, see the larger context:


First, a little terminology: systems of record are the systems that are standard in any business. Systems of differentiation are the ones that distinguish a business from other businesses. In fact, they are sufficiently different that ERPs have not made them part of their standard offerings.

As you can see, there are two ways these systems can be built:

  1. Customize the ERP systems: Naturally, ERP vendors are smart enough to offer various kinds of mechanisms to customize their systems. They offer ways to customize the data models: for instance, if you want to add one more field to track your inventory, you can easily make that customization. You can also add to the business logic using their programming systems (for instance, you can use ABAP to customize SAP).

    Unfortunately, while it is clear how to customize ERP systems, customizations have two problems:

    • The cost of customization is high: When you hear jokes about ERP systems being “corporate crack,” you understand the high level of frustration with never-ending ERP customization projects.
    • Upgrade costs become almost as high as implementation costs: Most companies that implemented ERP in the mid-90’s started realizing the hidden costs of customization in the mid-2000’s. The cost of an upgrade has several components: understanding and documenting the customizations, upgrading the customizations, the increased cost of inducting new developers and analysts, not being able to use off-the-shelf components, etc.
  2. Build new systems: Building new systems is easy; in fact, most developers prefer to do that. In the early days of the internet, thanks to the rapid development frameworks offered by scripting languages and other systems, people developed a lot of applications. However, these fell into the following traps:
    1. Not integrating well into the ERP ecosystem: For example, if you add a training system like Moodle into your application soup, you have to address how that system integrates with your HR system (which may be in the ERP). You have to make it both use and contribute corporate data.
    2. Not coordinating well with other custom systems: To take the same example, your training system may have to be part of processes involving other systems as well as the ERP. For instance, if you designate a course on industrial safety as mandatory, you have to use Moodle to enforce that mandate.
    3. Lack of governance and control: Suppose some other department decides to implement a blogging system (say, Drupal) that also has capabilities to run training programs. If you don’t have good governance, your training programs end up fragmented, which makes coordination and integration difficult.

What this points out is that while custom applications may offer an advantage, they suffer from several problems. One particular way the industry handled this was through some specific technologies.

Integration technologies


Several technologies evolved to support integration. At their heart, they support integration at the data level (EDI, MDM, XML, …), at the application level (SOA, web services), at the process level (BPM), and at the front end (portals).

In addition to technology standardization, these tools and technologies bring governance, standard ways of introducing differentiation, and patterns of usage. In effect, they make differentiation a well-understood, engineering-led process.
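As a tiny, hypothetical illustration of data-level integration, the sketch below shows a custom system (the Moodle-style training system from the earlier example) exporting a record as XML for an ERP-side import. The element names are invented for illustration, not taken from any real ERP schema.

```python
import xml.etree.ElementTree as ET

# Exporting side: the custom training system serializes a completion
# record as XML (element names are illustrative only).
record = ET.Element("trainingRecord")
ET.SubElement(record, "employeeId").text = "E1042"
ET.SubElement(record, "courseId").text = "SAFETY-101"
ET.SubElement(record, "status").text = "completed"
xml_payload = ET.tostring(record, encoding="unicode")

# Importing side: the ERP-facing integration layer parses the same
# payload back into fields it can map onto the HR module.
parsed = ET.fromstring(xml_payload)
print(parsed.find("employeeId").text, parsed.find("status").text)
```

The point is not the XML itself but the agreed-upon contract: both sides standardize on one representation, which is what EDI, MDM, and SOA do at a much larger scale.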

So, let us review what the business operating platform is:


  1. ERPs are still the center of IT offerings, both in terms of budget and strategic importance.
  2. Differentiation is achieved through special purpose technology platforms and tools.

Pressures of disruptive models

One of the big trends of the 2010’s is the way conventional wisdom has been upended. Until then, technology supported the business: it essentially codified what the business wanted into IT systems, and it helped businesses scale. For instance, if you ran an insurance company in the 1950’s (I am reminded of umpteen movies set in the 1950’s, starting with The Apartment), however well you were doing, if you tried to grow you would end up with an unmanageable mess.

The scaling of business systems was not particularly disruptive. It was a labor-saving device that helped organizations grow. The fundamental capabilities of the business didn’t change; the only new addition, thanks to technology, was the ability to serve a lot more people.

In the last few years, we have been seeing some fundamental shifts. I suppose the internet started it all. For the first time since computers appeared in the enterprise, they started impacting the business fundamentally. The reason is simple: the consumers of the services and products, the employees who provide those services and products, the management that plans and executes the activities, the financiers who oversee the use of the assets, the investors who study and invest in the companies: all of them are using computers in innovative ways to learn, to interact socially, to carry out tasks, to communicate, and to organize, and they do all of this on highly connected devices.

For instance, when we see an unbank like www.simple.com or an unhotel like www.airbnb.com come up, we have to ask ourselves: how do these disruptive models impact the IT of modern organizations?

Transforming the business operating platforms

Imagine we are building a modern insurance company. How would we build it differently? First of all, the standard services IT provides are not of much value. Think payroll: do they even need it in-house? Can they get another company to take care of it for them? The email system? They can use Gmail. Marketing? They can use MailChimp.

In fact, for most common needs, the enterprise really doesn’t have to spend money on IT. Take a look at the kinds of tools available to a startup today: http://steveblank.com/tools-and-blogs-for-entrepreneurs/ and http://startuptools.pbworks.com/w/page/17974963/FrontPage . Any IT department would love to offer those facilities.

Here is the way I think the role of IT is changing:


What we observe is this:

  1. The role of IT is moving away from standard operations.
  2. ERP is only partially outsourced. For reasons of security, control, and integration, a lot of intertwined functionality is retained under IT.
  3. The Business Operating Platform that IT develops and controls is steadily shifting higher.

How does this new IT address the disruptive business models? It doesn’t, yet. These systems only address the differentiators: the ones that help a company compete with other businesses that are similar to it.

If you look at the disruptive models, most of those coming out of startups or innovative spaces revolve around a few ideas:

  1. Reach consumers in new and innovative ways: For instance, the way Instagram or Hipmunk do it.
  2. Get more information out of available data: LinkedIn, Groupon, and Netflix leverage data to precisely target customers.
  3. Understand the market better: By working well with the social universe and creating the right market for the services. For instance, several Facebook-based companies do that (example: Zynga).
  4. Go after the long tail: Organizations have been so focused on serving the statistical median that they are losing the growing long tail (hipsters, anyone?). Several companies, thanks to the reduced costs of services and IT, are able to create a market for the long tail.
  5. Simplify the offerings: Some companies (actually, Google did it) are able to take a moribund sector and simplify its offerings to revitalize the market. For example, the staid telephony market is being shaken up by the DIY company www.twilio.com.

This list is not comprehensive. In fact, it is continuously growing. The hallmark of these ideas is not how they are implemented, but how they are disrupting the way the world works.


As you can see from the above picture, the standard operating platform business is incorporating the systems of innovation as well. It is not yet clear how this will eventually evolve, but for now, there is a great deal of excitement.

Another way to look at the picture is how the BOP transformation is evolving:


The evolution of the BOP towards the innovative space and the increasing emphasis IT is placing on it is indeed interesting. For a full description of the architecture perspective of the same, refer to: http://www.kanneganti.com/technical/it-transformation-an-architecture-perspective/.

Of course, there are a lot of open questions: How do we build for something that is constantly changing? If we want to run the business, how do we make use of systems that may be in flux? If I make my web applications mobile, am I transforming IT? Someday I will get to all those questions on this blog.


Let’s recap: we started with the rise of modern ERP systems. We traced the need to create flexible applications and the rise of integration. We then showed how differentiation begat new technologies that came to address extending, customizing, or filling in the gaps of an ERP system.


The modern business operating platform looks more interesting than before. As IT sheds unnecessary, non-value-add tasks like productivity applications, it is freed to create more innovative applications. As shown in the picture above, IT needs to learn not only new technologies but also what needs to be done. Whether IT can step up to this or not, I cannot say.

Ob Plug: My group in HCL (Enterprise Transformation Services) has been working on modern business operating platforms for a while. We have tools, frameworks, sample platforms, etc., to support the POV expressed here. In fact, BOPT™ (Business Operating Platform Transformation) is one of the main service offerings from my group at HCL. If you would like to know more, shoot me an email and I will put you in touch with the right people.

 Posted by at 6:58 am