Apr 14, 2014
 

I have looked at the way corporate training happens in technology areas, and I find it wanting in several respects. This is my attempt at bringing some best practices to corporate or institutional training.

Most corporations have training programs, especially the ones that deal with IT technologies. The goal of these programs is to train people so that they can be useful, productive members of a project: competent engineers, craftsmen who can support the delivery of projects.

A typical company gets two kinds of people to train:

  1. People fresh out of college: They went through latent learning, without clear goals. They learnt different subjects in college, without a clear understanding of how that knowledge is meant to be used. They tried to understand concepts without real practice.
  2. People with a little experience: They worked on a few projects, churning out code. Conditions in the industry are such that they were not exposed to quality code. Most of them do not understand the value of quality, nor do they understand what quality is.

Current training methodology: What is wrong with it?

Typically, corporate training follows a standard process: the company gets nominations on whom to train. It hires or borrows instructors who are experts in that area. It puts everybody in a room, sometimes away from all distractions. Over the course of a day or two, the instructors take them through (with the aid of PowerPoint) the nuances and details of the material. For example, if the training is on Java, the students will go through static methods, access modifiers, annotations, and so on. If the training is advanced, it might even cover some patterns of usage as best practices.

Typical evaluation of students is carried out through multiple-choice questions that test them on the intricate details of the language. These include a lot of trick questions meant to test the understanding of the language.

What are the problems with this approach? Let me count the ways:

  1. It doesn’t train the students for what they will encounter on the job. It doesn’t make them better developers or project managers or whatever.
  2. It only tests book knowledge, which is almost always one Google query away. It is that much cheaper to invest in a good-quality internet connection.
  3. After a few days of training, they forget the knowledge. Of course, they can always look up a book when they need to – but that was the situation they were in to start with, anyway.

Even if we train them using actual small projects, these problems will continue to exist.

New way of training: What we want to achieve

The education market in the US is being continually disrupted. I am going to list a few lessons from those disruptions and later describe how to apply those lessons to our training process.

image

Let us see each of these lessons in turn.

Inversion of classroom and home

Known as flip teaching, this method became popular because of Khan Academy. The problem with classroom training is that the teacher lectures at the same pace for everybody. When the students need help with homework, they are on their own.

image

(Image via: http://www.washington.edu/teaching/teaching-resources/flipping-the-classroom/)

In flipped learning, the instructor doesn’t teach via lecture. There are enough books and videos that can teach the subject. Instead, the instructor, in the classroom setting, works with the group to solve problems.

Practicing for muscle memory

Here is a quote from the book, Art & Fear:

The ceramics teacher announced on opening day that he was dividing the class into two groups. All those on the left side of the studio, he said, would be graded solely on the quantity of work they produced, all those on the right solely on its quality. His procedure was simple: on the final day of class he would bring in his bathroom scales and weigh the work of the “quantity” group: fifty pound of pots rated an “A”, forty pounds a “B”, and so on. Those being graded on “quality”, however, needed to produce only one pot -albeit a perfect one – to get an “A”. Well, came grading time and a curious fact emerged: the works of highest quality were all produced by the group being graded for quantity. It seems that while the “quantity” group was busily churning out piles of work – and learning from their mistakes – the “quality” group had sat theorizing about perfection, and in the end had little more to show for their efforts than grandiose theories and a pile of dead clay.

If you want to learn JavaScript, write a lot of code. Through writing the code, you keep improving. None of those “Learn X in 10 easy steps” books can get you to this level of deep learning.

The hacking community understands this aspect of learning well. Look at the following approach to learning by coding a lot: LxTHW (Learn X the Hard Way). The approach started with a popular book on Python, called Learn Python the Hard Way. If you follow that book, you don’t endlessly learn syntax and the nuances of the language. Instead, you code, and you code a lot. It works this way:

  1. Do not cut and paste. Type the code as is.
  2. Make the code run.
  3. Now, modify the code to solve slightly different problems. And, make that code run.
  4. Keep repeating till you internalize all the concepts through practice.
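To make step 3 concrete, here is a hypothetical JavaScript exercise in that spirit (the book itself uses Python); the point is the retype-run-modify loop, not the specific program:

    // Version 1: type this in exactly and run it with `node squares.js`.
    for (let i = 1; i <= 5; i++) {
      console.log(i + ' squared is ' + i * i);
    }

    // Version 2: your modification – print cubes up to 10 instead.
    // Small, repeated tweaks like this are what build muscle memory.
    for (let i = 1; i <= 10; i++) {
      console.log(i + ' cubed is ' + i * i * i);
    }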

In fact, this kind of learning through practice has been applied successfully by several people. An example that I found very impressive is the case of Jennifer Dewalt. On April 1st, 2013, she started a project to develop 180 websites in 180 days – one per day. All of these websites are simple enough to be coded in a day. With practice she got better and better; you can see the progress of her websites for yourself.

Even experienced programmers like John Resig, the inventor of jQuery, feel that writing code every day helps them keep their skills sharp. Here is the famous blog post that he wrote: http://ejohn.org/blog/write-code-every-day/

In summary, our courses should not be teaching tricks and nuances of languages, libraries, or tools; they should be getting people to practice the craft.

Attention to basics

The big obstacle to coding, or practicing the craft, is not having the right basics. Even when two people practice equally, the one with the better basic tools will win.

image

Unfortunately, most colleges do not teach the tools of the trade. They focus, rightly, on the fundamentals. Yet there is no part of the education that covers the good use of tools and ways of working.

If practicing is the way to teach, the students need the right infrastructure to practice on. In fact, practicing on that kind of infrastructure also teaches them how to use it. These days, the basic infrastructure should include:

  1. Making use of the cloud
  2. Making use of the right OS, editors, and IDE
  3. Making use of the right tools for version control, bug tracking, requirements, and so on

Even for small projects, it is best to have at least a cloud-based setup to try the code, a version control system to keep the code history, and the right editor to train your muscles to use the right keystrokes.

Quality through osmosis

We can get people to practice on the right infrastructure and churn out finished goods fast. But, will the output reach world-class quality?

While we made the argument that quality comes from quantity (through long practice), a more important ingredient is having the right role models. That is where the right mentorship can help. This is where intervention by a teacher who gives feedback on the quality of the output helps.

There are multiple ways we can bring this to practice – say if you are learning programming in JavaScript:

image

Especially in the third step, and to a lesser extent in the other two, good mentors can help. They can point out what the best practices and idioms are, and why a particular practice is good.

Assessment through real-life tests

The current crop of testing, because of automation, is focused on multiple-choice questions. Unfortunately, it only covers the nuances of the programming language or system. The real world is far more complex; there is no single correct answer. And even if your problems were to come in the form of these questions, you could always find the answers on the internet.

In contrast, in real life, you will have to produce code or an artifact. Why not prepare for that during training? Here is an approach that works:

  1. Pick a large enough problem that we encounter during our work. A lot of these problems require “programming in the large”.
  2. Abstract it sufficiently so that it can be solved in limited time. For instance, you can choose only one module (say, adding new users to the system).
  3. Write a full-fledged system that solves that sub-problem.
  4. If multiple people choose different parts of the system, then we can have a fully functioning system at some point.

Training process

If you were to subscribe to this approach of corporate training, how would you start? Here is the approach I suggest.

  1. Start with a clear statement of what you want the students to be able to do.
    This part is surprisingly difficult. For instance, do not say “I want my students to learn Java”. That does not give them a new capability. Instead say “I want them to solve the following kinds of problems”. Now, these problems can be your way of assessing their status after the training.
  2. Create preconditions for the material: Not only for assessment, but as a way of setting expectations, you could ask the participants to do some tasks up front. For instance, if they are going to be doing JavaScript programming, they should know how to use Firefox or Chrome extensions. You could test for that.
  3. Create curated content for the material: Since there is so much material online, create an annotated list of it. For instance, you could give links to articles, books, SlideShare entries, or YouTube videos, with notes about what to expect from each. You could construct them as a series of lessons.
  4. For each lesson, create a set of problems they should solve: In fact, the more problems they practice, the better. If there is an entity like LxTHW, we could just follow that. If not, create a set of problems for them to solve so that the lessons really sink in.
  5. Create a basic infrastructure with tools: Even before they start solving problems, provide them with an infrastructure such that:
    1. They can use the tools to develop the solutions
    2. They can use the infrastructure to collaborate with others
    3. They can use it to show their work to the mentors
    4. They can test their solutions
  6. Provide the mentors: This part is tough. As we mentioned earlier, at a minimum, show them (embedded in the curated content) what you consider good quality to be.
  7. Create a post-training evaluation: Create a large enough problem so that people can choose a part of it to solve. Using the mentors, see:
    1. how fast they develop the solution (it indicates how well they internalized the material – practice makes for speed).
    2. how good a solution they develop (it indicates how well they learnt from the masters).
  8. Create a community so that they can interact with each other post-training: A training is not complete unless it is followed up. Since resources are difficult to get, use the power of community. Create a social space where people can discuss what they learnt even after they have graduated from the course.

Concluding Remarks

image

I am no fan of corporate training; but I realize that not all people are self-motivated to learn, and to learn well. I think corporate training can be reformed to take advantage of modern best practices such as incorporating internet material, repetitive practice, and intentional training techniques, especially for acquiring technical capabilities. This note is an attempt in that direction.

Mar 27, 2014
 

You are the CIO. Or the director of application development. You hear about the consumerization of IT in different contexts. You hear about mobile applications developed by teenage kids in mere months, apps that are used by millions of adoring people. And your business people are demanding to know why you can’t be more like those kids.

What should you do? Retrain your staff on some of the consumer technologies? Get a partner who has the skills in the consumer technologies? Move existing applications to consumer-friendly devices? Are they enough? And, why are you doing all these anyway?

In the last couple of years, I have been working with different IT leaders to evolve a definition of, and an approach to, this problem. By training, I am a technology person – the kind that develops the consumer technology. By vocation, I help IT departments adapt technology to meet their strategy. Being on both sides of the fence, I have a perspective that may be interesting.

This note is derived from a few presentations I made at different industry conferences. The self-explanatory slide deck is available at: http://www.slideshare.net/kramarao/consumerization-32279273

Let’s look at the following three questions, in order:

  1. What is consumerization of IT?
  2. How does it affect IT departments?
  3. What should the IT departments do about it?

Consumerization of IT: A bit of history

As coined in 2001, consumerization refers to the trend of building applications that are people-centric. Have we not been doing that always? Yes and no. While we were developing applications for people, our main focus was somewhere else: either on the growth of the business (by managing the volume of data), on automating activities to speed up processes, on automating entire business value chains, or, only recently, on the customers.

When we were building applications earlier, we were building them for a purpose: to solve a business problem. People were another piece of the puzzle – they were meant to be a part of the solution, but not the purpose of the solution.

Enterprise IT application development

How are these applications developed? Take a look at the sample development process.

image

In the current, traditional situation, the enterprise architecture (EA) people map the needs of the enterprise to technical gaps, see if there is a packaged app, and either customize one or build a new one. The biggest questions often boil down to “build vs. buy”, or what to buy.

A few things that you will observe are these:

  • Applications take a long time to develop: Typically they are large and serve long-term needs of the enterprise. Any other kind of application is difficult to retrofit into this model of development. For example, if the marketing department wants an application for a one-time event, the existing IT processes make it difficult to offer that service. That is why we find marketing is one of the prime movers behind consumerization.
  • Applications serve the common denominator: They address the most common needs of the people. If your needs are very different from everybody else’s, they will be ignored, unless you are the big boss. No wonder IT departments still develop applications with a “Best viewed on IE 6+” sticker.
  • Applications lag behind the market needs: Since the focus is on creating applications with longevity, the design uses tested technologies. At the pace these technologies are evolving, this design decision makes the technology foundations obsolete by the time the applications are delivered. For example, even today, IT departments use Struts in their designs – a technology that is already dead.
  • Applications, developed based on consensus needs, lack focus: Since one large monolithic application is going to meet the requirements of several groups with different needs, the applications lack focus. For example, the same application needs to support new users and experienced users. Or, it needs to support management interested in the big-picture view and workers interested in doing the processing. Naturally, any application developed to such diverse and divergent needs ends up being unfocused.
  • Applications are expensive to develop: Compared to consumer apps, where we hear of apps getting developed for a fraction of the cost, the process and the “enterprise quality” requirements impose a lot of additional costs on application development.

That is, this process yields applications that are built to last.  Let us look at how consumer applications are developed.

Consumer application development

Historically, consumer applications have been developed differently.

image

As you can see, in each era the consumers are different, the focus is different, and the distribution mechanism is different. File this away, as the historic view is important as we look at consumerizing IT. Delving deeper into the current era, we see the following:

image

Consumer applications are almost always better focused on end results than on stated user demands. For example, take the case of Instagram. In its history, it discovered that if it followed user needs and demands, it would end up being another FB clone. Instead, it decided to keep the focus on one metric: how to get the most photos uploaded by consumers. That focused design led to its success.

Consumer applications are also built in collaboration with the consumers. By creating a model of constant experimentation, feedback from the field, and the ability to improve the application without the ramifications of user support, the creators of these applications are able to build systems that are “built for change”.

But, what are the disadvantages for the consumer applications, compared to enterprise applications?

  1. Only interesting applications get developed: Go to Apple’s app store, and you find so many weather and gaming apps. You do not find enough applications to do genome analysis. Developers are impatient with problems they do not understand, or with problems that require a lot of knowledge to solve.
  2. Capabilities may be replicated in many applications: The strength of consumer applications, namely catering to different groups of people, means some core functionality gets repeated across applications. Instead of getting a few high-quality apps, we might end up with a lot of mediocre ones.
  3. Lack of uniformity in solutions (depending on the platform): While some platforms are famous for creating a uniform and consistent experience, the applications themselves provide a fragmented experience. Unlike enterprise applications, they lack control or governance.

Consumerization: Why should IT care?

We established that enterprise applications and consumer applications have different focuses. We also established that they are built, distributed, and operated differently. Still, why should IT care about consumer applications? Why should it consumerize itself?

I can think of three reasons.

Consumer focus of the businesses

Several service industries like retail, banking, entertainment, music, and health deal with consumers daily. Their business models are being disrupted by startups that bring new technologies and a new breed of applications. While IT does not exactly lead the business transformation, by bringing the right capabilities it can at least support the business better.

Internal users as consumers

The demographics of the workforce are changing. More and more young people are joining, and they are used to a different kind of experience, using modern devices and modern applications.

image

Even the older people are used to consumer applications: they use Gmail at home, FaceTime on their iPad, Facebook on their laptop, and LinkedIn for their professional network. Then they come to work, and they use Exchange without the benefit of Bayesian spam filters; they use Lync video instead of FaceTime or Hangouts; they do not even have anything like Facebook or LinkedIn at work.

By not exploiting modern technologies and application paradigms, enterprises risk losing productivity and the ability to attract the right talent.

Cheaper and better Consumer technologies

Large investments in consumer technologies are making them cheaper and better, at a faster pace than enterprise technologies. For instance, Git is improving at a faster pace than Perforce. Companies that took advantage of the cheaper consumer alternatives reaped the benefits of cheaper and better infrastructure, application construction, and operations. Google built its data centers on commodity boxes. Facebook fully leverages open source technologies for its needs. The following are the main reasons why consumer technologies are often better choices than enterprise-grade technologies.

image

So, considering that enterprises are being pushed towards consumerization, how should IT react?

Consumerization: an IT perspective

The best course of action for IT is to get the best of both worlds. On one hand, it cannot run business as usual without its control and governance. On the other hand, it cannot react innovatively to markets without the consumer technologies.

image

As we bring these best practices together, some interesting facts emerge. At least for certain parts of the application domain, we see that the old style of large-scale application development does not work.

image

As consumerization increases, we end up with a large number of small applications instead of a small number of large applications. Of course, the definition of an application itself is subject to change. For our purposes, consider any isolated, independent piece of functionality that people use to be an application. Historically, we used to talk about modularizing the application. Now, instead, we break down large applications into smaller pieces. Some of these smaller pieces may have multiple versions to suit the extreme segmentation that consumerization supports.

If we are moving towards such an IT landscape, what does it mean for traditional activities? In fact, all of the following are impacted by this aspect of consumerization.

  • Development
  • Life cycle plan
  • Deployment
  • Support
  • Governance
  • Enterprise Architecture

Let us look at some of these challenges.

Challenges in consumerization of IT

I see three challenges in consumerizing IT.

image

These costs are easy to rein in if IT can bring in some of the consumer technologies. In the next section, we will describe the technology changes that can help IT address these challenges.

Coping with consumerization: A recipe for IT

There are four questions that we should ask ourselves as we are embarking on consumerization of IT:

image

Each of these questions requires an adjustment to the way IT operates, and each of the key concepts deserves a full explanation. Since this article has already grown long, I am going to be brief in describing them.

Pace layered architecture

The idea behind pace-layered architecture is that different parts of IT move at different speeds. For instance, front-end systems move fast, as the technology advances faster there. ERP packages move slowly, as they focus on stability. Based on this idea, we can divide IT systems into three groups:

  1. Systems of record
  2. Systems of differentiation
  3. Systems of innovation

If we were to divide systems this way, we know where consumer technologies play a big role: systems of differentiation and innovation. To take advantage of this idea for consumerization, I recommend the following steps:

image

Platform based development

Typically, when we develop applications, we are given just a set of tools that we can use: Java, an app server, a database, and so on. Putting these parts together into a working application is left to the developers. At best, standard configurations are offered.

Most of the tasks developers need to do are standard: add security, add user authentication, add help text, add logging, add auditing, and so on. Considering that there are a lot of standard tasks developers need to do, is there a way we can reuse the effort?

We have been reusing different artifacts in several ways. We have used libraries, templates, and frameworks. With the advent of cloud technologies, we can even turn these into a platform that is ready for the cloud. Platforms turn out to be useful in several ways: they standardize development; they reduce costs; they reduce the time to get systems ready for development; and they improve the quality of the systems.

In addition, within any enterprise there might be standard best practices that can be incorporated into the platform. With these additions, we can enforce governance as a part of platform-based development, as the sketch below illustrates.
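As a rough sketch of the idea (not any vendor’s actual platform), imagine the platform packaging the standard tasks – logging, authentication, and the like – so that an individual application only adds its business logic. The Node/Express-style code below is hypothetical, just to illustrate the shape of such a platform:

    const express = require('express');

    // Hypothetical "platform" factory: bundles the cross-cutting concerns that
    // every enterprise application needs, so teams do not rebuild them each time.
    function createPlatformApp() {
      const app = express();
      app.use(express.json());

      // Standard logging, supplied by the platform.
      app.use((req, res, next) => {
        console.log(new Date().toISOString(), req.method, req.url);
        next();
      });

      // Standard authentication hook; a real platform would verify a corporate token.
      app.use((req, res, next) => {
        if (!req.headers['x-auth-token']) {
          return res.status(401).json({ error: 'missing auth token' });
        }
        next();
      });

      return app;
    }

    // An individual application adds only its own business logic on top.
    const app = createPlatformApp();
    app.get('/hello', (req, res) => res.json({ message: 'hello from a platform app' }));
    app.listen(3000);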

There are industry-standard platforms as well, for specific purposes: the Google platform, the Facebook platform, the Azure platform, and the SFDC platform. Each of them offers different capabilities and can be used for different purposes. Instead of standardizing on one platform, an enterprise will have to classify its needs and plans, categorize its applications, and, on that basis, devise multiple platforms for its needs.

Internally, Microsoft has positioned SharePoint services and Office 365 as such a platform. Coupled with .NET technologies, it can be a full platform for delivering user defined applications.

Backend as APIs

The potential of the platform can be fully realized only if the enterprise data and functionality are available to the apps developed on the platform. For instance, the data locked in ERP applications is valuable for many modern applications. Existing logic within such a system is difficult to replicate elsewhere and may be needed by the new application.

By providing this information, both data and logic alike, as APIs, we can enable internal as well as external applications. In fact, current application development paradigms around API-based front-end development offer several frameworks for this style of development.

image

Using REST+JSON APIs, we can develop web as well as mobile applications from the same backend.
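As a small sketch, the same hypothetical /api/products endpoint can be consumed by a browser page and, with the same call, by a hybrid or native mobile client; the URL and field names here are made up for illustration:

    // Fetch product data from the shared REST+JSON backend (endpoint is hypothetical).
    async function loadProduct(productId) {
      const response = await fetch('/api/products/' + productId, {
        headers: { 'Accept': 'application/json' }
      });
      if (!response.ok) {
        throw new Error('API call failed: ' + response.status);
      }
      const product = await response.json();

      // The web app renders HTML; a mobile app would bind the same JSON to native views.
      document.getElementById('product-name').textContent = product.name;
      document.getElementById('product-price').textContent = product.price;
    }

    loadProduct(42).catch(console.error);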

Modern app stores

Once applications are developed, they need to be made operational for the people. There are four different aspects to putting applications to use.

image

Historically, such an app store or delivery platform has been handled in several different ways. Popular choices for different ecosystems include Apple’s App Store, Google Play, FB Apps, etc. If we build it right, we do not have to restrict the app store to mobile applications alone; the same delivery and support mechanism can serve web applications as well.

Concluding Remarks

Consumerization of IT is a desirable trend, if handled correctly. The right way to handle it is to bring the useful elements of consumerization to the appropriate kinds of applications. The core features of consumerization include conceptualization of apps via pace-layered architecture, development via platforms, integration via APIs, and delivery via app stores.

Mar 24, 2014
 

All of us developers can write code. We need designers for the look, feel, page designs, and flows. To get the services of a web designer, we often need to show a prototype first. If its design is bad, nobody may even see the potential of the website. And for a lot of web apps, there may not even be enough budget to attract a designer.

What should the developers do? Can they do the web design by themselves? Even if they can’t do a world-class design, can they create reasonably good web pages? This note, created from some of the lessons I taught my developers, answers these questions.

Small rant: Historically, clothes were handmade. They were ill-fitting, expensive, and of poor quality. After Mr. Singer came up with the sewing machine, they became cheaper and of better quality. Over time, they became reasonably well-fitted too. Still, a modern custom Italian suit is better than a mass-produced suit.

Most people hiring UX designers want that Italian design, better than the machine design. But in reality, they are getting the ill-fitting, expensive medieval designs. For them, a better choice is machine design – that is, a factory approach, using standards-based designs, with mass customization. It is an engineering approach like that of Mr. Isaac Merritt Singer. If you have the right budget, of course, you can go for high-end tailors or high-end designers.

If you have not read them already, please read the following blog posts:

  1. http://www.kanneganti.com/technical/tools-for-developing-and-optimizing-web-firefox/ – You will understand how to use Firefox for developing for the web and optimizing for the web.
  2. http://www.kanneganti.com/technical/html5-design-knowledge-for-application-developers/ – You will find the basics of HTML5 (a bit outdated, perhaps) that is required for the modern web development.
  3. http://www.kanneganti.com/technical/what-i-look-for-in-user-interface-designers/ – This is the direct predecessor for this note. I will be showing some standard ways that you can approach the design from an engineering perspective.

If you are strapped for time, you can skip to the last section, where I give you a standard way you can develop web pages, with good chance of success.

We’re not going for originality. We are looking to use standard resources, standard tools, and standard best practices. Let us study the core elements of web design: Styles and Trends, UI Elements, Interactions, Fonts, Icons, Colors, and Layouts.

Styles and Trends

This is my soon-to-be-outdated advice: follow the Google style. Use affordance where you need it instead of a fully flat design. Don’t go for skeuomorphic designs, as they are difficult to design and maintain.

Skeuomorphism: imitating the real world. That is, when you find a notebook app imitating a physical notebook, that is skeuomorphic design.

iOS, iPad and Skeuomorph Interface Design

There are positive points for this design, of course. The user instantly recognizes it by identifying with the real-world counterpart. For novice users, it is a worthwhile association. For example, if a novice user sees a notebook with familiar ruled yellow paper, they know what it is for.

But the flip side is that it is difficult to design. And it gets annoying quickly. For an experienced user, it is a hurdle to frequent usage.

Follow the link to see the set of examples to understand how it can easily go overboard.

Even Apple realized the excesses of skeuomorphic design and began to prefer flat design. Flat design reduces clutter and simplifies the appearance. However, it loses affordance. That is, if there is a button, you press it; if there is a patch of color, you don’t think of pressing it. Still, it is the current trend. You can find different themes for the standard layouts, and I find it quite useful for presenting information.

image

For instance, the above picture is a good illustration of information presentation using flat design. You can see a lot of examples of flat widgets in the Flat-UI project.

How about bringing some affordances to flat design? See http://sachagreif.com/flat-pixels/ for ideas on how to do it. Simply put, you can do flat design, but add shadows, gradients, and even buttons with relief to retain the usability of the design.

Some of the other trends include blurred images for backgrounds, as seen on mobile and tablets. You can find a lot of discussion by hanging around http://forrst.com/posts.

UI Elements

A page is composed of several UI elements. Some are basic, a standard part of HTML, such as tables, buttons, and forms. Some are built from these basic elements: menus, breadcrumbs, and so on. Here are some simple rules for creating a standard set of UI elements:

  1. Do not design any UI elements yourself. Borrow from a consistent set. Please see the layouts section for further details on the standard set of UI elements that come with toolkits like Bootstrap.
  2. For higher-order UI elements, again, do not design your own. If the toolkit provides them, use them. If you must design, create a catalog of the semantic elements you need and use that to guide a standard set of elements.
  3. For a standard set of UI elements, please see some examples:
    1. http://ui-patterns.com/ – standard design patterns with examples
    2. http://www.smileycat.com/design_elements – a lot of examples from industry
    3. http://quince.infragistics.com/ – examples such as date pickers and so on; a lot of standard UI elements
    4. http://www.cssbake.com/ – more focus on the basic elements; these can be used to spruce up the ones that come with the layout toolkit

image

Interactions

These days, web page design is not static. Depending on the user’s interactions, we need to change the page. For instance, when the user selects a particular part of the page to interact with, perhaps it makes sense to remove the unneeded parts of the page. Lots of these interactions are accomplished with jQuery and its plugins. Some standard interactions are dynamic tables and the infinite scrolling that you see on Facebook.

Here are some places you can find more interactions:

  1. https://plugins.jquery.com/ – where you can find a lot of examples of jQuery plugins.
  2. http://www.unheap.com/ – a slightly better interface, but fewer examples.
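As a tiny illustration of this kind of interaction (assuming jQuery is loaded and the element IDs are hypothetical), hiding the parts of the page the user is not working with takes only a few lines:

    // When the user starts editing the shipping address, collapse unrelated
    // sections so the page stays focused on the current task.
    $('#shipping-address').on('focusin', function () {
      $('#recommendations, #promotions').slideUp();
    });

    // Bring the sections back when the user moves on.
    $('#shipping-address').on('focusout', function () {
      $('#recommendations, #promotions').slideDown();
    });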

Fonts

The easiest way to spruce up a page is to use the right kind of typography. Here are a few guidelines:

  1. Use only a few kinds of typefaces. I would limit my pages to no more than three: one serif, one sans-serif, and one fixed-width.
  2. Do not assume that your fonts are available on the target system. For example, Calibri is not available on Linux.
  3. Feel free to use the free fonts available from Google and Adobe. They are easy to use; it takes just two lines to add them to your pages.
    1. http://html.adobe.com/edge/webfonts/ – Edge fonts from Adobe.
    2. https://www.google.com/fonts – Google free fonts.

image

Not all fonts are suitable for all situations. To see some beautiful examples of free fonts, visit: http://www.smashingmagazine.com/2014/03/12/taking-a-second-look-at-free-fonts/.

To use fonts well, you need to understand how to use font sizing to your advantage. If you use the UI patterns appropriately, you will know how to use the right font size to do a call-out or to promote content. You will also understand how to use colors to indicate information classification to the users.

Icons

Icons make information easily identifiable and usable. Even the simplest icons provide information quickly. For example, take Otl Aicher’s stick figures: he designed the icons for the Munich Olympics and changed the way public communication occurs through pictograms.

image (Munich Olympic Park ice rink pictogram)

On the web, icons play an even bigger role. There are two ways to use icons:

  1. Using icons as images: For instance, you can find many sets of icons that are free to use on your website. All you need to do to incorporate these icons is download them and use the JPG/GIF/SVG files in your pages.
    image
  2. Using icons as a font: The problem with using icon images is that you cannot manipulate them. For instance, you cannot resize them (SVG you can, but the other formats lose fidelity). You cannot change the color (you need multiple sets). You cannot transform them. Visit https://css-tricks.com/examples/IconFont/ to understand how icon fonts can be colored, resized, slanted, shadowed, and so on.
    image
    If you are using icon fonts, you can start with http://fortawesome.github.io/Font-Awesome/, which goes well with Bootstrap. Or, you could look at a comprehensive library like http://weloveiconfonts.com/

Still, if you need multi-color icons, you need to use images.

Colors

A lot of engineers feel challenged when asked to choose colors. They resort to bland colors that do not work well with each other. If you are using a theme designed by an in-house designer or a theme provider, the choices will have been made for you. If you need to customize a standard theme, you can consider the following:

  1. https://kuler.adobe.com/create/color-wheel/ – the color wheel is a standard way of choosing a set of colors that go well together. There are different variations – monochromatic to triad, or complementary to analogous. Choose one color and play with the combinations that work well with that color.
  2. http://colorco.de is also a nice interface to the color wheel. Feel free to select the type of color scheme you want and move the cursor over the colors to vary the combinations.

image

Layouts

The layout of the elements of the page is a crucial step in web design. The earliest designs did not treat the web as a different medium from desktop applications, and gave us designs laden with the same kind of menus with dropdown choices. The web is a dynamic medium that can adjust based on the context, the user, and the situation.

There are two parts to the layout: what should be in the layout, and how they should be laid out.

Conceptual elements in a layout

What should be in each page or layout is a much bigger topic than this simple post; I will describe it in a separate post. Meanwhile, here are the fundamental rules to remember:

  1. Make the pages task-oriented for the most part. If we need exploratory elements, use them as recommendations in the page.
  2. Do not clutter the page with all possible choices. Give only the choices that make sense in that context.
  3. Give the user the ability to escape the current thread of flow.
  4. Feel free to hide and show elements on demand. That is, we do not need to fetch a new page for every change in the page.
  5. Respect the URL: the user should be able to bookmark a page, land on it, and carry on the transaction from there; or go back and forth among the URLs (see the sketch after this list).
  6. Set a few standard page elements that reduce the learning burden for the users.
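Here is a minimal sketch of rule 5 using the browser’s standard History API (the element IDs are hypothetical): when we hide and show sections instead of loading new pages, we still record the current step in the URL so it can be bookmarked and the back button keeps working.

    // Show one section of a single-page flow.
    function showStep(step) {
      document.querySelectorAll('.step').forEach(function (el) {
        el.style.display = (el.id === step) ? 'block' : 'none';
      });
    }

    // Navigate to a step and record it in the URL, e.g. /checkout#payment.
    function goToStep(step) {
      showStep(step);
      history.pushState({ step: step }, '', '#' + step);
    }

    // Handle the back/forward buttons by re-rendering the recorded state.
    window.addEventListener('popstate', function (event) {
      showStep((event.state && event.state.step) || 'start');
    });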

Soon, in other posts, I will describe the process of designing the elements of a layout.

Physical layout

The physical layout part has become simpler. These days, if somebody is developing the front end, the layout should satisfy the following:

  1. It should work on any (modern) browser: Browser wars are oh so ancient. The web page should be viewable on any browser. However, since most modern web technologies require modern browsers, we can assume the use of a modern browser (thanks to the mobile revolution), that is, beyond IE7. Companies like Google are already pushing the market beyond even IE9. Browsers like Chrome and Firefox keep themselves updated to support the most modern features.
  2. It should work on any form factor: The layout should support any size of browser window. Some devices can only support a smaller browser window; some support different orientations; some support different resolutions. Our layout should work across all these varying sizes, orientations, and resolutions.

The second capability is called “responsiveness” of the design. How do we know if a layout is responsive? Please read: http://www.kanneganti.com/technical/tools-for-developing-and-optimizing-web-firefox/  to understand how we can test for responsiveness.

There are multiple ways we can make a layout responsive:

  1. We can check for the browser size (and the device, while we are at it) and generate the appropriate HTML. This approach doesn’t work well with the proliferation of screen sizes. Besides, what if we resize (or reorient) the browser after we get the page? Reloading the page is retrograde and breaks the user experience (think re-posting a purchase – not what the user expects).
  2. We can use CSS to control the layout: no absolute sizes, only relative metrics. Using the right kind of weights and CSS positioning, we may be able to achieve the design we want.
  3. We can add JS that redraws the page based on the size: By adding JS that can hide or show elements, we can enhance the CSS to support devices even better. For instance, why show a full sidebar with a menu when we are on a mobile device, where there is barely enough space to display the main content? (See the sketch below.)
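For instance, here is a small sketch of the third approach (the sidebar ID is hypothetical): check the viewport width with the standard matchMedia API, hide the sidebar when it is narrow, and re-check whenever the browser is resized or reoriented.

    var narrowScreen = window.matchMedia('(max-width: 768px)');

    function adjustLayout() {
      var sidebar = document.getElementById('sidebar');
      // Hide the side bar on narrow screens; show it again when there is room.
      sidebar.style.display = narrowScreen.matches ? 'none' : 'block';
    }

    adjustLayout();                                   // apply once on page load
    window.addEventListener('resize', adjustLayout);  // and on resize or reorientation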

image

While those are typical choices, in practice, you will use one of the following frameworks. These frameworks incorporate CSS and JS to deliver responsive design:

  1. Bootstrap: The most popular choice for responsive design. You can customize what you need and get only the bare minimum. As a bonus, you get a fully integrated set of icons, widgets, jQuery plugins, and the ability to customize the look and feel of the site.
  2. Zurb Foundation: Very similar to Bootstrap. The approach is more of a toolkit – it lets you design what you want. It has a limited set of UI elements and is not as opinionated as Bootstrap.
  3. Pure CSS: If you cannot use JS (because of organizational policies on using JavaScript), you can always use Pure, which is a purely CSS-based responsive layout framework.

There are several other layout frameworks, like Skeleton, with minor variations on these categories. The popular ones like Bootstrap come with standard themes. These themes add standard layouts, colors, fonts, images, icons, and even templates. For example:

  1. http://bootswatch.com/ – free themes, playing on colors and fonts.
  2. https://wrapbootstrap.com/ – for around $20, you can get a lot more standard themes and templates, for different categories.
  3. http://themeforest.net/collections/2712342-bootstrap-templates – from the standard ThemeForest site. You will find several themes and templates for purchase.

Final advice

All of this information is a lot to digest. Here is some simple advice that yields good enough results for most developers.

image

Upcoming articles:

  1. Elements of a layout: What goes in each page? How to decide, from a developer perspective.
  2. Process of web development: A developer-friendly approach.

Drop me a note about what you would like to know.

Mar 04, 2014
 

Every once in a while, I get the urge to work with computers directly. I want to get my hands dirty, figuratively, and dig into the details of installation, configuration, and execution. This experimentation comes in handy when we discuss trends in the enterprise. Typically, we neglect processes when we do small-scale experiments, but that is a matter for another time. Besides, it is really fun to play with new technologies and understand the direction they are heading in.

My personal machine

I wanted to run virtual machines on my system instead of messing with my own machine. Because I multiplex a lot, I want a large enough server that I can keep all the VMs running instead of waiting for them to come up when I need them.

Since my basic requirement is a large amount of memory, I settled on the Sabertooth X79 mobo. It can support 64GB, which is good enough to run at least 8 VMs simultaneously. Someday, I can convert it to my private cloud instance, but until then, I can use it as my desktop machine with a lot of personal VMs running.

image

I have two 27” monitors ordered off eBay, directly from Korea. Each monitor, costing $320, offers 2560×1440 resolution with a stunning IPS display – the same kind of display as in a Samsung Galaxy, but with a large 27” diagonal size. These days, you can get them even from Newegg.

To support these monitors, you need dual-link DVI – two ports, one per monitor. They do not support HDMI, and VGA would negate all the benefits of such a high resolution. A reasonable consumer-grade card is built around the GeForce GT 640, of which there are several variants.

Finally, I used the PCPartPicker site (http://pcpartpicker.com/p/23MXv ) to put together all my parts; it showed whether my build was internally compatible, and it helped me pick the stores to buy the parts from. I ended up ordering from Newegg and Amazon for most needs. I already had the other needed peripherals – a Logitech mouse, a webcam, an MS keyboard, and so on – which I used for my new computer.

Software

For software, I opted for Windows 8.1, as I use office apps most of the time. I used ninite.com to install my apps – it can install all the needed free apps in one go. Here are some of the apps I installed that way: Chrome, Firefox, VLC, Java, WinDirStat, Glary, Classic Start, Python, FileZilla, PuTTY, Eclipse, Dropbox, Google Drive.

Since I needed to run VMs on this machine, I had a choice between VMware Player and VirtualBox. I opted for VMware Player.

My cloud machine

While the personal machine is interesting, it was only meant to free up my existing 32GB machine. The cost of such a machine, with the right components, is less than $1000. As for software, I had the choice of ESXi 5.5, XenServer 6.2, or Microsoft Hyper-V Server 2012 R2. All of them are free, which meets my budget.

I tried ESXi (the VMware vSphere hypervisor), which did not recognize the NIC on my motherboard. I tried inserting the driver from a previous release into the ISO, but even after it recognized the Realtek 8111 NIC, it still did not work. XenServer, on the other hand, worked perfectly on the first try. Since yesterday, I have been playing with Hadoop-based Linux VMs in this setup.

If you want to try

It is fairly cheap to have your own private setup to experiment. Here is what you can do:

  1. Get yourself a decent quad-core machine with 32 GB. You do not need a DVD drive and the like. Add a couple of 3TB disks (the best is the Seagate Barracuda, at the right price). If you can, get a separate NIC (the Intel Pro/1000 is preferred, as it is the best supported).

    Here is one that is configured: http://pcpartpicker.com/p/34gg0 just to show how low you can go. For around $1K, you can even get one from http://www.avadirect.com/silent-pc-configurator.asp?PRID=27809 as well.

  2. Install XenServer on the machine. It is nothing but a custom version of Linux with Xen virtualization. You can log in to it like any Linux machine. The basic interface, though, is a curses-based interface for managing the network and other resources.

    [Image courtesy: http://www.vmguru.nl/ – mine was the 6.2 version and looks the same. From version 6.2, it is fully open source.]
  3. On your laptop, install XenCenter, which is the management client for it. XenCenter is a full-fledged client, with a lot of whizbangs.
    image
    It has support for getting to the console of each machine and other monitoring help. We can use it to install machines (from a local ISO repo), convert from VMDK to OVF format for importing, and so on.
  4. It is best to create machines natively, as conversion is a little error-prone. I created a custom CentOS 6.4, 64-bit machine and used it as my minimal install.
  5. When I installed it, the installation did not allow me to choose a full install. That is, it ended up installing only basic packages. I did the following to get a full install:
    1. The console doesn’t seem to offer the right support for X. So, I wanted to run a VNC server, with the client running on my Windows box.
    2. I installed all the needed RPMs directly from the CDs, using the following commands:
      1. I added the CDROM as a device for the YUM repo. All I needed were a few edits in the yum.repos.d folder.
      2. I mounted the CDROM on Linux (“mount /dev/xvdd /media/cdrom” : notice that the cdrom device is available as /dev/xvdd).
      3. I installed all the needed software in one go: yum --disablerepo=\* --enablerepo=c6-media groupinstall "Desktop" "Desktop Platform" "X Window System" "Fonts"
    3. I enabled the network and assigned a static IP.
  6. I installed a VNC server and customized it to open at my display size of 2560×1440.
  7. In the end, I removed the peripherals, made the server headless, and stuck it in the closet. With wake-on-LAN configured, I never need to visit the server physically.

In the end, you will have a standard machine to play with and a set of minimal installs to experiment with from your XenCenter.

What you can do with it

Now, this is not a full private data center. For instance, you don’t have spare machines to migrate the VMs to, set up complex networking among the machines, or connect storage to compute servers. Even so, you can do the following activities:

  1. Set up a sample Hadoop cluster to experiment with: It is easy enough to start with the Apache Hadoop distribution itself so that you can understand the nitty-gritty details. There are simple tasks to test out the Hadoop cluster.
  2. Set up a performance test bed for different NoSQL databases, and run the performance tests. Of course, performance measurements under VMs cannot be trusted as valid, but at least you will gain expertise in the area.
  3. Set up a machine to experiment with Docker.

At least, I am going to do these tasks for the rest of the week, before I head out on the road.

Feb 23, 2014
 

There is a lot of interest in moving applications to the cloud. Considering that there is no unanimous definition of the cloud, most people do not understand the right approach to migrating to it. In addition, the concept of migration itself is complex, and what constitutes an application is also not easy to define.

There are different ways to interpret the cloud. You could have a private or a public cloud. You could have just a data center for hire, or a full-fledged, highly stylized platform. You could have managed servers, or instead measure in terms of computing units without seeing any servers.

As we move applications to these different kinds of clouds, we see different choices in the way we move them.

Moving a simple application

Let us consider a simple application.

image

The application is straightforward. Two or three machines run different pieces of software and provide a web-based experience to the customers. Now, how does this application translate to the cloud?

As-is to as-is moving

Technically, we can move the machines as-is to a new data center, which is what most people do with the cloud. The notable points are:

  1. To move to the “cloud” (in this case, just another data center), we may have to virtualize the individual servers. Yes, we can potentially run whatever OS on whatever hardware, but most cloud companies do not agree. So you are stuck with x64 and, possibly, Linux, Windows, and a few other x64 OSes (FreeBSD, illumos, SmartOS, and variants of Linux).
  2. To move to the cloud, we may need to set up the network appropriately. Only the web server needs to be exposed, unlike the other two servers. Additionally, all three machines should be on the same LAN for high-bandwidth communication.
  3. While all the machines may have to be virtualized, the database machine is something special. Several databases, Oracle included, do not support virtualization. Sure, they will run fine in VMs, but the performance may suffer a bit.
  4. In addition, databases have built-in virtualization. They support multiple users and multiple databases, with (limited) guarantees of individual performance. A cloud provider may offer “database as a service”, which we are not using here.

In summary, we can move applications as-is to as-is, but we still may have to move to the x64 platform. Other than that, there are no major risks associated with this move. The big question is, “what are the benefits of such a move?” The answer is not always clear. It could be a strategic move; it could be justified by the collective move of several other apps. Or, it could be the right time, before making the next investment commitment to the data center.

Unfortunately, moving applications is not always this easy. Consider a slightly more complex version of the same application:

image

Let us say that we are only moving the systems within the dotted lines. How do we do it? We will discuss those complexities later, once we understand how we can enhance the move to treat the cloud like a true cloud.

Migration to use the cloud services

Most cloud providers offer many services beyond infrastructure. Many of these services can be used without regard to the application itself. Incorporating them into existing processes, and adding new processes to support the cloud, can improve the business case for moving to the cloud. For instance, these services include:

image

Changes to these processes and tooling are not specific to one application. However, without changing these processes and ways of working, the cloud will remain just another data center for IT.

Migration to support auto-scaling and monitoring

If we go one step further, by adjusting the non-functional aspects of the applications, we can get more out of the cloud. The advantage of the cloud is the ability to handle elasticity of demand. In addition, paying only for what we need is very attractive for most businesses. It is a welcome relief for architects who are asked to do capacity planning based on dubious business plans. It is an even bigger relief to infrastructure planners who chafe at the vague capacity requirements from architects. And it is a much bigger relief for the finance people who need to shell out for the fudge factor built into capacity by the infrastructure architects.

But, all of that can work well only if we make some adjustments in the application architecture, specifically  the deployment architecture.

How does scaling happen? In vertical scaling, we just move to a bigger machine.

image

The problem with this approach is that the cost of the machine goes up dramatically as we scale up. Moreover, there is a natural limit to the size of a machine. If you want disaster recovery, you need to add one more of the same size. And, with upgrades, failures, and other kinds of events, large machines do not work out economically.

Historically, that was not the case. Architects preferred scaling up, as it was the easiest option, and investments in hardware went towards scaling up the machines. Still, the new internet companies could not scale vertically; the machines weren’t big enough. Once they figured out how to scale horizontally, why not use the most cost-effective machines? Besides, a system might require the right combination of storage, memory, and compute capacity. With big machines, it wasn’t possible to tailor to the exact specs.

Thanks to VMs, we could tailor the machine capacity to the exact specs. And with cheaper machines, we could create the right kind of horizontal scaling.

image

However, horizontal scaling is not so easy to achieve. Suppose you are doing a large computation – say, factorization of a large number. How do you do it on multiple machines? Or what if you are searching for an optimal path through all fifty state capitals? Not so easy.

Still, several problems are easy to scale horizontally. For instance, if you are searching for records in a large set of files, you can do the searching on multiple machines. Or, if you are serving web pages, different users can be served from different machines.

Considering that most applications are web-based, they should be easy to scale. In the beginning, scaling was easy: none of the machines shared any state, so no communication among the machines was required. However, once the J2EE marketing machine moved in, application servers ended up sharing state. There are benefits, of course. For instance, if a machine goes down, the user can be seamlessly served from another machine.

Oracle iPlanet Web Server 7.0

(Image courtesy: http://docs.oracle.com/cd/E19146-01/821-1828/gcznx/index.html)

Suppose you introduce a machine or take out a machine. The system should be adjusted so that session replication can continue to happen. What if we run one thousand machines? Would the communication work well enough? In theory it all works, but in practice it is not worth the trouble.

image

Scaling to a large number of regular machines works well with stateless protocols, which are quite popular in the web world. If an existing system does not support this kind of architecture, it is usually possible to adjust it without wholesale surgery on the application.
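As a rough sketch of what “stateless” means in practice (Express-style, with a placeholder for a shared session store such as Redis or a database): the web tier keeps nothing in its own memory, so any machine behind the load balancer can serve any request, and machines can be added or removed freely.

    const express = require('express');
    const app = express();

    // Placeholder for an external session store shared by all web servers
    // (for example Redis or a database); nothing lives in one machine's memory.
    const sessionStore = {
      async get(sessionId) { /* look the session up in the shared store */ },
      async put(sessionId, data) { /* write it back to the shared store */ },
    };

    app.get('/cart', async (req, res) => {
      // The session id travels with the request (cookie or header), not with the server.
      const sessionId = req.headers['x-session-id'];
      const session = (await sessionStore.get(sessionId)) || { items: [] };
      res.json(session.items);
    });

    // Because no request depends on which machine served the previous one,
    // identical servers can be added or removed behind the load balancer at will.
    app.listen(3000);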

Most data centers do monitoring well enough. However, in the cloud, monitoring is geared towards the maintenance of a large number of servers; there is greater automation built in, and a lot more of it is log-file driven. Most cloud operators provide their own monitoring tools instead of implementing the customer’s choice of tools. In most cases, by integrating with their tools (for instance, log file integration and event integration), we can reduce the operational costs of the cloud.

Migration to support cloud services

If you have done all that I told you to – virtualize, move to the cloud, use auto-scaling, use the monitoring – what is left to implement? Plenty, as it turns out.

Most cloud providers offer a lot of common services. Typically, these services work better at scale, and they implement well-defined protocols and well-understood needs. For instance, AWS (Amazon Web Services) offers the following:

image

Given this many services, if we migrate machine-for-machine, we might use only EC2 and EBS. Using the higher-level services not only saves money and time, but eventually gives us access to trained engineers and third-party tools.
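
As a small example of going beyond a machine-for-machine move: instead of running our own file servers on EC2 with attached EBS volumes, we can hand the storage problem to S3. A minimal boto3 sketch; the bucket and key names are made up, and credentials are assumed to be configured.

import boto3

s3 = boto3.client("s3")

# Producer side: publish a nightly export to the managed storage service.
s3.upload_file("reports/daily.csv", "my-company-reports", "2014/04/14/daily.csv")

# Consumer side: any number of machines can fetch it; no shared disk required.
s3.download_file("my-company-reports", "2014/04/14/daily.csv", "/tmp/daily.csv")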

Re-architecting a system using these services is a tough task. In my experience, the following order provides the best bang for the buck.

image

The actual process of taking an existing application and moving it to this kind of infrastructure is something that we will address in another article.

Re-architecting for the cloud

While there may not be enough justification for re-architecting every application, for some kinds of applications it makes sense to use the platforms that the cloud providers offer. For instance, Google's cloud offers a seductive platform for exactly this kind of application development. Take the case of providing an API for product information that your partners embed on their sites. Since you do not know what kind of promotions your partners are running, you have no way of even guessing how much traffic is coming. In fact, you may need to scale up really quickly.

If you are using, say, Google App Engine, you won't even be aware of the machines or databases. You would write to the App Engine runtime and its datastore (Bigtable-backed) APIs. Similarly, if you are using the platforms provided by vendors such as Facebook or SFDC, you will not think of machines at all. Your costs will truly scale up or down without your actively planning for it.
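
To make that concrete, here is roughly what a handler looked like on the App Engine Python runtime of that era (webapp2 plus the ndb datastore API). The Product model and the /products route are made up; the point is that nothing in the code refers to machines, connection strings, or capacity.

import json
import webapp2
from google.appengine.ext import ndb

class Product(ndb.Model):
    name = ndb.StringProperty()
    price = ndb.FloatProperty()

class ProductApi(webapp2.RequestHandler):
    def get(self):
        # The datastore and the instances scale behind the scenes.
        products = Product.query().fetch(20)
        self.response.headers["Content-Type"] = "application/json"
        self.response.write(json.dumps(
            [{"name": p.name, "price": p.price} for p in products]))

app = webapp2.WSGIApplication([("/products", ProductApi)])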

However, these platforms suit only certain application patterns. For instance, if you are developing heavy-duty data transformations, a standard App Engine setup is not appropriate.

Creating an application for a specific cloud or platform requires designing the application to make use of what that platform provides. By supplying a standard language runtime, libraries, and services, the platform can lower the cost of development. I will describe the standard cloud-based architectures and application patterns some other day.

Complexities in moving

Most of the complexities come from the boundaries of applications. You saw how many different ways an application can be migrated if it is self-contained. Now, what if there are a lot of dependencies, or communication between applications?

Moving in groups

All things being equal, it is best to move all the applications at once. Yet, for various reasons, we usually move only a few apps at a time.

image

If we are migrating applications in groups, we have to worry about the density of dependencies, that is, the communications among the applications. Broadly speaking, communication between apps can happen in the following ways.

Exchange of data via files

Many applications operate on imports and exports (with transformation jobs in between). Even when we move only a subset of applications, it is easy enough to keep this file-based communication going. Since file-based communication is typically asynchronous, it is easy to set up for the cloud.

image

Exchange of data via TCP/IP based protocols

In some cases, applications communicate via standard network protocols. Two applications may exchange XML over HTTP, or they may use other protocols directly over TCP/IP: X Window applications talk to the X server over TCP/IP, and some applications still use old RPC protocols. While these protocols are not common anymore, we might still encounter such communication between applications.

image

To allow this communication to continue, we need to set up the firewalls to permit it. Since we know the IP addresses of the endpoints, the specific ports, and the specific protocols, we can usually write effective firewall rules. Or, we can set up a VPN between the two locations.

Network throughput is usually easy to handle; in most applications, the throughput requirements are not very high. However, it is very common to have a low-latency requirement between applications. In such cases, we can consider a dedicated network connection between the on-premise data center and the cloud data center. In several ways, this is similar to running multi-location data centers.

Even with a dedicated line, we may not be fully out of the woods; we may still need to reduce latency further. In some cases, we can deal with it by caching and similar techniques. Or, better yet, we can migrate to modern integration patterns such as SOA or a message bus built on middleware.

Exchange of data via messages or middleware

If we are using middleware to communicate, the problem becomes simpler. Sure, the apps still need to talk to each other, but all the communication goes through the middleware. Moreover, middleware vendors already deal with integrating applications across continents, across data centers, and across companies.

image

An ESB, or any other variant of middleware, can handle a lot of integration-related complexity: transformation, caching, store-and-forward, and security. In fact, some modern integration systems are specifically targeted at integrating with the cloud, or at running the integration layer in the cloud. Most cloud providers offer their own messaging systems that work not only within their clouds, but also across the Internet.
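
For instance, a managed cloud queue can stand in for (or sit alongside) an on-premise message bus. A minimal sketch using AWS SQS via boto3; the queue name is made up, and credentials are assumed to be configured.

import json
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="order-events")["QueueUrl"]

# On-premise producer:
sqs.send_message(QueueUrl=queue_url,
                 MessageBody=json.dumps({"order_id": 1001, "status": "shipped"}))

# Cloud-side consumer; store-and-forward means messages wait until read.
response = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=10)
for msg in response.get("Messages", []):
    print(json.loads(msg["Body"]))
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])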

Database based communication

Now, what if applications communicate via a database? For instance, an order processing system and an e-commerce system may share a database. If the e-commerce system is in the cloud, how does it communicate with the on-premise system?

image

Since this is a common problem, there are several specialized tools for DB-to-DB sync. If the application doesn't require real-time integration, syncing the databases is easy. Real-time or near-real-time integration between databases requires special (and often expensive) tools. A better way is to handle the issue at the application level, which means planning for asynchronous integration.

Conclusion

Moving applications to the cloud opens up many choices, each with its own costs and benefits. If we do not understand the choices and treat every kind of move as equal, we risk not getting the right ROI from the move. In another post, we will discuss a cloud migration framework: how to create the business case, and how to decide which application should migrate to which cloud and to what target state.

Feb 182014
 

I attended Strata, a big data conference, last week (Feb 11-13) in Santa Clara, CA. Over the years, it has become big. This year, it can be said to have gone mainstream: there are a lot of novices around. I wanted to note my impressions for those who would have liked to attend the conference.

Exhibitors details

The conference exhibitors can be distributed into these groups:

clip_image002

As you can see, Hadoop is the big elephant in the room.

Big picture view

Most of the companies, alas, are not used to the enterprise world. They are from the valley, not from the plains where much of this technology could be used profitably. Even in innovation, there are only a few real participants; most of the energy goes into minute increments in the usability of the technology. Only a few companies are addressing the challenge of bringing big data to mainstream companies that have already invested in a plethora of data technologies.

The established players like Teradata and Greenplum would like you to see big data as operating alongside their technologies. They position big data as relevant in places, and they provide mechanisms to use it in conjunction with their products: they build connectors and provide seamless access to big data from their own ecosystems.

clip_image003

[From Teradata website.]

As you can see, the center of Teradata's world is solidly its existing database product(s).

The newcomers like Cloudera would like to upend that equation. They compare the data warehouse to a big DSLR camera and big data to a smartphone: which gets used more? While the data warehouse is perfect for some uses, it is costly, cumbersome, and doesn't get used in most places. Big data, by contrast, is easy, with a lot of advances in the pipeline to make it easier still. Their view is this:

clip_image005

[From Cloudera presentation at Strata 2014].

Historically, in place of the EDH (enterprise data hub), all you had was some sort of staging area for ETL or ELT work. Now, they want to enhance it to include a lot more "modern" analytics, exploratory analytics, and learning systems.

These are fundamentally different views. While both see big data systems co-existing with the data warehouse, the new companies see them taking on an increasing role in ETL, analytics, and other services, whereas the old players see big data as an augmentation to the warehouse when unstructured or very large data volumes are present.

As an aside, at least Cloudera presented their vision clearly. Teradata, on the other hand, came in with marketese that offers little real information on their perspective; I had to dig through several pages to understand their positioning.

A big disappointment is Pivotal. They have ceded leadership in these matters to other companies. Considering their leadership in Java, I expected them to extend MapReduce in multiple directions; that job has been taken up by the Berkeley folks with Spark and other tools. With their lead in Greenplum HD, I thought they would define the next-generation data warehouse. They have a concept called the data lake, which remains merely a concept: none of the people in the booth could articulate what it is, how it can be constructed, how it is different, and why it is interesting.

Big data analytics and learning systems

Historically, the analytics field has been dominated by descriptive analytics. The initial phase of predictive analytics focused on getting the right kind of data (for instance, TIBCO was harping on real-time information to predict events quickly). Now that we have big data, the problem is not so much getting the right data as computing on it fast. And not just computing fast, but having the right statistical models to evaluate correlations, causation, and other statistical questions.

clip_image006

[From Wikipedia on Bigdata]

These topics are very difficult for most computer programmers to grasp. Just as we needed an understanding of algorithms to program in the early days, we need knowledge of these techniques to analyze big data today. And just as libraries that codified the algorithms made them accessible to any programmer (think of when you had to hand-code the data structure for an associative array), a new crop of companies is creating systems to make the analytics accessible to programmers.

SQL in many bottles

A big problem with most big data systems is the lack of relational structure. Big data proponents may rail against the confines of relational schemas, but they are not going to win a fight against SQL. A lot of third-party systems assume SQL-like capabilities from the backend, and a lot of people are familiar with SQL. SQL is remarkably succinct and expressive for many natural operations on data.

A distinct trend is to slap a SQL interface onto non-SQL data. For example, Presto does SQL on big data; Impala does SQL on Hadoop; Pivotal has HAWQ; Hortonworks has Stinger. Several of them modify SQL slightly to make it work with reasonable semantics.

Visualization

The big data conference is big on visualization. The key insight is that visualization is not merely something that enhances analytics or insight; it is itself a facet of analytics, itself an insight. Proper visualization is key to many other initiatives:

  1. Design time tools for various activities, including data transformation.
  2. Monitoring tools on the web
  3. Analytics visualization
  4. Interactive and exploratory analytics

The big story is D3.js. How a purely technical library like D3.js has become the de facto visualization library is something that we will revisit some other day.

D3.jpg

Summary

I am disappointed with the state of big data. A lot of companies are chasing the technology end of big data, with minute segmentation. The real challenge is adoption in the enterprise, where the endless details of big data and too many choices increase the complexity of solutions. These companies are not able to tell businesses why and how they should use big data. Instead, they collude with analysts, the media, and a few well-publicized cases to drum up hype.

Still, big data is real. It will grow up. It will reduce the cost of data so dramatically that it will support new ways of doing old things. And, with the right confluence of statistics and machine learning, we will see the fruits of big data in every industry – that is, doing entirely new things in entirely new ways.

Nov 042013
 

The usability of banking took a big leap with the invention of the branch office. As far back as the 12th century, the Templars created a network of branches, taking the banking business to wherever there was a need – for instance, to the Middle East and England. They allowed the movement of funds, currency conversion, and ransom payments to happen smoothly in those days.

Closer to our times, a prominent feature of a Wild West town was an imposing bank building. This branch office provided a lifeline of credit to the local citizens. Along with the railroad, the post office, the church, the school, and the newspaper, the bank branch provided the underpinnings of civilization. The building was meant to convey stability, a much-needed quality, since fly-by-night operators running off with depositors' money were common in those days. Bank of America is supposed to have spearheaded the growth of satellite branches in the US.

Copyright: www.jaxhistory.com

In the 20th century, with the advent of the telephone, traditional banking got extended slightly. Unlike before, you did not need to go to the bank for certain kinds of transactions: you could call and enquire about the status of a transaction, or even initiate one. Still, if you needed cash, you went to the bank.

Credit cards changed the situation quite drastically, starting in the 60's. You could charge purchases to the card. In a way, this created an alternative to traditional cash: you are not using cash, but, in a fashion, a letter of credit that the bank gave you in its place.

ATMs changed even that. You can get cash when you need it – in a mall, in a supermarket, at a train station, even in a casino. It truly untethered us from the branch office.

Internet: How it changed the banking

Considering that we can carry on transactions without ever visiting a branch office, do we really need branch offices at all? That would be the natural question to ask, looking at the trends.

As soon as the internet became reliable, traditional banks took a particular approach: they did not see it as a replacement for existing channels, but as yet another channel for serving customers. They created websites, exposing their transactional and query systems to consumers. As the technology, and its adoption, improved, they improved the websites. They even added mobile apps.

Discover-Bank

Today, a stated policy of innovation in banking might go something like:

  1. Enabling users to conduct a lot more transactions on the website.
  2. Enabling mobile users – that is, mobile apps – to conduct transactions.
  3. Offering a lot of analytical tools: transaction analysis, planning.
  4. Gamification to get users to behave in certain ways – for instance, improving savings rates, planning properly, and so on.
  5. Adding new products such as group banking.

In most cases, banks see these efforts as augmenting their traditional channels. In fact, the biggest effort these days is to reconcile the different channels. Integration of data (for example, seeing the same balance in the iPhone app and at the ATM) and integration of processes (for example, starting a wire transfer online and finishing it at the branch) are some of the challenges in this channel-unification effort.

Newer banks have taken a different route. Since they never established branch offices, they bypass that infrastructure and make it a virtue: they offer better interest rates, better application usability, and better customer service. For example, check out http://www.bankrate.com/compare-rates.aspx to see the best rates – they are offered by banks with no local branches. Bank Simple, which tries to offer a superior technology experience, gained more than $1B in deposits within a year of opening, without any track record.

Simple.com

[Simple.com’s mobile application].

Surprisingly, a bank's ability to attract customers is directly proportional to the number of branch offices it has in the neighborhood. [See: http://www.economist.com/node/21554746]. However, with changing demographics, wider adoption of technology, and pressure from other industries, the situation is changing.

Web 2.0: How it will change the banking

Whether banks view internet applications as another channel or as the primary channel, the focus has always been on improving their own applications: websites, mobile applications, internal applications. Yet the biggest financial innovation of the early internet, PayPal, does none of that.

[Using PayPal’s payment services: A workflow from PayPal developer site].

Technologically speaking, PayPal succeeded by taking the ball to where the game is, instead of insisting people come to its playground. It successfully integrated into several online storefronts. It is almost as if it set up ATMs all over the internet, at the moment of purchase.

When we look at other industries, we see the same trend. Instead of assuming the burden of developing every application consumers want, companies allow others to develop apps. With extreme segmentation, they let multiple groups build for and serve whichever segments those groups choose. In fact, several companies use APIs as a way to increase awareness among internal departments, external partners, and potential employees. They embrace this to such an extent that they even hold hackathons to create apps.

In the mid-90's, I read a paper called "It's bits, stupid", a take-off on Clinton's "It's the economy, stupid". The point was that the telephone companies controlled telephone applications from beginning to end. Want to introduce three-way calling? You need to change the switch code, change the telephone handsets, and so on. Want call hunting? Again, you need to change the code in the switch.

Compare that with the internet, which was only interested in pushing bits; building the actual apps was left to the ecosystem. The web, VoIP, Google Hangouts – all of these were the result of that openness. To think that SS7 could have been TCP/IP, or could even have assumed the same openness as TCP/IP, is unimaginable these days.

In fact, even in the staid old world of telephony, one of the most successful companies at creating an ecosystem is Twilio. Using its APIs, people have crafted applications ranging from customer service apps to SMS apps to helpdesk apps.

[Twilio has the ability to analyze calls – this app is put together on top of Twilio APIs. Copyright: Twilio.]

If banks are to embrace this way of participating in a larger ecosystem, they need to change the way they develop applications. They could take cues from successful companies like Twitter and Facebook. Twitter built its entire business through APIs that allow users to share stories and comment from within other applications; so did Facebook. Let us see how companies are embracing this philosophy of separating core APIs from apps.

API economy

When we look at companies that are successful at fostering an ecosystem where others can participate in developing applications, we find the following:

  1. They make it easy for others to use the APIs.
  2. The standard, routine, core portion of the logic is managed by the company. Customization and specialization are delegated to the ecosystem.
  3. They allow users to integrate the APIs into their own workflows and ways of working.

Even companies that are not interested in exposing APIs to the general public are interested in going this route for an internal audience. For one thing, in several large companies, different groups behave as perfect strangers – so all the standard techniques for getting developers to adopt your platform and APIs apply here. For another, the technical and engineering advantages increasingly favor this approach.

[How Netflix develops even internal apps using REST API’s. Copyright: Netflix].

We can analyze the API economy along two different trends:

Banking trends

For banks, APIs offer an interesting mix of privacy, convenience, security, and trust. For instance, PayPal offers privacy (the merchant need not know my card number) and trust (the merchant can trust that PayPal will pay out and handle any disputes). Stripe, the most popular choice with new web companies, offers both, without the merchant bearing the burden of tracking payments or the regulatory compliance of storing card numbers.

The tug-of-war we see these days is between these two: trust and privacy. A lot of people hate PayPal because they do not trust its track record as an arbitrator; that is, it protects privacy, even at the expense of trust. Cash, for example, offers a good balance between trust and privacy, but it is not convenient. Bitcoin offers near-perfect anonymity, and a little less trust. Banks offer a great deal of trust, but a little less anonymity.

[Does popularity = trust? At least in Bitcoin case, it seems to be so.]

The current generation is losing its trust in governments. With the rise of citizen journalism, governments are seen as cynical at best and corrupt at worst. Banks, aligned with governments through fiscal policy, are tainted by the same guilt. While their current business does not suffer, and even future commercial and high-net-worth business may not suffer, individuals may eventually find alternatives to banking.

Hopefully, with the right APIs, banks will relinquish some of the power they hold, and for which they are blamed. If all I am doing is facilitating a payment, then I cannot be held responsible for the application built on it, correct? While the laws catch up to the creative frenzy of the internet, banks can focus on providing safe, proven, trusted, and secure services.

Incidentally, banks already offer APIs, whether in proper technical form or not. They work with tax prep software to provide tax details. They work with aggregators like mint.com, sigfig.com, and yodlee.com, which pull user account details for analytics. Most of these aggregators built their own solutions to get account details from the banks, but a lot of those solutions are brittle, since they lack support from the banks.

Mint.com example

[Mint.com pulled the information from two accounts here, E*Trade and Fidelity, and is showing the analysis.]

Technical trends

Loosely speaking, APIs are SOA made easy for app development. Most modern APIs are simply JSON over HTTP. Typically, they are used directly from the web page by:

  • including the JS library
  • calling the API (over HTTP)
  • parsing the result and displaying it (sometimes the JS library includes standard display components as well)

For instance, consider this API for Stripe, a payment company:

<form action="" method="POST">
  <script
    src="https://checkout.stripe.com/v2/checkout.js" class="stripe-button"
    data-key="pk_test_czwzkTp2tactuLOEOqbMTRzG"
    data-amount="2000"
    data-name="Demo Site"
    data-description="2 widgets ($20.00)"
    data-image="/128x128.png">
  </script>
</form>

Here, we include the Stripe checkout.js library and pass all the needed information along with that call. The result should look like this:

image

In this scenario, the credit card number doesn't even touch the local system; the card information is handled entirely by Stripe. That takes most of the PCI compliance burden off this site.
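
The same JSON-over-HTTP style is just as easy to consume from server-side code as from the browser. Here is a minimal Python sketch against a hypothetical partner API; the URL, token, and fields are made up.

import requests

resp = requests.get(
    "https://api.example-bank.com/v1/accounts/12345/balance",  # hypothetical endpoint
    headers={"Authorization": "Bearer MY_API_KEY"},
    timeout=5,
)
resp.raise_for_status()
balance = resp.json()   # e.g. {"currency": "USD", "available": 1523.07}
print(balance["available"])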

Architecturally, applications are converging to this broad pattern:

image

In this picture, the backend services are exposed through APIs. With the rise of HTML5 and front-end MVC frameworks, the architecture starts to look like this:

image

What this means is that the APIs can be consumed directly by the browser-based application; we do not really need server-side page creation at all. For instance, I can develop a static shopping-mall application with the ability to track users, send mails, take payments, and integrate with the warehouse, all from within the browser, without writing any server-side code!

This paradigm is becoming so successful that several companies now cater to developing, documenting, managing, and delivering APIs:

  1. Apigee: An API management and strategy company. They have raised close to $107 million so far. Their strategy focuses especially on mobile application development on top of APIs.
  2. Mashery: Competition to Apigee. They only (!) raised $35 million, but they have been at this game far longer.
  3. Layer7: They are extending their SOA governance offering to API management and governance.
  4. Apiary: This company offers services to collaboratively design and develop APIs. They generate documentation and test services from the API description. They have a nice site, http://apiblueprint.org/, that describes API development and offers several services for free.
  5. Apiphany: Acquired by Microsoft, this company will serve API management within the Azure family.

There are several other companies that have entered this already crowded market. If history is any indication, eventually, the technologies, tools, and skills that these companies are developing will become available for enterprises at competitive prices.

Other industries: How they are embracing APIs

These API management companies provide only a limited perspective on API development. To truly embrace API-based technologies and solution design, we should look at the current generation of technology companies. The website http://leanstack.io/ describes how cutting-edge technology solutions are built using APIs offered by other companies. For instance, the highly successful Pinterest uses the following services:

image

As you can see, several of these cloud services are available as APIs to integrate into applications. Google Analytics lets apps track users. Qubole is used for big data services. SendGrid lets apps send mail.

In the current crop of companies, there are several services cheap enough and modern enough for banks to integrate into their applications. They can reduce the effort of developing comprehensive solutions and increase customer satisfaction. For example, RightSignature offers an easy way to get documents signed, with support for integration via APIs. HubSpot provides APIs for its inbound marketing services. Qualaroo lets you design, target, and host surveys for your users easily. Spnnakr lets you offer segmented pricing.

Summary

Banking is evolving. By focusing on the essential services, banks can foster new innovation from the community of users and companies. Technology today is embracing APIs as the way to integrate services from different providers into new consumer applications. Banks may not be able to create such an ecosystem by themselves, but they can participate in already existing ecosystems. By providing the right technology support via APIs, banks can offer solutions that meet the needs of a diverse audience with different demands on privacy, convenience, security, and trust.

Oct 212013
 

This post has nothing to do with whether "Obamacare" is good or bad. It is only a discussion of the technology stack and its details.

At $634M, it is one of the costlier government projects. At launch, it failed for many users. Even now, the estimate is that 5M lines of code need to change to fix the system. What is going on?

The problem, viewed from one angle, is simple: let users discover and enroll in a particular plan, at a large scale. Since the users do not have other options, you can dictate terms to them (you could even say they need a specific browser version to use the site). Looks easy enough.

Viewed from another angle, the problem is complex. There are any number of plans and several different exchanges. The eligibility criteria are complex. There are different agencies, databases, and vendors involved, and the integration is bound to be complex. So, overall, it is, well, complex.

To top it off, it is overseen by a government agency. These people are good at sticking to rules, procurement, checklists, and so on. If they check for the wrong things, the site can meet all the requirements and yet fail.

Tech stack details:

The tech stack is modern. They rely on JS in the browser.

Performance issues

Summary: they are making a lot of rookie mistakes in optimizing the pages for performance. With minimal effort, they could take care of most of the issues.

  1. They seem to use 62 JS files on each page. They should use fewer, minified files to reduce the round trips. With 62 files, and without expires headers at that, we are looking at around 15 round trips, which means around 5 seconds of loading time by itself (assuming .3 sec per round trip plus processing).
  2. The page is heavy! The JavaScript is 2 MB, the CSS is .5 MB, and the images are .25 MB. Overall, the browser needs to download 2.75 MB just to start working.
  3. For a returning user, the situation is only marginally better: the browser still makes 85 requests (that is the number of components), but only needs to download .5 MB.

If experienced folks had developed this site, they could easily have reduced the load time to less than 1 second (a 5-fold improvement).

Code quality issues

First the tech stack details.

  1. Bootstrap
  2. JQuery
  3. Backbone
  4. Underscore
  5. JSON
  6. JQuery UI (Why?)
  7. Parts of Scriptaculous

Their stack is not to blame. They make heavy use of APIs. They use Bootstrap (version 2.3), jQuery, Backbone, Underscore, and JSON. I think Backbone is too complex a technology (I am partial to Angular, or a lot of the other modern JS frameworks), while the rest are simple enough. In the hands of novice developers, Backbone can get very convoluted. In fact, the same can be said of JS itself.

Let us take a look at code quality (what we can see in the JS files):

  1. Backbone is complex for these kinds of apps. For average developers especially, Backbone tends to be difficult to use.
  2. Check out the file https://www.healthcare.gov/marketplace/global/en_US/registration.js to understand how the code is laid out. They are not doing template-driven or metadata-driven development; there is too much hard-coded stuff. And look for "lorem ipsum" while you are at it (that shows poor test coverage, or unnecessary code). (This file may be auto-generated…)
  3. Use of too many technologies: this shows sub-contracting and no architectural oversight. For instance, if you are using Bootstrap, you might as well stick to it instead of pulling in jQuery UI. Also, a lot of JS components like the carousel are built into Bootstrap – why have separate ones anymore?
  4. A lot of the JS code seems to have been generated by MDA – that may account for some of the mess. Check out the MDA tool: http://www.andromda.org/index.html

At this point, I don't have much information. The GitHub pages seem to have some code, but it may not be what is used here. The GitHub version uses a static HTML generator (a good strategy) – but that is not what the current website does (the GitHub code seems to have been removed now).

Overall, it looks like high concept, reasonable execution, bad integration, and terrible architectural and design oversight.

I will return with more, when I get some time to analyze the site.

Aug 212013
 

We don't talk numbers enough in the software business. You walk into a store to buy a gallon of milk and you know the price. You ask a software consultant how much a solution costs and he says, "it depends". You cannot even get him to state his assumptions and give a reasonable price.

At IITM, I heard a story about a German professor and his style of evaluating exam papers. In one of the questions, a student made a mistake in an order-of-magnitude calculation: his answer was accurate but for a trailing zero. The student got a "0".

Now, in other courses, if the student understood what he needed to do and applied the formula, he would get partial credit even if he made a simple calculation mistake. Getting zero marks was a shocker. But what the German professor wanted was for the students to develop a good feel for the answer. For instance, if I were to calculate the weight (er…, mass) of a person and it came to 700 kg, that should ring some warning bells, right?

In my job, unfortunately, a lack of basic intuition about numbers holds people back. This ignorance shows up in places like sizing hardware, designing solutions, and creating budgets.

My inspiration for this post is the wonderful book "Programming Pearls", which talks about these kinds of back-of-the-envelope calculations.

Network Latencies

Suppose you are hosting a web application in San Francisco. Your users are in Delhi. What is the minimum latency you can expect?

The distance is around 7500 miles, or 12000 km. Light travels 186,000 miles per second, so the one-way trip takes around .04 seconds. But light travels around 30% slower in fiber than in vacuum. Also, the signal is not going to travel in a straight line – it will go through many segments, and there are relays and other elements that delay it. All in all, we can assume an effective one-way time of, say, .15 seconds. We need a round trip for any acknowledgement, so that makes it .3 seconds. A simple web page has around 15 components (images, fonts, CSS, and JS), which means around 4 round trips (most browsers fetch about four components at a time).

So, on a speed-of-light basis alone, the network is going to take up about 1.2 seconds. That is the number you need to add to whatever you measure when testing on your laptop.
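
Here is the same back-of-the-envelope calculation written out as a few lines of Python, using the rough numbers above:

import math

# Back-of-the-envelope latency estimate: San Francisco to Delhi.
distance_miles = 7500
speed_of_light = 186000          # miles per second, in vacuum

one_way_ideal = distance_miles / speed_of_light   # ~0.04 s, straight line in vacuum
one_way_real = 0.15              # fiber slowdown, indirect routing, relays
round_trip = 2 * one_way_real    # ~0.3 s

components = 15                  # images, fonts, CSS, and JS on a simple page
concurrent = 4                   # components fetched at a time by the browser
round_trips = math.ceil(components / float(concurrent))   # ~4

print(round_trips * round_trip)  # ~1.2 s before the server does any real work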

Developing Java code

I met a seasoned Java developer. He has been building systems for years and is well versed in optimization. We were reviewing some code, and I noticed that he was optimizing to create fewer objects.

How much time does it take to create an Object in Java? How much with initialization from String? How much time does it take to concatenate strings? To append to strings? Or, to parse an XML string of 1K size?

The usual answer from most people is "it depends". Of course, it depends on the JVM, the OS, the machine, and many other factors. My response was: pick your setup – your laptop, whatever JVM and OS you use, while the system is doing whatever it normally does. It is surprising how many people have no idea about the performance of their favorite language!

In fact, most of us who work with computers should have an idea about the following numbers.

Numbers everyone should know about computers

In a famous talk by Jeff Dean, he lays out the standard numbers that every engineer should know:

image
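
I cannot reproduce the slide here, but the commonly quoted values (approximate, and they drift with each hardware generation) are on the order of the following; putting them in code makes it easy to play with the ratios:

# Approximate latencies, in nanoseconds (order-of-magnitude values
# popularized by Jeff Dean's talks; treat them as rough guides).
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "branch mispredict": 5,
    "L2 cache reference": 7,
    "main memory reference": 100,
    "read 1 MB sequentially from memory": 250000,
    "round trip within a data center": 500000,
    "disk seek": 10000000,
    "read 1 MB sequentially from disk": 20000000,
    "packet round trip CA to Europe to CA": 150000000,
}

# Example implication: a lookup that goes to disk is ~100,000x slower
# than one served from main memory.
print(LATENCY_NS["disk seek"] / LATENCY_NS["main memory reference"])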

These may appear simple enough, but the implications for how we develop code are enormous. Consider some sample cases:

  1. If you had to design an authentication system for all of Google's users (ignore the security side of the question – think of it only as a lookup problem), how would you do it?
  2. If you had to design a quote server for the stock market, how would you speed it up?
  3. You are developing an e-commerce site for a medium retailer with 10000 SKUs (stock keeping units). What options would you consider for speeding up the application?

Naturally, these kinds of questions lead to some other numbers: what is the cost of the solution?

Cost of computing, storage, and network

There are two ways you can go about building infrastructure: the Lexis-Nexis way or the Google way. Lexis-Nexis is a search company that lets people search legal, scientific, and specialty data. Their approach is to build robust, fail-safe, humongous machines that serve the need. At the opposite end of the spectrum is Google, which uses white-box machines with stripped-down parts. I suppose that gives a new meaning to the phrase "lean, mean machine". (Incidentally, HDFS and the like take a similar approach.)

Our situation lies closer to Google's. Let us look at the price of some machines.

  1. A two-processor, 32-thread blade server with 256 GB of RAM is around $22,000. You can run 32 average virtual machines on this beast; even if you are going for high-performance VMs, you can run at least 8.
  2. If you are going for slightly lower-end machines (because you are taking care of robustness in the architecture), you have other choices. For instance, you can do away with ECC memory, and you can give up power management, KVM over IP, and so on for simpler needs (for example, internal POCs and labs). In that case, you can get a 64 GB machine with a 512 GB SSD and 5 TB of storage for around $3,000.

So, you have some rough numbers to play with if you are building your own setup. What about the cost of the cloud? There, pricing is usage-based and can get complex. Let us take a 64 GB machine with 8 hyper-threads. If we run it most of the time, what does it cost?

Amazon's cost tends to be on the high side. You pay by the hour, at around $1.30 per hour, which is roughly $1,000 per month. If you know your usage patterns well, you may be able to optimize it down to, say, $500 per month.

Or, you could use one of my favorite hosting providers, OVH, where a server like the above costs around $150 per month. Most other providers fall somewhere in between.
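
Putting these numbers side by side as a quick sketch (all figures are the rough approximations quoted above, not vendor quotes):

# Rough monthly cost for a ~64 GB machine, using the approximations above.
hours_per_month = 24 * 30

aws_on_demand = 1.3 * hours_per_month      # ~ $940/month, pay by the hour
aws_optimized = 500                        # with well-understood usage patterns
ovh_dedicated = 150                        # hosted dedicated server

# Owned blade: $22,000 up front, 256 GB; a 64 GB slice is about a quarter
# of the box, amortized here over 3 years (power, cooling, ops not included).
owned_blade_slice = 22000 / 4 / 36         # ~ $150/month, hardware only

print(round(aws_on_demand), aws_optimized, ovh_dedicated, round(owned_blade_slice))

Of course, the owned-hardware number excludes power, cooling, and people, which is where the next section comes in.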

Now, try a small experiment in understanding solution costs: say you want to create an e-commerce site that keeps the entire catalogue cached in memory. What is the cost of that solution?

To truly understand the cost of a solution, you also need to factor in the people cost: the effort to develop, operate, and support it.

TCO: Total cost of ownership

To understand the cost of operations and support, here is a rule of thumb: if you are automating tasks and reducing human intervention, you can assume that the cost of design and development ranges from $50 to $250. The variation is due to location, complexity of the systems, effort estimation variance, technology choices, and a few other details.

A few details worth noting:

  1. You can get a good idea of salaries by position and skill by looking at sites like dice.salary.com. Try indeed.com for a more detailed look based on the history of postings.
  2. To get the loaded labor cost for the solution, multiply the base cost per hour by 2. For instance, if you pay a $100K salary, that is $50 per hour (2000 hours per year), so the loaded hourly cost is going to be around $100.
  3. By choosing a good offshore partner, you can get the ops cost down to as low as $15 to $30.
  4. The cost of good design and architecture – as they say, it is priceless!

As for technology choices: which technology gets you a low TCO? What kinds of numbers can you use in your back-of-the-envelope calculations?

That will be the topic for some other day.

Aug 082013
 

Suppose you developed a website to show to others. You have it running on your laptop. What choices do you have?

You can set up a LiveMeeting. This is what we seem to do in most companies.

  • That means you are the one giving the demo. Unless you are there running the session, others cannot access the site.
  • If the audience wants to try out the website, you need to hand over control.

This is not a problem with LiveMeeting per se. Other paid tools like WebEx, and free ones like join.me or TeamViewer, suffer from the same issues.

After all, your goal is not just to demo the site by yourself, but to let others play with it.

Here is one alternative, using ngrok:

image

The beauty of this software is that it needs no installation – it is just one executable. You can even carry it on a thumb drive! It is around 1.7 MB zipped and about 6 MB as an executable.

The next step is to run it. Suppose I have a server running on port 8000. Here is the way I expose it to the world:

image

Now, what you can see is this: your port 8000 is proxied to the open web under the name http://65dc9ab7.ngrok.com.

Before going there, here is how the website looks locally:

image

Notice that the host is localhost. If your friends try that on their machines, they will not see these files.

Now, from some other machine, let us go to the http://65dc9ab7.ngrok.com address that ngrok gave us:

image

See! Identical!!

Here are some other fun things to do:

Suppose your users are playing around on the site. You can see all their interactions from http://localhost:4040/

Let us see how that looks. I am going to access 1.html, 2.html and a1.html via the ngrok proxy.

image

That is, you get to see who is accessing your site, where they are coming from, and what is happening. You can replay the entire web interaction too! This can be a simple yet powerful debugging tool.

Now, let us look at other use cases.

Serving files from your machine

Suppose you have a bunch of files that you want to share with others. You can do the following:

  1. Install Python (3.3.x will do nicely).
  2. Run the following command from the folder you want to share files from: python -m http.server. You can optionally specify a port number as well (a script equivalent is sketched below).
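
If you prefer to do the same thing from a script (so the port and directory are easy to control), the standard-library version looks roughly like this:

# Script equivalent of "python -m http.server": serves the current
# directory over HTTP on the given port.
from http.server import HTTPServer, SimpleHTTPRequestHandler

PORT = 8000
httpd = HTTPServer(("", PORT), SimpleHTTPRequestHandler)
print("Serving the current directory on port", PORT)
httpd.serve_forever()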

Logging into your machine from outside

Now, suppose you have a Unix machine behind the firewall and you want to be able to access it from outside. Here is what you can do.

  1. Install Shell In A Box: https://code.google.com/p/shellinabox/
  2. Run it – by default, it listens on port 4200.
  3. Now, run ngrok to proxy 4200.
  4. Presto! Instant access from the internet to log in to your Unix box.

Caution: your acceptable-use policies may vary. Please check with your system admins to find out whether you are allowed to run any of these programs.
