rama

 

Consider a simple web program. For convenience, we will assume it is written in Java. In its simplest form, it most likely looks like the following:

simple-db-backed-site

What are the implicit assumptions we are making?

  1. The database supports all the reads and writes in the system.
  2. If other systems need this information, they access it from the database.
  3. Any writes happen directly to the database.
  4. The data fits in the database, both structurally and in size.

Now, let us see what the issues with these assumptions are. It turns out that we can relax or change each of these constraints to get a different kind of system. For now, let us take a specific path so that I can illustrate a point. I will cover the other paths in other posts.

So, consider the case where the database is not very convenient to access. That is, while the data fits well in the RDBMS structurally (as relational tables), it needs to be viewed in other forms, most notably as objects. And, of course, going to the database for each and every access is costly.

Putting it another way, you need an object interface to the database to handle the data well.

Enter ORMs (Object Relational Mappers). A typical layout would be:

simple-website-with-orm

What does this mean?

  1. The Java applications handle the data through POJOs (plain old Java objects).
  2. The ORM translates between the POJOs and the relational tables of the RDBMS, using JDBC for communication.
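To make the ORM’s job concrete, here is a toy sketch of the translation it automates: a relational row becomes a POJO, and the POJO becomes a row again. The Customer class and its column names are hypothetical, and a Map stands in for one JDBC result row. A real ORM such as Hibernate derives this mapping from annotations or XML configuration and also generates the SQL.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical POJO: the object view the Java application works with.
class Customer {
    final long id;
    final String name;
    final String city;

    Customer(long id, String name, String city) {
        this.id = id;
        this.name = name;
        this.city = city;
    }
}

// Toy mapper: a Map stands in for one row of a hypothetical CUSTOMER table.
// An ORM generates this kind of translation (plus the SQL) for you.
class CustomerMapper {
    static Customer fromRow(Map<String, Object> row) {
        return new Customer(
                ((Number) row.get("ID")).longValue(),
                (String) row.get("NAME"),
                (String) row.get("CITY"));
    }

    static Map<String, Object> toRow(Customer c) {
        Map<String, Object> row = new HashMap<>();
        row.put("ID", c.id);
        row.put("NAME", c.name);
        row.put("CITY", c.city);
        return row;
    }
}
```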

Some popular choices for ORM are Hibernate and OpenJPA.

It all looks hunky-dory until you start seeing a performance hit. You realize that a simple object lookup ends up issuing a bunch of queries. Naturally, you want to add some caching to take care of this problem. Now, the picture looks like this:

simple-website-with-orm-cache

Summary: ORM could benefit from a cache.

Now let us look into this cache a little more deeply. If all we are looking for is to speed up the ORM, then we are fine with handcrafting a cache solution. It turns out that there are a lot of choices for a cache, as caching is a general-purpose paradigm.

Popular choices for a cache include Ehcache, among others.
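If you do handcraft the cache, a minimal sketch might look like this: a read-through cache with least-recently-used eviction, built on the JDK’s LinkedHashMap in access order. The loader function is an assumption that stands in for the ORM or database lookup, and the class is not thread-safe. A library such as Ehcache adds expiry, concurrency support, and much more.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// A minimal read-through LRU cache: on a miss it calls the loader
// (standing in for the ORM/database lookup) and remembers the result.
// Not thread-safe; the loader is assumed never to return null.
class ReadThroughCache<K, V> {
    private final Function<K, V> loader;
    private final Map<K, V> entries;
    private int loads = 0; // how many times we had to go to the backing store

    ReadThroughCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries in least-recently-used-first order,
        // so removeEldestEntry evicts the LRU entry once we exceed capacity.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    V get(K key) {
        V value = entries.get(key);
        if (value == null) {          // miss: load from the backing store and cache it
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    int loads() {
        return loads;
    }
}
```

For example, new ReadThroughCache<Long, Customer>(1000, id -> loadCustomer(id)), where loadCustomer is a hypothetical ORM call, would only reach the database on a cache miss.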

Now, here is an interesting observation: instead of a cache, we can even use a main-memory database! After all, it is tuned to live in main memory, it offers fast access, and it is well debugged.

If we have a main memory database as a cache, the picture would look like this:

simple-website-wth-orm-mmdb

Now, look at the result: you have a main-memory DB and an RDBMS, with your ORM syncing between them, a job it never signed up for. In fact, if you have a main-memory database, can you get away with eliminating the ORM completely, giving the following picture?

simple-website-with-mmdb

Of course, it is possible only if the following hold:

  1. The main-memory database should support all the data structurally.
  2. The main-memory DB should support all the data size-wise as well.
  3. Syncing between the MMDB and the RDBMS should be possible.
  4. The MMDB, in conjunction with the RDBMS, should support transactions (ACID properties and perhaps rollbacks).
  5. The Java program should be able to access MMDB fitting with the way Java programming is done.

Let us look at each of these areas and see how the picture changes:

MMDB and data structures

Standard RDBMSs support only tables. If you want to represent a matrix or a sparse array, you know the lengths you have to go to in an RDBMS. So, before you conclude that RDBMSs are structurally superior, pause and think about that.

Enough dissing of RDBMSs. Is there a way for MMDBs to support different structures? There is good news and bad news. Perhaps the bad news is the good news.

The bad news is this: there is no single, ubiquitous way in which all MMDBs represent data. That means you cannot assume that the data is going to be in tables, neatly described by metadata. As a side effect, there is no way to standardize access to MMDBs. After all, they store different structures.

The good news, as a corollary, is this: you can pick the DB that is best suited to your purpose. That means your program’s view of the data and your MMDB’s view of the data need not differ. That is a good state to be in.

MMDBs offer a few choices. Without being comprehensive, let me give you a couple of them.

Key-value databases

kv-db

Here, you can think of the whole database as one giant table (think BigTable; in fact, Google’s BigTable is a good example). You access your data in the MMDB just as you access a hash table: by key.

A few things to observe:

  1. You can store anything you want in the KV store. In this picture, I show the values stored as strings, but you can use other formats, say XML or serialized objects.
  2. You can retrieve only based on the key.
  3. It is not a general-purpose database – you cannot do general-purpose queries. The storage types are quite primitive.
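To make the “giant hash table” view concrete, here is a toy in-process key-value store. It is a sketch, not the client API of Memcached or Redis. It keeps string values in a ConcurrentHashMap, treats them as opaque, and can only find them by key. The customer:42 key-naming convention is an assumption for illustration.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Toy key-value store: the "giant hash table" view.
// Values are opaque strings (they could hold XML or serialized objects);
// the only way to find data is by its key.
class KeyValueStore {
    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();

    void put(String key, String value) {
        data.put(key, value);
    }

    Optional<String> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    void delete(String key) {
        data.remove(key);
    }
}
```

Note what is missing: there is no way to ask for all customers in a given city without scanning every value or maintaining your own index under separate keys.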

There are several examples of this kind of database: Memcached, Redis, and so on.

Document-style databases

One of the problems of a KV-style store is the opacity of the values. That is, the value is treated as just a value: you cannot peer into it, query it, or see it as a complex structure. Document-style DBs address that problem.

Let me illustrate it in the context of MongoDB, a document-style database:

mongodb-schema

Key points to note are:

  1. You are storing data in a very structured way. In fact, you are using JSON-style documents, which are nice to work with.
  2. You don’t have a fixed schema. You can keep any data associated with the object.
  3. I did not show you the queries, but you can query on just about any field.
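By contrast with the KV store, here is a toy document store. Each document is a schema-free map of fields, so you can look inside the values and query on any field. It is a linear-scan sketch for illustration only, not MongoDB’s query API. A real document database would use indexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy document store: each document is a schema-free map of fields,
// so, unlike a KV store, we can look inside values and query any field.
class DocumentStore {
    private final List<Map<String, Object>> docs = new ArrayList<>();

    void insert(Map<String, Object> doc) {
        docs.add(doc);
    }

    // Linear scan over all documents; documents lacking the field are skipped.
    List<Map<String, Object>> find(String field, Object value) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (doc.containsKey(field) && Objects.equals(doc.get(field), value)) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

For example, after inserting Map.of("name", "Ada", "city", "Detroit") and Map.of("name", "Lin", "city", "Detroit", "twitter", "@lin"), find("city", "Detroit") returns both documents. The second one has a twitter field the first lacks, and no schema change was needed to store it.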

This way of storing data is amazingly simple. Consider the alternative of storing it in a relational database: you would either store metadata in the database or create a large schema. Either one has problems. [This topic deserves a note of its own, someday.]

Notice that we are conflating two issues:

  1. Performance that comes from keeping the data in main memory.
  2. Supporting the right kind of data structures in the database.

For example, we can even keep relational data in main memory: a properly tuned Oracle database keeps a lot of table data pinned in memory.

So, for our needs, we can treat the structure of the data as the primary concern.

Distributing data: key to scalability

So far, we have been assuming that all the data resides in memory. What if the data doesn’t fit in the memory of one machine? The JVM, for instance, has well-known memory limitations: on 32-bit machines it can address at best 4 GB, and in practice considerably less; on 64-bit machines with a 64-bit JVM, making the heap very large brings many issues of its own (long garbage-collection pauses among them). So we may want to keep the database out of the JVM process.

There are other reasons to keep the database outside the process as well. For instance, what if you want to use the same DB from multiple apps? What if you want to back up and synchronize the database independently?

The best way to do that would be:

mapper

That is, you partition the data and go to a different data source (node) depending on the query. The mapper determines where to go for which data; think of it as a hash function.
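A minimal sketch of such a mapper follows. It assumes simple modulo hashing over a fixed list of nodes, and the node names are hypothetical.

```java
import java.util.List;

// Toy "mapper": decides which node holds a key by hashing it.
// Node names are hypothetical placeholders.
class HashMapper {
    private final List<String> nodes;

    HashMapper(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    String nodeFor(String key) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }
}
```

One known drawback of plain modulo hashing is that adding or removing a node remaps most keys. Consistent hashing is the usual remedy when the node count changes often.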

For example, let us say all the customer information is partitioned based on the first letter of the customer’s name. Then you don’t need an explicit mapper: each application can map the query to the node and get the information via a web service or something equally simple.
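The first-letter scheme can be sketched the same way. This toy partitioner assigns contiguous letter ranges to nodes (with two nodes, A to M and N to Z). It also counts how many names land on each node, which makes skew visible. The node count and the sample names are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Range partitioning on the first letter of a (non-empty) customer name.
// Every application can compute this itself, so no mapper service is needed.
class FirstLetterPartitioner {
    private final int nodeCount;

    FirstLetterPartitioner(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    int nodeFor(String name) {
        char c = Character.toUpperCase(name.charAt(0));
        int letter = (c >= 'A' && c <= 'Z') ? c - 'A' : 0; // non-letters go to node 0
        return letter * nodeCount / 26;                     // contiguous letter ranges per node
    }

    // How many names land on each node: this is where skew shows up.
    Map<Integer, Integer> load(List<String> names) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String name : names) {
            counts.merge(nodeFor(name), 1, Integer::sum);
        }
        return counts;
    }
}
```

For instance, with two nodes, new FirstLetterPartitioner(2).load(List.of("Anil", "Bala", "Chen", "Devi", "Quinn")) puts four names on node 0 and one on node 1.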

Of course, with this kind of partitioning a few questions will come up:

  1. What is an effective way of partitioning the data? (Think: not many people have names starting with Q.)
  2. How do we create partition tolerance? What if one of the nodes goes down? Can we have a different node answer the same query? If we have a different server, how does that synchronize with the original one?
  3. How do we support transactions? Can we support Atomicity? Consistency? Isolation? Durability?

It turns out that there is a theorem (the CAP theorem) showing that you can’t have all three: consistency of the data across all the nodes, availability of the data even if some nodes fail, and partition tolerance (the system keeps working even when the network between nodes fails and they cannot sync with each other). The real theorem is more subtle; read up on it if you want to know what it really says.

So, what do people do? Actually, the weaker guarantees of these databases are good enough for many interesting real-life situations. For example, say you uploaded a photo from home and your friend in India doesn’t see it for 2 minutes. Would you be upset? Or you commented on a thread and somebody else doesn’t see the comment for a while. Or your bulletin board occasionally doesn’t respond, showing a page that says “Please visit in a little while”?

All of these are examples of situations that, even with the limitations of CAP, can be addressed usefully with these kinds of databases.
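To make the photo example concrete, here is a toy model of asynchronous replication, one common way such databases give up immediate consistency. A write lands on the primary at once and reaches the replica only when replication runs. The class and method names are hypothetical, and replication is triggered by hand rather than by a background process.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of asynchronous replication: writes hit the primary immediately
// and are shipped to the replica later. Replica reads may be stale until
// replicate() runs, but both copies eventually agree.
class AsyncReplicatedStore {
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> replica = new HashMap<>();
    private final Queue<String[]> pending = new ArrayDeque<>();

    void write(String key, String value) {
        primary.put(key, value);
        pending.add(new String[] {key, value}); // queued for later replication
    }

    String readFromPrimary(String key) {
        return primary.get(key);
    }

    String readFromReplica(String key) {
        return replica.get(key);
    }

    // Apply queued writes in order (a real system would do this continuously).
    void replicate() {
        String[] update;
        while ((update = pending.poll()) != null) {
            replica.put(update[0], update[1]);
        }
    }
}
```

Right after write("photo:1", "beach.jpg"), readFromPrimary returns the photo but readFromReplica returns null. After replicate(), both agree. In this model, your friend in India is simply reading from the replica.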

What is the lesson for us?

Here is what we have seen so far:

For some specific situations, NoSQL databases can offer a simpler mode of development and higher performance for web applications.

If you are developing some expertise for yourself or your group, these are the areas that could help:

  1. Understand and classify the situations for which NoSQL databases work well. Create a flow chart for choosing the right technology components suitable for the problem context.
  2. Most enterprises are nervous about the risk – which NoSQL DB is going to survive? They recall the days of betting on Sybase only to ask for migration funding years later. So, create a risk mitigation strategy:
    1. By creating an API layer that is more semantic in nature and is implemented on more than one NoSQL DB.
    2. By creating a way of syncing the data with the backend Oracle or DB2.
  3. Tune the performance by adjusting the level of C, A, or P that we need. In fact, you will find that different databases make different CAP trade-offs. This knowledge is useful for recommending and tuning a NoSQL database for a specific need.
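As one concrete example of such a dial, Dynamo-style stores such as Cassandra and Riak let you choose, per request, how many replicas must answer. With N replicas, a write waits for W acknowledgements and a read consults R replicas. When R + W > N, every read set overlaps the latest write set. The helper below just does that arithmetic; its names are illustrative, not any database’s API.

```java
// Quorum arithmetic for N replicas, W write acknowledgements, R replicas read.
final class Quorum {
    private Quorum() {}

    // If read and write sets must overlap, at least one replica consulted
    // by every read holds the latest acknowledged write.
    static boolean readsSeeLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    // Writes still succeed while at most (n - w) replicas are down.
    static int writeFailuresTolerated(int n, int w) {
        return n - w;
    }

    // Reads still succeed while at most (n - r) replicas are down.
    static int readFailuresTolerated(int n, int r) {
        return n - r;
    }
}
```

With N = 3, choosing R = W = 2 gives overlapping quorums and tolerates one failed replica. Choosing R = W = 1 tolerates two failures but lets a read miss the latest write.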

In the later posts, I will drill down into these topics to see how we can put together the right solutions for the problems that we face.

 

I am not one to say “follow the crowd.” I am all for listening to a different drummer. But in our field we rely on so much publicly available code, knowledge, and support that it makes sense to understand where the world is.

Consider programming languages.

image

(Courtesy: TIOBE).

How should we interpret this list? It is compiled from several different web properties, with some weighting: how many searches, how many pages, how many blogs, how many tweets, etc. Obviously, it doesn’t tell the real story. Still, you know where the buzz is. It means either that a lot of innovation is happening or that there are a lot of issues with that language that people seek help with. By the way, I can’t explain why Ruby and Python are on a downward trajectory. Perhaps some consolidation of information is going on.

Let us take a look at another site: www.indeed.com, which measures job trends (not absolute numbers). You can go to http://www.indeed.com/jobtrends and try it yourself.

Here is the graph for Java, C#, Objective C, JavaScript, and Python.

image

This shows growth on an absolute scale, which may be unfair to languages that start from a smaller base. Let us take a look at relative trends:

image

What can you conclude from the relative trends? That iOS is fuelling Objective-C growth. Still, in my experience, these relative growth stories are short-lived.

So, what are the top trends in the job market these days? According to indeed.com, these are the top trends:

  1. HTML5
  2. Mobile app
  3. Android
  4. Twitter
  5. jQuery
  6. Facebook
  7. Social Media
  8. iPhone
  9. Cloud Computing
  10. Virtualization

Of course, as usual, this should be understood in context. It is the trend, not the absolute need. That is, today there may be 5 Twitter developers and employers may need 10 tomorrow. On the other hand, there may be 100,000 Java developers and employers may need 120,000 tomorrow. So, Twitter has 100% growth while Java has only a 20% growth rate. Where would you specialize if you wanted job security? Java, of course, considering that employers need 20,000 more people in Java versus 5 more in Twitter.
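The arithmetic behind this argument can be written as a tiny sketch, using the hypothetical numbers above:

```java
// Relative growth vs. absolute new openings.
final class JobGrowth {
    private JobGrowth() {}

    // Percentage growth from today's count to tomorrow's.
    static double growthPercent(int today, int tomorrow) {
        return 100.0 * (tomorrow - today) / today;
    }

    // Absolute number of new positions.
    static int newOpenings(int today, int tomorrow) {
        return tomorrow - today;
    }
}
```

growthPercent(5, 10) is 100.0 and growthPercent(100_000, 120_000) is 20.0, yet newOpenings gives 5 for Twitter versus 20,000 for Java.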

Coming back, let us see how to understand these trends. The way I read them is this: these are the skills gaining popularity for a market reason. Therefore, we have to understand why the market is going that way so that we can see how to solve those problems.

Let us focus only on web development for a second and see what the story is there. Take a look at the site www.builtwith.com. They too have a wonderful trends site, http://trends.builtwith.com, as well as a report for 2011.

Quoting from that report, here are a few things:

I am a fan of jQuery Mobile. I know it is not as good as Sencha Touch, but it is getting there. jQTouch is popular because of its small size, but that advantage will not last long.

If you are a young developer looking to specialize in an area, don’t worry about any of these: become a world-class expert, and that is good enough. If you are a manager and want to play it safe, you can use these trends to formulate your staffing strategy.

 

I review many documents at several stages. In more than one document, people assume that multi-tenancy is not only desirable but also mandatory for applications to move to the cloud. People, that is a wrong notion. We are not in the dark ages anymore. The multi-tenancy idea is old and should be mothballed (except in specific cases).

Let me make the case for you.

Let us take a look at the original server applications: sendmail, the Apache server, the FTP server, etc. Each one of them supported some sort of multi-tenancy. Technically, we can assign multiple “names” to a single computer and make each of the services respond to each name in its own way. Together, these virtual services make up a virtual server. Of course, there is no partitioning of the computer.
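Here is a toy sketch of that name-based dispatch, in the spirit of how a web server such as Apache serves several sites from one process. The host name in each request’s Host header selects which “virtual server” answers. The host names and document roots are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Toy name-based virtual hosting: one server process, several host names.
// Each request's Host header picks the "virtual server" (here, a document root).
// Map keys are expected in lower case; IPv6 literal hosts are ignored for simplicity.
class VirtualHostRouter {
    private final Map<String, String> docRootByHost;
    private final String defaultRoot;

    VirtualHostRouter(Map<String, String> docRootByHost, String defaultRoot) {
        this.docRootByHost = Map.copyOf(docRootByHost);
        this.defaultRoot = defaultRoot;
    }

    String docRootFor(String hostHeader) {
        // Strip an optional ":port" and normalize case (host names are case-insensitive).
        String host = hostHeader.split(":", 2)[0].toLowerCase(Locale.ROOT);
        return docRootByHost.getOrDefault(host, defaultRoot);
    }
}
```

Note that nothing in this routing limits how much CPU or bandwidth any one site consumes. That is exactly the downside discussed below.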

What is wrong with it? Nothing, if all you want is to provide the services to different virtual servers. In fact, you can take a look at Virtualmin, a program that automatically creates the virtual services on various application servers like the mail server, web server, DB server, and so on.

image

Let us look at the downside. You can configure the server with disk space and some other quotas. But, for the most part, sharing the service is a purely cooperative arrangement. If one of the virtual servers takes up all the computing or networking resources, the other servers starve. Yes, some applications do offer support for fair scheduling, but it is awfully difficult to do well.

Then there are other issues: security, for one thing. Any bug in the app may leak information to unintended parties. And if one of the virtual servers wants a new version of the software, no such luck.

When we say an application is multi-tenant, this is the property we are referring to: a single instance (or a set of instances) serving multiple clients, with each client getting its own virtual service.

Considering the difficulty of building a multi-tenant application, there is no reason to put that much effort into making an application multi-tenant. A better option is to create a virtual appliance. It is simpler to manage, maintain, and secure such an appliance.

image

For instance, these are the appliances http://www.turnkeylinux.org provides, all free and easy to use. There is no need for these applications to be multi-tenant: if you need to support one more client, just fire up one more virtual appliance. (An appliance is a virtual server stripped down to a minimal installation so that all it does is run that one application well.) A simple virtual appliance can range from 20 MB(!) to 250 MB.

Summary: Don’t worry about multi-tenancy unless you are building blazingly fast, performance-oriented, or special-purpose software like an enterprise database. Otherwise, you are best served by a virtual appliance. This is the trend; get with it!

 

I grew up with Unix (RIP, dmr!). Originally, my idea of computing was playing with Unix. In fact, I still say that Linux taught me more about the practical aspects of programming than any of my courses.

Two years back, I became involved in a large project where I worked closely with the infra and ops teams. It was a very valuable experience, where I could see the impact of each architectural and programming decision we made. I knew all that before, of course, but that project made it much more vivid.


Since then, I have been spearheading a movement in my organization toward a next-gen application platform that brings the operational and infrastructure view into development. Virtualization, of course, is the key technology that can deliver the synergy.

The big revelation for me is how much infrastructure knowledge is required to engineer a proper system. We think of machines as an abstraction: we put the code in and it runs magically. We ignore the true costs of deploying and running the code. In fact, the wall between development and operations causes a lot of application instability and inflexibility.

Incidentally, if you look at most modern computer companies (like Google, Amazon, Facebook), they embrace this philosophy of erasing the boundaries. The devops people straddle the multiple universes – see the picture below:

From Wikipedia

What fascinates me is how trends become part of the collective consciousness. One day a word doesn’t exist, and then it seems to be part of every conversation. I think devops is becoming one such word.

To be concrete, what is devops? Well, there is no single definition. The goal, though, is clear: to align development and operations so that they can support business agility. How does devops do it? Through a set of practices addressing the following issues:

  1. Last mile problems in development: Deployment, testing, configurations, roll outs, integration …
  2. Non-functional problems in applications: HA, DR, performance, caching…
  3. Operational aspects of applications: Log file management, log correlation, upgrades, roll backs …

I may be missing some, but by and large, this is the main focus.

What are the tools that support it? Well, as it is a nascent field, a lot of tools are being built to support devops. I can only list a few:

  1. Configuration management: Puppet/Chef, so that we can automate most of the configuration, including that of applications. Extensible and declarative. It is instructive that these tools came from people who worked at cloud companies like Amazon.
  2. Source code control systems: Git and GitHub seem to be how most of these systems are developed.
  3. Monitoring tools: Hyperic etc.
  4. […]
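The “declarative” part of configuration management is the key idea. You describe the desired state, the tool compares it with the actual state and changes only what differs, so running it twice is harmless. A toy illustration of that convergence loop follows. It is not Puppet’s or Chef’s API, and the setting names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Toy illustration of declarative, idempotent configuration management:
// compare the desired state with the actual state and change only what differs.
class Converger {
    static List<String> converge(Map<String, String> desired, Map<String, String> actual) {
        List<String> changes = new ArrayList<>();
        // TreeMap gives a stable, sorted order for the change report.
        for (Map.Entry<String, String> e : new TreeMap<>(desired).entrySet()) {
            if (!Objects.equals(actual.get(e.getKey()), e.getValue())) {
                actual.put(e.getKey(), e.getValue());
                changes.add(e.getKey() + " -> " + e.getValue());
            }
        }
        return changes; // empty on a second run: applying again is a no-op
    }
}
```

With a desired state of web.port = 8080 and ntp.server = pool.ntp.org, and a machine that already has the right ntp.server, the first run reports one change (web.port -> 8080) and the second run reports none.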

It looks like I can’t enumerate them all, as the list keeps growing. Just Google and follow along.

Meanwhile, this is what I can suggest for any developer or sysadmin who wants to become good at devops:

  1. Learn a modern scripting language, or even two: Python and Ruby are the popular ones.
  2. Learn the existing tools: Start with Git. Use Puppet and so on. Play with a hypervisor.
  3. Learn standard datacenter components: Learn about storage (standard SAN). Learn about networking.
  4. Learn some standard deployment architectures: Say, using Varnish, HAProxy, and caching. Or, site-to-site replication. Or, database backups.
  5. Learn some web servers: An in-depth understanding of one web server, say Apache, helps a great deal.
  6. Learn about VMs: Not merely VMware-based VMs, but the JVM as well. Understand the basics of how to monitor, manage, and optimize them.
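As a starting point for monitoring the JVM itself, the standard java.lang.management API exposes heap statistics, and JMX-based tools such as JConsole read the same MXBeans. A minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Reads the JVM's own heap statistics through the standard management API.
final class HeapReport {
    private HeapReport() {}

    static String describe() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        long mb = 1024 * 1024;
        // getMax() returns -1 when no maximum is defined.
        String max = heap.getMax() < 0 ? "undefined" : (heap.getMax() / mb) + " MB";
        return String.format("heap used=%d MB, committed=%d MB, max=%s",
                heap.getUsed() / mb, heap.getCommitted() / mb, max);
    }
}
```

Calling HeapReport.describe() from a running application returns a one-line summary that you could log periodically. The numbers depend on your heap settings.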

I am also new at devops, but most of the skills needed for devops are what an average Unix programmer used to learn in the ’90s. It is just old wine in a new bottle. Still vintage!

 

Before I became a pointy-haired boss, I was a developer. I still develop, but none of my code goes into projects. Apart from being a PHB, I have also been an entrepreneur, doing all sorts of jobs. Most of my work happens in meetings.

There are two problems in meetings, especially, exploratory meetings:

  • Mismatch in the understanding: The meeting may not offer us a chance to understand and restate the problems as we see them.
  • Lack of follow-up: While we understand what is said then and there, as soon as we walk out of the meeting, it gets jumbled up. Even action items may not help, as the context is lost.

I find it useful to use note taking as a way to take care of these problems. Here are a few simple note taking best practices:

Always take notes on a computer

In this day and age, you don’t want to write in a notebook and then transcribe. If you are anything like me, you will never get around to rewriting and enhancing the notes. In fact, the more you practice taking notes on the computer, the better you will become. Imagine the effect you can create by sending out the notes and your thoughts immediately after the meeting!

You can use any tool for taking notes, but I prefer FreeMind (or XMind). It is a mind-mapping tool that provides a hierarchical view of the information. It is especially useful when the information is hierarchical instead of linear. For instance, if you are listing all the team members with their roles and other details, this information is not linear (you don’t care in which order you get to them), but hierarchical (you list the name, and under the name you list the role and other details).

Start with a simple template

As I said, I prefer a mind-map tool for this job. Before the meeting, I jot down the various aspects of the meeting that I am supposed to gather information about. (If I am providing the information, that is a different kind of meeting.)

Take this example: suppose we are having a meeting about a project we want to start. You are meeting the customer as a potential project manager. A possible information template could be:

SNAGHTML29cd723e

Of course, your headings could change. The beauty of such an arrangement is that you can see items of similar importance at the same font size and the same distance from the center (a radial hierarchy).

Keep the hierarchy in mind – use it to guide your questions

As the meeting progresses, you find yourself doing the following:

  1. As you get information about any topic, you will put it under that topic. Good – this is the way to go.
  2. You are getting details about a topic, but without the context: you are at level 1 and getting details about level 3. You establish the context at level 2. Example: you may be given details about the project users without any categorization. That means you supply the categorization (like business users, consumers, etc.).
  3. You are getting new top-level topics: That may mean one of two things. Either you did not think the meeting through enough to have a good starting point, or the meeting is going off track. In the first case, you add the topic to your mind map. In the second case, you nudge the meeting to the right topic (“Before getting to those details, can you please tell me who is involved in the project so far?”).

If all things go well, you will get a rich description of the meeting. And, you will look like a genius for providing a structure and context to the information.

One advantage of a mind-map tool is this: by looking at the picture, I can tell whether we did a good job of gathering information. If we went too deep into one topic, the image shows the imbalance. If we did not cover a topic, or covered a level 2 topic excessively, the picture clearly shows it.

Publish it immediately

You think you will refine the mind map, create a document, and then publish it. Trust me, you won’t get around to it. I suggest you correct the typos, enter any contact information, clear out any questionable material, pay attention to the action items, and, if needed, add your perspectives (make sure they are marked as your take on the meeting), and then publish it immediately. In general, I publish my mind-map meeting minutes within 30 minutes of the meeting.

Summary: Always take notes if you are participating in an exploratory meeting. Use a computer and a program to take the notes. Prepare an outline and use it to guide the meeting. Publish the finished outline.

 

http://www.kanneganti.com/social/my-beloved-city/

It has been 10 years since I wrote that piece in a moment of anguish and shock. Ten years passed by, bringing more and more dystopian visions of the future: curtailment of liberties, heavy-handed government, needless wars, suffering of the innocent, self-censoring of the press, and untold missed opportunities for a golden future.

In the last ten years, a lot of things have changed for me too. I moved farther and farther away from my childhood as it receded slowly from memory. First to go is the poetry. None of the old memories (the rains, the stars, the morning walks to the animal yard, the idle card players in the library) seep into the semi-conscious morning. Instead of feeling the lyrics of the songs, now I merely listen to them.

Simply put: life goes on. We get used to things that we never understood. We take off our shoes silently, paying homage to the TSA gods, and proceed to the altar of the winged machines. Does it remind me of my temple-going days? Did I take off my shoes in quiet obedience then? I don’t recall that 10-year-old person. Did he understand the nature of God? Did he marvel at mornings and evenings? Did he stare at the stars to brand that image into his brain forever? What did he think of the world?

As memories fade away, I lose a bit of myself. I forget the excitement of the first day of school, earnestness of skipping the water puddles on the roads, first flush of youth, the sweet anticipation of exam results, and fateful farewell from the familiar.

Then, in forgetting those old memories, we make new ones. I suppose these memories are too static: stealing a moment of contentment from everyday life, or a happiness shared. It is not the same as exploring the world with wondering eyes, none of the childhood stuff.

Perhaps these new memories are not so far back as yet to romanticize. Perhaps I need to age before I look back wistfully at the mundane routine of meeting with friends in Farmington library, or taking the kids to ice-skating on a snowy morning in Michigan.

Till then, I will try to hang onto my old memories a bit longer, thank you.

 

With more than 20,000 participants and 400 exhibitors, VMworld is the place to be for people interested in virtualization, cloud, and new trends in IT driven by fast changes in infrastructure. Of course, at $3,000 per head, it is also one of the most expensive conferences out there. I, along with VMware’s Thirumalesh, presented on migrating to cloud-ready platforms from WebLogic:

First, the general trends

VMworld 2011 at Las Vegas

It is clear VMware spawned one of the biggest eco-systems in the world.

To start with, VMware’s core capability is virtualizing a machine. That is, it runs a program (the hypervisor) on a machine so that the machine can appear to be multiple machines, or VMs. It means that on one server or PC, we can run multiple OSs.

Next up, they provide the ability to take an application and OS and bundle it as a thin VM (just enough OS to make that app run). That means, all the issues of porting to different OS’s and incompatibilities of OS’s disappear.

Next, they moved in the direction of managing the VMs. They can manage capacity leveling, load balancing, and sharing across VMs. They can manage a cluster of host machines running several VMs. For instance, they can migrate a VM from one server to another. The potential management possibilities are enormous. Your server is going down? Just move your VM to another server and be done with it. Or there is a hurricane coming your way? Move your VM to a server in a different country.

And, of course, if you have the capability of capturing, restarting, and replicating the computer, then you can do a lot of other interesting magic: horizontal scaling, recovery, and backups. Of course, many of these capabilities need support at various levels: computing, storage, and networking.

The traditional view of the virtual server market (compute, storage, and networking) is maturing progressively. Many companies are solving some problem in management, provisioning, operations, or decision making in the virtual infrastructure area.

One interesting trend is virtualizing at the device level. VMware is working on technologies to push a virtual device onto Android or the iPad (it is a bit more convoluted; it is actually an interface) so that people can own their devices and yet access company resources with a clear wall between personal and company use. If a person leaves the company, a central admin can de-provision the device without having to be there physically.

octupus

On one hand, the trend of managing personal devices for the company is exciting. Still, I have mixed feelings about it (I think it encroaches on the capabilities of the web; why are we going down this path anyway?). Watch this space for a detailed blog post about my perspective on that.

Next, journey to the cloud

Stick figure guide to cloud computing from Tier 3

As interesting as the infrastructure is (and lucrative too), VMware is spending a lot of time and money on cloud platforms. For them, it is part of the strategy to get organizations to move to the cloud. They invested heavily in the SpringSource platform under vFabric. They are also investing in other platforms to make sure those run better in VMware’s cloud, and they are looking at still other platforms, as long as they fit into their cloud vision.

vFabricDiagram

One of the big issues with their platform vision is this: they are not leaders in the platform business. Historically, they have not been a player in providing software stacks. Other vendors (IBM, Microsoft, Oracle, etc.) and other cloud players (SFDC, Google) are bigger in the platform business. So, VMware is making a big push toward its platform.

The first impediment to this platform is that companies have developed a lot of applications on the standard J2EE platform. These technologies are complex, with too many levers to operate and optimize. For VMware, that makes them an obstacle to a successful move to the cloud. Yes, they can be virtualized, but without many of the benefits that VMware can offer.

If you are wondering about the benefits, consider one example. VMware’s hypervisor (the program that runs all the VMs), as you know, allocates memory to the VMs. If it knows how the machines are using their memory, it can adjust dynamically and run more VMs in the same memory, since different machines may peak at different times. With regular VMs, VMware runs a program at the OS level (a balloon driver) to help smooth out the demand for memory. If it is a Java program, how can it do that? Enter EM4J, Elastic Memory for Java. It runs only on tc Server, a part of vFabric.
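The benefit of smoothing is easy to see with a little arithmetic. If every VM reserves memory for its own peak, the host needs the sum of the peaks. If the hypervisor can shift memory as demand moves, it needs only the peak of the combined demand. Here is a sketch, with made-up demand figures in GB per time slot:

```java
// Sum of individual peaks vs. peak of the combined demand.
// demand[vm][t] = memory (GB) that a VM needs in time slot t.
// Assumes at least one VM and equal-length rows.
final class MemoryPeaks {
    private MemoryPeaks() {}

    // Memory needed if every VM reserves for its own peak.
    static int sumOfPeaks(int[][] demand) {
        int total = 0;
        for (int[] vm : demand) {
            int peak = 0;
            for (int gb : vm) {
                peak = Math.max(peak, gb);
            }
            total += peak;
        }
        return total;
    }

    // Memory needed if it can be shifted between VMs as demand moves.
    static int peakOfSum(int[][] demand) {
        int best = 0;
        for (int t = 0; t < demand[0].length; t++) {
            int sum = 0;
            for (int[] vm : demand) {
                sum += vm[t];
            }
            best = Math.max(best, sum);
        }
        return best;
    }
}
```

With three hypothetical VMs, {4, 1, 1}, {1, 4, 1}, and {1, 1, 4}, each peaking in a different slot, sumOfPeaks returns 12 while peakOfSum returns 6.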

All this means is this: VMware likes to see transition to the vFabric from other platforms so that they can win the cloud war.

This is a short note – soon, I will be writing about my impressions of micro foundry.

 

It is obvious to most people that, at heart, I am a programmer. I am not a good artist. My wife wouldn’t trust me to pick colors for our bathroom. She wouldn’t even let me pick the tiles! But, obviously, I am qualified to make judgments about UI design :)


Self-deprecation aside, I have worked hard to compensate for my innate weaknesses: I spend long hours on the internet. Whenever I see a website, I view the source to understand how each page was built. I even try to guess what software they used in the back end. I regularly check out the standard sites. I took college-level courses in art history. In my misguided youth, I even designed fonts. I am writing all this preamble only so that you don’t think I am ranting.

Long ago (in internet time, that means just five years), we used to hire user interface designers. We expected little from them because we didn’t know much. All we told them was to make it look pretty. What we always got was a Windows interface tacked onto the web:

[Figure: a Windows-style interface with a menu bar, tacked onto a web page]

Essentially, you have a menu bar, and the menu bar lets users perform activities on the data on the screen. The menu can be on the side, but the principle is the same. Even with multiple menus, it is much the same.

What is the problem with this? Nothing, if you are addressing the same audience. But most applications on the web are not like Windows applications. They serve a different purpose and a different audience, and they are used in a different context. For instance, they may be for casual users who do not know what all these menus are about. Or they may be for carrying out a series of standard steps. Just as Windows itself has different kinds of UIs for specific tasks (a wizard, say), you need to adapt to the activity and the users.

If we want something better than this monstrosity, what do we need? Since the sky is the limit, let me stipulate the following:

  1. We are not developing a world-class system where the UI is the main innovation. We know that the difference between Facebook and MySpace is, say, 50(?) billion dollars and a better user interface. If you have such great ambitions, what I am going to say may not work for you.
  2. We still want the best UI our limited budget can buy. Surely, it is the functionality we are selling, but a good UI is what makes that functionality actually usable.
  3. We are building enterprise applications with only a limited number of users, not a website for the millions.

You may think this is restrictive, but wait till you see the Remedy interface. I wish they had spent all of $500 on it!

What I expect the UI developers to do

With those goals, if I were to design a web application, this is what I would want my UI people (or UX people) to do:

Stand between users and developers

I want my UI people to know the users well enough to know when to disregard their stated UI requirements and create a cohesive, coherent interface. UI design is not about focus groups and satisfying different users. It is about understanding what users really want to do and providing a consistent way to do it.

Similarly, they should understand developers well enough to map the requirements to familiar patterns. Most developers don’t do well if you ask them to build from scratch. They imitate well, and they reconstruct reasonably well. So UI developers should be able to map the user requirements to familiar patterns in the developer’s toolbox.

Adopt industry best practices

Either we lead or we follow. Given the constraints I laid out earlier, I say we should follow the leaders. There is nothing intuitive about any design (except a few things, like pulling a loose cord or pushing a stone; blame it on our nomadic hunting days). Most complex paradigms are learned over years of experience using systems.


And innovating on those complex interactions is somebody else’s job. Your job is merely to recognize the right innovations and use them in your system. In fact, I have seen many home-grown innovations fail miserably for lack of talent, support, or budget. So stick to what has worked elsewhere.

Integrate the standard paradigms

Most enterprise applications are about data entry and queries. Workers add information, and managers review information, approve applications, or get reports. There are a lot of variations in the usage paradigms: workflow, process flow, content management, user interaction, social networking, CRM, helpdesk, self-service, e-commerce, etc.

[Figure from http://ui-patterns.com/]

Most modern applications combine several of these. A good UI person understands the nature of the application well enough to pick the UI representation of each standard paradigm and map the user requests onto the design. Otherwise, what we get is a bunch of pages that ignores the connections between them and misses the chance of a coherent vision. Worse still, it will not leverage the established UI patterns and therefore ends up too costly.

Manage the costs of technology

I recall an incident a friend of mine told me about. It happened at the height of the Java applet hype. The client wanted the comments field to support variable-width fonts and some limited formatting. The UI person working on the wireframes put that into the design. Unfortunately, HTML in those days could not support it (little or no CSS; low adoption of JS). So the developers were forced to use a Java applet, which complicated development, testing, and deployment. When the final, ballooned cost came to more than $200K, the client was dumbstruck. “Why didn’t you tell me it was going to cost so much? It was not a big deal!” was his last plaintive cry before being wheeled into the emergency room.


So I want my UI people to know the standard JavaScript libraries. I want them to know what plugins exist and what different interactions cost. I want them to know the security repercussions of their designs as well. And with that understanding, I would like them to design the user interactions.

Work with templating systems

I often find myself in the following difficult situation. I get a bunch of screens (HTML + CSS), but I do not know the relationships between the pages: not the navigation scheme, but how the pages relate to one another structurally. As a programmer, I am used to seeing pages as a hierarchy. Say, the results page is just like any other page except that the results are in the middle of the page.

(Repetition is good, if we can identify the hierarchy and differences)

In short, if we can create a hierarchy of pages, each child specializing its parent page by adding more elements, I can quickly prototype and build a system that can be changed at will. If not, each page gets programmed separately, and changing the look and feel of a large number of pages depends on the programmer’s skill.
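The page hierarchy described here maps naturally onto inheritance. Here is a minimal Java sketch, assuming a home-grown renderer rather than any particular templating engine (BasePage, HomePage, ResultsPage, and their methods are hypothetical names): the base page owns the shared layout, and each child page overrides only the blocks it specializes.

```java
import java.util.List;

/** Parent template: owns the layout shared by every page. */
abstract class BasePage {
    /** Renders the common frame; children fill in only the parts they specialize. */
    final String render() {
        return "<html><head><title>" + title() + "</title></head><body>"
                + header()
                + "<main>" + content() + "</main>"
                + footer()
                + "</body></html>";
    }

    String title()  { return "Acme"; }
    String header() { return "<header>Acme</header>"; }
    String footer() { return "<footer>(c) Acme</footer>"; }

    /** The one block every child page must supply. */
    abstract String content();
}

/** A child that changes only the middle of the page. */
class HomePage extends BasePage {
    @Override String content() { return "<p>Welcome</p>"; }
}

/** Identical to its parent except for the title and the results in the middle. */
class ResultsPage extends BasePage {
    private final List<String> results;

    ResultsPage(List<String> results) { this.results = List.copyOf(results); }

    @Override String title() { return "Acme - Results"; }

    @Override String content() {
        // No HTML escaping here; a real page must escape user-supplied text.
        StringBuilder sb = new StringBuilder("<ul>");
        for (String r : results) {
            sb.append("<li>").append(r).append("</li>");
        }
        return sb.append("</ul>").toString();
    }
}
```

For example, new ResultsPage(List.of("widget", "anvil")).render() produces the common header and footer with the results list in the middle. Change header() once in BasePage and every page picks up the new look, which is exactly what is hard to do when each page is programmed on its own. Templating systems that support layouts or template inheritance express the same idea in markup instead of classes.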

Unfortunately, we often hit a snag here. The tools UI people use (Dreamweaver and so on) do support this kind of templating, but they do not or cannot expose it to the developers, because the developers use different templating systems. I cannot offer a universal solution; all I can say is that the UI people should work it out with the developers.

What I look for in the UI designers

Now that I have specified what I want them to do, here is what I look for in an interview:

Knowledge of CSS

Most UI developers are quite good at this. If you want a low-cost solution, though, your best bet is to go with something standard like jQuery UI, which comes with its own themes. Any average UI person should be able to develop a theme that meets the client’s requirements. The advantages are manifold: you can use a lot of plugins as is, and you can reuse developer skills as well. [Here is a simple introduction.]


Knowledge of Javascript

These days, any reasonable UI requires a JavaScript person on the team. Even the UI people should be familiar with the standard libraries: what they can do, which ones we have decided to go with, and how to use them to achieve the interactions we need. For finer customizations, we may need a full-fledged JS developer, but UI developers can do most of the basic ones.


Color schemes

I know this is supposed to be the forte of UI developers, yet I have often been disappointed by a UI team’s failure to develop a good color scheme. If you are forced to do it yourself, just look at these two resources: color wheel and Kuler.


Icons

A simple, good set of icons can spice up the design and catch the user’s attention easily. This is where we can create a unique branding, play on the color scheme, and provide good visual relief. If the UI designer cannot design the icons, he or she can at least buy them or get them from the public domain. The ability to incorporate icons into the web design consistently and usably is something I look for in a UI developer.


On the issue of stock images: most corporate applications get spoiled by stock images. How many times have you seen a bunch of white people in suits, with a couple of minorities thrown in, to signal the seriousness of the site? I agree that stock photos can create a good ambience and set the tone for a site, but that is rare enough that I have lost faith. In any case, if you want good free stock images, you can always go to this site. Summary: don’t expect too much in the way of stock photos from your UI designer.

Fonts

A few words: CSS; a standard set of fonts; good readability; the ability to use web fonts. And, of course, package them as part of a theme for jQuery UI or a similar system. If your UI engineer can do that, that is good enough.

PS: If you are a developer and think your designer is producing designs that are way too difficult to develop for, or is not doing a good enough job, you can do a few simple things:

  1. Use jQuery UI for your design.
  2. For your screen layout, check out http://www.openwebdesign.org/. For instance, I found this design, http://www.openwebdesign.org/design/3499/multiflex32/, a reasonable, if somewhat old-fashioned, one.
  3. For stock images, check out: http://sxc.hu
  4. For your icons: there are several open-source sets. I like the ones from the Noun Project, but they are not in color.
  5. For your fonts: check out http://www.google.com/webfonts
  6. For your UI paradigms, you are on your own – you need to understand the user needs and what your application is about!
 

[A caveat for experienced people: this is a highly simplistic introduction for college grads. I only mean to provide an overall view.]

I am a relative newcomer to data warehousing. I come from the OLTP world, where we push a lot of transactions through databases, although early in my career I did work in data warehousing, specifically on archiving.

Historically (for most readers, at least), RDBMSs have been the mainstay of OLTP systems. Get a high-end RISC (*nix) computer and put Oracle on it: that is the standard OLTP setup in most shops. There have been a lot of changes in the last 10 years, starting with the rise of low-end databases with different performance characteristics (for example, many reads and few writes, or explicit transaction control). But I digress.

You have your database where all your data is stored, but the people upstairs want a different database for other purposes. Why? What other purposes could there be? Why not use the same database?

Consider the Acme company, which sells widgets to coyotes on an e-commerce site. Naturally, it uses an OLTP system to sell its widgets, from recording the sale to tracking the current status of the order. But if it wants to generate standard reports (say, how many widgets were sold in each state last month), it cannot run them against the OLTP database: such queries read large amounts of data and can interfere with serving the e-commerce site. Or it may want to look at trends, such as which products sold fastest last month. Again, you do not want to hit the OLTP database for that.
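Such a report has to read every sale for the month and group the results, whereas a typical transaction touches only a handful of rows; that difference is why it competes with the e-commerce workload. Here is a minimal Java sketch of the report itself, run over an in-memory list that stands in for a separate reporting copy of the data (the Sale record and its fields are hypothetical):

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical row in the reporting copy of the sales data. */
record Sale(String state, LocalDate date, int widgets) {}

class MonthlyReport {
    /** Widgets sold per state during the given month, sorted by state. */
    static Map<String, Integer> widgetsByState(List<Sale> sales, YearMonth month) {
        return sales.stream()
                .filter(s -> YearMonth.from(s.date()).equals(month)) // scans every sale
                .collect(Collectors.groupingBy(Sale::state, TreeMap::new,
                        Collectors.summingInt(Sale::widgets)));
    }
}
```

In a real warehouse this would usually be a SQL GROUP BY over a table of sales; either way, the query reads the whole month rather than a few rows.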

What you need is a separate DSS (decision support system) to support your queries and reports. Here is the way it would fit together.

[Figure: how the OLTP database and the DSS fit together]

Of course, this is a highly simplified picture. We will see how it evolves to reflect real-world scenarios as we proceed.

A small note on terminology: you will encounter words like OLAP (online analytical processing), which is what one kind of DSS does. You will also hear about data marts; you can think of a data mart as a subset of the data (a mini data warehouse). The DSS database is, for the most part, called the data warehouse, while “DSS” refers to the applications plus the data warehouse.

Why did we have to introduce the ETL (extract, transform, load) tool at all? Consider the following possibilities:

  1. The OLTP schema is optimized for a specific purpose: running transactions. The data warehouse is optimized for a different purpose: running queries and reports fast. So its schema differs from the OLTP schema, and the ETL tool does the translation.
  2. The data in the warehouse may have to come from several different databases. Typically, departmental and corporate warehouses collate information from multiple sources. You need ETL for that.
  3. The data may lie in different kinds of databases as well as other sources. ETL deals with all those different sources.
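Point 1 is the heart of the “T” in ETL. Here is a minimal Java sketch, assuming a normalized OLTP schema (orders that reference customers by id) and a flat, report-friendly fact row in the warehouse. All record and class names are hypothetical, and the “extract” step is reduced to receiving two lists; real ETL tools do this declaratively, at scale, and across many kinds of sources.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** OLTP side (hypothetical): normalized rows, optimized for transactions. */
record Customer(long id, String name, String state) {}
record Order(long id, long customerId, LocalDate date, int widgets) {}

/** Warehouse side (hypothetical): one flat row per order, optimized for reports. */
record SalesFact(long orderId, String state, LocalDate date, int widgets) {}

class SimpleEtl {
    /** Transform: join each extracted order to its customer and flatten it into a fact row. */
    static List<SalesFact> transform(List<Order> orders, List<Customer> customers) {
        // Throws IllegalStateException on duplicate customer ids.
        Map<Long, Customer> byId = customers.stream()
                .collect(Collectors.toMap(Customer::id, Function.identity()));
        List<SalesFact> facts = new ArrayList<>();
        for (Order o : orders) {
            Customer c = byId.get(o.customerId());
            if (c == null) {
                continue; // orphan order: a real pipeline would log or quarantine it
            }
            facts.add(new SalesFact(o.id(), c.state(), o.date(), o.widgets()));
        }
        return facts;
    }

    /** Load: append the transformed rows to the warehouse table (here, just a list). */
    static void load(List<SalesFact> warehouse, List<SalesFact> facts) {
        warehouse.addAll(facts);
    }
}
```

Because each fact row already carries the customer’s state, the widgets-per-state report becomes a simple group-by over one table, with no join at query time. That is the sense in which the warehouse schema is optimized for reports rather than transactions.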

Standard ETL tools include Informatica and Ab Initio or, in open source, Kettle and Talend. (More on ETL tools later.)

Before going further, you are right to ask: What other kinds of uses are there for databases? As a web developer, I am familiar only with OLTP databases. Tell me about the other standard ways that databases are used.

Here is a simple view that will suffice for beginners:

[Figure: a simple view of the standard ways databases are used]

Of course, to meet these different needs, we have different kinds of databases (and new ones are constantly being invented). Recall that most databases speak SQL, which is a bit of a burden when developing a new database. Fortunately, some databases break that taboo for greater glory. We will get into those details in another note.

Here are a few takeaways for beginners:

  1. The standard databases we generally learn about in college or while programming for the web are good for transactions but not well optimized for other needs.
  2. Databases are needed in many other ways, and there are different kinds of databases to support those needs.
  3. As you create all these different kinds of databases, you will encounter data transformation needs and other integration needs, which are met by yet another piece of software such as an ETL tool (and a few others, as you will see later).
 

In today’s installment, I will show how to install a hypervisor. I chose a free product, VMware Hypervisor, to install on my computer so that I can run VMs.

What is a hypervisor? It is a hosting environment for running virtual machines. There are two kinds: the first runs on bare metal (no OS needed), and the second runs on top of an OS. Naturally, the first kind is faster. VMware Hypervisor is of the first kind: it ships a minimal kernel of its own (the VMkernel, not a general-purpose Linux) along with the hypervisor software, a file system, and a few other tools.

Step 1: Download the software

The hypervisor itself is free; you can download it at http://www.vmware.com/go/esxi. Make a note of the license key they give you; you will need it later. The old name for the hypervisor is ESXi. You will also find references to vSphere, the enterprise version, which costs lots of money. You need the vSphere Client as well, but that can wait. The software comes as an .iso file, which you can mount in Windows using DAEMON Tools Lite (http://www.daemon-tools.cc/eng/products/dtLite).

Step 2: My machine is not supported!

The first thing I did was burn the ISO to a DVD. I have installed Linux countless times since 1994 and thought I could get past this quickly. Little did I know that VMware does not support many standard components, such as motherboard-based (onboard) network interfaces. Too bad.

The first step is to find out which components your system has. The motherboard information comes up at boot time, but I wanted a complete list. An evaluation copy of AIDA64, installed on that machine, listed the complete hardware details.

Here is what I needed to do:

  1. Get the driver: you can find most drivers packaged as oem.tgz here.
  2. Modify the boot image: you can use ESXi Customizer for this purpose. It creates a new ISO for you.

By this point, you will be running out of DVDs to burn. I wanted to use a flash drive to install the OS instead. That way, if I make a mistake, I can always rewrite it.

Step 3: Creating a bootable flash drive

Most motherboards support booting from flash drives. When booting up, you can enter the setup (often by pressing DEL; the boot screen tells you which key) and edit the boot sequence. You may have to insert the flash drive first, rescan, and then set up the boot sequence. That is what I did.

There are several ways to prepare the flash drive. Here are two methods (thanks to the vmhelp site):

  1. Use syslinux (make sure you use version 3.8x; version 4.x doesn’t work):
    • First, format the flash drive as FAT32.
    • Go to the syslinux/win32 folder and run the command: syslinux.exe -mbf G: (assuming G: is your flash drive).
    • Copy the contents of the ISO you created with ESXi Customizer to the flash drive. Again, mount the ISO using DAEMON Tools.
    • Rename isolinux.cfg to syslinux.cfg.
  2. Use UNetbootin to create the flash drive.

Even once you have created the boot drive, you will still run into issues. Specifically, the installer needs to know that it should copy the files from the flash drive. You can tell it so with a kickstart script.

First edit the syslinux.cfg file:

menu title VMware VMvisor Boot Menu
timeout 80

label ESXi Installer
menu label ^ESXi Installer
kernel mboot.c32
append vmkboot.gz ks=usb --- vmkernel.gz --- sys.vgz --- cim.vgz --- ienviron.vgz --- install.vgz --- mod.tgz

label ^Boot from local disk
menu label ^Boot from local disk
localboot 0x80

All we did was add ks=usb to the installer’s append line. Now, create a ks.cfg file on the flash drive with the following contents:

vmaccepteula
rootpw password
autopart --firstdisk --overwritevmfs
install usb
network --bootproto=dhcp --device=vmnic0

Thanks to Jonathan Medd for this guidance. Now boot from the flash drive; the installer will copy the files to the first disk and complete the installation. Finally, remove the flash drive and reboot the system. For good measure, adjust the boot sequence before the reboot completes.
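If you prepare flash drives often, the file edits from Step 3 are easy to script. Here is a minimal Java sketch, assuming the ISO contents have already been copied to the drive and syslinux has already been run on it (FlashDrivePrep and its methods are hypothetical names): it renames isolinux.cfg to syslinux.cfg, adds ks=usb after vmkboot.gz on the installer’s append line, and writes the ks.cfg shown above. It does not format the drive or install the boot sector.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: applies the Step 3 file edits to a prepared flash drive. */
class FlashDrivePrep {
    /** Same contents as the ks.cfg above (plain-text root password: change it after installing). */
    static final List<String> KICKSTART = List.of(
            "vmaccepteula",
            "rootpw password",
            "autopart --firstdisk --overwritevmfs",
            "install usb",
            "network --bootproto=dhcp --device=vmnic0");

    /** Inserts "ks=usb" after vmkboot.gz on an append line that has no ks= option yet. */
    static String addKickstart(String line) {
        String t = line.trim();
        if (t.startsWith("append ") && t.contains("vmkboot.gz") && !t.contains("ks=")) {
            return line.replaceFirst("vmkboot\\.gz", "vmkboot.gz ks=usb");
        }
        return line;
    }

    /** Renames the config, patches the append line, and writes ks.cfg to the drive root. */
    static void prepare(Path drive) throws IOException {
        Path isoCfg = drive.resolve("isolinux.cfg");
        Path sysCfg = drive.resolve("syslinux.cfg");
        if (Files.exists(isoCfg)) {
            Files.move(isoCfg, sysCfg); // syslinux reads syslinux.cfg, not isolinux.cfg
        }
        List<String> patched = Files.readAllLines(sysCfg, StandardCharsets.UTF_8).stream()
                .map(FlashDrivePrep::addKickstart)
                .collect(Collectors.toList());
        Files.write(sysCfg, patched, StandardCharsets.UTF_8);
        Files.write(drive.resolve("ks.cfg"), KICKSTART, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FlashDrivePrep <flash drive root, e.g. G:\\>");
            return;
        }
        prepare(Paths.get(args[0]));
    }
}
```

Running it twice is harmless for the append line, since lines that already contain ks= are left alone.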

Step 4: Configuring the hypervisor

When you first log in (with the root password you specified in the ks.cfg file, “password” in the example above), you can make a few adjustments. This is what I did:

  1. Switch from DHCP to a static IP address. That makes the address easier to remember.
  2. Set static DNS servers (I have a caching, forwarding DNS server running on a different machine; I use that as well as Google’s, which is 8.8.8.8).
  3. Enable Local Tech Support mode (which lets me log in to a shell on the local console).
  4. Enable Remote Tech Support mode (which lets me log in via SSH).

The last two are not a good idea security-wise, but I decided to enable them temporarily.

Step 5: Installing the client

There are five ways to interact with the hypervisor:

  1. You can log in locally and do what you can from the shell.
  2. You can SSH into the box to do the same.
  3. You can run the remote CLI that VMware provides.
  4. You can use the web services SDK and write your own programs.
  5. You can use the vSphere Client, a GUI program, to create and manage your virtual machines.

Naturally, most people opt for the last one. I do too (except that I use SSH as well; more on that later). The free license restricts some functionality; it would be interesting to see whether the web services are restricted too.

By the way, once you start the vSphere Client, you can enter the license key you got when downloading. Otherwise, you are in 60-day evaluation mode.

Step 6: Installing an existing VM

The web is replete with virtual machines, most of them created for use in VMware Player. The hypervisor itself only lets you create or clone machines. So how do you bring existing machines in? As I see it, there are two choices:

  1. You can use VMware Converter to convert existing physical machines. It can convert the image and upload it to the hypervisor.
  2. You can copy the .vmdk file to the hypervisor (using scp or WinSCP) and convert it with the command “vmkfstools -i sourcefile.vmdk destinationfile.vmdk”. After that, when you create the virtual machine, you can attach this converted file as its disk.

I did the second type.

In the next posts, I will describe the machine setups I am planning (mostly for various BI projects). Specifically, I will see whether we can use OpenStack with this hypervisor.

© 2011 Rama's home page Suffusion theme by Sayontan Sinha