Dec 31, 2014
 

Because of Facebook, I have been in constant touch with friends, acquaintances, and even people I have never met. I am using this annual letter as a way of summarizing, introspecting, and filling in the gaps in my usual communications with friends about technology. It is heavily slanted towards technology, not my usual intersection of business and technology.

There are three ways I learn about technologies. The first is experimenting on my own: by actually coding, practicing, verifying hunches, validating ideas, and playing, I learn a bit. The second is talking to customers, sales people, engineering managers, and developers, which helps me understand the problems of applying technologies. The third is reading books, newspapers, and blogs, where I try to see the interrelationships between different subjects and the influence of technology on the modern world. Let me review, from this year's perspective, what these three influences taught me.

(Cloud) Container-based technologies

I played quite a bit with container-based technologies, specifically various Docker-based tools. In fact, I set up systems on Digital Ocean that let me create a new website, make modifications, and push it to the public in less than 10 minutes. That is, using the scripts I cobbled together (most of them are public domain, but I had to make some tweaks to suit my workflow), I can create a new Docker instance, initialize it with the right stack, provision a reverse proxy, push the content to that machine, install all dependencies, and start running the system.


Minimal, cloud-ready OS’s

From my experiments with Docker, I see that the power of a virtual machine that can run any OS is not very useful to me. I don't want to think in terms of an OS; what I want is a stack to program against. My OS should be a cloud-aware OS: minimal, working well in a hive, supporting orchestration, and staying invisible. Since I am unlikely to work in it interactively, I place a premium on programmability and support for REST services. Of course, it needs to be secure, but since it is a minimal OS, I want security at different layers.


Based on all these requirements, I see hope for a CoreOS kind of OS. I don't know if CoreOS itself is it, but something like it, a minimal, cloud-ready OS, is going to succeed. Companies like Google and Facebook already use such systems for their internet-scale applications.

(Server side technologies) Node.js technologies

I entered this year having learnt a bit about Node.js. I have a love-hate relationship (who doesn't?) with JavaScript. On one hand, I love its ability to treat functions as first-class objects, its polymorphism, and its ability to extend objects. Its approach to the type system is dynamic, flexible, and incredibly powerful.
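
To show what I mean, here is a trivial sketch of the kind of flexibility I enjoy:

```js
// Functions are values: pass them in, return them, compose them.
function twice(fn) {
  return function (x) { return fn(fn(x)); };
}
var addTwo = twice(function (n) { return n + 1; });
console.log(addTwo(5)); // 7

// Objects can be extended at runtime; no class declaration needed.
var point = { x: 3, y: 4 };
point.distance = function () {
  return Math.sqrt(this.x * this.x + this.y * this.y);
};
console.log(point.distance()); // 5
```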

Yet, I find a lot of things to hate about JS. Its scoping is infuriating. Its support for the basics of large-scale programming is absent. Its type checking is minimal. It leaves out crucial features of programming, letting users create competing idioms.
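
A minimal example of the scoping that infuriates me:

```js
// var is function-scoped, not block-scoped, and declarations are hoisted.
function count() {
  for (var i = 0; i < 3; i++) { /* ... */ }
  return i; // i is still visible here: returns 3
}
console.log(count()); // 3

// The classic closure-in-a-loop surprise:
var fns = [];
for (var j = 0; j < 3; j++) {
  fns.push(function () { return j; });
}
console.log(fns.map(function (f) { return f(); })); // [3, 3, 3]
```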

For small to medium systems cobbled together from REST services, Node.js is still a quick way of getting things done. And I like npm; it is even better than CPAN. I am not a fan of reinventing the wheel with all the tools like gulp, bower, etc. The proliferation of these tools is confusing, putting off casual users who have their own standard tools. (Java was the original culprit in these matters.)


In Node.js, the technologies I played with are:

  • Express: Of course, duh. There may be better ones out there, but I needed one standard framework in my arsenal. This one will do (a minimal sketch follows this list).
  • Mongo: The power of using JavaScript across all the layers is seductive. I don't have to translate from one format to another. And I let somebody else worry about performance (well, not entirely, but it will do for the most part).
  • The usual set of libraries for parsing, slicing and dicing, and template management.
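
Here is a minimal sketch of that stack, assuming the express and mongodb npm packages; the database, collection, and route names are made up:

```js
// A tiny Express app backed by MongoDB: JSON in the database,
// JSON over the wire, JavaScript everywhere.
var express = require('express');
var MongoClient = require('mongodb').MongoClient;

var app = express();

MongoClient.connect('mongodb://localhost:27017/notes', function (err, db) {
  if (err) throw err;

  app.get('/notes', function (req, res) {
    db.collection('notes').find().toArray(function (err, docs) {
      if (err) return res.status(500).send(err.message);
      res.json(docs); // no translation between layers
    });
  });

  app.listen(3000);
});
```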

Front end JavaScript

I have been frustrated with front-end technologies. There are too many frameworks. MVC? MVVM? The complex edifice makes my head swim, and at the end I am not able to make sense of it all. Thankfully, I see things changing. I was able to settle on a few choices for myself, not because they are the best (in some cases, they are), but because they are good for a generalist like me.

JQuery: I still find it the best way to manage the DOM from JS. I never bothered to learn the full DOM API, and I find jQuery a convenient way of managing it.
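
A small illustration of the convenience (the selectors are made up):

```js
// The raw DOM API way:
var items = document.querySelectorAll('ul.nav li');
for (var i = 0; i < items.length; i++) {
  items[i].classList.add('active');
}

// The jQuery way: the same thing, plus more, in one chained expression.
$('ul.nav li').addClass('active').first().text('Home');
```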

Web components: Specifically, I fell in love with Polymer. Why? Think about it: HTML has a standard set of tags. Suppose you want to introduce a new tag. You need to write JavaScript that parses the new tag and manages it for you. So your code is strewn across a few places: the JavaScript code, the CSS specifications, and the HTML. It is too messy, too unmaintainable, and, more importantly, it makes it difficult to mix different styles.

Enter web components. You can create new elements as first-class entries in the DOM. The looks of the new element are specified via CSS right there; the behavior, through JavaScript, goes there too. You expose specific properties and interactivity. The big thing is that since it is a first-class element in the DOM (as opposed to being translated to standard elements through JS), you are able to reference it from jQuery and manage it just as you would a heading.
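
A sketch of the underlying idea, using the (v0) custom elements API that Polymer builds on; the element name is made up:

```js
// Register a new first-class DOM element (Custom Elements v0,
// the API that Polymer polyfills on 2014-era browsers).
var proto = Object.create(HTMLElement.prototype);
proto.createdCallback = function () {
  var amount = this.getAttribute('amount') || '0';
  this.innerHTML = '<span class="tag">$' + amount + '</span>';
};
document.registerElement('price-tag', { prototype: proto });

// Now <price-tag amount="42"></price-tag> works in markup,
// and $('price-tag') finds it just like any built-in element.
```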


Since not many browsers have implemented web components, we need a polyfill, a way of mimicking the behavior. Thanks to Polymer, we now have JavaScript code that makes it appear as if the browser supports the DOM behavior of web components. This polyfill intercepts every call to the DOM and translates it appropriately.

Summary: it is slow and buggy at the moment. In time, it will take off, creating a nice third-party market for web components. It is almost like what COM did for Microsoft.

Assortment of libs: While I did not reach my ideal setup (where the machine can speak IKWYM, the "I know what you mean" language), there are several libs that help me now. Specifically, I like templates with inheritance, like nunjucks. I also use Lodash to make life simpler, and async.js to tame the async paradigm of JavaScript.
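
For instance, async.js turns callback pyramids into something declarative (fetchUser and fetchOrders are hypothetical helpers):

```js
var async = require('async');

// Run two REST calls in parallel and collect the results by name.
async.parallel({
  user:   function (cb) { fetchUser(42, cb); },   // hypothetical helper
  orders: function (cb) { fetchOrders(42, cb); }  // hypothetical helper
}, function (err, results) {
  if (err) return console.error(err);
  console.log(results.user, results.orders);
});
```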

HTML look and feel

As an early adopter of Bootstrap, I stand vindicated in pushing it to my corporate users. Nowadays, almost all development we do is responsive, built on standard template-driven development. Within that, I dabbled with a few design trends because I found them interesting:

  • Parallax effect: You have seen pages where the background images scroll slower than the text? They give a 3D layering effect, which is particularly effective in narrative stories. When the images are also responsive, this 3D effect can make web pages come alive. For some examples, see: http://www.creativebloq.com/web-design/parallax-scrolling-1131762

  • Interactive HTML pages: Imagine you are telling a story where changing some choices changes the story. For example, tabs change the content. But imagine creating new content based on user input, not merely showing and hiding existing content. For instance, once we know the reader's name, age, and other details, it is easy to change the text to incorporate those elements. Or, if we know what they need, we can address it directly in a sales dialog. While I did not carry this idea to completion, I satisfied myself that I have the technology and the framework to build this interactive narrative sales tool. Naturally, this framework has a little bit of JS magic.

  • Auto-generation of web pages: As I was writing text, converting it to HTML and a suitable web page became an obsession of mine. I finally settled on a combination of markdown, Bootstrap, and YAML properties to generate a good-looking web page to share with people (a sketch of this pipeline follows this list).
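
A sketch of that generation pipeline, assuming the marked and js-yaml npm packages; the file layout and page shell are illustrative:

```js
var fs = require('fs');
var yaml = require('js-yaml');   // YAML front matter -> page properties
var marked = require('marked');  // markdown -> HTML

// page.md starts with YAML front matter between two '---' lines.
var raw = fs.readFileSync('page.md', 'utf8');
var parts = raw.split(/^---$/m);
var props = yaml.safeLoad(parts[1]);           // e.g. { title: 'My page' }
var body = marked(parts.slice(2).join('---'));

// Pour the pieces into a minimal Bootstrap shell.
var html = '<!DOCTYPE html><html><head>' +
  '<title>' + props.title + '</title>' +
  '<link rel="stylesheet" href="css/bootstrap.min.css">' +
  '</head><body><div class="container">' + body + '</div></body></html>';

fs.writeFileSync('page.html', html);
```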

If you are interested, please see these two blog posts from yesteryears:

  1. http://www.kanneganti.com/technical/engineers-guide-to-understanding-web-design/
  2. http://www.kanneganti.com/technical/html5-design-knowledge-for-application-developers/

Static web apps

As I started seeing the advances in web front ends, I see the possibilities of static websites. For instance, it is easy to create and run a static e-commerce application with 10K or so SKUs without any trouble. We can even have a recommendation engine, a shopping cart, and various payment methods, all thanks to web services and HTML5.


The following are the technologies that I found useful for static websites.

  • markdown for HTML generation: For generic content, markdown is an easy format to author. In fact, we can even author books in this format.
  • asciidoc for book authoring: For larger-format HTML with more whizbangs, asciidoc is what I tried.
  • docpad for static website generation
  • pandoc for format conversions

For the in-page manipulation of a large number of objects, I found the following very useful:

  • pourover: The amazing slice-and-dice websites for displaying large tables are built with the PourOver library from the NY Times. I have high hopes for it. I think there are a lot of innovative uses for this library, given its performance and its ability to cache results.

One of my favorite problems is developing a web application without any DB in the backend: a pure static application that acts as a library interface. We can search, slice, and dice the collection using various criteria; for instance, the latest document about a particular technology, written by author XXX, related to document YYY.
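
A toy version of the idea in plain JavaScript (PourOver adds performance and result caching on top); the catalog and its fields are made up:

```js
// The whole "database" ships with the page as a JSON array.
var docs = [
  { title: 'Spark notes', tech: 'spark', author: 'XXX', year: 2014 },
  { title: 'D3 primer',   tech: 'd3',    author: 'YYY', year: 2013 }
];

// Compose criteria and slice the collection entirely in the browser.
function query(collection, criteria) {
  return collection.filter(function (doc) {
    return Object.keys(criteria).every(function (key) {
      return doc[key] === criteria[key];
    });
  });
}

console.log(query(docs, { tech: 'spark', author: 'XXX' }));
```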

Mobile & Hybrid application development

For a long while, I have bet on hybrid application development instead of native applications. Now, I concede that at the higher end of the market, native apps have an edge that is unlikely to be equaled by hybrid applications. Still, in the hands of an average developer, hybrid platforms may be better. They are difficult to get wrong for simple applications.


This year, I was hoping to do native application development, but never got around to it. With Polymer not yet completely ready, I dabbled a little with the Angular-based framework Ionic. It was OK for the most part.

Still, for simple pattern-based development, I hold a lot of hope for Yeoman. In corporate contexts, one can develop various scaffoldings and tools in the Yeoman generator framework. That leads to compliant applications that share a standard look and feel without expensive coding.
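
A rough sketch of what such a corporate generator looks like, assuming the yeoman-generator package of that era; the prompt and template names are illustrative:

```js
var generators = require('yeoman-generator');

module.exports = generators.Base.extend({
  // Ask the developer for the pieces the standard skeleton needs.
  prompting: function () {
    var done = this.async();
    this.prompt({
      type: 'input',
      name: 'appName',
      message: 'Application name?'
    }, function (answers) {
      this.appName = answers.appName;
      done();
    }.bind(this));
  },

  // Stamp out the company-standard, compliant skeleton.
  writing: function () {
    this.fs.copyTpl(
      this.templatePath('index.html'),
      this.destinationPath('app/index.html'),
      { appName: this.appName }
    );
  }
});
```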

Languages

In my mind, right now, there are three kinds of languages: ones that run on the JVM, which includes Scala, Java, etc.; ones that translate to JavaScript, which include TypeScript, CoffeeScript, etc.; and the rest, like Go. Innovation in other languages has slowed down.


Despite that, the three languages I dabbled in this year are Scala for big data applications, specifically for Spark; Python, again for data applications, specifically statistical processing; and JavaScript, as I mentioned earlier. I liked TypeScript, especially since it has support from Visual Studio. I started on R, but did not proceed much further. Another language I played with a bit is Go, in the context of containers and deployments.

Data and databases

This year, I gave a 3-hour lecture in Singapore on big data in the context of the internet of things. I should have written it up as a document. The focus of that talk was the different areas of interest in big data, and which technologies, companies, and startups are playing in those areas.


Over the holidays, I experimented with Aerospike, a distributed KV database developed by my friend Srini's company. Whatever little I tried, I loved. It is easy to use, simple to install, and fast to boot. According to their numbers, it costs only $10 per hour to do 1 million reads per second on the Google compute platform. I will replicate that test and see how it compares against other databases I am familiar with, like Redis and Cassandra.

On the statistics front, I familiarized myself with the basics of statistics, which is always handy. I focused on http://www.greenteapress.com/thinkbayes/thinkbayes.pdf to learn more. I followed Quora to learn about the subjects on Coursera. I wanted to get to machine learning, but that will have to wait for 2015.

One particular aspect of big data and analytics that fascinates me is visualization. I played with D3; it is, of course, the basis of most of the visualization advances that we see these days (http://bost.ocks.org/mike/). I am also on the lookout for other toolkits, such as Rickshaw. I will keep following this space to see the upcoming advances that make visualization more mainstream.
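
The idiom that makes D3 tick, in a few lines (the data is made up):

```js
// Bind an array to DOM elements and let the data drive the attributes:
// the heart of the D3 approach.
var data = [4, 8, 15, 16, 23, 42];

d3.select('body').selectAll('div')
    .data(data)
  .enter().append('div')
    .style('width', function (d) { return d * 10 + 'px'; })
    .style('background', 'steelblue')
    .text(function (d) { return d; });
```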

Other computing tools

I find myself using the following tools:

  1. Atom for editing text files.
  2. Brackets.io for editing HTML files and for live preview of static web pages.
  3. VMWare Workstation for my virtual machine needs.
  4. Markdown for note taking.
  5. Chrome developer tools for web development.
  6. Visual Studio for web development.
  7. Digital Ocean as my playground for coding activities.
  8. OVH for any large applications and servers.

Customer conversations

Since most of these conversations have some proprietary content, I cannot give full details here. In general, the focus is on innovation and how corporations innovate in the context of established businesses. Typically, it is a mix of technology, process, and organizational structure transformations to change the way businesses are run. I will need to talk about it in byte-size postings some other time.

Wish you a happy 2015! May your year be full of exciting discoveries! See you in 2015!

Feb 18, 2014
 

I attended Strata, a big data conference, last week (Feb 11-13) in Santa Clara, CA. Over the years, it has become big. This year, it can be said to have become mainstream: there are lots of novices around. I wanted to note my impressions for those who would have liked to attend the conference.

Exhibitor details

The conference exhibitors can be grouped as follows:

[Chart: conference exhibitors by category.]

As you can see, Hadoop is the big elephant in the room.

Big picture view

Most of the companies, alas, are not used to the enterprise world. They are from the valley, not from the plains where much of this technology can be used profitably. Even in innovation, there are only a few participants. Most of the energy goes into minute increments in the usability of the technology. Only a few companies are addressing the challenge of bringing big data to mainstream companies that have already invested in a plethora of data technologies.

Established players like Teradata and Greenplum would like you to see big data as a standard way of operating alongside their technologies. They position big data as relevant in places, and they provide mechanisms to use big data in conjunction with their products. They build connectors; they provide seamless access to big data from their own ecosystems.


[From Teradata website.]

As you can see, the center of Teradata's world is solidly its existing database product(s).

Newcomers like Cloudera would like to upend the equation. They compare the data warehouse to a big DSLR camera and big data to a smartphone. Which gets used more? While the data warehouse is perfect for some uses, it is costly, cumbersome, and doesn't get used in most places. Big data, instead, is easy, with lots of advances in the pipeline to make it easier to use. Their view is this:


[From Cloudera presentation at Strata 2014].

Historically, in place of the EDH, all you had was some sort of staging area for ETL or ELT work. Now, they want to enhance it to include a lot more: "modern" analytics, exploratory analytics, and learning systems.

These are fundamentally different views. While both see big data systems co-existing with the data warehouse, the new companies see them taking on an increasing role, providing ETL, analytics, and other services. The old players see big data as an augmentation of the warehouse for unstructured or large data volumes.

As an aside, at least Cloudera presented their vision clearly. Teradata, on the other hand, came in with marketese that does not offer any information on their perspective. I had to glean through several pages to understand their positioning.

A big disappointment is Pivotal. They ceded leadership in these matters to other companies. Considering their leadership in Java, I expected them to extend MapReduce in multiple directions; that job was taken up by the Berkeley folks with Spark and other tools. With their lead in Greenplum HD, I thought they would define the next-generation data warehouse. They have a concept called the data lake, which is merely a concept. None of the people in the booth were articulate about what it is, how it can be constructed, in what way it is different, and why it is interesting.

Big data analytics and learning systems

Historically, the analytics field has been dominated by descriptive analytics. The initial phase of predictive analytics focused on getting the right kind of data (for instance, TIBCO was harping on real-time information to predict events quickly). Now that we have big data, the problem is not so much getting the right data as computing on it fast. And not just computing fast, but having the right statistical models to evaluate correlations, causations, and other statistical questions.


[From Wikipedia on Bigdata]

These topics are very difficult for most computer programmers to grasp. Just as we needed an understanding of algorithms to program in the early days, we need knowledge of these techniques to analyze big data today. And just as the libraries that codified the algorithms made them accessible to any programmer (think of when you had to program the data structure for an associative array yourself), a new crop of companies is creating systems to make analytics accessible to programmers.

SQL in many bottles

A big problem with most big data systems is the lack of relational structure. Big data proponents may rail against the confines of relational structures, but they are not going to win against SQL systems. Lots of third-party tools assume SQL-like capabilities in the backend systems, and lots of people are familiar with SQL. SQL is remarkably succinct and expressive for several natural activities on data.

A distinct trend is to slap an SQL interface onto non-SQL data. For example, Presto does SQL on big data; Impala does SQL on Hadoop; Pivotal does Hawq; Hortonworks does Stinger. Several of them modify SQL slightly to make it work with reasonable semantics.

Visualization

The big data conference is big on visualization. The key insight is that visualization is not merely something that enhances analytics or insights; it is itself a facet of analytics, itself an insight. Proper visualization is the key to so many other initiatives:

  1. Design time tools for various activities, including data transformation.
  2. Monitoring tools on the web
  3. Analytics visualization
  4. Interactive and exploratory analytics

The big story is D3.js. How a purely technical library like D3.js became the de facto visualization library is something that we will revisit some other day.


Summary

I am disappointed with the state of big data. Lots of companies are chasing the technology end of big data with minute segmentation. The real challenge is adoption in the enterprises, where the endless details of big data and too many choices increase the complexity of solutions. These companies are not able to tell businesses why and how they should use big data. Instead, they collude with analysts, the media, and a few well-publicized cases to drum up hype.

Still, big data is real. It will grow up. It will reduce the cost of data so dramatically that it will support new ways of doing old things. And, with the right confluence of statistics and machine learning, we will see the fruits of big data in every industry: doing new things in entirely new ways.