Wednesday 14 October 2020

Big data with Hadoop


Data and Big Data

To understand Hadoop and how it works, it is necessary to know what Big data is. As the internet grew, and with the advent of Web 2.0 and 3.0, the demand for data grew, and so did the amount of data collected and shared. With social media and IoT in particular, there has been exponential growth in the data stored, transacted and accumulated. Big data is an umbrella term for such large volumes of data.

Big Data Analytics

Data is information: information that is used for planning, scheduling, forecasting, budgeting, mitigating, etc. In general, the larger the data set, the more accurate the information drawn from it tends to be; hence the importance of Big data. Big data helps an enterprise uncover information such as buying patterns, demographic preferences, trends and correlations. This in turn helps the enterprise discover new streams of revenue, provide better customer service, improve processes, etc. Tools that analyze such large data sets are known as Big data analytics applications.

Hadoop and Big Data

Hadoop is one such big data analytics application. A quick look at its history:

Authors: Doug Cutting, Mike Cafarella

Developers: Apache Software Foundation

First Release: April 1st, 2006

Current Release: 3.3.0

Hadoop is a framework for the distributed processing of large data sets across clusters of computers using simple programming models. Whether the cluster is a single server or hundreds of machines spread across locations, every node offers local computation and storage. Because processing happens in parallel across these distributed nodes, large data sets can be processed faster than on a conventional database. In short, rather than keeping all the data in a single or limited number of data stores, Hadoop spreads it across a cluster of nodes that can together process even petabytes of data quickly.

Hadoop Modules

Hadoop Distributed File System (HDFS): HDFS is a file system designed for fault tolerance and for deployment on low-cost commodity hardware. Since Hadoop works with Big data, HDFS is tuned for large data files. A major plus point of HDFS is that it lets applications perform computation on large data files where they are stored, instead of sending those large files to the application.

Yet Another Resource Negotiator (YARN): YARN is to applications what Houston is to NASA: the control center! YARN manages cluster nodes, job status, scheduling and tasks. It also does extensive monitoring and reporting.

MapReduce: MapReduce is Hadoop’s programming model for processing large data sets in parallel. Hadoop stores data in replicated form, which brings benefits such as availability of data at any time and fault tolerance, and MapReduce jobs run on top of that storage. A MapReduce job splits the input data set into independent chunks, which are processed by the map tasks in parallel; each map task filters and transforms its chunk into key-value pairs. The framework then sorts and groups the map outputs by key and hands them to the reduce tasks, which apply an aggregate or summary function to produce the final result.
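The map, sort and reduce steps can be illustrated with a small word count written in plain Python. This is only a single-process sketch of the idea, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are made up for illustration, and on a real cluster the map calls would run in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle/sort: group all emitted values by key, in sorted key order."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: aggregate the values of each key (here, a simple sum)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # Each chunk is mapped independently of the others, which is what
    # allows a cluster to run the map tasks in parallel.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    sample_chunks = ["big data with hadoop", "hadoop processes big data"]
    print(word_count(sample_chunks))
```

The reduce step here is a sum, but any aggregate or summary function (maximum, average, distinct count) fits the same shape.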

Finally, alongside the three modules above, Hadoop Common provides the shared libraries and utilities that are used by all the other Hadoop modules.

Hadoop vs Conventional Database

Strictly speaking, Hadoop is not a database, so this comparison might feel like apples and oranges. But Hadoop, as we saw, is an ecosystem for distributed storage, and so it has some similarities with databases.

Hadoop is ideally used for applications dealing with petabytes of data. RDBMS are a good fit for applications with gigabytes of data.

RDBMS are mainly used for structured data. Well defined schemas and data definitions are mandatory for RDBMS. Hadoop works well with structured and unstructured data

Since Hadoop works mostly with semi-structured or unstructured data, it does not require SQL. SQL-like access is available through tools in the ecosystem, most notably Apache Hive, whose Hive Query Language (HQL) runs against the table definitions kept in the Hive Metastore. RDBMS use SQL.

Hadoop is fully open source, whilst many commercial RDBMS products require paid licenses.

Hadoop’s MapReduce model processes data as key-value pairs, whereas in an RDBMS, as the name suggests, data objects are relational.

Hadoop is well suited for data types like Audio, Video, Images, etc. RDBMS goes well with relational data such as commonly found in OLTP

Hadoop Ecosystem

Hadoop is not a single library, tool or framework that handles big data. It is essentially a platform, a suite or an ecosystem that provides tools, frameworks, libraries, models and objects to analyse Big data and glean insight from it. The four main modules of Hadoop are listed above. In addition, the main components of a Hadoop ecosystem are:

Data Storage is a collection of commodity hardware machines, known as nodes, which are grouped into clusters in the Hadoop Ecosystem.

Data Processing involves computational techniques which process the data held in the data storage. This layer is built for optimization.

Data Access tools help applications write queries and execute commands on the data sets returned by the Data Processing tools.

Needless to say, the Hadoop Ecosystem is constantly evolving and new tools keep getting added every day. What we have listed above are some of the popular ones


The Hadoop framework is written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.

The MapReduce model can be implemented in any language for any application that uses Hadoop.
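One common route to using another language is the Hadoop Streaming utility, which is not covered in this post: it lets any program act as a mapper or reducer by reading lines from standard input and writing tab-separated key and value lines to standard output. The sketch below imitates that line protocol in Python. The mapper and reducer functions and the sample lines are illustrative assumptions, and the local sort stands in for the sorting that the framework would do between the two phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated record (word, count of 1) per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts of each word; records must arrive sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Local stand-in for a cluster run: map, sort by key, then reduce.
    sample_lines = ["big data", "Big Hadoop"]
    for record in reducer(sorted(mapper(sample_lines))):
        print(record)
```

In an actual streaming job the mapper and reducer would be two separate scripts that read from standard input; they are kept as functions over lists of lines here so the sketch can be run and checked locally.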

Hadoop can be implemented on-premise or on the cloud.

More than half of the Fortune 50 companies use Hadoop.


Data is only going to increase. With more applications coming up every day, data growth is one-way traffic. In such an environment, with the proven benefits exhibited by Hadoop, enterprises can and should start thinking about the virtues of Big data for their organizations. With the ability to host Hadoop on clouds, enterprises can get the best of both worlds, cloud storage and the power of Big data

Monday 5 October 2020

Enterprise Digital Transformation Guide in the Post-Covid World

Business Transformation

Uncertainty is the new normal today, with no clear end in sight. Businesses, small and large, have been besieged by unprecedented challenges. Any business is constantly in a state of improvement, upgrade and transition. This is necessitated primarily by the current pandemic, and then by changing market economics, customer preferences, government policies, etc. Business transformation is inevitable. It can be defined as changes or modifications to business operations, driven by the need to improve or by external factors such as Covid. Digital transformation is one of the key steps in business transformation.

What is digital transformation?

Is it simply the adoption of technology in an organisation?

Using software, tools and other systems to replace manual work?

Does adding more software and state-of-the-art hardware transform a business digitally?

Far from it! Adopting technology makes the job easier for sure, but digital transformation is an entirely different subject.

Digital transformation can be defined as a time-bound, holistic change in the business culture which involves people, process and physical aspects, driven by technology. It is a mindset that needs to be adopted by a business. To break it down, digital transformation redefines how a business operates, how employees work, and how management perceives data and makes decisions. It makes the way of working more systematic and automated, with the singular goal of improving the business at all target levels. Digital transformation is not simply adding technology, like replacing one piece of software with another. It is planning, analysing, conducting and supporting business operations via technology. IDC has predicted that digital transformation will take up 50% of IT budgets by 2023.

Need for Digital Transformation

Why does an organisation need digital transformation? Is it mandatory? The answer depends on where the business sees itself a few years down the line. If the goal is to scale, constantly get better, outrank the competition, diversify, etc., then digital transformation is a must. Customer experience is being driven by digital mediums at a rapid pace. Cheaper hardware and faster data speeds also drive the need for digital transformation. In the post-Covid world, uncertainty is writ large. No one knows for sure if the old ways of working and doing business will return. Take, for example, RPA or Robotic Process Automation: Gartner has predicted that spending on RPA services will grow to $2.4 billion by 2022.

Core Changes across business areas

  1. Employee upskill
  2. Business Process
  3. Client and stakeholders empowerment
  4. Partners, vendors
  5. Business functions
  6. Assets (Moveable and Immoveable)

Digital transformation helps customers, employees and management get information as and when they require. Frictionless experience goes a long way in retaining customers

With information available all the time, efficiency is achieved in process, output and decision making

Digital transformation’s biggest plus point is the data-driven feedback, which is priceless when it comes to planning. In a digital work environment, companies can pinpoint which areas and departments are bleeding, which are thriving and which need optimisation.

Ease of operations invariably drives lower costs and improved bottom lines.

Tools and Frameworks

Digital transformation is applied across a wide array of business functions such as sales, marketing, HR, administration and operations via

Cloud Computing for data storage

Big Data tools to mine data for analytics

Mobility to give customers omni channel experience

Mobility to give field staff best access to resources and sales

CRM tools and process for operations and administration

Digital Road Map

Any organisation that wants to radically change its way of working through digital transformation needs a digital road map. The road map should be time-bound, and it should clearly list the aims and objectives of the company for both the short term and the long term.

Prepare a team and leadership that defines the current state of capabilities and builds a road map for the transition. This will be based on company values, shared goals, budget and objectives. List out the benefits, list out the goals, list out the time and levels of transformation desired

Across verticals, identify opportunities, bottlenecks and pivot points, and evaluate them. Ask questions at this stage. For example, in a delivery department, try identifying the most time-consuming activity for delivery. Or in a marketing department, what’s the average time taken to respond to an inquiry email?

Identify vendors, evaluate solutions, measure the pros and cons of solutions. Make a comparative study and evaluate which is better. Compare over a large time frame and wide data set

Implement tech driven solutions like automation, RPA, cloud based backup etc at the basic level. Train the employees and field workers on how to adapt to this new mode. Take feedback and incorporate

Update processes to Agile on a per project basis. At each project constantly evaluate the time and efficiency of execution.

Digital transformation for the future

The UK Govt has a beautiful slogan, “Digital by Default”, which summarises the importance of digital service consumption, whether private or public, in an increasingly digital world. As mentioned above, digital transformation is not limited to the mere adoption of technology in the existing work culture of the organisation. It is a change in the mindset of the organisation in a holistic manner across functions and departments.

Monday 28 September 2020

A Quick Guide to CI/CD - CodeCraft


CI/CD is a central part of software engineering. It is a well-defined, automated life cycle for code integration and delivery to stakeholders (QA, customers, business heads, etc.). Central to this life cycle is automation. Software development teams usually consist of people with diverse skill sets, such as engineers, quality analysts, business associates, CxOs, designers and administrators. CI/CD as a discipline helps these diverse groups communicate effectively by continuously integrating and delivering. For example, it answers questions such as:

Are Design artefacts available in the code repository?

Has the latest feature merge broken the existing flow?

Is there an update or a new deliverable?

By automating the entire process of building, integrating and testing, bugs are detected earlier and fixed, delivery is faster and inevitably, quality is improved.

Everything is part of a pipeline

As we saw above, CI/CD is a series of automated activities. A pipeline is simply a group of such activities or events governed by well-defined rules. The “jobs” in a pipeline are executed one after another or, if there is no dependency between them, in parallel. In fact, running parallel pipelines is recommended to speed up the build, test and feedback loop, and to speed up deployments. Each job or event in a pipeline can generate output which serves as input to the next job. Jobs can also be triggered by an external event. For example, a job in a pipeline can consist of the following events:

Start a build environment for executing the build script

Link to necessary libraries and dependencies

Run the test suite on the desired environment

If successfully built, notify or deploy depending on result

If failed to build, notify and take appropriate action

The external event that triggers the pipeline is typically a simple code commit. Note that if one stage fails, the entire process fails, thereby catching errors right at the source.
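The way jobs pass their output along, and the way one failure stops the whole run, can be sketched in a few lines of Python. This is a toy model rather than the API of any CI/CD tool: Job, PipelineError, run_pipeline and the three sample jobs are hypothetical names chosen for illustration, and the commit id stands in for the triggering event.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Job:
    name: str
    run: Callable[[Any], Any]  # receives the previous job's output


class PipelineError(Exception):
    """Raised when a job fails, which stops the rest of the pipeline."""


def run_pipeline(jobs: List[Job], trigger: Any) -> Any:
    """Run jobs in order, feeding each job's output into the next one."""
    artifact = trigger
    for job in jobs:
        try:
            artifact = job.run(artifact)
        except Exception as exc:
            # Fail fast: later jobs never run on a broken input.
            raise PipelineError(f"job '{job.name}' failed: {exc}") from exc
    return artifact


def compile_code(commit):
    return {"commit": commit, "binary": f"app-{commit}"}


def run_test_suite(artifact):
    if not artifact["binary"].startswith("app-"):
        raise ValueError("unexpected binary name")
    return {**artifact, "tests_passed": True}


def deploy_build(artifact):
    return f"deployed {artifact['binary']}"


if __name__ == "__main__":
    pipeline = [
        Job("build", compile_code),
        Job("test", run_test_suite),
        Job("deploy", deploy_build),
    ]
    # The commit id plays the role of the external trigger.
    print(run_pipeline(pipeline, "abc123"))
```

Because every job only sees the output of the job before it, a failing build never reaches the deploy step, which is the "catching errors at the source" behaviour described above.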

Every time a commit happens, if all goes well, there is a deployable version of the product.

Errors are spotted at the source and hence fixed easily.

Production deployments are almost certainly free from broken code.

No manual intervention, hence less scope for missing errors. The process is repeatable.

Role of automation

CI/CD is built on automation. Every job or task can be configured to run in a particular environment or with a certain set of variables. This is known as declarative configuration. Automation is mandatory for testing a build. There are lots of tools which integrate with various build systems and provide comprehensive coverage. Because of automation, every engineering integration passes through a reliable and comprehensive suite of test cases, which is incrementally added to and updated. Hence errors and faults are few and far between. Even when they happen, automation steps in and informs the team as soon as possible via pre-configured logs, alerts and mailers.
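To make the idea of declarative configuration concrete, here is a small Python sketch in which each job is described as data (its environment, variables and notification channels) rather than as a sequence of commands. Real CI/CD tools usually express this in their own configuration files; the JobConfig class, the PIPELINE list and the describe helper below are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class JobConfig:
    """Describes what a job needs, not how the runner executes it."""
    name: str
    environment: str
    variables: Dict[str, str] = field(default_factory=dict)
    notify: List[str] = field(default_factory=list)


# The whole pipeline is plain data that a runner could read and act on.
PIPELINE = [
    JobConfig("unit-tests", environment="staging",
              variables={"LOG_LEVEL": "debug"}, notify=["email"]),
    JobConfig("deploy", environment="production",
              variables={"LOG_LEVEL": "warn"}, notify=["email", "alert"]),
]


def describe(config: JobConfig) -> str:
    """Render one job's configuration as a single readable line."""
    settings = ", ".join(f"{k}={v}" for k, v in sorted(config.variables.items()))
    channels = ", ".join(config.notify)
    return f"{config.name} on {config.environment} [{settings}] -> notify via {channels}"


if __name__ == "__main__":
    for job_config in PIPELINE:
        print(describe(job_config))
```

Keeping the configuration as data means the same job definition can be pointed at a different environment or set of variables without touching the job's logic.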

CI/CD with Agile

In a dynamic project with changing requirements and reduced time to market, CI/CD is the perfect partner for the Agile methodology. Agile is built on continuous feedback, rapid prototyping, self-organising teams and fast development and deployment. CI/CD helps the agile team by removing hindrances to communication, automating deployment and testing, and emphasising cross-functional compatibility. In spirit, CI/CD is about cooperation across teams to deliver a winning product. This is a perfect companion to Agile’s focus on process improvement while accelerating delivery.

DevOps Culture + Agile Mindset

CI/CD is not limited to a set of automated processes to speed up delivery and improve quality. It aims to be the backbone of DevOps culture. In association with Agile process and CI/CD, DevOps bridges the gaps, complexity and differences between engineering and operations. DevOps strives to increase cross functional collaboration by sharing responsibilities and increasing communication. By defining what needs to be achieved and how it should be achieved via CI/CD, DevOps encourages working together


There are tons of free, premium and freemium tools for all CI/CD tasks like automating builds, test suites, code scans. Some of the most popular tools we use at CodeCraft for our customers and in house development are

Also, check out sites like and

where new and latest DevOps related tools are listed every day. It also highlights the trending topics in the DevOps landscape


CI/CD is not, and never was, just a buzzword for enterprises to throw around. It is a proven way of developing, testing and deploying software faster, all the while keeping the focus firmly on code quality. It syncs well with any existing software development methodology, although it’s always better to use it with an Agile process.

Thursday 24 September 2020

How to remain productive working from home


What exactly is the workplace?

Historically, a workplace was a physical location where a group of people, contractually bound, assembled to perform a set of tasks in return for something: a factory, a tannery, a wood-cutting mill, etc. Workplaces also provided the tools, resources, machinery and instruments required to perform the tasks. Then came the digital era, with the start of computers and the internet. In this era, a new wave of jobs was created which redefined the workplace as we knew it from the previous era. Instead of physical labor and the use of huge, complex machinery, work began to be accomplished behind desks, on desktops and laptops. Moore’s law captured this evolution very well. As electronic devices became smaller and yet more effective, digitalization revolutionized every aspect of our lives, the workplace included.

The workplace was always regarded as essential for getting things done. It enforced accountability, helped with focus, made the essentials for working available and, all in all, enabled productivity. But as we saw above, the evolution of the workforce made organizations rethink the policy of enforcing the workplace as the only place of working.

Remote working is a mode of working which allows the professional to work from any location. Usually, flexible timings for work are closely associated with remote working. There are variations of remote working as well, such as working from home. The definition of a remote worker may also change depending on their employment status, i.e. whether they are a full-time employee or a contractual employee. And irrespective of what kind of employee they are, how can a remote worker ensure the same level of productivity?

Let’s find out…

COVID and the New Normal

The COVID-19 pandemic has completely altered the meaning of the workplace, particularly in the information technology sector, which is almost 100% digitized. Where companies earlier allowed and authorized remote working in varying degrees, based on factors like requirements, logistics, security, etc., they are now forced to come up with frameworks for allowing remote working as much as possible to keep the wheels moving. It is also becoming apparent that this is going to be the new normal for a long time, and hence remote working and work from home are no longer going to be an employee privilege but rather a way of life.

Remote Working vs Work from Home

Remote working is a working system where an employee is not mandated by contract to be present in a physical, authorized location of the employer. There could be many reasons for such an arrangement, e.g. a global workforce: an employer in India hires a consultant in the US. In the internet age, proximity and information are just a click away.

Work from Home can be considered as an arrangement where an employee is entitled to work from home subject to certain restrictions and rules. Work from Home makes an exception to the requirement that an employee should be physically present as and when required under certain circumstances.

Drivers of Remote Working

Let’s explore what are the drivers of remote working.

Why does somebody work remotely in the first place?

  • Happier Employee — More Productivity
  • Technology
  • Current Norm
  • Reduced Cost
  • Social Benefits

Happier Employee — More Productivity

Allowing employees to work in a location where they are more comfortable, i.e. their home or any other preferred location, puts them in a happier and calmer mindset. This invariably leads to more productive employees. Plus, the support extended by the organisation in letting employees work from home or remotely builds loyalty toward the organisation.


Technology

Employees work from home simply because they can! Technology is a great enabler, and distances and physical presence no longer matter in many sectors. This offers the employer the additional benefit of having access to the employee and to company data at any point of time. Video discussions and conference calls are already used in organisations which have teams spread across the globe. Remote working and work from home simply extend those technologies locally.

Current Norm

Companies around the globe are always trying to improve their employee engagement program to make them more productive. With modern lifestyle, the pressures of work can affect the work-life balance. Hence to provide some balance to the employee when it comes to their personal lives, companies have accepted WFH or remote working as a norm

Reduced Cost

Having an employee work remotely or from home has measurable financial benefits for the company, such as savings on office space, physical infrastructure, transportation, conveyance allowance, etc.

Social Benefits

In addition to the benefits to both the employee and employer, there are lots of social benefits associated with remote working and working from home. Reduced traffic, reduced noise, reduced environmental impact associated with pollution generation, reduced spending in commute, etc

Remote working best practice

Now that we have a good idea about what remote working is and we have seen the benefits of the same, let’s have a look at how we can maximise our productivity working from home


Reachability

Set up the environment

Proactive with potential Distractions

Take Scheduled Breaks

Call Over Chat

Solicit Feedback

1. Reachability

When an employee works from home, the most important thing is reachability. It is the responsibility of the employee to inform their team, subordinates and superiors when they are available, when they are on a break and when they will log off. Do not make colleagues ping you multiple times waiting for your response while you have switched off. Stick to the normal work hours. Be available on the authorized communication channels.

2. Set up the environment

An employee must set up an environment which is similar to the work environment. This is not limited to tools and technology. It extends to mundane things like tables, chairs, settings, etc. Ensure that all the authorised and required tools, permissions, configurations and provisioning certificates are available and working on your system before you start. Similarly, have a physical environment similar to your office setup. Dress for the occasion and keep the same desk setup, so that you are as focussed on the work as you would be at the workplace.

3. Proactive with potential distractions

Working from home has more potential for distractions. It might be amusing and cute to see your cat dart across the keyboard, but when you are in the middle of a serious discussion over a video call, it can appear awkward. The same goes for a home delivery, a family member popping in to ask something, kids throwing tantrums and a million and one awkward scenarios like that. Avoid all this by defining a boundary, in both time and space. Let your family know the timings during which you are unavailable. Keep the door shut if you need to. Do whatever is required based on your home setup, but ensure that personal distractions don’t impede your work.

4. Take scheduled breaks

It is easy to be “always-on” when you are working from home. But just like the way someone takes regular breaks in their workplace, schedule breaks when you are working at home as well.

Just ensure that your team members are aware. A simple status update on your communication channel which indicates that you are away would be a good example. Similarly, ensure non-work tasks are scheduled at a fixed time and it does not interfere with your working hours.

5. Calls over chat

In a working environment, help or clarification or answer to problems is just a desk away. You might swing by your colleague’s desk to get some clarification. But when you are working remotely and you are connected with your colleagues virtually, discussions and clarifications might get difficult over chats. Do not hesitate to reach out to your team to discuss any contentious issue over the phone when you find yourself going round in circles with chat-based discussions

6. Solicit feedback

Periodically talk to your team, your subordinates and your superiors about how they are finding your current setup.

Are you easily accessible?

Are you properly audible?

Do they have any suggestions on how you can be more productive when working from home?

This goes a long way in reinforcing your commitment and professionalism when it comes to working from home.

What to avoid

Casual Nature

Non-Adherence to Time

Not Informing in Advance

Not Being Prepared

1. Casual nature

Remote working or work from home is a privilege granted so that you can accomplish your work better while balancing other priorities. For the time you are working from home, you are accountable to the employer for the work you are supposed to deliver. Often, the comfortable and familiar environment of home can induce a sense of casualness in an employee. Work from home is nothing but working, simply done from home. Maintain the same level of professionalism as in the workplace.

2. Non-adherence to time

Adherence to time is mandatory when it comes to working from home. Since an employee is not physically present, it becomes imperative that the same person maintains time when it comes to meetings, calls, discussion, etc.

3. Not informing in advance

If an employee is pulled into some personal work, it is necessary to inform the team members about any non-availability. Employees who are stepping out for a while should notify their subordinates and superiors about the same. Similarly, if someone is on leave, ensure that the connected people are informed in advance.

4. Not being prepared

Do not show up to a meeting with a faulty microphone or webcam and fix things during the call. Do dry runs of every interaction to make sure everything works as expected. Being prepared, with no logistical issues, is necessary because it’s professional and respectful of others’ time.

Challenges of Remote Working

Tech Availability

More Prone to Distraction

Difficulty in Switching off

Need to over-communicate and Reachability

Tech availability

A workplace is equipped with the latest and the required tools, hardware and software to perform one’s work effectively. Having access to the same kind of setup might be difficult in a remote location. This leads to the use of substitutes which, depending on the substitute, might lower productivity.

More prone to distraction

A work culture that is centered around focus helps an employee work efficiently in the office. But at home or in a remote location, an employee’s attention is more prone to be diverted by mundane tasks not directly related to work.

Difficulty in switching off

Work from home creates an “Always On” expectation. People take it for granted that since someone is working from home, they are going to be always available. It is essential to take regular breaks and draw a definite boundary between personal and professional time.

Need to over-communicate and reachability

No matter how much technology simplifies communication and accessibility, nothing beats a personal touch when it comes to discussing, collaborating, and communicating. In a connected world, due to lack of understanding, time difference, cultural differences, etc, remote working or work from home might necessitate over communication to simplify matters


Remote working was an effective, additional way to do work before the COVID pandemic. But life as we know it has completely changed. What was a good thing previously is now a necessity and might become the way of life in the future. If performed in the right manner, with diligence and care, working from home and remote working are as effective as working at the workplace, if not better. Hence each person must follow a certain code, a certain personal ethic and professionalism to make this successful and to handle a part of this crisis.

Happy Working