Wednesday, 14 October 2020

Big data with Hadoop

 

Data and Big Data

To understand Hadoop and how it works, it is necessary to know what Big data is. As the internet grew and Web 2.0 and 3.0 arrived, the volume of data collected and shared grew with it. Social media and IoT in particular have driven exponential growth in the data that is stored, transacted and accumulated. Big data is an umbrella term for data sets too large or complex for conventional data-processing tools to handle.

Big Data Analytics

Data is information: information that is used for planning, scheduling, forecasting, budgeting, mitigating risk, etc. In general, the larger the data set, the more reliable the conclusions drawn from it, hence the importance of Big data. Big data helps an enterprise to uncover information like buying patterns, demographic preferences, trends, correlations, etc. This in turn helps an enterprise discover new streams of revenue, provide better customer service, improve processes, etc. Tools that analyze such large data sets are known as Big data analytics applications.

Hadoop and Big Data

Hadoop is one such tool for big data analytics. A quick look at its history:

Authors: Doug Cutting, Mike Cafarella

Developers: Apache Software Foundation

First Release: April 1st, 2006

Current Release: 3.3.0

Hadoop is a framework for the distributed processing of large data sets across clusters of computers using simple programming models. It scales from a single server to hundreds of machines across locations, and every node offers local computation and storage. Because processing happens in parallel across all of these distributed nodes, very large data sets can be processed faster than on a conventional single-server database. So rather than storing all the data on a single machine or a limited number of machines, Hadoop clusters multiple nodes, which together can process even petabytes of data quickly.

Hadoop Modules

Hadoop Distributed File System (HDFS): HDFS is a file system designed for fault tolerance and for deployment on low-cost commodity hardware. Since Hadoop works with Big data, HDFS is tuned for large data files. A major plus point of HDFS is that it lets applications run their computation where the large data files are located, instead of sending those files to the application.

Yet Another Resource Negotiator (YARN): YARN is to applications what Houston is to NASA: the control center! YARN manages the cluster's nodes and resources, job scheduling, tasks and job status. It also does extensive monitoring and reporting.

MapReduce: Hadoop stores replicated data, which brings benefits such as availability of data at any time and fault tolerance; that replication is handled by HDFS. MapReduce is the programming model for processing the data in parallel. The input data set is split into independent chunks, which the map tasks process in parallel, emitting intermediate key-value pairs. The framework sorts and groups these pairs by key, and the reduce tasks then apply an aggregate or summary function to each group to produce the final output.

And finally, across the above three modules and the other modules in Hadoop, the Hadoop Common libraries provide utilities that all modules can use.
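The map and reduce steps of MapReduce can be illustrated with a small in-memory word count. This is only a sketch of the idea in plain Python, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and everything runs in a single process, whereas Hadoop would spread the chunks across the nodes of a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input chunk."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, the step that sits between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate all values collected for one key into a single total."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over the chunks, here sequentially in one process."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    # Two chunks stand in for two input splits handled by separate map tasks.
    print(word_count(["big data with hadoop", "hadoop stores big files"]))
```

Each chunk is mapped independently of the others, which is what would let a real cluster run the map tasks in parallel; only the grouping step needs to see the output of every chunk.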

Hadoop vs Conventional Database

Strictly speaking, Hadoop is not a database, so this comparison might feel like apples and oranges. But Hadoop, as we saw, is an ecosystem for distributed storage and processing, and thus has some similarities with a database.

Hadoop is ideally used for applications dealing with petabytes of data. RDBMS are a good fit for applications with gigabytes of data.

RDBMS are mainly used for structured data. Well defined schemas and data definitions are mandatory for RDBMS. Hadoop works well with structured and unstructured data

Since Hadoop works mostly with semi-structured or unstructured data, it does not require SQL. SQL-like access is available through tools in the ecosystem, most notably Apache Hive, whose Hive Query Language (HQL) runs against table definitions kept in the Hive Metastore. RDBMS use SQL.

Hadoop is fully open source, whilst many enterprise RDBMS are commercially licensed.

Data is processed as key-value pairs in Hadoop's MapReduce, whereas, as the name suggests, data objects in an RDBMS are relational.

Hadoop is well suited for data types like Audio, Video, Images, etc. RDBMS goes well with relational data such as commonly found in OLTP

Hadoop Ecosystem




Hadoop is not a single library, tool or framework that handles big data. It is essentially a platform, a suite or an ecosystem that provides tools, frameworks, libraries, models and objects to analyse Big data and glean insight from it. The four main modules of Hadoop are listed above. In addition, the main components of a Hadoop ecosystem are:



Data storage is a collection of commodity hardware: individual machines, known as nodes, grouped into clusters in the Hadoop ecosystem.

Data processing involves the computational techniques that process the data held in data storage. This layer is built for optimised, parallel execution.

Data access tools help applications write queries and execute commands on the data sets returned by the data processing tools.

Needless to say, the Hadoop Ecosystem is constantly evolving and new tools keep getting added every day. What we have listed above are some of the popular ones

Extras

Hadoop framework is written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.

The MapReduce model can be implemented in any language for any application that uses Hadoop.

Hadoop can be implemented on-premise or on the cloud.

More than half of the Fortune 50 companies use Hadoop:

https://www.prnewswire.com/news-releases/altiors-altrastar---hadoop-storage-accelerator-and-optimizer-now-certified-on-cdh4-clouderas-distribution-including-apache-hadoop-version-4-183906141.html
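As an illustration of the point that MapReduce is not tied to Java: Hadoop Streaming lets any executable that reads lines from standard input and writes tab-separated key-value lines to standard output act as a mapper or reducer. The sketch below is a word count in that style. The script layout, the function names `mapper` and `reducer`, and the convention of passing `map` or `reduce` as the first argument are assumptions made for this example, not part of Hadoop.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Turn each input line into one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; the input must already be sorted by key."""
    records = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if line.strip()  # ignore blank lines
    )
    for word, group in groupby(records, key=lambda record: record[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention for this sketch: "map" or "reduce" as first argument.
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

Saved as, say, `wordcount.py` (a hypothetical file name), the flow can be tried locally with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` stands in for the sorting the framework performs between the two phases. On a cluster, the same two commands would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options.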

Conclusion

Data is only going to increase. With more applications coming up every day, data growth is one-way traffic. In such an environment, and given the proven benefits of Hadoop, enterprises can and should start thinking about what Big data can do for their organizations. With the ability to host Hadoop in the cloud, enterprises can get the best of both worlds: cloud storage and the power of Big data.


Monday, 5 October 2020

Enterprise Digital Transformation Guide in the Post-Covid World

 Business Transformation

Uncertainty is the new normal today, with no clear end in sight. Businesses, small and large, have been besieged by unprecedented challenges. Any business is constantly in a state of improvement, upgrade and transition. Today this is necessitated primarily by the current pandemic, and then by changing market economics, customer preferences, government policies, etc. Business transformation is inevitable. It can be defined as changes or modifications to business operations, driven by the need to improve or by external factors such as Covid. Digital transformation is one of the key steps in business transformation.

What is digital transformation?

Is it simply the adoption of technology in an organisation?

Using software, tools and other systems to replace manual work?

Does adding more software and state-of-the-art hardware transform a business digitally?

Far from it! Adopting technology makes the job easier for sure, but digital transformation is an entirely different subject.

Digital transformation can be defined as a time-bound, holistic change in business culture, driven by technology, that involves people, process and physical aspects. It is a mindset that a business needs to adopt. To break it down, digital transformation redefines business operations, how employees work, and how management perceives data and makes decisions. It makes the way of working more systematic and automated, with the singular goal of improving the business at all target levels. Digital transformation is not simply adding technology, like replacing one piece of software with another. It is planning, analysing, conducting and supporting business operations via technology. IDC has predicted that digital transformation will take up 50% of IT budgets by 2023.

Need for Digital Transformation

Why does an organisation need digital transformation? Is it mandatory? The answer depends on where the business sees itself a few years down the line. If the goal is to scale, constantly get better, outrank the competition, diversify, etc., then digital transformation is a must. Customer experience is being driven by digital mediums at a rapid pace. Cheaper hardware and faster data speeds also drive the need for digital transformation. In the post-Covid world, uncertainty is writ large. No one knows for sure if the old ways of working and doing business will return. Take, for example, RPA (Robotic Process Automation): Gartner has predicted that spending on RPA services will grow to $2.4 billion by 2022.

Core Changes across business areas




  1. Employee upskilling
  2. Business processes
  3. Client and stakeholder empowerment
  4. Partners and vendors
  5. Business functions
  6. Assets (movable and immovable)

Digital transformation helps customers, employees and management get information as and when they require it. A frictionless experience goes a long way in retaining customers.

With information available all the time, efficiency is achieved in process, output and decision making.

Digital transformation's biggest plus point is data-driven feedback, which is priceless when it comes to planning. In a digital work environment, companies can pinpoint which areas and departments are bleeding, which are thriving, and which need optimisation.

Ease of operations invariably drives lower costs and improved bottom lines.

Tools and Frameworks

Digital transformation is applied across a wide array of business functions such as sales, marketing, HR, administration and operations via

Cloud Computing for data storage

Big Data tools to mine data for analytics

Mobility to give customers an omnichannel experience

Mobility to give field staff best access to resources and sales

CRM tools and process for operations and administration

Digital Road Map

Any organisation that wants to radically change its way of working through digital transformation needs a digital road map. This should be time-bound, and it should clearly list the aims and objectives of the company for both the short term and the long term.

Prepare a team and leadership that defines the current state of capabilities and builds a road map for the transition. This will be based on company values, shared goals, budget and objectives. List out the benefits, list out the goals, list out the time and levels of transformation desired

Across verticals, identify and evaluate opportunities, bottlenecks and pivot points. Ask questions at this stage. For example, in a delivery department, try to identify the most time-consuming activity in a delivery. Or in a marketing department, ask what the average time taken to respond to an inquiry email is.

Identify vendors, evaluate solutions, measure the pros and cons of solutions. Make a comparative study and evaluate which is better. Compare over a large time frame and wide data set

Implement tech-driven solutions like automation, RPA, cloud-based backup, etc. at the basic level. Train employees and field workers on how to adapt to this new mode. Take feedback and incorporate it.

Update processes to Agile on a per project basis. At each project constantly evaluate the time and efficiency of execution.

Digital transformation for the future

The UK Government has a beautiful slogan, "Digital by Default", which summarises the importance of digital service consumption, whether private or public, in an increasingly digital world. As mentioned above, digital transformation is not limited to the mere adoption of technology in the existing work culture of the organisation. It is a change in the mindset of the organisation, in a holistic manner, across functions and departments.