Galaxy Consulting
  • Home
  • About Us
    • Our Process
    • Meet Us at Industry Events
  • Services
    • Business Analysis and Usability
    • Content and Knowledge Management
    • Records Management
    • Information Architecture
    • Enterprise Search
    • Taxonomy and Metadata Development and Management
    • Document Control
    • Information Governance
  • Solutions
    • Information Overload
    • Compliance
    • E-Discovery
    • Internal and External Websites
    • Enterprise Search
    • Collaboration and New Employees’ Onboarding
    • Customer Service
    • Manual Processes
    • Vulnerability of Sensitive Information
  • Portfolio
    • Our Brochure
    • Our Clients
    • Case Studies
    • Presentations
    • Press Releases >
      • Galaxy Consulting Receives 2016 Best of Redwood City Award
      • Galaxy Consulting Receives 2015 Best of Redwood City Award
    • Videos
  • Testimonials
  • Blog
  • Free Consultation
  • Contact Us
  • Terms of Use/Privacy Policy

Hadoop Adoption

6/12/2016

0 Comments

 
Picture
True to its iconic logo, Hadoop is still very much the elephant in the room. Many organizations heard of it, yet relatively few can say they have a firm grasp on what the technology can do for their business, and even fewer have actually implemented it successfully at their organization.

Forrester Research predicted that Hadoop will become a cornerstone of the business technology agenda at most organizations. 

Scalability, affordability, and flexibility make Hadoop uniquely suited to change the big data scene. An open-source software framework, Hadoop allows for the processing of big data sets across clusters on commodity hardware either on-premises or in the cloud. 

At roughly one-thirtieth the cost of traditional data storage and processing, Hadoop makes it realistic and cost effective to analyze all data instead of just a data sample. Its open-source architecture enables data scientists and developers to build on top of it to form customized connectors or integrations.

Typically, data analysis requires some level of data preparation, such as data cleansing and eliminating errors, outside of traditional data warehouses. Once the data is prepared, it is transferred to a high-performance analytics tool, such as a Teradata data warehouse. With data stored in Hadoop, however, users can see "instant ROI" by moving the data workloads off of Teradata and running analytics right where the data resides.

Other use of Hadoop is for live archiving. Instead of backing up data and storing it in a data recovery system, such as Iron Mountain, users can store everything in Hadoop and easily pull it up whenever necessary.

The greatest power of Hadoop lies in its ability to house and process data that couldn't be analyzed in the past due to its volume and unstructured form. Hadoop can parse emails and other unstructured feedback to reveal similar insight.

The sheer volume of data that businesses can store on Hadoop changes the level of analytics and insight that users can expect. Because it allows users to analyze all data and not just a segment or sample, the results can better anticipate customer engagement. Hadoop is surpassing model analytics that can describe certain patterns and is now delivering full data set analytics that can predict future behavior.

There are few challenges.

Hadoop's ability to process massive amounts of data, for example, is both a blessing and a curse. Because it's designed to handle large data loads relatively quickly, the system runs in batch mode, meaning it processes massive amounts of data at once, rather than looking at smaller segments in real time. As a result, the system often forces users to choose between quantity and quality. At this point in Hadoop's life cycle, the focus is more on enormous data size than high-performance analytics.

Because of the large size of the data sets fed into Hadoop, the number-crunching doesn't take place in real time. This is problematic because as the time between when you input the data and the time at which you have to make a decision based on that data grows, the effectiveness of that data decreases.

The biggest problem of all is that Hadoop's seeming boundlessness instills a proclivity for data exploration in those who use it. Relying on Hadoop to deliver all the answers without asking the right questions is inefficient.

As companies begin to recognize Hadoop's potential, demand is increasing, and vendors are actively developing solutions that promise to painlessly transfer data onto Hadoop, improve its processing performance, and operationalize data to make it more business-ready.

Big data integration vendor Talend, for example, offers solutions that help organizations transition their data onto Hadoop in high volume. The company works with more than 800 connectors that link up to other data systems and warehouses to "pull data out, put it into Hadoop, and transform it into a shape that you can run analytics on.

While solutions such as those offered by Talend make the Hadoop migration more manageable for companies, vendors such as MapR tackle the batch-processing lag. MapR developed a solution that enhances the Hadoop data platform to make it behave like enterprise storage. It enables Hadoop to be accessed as easily as network-attached storage is accessed through the network file system; this means faster data management and system administration without having to move any data. 

Veteran data solution vendors such as Oracle are innovating as well, developing platforms that make Hadoop easier to use and to incorporate into existing data infrastructures. Its latest updates revolved around allowing users to store and analyze structured and unstructured data together and giving users a set of tools to visualize data and find data patterns or problems.

RapidMiner's approach to Hadoop has been to simplify it, eliminate the need for end users to code, and do for Hadoop analytics what Wordpress did for Web site building. Once usable insights are collected, RapidMiner can connect the data platform to a marketing automation system or other digital experience management system to deploy campaigns or make changes based on data predictions.

Moving forward, analysts predict that leveraging Hadoop's potential will become a more attainable goal for companies. Because it's open-source, the possibilities are vast. Hadoop's ability to connect openly to other systems and solutions will increase adoption in the coming months and years.

0 Comments

Hadoop and Big Data

7/22/2014

1 Comment

 
Picture
During last ten years the volume and diversity of digital information grew at unprecedented rates. Amount of information is doubling every 18 months, and unstructured information volumes grow six times faster than structured.

Big data is the nowadays trend. Big data has been defined as data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.

Hadoop was created in 2005 by Doug Cutting and Mike Cafarella to address the big data issue. Doug Cutting named it after his son's toy elephant It was originally developed for the Nutch search engine project. Nutch is an effort to build an open source web search engine based on Lucene and Java for the search and index component.

As of 2013, Hadoop adoption is widespread. A number of companies offer commercial implementations or support for Hadoop. For example, more than half of the Fortune 50 use Hadoop. Hadoop is designed to process terabytes and even petabytes of unstructured and structured data. It breaks large workloads into smaller data blocks that are distributed across a cluster of commodity hardware for faster processing.

Ventana Research, a benchmark research and advisory services firm published the results of its groundbreaking survey on enterprise adoption of Hadoop to manage big data. According to this survey:
  • More than one-half (54%) of organizations surveyed are using or considering Hadoop for large-scale data processing needs.
  • More than twice as many Hadoop users report being able to create new products and services and enjoy costs savings beyond those using other platforms; over 82% benefit from faster analysis and better utilization of computing resources.
  • 87% of Hadoop users are performing or planning new types of analysis with large scale data.
  • 94% of Hadoop users perform analytics on large volumes of data not possible before; 88% analyze data in greater detail; while 82% can now retain more of their data.
  • 63% of organizations use Hadoop in particular to work with unstructured data such as logs and event data.
  • More than two-thirds of Hadoop users perform advanced analysis such as data mining or algorithm development and testing.
Today, Hadoop is being used as a:
  • Staging layer: the most common use of Hadoop in enterprise environments is as “Hadoop ETL” — pre-processing, filtering, and transforming vast quantities of semi-structured and unstructured data for loading into a data warehouse.
  • Event analytics layer: large-scale log processing of event data: call records, behavioral analysis, social network analysis, clickstream data, etc.
  • Content analytics layer: next-best action, customer experience optimization, social media analytics. MapReduce provides the abstraction layer for integrating content analytics with more traditional forms of advanced analysis.
  • Most existing vendors in the data warehousing space have announced integrations between their products and Hadoop/MapReduce.
Hadoop is particularly useful when:
  • Complex information processing is needed.
  • Unstructured data needs to be turned into structured data.
  • Queries can’t be reasonably expressed using SQL.
  • Heavily recursive algorithms.
  • Complex but parallelizable algorithms needed, such as geo-spatial analysis or genome sequencing.
  • Machine learning.
  • Data sets are too large to fit into database RAM, discs, or require too many cores (10’s of TB up to PB).
  • Data value does not justify expense of constant real-time availability, such as archives or special interest information, which can be moved to Hadoop and remain available at lower cost.
  • Results are not needed in real time.
  • Fault tolerance is critical.
  • Significant custom coding would be required to handle job scheduling.

Does Hadoop and Big Data Solve All Our Data Problems?

Hadoop provides a new, complementary approach to traditional data warehousing that helps deliver on some of the most difficult challenges of enterprise data warehouses. Of course, it’s not a panacea, but by making it easier to gather and analyze data, it may help move the spotlight away from the technology towards the more important limitations on today’s business intelligence efforts: information culture and the limited ability of many people to actually use information to make the right decisions.

1 Comment

    Archives

    April 2022
    March 2022
    January 2022
    July 2021
    May 2021
    April 2021
    March 2021
    February 2021
    January 2021
    December 2020
    July 2020
    April 2020
    March 2020
    December 2019
    November 2019
    September 2019
    August 2019
    July 2019
    May 2019
    March 2019
    February 2019
    January 2019
    December 2018
    October 2018
    July 2018
    June 2018
    May 2018
    March 2018
    February 2018
    January 2018
    December 2017
    September 2017
    July 2017
    June 2017
    May 2017
    April 2017
    January 2017
    December 2016
    November 2016
    September 2016
    July 2016
    June 2016
    May 2016
    April 2016
    March 2016
    February 2016
    January 2016
    December 2015
    November 2015
    October 2015
    September 2015
    July 2015
    June 2015
    May 2015
    April 2015
    March 2015
    February 2015
    January 2015
    December 2014
    November 2014
    October 2014
    September 2014
    August 2014
    July 2014
    June 2014
    May 2014
    April 2014
    March 2014
    February 2014
    January 2014
    December 2013
    November 2013
    October 2013
    September 2013
    July 2013
    June 2013
    May 2013
    April 2013
    March 2013
    February 2013
    January 2013
    December 2012
    November 2012
    October 2012
    September 2012
    August 2012
    July 2012
    June 2012
    May 2012
    April 2012
    March 2012
    February 2012
    January 2012
    December 2011
    November 2011

    Categories

    All
    Alfresco
    Arena
    Automatic Classification
    Autonomy
    Big Data
    Business Analysis
    Case Studies
    Change Control
    Change Management
    Cloud Content Management
    Cloud Ecm
    Cloud Enterprise Content Management
    Cms
    Collaboration
    Compliance
    Concept Searching
    Confluence
    Content Analysis
    Content Localization
    Content Management
    Content Management Systems
    Content Strategy
    Controlled Vocabulary
    Coveo
    Crisis Management
    Dams
    Data Integrity
    Data Security
    Digital Asset Management
    Digital Asset Management System
    Digital Transformation
    Dita
    Document Control
    Document Control Systems
    Documents Management
    Documentum
    Drupal
    Dublin Core Metadata
    Ecm
    E Discovery
    Engineering Change Process
    Enterprise Content Management
    Enterprise Search
    ERoom
    E-Signature
    Exalead
    Fatwire
    Gamification
    Gmp
    Gxp
    Hadoop
    Information Architecture
    Information Governance
    Information Overload
    Information Technology
    Iso 9001
    IT Systems Validation
    Joomla
    Knowledge Management
    Knowledge Management Applications
    Metadata
    Mobile Devices
    Naming Conventions
    Ontology
    Open Source Cms
    Open Text
    Oracle
    OWL
    Personalization
    RDF
    Records Management
    Risk
    Search Applications
    Self Service
    SEO
    Sharepoint
    Social Media
    Structured Content
    Taxonomy
    Teamsite
    Thesaurus
    Tridion
    Twiki
    Unified Data
    Usability
    User Adoption
    User Centered Design
    Vasont
    Vivisimo
    Web Site Content
    Web Site Design
    Wiki

    RSS Feed

Powered by Create your own unique website with customizable templates.