Round-trip Data Enrichment between Teradata and Hadoop
February 18th, 2014
Hadoop can be a great complement to existing data warehouse platforms, such as Teradata, as it naturally helps to address two key storage challenges:
Managing large volumes of historical or archival data.
Handling data from non-standard or unstructured sources.
The purpose of this article is to detail some of the key integration points and to show how data can be easily exchanged for enrichment between the two platforms.
What Do Hadoop and the Avengers Have in Common?
February 16th, 2014
Ever since I was a kid, I’ve used memorable movie quotes to help people understand a key point in a way that lightens the mood and generates some laughs. If you’re going to work hard, you gotta have fun, right?
“Don’t make me angry… you wouldn’t like me when I’m angry”
The big data market is rife with aspirational marketing misinformation, which among other things causes customer confusion, slows the path to value, and frankly, makes me a little angry.
At Hortonworks, we are maniacally focused on innovating Enterprise Hadoop in the open and enabling our customers and partners to unlock the broadest opportunities that Hadoop has to offer.
Actian, YARN and a Modern Data Architecture with Hadoop
February 14th, 2014
With the growing number of large-scale enterprise deployments of big data, certain limitations have become more apparent, bringing to light some weaknesses in this first phase of analytics infrastructures. Hadoop, clearly a very valuable tool for the collection of unstructured data, poses some challenges that need to be overcome for widespread, successful enterprise adoption.
In our upcoming webinar on Tuesday Feb 19 at 10 am PT, we will address these issues and highlight how to solve them using Hortonworks Data Platform and our partner Actian.
One of the challenges is the growing number of server clusters in the datacenter as data collection grows exponentially. This is where YARN (Yet Another Resource Negotiator) helps to address some of these challenges and essentially leapfrogs MapReduce. YARN breaks Hadoop free of MapReduce restrictions and simplifies the delivery of data services. It gives users the option to use better, more efficient compute models, and it can handle extremely large amounts of data with a dramatic increase in performance. YARN also enables multiple applications to run on the Hadoop environment.
How-to: Make Hadoop Accessible via LDAP
February 18, 2014
Hue, the open source Web UI that makes Apache Hadoop easier to use, easily integrates with your corporation’s existing identity management systems and provides authentication mechanisms for SSO providers. So, by changing a few configuration parameters, your employees can start analyzing Big Data in their own browsers under an existing security policy.
In this blog post, you’ll learn details about the various features and capabilities available in Hue for integrating with likely the most popular authentication mechanism, LDAP. (It is also possible to authenticate Hue users via PAM, SPNEGO, OpenID, OAuth, and SAML, but those topics are for another post.)
Getting MapReduce 2 Up to Speed
February 13, 2014
Thanks to the improvements described here, CDH 5 will ship with a version of MapReduce 2 that is just as fast as (or faster than) MapReduce 1.
Performance fixes are tiny, easy, and boring, once you know what the problem is. The hard work is in putting your finger on that problem: narrowing, drilling down, and measuring, measuring, measuring.
Apache Hadoop is no exception to this rule. Recently, Cloudera engineers set out to ensure that MapReduce performance in Hadoop 2 (MR2/YARN) is on par with, or better than, MapReduce performance in Hadoop 1 (MR1). Architecturally, MR2 has many performance advantages over MR1:
Better scalability by splitting the JobTracker into the ResourceManager and Application Masters.
Better cluster utilization and higher throughput through finer-grained resource scheduling.
Less tuning required to avoid over-spilling, thanks to smarter sort buffer management.
Faster completion times for small jobs through “Uber Application Masters,” which run all of a job’s tasks in a single JVM.
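To make the "finer-grained resource scheduling" point above concrete, here is a minimal toy sketch in Python. It is not Hadoop code: the function names, the node size, and the task sizes are all hypothetical, chosen only to contrast a fixed map-slot model (MR1-style) with packing tasks by requested memory (MR2/YARN-style) on a single node.

```python
# Toy model only: the function names and the numbers below are illustrative
# assumptions, not Hadoop APIs or measured values.


def concurrent_tasks_fixed_slots(map_slots: int, pending_map_tasks: int) -> int:
    """MR1-style: a node runs at most as many map tasks as it has map slots.

    Reduce slots cannot be borrowed for map work, so they sit idle during a
    map-heavy phase.
    """
    return min(map_slots, pending_map_tasks)


def concurrent_tasks_containers(
    node_memory_mb: int, task_memory_mb: int, pending_tasks: int
) -> int:
    """MR2/YARN-style: tasks are packed by the memory each one requests."""
    return min(node_memory_mb // task_memory_mb, pending_tasks)


if __name__ == "__main__":
    # Hypothetical node: 8 GB of memory, configured under the slot model
    # as 2 map slots + 2 reduce slots.
    pending = 10  # map tasks waiting to run, each needing 1 GB

    print(concurrent_tasks_fixed_slots(map_slots=2, pending_map_tasks=pending))
    print(concurrent_tasks_containers(8192, 1024, pending))
```

Under these assumed numbers, the slot model runs 2 map tasks at a time while the memory-based model runs 8, which is the kind of utilization gap the bullet describes. Real schedulers also account for CPU, queues, and locality, which this sketch ignores.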