Blog Jan14

Marvel in Elasticsearch
January 29, 2014
Marvel is a plugin for Elasticsearch that hooks into the heart of Elasticsearch clusters and immediately starts shipping statistics and change events. By default, these events are stored in the very same Elasticsearch cluster. However, you can send them to any other Elasticsearch cluster of your choice.
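The collect-and-ship idea can be illustrated with a short Python sketch. It is not how Marvel is implemented, since Marvel runs as a plugin inside each node. The sketch only polls the standard `_cluster/health` REST endpoint of one cluster and indexes the result, with a timestamp, into a possibly different cluster. The host URLs, the `cluster-stats-YYYY.MM.DD` daily index name, and the chosen fields are assumptions made for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone


def fetch_json(url: str) -> dict:
    """GET a JSON response from an Elasticsearch REST endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def build_event(health: dict, now: datetime) -> tuple[str, dict]:
    """Turn a _cluster/health response into (daily index name, document).

    The "cluster-stats-" prefix is a hypothetical naming scheme.
    """
    index = "cluster-stats-" + now.strftime("%Y.%m.%d")
    doc = {
        "@timestamp": now.isoformat(),
        "cluster_name": health.get("cluster_name"),
        "status": health.get("status"),  # green / yellow / red
        "number_of_nodes": health.get("number_of_nodes"),
        "unassigned_shards": health.get("unassigned_shards"),
    }
    return index, doc


def ship_event(target: str, index: str, doc: dict) -> None:
    """Index one document into the target cluster.

    The /<index>/_doc path matches recent Elasticsearch releases; older
    releases expected a mapping type name in place of _doc.
    """
    req = urllib.request.Request(
        f"{target}/{index}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5):
        pass


def collect_once(source: str, target: str) -> None:
    """Poll the source cluster once and ship the result to the target cluster."""
    health = fetch_json(f"{source}/_cluster/health")
    index, doc = build_event(health, datetime.now(timezone.utc))
    ship_event(target, index, doc)
```

Calling `collect_once("http://localhost:9200", "http://localhost:9200")` would store the event in the monitored cluster itself, mirroring Marvel's default, while passing a different target URL sends it to a separate monitoring cluster. A real collector would run on a schedule and gather far more statistics than this.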

Once the data is extracted and stored, the second part of Marvel comes into play: a set of dedicated dashboards built both to give a clear overview of cluster status and to supply the tools needed to dive deep into the darkest corners of Elasticsearch.

Why Did the Pig Cross the Join? (Part 4)
January 29, 2014
Extending Apache Accumulo Support in Hadoop with Hortonworks HDP and Sqrrl
January 30, 2014
Apache Accumulo is gaining momentum in markets such as government, financial services, and health care for its enhanced security and performance. Hortonworks has a long history with this technology and has multiple Accumulo committers on staff, at least one of whom literally helped write the book on Accumulo. This has enabled Hortonworks to provide enterprise support for Accumulo within the Hortonworks Data Platform for some time now. For those interested, more specifics can be found in our support datasheet.

Since many users have very advanced requirements when working with Accumulo, we often work closely with Sqrrl, which has built extensions to Accumulo that add enterprise-grade functionality with wide appeal. Here’s what Ely Kahn, VP of Business Development at Sqrrl, has to say.


How-to: Create a Simple Hadoop Cluster with VirtualBox
January 28, 2014
I wanted to get familiar with the big data world and decided to test Hadoop. Initially, I tried Cloudera’s prebuilt virtual machine, the Cloudera QuickStart VM, which comes with the full Apache Hadoop suite preconfigured. It was a really interesting and informative experience. The QuickStart VM is fully functional, and you can test many Hadoop services with it, even though it runs as a single-node cluster.

I wondered what it would take to install a small four-node cluster…

I did some research and found this excellent YouTube video presenting a step-by-step explanation of how to set up a cluster with VMware and Cloudera. I adapted that tutorial to use VirtualBox instead, and this article describes the steps I used.
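One common way to start a small multi-node VirtualBox setup is to clone a single prepared VM and give every node a name the others can resolve. The Python sketch below only prints such a plan: one `VBoxManage clonevm` command per node and matching `/etc/hosts` lines. The base VM name `hadoop-base`, the node names `hadoop1` to `hadoop4`, and the addresses starting at 192.168.56.101 (inside VirtualBox's default host-only network) are hypothetical and are not taken from the article.

```python
import ipaddress
import shlex

# Hypothetical values; adjust to your own base VM and host-only network.
BASE_VM = "hadoop-base"
NODE_COUNT = 4
FIRST_IP = ipaddress.IPv4Address("192.168.56.101")


def node_names(count: int) -> list[str]:
    """hadoop1, hadoop2, ... for the requested number of nodes."""
    return [f"hadoop{i}" for i in range(1, count + 1)]


def clone_commands(base: str, names: list[str]) -> list[list[str]]:
    """One `VBoxManage clonevm` call per node; --register adds each clone to VirtualBox."""
    return [["VBoxManage", "clonevm", base, "--name", name, "--register"] for name in names]


def hosts_entries(names: list[str], first_ip: ipaddress.IPv4Address) -> list[str]:
    """/etc/hosts lines with consecutive addresses so nodes resolve each other by name."""
    return [f"{first_ip + offset}\t{name}" for offset, name in enumerate(names)]


if __name__ == "__main__":
    names = node_names(NODE_COUNT)
    # Dry run: print the commands instead of executing them.
    for cmd in clone_commands(BASE_VM, names):
        print(shlex.join(cmd))
    print()
    print("\n".join(hosts_entries(names, FIRST_IP)))
```

Printing rather than executing keeps the sketch safe to run. Each clone would still need its hostname and a static address configured inside the guest OS to match these entries, which the sketch does not attempt.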

How-to: Get Started Writing Impala UDFs
January 24, 2014

User-defined functions (UDFs) let you code your own application logic for processing column values during a Cloudera Impala query. For example, a UDF could perform calculations using an external math library, combine several column values into one, perform geospatial calculations, or carry out other tests and transformations that are outside the scope of the built-in SQL operators and functions.

You can use UDFs to simplify query logic when producing reports, or to transform data in flexible ways when copying from one table to another with the INSERT … SELECT syntax.
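To make the row-by-row model concrete, here is a minimal Python sketch. It is not Impala code: Impala's UDFs are written in C++ (or reused from Hive's Java UDFs), and their actual function signatures are not reproduced here. The function `full_name` and the tables `people_raw` and `people_clean` are hypothetical. The point is the contract a scalar UDF follows: one value per argument column comes in and one value goes out for each row, with NULL inputs typically handled by the function itself.

```python
from typing import Iterable, Optional

# Python stand-in for a scalar UDF: None plays the role of SQL NULL.
# This models only the row-level logic, not Impala's C++ interface.


def full_name(first: Optional[str], last: Optional[str]) -> Optional[str]:
    """Hypothetical UDF combining two column values into one; NULL in, NULL out."""
    if first is None or last is None:
        return None
    return f"{last.strip()}, {first.strip()}"


def insert_select(
    source_rows: Iterable[tuple[int, Optional[str], Optional[str]]],
) -> list[tuple[int, Optional[str]]]:
    """Rough analogue of:
        INSERT INTO people_clean SELECT id, full_name(first, last) FROM people_raw;
    The UDF is evaluated once per source row, and its results become the new rows.
    """
    return [(row_id, full_name(first, last)) for row_id, first, last in source_rows]
```

For example, `insert_select([(1, "Ada ", "Lovelace"), (2, None, "Hopper")])` yields `[(1, "Lovelace, Ada"), (2, None)]`: the transformation logic lives in one function that the query applies to every row.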

Since release 1.2.0, Impala has supported UDFs written in C++. Although existing Apache Hive UDFs written in Java are supported as well, Cloudera recommends C++ UDFs because compiled native code can yield higher performance, as illustrated in the chart below (running on a single core; see sample UDF here):

