Sudo to the hdfs user to begin generating data. Change to the home directory for the hdfs user:
- sudo -u hdfs -s
- cd /home/hdfs
Download the testbench utilities from GitHub and unzip them:
- wget https://github.com/hortonworks/hive-testbench/archive/hive14.zip
- unzip hive14.zip
create database if not exists llap
location 's3a://<your_S3_bucket>/llap.db';
drop table if exists llap.customer;
create table llap.customer
stored as orc
as select * from tpch_text_2.customer;
drop table if exists llap.lineitem;
create table llap.lineitem
Dell EMC and Hortonworks bring together industry-leading solutions for enterprise-ready open data platforms and modern data applications, helping our customers modernize, automate, and transform how they deliver IT services to their critical business applications. The resulting cost savings allow customers to fund and invest in the new technologies, methodologies, and skills needed to succeed in the emerging digital economy. Empower your organization with deeper insights and enhanced data-driven decision making by using the right infrastructure for the right data. With solutions that integrate, store, manage, and protect your data, you can rapidly deploy Big Data analytics applications or start to develop your own.
As a Select member of the Dell EMC Technology Connect Partner Program, Dell EMC is able to resell Hortonworks Data Platform (HDP™), giving customers a simple way to procure Open Enterprise Hadoop as a complementary component of their data architectures and to enable a broad range of new applications.
Azure HDInsight is an enterprise-grade cloud platform for the industry's leading open source big data technologies.
The best way to explain big data is to look at how customers are leveraging big data to be more productive on Azure HDInsight.
AccuWeather is a global technology firm that uses the Microsoft cloud to build predictive analytics into its solutions. With the Microsoft cloud and Azure HDInsight, AccuWeather has been able to scale to billions of requests a day and to petabytes of data.
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
public class WeatherDataParser {
    /**
     * Given a string of the form returned by the api call:
     * http://api.openweathermap.org/data/2.5/forecast/daily?q=94043&mode=json&units=metric&cnt=7
     * retrieve the maximum temperature for the day indicated by dayIndex
Hive is designed to enable easy data summarization and ad-hoc analysis of large volumes of data. It uses a query language called HiveQL, which is similar to SQL.
In this tutorial, we will explore the following:
- Load a data file into a Hive table
- Create a table using RCFormat
- Query tables
- Managed tables vs external tables
In this section, we will walk through the process of using Apache Zeppelin and Apache Spark to interactively analyze data on an Apache Hadoop cluster.
By the end of this tutorial, you will have learned:
- How to interact with Apache Spark from Apache Zeppelin
- How to read a text file from HDFS and create an RDD
- How to interactively analyze a data set through a rich set of Spark API operations
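The kind of analysis listed above can be previewed without a cluster. The following is a minimal plain-Python sketch, not Spark code: the sample lines and the names `SAMPLE_LINES` and `word_counts` are hypothetical, and the function only mimics the filter, flatMap, and reduce-by-key steps that an RDD pipeline would typically chain.

```python
from collections import Counter

# Hypothetical sample standing in for the lines of a text file on HDFS.
SAMPLE_LINES = [
    "spark makes big data simple",
    "zeppelin makes spark interactive",
    "",
]


def word_counts(lines):
    """Count words across lines, mimicking filter -> flatMap -> reduceByKey."""
    # filter step: drop blank lines.
    non_blank = (line for line in lines if line.strip())
    # flatMap step: split every remaining line into lower-cased words.
    words = (word.lower() for line in non_blank for word in line.split())
    # reduceByKey step: add up one count per occurrence of each word.
    return Counter(words)


if __name__ == "__main__":
    for word, count in word_counts(SAMPLE_LINES).most_common(3):
        print(word, count)
```

In Spark itself, each of these steps would be a transformation on an RDD, and the work would be spread across the cluster rather than run in a single process.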
This tutorial describes how to load data into the Hortonworks sandbox.
The Hortonworks sandbox is a fully contained Hortonworks Data Platform (HDP) environment. The sandbox includes the core Hadoop components (HDFS and MapReduce), as well as all the tools needed for data ingestion and processing. You can access and analyze sandbox data with many Business Intelligence (BI) applications.
In this tutorial, we will load and review data for a fictitious web retail store in what has become an established use case for Hadoop: deriving insights from large data sources such as web logs. By combining web logs with more traditional customer data, we can better understand our customers, and also understand how to optimize future promotions and advertising.
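To make the idea of combining web logs with customer data concrete, here is a small plain-Python sketch. The records, the field names (`customer_id`, `url`, `state`), and the function `page_views_by_state` are all hypothetical illustrations and are not taken from the tutorial's data set; on the sandbox, the same join would normally be expressed as a query over the loaded tables.

```python
from collections import defaultdict

# Hypothetical records; field names are illustrative only.
WEB_LOGS = [
    {"customer_id": 1, "url": "/products/shoes"},
    {"customer_id": 2, "url": "/products/hats"},
    {"customer_id": 1, "url": "/products/hats"},
    {"customer_id": 3, "url": "/products/shoes"},
]
CUSTOMERS = {
    1: {"name": "Ada", "state": "CA"},
    2: {"name": "Lin", "state": "NY"},
}


def page_views_by_state(logs, customers):
    """Join each log entry to its customer and count page views per state."""
    views = defaultdict(int)
    for entry in logs:
        customer = customers.get(entry["customer_id"])
        if customer is None:
            # Inner-join behavior: skip log entries with no customer record.
            continue
        views[customer["state"]] += 1
    return dict(views)


if __name__ == "__main__":
    print(page_views_by_state(WEB_LOGS, CUSTOMERS))
```

The sketch drops log entries that have no matching customer, which corresponds to an inner join; keeping them under a placeholder state would correspond to a left outer join instead.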
- Hortonworks Sandbox 2.3 (installed and running)