#!/bin/sh
set -x
# Create the input file based on size (you can get the size pattern by running fdisk -l as root).
# Be sure to exclude the root disk if it is part of your config. You must edit this file to do so.
size=$1
shift
fdisk -l | grep "$size" | awk '{print $2}' | sed -e 's/:$//' > foo
B5b. Configure Oozie SSH action
Sometimes you may need to execute jobs on a specific node instead of on any cluster node.
For this, the oozie service user must be able to connect to the node of choice as your workflow user.
# The following documentation details configuring an application ID to execute an SSH action
# In the illustration:
# edge node=cdh-en01
# oozie server=cdh-mn01
# application ID=akhanolk
package com.khanolkar.bda.util
/**
 * @author Anagha Khanolkar
 */
import org.apache.spark.sql.SparkSession
import org.apache.hadoop.fs.{ FileSystem, Path }
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql._
import com.databricks.spark.avro._
spark-submit --class com.khanolkar.bda.util.CompactRawLogs \
............
MyJar-1.0.jar \
"/user/akhanolk/data/raw/streaming/to-be-compacted/" \
"/user/akhanolk/data/raw/compacted/" \
"2" "128" "oozie-124"
using System;
using System.Text;
using Microsoft.ServiceBus.Messaging;
using System.Net;
using System.IO;
namespace StreamingAnalyticsEventPublisher
{
    class MeetupRSVPEventSender
    {
Kerberos
Kerberos is a network authentication protocol. It is designed to provide strong authentication for client/server applications by using secret-key cryptography.
Kerberos Principals
A user in Kerberos is called a principal, which is made up of three distinct components: the primary, the instance, and the realm.
A Kerberos principal represents a unique identity in a Kerberos-secured system.
The first component of the principal is called the primary, or sometimes the user component.
The primary is an arbitrary string and may be the operating system username of the user or the name of a service.
The primary is followed by an optional section called the instance, which is used, for example, to create principals for users in special roles or to define the host on which a service runs.
An instance, if it exists, is separated from the primary by a slash, and its content is used to disambiguate multiple principals for a single user or service.
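The component layout described above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: it assumes the usual written form primary[/instance][@REALM], where the realm follows an "@" sign, and it ignores escaped separator characters. The names Principal and parse_principal are hypothetical.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    """Hypothetical container for the three principal components."""
    primary: str
    instance: Optional[str]
    realm: Optional[str]


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance][@REALM]' into its components.

    Assumption: no escaped '/' or '@' characters appear in the name.
    """
    # The realm, when present, follows the last '@'.
    head, at, realm = name.rpartition("@")
    if not at:
        head, realm = name, None
    # The instance, when present, follows the first '/'.
    primary, slash, instance = head.partition("/")
    if not primary:
        raise ValueError("principal has no primary component: %r" % name)
    return Principal(primary, instance if slash else None, realm)
```

For instance, parse_principal("hdfs/nn01.example.com@EXAMPLE.COM") separates a service name, the host it runs on, and the realm; the host and realm here are made-up values.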
# The following documentation details configuring an application ID to execute an SSH action
# In the illustration:
# edge node=cdh-sn03
# oozie server=cdh-mn01
# application ID=akhanolk
# ==========================================
# 1. On edge node, as application ID
The sample programs for Cascading (2.5.1) with Accumulo (1.5.0) are on GitHub:
https://github.com/airawat/cascading.accumulo.examples
The source code for the extensions is at:
https://github.com/airawat/cascading.accumulo
About this gist:
================
This gist is part of a series of log parsers in Java MapReduce, Pig, Hive, Python...
This one covers a log parser in Cascading.
It reads syslogs in HDFS and:
a) Parses them based on a regex pattern and writes the parsed files to HDFS
b) Writes records that don't match the pattern to HDFS
c) Writes a report to HDFS that contains the count of distinct processes logged.
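The three outputs listed above can be sketched locally in plain Python, without Cascading or HDFS. This is a rough illustration under stated assumptions, not the gist's implementation: the regex assumes the traditional "Mon DD HH:MM:SS host process[pid]: message" syslog layout, which may differ from the pattern the gist uses, the report is interpreted as a per-process event count, and the names SYSLOG_RE and parse_syslog are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: "Mon DD HH:MM:SS host process[pid]: message".
# The actual pattern used in the gist is not shown, so this is a stand-in.
SYSLOG_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<process>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s*"
    r"(?P<message>.*)$"
)


def parse_syslog(lines):
    """Return (parsed records, rejected lines, events per process)."""
    parsed, rejected, counts = [], [], Counter()
    for line in lines:
        line = line.rstrip("\n")
        match = SYSLOG_RE.match(line)
        if match is None:
            # b) keep records that do not match the pattern
            rejected.append(line)
            continue
        record = match.groupdict()
        # a) keep the parsed fields
        parsed.append(record)
        # c) tally events per process for the report
        counts[record["process"]] += 1
    return parsed, rejected, counts
```

In a real job each of the three return values would be written to its own output location; here they are simply returned so the split is easy to inspect.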
Other gists/blogs:
......
List<String> artifactList = new List<String>();
var scanOpts = new ScanOptions();
String rowRegex = rowID + ".*";
IteratorSetting iterSttng = new IteratorSetting();
iterSttng.Priority = 15;
iterSttng.Name = "rowIDRegexFilter";
iterSttng.IteratorClass = "org.apache.accumulo.core.iterators.user.RegExFilter";