Hadoop basic assignment: access log counter

Requirements

  • count.java
  • mapper.java
  • reducer.java

Steps

  1. Create a new Eclipse project.
  2. Name it maximum_log.
  3. The project structure should be as follows (sidhd is the package name used by the code below):
maximum_log
 -- src
    -- sidhd
      - count.java
      - mapper.java
      - reducer.java
  4. Now right-click the project -> Build Path -> Add External Archives...
  5. Navigate to Other Locations -> Computer -> usr -> local -> hadoop -> share -> hadoop -> hdfs (the HDFS folder).
  6. Select hadoop-hdfs-3.3.4.jar.
  7. Select hadoop-hdfs-client-3.3.4.jar.

MapReduce folder

  1. Select hadoop-mapreduce-client-core-3.3.4.jar.
  2. Select hadoop-mapreduce-client-common-3.3.4.jar.

Common folder

  1. Select hadoop-common-3.3.4.jar.

  2. Export the project: File -> Export -> Java -> JAR file -> select the project -> Browse -> Downloads -> Save -> Finish. This produces mycount.jar.

Running the Java code on Hadoop

  1. Start Hadoop: start-all.sh (or start-dfs.sh followed by start-yarn.sh).
  2. Create a directory on HDFS: hdfs dfs -mkdir /dir_name
  3. Put the input file from your PC onto HDFS:
  • cd to the directory containing the input file
  • hdfs dfs -put input_file_name /dir_name
  4. Open Eclipse and create a Java project.
  5. Create a package named sidhd (it must match the package declaration in the code below).
  6. Change the compiler compliance to Java 1.7: right-click the project -> Properties -> Java Compiler -> untick Java SE 17 -> choose 1.7 from the dropdown -> Apply and Close. To check the files on Hadoop, go to localhost:9870.
  7. Add the six required libraries from Other Locations/Computer/usr/local/hadoop/share/hadoop:
/hdfs/hadoop-hdfs-3.3.4.jar
/hdfs/hadoop-hdfs-client-3.3.4.jar
/common/hadoop-common-3.3.4.jar
/mapreduce/hadoop-mapreduce-client-common-3.3.4.jar
/mapreduce/hadoop-mapreduce-client-core-3.3.4.jar
/mapreduce/hadoop-mapreduce-client-jobclient-3.3.4.jar
  8. Right-click the package and create the three classes below.
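The three classes divide the work as follows: mapper emits (IP, 1) for every log line, reducer sums those ones into a per-IP request count, and count configures the job, runs it, and scans the output for the IP with the most requests.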

mapper.java


package sidhd;

import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import java.util.StringTokenizer;
import java.io.IOException;

public class mapper extends Mapper<Object, Text, Text, IntWritable> {
    @Override
    public void map(Object offset, Text line, Context con) throws IOException, InterruptedException {
        // The delimiter string " - - " is treated as a set of characters
        // (space and '-'), so the first token of an access-log line is the
        // client IP address. Emit (IP, 1) for each line.
        StringTokenizer token = new StringTokenizer(line.toString(), " - - ");
        if (token.hasMoreTokens()) {
            con.write(new Text(token.nextToken()), new IntWritable(1));
        }
    }
}
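To see why the first token is the IP address, here is a minimal, self-contained sketch; the log line is a hypothetical example in Common Log Format, not taken from the assignment's dataset:

import java.util.StringTokenizer;

public class TokenizerDemo {
    public static void main(String[] args) {
        // Hypothetical access-log line, for illustration only.
        String line = "192.168.0.7 - - [27/May/2023:03:42:00 +0000] \"GET / HTTP/1.1\" 200 512";
        // StringTokenizer treats every character of " - - " (space and '-')
        // as a delimiter, so the first token is the client IP address.
        StringTokenizer token = new StringTokenizer(line, " - - ");
        System.out.println(token.nextToken()); // prints 192.168.0.7
    }
}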

reducer.java

package sidhd;

import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import java.io.IOException;

public class reducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper: the total is the number of
        // requests made by this IP address.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
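Because reduce() simply sums counts, the same class can also serve as a combiner, pre-aggregating on the map side before the shuffle. This is an optional addition of ours, not part of the original assignment; it is one extra line in count.java's main, next to setReducerClass:

job.setCombinerClass(reducer.class); // optional: pre-aggregate (IP, 1) pairs on the map side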

count.java

package sidhd;

import java.io.IOException;
import java.net.URI;
import java.util.Scanner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class count {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {

        // Configure and run the MapReduce job: count requests per IP.
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "log count");
        job.setJarByClass(count.class);
        job.setMapperClass(mapper.class);
        job.setReducerClass(reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input file/dir on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir (must not exist yet)
        job.waitForCompletion(true);

        // Open the job's output on HDFS. Index 1 assumes the output directory
        // lists _SUCCESS first and part-r-00000 second.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000" + args[1]), conf);
        Path study = fs.listStatus(new Path(args[1]))[1].getPath();
        FSDataInputStream in = fs.open(study);

        // Scan the "IP<TAB>count" lines and remember the one with the
        // highest count, i.e. the IP that made the most requests.
        int max = 0;
        String result = null;
        Scanner sc = new Scanner(in);
        while (sc.hasNext()) {
            String obj = sc.nextLine();
            String[] arrobj = obj.trim().split("\t+");
            int n = Integer.parseInt(arrobj[1]);
            if (n > max) {
                max = n;
                result = obj;
            }
        }
        System.out.println(result);
        sc.close();
    }
}
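One caveat: fs.listStatus(new Path(args[1]))[1] assumes the output directory lists _SUCCESS first and the part file second, but listStatus does not guarantee any ordering. A hardening sketch of ours (not in the original) is to pick the part file by name; it needs one extra import, org.apache.hadoop.fs.FileStatus:

// Sketch: replaces the listStatus(...)[1] line above.
Path study = null;
for (org.apache.hadoop.fs.FileStatus st : fs.listStatus(new Path(args[1]))) {
    if (st.getPath().getName().startsWith("part-")) { // e.g. part-r-00000
        study = st.getPath();
        break;
    }
}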
  1. Run the JAR, giving the fully qualified driver class (package sidhd, class count), the input path on HDFS, and a fresh output directory, then view the result:
hdfs dfs -put data.txt /datasett
hadoop jar /home/hduser/Downloads/sid.jar sidhd.count /datasett /siddheshop
hdfs dfs -cat /siddheshop/*
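The -cat output has one IP and its request count per line, separated by a tab, and count.java prints only the line with the highest count. The addresses and counts below are hypothetical, for illustration only:

192.168.0.7	5
10.0.0.3	12
172.16.1.9	2

Here the driver would print the line 10.0.0.3	12.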