Reference: https://github.com/Thomas-George-T/Movies-Analytics-in-Spark-and-Scala

Hive query optimization techniques:
- Change the execution engine to Tez or Spark (put the Tez/Spark client jars on HADOOP_CLASSPATH).
- Partitioning: the PARTITIONED BY clause divides the table into partitions based on the partition column values.
- Bucketing: the CLUSTERED BY clause divides the table into a fixed number of buckets.
- Map-side join, bucket map-side join, sorted bucket map-side join.
- Use a suitable file format, e.g. ORC (Optimized Row Columnar).
- Indexing.
- Vectorization (most effective together with ORC).
- CBO (Cost-Based Optimizer).
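As a sketch of how partitioning, bucketing, and ORC from the list above combine in one table definition (the table and column names here are hypothetical, not from the source):

```sql
-- Hypothetical table: partitioned by dt, bucketed by id into 8 buckets, stored as ORC
CREATE TABLE sales (
  id     INT,
  name   STRING,
  amount DOUBLE
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC;

-- Vectorization and CBO are session settings rather than DDL
SET hive.vectorized.execution.enabled = true;
SET hive.cbo.enable = true;
```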
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._   // needed for toDF on an RDD (available automatically in spark-shell)

val data = sc.parallelize(Seq((101,"ram","12-01-2021",10001,120.00),(102,"sam","12-01-2021",10002,130.00),(101,"ram","12-01-2021",10003,140.00),(103,"jam","12-01-2021",10004,150.00),(101,"ram","12-01-2021",10005,130.00),(103,"jam","12-01-2021",10006,120.00),(102,"sam","12-01-2021",10007,130.00)))
val dataDF = data.toDF("id","name","date","transid","amount")
// Partition by id, newest transid first; use row_number() so the function matches the column name
val windowSpec = Window.partitionBy("id").orderBy(col("transid").desc)
val dataDF1 = dataDF.withColumn("row_number", row_number().over(windowSpec))
dataDF.printSchema
dataDF.show()
dataDF1.printSchema
dataDF1.show()
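The difference between `rank()` and `row_number()` only shows up on ties in the ordering column. A plain-Python sketch of the two window functions (the data mirrors the example above, but here we order by amount descending, where id 102 has a tie; this is an illustration of the semantics, not PySpark code):

```python
from itertools import groupby

# (id, name, transid, amount) -- same rows as the Spark example
rows = [
    (101, "ram", 10001, 120.00),
    (102, "sam", 10002, 130.00),
    (101, "ram", 10003, 140.00),
    (103, "jam", 10004, 150.00),
    (101, "ram", 10005, 130.00),
    (103, "jam", 10006, 120.00),
    (102, "sam", 10007, 130.00),
]

def window_numbers(rows):
    """Per id partition, assign (row_number, rank) ordered by amount descending."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], -r[3]))
    for _, part in groupby(ordered, key=lambda r: r[0]):
        prev_amount, rank = None, 0
        for i, row in enumerate(part, start=1):
            if row[3] != prev_amount:  # ties keep the same rank; row_number never ties
                rank, prev_amount = i, row[3]
            out.append((row, i, rank))
    return out

for row, rn, rk in window_numbers(rows):
    print(row, "row_number =", rn, "rank =", rk)
```

For id 102 the two equal amounts get row_numbers 1 and 2 but both get rank 1, which is exactly why the Scala snippet should use `row_number()` when naming the column "row_number".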
1. How to find the second highest value from a Map<String, Integer>

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

Map<String, Integer> books = new HashMap<>();
books.put("one", 1);
books.put("two", 22);
books.put("three", 333);
books.put("four", 4444);
books.put("five", 55555);
books.put("six", 666666);
// Sort the distinct values descending, skip the highest, take the next one
Optional<Integer> secondHighest = books.values().stream()
        .distinct()
        .sorted(Comparator.reverseOrder())
        .skip(1)
        .findFirst();
System.out.println(secondHighest.orElse(null)); // 55555
from pyspark.sql import SparkSession
from pyspark.sql import Window
from pyspark.sql.functions import *

spark = SparkSession \
    .builder \
    .master('local') \
    .appName('pyspark-test-run') \
    .getOrCreate()
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

spark = SparkSession \
    .builder \
    .master('local') \
    .appName('pyspark-test-run') \
    .getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .master('local') \
        .appName('pyspark-test-run') \
        .getOrCreate()
MySQL Shell 8.0.28
Copyright (c) 2016, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.
Type '\help' or '\?' for help; '\quit' to exit.
MySQL JS > \connect thanooj@localhost
Creating a session to 'thanooj@localhost'
Please provide the password for 'thanooj@localhost': ********
Hadoop 3.2.2
Start-up commands (fresh format):
--------------------------------
1. Stop DFS and YARN first (stop-dfs.sh, stop-yarn.sh).
2. Remove the namenode and datanode directories, as specified by dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml.
3. Re-create the directories.
4. Format the namenode: hdfs namenode -format
5. Then restart DFS and YARN as follows:
   start-dfs.sh
   start-yarn.sh
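The steps above can be sketched as one script. The data-directory paths here are hypothetical placeholders; substitute the actual values from your hdfs-site.xml, and note that this wipes all HDFS data:

```shell
#!/usr/bin/env bash
# Reset a local Hadoop cluster to a freshly formatted state.
# WARNING: this destroys all data stored in HDFS.
set -euo pipefail

# Hypothetical paths -- use the dfs.namenode.name.dir / dfs.datanode.data.dir
# values from your own hdfs-site.xml
NAMENODE_DIR=/tmp/hadoop/dfs/name
DATANODE_DIR=/tmp/hadoop/dfs/data

stop-yarn.sh
stop-dfs.sh

rm -rf "$NAMENODE_DIR" "$DATANODE_DIR"
mkdir -p "$NAMENODE_DIR" "$DATANODE_DIR"

hdfs namenode -format -force

start-dfs.sh
start-yarn.sh
```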