Describe a complex data modeling challenge you faced in a previous project. How did you approach the problem, and what factors influenced your decisions regarding the data model?
One complex data modeling challenge I faced was designing a data model for a multinational e-commerce company that wanted to analyze customer behavior across multiple countries.
The complexity came from the need to handle diverse data sources, different types of data (structured, semi-structured, and unstructured), and multi-language data. The data model also had to scale with the growing volume of data as the company grew.
First, I worked to understand the business requirements and the type of analysis the company wanted to perform. Second, I explored the available data sources to understand the data's structure, quality, and content. Third, based on the requirements and the nature of the data, as a data engineer, I chose a hy
docker compose up run
$ docker build -t <image-name> .
FROM python:latest
# Use an absolute working directory so it matches the COPY target below
WORKDIR /app
COPY . /app
RUN python3 -m pip install -r requirements.txt
psycopg2==2.9.3
pytest==7.1.2
version: "3.9"
services:
  postgres:
    image: postgres:10.5
    restart: always
    environment:
      - POSTGRES_USER=<username>
      - POSTGRES_PASSWORD=<password>
    ports:
      - '5433:5432'
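
The compose file above publishes the container's port 5432 on host port 5433. As a minimal sketch of how a Python client on the host might pick up matching connection settings, the helper below builds keyword arguments for `psycopg2.connect()`. The function names `postgres_conn_params` and `connect`, and the `POSTGRES_HOST`, `POSTGRES_PORT`, and `POSTGRES_DB` variables, are assumptions for illustration and do not appear in the original files.

```python
import os


def postgres_conn_params(env=None):
    """Build keyword arguments for psycopg2.connect() from environment variables.

    Hypothetical helper: POSTGRES_USER and POSTGRES_PASSWORD mirror the compose
    file; POSTGRES_HOST, POSTGRES_PORT and POSTGRES_DB are assumed extras.
    """
    env = os.environ if env is None else env
    user = env["POSTGRES_USER"]
    return {
        # The compose file maps host port 5433 to container port 5432.
        "host": env.get("POSTGRES_HOST", "localhost"),
        "port": int(env.get("POSTGRES_PORT", "5433")),
        "user": user,
        "password": env["POSTGRES_PASSWORD"],
        # The official postgres image names the default database after the user
        # when POSTGRES_DB is not set.
        "dbname": env.get("POSTGRES_DB", user),
    }


def connect(env=None):
    """Open a connection; psycopg2 is imported lazily so the helper above
    can be used and tested without a database driver installed."""
    import psycopg2

    return psycopg2.connect(**postgres_conn_params(env))
```

From another container on the same compose network, the host would instead be the service name `postgres` and the port 5432, since the `5433:5432` mapping only applies to connections made from the host machine.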
## Key Docker Commands
docker build --pull --no-cache --tag=<image-name> .
docker-compose up
docker-compose down
# Transformation
df = df.withColumn("latitude", col("latitude").cast(DoubleType()))\
       .withColumn("longitude", col("longitude").cast(DoubleType()))
df.printSchema()
df.show(10)
# More Insight
from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType
from pyspark.sql.functions import col

jar_path = 'rds_jar_driver.jar'
spark = SparkSession \
    .builder \
    .appName("AWS REDSHIFT PYSPARK APP") \
    .config("spark.jars", jar_path) \
    .config('spark.driver.extraClassPath', jar_path) \