- Serving Static Content (HTML, CSS, JavaScript)
- Reverse Proxy (load balancing): forwards requests from clients to the backend servers that handle them.
- API Gateway: acts as a single entry point for all API requests and distributes traffic across services.
- SSL/TLS Termination: decrypts incoming traffic, inspects it, and re-encrypts it before sending it to the backend servers.
- Caching: NGINX can also be used as an HTTP cache, which can improve website performance by caching frequently requested content and serving it directly from memory.
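The roles above can be sketched in a single NGINX server block. This is a minimal illustration, not a production config: the upstream addresses, certificate paths, and cache location are all placeholders.

```nginx
# Placeholder backend pool -- reverse proxy / load balancing targets.
upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

# Placeholder on-disk cache for the HTTP caching role.
proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m;

server {
    listen 443 ssl;                           # SSL/TLS termination happens here
    ssl_certificate     /etc/nginx/certs/site.crt;   # placeholder cert paths
    ssl_certificate_key /etc/nginx/certs/site.key;

    location /static/ {
        root /var/www;                        # serving static content
    }

    location /api/ {
        proxy_pass http://backend;            # single entry point for API requests
        proxy_cache app_cache;                # cache frequently requested responses
    }
}
```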
""" | |
to Convert json data to Rows and Columns | |
[{"data":[["2","2xg","2Q"],["1","3xg","3Q"]],"schema":[{"columnName":"CASE_UID","ordinal":0,"dataTypeName":"varchar"},{"columnName":"QUOTE_ID","ordinal":1,"dataTypeName":"varchar"},{"columnName":"OPP_NO","ordinal":2,"dataTypeName":"varchar"}]}] | |
""" | |
from pyspark.sql.functions import * | |
from pyspark.sql.types import * |
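The transformation the docstring describes can be sketched without Spark at all, using only the stdlib `json` module: pull the column names out of `schema` (ordered by `ordinal`) and zip them against each value list in `data`. In Spark you would express the same idea with a `StructType` schema, but this pure-Python version shows the shape of the data:

```python
import json

# The sample payload from the note above: "data" holds row values,
# "schema" describes each column's name and ordinal position.
payload = ('[{"data":[["2","2xg","2Q"],["1","3xg","3Q"]],'
           '"schema":[{"columnName":"CASE_UID","ordinal":0,"dataTypeName":"varchar"},'
           '{"columnName":"QUOTE_ID","ordinal":1,"dataTypeName":"varchar"},'
           '{"columnName":"OPP_NO","ordinal":2,"dataTypeName":"varchar"}]}]')

doc = json.loads(payload)[0]

# Order column names by their declared ordinal, then zip each value list into a row dict.
columns = [c["columnName"] for c in sorted(doc["schema"], key=lambda c: c["ordinal"])]
rows = [dict(zip(columns, values)) for values in doc["data"]]

print(columns)  # ['CASE_UID', 'QUOTE_ID', 'OPP_NO']
print(rows[0])  # {'CASE_UID': '2', 'QUOTE_ID': '2xg', 'OPP_NO': '2Q'}
```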
Note: For Mongo Cloud
- Set Password: Select Cluster > Security > Select User > Edit
- Add IP: Select Cluster > Security > Network Access > Add IP Address
ref:
SQL vs MongoDB
- Database = Database
- Table = Collection
- Row = Document
- Column = Field
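The terminology mapping is easiest to see with the same record written both ways. A small sketch (the column names and values are made up for illustration):

```python
# SQL side: columns are fixed by the table schema, values live in a row.
sql_columns = ("CASE_UID", "QUOTE_ID", "OPP_NO")   # defined once, in CREATE TABLE
sql_row = ("2", "2xg", "2Q")                       # one row in the table

# MongoDB side: field names travel inside each document of a collection.
mongo_document = {
    "CASE_UID": "2",
    "QUOTE_ID": "2xg",
    "OPP_NO": "2Q",
}

# Table ~ Collection, Row ~ Document, Column ~ Field:
assert dict(zip(sql_columns, sql_row)) == mongo_document
```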
Pre-Req:
- Install Python 3.9
- Find the location of python ($ which python) and keep it handy
- pip3 install ipython  # optional
pip3 install pyspark
- Download the Apache Spark zip > unzip it to a path
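After unzipping, Spark's command-line tools (`spark-submit`, `pyspark`, etc.) are usually made reachable by exporting `SPARK_HOME` and putting its `bin` directory on the `PATH`. A sketch, assuming a hypothetical unzip location and Spark version:

```shell
# Hypothetical unzip location -- adjust the version and path to your download.
export SPARK_HOME="$HOME/spark-3.4.0-bin-hadoop3"
export PATH="$SPARK_HOME/bin:$PATH"
echo "$SPARK_HOME"
```

Adding these two lines to `~/.bashrc` (or your shell's profile) makes the setting permanent.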
- input data (features or predictors)
- Example - student's age, gender, previous grades, etc.
- numpy array or pandas DataFrame
- denoted by the variable X.
- target data (response or labels)
- Example - student's final grade or pass/fail status.
- numpy array or pandas DataFrame
- denoted by the variable Y.
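The X/y split above can be sketched with a few made-up student records. Plain lists are used here to keep the example dependency-free; in practice X and y would typically be numpy arrays or pandas DataFrames, as the notes say:

```python
# Hypothetical student data -- all values are invented for illustration.
X = [                    # input data: one row of features per student
    [17, 0, 72.5],       # [age, gender (0/1), previous grade]
    [18, 1, 88.0],
    [16, 0, 64.0],
]
y = [0, 1, 0]            # target data: pass (1) / fail (0) for each student

# Every sample in X must have exactly one label in y.
assert len(X) == len(y)
```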
- Calendar Versioning (CalVer)
  - Ubuntu 16.04 = Ubuntu April 2016 (YY.MM)
  - PyCharm 2022.3.2 (YYYY.MAJOR.MINOR)
- Semantic Versioning (SemVer): MAJOR.MINOR.PATCH
from pyspark.sql.window import Window
"""
Window function types:
- Aggregate: min, max, avg, count, and sum
- Ranking: rank, dense_rank, percent_rank, row_number, and ntile
- Analytical: cume_dist, lag, and lead
- Custom boundary: rangeBetween and rowsBetween
"""
Issues in Spark:
- Cannot update/change data in place
- No schema enforcement
- No delta (incremental) load
- Data can be lost or corrupted during an overwrite
Advantages of Delta Lake