Matheus Rossi matheus-rossi

apiVersion: v2
name: datahub-prerequisites
description: A Helm chart for packages that Datahub depends on
type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
version: 0.0.14
dependencies:
  - name: elasticsearch
    version: 7.17.3
matheus-rossi / spark_tips_01.py
Created February 21, 2024 18:26
spark_tips_01
from pyspark.sql import SparkSession

spark = (
    SparkSession
    .builder
    .appName("spark_parameterized_queries")
    .getOrCreate()
)

##### Creating two test datasets #####
import sys
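# List comprehension: builds all 10,000,000 integers up front
list_comprehension = [i for i in range(10_000_000)]
# Note: sys.getsizeof counts only the list's own pointer array, not the int objects it references
print(f"List comprehension memory: {sys.getsizeof(list_comprehension) / (1024 * 1024)} MB")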
# Yield generator
def generator():
    for i in range(10_000_000):
        yield i
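The preview above stops before the generator is measured. A minimal, self-contained sketch of how the comparison could be finished is shown below. The names `N`, `count_up`, `numbers_list`, and `numbers_gen` are illustrative and not from the original gist, and `N` is reduced to 1,000,000 so the demo runs quickly.

```python
import sys

N = 1_000_000  # assumption: smaller than the gist's 10_000_000 to keep the demo fast


def count_up(n):
    """Yield the integers 0..n-1 one at a time instead of storing them all."""
    for i in range(n):
        yield i


numbers_list = [i for i in range(N)]  # eager: every element exists in memory
numbers_gen = count_up(N)             # lazy: nothing is produced yet

# sys.getsizeof reports the container only. For the list that is the array
# of pointers, not the int objects themselves.
list_mb = sys.getsizeof(numbers_list) / (1024 * 1024)
gen_bytes = sys.getsizeof(numbers_gen)

print(f"List container: {list_mb:.2f} MB")
print(f"Generator object: {gen_bytes} bytes")

# Both yield the same values. The generator is exhausted after this call.
total_from_gen = sum(numbers_gen)
total_from_list = sum(numbers_list)
```

The generator object stays small regardless of `N` because it stores only its execution state. The trade-off is that it can be iterated only once and does not support indexing or `len()`.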
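import string

import yaml


def load_yaml(file_path: str, context: dict = None):
    def string_constructor(loader, node):
        # Expand $name / ${name} placeholders in every string scalar using `context`
        t = string.Template(node.value)
        # Fall back to an empty mapping so a missing context raises KeyError, not TypeError
        value = t.substitute(context or {})
        return value

    l = yaml.SafeLoader
    l.add_constructor('tag:yaml.org,2002:str', string_constructor)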
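The preview is cut off before the file is actually loaded. The sketch below is one possible way to complete the idea, not the author's implementation. It assumes PyYAML is installed, parses a string instead of a file, and registers the constructor on a private subclass (the hypothetical `TemplateLoader`), because calling `add_constructor` on `yaml.SafeLoader` itself changes that loader for every later caller in the process. The function name `render_yaml` and the sample values are also illustrative.

```python
import string

import yaml  # PyYAML (third-party)


def render_yaml(text: str, context: dict) -> dict:
    """Parse YAML text, expanding $placeholders in string scalars from `context`."""

    class TemplateLoader(yaml.SafeLoader):
        """Private loader subclass, so yaml.SafeLoader itself stays untouched."""

    def string_constructor(loader, node):
        # Called for every scalar tagged as a string, including mapping keys
        return string.Template(node.value).substitute(context)

    TemplateLoader.add_constructor("tag:yaml.org,2002:str", string_constructor)
    return yaml.load(text, Loader=TemplateLoader)


# Hypothetical document and context, for illustration only
document = "path: s3://$bucket/raw\nretries: 3\n"
config = render_yaml(document, {"bucket": "my-bucket"})
print(config)
```

Only string scalars pass through the template step, so values such as `3` keep their native YAML type. A placeholder with no matching key in `context` raises `KeyError`; `string.Template.safe_substitute` could be used instead if unknown placeholders should be left as-is.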