@zoltanctoth
zoltanctoth / pyspark-udf.py
Last active July 15, 2023 13:23
Writing a UDF for withColumn in PySpark
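from pyspark.sql.types import StringType
from pyspark.sql.functions import udf

# Assumes an active SparkSession named `spark` (as in the pyspark shell).
# The UDF labels each row "adult" or "child" and returns a string column.
maturity_udf = udf(lambda age: "adult" if age >= 18 else "child", StringType())
df = spark.createDataFrame([{'name': 'Alice', 'age': 1}])
# withColumn returns a new DataFrame, so reassign it before calling show().
df = df.withColumn("maturity", maturity_udf(df.age))
df.show()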
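The lambda above raises a TypeError if a row's age is null, because Spark passes nulls to Python UDFs as None and `None >= 18` is not a valid comparison in Python 3. A minimal null-safe sketch of the same logic follows; the names `maturity` and `ADULT_AGE` are hypothetical and not part of the original gist.

```python
from typing import Optional

ADULT_AGE = 18  # hypothetical constant; the threshold is taken from the gist's lambda


def maturity(age: Optional[int]) -> Optional[str]:
    """Return "adult" or "child", or None when the age is missing."""
    if age is None:
        # Propagate the null instead of failing on the comparison.
        return None
    return "adult" if age >= ADULT_AGE else "child"


# With PySpark available, this could be wrapped like the lambda above:
# maturity_udf = udf(maturity, StringType())
```

Keeping the logic in a plain function also lets it be checked without a Spark session.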
@sean-m
sean-m / log-reader-config.ps1
Created May 13, 2014 19:25
Read Windows event logs with the Get-WinEvent cmdlet and serialize them to JSON for shipping with logstash-forwarder. Much lighter weight than using Logstash with the eventlog codec: runtime memory usage is ~30 MB vs. ~320 MB. Note: must be run with elevated privileges to read Security event logs.
<#
Configuration for log-reader.ps1
This file is only read if log-reader.ps1 is invoked without arguments
from the command line. This file will configure any tunable variables
in log-reader.ps1. Thus far, the only setting is which logs to track.
#>
[string[]]$logname = @("Application", "System", "Security")