Hui Zheng hui-zheng

9 followers · 12 following

Semios
Vancouver
http://hui-zheng.github.io/

hui-zheng / progressive-vs-conservative.md

Last active May 15, 2025 21:44

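A brief introduction to, and comparison of, the underlying logic of progressivism and conservatism

Progressivism vs. Conservatism

The topic is vast, so omissions and errors are inevitable; this is offered as a starting point to prompt further thought.
presentation deck

Two big trees on the political spectrum

The two mainstream currents of political thought in Western society and civilization today
On the left: Progressivism; on the right: Conservatism
Politics is downstream of culture and belief (though progressives may not see it that way)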

hui-zheng / cat_with_filenames.sh

Last active May 27, 2020 17:01

bash script recipes for all

	# cat multiple files and prefix every line with its filename
	# (quote "$@" so filenames containing spaces are passed through intact)
	grep ^ /dev/null "$@"
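
To illustrate why the one-liner passes `/dev/null` to `grep`, here is a minimal sketch. The wrapper name `cat_with_filenames` and the demo files are assumptions made for illustration; the gist itself shows only the `grep` line.

```shell
# Hypothetical wrapper name; only the grep one-liner comes from the gist.
cat_with_filenames() {
  # "^" matches every line, including empty ones.
  # /dev/null is an extra, empty file argument: grep then always sees at
  # least two files, so it prefixes each line with its filename even when
  # only one real file is given.
  grep ^ /dev/null "$@"
}

# Demo with two throwaway files in a temporary directory.
demo_dir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$demo_dir/a.txt"
printf 'gamma\n' > "$demo_dir/b.txt"
(cd "$demo_dir" && cat_with_filenames a.txt b.txt)
```

Each output line has the form `filename:content`, which makes it easy to tell which file a line came from when several small files are concatenated.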

hui-zheng / find_and_display_duplicates

Created March 14, 2020 19:49

[BigQuery Advanced SQL] the most flexible script to detect and display duplicate records and remove duplicates (dedup)

	-- base_time has t
	WITH rows_by_key AS (
	  SELECT
	    surrogate_key,
	    ARRAY_AGG(base_table) AS _rows,
	    COUNT(*) AS _count
	  FROM `gcp_project.data_set.original_table` AS base_table
	  WHERE stamp BETWEEN "2020-03-12T00:00:00" AND "2020-03-14T00:00:00"
	  GROUP BY surrogate_key
	)

hui-zheng / clean_kubernetes_jobs.sh

Last active March 5, 2020 16:39

kubernetes DevOps Operation Script

	BY_LIST_FILE="NONE"
	COMPLETED="1/."
	AGE="3"
	NAME_PATTERN=".*"


	while [[ $# -gt 0 ]]; do
	key="$1"

	case $key in
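
The preview above stops at the opening of the `case` statement. As a rough sketch of how a `while`/`case` loop of this shape is commonly completed, the example below parses two options into the default variables. The function name `parse_args` and the flags `--age` and `--name-pattern` are hypothetical and are not taken from the gist; the sketch also uses POSIX `[ ... ]` rather than `[[ ... ]]` so that it runs under plain `sh`.

```shell
# Defaults copied from the preview above.
BY_LIST_FILE="NONE"
COMPLETED="1/."
AGE="3"
NAME_PATTERN=".*"

# parse_args and both flag names are hypothetical, for illustration only.
parse_args() {
  while [ $# -gt 0 ]; do
    key="$1"
    case $key in
      --age)
        # Refuse a flag with no value instead of shifting past the end.
        if [ $# -lt 2 ]; then echo "missing value for $key" >&2; return 1; fi
        AGE="$2"
        shift 2
        ;;
      --name-pattern)
        if [ $# -lt 2 ]; then echo "missing value for $key" >&2; return 1; fi
        NAME_PATTERN="$2"
        shift 2
        ;;
      *)
        echo "unknown option: $key" >&2
        return 1
        ;;
    esac
  done
}

parse_args --age 7 --name-pattern 'etl-.*'
echo "AGE=$AGE NAME_PATTERN=$NAME_PATTERN"
```

Options that are not passed keep their defaults, so `COMPLETED` and `BY_LIST_FILE` are unchanged after the call above.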

hui-zheng / SQL non-null greatest for multiple columns

Last active February 7, 2024 04:01

[BigQuery Advanced SQL] find greatest/largest/max non-null values among multiple columns

	-- Below is a fancier version of non-null-greatest() for multiple columns.
	-- It extends more easily to more than two columns.
	WITH base AS (
	SELECT
	(SELECT ARRAY_AGG (x IGNORE NULLS) AS Y FROM UNNEST ([col_1, col_2, col_3, col_4]) AS x)
	AS array,
	FROM source_table AS nl
	)
	SELECT
	(SELECT MAX(y) FROM UNNEST(array) AS Y

hui-zheng / BQ_partition_dedup.sql

Last active December 13, 2024 00:57

This list provides BigQuery SQL templates that remove duplicates from large timestamp-partitioned tables (using a MERGE statement) and from small or non-partitioned tables (using a REPLACE TABLE statement).

	-- WARNING: back up the table before running this operation
	-- For a large timestamp-partitioned table
	-- -------------------------------------------
	-- -- De-duplicate the rows within a given range of a partitioned table, using surrogate_key as the unique id
	-- -------------------------------------------

	DECLARE dt_start DEFAULT TIMESTAMP("2019-09-17T00:00:00", "America/Los_Angeles") ;
	DECLARE dt_end DEFAULT TIMESTAMP("2019-09-22T00:00:00", "America/Los_Angeles");

	MERGE INTO `gcp_project`.`data_set`.`the_table` AS INTERNAL_DEST