Raki mdrakiburrahman

DROP TABLE IF EXISTS foo_dim;

DROP TABLE IF EXISTS bar_fact;

CREATE TABLE IF NOT EXISTS foo_dim
  (
     id    INT,
     name  STRING,
     score INT
#!/usr/bin/env bash
# Send COUNT OTLP/HTTP JSON log records to a local /v1/logs endpoint (the default count is effectively unbounded).
COUNT=${1:-10000000000}
for ((i=1; i<=COUNT; i++)); do
  # OTLP timestamps are nanoseconds since the epoch; this pads whole seconds out to nanoseconds.
  NOW=$(date +%s)000000000
  curl -sS -X POST http://127.0.0.1:4320/v1/logs \
    -H 'Content-Type: application/json' \
    --data-binary @- <<JSON
{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"vs-code"}}]},"scopeLogs":[{"scope":{"name":"my.library","version":"1.0.0"},"logRecords":[{"timeUnixNano":"${NOW}","observedTimeUnixNano":"${NOW}","severityNumber":10,"severityText":"Information","traceId":"5B8EFFF798038103D269B633813FC60C","spanId":"EEE19B7EC3C1B174","body":{"stringValue":"Coming to you live from the Visual Studio Code (${i}/${COUNT})"},"attributes":[{"key":"string.attribute","value":{"stringValue":"some string"}},{"key":"boolean.attribute","value":{"boolValue":true}},{"key":"int.attribute","value":{"intValue":"10"}},{"key":"double.attribute","value":{"doubleValue":637.704}},{"key":"iteration","value":{"intValue":"${i}"}}]}]}]}]}
JSON
done
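The heredoc above is an OTLP/HTTP JSON logs request. A minimal Python sketch that builds the same body with the standard library instead of string templating, so quoting and the per-iteration fields (`timeUnixNano`, the `iteration` attribute, the body counter) cannot drift. The function name `build_log_payload` is hypothetical, and the sketch only builds the body rather than POSTing it. As in the script, 64-bit integers (timestamps, `intValue`) are written as JSON strings, following the protobuf JSON mapping.

```python
import json
import time
from typing import Optional


def build_log_payload(i: int, count: int, now_ns: Optional[int] = None) -> dict:
    """Build one OTLP/JSON logs request body, mirroring the bash heredoc (hypothetical helper)."""
    if now_ns is None:
        # Whole-second precision padded to nanoseconds, like $(date +%s)000000000.
        now_ns = int(time.time()) * 1_000_000_000
    attributes = [
        {"key": "string.attribute", "value": {"stringValue": "some string"}},
        {"key": "boolean.attribute", "value": {"boolValue": True}},
        # int64 values are serialised as JSON strings.
        {"key": "int.attribute", "value": {"intValue": "10"}},
        {"key": "double.attribute", "value": {"doubleValue": 637.704}},
        {"key": "iteration", "value": {"intValue": str(i)}},
    ]
    record = {
        "timeUnixNano": str(now_ns),
        "observedTimeUnixNano": str(now_ns),
        "severityNumber": 10,
        "severityText": "Information",
        "traceId": "5B8EFFF798038103D269B633813FC60C",
        "spanId": "EEE19B7EC3C1B174",
        "body": {"stringValue": f"Coming to you live from the Visual Studio Code ({i}/{count})"},
        "attributes": attributes,
    }
    return {
        "resourceLogs": [{
            "resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "vs-code"}}]},
            "scopeLogs": [{
                "scope": {"name": "my.library", "version": "1.0.0"},
                "logRecords": [record],
            }],
        }]
    }


if __name__ == "__main__":
    print(json.dumps(build_log_payload(1, 10)))
```

The printed string can be sent with any HTTP client to the same `http://127.0.0.1:4320/v1/logs` endpoint the script targets.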

OPENIVM_VALIDATE failures: gold.fact_watches + gold.fact_holdings

Owner: next agent

Goal: Reproduce the two failures from the SF=10 BATCH_1=100 BATCH_2=1 BATCH_3=1 ENGINES=spark-openivm OPENIVM_VALIDATE=1 run inside TpcDiSpec in spark-ext/ivm-it, so that they reproduce with ./spark-ext/dev/dev.sh test 'testOnly org.openivm.spark.parity.TpcDiSpec' (no Docker, no bench, no ~30-minute round-trip), then fix them.

Reproduction state at handoff

  • mdrakiburrahman/openivm-spark @ 06678f2 (latest pins)
  • mdrakiburrahman/openivm @ 2b712a0a3
  • mdrakiburrahman/lpts @ 2b2ff63 (openivm-spark-glibc-2.35; see LPTS-BACKPORT-PLAN.md for the consolidation plan)
mdrakiburrahman / address.csv
Created April 15, 2026 00:26
dbt-adventureworks seed CSV data for dbt-fabricspark integration tests
addressid,addressline1,addressline2,city,stateprovinceid,postalcode,spatiallocation,rowguid,modifieddate
1,1970 Napa Ct.,,Bothell,79,98011,E6100000010CAE8BFC28BCE4474067A89189898A5EC0,9aadcb0d-36cf-483f-84d8-585c2d4ec6e9,2007-12-04 00:00:00
2,9833 Mt. Dias Blv.,,Bothell,79,98011,E6100000010CD6FA851AE6D74740BC262A0A03905EC0,32a54b9e-e034-4bfb-b573-a71cde60d8c0,2008-11-30 00:00:00
4,9539 Glenside Dr,,Bothell,79,98011,E6100000010C813A0D5F9FDE474011A5C28A7C955EC0,e5946c78-4bcc-477f-9fa1-cc09de16a880,2009-02-03 00:00:00
5,1226 Shoe St.,,Bothell,79,98011,E6100000010C61C64D8ABBD94740C460EA3FD8855EC0,fbaff937-4a97-4af0-81fd-b849900e9bb0,2008-12-19 00:00:00
6,1399 Firestone Drive,,Bothell,79,98011,E6100000010CE0B4E50458DA47402F12A5F80C975EC0,febf8191-9804-44c8-877a-33fde94f0075,2009-02-13 00:00:00
7,5672 Hale Dr.,,Bothell,79,98011,E6100000010C18E304C4ADE1474011A5C28A7C955EC0,0175a174-6c34-4d41-b3c1-4419cd6a0446,2009-12-11 00:00:00
8,6387 Scenic Avenue,,Bothell,79,98011,E6100000010C0029A5D93BDF4740E248962FD5975EC0,3715
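GitHub refuses to render a CSV as a table when a row's field count differs from the header's. A minimal Python sketch for locating such rows in a seed file like `address.csv` before loading it, using only the standard `csv` module; the function name `find_ragged_rows` is hypothetical. Parsing with `csv.reader` rather than splitting on commas keeps quoted fields that contain commas intact.

```python
import csv
import sys
from typing import Iterable, List, Tuple


def find_ragged_rows(lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (line_number, expected_fields, actual_fields) for each record whose
    field count differs from the header's. Blank lines are ignored."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return []
    expected = len(header)
    ragged = []
    for row in reader:
        if row and len(row) != expected:
            # line_num is the 1-based physical line on which this record ended.
            ragged.append((reader.line_num, expected, len(row)))
    return ragged


if __name__ == "__main__":
    with open(sys.argv[1], newline="", encoding="utf-8") as f:
        for line_no, want, got in find_ragged_rows(f):
            print(f"line {line_no}: expected {want} fields, got {got}")
```

For example, `python find_ragged_rows.py address.csv` (the script name is arbitrary) prints one line per malformed record.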
mdrakiburrahman / address.csv
Last active April 1, 2026 03:04
dbt-adventureworks seed data for feldera integration tests
addressid,addressline1,addressline2,city,stateprovinceid,postalcode,spatiallocation,rowguid,modifieddate
1,1970 Napa Ct.,,Bothell,79,98011,E6100000010CAE8BFC28BCE4474067A89189898A5EC0,9aadcb0d-36cf-483f-84d8-585c2d4ec6e9,2007-12-04 00:00:00
2,9833 Mt. Dias Blv.,,Bothell,79,98011,E6100000010CD6FA851AE6D74740BC262A0A03905EC0,32a54b9e-e034-4bfb-b573-a71cde60d8c0,2008-11-30 00:00:00
4,9539 Glenside Dr,,Bothell,79,98011,E6100000010C813A0D5F9FDE474011A5C28A7C955EC0,e5946c78-4bcc-477f-9fa1-cc09de16a880,2009-02-03 00:00:00
5,1226 Shoe St.,,Bothell,79,98011,E6100000010C61C64D8ABBD94740C460EA3FD8855EC0,fbaff937-4a97-4af0-81fd-b849900e9bb0,2008-12-19 00:00:00
6,1399 Firestone Drive,,Bothell,79,98011,E6100000010CE0B4E50458DA47402F12A5F80C975EC0,febf8191-9804-44c8-877a-33fde94f0075,2009-02-13 00:00:00
7,5672 Hale Dr.,,Bothell,79,98011,E6100000010C18E304C4ADE1474011A5C28A7C955EC0,0175a174-6c34-4d41-b3c1-4419cd6a0446,2009-12-11 00:00:00
8,6387 Scenic Avenue,,Bothell,79,98011,E6100000010C0029A5D93BDF4740E248962FD5975EC0,3715
"""Fabric Lakehouse ODBC query runner.
Loads query definitions from a YAML file, builds a WHERE clause from caller-
supplied scope (months, services, teams, severities), and executes queries
in parallel via ThreadPoolExecutor — all sharing a single ODBC connection
(``ReuseSession=true``) so only one Livy session is created.
A warm-up ``SELECT 1`` runs first to ensure the Livy session is alive.
Then all real queries fire in parallel using cursors from the same connection.
"""
"""Fuzzy title bucketing: TF-IDF clustering → Soundex rebalancing.
Groups incident titles that are semantically similar (e.g. same alert
with a different region suffix, or "errors" vs "failures" variants) into a
single bucket label.
Pipeline
--------
1. **Normalise** — strip bracketed prefixes (``[topic=…]``), quoted strings,
``Region: …`` labels, UUIDs, IPs, timestamps, and uppercase region codes.
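The pipeline description breaks off after step 1. As a rough illustration of two pieces it names, here is a minimal sketch of a regex-based normalise step and the classic American Soundex code. The regexes, the uppercase region-code list (`EUS`, `WUS`, `NEU`, `WEU`) and the function names are assumptions, not the module's actual implementation, and the TF-IDF clustering step is not shown.

```python
import re

# Hypothetical patterns for the noise listed in step 1, applied in order.
_NOISE = [
    re.compile(r"^\s*(?:\[[^\]]*\]\s*)+"),                      # leading [topic=...] prefixes
    re.compile(r'"[^"]*"' r"|(?<!\w)'[^']*'(?!\w)"),            # double- or single-quoted strings
    re.compile(r"\bRegion:\s*\S+", re.IGNORECASE),              # "Region: <name>" labels
    re.compile(r"\b[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}\b", re.IGNORECASE),  # UUIDs
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),                 # IPv4 addresses
    re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?\b"),  # ISO-8601 timestamps
    re.compile(r"\b(?:EUS|WUS|NEU|WEU)\d?\b"),                  # hypothetical uppercase region codes
]


def normalise(title: str) -> str:
    """Strip volatile tokens, collapse whitespace and lowercase the title."""
    for pattern in _NOISE:
        title = pattern.sub(" ", title)
    return " ".join(title.split()).lower()


_SOUNDEX_CODES = {
    ch: digit
    for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}.items()
    for ch in letters
}


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first, prev = letters[0], _SOUNDEX_CODES.get(letters[0], "")
    digits = []
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "HW":  # H and W do not separate letters that share a code
            prev = code
    return (first + "".join(digits) + "000")[:4]
```

Soundex collapses spelling variants to one key (`soundex("Robert") == soundex("Rupert") == "R163"`), which is what makes it usable for merging near-duplicate buckets after clustering.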
package me.rakirahman.quality.table.deltalake
import me.rakirahman.config.DeltaLakeConfiguration
import me.rakirahman.logging.level.LoggingConstants
import me.rakirahman.metastore.MetastoreOperations
import me.rakirahman.quality.deequ.repository.metric.spark.table.DataQualityMetadata
import me.rakirahman.quality.table.TableAnalyzer
import io.delta.tables._
package me.rakirahman.connection.fabric.sql
import me.rakirahman.connection.fabric.MetadataManager
import me.rakirahman.feeds.authentication.jwt.JwtScopeExtensions._
import me.rakirahman.feeds.authentication.jwt.JwtScopes
import com.azure.core.credential.{TokenCredential, TokenRequestContext}
import org.apache.http.client.methods.{HttpGet, HttpPost}
import org.apache.http.entity.{ContentType, StringEntity}
{
  "eventTime":"2026-01-09T23:18:22.634Z",
  "producer":"https://github.com/OpenLineage/OpenLineage/tree/1.23.0/integration/spark",
  "schemaURL":"https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
  "eventType":"START",
  "run":{
    "runId":"019ba50d-c8ab-798b-86ae-e437366a5a3f",
    "facets":{
      "parent":{
        "_producer":"https://github.com/OpenLineage/OpenLineage/tree/1.23.0/integration/spark",