Trí Dũng Lê tridungle

  • Independent
  • Ho Chi Minh
tridungle / docker_operator_example.py
Created May 26, 2021 08:56 — forked from anteverse/docker_operator_example.py
DockerOperator illustration
from datetime import datetime

from airflow import DAG
from airflow.operators.docker_operator import DockerOperator

default_args = {"owner": "airflow", "start_date": datetime(2021, 5, 1)}  # minimal defaults so the snippet runs

dag = DAG(dag_id="example_dag", default_args=default_args)  # DAG ids may not contain spaces

run_task_with_docker = DockerOperator(
    task_id='run_task_with_docker',
    # Assuming this image is already pulled
    image="python:3.8-slim",  # hypothetical image name
    dag=dag,
)
tridungle / Airflow external triggers
Created May 24, 2021 10:59 — forked from bartosz25/Airflow external triggers
Apache Airflow external trigger example
curl -X POST http://localhost:8081/api/experimental/dags/hello_world_a/dag_runs -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -d '{"conf":"{\"task_payload\":\"payload1\"}"}'
curl -X POST http://localhost:8081/api/experimental/dags/hello_world_a/dag_runs -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -d '{"conf":"{\"task_payload\":\"payload2\"}"}'
curl -X POST http://localhost:8081/api/experimental/dags/hello_world_a/dag_runs -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -d '{"conf":"{\"task_payload\":\"payload3\"}"}'
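These POST requests hit Airflow's experimental REST API and hand the JSON in "conf" to each triggered run of hello_world_a. A minimal sketch (an assumption, not part of the gist) of how a task in the triggered DAG can read that payload on Airflow 1.x, where this experimental endpoint exists:

from airflow.operators.python_operator import PythonOperator

def print_payload(**context):
    # The "conf" JSON posted above is parsed onto the DagRun object.
    payload = context["dag_run"].conf.get("task_payload")
    print("Received task_payload: %s" % payload)

print_payload_task = PythonOperator(
    task_id="print_payload",
    python_callable=print_payload,
    provide_context=True,  # required on Airflow 1.x
    dag=dag,  # assumes the hello_world_a DAG object is named `dag`
)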
tridungle / node-scheduler.md
Created May 20, 2021 06:21
The simplest way to run Node.js code on a schedule

In the era of the cloud, the classic cron job is also outdated. Cron jobs are simple only if you cut your teeth on Linux and can effectively administer a server. This is an inaccessible option for many modern devs who operate far up the stack.

So it's no surprise that we have other choices for scheduling code in 2020 (one is sketched after the list):

- AWS Lambda + CloudWatch Events
- Google Cloud Scheduler
- Airflow
- Kubernetes CronJobs
- ECS Tasks
- etc.
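A minimal sketch of the Airflow option from the list above (the DAG id, schedule, and script path are illustrative assumptions): a DAG that runs a Node.js script on a cron-style schedule, with no server-level cron to administer yourself.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="scheduled_node_job",
    start_date=datetime(2020, 1, 1),
    schedule_interval="0 8 * * *",  # every day at 08:00, standard cron syntax
)

run_node_script = BashOperator(
    task_id="run_node_script",
    bash_command="node /opt/jobs/report.js",  # hypothetical script path
    dag=dag,
)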

tridungle / FbContactCrawler.py
Created May 11, 2021 09:08 — forked from nix010/FbContactCrawler.py
Crawl Facebook profile of user by user fbid
import json
from pprint import pprint

import requests
from bs4 import BeautifulSoup as BS
import facebook  # facebook-sdk client


class FbBaseCrawler(object):
    default_headers = {
        'Accept': '*/*',
tridungle / PySpider
Created May 11, 2021 08:34 — forked from T31337/PySpider
Scrapy & BeautifulSoup Based Python Spider
from bs4 import BeautifulSoup, SoupStrainer
from scrapy.selector import HtmlXPathSelector
from urllib import request
import requests
from scrapy.linkextractors import LinkExtractor
import requests.utils
import scrapy.link
import scrapy
import scrapy.spiders
from scrapy.spiders.crawl import CrawlSpider, Rule
AWSTemplateFormatVersion: "2010-09-09"
Transform: "AWS::Serverless-2016-10-31"
Description: >
  CloudFormation in Action: An example with seven AWS services (SNS, SQS, Lambda, Kinesis, S3, Glue and Athena)
  Use case: How to set up an environment in AWS to perform data analysis on the daily numbers of Covid-19?
  Author: Muttalip Kucuk
  Date: 01-11-2020
Globals:
const mongoose = require("mongoose");
const Schema = mongoose.Schema;

const hotelSchema = new Schema({
    hotel_name: {type: String},
    location: {type: String},
    city: {type: String},
    state: {type: String},
    present_price: {type: String},
    features: {type: [String]},
    hotelsNg_link: {type: String},
const crawler = require("web-crawljs");

/**
 * @description removes duplicate hotels from the array
 * @param arr
 * @returns {Array.<T>|*}
 */
function makeUnique(arr) {
    "use strict";
    const key = {};
function getPageType() {
    var path = window.location.pathname;
    if (path.split('-')[0] === "/Hotel_Review") { return "search"; }
    if (path.split('-')[0] === "/Hotels") { return "details"; }
    return "unrecognized";
}

function get_location_name() {
    var PT = getPageType();
    if (PT === "details") {
tridungle / crawl.php
Created May 11, 2021 07:59 — forked from jimbojsb/crawl.php
Entertainment.com Crawler
<?php
/**
{
    "require": {
        "symfony/dom-crawler": "2.*",
        "symfony/css-selector": "2.*"
    }
}
*/