Mamy Ratsimbazafy mratsim

@cluno
cluno / denotational.scala
Last active March 7, 2021 14:32
Erik Meijer's denotational semantics code example from Scala Days 2014
import Denotational._
/**
* Video: http://www.parleys.com/play/53a7d2c9e4b0543940d9e553
* Slide: https://dl.dropboxusercontent.com/u/7083182/scaladays-slides/DenotationalSemantics.compressed.pdf
*
* Background reading for becoming a better functional programmer:
* "Fold and unfold for program semantics" by Graham Hutton
* - http://eprints.nottingham.ac.uk/230/1/semantics.pdf
*
@fasiha
fasiha / README.md
Last active August 16, 2023 06:56
Understanding overlapping memory transfers and kernel execution for simple CUDA workflows

Understanding overlapping memory transfers and kernel execution for very simple CUDA workflows

Executive summary

This small exploration started when Dr Jon Rogers mentioned that one could get overlapping memory transfer and kernel execution by using device-mapped page-locked host memory (see section 3.2.4, Page-Locked Host Memory, of the CUDA C Programming Guide version 6.0) even for simple CUDA workflows, i.e., copying some data from the host to device, operating on that data on the device, and

@lawliet89
lawliet89 / gist:9677319
Created March 21, 2014 00:54
OpenCL inline PTX for 256-bit unsigned addition and multiplication
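For orientation, the carry chain that the `add.cc` / `addc.cc` instructions in the listing below express can be modelled in plain Python. This is only a reference sketch, not the gist's code: it assumes a 256-bit value is stored as eight 32-bit limbs with the least significant limb first, and the helper names (`add_256`, `to_limbs`, `from_limbs`) are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit limb


def add_256(a, b):
    """Add two 256-bit values given as eight 32-bit limbs (assumed least significant first).

    Returns (result_limbs, carry_out), mirroring an add-with-carry chain:
    the first limb is a plain add, every later limb also adds the carry
    produced by the limb before it.
    """
    if len(a) != 8 or len(b) != 8:
        raise ValueError("expected eight 32-bit limbs per operand")
    result = []
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry
        result.append(total & MASK32)  # low 32 bits stay in this limb
        carry = total >> 32            # overflow feeds the next limb
    return result, carry


def to_limbs(value, count=8):
    """Split a non-negative int into `count` 32-bit limbs, least significant first."""
    return [(value >> (32 * i)) & MASK32 for i in range(count)]


def from_limbs(limbs):
    """Inverse of to_limbs."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))
```

A model like this is handy as an oracle when testing the assembly version: feed both the same operands and compare limbs and the final carry.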
// Credits: http://goo.gl/NtaADC
// Inline PTX assembly
uint add_asm(uint *result, const uint *a, const uint *b) {
uint carry;
asm("{\n\t"
"add.cc.u32 %0, %9, %17; \n\t"
"addc.cc.u32 %1, %10, %18; \n\t"
"addc.cc.u32 %2, %11, %19; \n\t"
"addc.cc.u32 %3, %12, %20; \n\t"
@mangecoeur
mangecoeur / concurrent.futures-intro.md
Last active July 20, 2024 10:30
Easy parallel python with concurrent.futures

Easy parallel python with concurrent.futures

As of version 3.2, Python includes the very promising concurrent.futures module, with elegant context managers for running tasks concurrently. Thanks to its simple and consistent interface, you can use both threads and processes with minimal effort.

For most CPU-bound tasks (anything that is heavy number crunching) you want your program to use all the CPUs in your PC. The simplest way to run a CPU-bound task in parallel is to use the ProcessPoolExecutor, which by default starts up to one worker process per CPU so that all of them can be kept busy.

We use the context manager thusly:

with concurrent.futures.ProcessPoolExecutor() as executor:
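A fuller, self-contained sketch of that pattern follows. The task function `count_primes` and the wrapper `run_all` are hypothetical names chosen for illustration; only `ProcessPoolExecutor`, `ThreadPoolExecutor` and `Executor.map` come from the standard library.

```python
import concurrent.futures


def count_primes(limit):
    """Hypothetical CPU-bound task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_all(limits, executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Run count_primes over `limits` in parallel and return results in input order.

    The executor class is a parameter so the same code can use processes
    (the default here) or threads.
    """
    with executor_cls() as executor:
        # map() hands one item to each free worker and yields results in order
        return list(executor.map(count_primes, limits))
```

When using the process-based default, call `run_all` from under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module. Passing `executor_cls=concurrent.futures.ThreadPoolExecutor` switches to threads without touching the rest of the code.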
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define rgbtoy(b, g, r, y) \
y=(unsigned char)(((int)(30*r) + (int)(59*g) + (int)(11*b))/100)
#define rgbtoyuv(b, g, r, y, u, v) \
@Faxn
Faxn / fun.py
Last active October 15, 2020 05:11
We are given the following problem: Given a text file with 999,999 lines, one number per line, numbers in random order from 1 to 1,000,000 but a single number is missing, figure out what number is missing. Source: http://blog.moertel.com/posts/2013-12-14-great-old-timey-game-programming-hack.html#comment-1165807320
import argparse, locale, timeit, random
from math import *
"""
We are given the following problem:
Given a text file with 999,999 lines, one number per line,
numbers in random order from 1 to 1,000,000 but a single number is
missing, figure out what number is missing.
Source: http://blog.moertel.com/posts/2013-12-14-great-old-timey-game-programming-hack.html#comment-1165807320
Tom Moertel
2013-12-16
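Two standard ways to solve the problem as stated are sketched below. The preview does not show which approach the gist itself takes, so treat these as illustrations only; the function names are hypothetical. Both accept any iterable, so the numbers can be streamed from a file instead of being loaded into memory.

```python
def missing_by_sum(numbers, n):
    """Return the one value from 1..n that is absent from `numbers`.

    Uses the closed form 1 + 2 + ... + n = n * (n + 1) / 2 and subtracts
    the sum of what is actually present.
    """
    return n * (n + 1) // 2 - sum(numbers)


def missing_by_xor(numbers, n):
    """Same result via XOR: every value that is present cancels itself out."""
    acc = 0
    for k in range(1, n + 1):
        acc ^= k
    for value in numbers:
        acc ^= value
    return acc
```

For the file described above, something like `missing_by_sum((int(line) for line in f), 1_000_000)` would do, where `f` is the opened text file. In languages with fixed-width integers the XOR variant avoids overflow of the running sum; in Python either works.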
anonymous
anonymous / UsageTrackingBlock.cs
Created November 22, 2013 03:48
A tracker of slot usage for a region spanning a total of 2^14 (16384) slots, in a total of 3072 bytes using a combination of a Fenwick tree and a bitmap vector.
namespace TryOuts
{
using System.Linq;
using System.Diagnostics;
using System.Collections.Generic;
/// <summary>
/// Tracks usage of a block of 2^14 (16384) slots using a bitmap that
/// contains the 16384 used/free bit flags for each slot (taking up a
/// total space of 2048 bytes, accessed as an array of 512 ints), in
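The general idea of pairing a bitmap with a Fenwick tree can be sketched as follows. The preview does not show how the gist lays out its Fenwick tree, so this sketch makes its own assumptions: one counter per 32-bit bitmap word, a slot count that is a multiple of 32, and hypothetical names (`UsageTracker`, `set_used`, `used_before`). It makes no attempt to reproduce the gist's compact byte layout.

```python
class UsageTracker:
    """Used/free flags in a bitmap, plus a Fenwick tree of per-word used counts."""

    WORD_BITS = 32

    def __init__(self, num_slots=1 << 14):
        # Assumption: num_slots is a multiple of WORD_BITS.
        self.num_words = num_slots // self.WORD_BITS
        self.bitmap = [0] * self.num_words
        self.tree = [0] * (self.num_words + 1)  # 1-based Fenwick array

    def _add(self, word, delta):
        """Adjust the used count of bitmap word `word` by `delta`."""
        i = word + 1
        while i <= self.num_words:
            self.tree[i] += delta
            i += i & -i  # move to the next node covering this word

    def _prefix_words(self, words):
        """Number of used slots in the first `words` bitmap words."""
        total = 0
        i = words
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total

    def set_used(self, slot, used=True):
        """Mark `slot` as used or free; the tree changes only if the flag flips."""
        word, bit = divmod(slot, self.WORD_BITS)
        mask = 1 << bit
        if used == bool(self.bitmap[word] & mask):
            return
        self.bitmap[word] ^= mask
        self._add(word, 1 if used else -1)

    def used_before(self, slot):
        """Number of used slots with an index strictly below `slot`."""
        word, bit = divmod(slot, self.WORD_BITS)
        partial = 0
        if word < self.num_words:
            partial = self.bitmap[word] & ((1 << bit) - 1)
        return self._prefix_words(word) + bin(partial).count("1")
```

With this split, an update touches one bitmap word and O(log W) tree nodes for W words, and a prefix count costs O(log W) plus one popcount, instead of scanning the whole bitmap.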
anonymous
anonymous / FenwickTreeBitmap32.cs
Created November 22, 2013 03:18
A Fenwick tree for slot usage tracking of 32 slots in a 64 bit bitmap
namespace TryOuts
{
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
/// <summary>
/// Tracker for usage of a block of 32 consecutive slots. Implemented as a
/// Fenwick tree with indexing optimized for queries. Encodes this
/// information into a long (64 bit) value, while leaving the high bit