This small exploration started when Dr Jon Rogers mentioned that one could get overlapping memory transfer and kernel execution by using device-mapped page-locked host memory (see section 3.2.4, Page-Locked Host Memory, of the CUDA C Programming Guide version 6.0) even for simple CUDA workflows, i.e., copying some data from the host to the device, operating on that data on the device, and copying the results back to the host.

import Denotational._
/**
 * Video: http://www.parleys.com/play/53a7d2c9e4b0543940d9e553
 * Slide: https://dl.dropboxusercontent.com/u/7083182/scaladays-slides/DenotationalSemantics.compressed.pdf
 *
 * To become a better functional programmer:
 * "Fold and unfold for program semantics" by Graham Hutton
 * - http://eprints.nottingham.ac.uk/230/1/semantics.pdf
 */

// Credits: http://goo.gl/NtaADC
// Inline PTX assembly (PTX can only be used in device code)
__device__ uint add_asm(uint *result, const uint *a, const uint *b) {
  uint carry;
  asm("{\n\t"
      "add.cc.u32 %0, %9, %17; \n\t"
      "addc.cc.u32 %1, %10, %18; \n\t"
      "addc.cc.u32 %2, %11, %19; \n\t"
      "addc.cc.u32 %3, %12, %20; \n\t"
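The chained add.cc.u32 / addc.cc.u32 instructions above add multi-word integers one 32-bit limb at a time, passing the carry flag from each limb to the next. The Python sketch below models that behaviour. It assumes little-endian limb order (limb 0 least significant) and 8 limbs, which is what the operand numbering suggests (%0 to %8 as outputs, %9 onward for a, %17 onward for b); the function name add_limbs is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keeps the low 32 bits of a sum


def add_limbs(a, b):
    """Add two equal-length lists of 32-bit limbs, least significant first.

    Each loop step mirrors one add.cc/addc.cc instruction: add the two limbs
    plus the incoming carry, keep the low 32 bits, and pass the carry on.
    Returns (result_limbs, carry_out).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of limbs")
    result = []
    carry = 0  # the first add.cc has no carry-in
    for x, y in zip(a, b):
        total = x + y + carry
        result.append(total & MASK32)
        carry = total >> 32  # 0 or 1 for 32-bit inputs
    return result, carry


# (2**256 - 1) + 1: every limb wraps to zero and a carry of 1 comes out.
print(add_limbs([MASK32] * 8, [1] + [0] * 7))  # ([0, 0, 0, 0, 0, 0, 0, 0], 1)
```

The final carry corresponds to what the PTX version would capture in its carry output.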

As of version 3.2, Python includes the very promising concurrent.futures module, with elegant context managers for running tasks concurrently. Thanks to its simple and consistent interface, you can use both threads and processes with minimal effort.
For most CPU-bound tasks (anything that involves heavy number crunching), you want your program to use all the CPUs in your PC. The simplest way to run a CPU-bound task in parallel is to use ProcessPoolExecutor, which by default starts as many worker processes as the machine has processors, keeping all your CPUs busy.
We use the context manager thusly:
with concurrent.futures.ProcessPoolExecutor() as executor:
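A complete, minimal version of that pattern might look like the following. The prime-counting function is only a hypothetical stand-in for CPU-bound work, and run_all is a hypothetical helper; the executor classes and executor.map are the standard-library API. Because ThreadPoolExecutor and ProcessPoolExecutor share the same interface, run_all accepts either one.

```python
import concurrent.futures
import math


def count_primes_below(n):
    """Deliberately naive CPU-bound work: count primes below n by trial division."""
    count = 0
    for k in range(2, n):
        if all(k % d for d in range(2, math.isqrt(k) + 1)):
            count += 1
    return count


def run_all(executor_cls, func, inputs):
    """Apply func to every input using a pool built from executor_cls.

    executor.map returns the results in the same order as the inputs.
    """
    with executor_cls() as executor:
        return list(executor.map(func, inputs))


if __name__ == "__main__":
    limits = [10_000, 20_000, 30_000, 40_000]
    print(run_all(concurrent.futures.ProcessPoolExecutor, count_primes_below, limits))
```

The if __name__ == "__main__": guard matters on platforms where worker processes are started by re-importing the main module (the "spawn" start method, the default on Windows and macOS); without it, the workers would re-run the pool-creation code when they import the module.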

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
/* Integer luma approximation: Y = (30*R + 59*G + 11*B) / 100.
   Macro arguments are parenthesized so expressions can be passed safely. */
#define rgbtoy(b, g, r, y) \
(y)=(unsigned char)(((int)(30*(r)) + (int)(59*(g)) + (int)(11*(b)))/100)
#define rgbtoyuv(b, g, r, y, u, v) \
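The rgbtoy macro is an integer form of the luma weighting Y = 0.30 R + 0.59 G + 0.11 B, close to the ITU-R BT.601 coefficients (0.299, 0.587, 0.114). A Python equivalent, handy for checking expected values, is sketched below; it assumes 8-bit channels in the range 0 to 255, rgb_to_y is a hypothetical name, and note that the C macro itself takes its arguments in b, g, r order.

```python
def rgb_to_y(r, g, b):
    """Integer luma as computed by the rgbtoy macro: (30*R + 59*G + 11*B) / 100.

    For non-negative inputs, Python's // matches C's truncating integer
    division. The weights sum to 100, so 8-bit inputs stay within 0..255.
    """
    return (30 * r + 59 * g + 11 * b) // 100


for name, rgb in [("white", (255, 255, 255)), ("red", (255, 0, 0)),
                  ("green", (0, 255, 0)), ("blue", (0, 0, 255))]:
    print(name, rgb_to_y(*rgb))  # white 255, red 76, green 150, blue 28
```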

import argparse, locale, timeit, random
from math import *
"""
We are given the following problem:
Given a text file with 999,999 lines, one number per line,
numbers in random order from 1 to 1,000,000 but a single number is
missing, figure out what number is missing.
Source: http://blog.moertel.com/posts/2013-12-14-great-old-timey-game-programming-hack.html#comment-1165807320
"""

Tom Moertel
2013-12-16
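One standard way to solve this (not necessarily the approach taken in the script or in the linked comment) uses the closed form 1 + 2 + ... + n = n(n + 1)/2: the missing number is that total minus the sum of the numbers actually present. A Python sketch, where find_missing and find_missing_in_file are hypothetical names and the file is assumed to hold one integer per line:

```python
def find_missing(numbers, n=1_000_000):
    """Return the single integer in 1..n that does not appear in numbers."""
    expected = n * (n + 1) // 2  # sum of 1..n
    return expected - sum(numbers)


def find_missing_in_file(path, n=1_000_000):
    """Stream the file line by line so the numbers never all sit in memory."""
    with open(path) as f:
        return find_missing((int(line) for line in f if line.strip()), n)
```

Python integers do not overflow, so the subtraction is exact. In a fixed-width language the same idea needs a 64-bit accumulator, since the expected total for n = 1,000,000 is 500,000,500,000.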

namespace TryOuts
{
    using System.Linq;
    using System.Diagnostics;
    using System.Collections.Generic;
    /// <summary>
    /// Tracks usage of a block of 2^14 (16384) slots using a bitmap that
    /// contains one used/free bit flag per slot (16384 flags, taking up a
    /// total space of 2048 bytes, accessed as an array of 512 ints), in
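The bitmap layout described above (16384 one-bit flags packed into 512 32-bit words) can be sketched in Python as follows. It assumes bit i of word w stands for slot 32w + i and that a set bit means "used"; the class and method names are hypothetical, not the C# type's actual members, and Python integers stand in for the 512 ints.

```python
class SlotBitmap:
    """Used/free flags for 2**14 slots packed into 512 words of 32 bits."""

    SLOTS = 1 << 14                   # 16384 slots
    BITS_PER_WORD = 32
    WORDS = SLOTS // BITS_PER_WORD    # 512 words = 2048 bytes
    FULL = (1 << BITS_PER_WORD) - 1   # a word whose 32 slots are all used

    def __init__(self):
        self.words = [0] * self.WORDS

    def is_used(self, slot):
        word, bit = divmod(slot, self.BITS_PER_WORD)
        return bool((self.words[word] >> bit) & 1)

    def mark_used(self, slot):
        word, bit = divmod(slot, self.BITS_PER_WORD)
        self.words[word] |= 1 << bit

    def mark_free(self, slot):
        word, bit = divmod(slot, self.BITS_PER_WORD)
        self.words[word] &= ~(1 << bit)

    def find_free(self):
        """Return the lowest free slot, or -1 if every slot is used."""
        for w, bits in enumerate(self.words):
            if bits != self.FULL:
                free_bits = ~bits & self.FULL
                # Isolate the lowest set bit of free_bits: the first free slot.
                bit = (free_bits & -free_bits).bit_length() - 1
                return w * self.BITS_PER_WORD + bit
        return -1
```

Scanning whole words and skipping full ones is what makes the packed layout cheap to search: one comparison rules out 32 slots at a time.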

namespace TryOuts
{
    using System;
    using System.Diagnostics;
    using System.Runtime.CompilerServices;
    /// <summary>
    /// Tracker for usage of a block of 32 consecutive slots. Implemented as a
    /// Fenwick tree with indexing optimized for queries. Encodes this
    /// information into a long (64-bit) value, while leaving the high bit
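For reference, a generic Fenwick (binary indexed) tree over 32 slots is sketched below in Python, counting used slots so that a prefix query returns how many of the first k slots are in use. It is a plain list-based illustration of the technique only: it does not reproduce the C# tracker's packing of the tree into one 64-bit value, whose details are not shown here, and the names are hypothetical.

```python
class FenwickTree:
    """Binary indexed tree over n slots holding per-slot counts."""

    def __init__(self, n=32):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-based nodes; tree[0] is unused

    def add(self, slot, delta):
        """Add delta to the count at 0-based slot (e.g. +1 used, -1 freed)."""
        i = slot + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # climb to the next node whose range covers this slot

    def prefix_sum(self, count):
        """Total of the counts in slots 0 .. count - 1."""
        total = 0
        i = count
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # drop the lowest set bit to reach the previous range
        return total
```

Both operations touch at most log2(32) + 1 = 6 nodes, which is why a Fenwick tree suits small fixed-size blocks like this one.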
Original link: http://www.concentric.net/~Ttwang/tech/inthash.htm
Taken from: http://web.archive.org/web/20071223173210/http://www.concentric.net/~Ttwang/tech/inthash.htm
Reformatted using pandoc
Thomas Wang, Jan 1997
last update Mar 2007