shamoya / rdma_bench.py (created May 4, 2017, forked from llhe/rdma_bench.py)
Benchmark with RDMA
"""Benchmark tensorflow distributed by assigning a tensor between two workers.
Usage:
Start worker 1:
python rdma_bench.py --workers="hostname1:port,hostname2:port" --protocol=grpc+verbs --task 0
Start worker 2:
python rdma_bench.py --workers="hostname1:port,hostname2:port" --protocol=grpc+verbs --task 1
Run the tests:
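A minimal sketch of the two-worker transfer benchmark the docstring describes, assuming TF 1.x distributed APIs (tf.train.ClusterSpec and tf.train.Server with protocol 'grpc+verbs'); the tensor size, variable names, and timing loop are illustrative, not the gist's actual code.

import argparse
import time

import tensorflow as tf

parser = argparse.ArgumentParser()
parser.add_argument('--workers', default='localhost:2222,localhost:2223')
parser.add_argument('--protocol', default='grpc+verbs')
parser.add_argument('--task', type=int, default=0)
args = parser.parse_args()

cluster = tf.train.ClusterSpec({'worker': args.workers.split(',')})
server = tf.train.Server(cluster, job_name='worker',
                         task_index=args.task, protocol=args.protocol)

if args.task == 1:
    server.join()  # task 1 only hosts the source variable
else:
    params_mb = 100
    n = params_mb * 1024 * 1024 // 4  # float32 elements for ~100 MB
    with tf.device('/job:worker/task:1'):
        src = tf.Variable(tf.ones([n]))
    with tf.device('/job:worker/task:0'):
        dst = tf.Variable(tf.zeros([n]))
        copy = dst.assign(src)  # pulls ~100 MB from task 1 to task 0

    with tf.Session(server.target) as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(copy.op)  # warm-up: graph setup and channel establishment
        start = time.time()
        for _ in range(10):
            sess.run(copy.op)
        rate = 10 * params_mb / (time.time() - start)
        print('Transfer rate: %.2f MB per second' % rate)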
shamoya / local_distributed_benchmark.py (created April 26, 2017, forked from yaroslavvb/local_distributed_benchmark.py)
Benchmark distributed TensorFlow locally by adding a vector of ones on worker2 to a variable on worker1 as fast as possible
"""Benchmark tensorflow distributed by adding vector of ones on worker2
to variable on worker1 as fast as possible.
On 2014 macbook, TensorFlow 0.10 this shows
Local rate: 2175.28 MB per second
Distributed rate: 107.13 MB per second
"""
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops

# Define a custom py_func that also takes a gradient function as an argument:
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    # Generate a unique name to avoid duplicate gradient registrations:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))
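    # The conventional completion of this pattern (an assumption; the snippet
    # is truncated here): register the Python gradient function under the
    # unique name, then route the PyFunc op's gradient lookup to it.
    tf.RegisterGradient(rnd_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({'PyFunc': rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

# Illustrative usage (not from the gist), assuming TF 1.x: an identity
# forward pass whose custom gradient doubles the upstream gradient.
def _forward(x):
    return x

def _grad(op, grad_out):
    return 2.0 * grad_out

x = tf.constant([1.0, 2.0, 3.0])
y = py_func(_forward, [x], [tf.float32], grad=_grad)[0]
y.set_shape(x.get_shape())      # tf.py_func drops static shape information
dy_dx = tf.gradients(y, x)[0]   # evaluates to [2.0, 2.0, 2.0]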