Tuan Nguyen tuan3w

name: tensorflow
dependencies:
- backports=1.0=py27_0
- decorator=4.0.10=py27_0
- get_terminal_size=1.0.0=py27_0
- ipython=5.0.0=py27_0
- ipython_genutils=0.1.0=py27_0
- libgfortran=3.0.0=1
- mkl=11.3.3=0
- numpy=1.11.1=py27_0
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import math
IMAGE_PIXELS = 28
# Flags for defining the tf.train.ClusterSpec
tf.app.flags.DEFINE_string("ps_hosts", "",
"Comma-separated list of hostname:port pairs")
tf.app.flags.DEFINE_string("worker_hosts", "",
@tuan3w
tuan3w / Intro.md
Last active May 7, 2021 14:02
Pre-trained model for English -> Vietnamese NMT

Datasets

I had a bad time trying to build an English-Vietnamese parallel corpus from bilingual stories; it went poorly and just wasted a lot of time. So I tried to collect as many corpora as possible from around the internet. My final dataset consists of about 2.5M sentence pairs. You can find all the corpora here: link

Model

I use OpenNMT to train my NMT model. Thanks to Systran and HarvardNLP for open-sourcing this project. It helps me and many others understand how an industrial translation system might work. The parameters of my model are as follows:

  • Preprocessing: using the aggressive tokenizer provided by OpenNMT
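
To give a rough idea of what an "aggressive" tokenization step does to the text, here is a minimal sketch in plain Python. It is not OpenNMT's implementation: the function name `aggressive_tokenize` and the regular expression are assumptions made for illustration, and the real tokenizer has more options (for example, markers that make detokenization possible) which this sketch ignores. It only shows the general idea of keeping runs of letters and runs of digits together while splitting off every other symbol.

```python
import re
import unicodedata

# Hypothetical approximation of an "aggressive" tokenization mode.
# Letters stay grouped, digits stay grouped, and any other
# non-space character (punctuation, symbols, "_") becomes its own token.
_TOKEN_RE = re.compile(r"[^\W\d_]+|\d+|[^\w\s]|_")


def aggressive_tokenize(text):
    """Split text into letter runs, digit runs, and single symbols."""
    # Normalize to NFC so Vietnamese letters with diacritics are single
    # precomposed characters and are not split from their accent marks.
    text = unicodedata.normalize("NFC", text)
    return _TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(aggressive_tokenize("It's about 2.5M pairs."))
    print(aggressive_tokenize("Xin chào, thế giới!"))
```

Splitting this finely keeps the vocabulary small, at the cost of longer sequences; whether that trade-off helps depends on the corpus and the model.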
@tuan3w
tuan3w / SignRandomProjectionLSH.scala
Last active February 9, 2017 07:56
SignRandomProjectionLSH
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
import tensorflow as tf
from tensorflow.contrib.framework import arg_scope
from tensorflow.contrib.layers.python.layers.utils import smart_cond
from tensorflow.python.ops.gen_array_ops import _concat_v2 as concat_v2
from layers import *
class Model(object):
def __init__(self, config,
inputs, labels, enc_seq_length, dec_seq_length, mask,
@tuan3w
tuan3w / docker_install.sh
Last active May 3, 2017 02:27
docker_install.sh
#!/bin/bash
sudo apt-get update
sudo apt-get install -y \
linux-image-extra-$(uname -r) \
linux-image-extra-virtual
sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
@tuan3w
tuan3w / docker-compose.sh
Created March 7, 2017 09:05
docker-compose.sh
curl -L "https://github.com/docker/compose/releases/download/1.11.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
@tuan3w
tuan3w / VertxHystrixSample.java
Created July 28, 2017 04:45
sample circuit breaker with vertx
import com.soundcloud.prometheus.hystrix.HystrixPrometheusMetricsPublisher;
import com.vcc.bigdata.micro.cmd.FailCommand;
import io.vertx.circuitbreaker.CircuitBreaker;
import io.vertx.circuitbreaker.CircuitBreakerOptions;
import io.vertx.circuitbreaker.HystrixMetricHandler;
import io.vertx.core.AbstractVerticle;
import io.vertx.core.http.HttpServer;
import io.vertx.core.http.HttpServerOptions;
import io.vertx.ext.web.Router;
@tuan3w
tuan3w / swarm_register.py
Created October 7, 2017 15:56
swarm_register.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2017 zc <zc@www>
#
# Distributed under terms of the MIT license.
"""
Service registrator agent
@tuan3w
tuan3w / setup.py
Created November 16, 2017 05:50
setup.py
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import os
from os.path import join as pjoin
import numpy as np