For companies that work with online advertising, more precisely DSPs and their Real-Time Bidding platforms, it is very important to collect and analyze information about the behavior and interests of users while they browse the internet. Therefore, in this graph gist I describe a basic approach that can be used to analyze such data, considering a given period of time and the products viewed by each user. Let's imagine that some Breaking Bad characters browsed the internet a few days ago, found some interesting products (chemical elements), and are thinking about buying them. Such information is extremely valuable when placing a bid in an advertising auction, since it reveals the profile and interests of a given user. So we store these users, products, and view dates so that we can extract this information later.
#!/usr/bin/env bash

URL="http://localhost:15672/cli/rabbitmqadmin"
VHOST="<>"
USER="<>"
PWD="<>"
QUEUE="<>"
FAILED_QUEUE="<>"

# Create a dynamic shovel that moves all messages from the failed queue back to
# the main queue (assumed direction, based on the variable names). Because of
# "delete-after": "queue-length", RabbitMQ deletes the shovel, and with it this
# parameter, after all messages are moved from the origin to the target queue.
# Note: "dest-queue" (not "dest-exchange") is the key for shoveling into a queue.
rabbitmqctl set_parameter -p "$VHOST" shovel "$FAILED_QUEUE" '{"src-uri":"amqp://'"$USER"':'"$PWD"'@/'"$VHOST"'","src-queue":"'"$FAILED_QUEUE"'","dest-uri":"amqp://'"$USER"':'"$PWD"'@/'"$VHOST"'","dest-queue":"'"$QUEUE"'","prefetch-count":1,"reconnect-delay":5,"add-forward-headers":false,"ack-mode":"on-confirm","delete-after":"queue-length"}'
##
# http://stackoverflow.com/questions/19967472/elasticsearch-unassigned-shards-how-to-fix
##
NODE="YOUR NODE NAME"
IFS=$'\n'
# Re-allocate every UNASSIGNED shard to $NODE through the cluster reroute API
for line in $(curl -s 'localhost:9200/_cat/shards' | fgrep UNASSIGNED); do
  INDEX=$(echo $line | awk '{print $1}')
  SHARD=$(echo $line | awk '{print $2}')
  curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands": [{"allocate": {
      "index": "'$INDEX'", "shard": '$SHARD',
      "node": "'$NODE'", "allow_primary": true
    }}]
  }'
done
If you want, I can try to help with pointers on how to improve the indexing speed you get. It's quite easy to really increase it by following some simple guidelines, for example:
- Use create in the index API (assuming you can).
- Relax the real-time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10% of the heap.
- Increase the number of dirty operations that trigger an automatic flush (so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and once the bulk loading is done, increase it to the value you want using the update_settings API, as sketched below. This will improve things, as fewer shards will potentially be allocated to each machine.
- Increase the number of machines you have, so that fewer shards end up allocated on each machine.
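To make the last two items concrete, here is a minimal sketch of using the update settings API around a bulk load. The index name my_index and the restored values are assumptions for illustration, not part of the original advice:

# Hypothetical index "my_index": relax refresh and drop replicas before bulk loading...
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index": {"refresh_interval": "-1", "number_of_replicas": 0}
}'
# ... run the bulk indexing ...
# ...then restore both settings once the load is done.
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index": {"refresh_interval": "1s", "number_of_replicas": 1}
}'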
// installed Clojure packages:
//
// * BracketHighlighter
// * lispindent
// * SublimeREPL
// * sublime-paredit
{
  "word_separators": "/\\()\"',;!@$%^&|+=[]{}`~?",
  "paredit_enabled": true
}
{:user {:dependencies [[org.clojure/tools.namespace "0.2.3"]
                       [spyscope "0.1.3"]
                       [criterium "0.4.1"]]
        :injections [(require '(clojure.tools.namespace repl find))
                     ;; try/catch to work around an issue where `lein repl` outside
                     ;; a project dir will not load reader literal definitions correctly:
                     (try (require 'spyscope.core)
                          (catch RuntimeException e))]
        :plugins [[lein-pprint "1.1.1"]
                  [lein-beanstalk "0.2.6"]]}}
import static org.apache.commons.lang.StringUtils.isBlank;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.log4j.Logger;

import net.tanesha.recaptcha.ReCaptchaImpl;
import net.tanesha.recaptcha.ReCaptchaResponse;
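These imports point at server-side validation of a reCAPTCHA answer inside a servlet. Below is a minimal sketch of how the recaptcha4j API is typically wired together; the class name, the private key placeholder, and reading the two standard reCAPTCHA form fields are illustrative assumptions, not code from the original snippet:

public class CaptchaValidator {

    private static final Logger LOG = Logger.getLogger(CaptchaValidator.class);

    // Hypothetical helper: returns true when the submitted reCAPTCHA answer is valid.
    public boolean validate(HttpServletRequest request) {
        String challenge = request.getParameter("recaptcha_challenge_field");
        String answer = request.getParameter("recaptcha_response_field");
        if (isBlank(challenge) || isBlank(answer)) {
            return false;
        }
        ReCaptchaImpl reCaptcha = new ReCaptchaImpl();
        reCaptcha.setPrivateKey("<your-private-key>"); // assumption: load from config
        ReCaptchaResponse response =
                reCaptcha.checkAnswer(request.getRemoteAddr(), challenge, answer);
        if (!response.isValid()) {
            LOG.warn("reCAPTCHA validation failed: " + response.getErrorMessage());
        }
        return response.isValid();
    }
}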
// New graph
CREATE
  (User1 { name: 'User1' }),
  (User2 { name: 'User2' }),
  (User3 { name: 'User3' }),
  (Mac { name: 'Mac' }),
  (Samsung { name: 'Samsung' }),
  (Brastemp { name: 'Brastemp' }),
  // view dates are illustrative values
  (User1)-[:VIEWED { date: '2013-10-01' }]->(Mac),
  (User1)-[:VIEWED { date: '2013-10-02' }]->(Samsung),
  (User2)-[:VIEWED { date: '2013-10-02' }]->(Samsung),
  (User3)-[:VIEWED { date: '2013-10-03' }]->(Brastemp)
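To extract the information described in the introduction, the products each user viewed in a given period, a query along these lines can be used. It is a sketch against the VIEWED relationships created above, with a hypothetical date range:

// products viewed by User1 between two (assumed) dates
MATCH (user { name: 'User1' })-[v:VIEWED]->(product)
WHERE v.date >= '2013-10-01' AND v.date <= '2013-10-02'
RETURN product.name AS product, v.date AS viewed_on
ORDER BY v.date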