For companies that work with online advertising, more precisely DSPs and their Real-Time Bidding platforms, it is very important to collect and analyze information about the behavior and interests of users while they browse the internet. Therefore, in this graph gist I describe a basic approach that can be used to analyze such data, considering a given period of time and the products viewed by each user. Let's imagine that some Breaking Bad characters browsed the internet a few days ago, found some interesting products (chemical elements), and are thinking about buying them. Such information is extremely valuable when placing a bid in an advertising auction, since it reveals the profile and interests of a given user. So we store these users, products, and view dates so that we can extract this information later.
#!/usr/bin/env bash

URL="http://localhost:15672/cli/rabbitmqadmin"
VHOST="<>"
USER="<>"
PWD="<>"
QUEUE="<>"
FAILED_QUEUE="<>"

# Create a dynamic shovel that moves all messages from the failed queue back to
# the main queue (assumed direction, based on the variable names). Because of
# "delete-after": "queue-length", RabbitMQ deletes the shovel, and with it this
# parameter, after all messages are moved from the origin to the target queue.
# Note: "dest-queue" (not "dest-exchange") is the key for shoveling into a queue.
rabbitmqctl set_parameter -p "$VHOST" shovel "$FAILED_QUEUE" '{"src-uri":"amqp://'"$USER"':'"$PWD"'@/'"$VHOST"'","src-queue":"'"$FAILED_QUEUE"'","dest-uri":"amqp://'"$USER"':'"$PWD"'@/'"$VHOST"'","dest-queue":"'"$QUEUE"'","prefetch-count":1,"reconnect-delay":5,"add-forward-headers":false,"ack-mode":"on-confirm","delete-after":"queue-length"}'
##
# http://stackoverflow.com/questions/19967472/elasticsearch-unassigned-shards-how-to-fix
##
NODE="YOUR NODE NAME"
IFS=$'\n'
# Re-allocate every UNASSIGNED shard to $NODE through the cluster reroute API
for line in $(curl -s 'localhost:9200/_cat/shards' | fgrep UNASSIGNED); do
  INDEX=$(echo $line | awk '{print $1}')
  SHARD=$(echo $line | awk '{print $2}')
  curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands": [{"allocate": {
      "index": "'$INDEX'", "shard": '$SHARD',
      "node": "'$NODE'", "allow_primary": true
    }}]
  }'
done
If you want, I can try to help with pointers on how to improve the indexing speed you get. It's quite easy to really increase it by following some simple guidelines, for example:
- Use create in the index API (assuming you can).
- Relax the real-time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10% of the heap.
- Increase the number of dirty operations that trigger an automatic flush (so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and once the bulk loading is done, increase it to the value you want using the update_settings API, as sketched below. This will improve things, as fewer shards will potentially be allocated to each machine.
- Increase the number of machines you have, so that fewer shards end up allocated on each machine.
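To make the last two items concrete, here is a minimal sketch of using the update settings API around a bulk load. The index name my_index and the restored values are assumptions for illustration, not part of the original advice:

# Hypothetical index "my_index": relax refresh and drop replicas before bulk loading...
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index": {"refresh_interval": "-1", "number_of_replicas": 0}
}'
# ... run the bulk indexing ...
# ...then restore both settings once the load is done.
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index": {"refresh_interval": "1s", "number_of_replicas": 1}
}'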
// installed Clojure packages:
//
// * BracketHighlighter
// * lispindent
// * SublimeREPL
// * sublime-paredit
{
  "word_separators": "/\\()\"',;!@$%^&|+=[]{}`~?",
  "paredit_enabled": true
}
{:user {:dependencies [[org.clojure/tools.namespace "0.2.3"]
                       [spyscope "0.1.3"]
                       [criterium "0.4.1"]]
        :injections [(require '(clojure.tools.namespace repl find))
                     ;; try/catch to work around an issue where `lein repl` outside
                     ;; a project dir will not load reader literal definitions correctly:
                     (try (require 'spyscope.core)
                          (catch RuntimeException e))]
        :plugins [[lein-pprint "1.1.1"]
                  [lein-beanstalk "0.2.6"]]}}
import static org.apache.commons.lang.StringUtils.isBlank;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.log4j.Logger;

import net.tanesha.recaptcha.ReCaptchaImpl;
import net.tanesha.recaptcha.ReCaptchaResponse;
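These imports point at server-side validation of a reCAPTCHA answer inside a servlet. Below is a minimal sketch of how the recaptcha4j API is typically wired together; the class name, the private key placeholder, and reading the two standard reCAPTCHA form fields are illustrative assumptions, not code from the original snippet:

public class CaptchaValidator {

    private static final Logger LOG = Logger.getLogger(CaptchaValidator.class);

    // Hypothetical helper: returns true when the submitted reCAPTCHA answer is valid.
    public boolean validate(HttpServletRequest request) {
        String challenge = request.getParameter("recaptcha_challenge_field");
        String answer = request.getParameter("recaptcha_response_field");
        if (isBlank(challenge) || isBlank(answer)) {
            return false;
        }
        ReCaptchaImpl reCaptcha = new ReCaptchaImpl();
        reCaptcha.setPrivateKey("<your-private-key>"); // assumption: load from config
        ReCaptchaResponse response =
                reCaptcha.checkAnswer(request.getRemoteAddr(), challenge, answer);
        if (!response.isValid()) {
            LOG.warn("reCAPTCHA validation failed: " + response.getErrorMessage());
        }
        return response.isValid();
    }
}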
// New graph
CREATE
  (User1 { name: 'User1' }),
  (User2 { name: 'User2' }),
  (User3 { name: 'User3' }),
  (Mac { name: 'Mac' }),
  (Samsung { name: 'Samsung' }),
  (Brastemp { name: 'Brastemp' }),
  // view dates are illustrative values
  (User1)-[:VIEWED { date: '2013-10-01' }]->(Mac),
  (User1)-[:VIEWED { date: '2013-10-02' }]->(Samsung),
  (User2)-[:VIEWED { date: '2013-10-02' }]->(Samsung),
  (User3)-[:VIEWED { date: '2013-10-03' }]->(Brastemp)
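To extract the information described in the introduction, the products each user viewed in a given period, a query along these lines can be used. It is a sketch against the VIEWED relationships created above, with a hypothetical date range:

// products viewed by User1 between two (assumed) dates
MATCH (user { name: 'User1' })-[v:VIEWED]->(product)
WHERE v.date >= '2013-10-01' AND v.date <= '2013-10-02'
RETURN product.name AS product, v.date AS viewed_on
ORDER BY v.date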