#!/bin/bash
# a basic crawler in bash
# https://github.com/jashmenn/bashpider
# usage: crawl.sh urlfile.txt <numprocs>
URLS_FILE=$1
BANDWIDTH=2300
CRAWLERS=$2
mkdir -p data/pages
I currently work as a programmer: PHP, Python, and JS, plus HTML and CSS.
Working as a sysadmin for LAMP systems (Debian) is also part of my day-to-day.
I have worked in different areas related to social communication.
I have created more than 100 graphic pieces for different clients, and laid out newsletters and other publications.
I have put more than 10 companies and professionals online through websites, most of which are still up.
To explain further, the letter must list the main activities the worker performed. If nothing else is written, only the list of activities, it is more or less implied that the worker made no difference to the company, which the next employer will not look on favorably. If, besides the list of activities, the boss writes some praise in a few words (for example: reduced costs, increased sales by x%...), then the future employer will see the candidate differently. And if the worker is dismissed? They will also receive the recommendation letter, and it may state the reason for the dismissal, but that is not mandatory (not least so the company does not expose itself). So everything depends on the "emotion" the boss conveys in the letter. I found a German-language site that helps interpret the phrases written by the boss, and I will share it with you:
It is a very good sign when the letter says something like:
• "We were extremely satisfied with their work in every respect".(The idea was to include only one example of each n
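from sklearn.datasets import load_svmlight_file
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC  # sklearn.svm.sparse was removed; LinearSVC accepts sparse input
from sklearn.model_selection import StratifiedKFold  # replaces sklearn.cross_validation
from sklearn import metrics
import numpy as np

# Load the sparse feature matrix and the labels from an svmlight-format file
X, y = load_svmlight_file("fr.vec")
# Remap the negative class label from -1 to 0
y[y == -1] = 0
# 10 stratified folds, kept as (train_indices, test_indices) pairs
kf = list(StratifiedKFold(n_splits=10).split(X, y))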
# When crawling, we often hit pages whose content is generated with JavaScript (e.g. AJAX requests, jQuery craziness), so Scrapy alone cannot see it. However, if you use Scrapy together with the web testing framework Selenium, you can crawl anything displayed in a normal web browser.
#
# Some things to note:
# You must have the Python version of Selenium RC installed for this to work, and Selenium must be set up properly. Also, this is just a template crawler; you could get much more advanced, but I just wanted to show the basic idea. As the code stands, every URL is requested twice: once by Scrapy and once by Selenium. There are probably ways to have Selenium make the one and only request, but I did not bother to implement that, and with two requests you get to crawl the page with Scrapy too.
#
# This is quite powerful
import Cheetah.Filters

class UnicodeHarder(Cheetah.Filters.Filter):
    def filter(self, val,
               encoding='utf8',
               str=str,
               **kw):
        """ Try our best to unicode our strings """
        if not val:
            return u''
# recursively replace text in all files with a certain file ending
find . -type f -iname '*.html' -exec sed -i 's,href="../css/stylesheet.css",href="../../css/stylesheet.css",g' {} +
# download Springer Link Books via University Proxy and add the ".pdf" file ending
export http_proxy="http://proxy.zfn.uni-bremen.de:3128";
wget -r -l 1 --reject html,js,css,jpg,png --proxy-user STUD_IP_USERNAME --proxy-passwd STUD_IP_PASSWORD LINK_TO_BOOK;
import nltk
from nltk.corpus import mac_morpho

class Classificador(object):
    def __init__(self):
        tsents = mac_morpho.tagged_sents()
        tsents = [[(w.lower(), t) for (w, t) in sent] for sent in tsents if sent]
        tagger0 = nltk.DefaultTagger('N')
        tagger1 = nltk.UnigramTagger(tsents[100:], backoff=tagger0)
        self.tagger = nltk.BigramTagger(tsents[100:], backoff=tagger1)

    # classifies the words of the text
<?php
//*************************English Description***************************//
// Class to convert Latitude/Longitude Coordinates //
// Developed by: Diêgo Garrido de Almeida ([email protected]) //
// Location: Conselheiro Lafaiete - Minas Gerais / Brazil //
// License: None, this class can be used without credits //
// Recommended use: To convert the Google Earth standard coordinates //
// to Google Maps API standard coordinates, to do this, //
// use the method GeoConversao::DMS2Dd. //
// eg: $GeoConversao->DMS2Dd('45º22\'38"') -> 45.3772 //
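The degrees/minutes/seconds to decimal-degrees conversion described in the header above can be sketched in a few lines. This is a minimal Python illustration of the arithmetic only (degrees + minutes/60 + seconds/3600), not the body of `GeoConversao::DMS2Dd`; the function name `dms_to_decimal` is hypothetical, and the sketch assumes input without hemisphere letters (N/S/E/W).

```python
import re

def dms_to_decimal(dms: str) -> float:
    """Convert a DMS string such as 45º22'38" to decimal degrees.

    Hypothetical helper for illustration; hemisphere suffixes are not handled.
    """
    # Three numeric fields separated by any non-digit symbols (º, ', ").
    match = re.fullmatch(r"\s*(-?\d+)\D+(\d+)\D+(\d+(?:\.\d+)?)\D*", dms)
    if match is None:
        raise ValueError(f"not a DMS coordinate: {dms!r}")
    degrees, minutes, seconds = (float(g) for g in match.groups())
    # Take the sign from the text so that "-0º30'0\"" stays negative.
    sign = -1.0 if match.group(1).startswith("-") else 1.0
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)

print(round(dms_to_decimal("45º22'38\""), 4))  # 45.3772
```

The example value matches the one in the header: 45 + 22/60 + 38/3600 is about 45.3772.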
/*
general administration and configuration of all of the tool's utilities
used primarily to define the data collection instructions and to split the collected documents into sets and subsets, which serve as the starting point for each type of processing
*/
# a project is the largest possible unit and indicates the goal of the document collection and processing
CREATE TABLE projects (