Category where the 30 Under 30 person was selected,Forbes Name,Forbes Age,Forbes Country,Role in the Company,Company,Description,LinkedIn Profile,linkedin_name,Exact match between the names?,Did I have to perform a manual search to find the LinkedIn profile?
30 Under 30 - Europe - Social Entrepreneurs,Margherita Pagani,29,Argentina,Founder, Impacton.org,"Pagani aims to create an encyclopedia of blueprints for of purpose-driven projects for impact investing. The Italian-born and Argentina-based entrepreneur believes works with universities, governments and private companies to co-design programs based on proven models.",https://www.linkedin.com/in/magheritapagani,Margherita Pagani - CEO and Founder - Impacton.org ,FALSE,NO
30 Under 30 - Europe - Media & Marketing,Mohamed Khairat,25,Australia,Cofounders, Egyptian Streets,"Amin and Khairat founded Egyptian Streets back in 2012, less than two years after the Arab Spring. The digital publication that strives to address challenging issues--such as sexual harassment an
import glob
import os
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

# TODO: iterate over a specific folder
# TODO: increase the value by 100000 for each new file
input_folder = './../../cygnus_output/output/'
output_folder = './../../step_11_ready2merge_output/'
start_number = 1
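The two TODO comments above describe a numbering scheme: walk the files in a folder and give each one a start number 100000 higher than the previous one. A minimal sketch of that idea follows. The helper names `list_input_files` and `assign_start_numbers` and the `*.osm` file pattern are assumptions made for illustration; they are not part of the original script.

```python
import glob
import os

STEP = 100000  # increment per file, taken from the TODO comment


def list_input_files(input_folder, pattern="*.osm"):
    """Return the matching files in a stable (sorted) order.

    The "*.osm" pattern is an assumption; adjust it to the real file names.
    """
    return sorted(glob.glob(os.path.join(input_folder, pattern)))


def assign_start_numbers(paths, start_number=1, step=STEP):
    """Pair each path with its own start number: start, start + step, ..."""
    return [(path, start_number + index * step)
            for index, path in enumerate(paths)]


if __name__ == "__main__":
    files = list_input_files('./../../cygnus_output/output/')
    for path, number in assign_start_numbers(files):
        print(path, number)
```

Sorting the file list keeps the assignment reproducible between runs, which matters if the numbers are later used as identifiers.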
"user_username","article_url","image_count","post_tags","recommends","reading_time","title","text","link_count"
"neuroecology","https://medium.com/@neuroecology/punctuation-in-novels-8f316d542ec4","22","{Writing,Literature,""Data Visualization""}","2670","3.67641509433962","Punctuation in novels","","1"
"eklimcz","https://medium.com/truth-labs/designing-data-driven-interfaces-a75d62997631","14","{""Data Visualization"",""Design Thinking"",UX}","2660","7.83867924528302","Designing Data-Driven Interfaces","","2"
"quincylarson","https://medium.com/free-code-camp/the-economics-of-working-remotely-28d4173e16e2","5","{Tech,""Life Lessons"",""Data Science"",Travel,Startup}","2068","3.95786163522013","Fitter. Happier. More productive. Working remotely.","Travel the world as a digital nomad. Surf a new beach every morning. Eat a different local cuisine each night.
Or just stay home all day in your pajamas.
It doesn’t really matter. You can get your work done either way.
More than 10% of Americans now work remotely.
I’
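The `post_tags` column in the rows above holds values such as `{Writing,Literature,"Data Visualization"}`, which look like PostgreSQL array literals. A minimal sketch of turning one such value into a Python list is shown here. The function name `parse_post_tags` is hypothetical, and the sketch only handles the simple cases visible in this export (plain items and double-quoted items); it does not handle backslash escapes or nested arrays.

```python
import csv


def parse_post_tags(raw):
    """Split a value like '{Tech,"Life Lessons"}' into a list of tag names."""
    inner = raw.strip()
    # Drop the surrounding braces of the array literal, if present.
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    if not inner:
        return []
    # The csv module already understands comma-separated, double-quoted items.
    return next(csv.reader([inner], quotechar='"'))


if __name__ == "__main__":
    print(parse_post_tags('{Writing,Literature,"Data Visualization"}'))
```

The value passed in is the field after normal CSV decoding, so the doubled quotes (`""`) in the raw file have already been collapsed to single quotes.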
---------------------------SELECT --------------------------------------------
select mps.user_username, -- 1st column
mps.article_url, -- 2nd column
mps.image_count, -- 3rd column
mps.post_tags, -- 4th column
mps.recommends, -- 5th column
mps.reading_time, -- 6th column
mps.title, -- 7th column
mpl.link_count, -- 8th column - this value comes from the left join, where we use a subquery
'' full_text, -- dummy 9th column
#!/bin/bash
echo 'Based on the work of Frederik Ramm https://lists.openstreetmap.org/pipermail/osmosis-dev/2013-October/001613.html'
CMDLINE=`
echo "--read-xml $1"
echo "--sort"
shift
while [[ $# -gt 0 ]]
do
echo "--read-xml $1"
echo "--sort"
select s.*,tag_name,title,mps.user_username,post_tags,article_url from (
SELECT post_id,ts_headline(text, keywords, 'MaxFragments=35,MaxWords=50,MinWords=6') as result
-- tweak these settings to get the output you want; the text column holds the post text
FROM medium_posts_text mptxt, plainto_tsquery('pg_catalog.english','training') as keywords
-- replace 'training' with the word you are searching for
WHERE to_tsvector(text) @@ keywords
) s
inner join medium_posts_tags mpt on mpt.post_id = s.post_id
inner join medium_posts_stats mps on mps.post_id = s.post_id
select * from (
select
regexp_split_to_table(lower(post_text), '\s+') as word
, count(1) as word_count
from
(select post_text from
"user_username","article_url","image_count","post_tags","recommends","reading_time","title","link_count"
"ageitgey","https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471","12","{""Machine Learning""}","4132","14.0216981132075","Machine Learning is Fun!","12"
"tonyaub","https://medium.com/swlh/no-ui-is-the-new-ui-ab3f7ecec6b3","12","{Design,""Artificial Intelligence"",UI}","3666","7.7877358490566","No UI is the New UI","9"
"cdixon","https://medium.com/@cdixon/eleven-reasons-to-be-excited-about-the-future-of-technology-ef5f9b939cb2","32","{Technology,""Artificial Intelligence"",Future,Robotics,Space}","3658","11.1047169811321","Eleven Reasons To Be Excited About The Future of Technology","17"
"2noame","https://medium.com/basic-income/deep-learning-is-going-to-teach-us-all-the-lesson-of-our-lives-jobs-are-for-machines-7c6442e37a49","5","{""Artificial Intelligence"",""Machine Learning"",""Basic Income""}","3101","13.7276729559748","Deep Learning Is Going to Teach Us All the Lesson of Our Lives: Jobs
User Username | Recommends
---|---
stevenlevy | 12,636
ageitgey | 9,605
cdixon | 4,519
perborgen | 4,215
tonyaub | 3,666
olivercameron | 3,552
2noame | 3,151
GilFewster | 2,608
intercom | 2,270
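The table above appears to rank users by their total recommends across all of their posts, although the query that produced it is not shown. Under that assumption, a minimal sketch of the same aggregation over a CSV export with `user_username` and `recommends` columns could look like this. The function name `total_recommends` and the sample rows in the usage example are hypothetical.

```python
import csv
import io
from collections import Counter


def total_recommends(csv_text):
    """Sum the recommends column per user and sort from highest to lowest."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["user_username"]] += int(row["recommends"])
    # most_common() returns (username, total) pairs in descending order.
    return totals.most_common()


if __name__ == "__main__":
    # Made-up sample rows, only to show the call shape.
    sample = "user_username,recommends\nalice,10\nbob,5\nalice,7\n"
    for username, total in total_recommends(sample):
        print(f"{username} | {total:,}")
```

The `{total:,}` format adds the thousands separators used in the table.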