@dbuscombe-usgs
Created July 26, 2013 02:20
Craigslist part 1: use MATLAB to search the website and retrieve data
s = urlread('http://flagstaff.craigslist.org/search/apa?query=&srchType=T&minAsk=500&maxAsk=1250&bedrooms=2&addThree=wooof&hasPic=1');
% find the hyperlinks
ind=regexp(s,'href');
useind=[];
for k=1:length(ind)
% keep links with 'apa' in the next 100 characters (clamped to the end of s)
if ~isempty(regexp(s(ind(k):min(ind(k)+100,length(s))),'apa', 'once'))
useind=[useind;k];
end
end
use=[];
for k=1:length(useind)
% keep links with a '$' price in the next 200 characters (clamped to the end of s)
if ~isempty(regexp(s(ind(useind(k)):min(ind(useind(k))+200,length(s))),'\$', 'once'))
use=[use;ind(useind(k))];
end
end
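% read the run of digits after the first '$' in each window,
% so prices of any length are parsed (NaN if no digits follow)
prices=cell(1,length(use));
for k=1:length(use)
prices{k}=str2double(regexp(s(use(k):min(use(k)+200,length(s))),...
'(?<=\$)\d+','match','once'));
end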
prices=cell2mat(prices);
% take the first link in each window, from 'http' through '.html'
urls=cell(1,length(use));
for k=1:length(use)
urls{k}=regexp(s(use(k):min(use(k)+200,length(s))),...
'http.*?\.html','match','once');
end
urls=char(urls');
[a,b]=sort(prices);
urls_sortprice=urls(b,:);
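% flag listings whose page mentions 'community ' (case-insensitive)
filter_out_community=[];
for k=1:size(urls_sortprice,1)
% char() pads shorter urls with trailing blanks, so trim before fetching
tmp=urlread(strtrim(urls_sortprice(k,:)));
if ~isempty(regexpi(tmp,'community ','once'))
filter_out_community=[filter_out_community;k];
end
end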
urls_sortprice_no_community=urls_sortprice;
urls_sortprice_no_community(filter_out_community,:)=[];
prices_no_community=a;
prices_no_community(filter_out_community)=[];
maps=cell(1,size(urls_sortprice_no_community,1));
for k=1:size(urls_sortprice_no_community,1)
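tmp=urlread(strtrim(urls_sortprice_no_community(k,:)));
% google maps link, from its first 'http' up to the closing quote (empty if absent)
maps{k}=tmp(regexp(tmp,'http://maps\.google','once'):regexp(tmp,'">google map','once')-1);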
end
% write one row per listing: url, map link, price
fid=fopen('myFile.csv','wt');
for i=1:length(maps)
fprintf(fid,'%s, %s, %s\n',strtrim(urls_sortprice_no_community(i,:)),maps{i},...
num2str(prices_no_community(i)));
end
fclose(fid);
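
The URL and price extraction above could also be done in a single pass with `regexp` tokens. The snippet below is only a sketch of that idea: the `html` string is made-up sample markup (the `example.org` addresses are placeholders, and the real Craigslist page layout may well differ), so it runs without any network access.

```matlab
% Hypothetical sample markup standing in for the downloaded page.
html = ['<p><a href="http://example.org/apa/111.html">2br house</a> $1250</p>', ...
        '<p><a href="http://example.org/apa/222.html">2br apt</a> $850</p>'];

% Capture each listing url and the first price that follows it.
% [^$]*? skips lazily over everything up to the next dollar sign.
tok = regexp(html, 'href="(http[^"]*?\.html)"[^$]*?\$(\d+)', 'tokens');

% Unpack the tokens: urls stay in a cell array (no blank padding),
% prices become a numeric row vector.
urls   = cellfun(@(t) t{1}, tok, 'UniformOutput', false);
prices = cellfun(@(t) str2double(t{2}), tok);

% Sort by ascending price, keeping urls aligned with prices.
[prices, order] = sort(prices);
urls = urls(order);
```

Keeping the urls in a cell array avoids the trailing blanks that `char()` adds when it pads shorter strings, so each entry can be passed on without trimming.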