Skip to content

Instantly share code, notes, and snippets.

Sampling to calculate buckets

I ws thinking on using the new TABLESAMPLE feature included in postgres 9.5 in order to calculate clusters for visualization. Obvioulsy doing sampling adds some error to the final clusters but in the other hand provides (theorically) some time improvemens (avoid full scans with SYSTEM method and reduces CPU usage).

I only tested with this table, it should be tested with different data distributions and datasizes but looks promising for some clustering methods.

timing

jenks

@javisantana
javisantana / gmaps.js
Created April 25, 2016 06:50
first version of google maps source code (from archive.org)
/*
FILE ARCHIVED ON 3:09:27 feb 10, 2005 AND RETRIEVED FROM THE
INTERNET ARCHIVE ON 19:57:07 abr 24, 2016.
JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
SECTION 108(a)(3)).
*/
// Copyright 2005 Google
function ci(vg, Pf, yh) {
@javisantana
javisantana / cartodb_postgres96.md
Last active December 29, 2015 09:48
cartodb on postgres 9.6devel

Compile and install 9.6 devel

In OSX you can do with:

./configure --with-python --with-uuid=e2fs --with-libxml
make && sudo make install
<!DOCTYPE html>
<html>
<head>
<title>Leaflet multilayer example | CartoDB.js</title>
<meta name="viewport" content="initial-scale=1.0, user-scalable=no" />
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<link rel="shortcut icon" href="http://cartodb.com/assets/favicon.ico" />
<style>
html, body, #map {
height: 100%;
@javisantana
javisantana / speed.sql
Created August 12, 2015 11:05
speed from a track table
WITH deltas as (
SELECT
st_distance(the_geom::geography, lag(the_geom::geography, 1) over(order by timestamp)) as ddist,
timestamp - lag(timestamp, 1) over(order by timestamp) as dt
from out_2 offset 1000
)
select avg(ddist/dt) as speed from deltas
CREATE OR REPLACE FUNCTION tile (z integer, x integer, y integer, query text) RETURNS TABLE(id int8, geom geometry)
AS $$
DECLARE
sql TEXT;
BEGIN
sql := 'with _conf as (
select
CDB_XYZ_resolution(' || z || ') as res,
1.0/CDB_XYZ_resolution(' || z || ') as invres,
st_xmin(CDB_XYZ_Extent(' || x || ',' || y || ',' || z ||')) as tile_x,
test_postgis=# select count(*) from sigpac_47_valladolid_parfe;
count
--------
342691
test_postgis=# select sum(st_memsize(the_geom_webmercator)) from sigpac_47_valladolid_parfe;
sum
-----------
254827296
(1 row)
select _postgis_stats('tm_world_borders_s_11'::regclass ,'the_geom');
-[ RECORD 1 ]--+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
_postgis_stats | {"ndims":2,"size":[6,7],"extent":{"min":[-181.8,-90.8679],"max":[181.8,84.4381]},"table_features":246,"sample_features":246,"not_null_features":246,"histogram_features":246,"histogram_cells":42,"cells_covered":246}
import requests
import json
import random
import urllib
class MapInstance(object):
def __init__(self, definition, url, headers=None):
self.url = url
self.definition = definition
vizjson({
"version": "0.1.0",
"title": "fake",
"scrollwheel": true,
"zoom": 3,
"bounds": [
[
-44.46515101351962,
-133.06640625
],