javi santana javisantana

Sampling to calculate buckets

I ws thinking on using the new TABLESAMPLE feature included in postgres 9.5 in order to calculate clusters for visualization. Obvioulsy doing sampling adds some error to the final clusters but in the other hand provides (theorically) some time improvemens (avoid full scans with SYSTEM method and reduces CPU usage).

I only tested with this table, it should be tested with different data distributions and datasizes but looks promising for some clustering methods.

timing

jenks

Compile and install 9.6 devel

In OSX you can do with:

./configure --with-python --with-uuid=e2fs --with-libxml
make && sudo make install

	/*
	FILE ARCHIVED ON 3:09:27 feb 10, 2005 AND RETRIEVED FROM THE
	INTERNET ARCHIVE ON 19:57:07 abr 24, 2016.
	JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.

	ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
	SECTION 108(a)(3)).
	*/
	// Copyright 2005 Google
	function ci(vg, Pf, yh) {

	<!DOCTYPE html>
	<html>
	<head>
	<title>Leaflet multilayer example \| CartoDB.js</title>
	<meta name="viewport" content="initial-scale=1.0, user-scalable=no" />
	<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
	<link rel="shortcut icon" href="http://cartodb.com/assets/favicon.ico" />
	<style>
	html, body, #map {
	height: 100%;

	WITH deltas as (
	SELECT
	st_distance(the_geom::geography, lag(the_geom::geography, 1) over(order by timestamp)) as ddist,
	timestamp - lag(timestamp, 1) over(order by timestamp) as dt
	from out_2 offset 1000
	)

	select avg(ddist/dt) as speed from deltas

	CREATE OR REPLACE FUNCTION tile (z integer, x integer, y integer, query text) RETURNS TABLE(id int8, geom geometry)
	AS $$
	DECLARE
	sql TEXT;
	BEGIN
	sql := 'with _conf as (
	select
	CDB_XYZ_resolution(' \|\| z \|\| ') as res,
	1.0/CDB_XYZ_resolution(' \|\| z \|\| ') as invres,
	st_xmin(CDB_XYZ_Extent(' \|\| x \|\| ',' \|\| y \|\| ',' \|\| z \|\|')) as tile_x,

	test_postgis=# select count(*) from sigpac_47_valladolid_parfe;
	count
	--------
	342691

	test_postgis=# select sum(st_memsize(the_geom_webmercator)) from sigpac_47_valladolid_parfe;
	sum
	-----------
	254827296
	(1 row)

	select _postgis_stats('tm_world_borders_s_11'::regclass ,'the_geom');
	-[ RECORD 1 ]--+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
	_postgis_stats \| {"ndims":2,"size":[6,7],"extent":{"min":[-181.8,-90.8679],"max":[181.8,84.4381]},"table_features":246,"sample_features":246,"not_null_features":246,"histogram_features":246,"histogram_cells":42,"cells_covered":246}

	import requests
	import json
	import random
	import urllib

	class MapInstance(object):

	def __init__(self, definition, url, headers=None):
	self.url = url
	self.definition = definition

	vizjson({
	"version": "0.1.0",
	"title": "fake",
	"scrollwheel": true,
	"zoom": 3,
	"bounds": [
	[
	-44.46515101351962,
	-133.06640625
	],