@monken
Created March 11, 2011 18:27
curl -XDELETE http://localhost:9200/test/;
curl -XPUT http://localhost:9200/test/;
curl -XPUT http://localhost:9200/test/foo/_mapping -d '
{
"foo" : {
"properties" : {
"message" : {"type" : "string", "store" : "yes"}
}
}
}
';
curl -XPUT 'http://localhost:9200/test/foo/1?refresh=true' -d '
{"message":"NAME\n ODF::lpOD::Table - Table management\n\nDESCRIPTION\n The present manual page introduces the way lpOD allows the user to handle\n ODF *tables* and their components, namely the *columns*, *rows* and\n *cells*.\n\n The lpOD API doesn''t make differences between document types in this area.\n So, tables are dealed with in the same way for a spreadsheet document\n (whose content is just a set of tables) as for any other document.\n\n A table is an instance of the lpOD \"odf_table\" class.\n\n An \"odf_table\" object is a structured container that holds two sets of\n objects, a set of *rows* and a set of *columns*, and that is optionally\n associated with a *table style*.\n\n The basic information unit in a table is the *cell*. Every cell is\n contained in a row. Table columns don''t contain cells; an ODF column holds\n information related to the layout of a particular column at the display\n time, not content data.\n\n A cell can directly contain one or more paragraphs. However, a cell may be\n used as a container for high level containers, including lists, tables,\n sections and frames.\n\n Every table is identified by a name (which must be unique for the\n document) and may own some optional properties.\n\nTable creation and retrieval\n Like any other \"odf_element\" table may be created either from scratch\n according to various parameters or by cloning an existing table using the\n generic \"clone\" method of \"odf_element\". 
The second way is the most\n recommended one because, while it looks very easy to create a table with a\n default appearance, a typical convenient layout may require a lot of style\n definitions and is much more difficult to specify by program than through\n a point-and-click interface.\n\n A table is created using \"odf_create_table\" with a mandatory name as its\n first argument and the following optional parameters:\n\n \"width\", \"length\": the initial size of the new table (rows and\n columns), knowing that it''s zero-sized by default (beware: because\n cells are contained in rows, no cell is created as long as \"width\" is\n less than 1);\n\n\n\n \"size\": specifies a length and a width (in this order) as a single\n string (the two values are comma-separated); may replace \"length\" and\n \"width\";\n\n\n\n \"style\": the name of a table style, already existing or to be defined;\n\n\n\n \"cell style\": the style to use by default for every cell in the table;\n\n\n\n \"protected\": a boolean that, if \"TRUE\", means that the table should be\n write-protected when the document is edited through a user-oriented,\n interactive application (of course, such a protection doesn''t prevent\n an lpOD-based tool from modifying the table)(default is \"FALSE\");\n\n\n\n \"protection key\": a (supposedly encrypted) string that represents a\n password; if this parameter is set and if \"protected\" is \"TRUE\", a\n end-user interactive application should ask for a password that\n matches this string before removing the write-protection (beware, such\n a protection is *not* a security feature);\n\n\n\n \"print\": boolean, tells that the table should be printable; default is\n \"TRUE\";\n\n\n\n \"print ranges\": the cell ranges to be printed, if some areas are not\n to be printed; the value of this parameter is a space-separated list\n of cell ranges expressed in spreadsheet-style format (ex: \"E6:K12\").\n\n Once created, a table may be incorporated somewhere using 
\"insert_element\"\n of \"append_element\", like any other \"odf_element\".\n\n *Caution: a table should not be inserted in any context. For example, a\n table should not be inserted within a paragraph. A bad placement may\n corrupt the document structure. Right contexts are, for example, the\n document body (in a spreadsheet or text document), a section (in a text\n document) or a table cell (knowing that the ODF standard allows nested\n tables).*\n\n The style of a table may be retrieved or changed at any time using the\n generic \"get_style()\" and \"set_style()\" accessors.\n\n A table may be retrieved in a document according to its unique name using\n the context-based \"get_table_by_name\" with the name as argument. It may be\n selected by its sequential position in the list of the tables belonging to\n the context, using \"get_table_by_position\", with a zero-based numeric\n argument (possibly counted back from the end if the argument is negative).\n A \"get_table()\" method is provided, that works like\n \"get_table_by_position()\" if the argument is numeric or like\n \"get_table_by_name()\" otherwise (of course, if the name of the desired\n table looks like a number, there is no choice but \"get_table_by_name()\" to\n retrieve it by name). Without argument, \"get_table()\" returns the first\n table in the context (if any). In addition, it''s possible to retrieve a\n table according to its content, through \"get_table_by_content\"; this\n method returns the first table (in the order of the document) whose text\n content matches the given argument, which is regarded as a regular\n expression.\n\n In addition, an application can get all the tables of a given context\n using the \"get_tables()\" method, without argument.\n\n An application may retrieve a table from any element that belong to it,\n thanks to the \"get_parent_table()\" method. This method returns \"undef\" if\n the calling element is not in a table. 
Knowing that a paragraph may be\n included in a table cell and that a table cell indirectly belongs to a\n table, the following sequence selects a paragraph matching a given\n expression and, if the paragraph belongs to a table, displays the table\n name:\n\n $p = $context->get_paragraph(content => \"xyz\");\n die \"Content not found\\n\" unless $p;\n $t = $p->get_parent_table;\n say $t ? $t->get_name : \"Not in a table\";\n\nTable content retrieval\n A table object provides methods that allow to retrieve any column, row or\n cell using its logical position. A position may be expressed using either\n zero-based numeric coordinates, or alphanumeric, spreadsheet-like\n coordinates. For example the top left cell should be addressed either by\n \"(0,0)\" or by \"A1\". On the other hand, numeric coordinates only allow the\n user to address an object relatively to the end of the table; for example,\n \"(-1,-1)\" designates the last cell of the last row whatever the table\n size.\n\n Table object selection methods return a null value, without error, when\n the given address is out of range.\n\n The number of rows and columns may be got using the \"odf_table\" \"get_size\"\n method.\n\n An individual cell is selected using \"get_cell\" with either a pair of\n numeric arguments corresponding to the row then the column, or an\n alphanumeric argument whose first character is a letter. The second\n argument, if provided, is ignored as soon as the first one begins with a\n letter.\n\n The two following instructions are equivalent and return the second cell\n of the second row in a table (assuming that $t is a previously selected\n table):\n\n $cell = $t->get_cell(''B2'');\n $cell = $t->get_cell(1, 1);\n\n \"get_row()\" allows the user to select a table row as an ODF element. 
This\n method requires a zero-based numeric value.\n\n \"get_column()\" works according to the same logic and returns a table\n column ODF element.\n\n The full set of row and column objects may be selected using the\n table-based \"get_rows()\" and \"get_columns()\" methods. By default these\n methods return respectively the full list of rows or columns. They can be\n restricted to a specified range of rows or columns. The restriction may be\n expressed through two numeric, zero-based arguments indicating the\n positions of the first and the last item of the range. Alternatively, the\n range may be specified using a more \"spreadsheet-like\" syntax, in only one\n alphanumeric argument representing the visible representation of the range\n through a GUI; this argument is the concatenation of the visible numbers\n of the starting and ending elements, separated by a \":\", knowing that \"1\"\n is the visible number of the row zero while \"A\" is the visible number or\n the column zero. As a consequence, the two following instructions are\n equivalent and return a list including the rows from 5 to 10 belonging to\n the table *t*:\n\n @rows = $t->get_rows(5, 10);\n @rows = $t->get_rows(''6:11'');\n\n According to the same logic, each of the two instruction below returns the\n columns from 8 to 15:\n\n @cols = $t->get_columns(8, 15);\n @cols = $t->get_columns(''I:P'');\n\n Once selected, knowing that cells are contained in rows, a row-based\n \"get_cell()\" method is provided. When called from a row object,\n \"get_cell()\" requires the same parameter as the table-based \"get_column()\"\n method. For example, the following sequence returns the same cell as in\n the previous example:\n\n $r = $t->get_row(1);\n $c = $r->get_cell(1);\n\n A column-based \"get_cell()\" method is provided, too, but it''s much less\n efficient. 
In addition, the column-based \"get_cell()\" may fail with a\n warning when used in *read optimize* mode (see below).\n\n A row set may be selected according to the content of a specified column,\n thanks to \"get_rows_by_index()\". The following example selects all the\n rows (if any) where the ''C'' cell (i.e. the cell at the 3rd position)\n contains \"XYZ\":\n\n @rows = $table->get_rows_by_index(C => \"XYZ\");\n\n Note that this method allows an alternative syntax; the column may be\n specified by its numeric (zero-based) position:\n\n @rows = $table->get_rows_by_index(2, \"XYZ\");\n\n The first argument (or the key in hash notation) specifies the \"index\"\n (i.e. the column that must match a given condition) while the second\n argument is the search value. The result set is selected according to a\n smart match.\n\n Alternatively, \"get_row_by_index()\" returns the first matching row, like\n \"get_rows_by_index()\" in scalar context.\n\n Remember that there is no real index in a spreadsheet table; this method\n mimics the use of an arbitrary column as the \"key\" to select a data set,\n but the underlying mechanism is not a database engine; the rows are\n scanned sequentially, so take care of possible performance issues with\n large tables.\n\nCell range selection\n \"get_cells\" extracts rectangular ranges of cells in order to allow the\n applications to store and process them out of the document tree, through\n regular 2D tables. The range selection is defined by the coordinates of\n the top left and the bottom right cells of the target area. \"get_cells\"\n allows two possible syntaxes, i.e. the spreadsheet-like one and the\n numeric one. The first one requires an alphanumeric argument whose first\n character is a letter and which includes a '':'', while the second one\n requires four numeric arguments. 
As an example, the two following\n instructions, which are equivalent, return a bi-dimensional array\n corresponding to the cells of the \"B2:D15\" area of a table:\n\n @cells = $table->get_cells(\"B2:D15\");\n @cells = $table->get_cells(1,1,14,3);\n\n Note that, after such a selection, $cells[0][0] contains the \"B2\" cell of\n the ODF table.\n\n If \"get_cells\" is called without argument, the selection covers the whole\n table.\n\n A row object has its own \"get_cell()\" method. The row based version of\n \"get_cells()\" returns, of course, a one-row table of cell objects. When\n used without argument, it selects all the cells of the row. It may be\n called with either a pair of numeric arguments that represent the start\n and the end positions of the cell range, or an alphanumeric argument\n (whose the numeric content is ignored and should be omitted) corresponding\n to the start and end columns in conventional spreadsheet notation. The\n following example shows two ways to select the same cell range (beginning\n at the 2nd position and ending at the 26th one) in a previously selected\n row:\n\n @cells = $r->get_cells(''B:Z'');\n @cells = $r->get_cells(1, 25);\n\n The elements of the Perl table returned by \"get_cells\" are references to\n the cells of the ODF table (not copies); the Perl table just maps an ODF\n table area, and any cell property change made through this Perl table\n affects the underlying ODF cell.\n\n A column-based version of \"get_cells()\" is available, too, but it should\n be avoided with large tables, and it may explicitly fail in \"read\n optimize\" mode.\n\nRow and column customization\n The objects returned by \"get_row\" and \"get_column\" can be customized using\n the standard \"set_attribute\" or \"set_attributes\" method. 
Possible\n attributes are:\n\n * \"default cell style name\": the default style which apply to each cell\n in the column or row unless this cell has no defined style attribute;\n\n * \"visibility\": specifies the visibility of the row or column; legal\n values are ''visible'', ''collapse'' and ''filter''.\n\n The style may be get or set using \"get_style\" or \"set_style\".\n\nTable expansion and shrinking\n Row and column insertion\n A table may be expanded vertically and horizontally, using its \"add_row\"\n and \"add_column\" methods.\n\n \"add_row\" allows the user to insert one or more rows at a given position\n in the table. The new rows are copies of an existing one. Without\n argument, a single row is just appended as the end. A \"number\" named\n parameter specifies the number of rows to insert.\n\n An optional \"before\" named parameter may be provided; if defined, the\n value of this parameter must be a row number (in numeric, zero-based form)\n in the range of the table; the new rows are created as clones of the row\n existing at the given position then inserted at this position, i.e.\n *before* the original reference row. A \"after\" parameter may be provided\n instead of \"before\"; it produces a similar result, but the new rows are\n inserted *after* the reference row. Note that the two following\n instructions produce the same result (assuming $t is a previously selected\n or created table):\n\n $t->add_row(number => 1, after => -1);\n $t->add_row();\n\n The instruction below creates new rows at the beginning of the table:\n\n $t->add_row(number => 4, before => 0);\n\n The inserted rows are initialized as clones of the row used as the\n reference through the \"after\" or \"before\" or of the last existing row if\n the new row in appended at the end. 
So the new rows (and their cells)\n inherit the same style and content as an existing one.\n\n However, a few options allow the applications to override this default\n behavior:\n\n * \"empty\", if set to \"TRUE\", specifies that the new cells will be\n created without content and without data type;\n\n * \"style\" allows to specify a particular style for the new row; if this\n parameter is provided but set to \"undef\", the new rows are created\n without style (i.e. they take neither the style of the cloned row nor\n any other style);\n\n * \"cell style\" allows to specify a particular style for every cell in\n the new rows; if this parameter is provided but set to \"undef\", the\n cells of the new rows are created without style.\n\n The \"add_column\" method does the same thing with columns as \"add_row\" for\n rows, and allows the same options. However, because the cells belong to\n rows, it works according to a very different logic. \"add_column\" inserts\n new column objects (clones of an existing column), then it goes through\n all the rows and inserts new cells (cloning the cell located at the\n reference position) in each one.\n\n Of course, it''s possible to use \"insert_element\" in order to insert a row,\n a column or a cell externally created (or copied from an other table from\n another document), provided that the user carefully checks the consistency\n of the resulting construct. As an example, the following sequence appends\n a copy of the first row of $t1 after the 5th row of $t2:\n\n $to_be_inserted = $t1->get_row(0)->clone;\n $t2->insert_element($to_be_inserted, after => $t2->get_row(5));\n\n While a table may be expanded vertically using \"add_row\", each row may be\n expanded using the \"odf_row\" \"add_cell\" method whose parameters and\n behavior are the same as the table-based \"add_row\" method.\n\n Row and column deletion\n Rows and columns may be individually deleted using \"delete_row()\" and\n \"delete_column()\", respectively. 
The required argument for these methods\n is the row or column position in the table, i.e. the same as \"get_row()\"\n or \"get_column()\".\n\n The common \"delete()\" method may be used from a previously selected row or\n column object. So, the two snippets below are equivalent:\n\n # with delete_row\n $table->delete_row($row_number);\n \n # without delete_row\n $row = $table->get_row($row_number);\n $row->delete;\n\n Knowing that table cells are contained in row, removing a row\n automatically removes the corresponding cells. The internal logic of\n \"delete_column()\", that removes the cells of the deleted column, behaves\n as if the cells were contained in the columns, too. However, it''s possible\n to delete a column without deleting the corresponding cells. To do so, a\n \"propagate\" option must be provided and set to \"FALSE\". Such option may\n put the table in an inconsistent state, so it should be used for very\n special purposes only (such as cleaning an inconsistent table).\n\n The \"delete()\" method should not be confused with the \"clear()\" method\n that, when called from a row or column object, removes the content of\n every cell in the row or column but doesn''t remove any cell, row or\n column.\n\nRow and column group handling\n The content expansion and content selection methods above work with the\n table body. However it''s possible to manage groups of rows or columns. A\n group may be created with existing adjacent rows or columns, using\n \"set_row_group()\" and \"set_column_group()\" respectively. These methods\n take two arguments, which are the numeric positions of the starting and\n ending elements of the group. 
However, these numeric arguments may be\n replaced by a single alphanumeric range definition argument, so the\n following instructions are equivalent; both create a group including the\n same 3 columns (\"C\" to \"E\"):\n\n $column_group = $table->set_column_group(3, 5);\n $column_group = $table->set_column_group(\"C:E\");\n\n The same idea apply to row groups; however, beware that in range\n alphanumeric notation, the numbers represents the spreadsheet end-user\n point of view, so they are one-based; as an example, the two following\n instructions, that create a row group including the rows 3 to 5, are\n equivalent:\n\n $row_group = $table->set_row_group(3, 5);\n $row_group = $table->set_row_group(\"4:6\");\n\n In addition, an optional \"display\" named boolean parameter may be provided\n (default=\"TRUE\"), instructing the applications about the visibility of the\n group.\n\n Both \"set_row_group()\" and \"set_column_group()\" return an object which can\n be used later as a context object for any row, column or cell retrieval or\n processing. An existing group may be retrieved according to its numeric\n position using \"get_row_group()\" or \"get_column_group()\" with the position\n as argument, or without argument to get the first (or the only one) group.\n\n A group can''t bring a particular style; it''s just visible or not. Once\n created, its visibility may be turned on and off by changing its \"display\"\n value through \"set_attribute()\".\n\n Knowing that cells depends on rows, a row group provides the same\n \"get_cell()\" method as a table. It provides a \"get_row()\" method, while a\n column group provides a \"get_column()\" one.\n\n A row group provides a \"add_row()\" method, while a column group provides a\n \"add_column()\" method. 
These methods work like their table-based versions,\n and they allow the user to expand the content of a particular group.\n\n Row and column group may be collapsed or expanded using their \"collapse()\"\n and \"uncollapse()\" methods.\n\n It''s possible to delete all the cell contents of a group using \"clear()\".\n This method doesn''t remove any row or column; it just erases the content\n and, if any, the style and the annotation of every cell. Beware that the\n column group based version of \"clear()\" is much slower than the row group\n based version.\n\nTable headers\n One or more rows or columns in the beginning of a table may be organized\n as a *header*. Row and columns headers are created using the\n \"set_row_header()\" and \"set_column_header()\" table-based methods, and\n retrieved using \"get_row_header()\" and \"get_column_header()\". A row header\n object brings its own \"add_row()\" method, which works like the table-based\n \"add_row()\" but appends the new rows in the space of the row header. The\n same logic applies to column headers which have a \"add_column()\" method.\n An optional positive integer argument may specify the number or rows or\n columns to include in the header (default=1).\n\n Note that a *column header* is a *row* or a set of *rows* containing\n column titles that should be automatically repeated on every page if the\n table does not fit on a single page, while a *row headers* is a *column*\n or a set of *columns* containing *row titles*. In the present version,\n *row headers* are not fully supported.\n\n A table can''t directly contain more than one row header and one column\n header. However, a column group can contain a column header, while a row\n group can contain a row header. 
So the header-focused methods above work\n with groups as well as with tables.\n\n A table header doesn''t bring particular properties; it''s just a construct\n allowing the author to designate rows and columns that should be\n automatically repeated on every page if the table doesn''t fit on a single\n page.\n\n The ``get_xxx()`` table-based retrieval methods ignore the content of the\n headers. However, it''s always possible to select a header, then to used it\n as the context object to select an object using its coordinates inside the\n header. For example, the first instruction below gets the first cell of a\n table body, while the third and third instructions select the first cell\n of a table header::\n\n c1 = table.get_cell(0,0)\n header = table.get_header()\n c2 = header.get_cell(0,0)\n\nIndividual cell property handling\n A cell owns both a *content* and some *properties* which may be processed\n separately.\n\n The cell content is a list of one or more ODF elements. While this content\n is generally made of a single paragraph, it may contain several paragraphs\n and various other objects. The user can attach any content element to a\n cell using the standard \"insert_element\" method. However, for the simplest\n (and the most usual) cases, it''s possible to use \"set_text\". The\n cell-based \"set_text\" method diffs from the generic \"odf_element\"\n \"set_text\": it removes the previous content elements, if any, then creates\n a single paragraph with the given text as the new content. In addition,\n this method accepts an optional \"style\" named parameter, allowing the user\n to set a paragraph style for the new content. To insert more content (i.e.\n additional paragraphs and/or other ODF elements), the needed objects have\n to be created externally and attached to the cell using \"insert_element\"\n or \"append_element\". 
Alternatively, it''s possible to remove the existing\n content (if any) and attach a full set of content elements in a single\n instruction using \"set_content\"; this last cell method takes a list of\n arbitrary ODF elements and appends them (in the given order) as the new\n content.\n\n The generic \"group()\" method may be used to grab a list of paragraphs in\n order to move them in the cell. As an example, the following instruction\n moves all the paragraphs containing a given substring in a given cell:\n\n $table->get_cell(\"B4\")->group(\n $doc->get_body->get_paragraphs(content => \"XYZ\")\n );\n\n The \"get_content\" cell method returns all the content elements as a list.\n For the simplest cases, the cell-based \"get_text\" method directly returns\n the text content as a flat string, without any structural information and\n whatever the number and the type of the content elements.\n\n The cell properties may be read or changes using \"get_xxx\" and \"set_xxx\"\n methods, where \"xxx\" stands for one of the following:\n\n * \"style\": the name of the cell style;\n\n * \"type\": the cell value type, which may be one of the ODF supported\n data types, used when the cell have to contain a computable value (may\n be omitted with text cells, knowing that the default type is\n ''string'');\n\n * \"value\": the numeric computable value of the cell, used when the\n \"type\" is defined (for a string cell, \"get_value\" and \"set_value\" are\n equivalents of \"get_text\" and \"set_text\");\n\n * \"currency\": the international standard currency unit identifier (ex:\n EUR, USD), used when the \"type\" is ''currency'';\n\n * \"formula\": a calculation formula whose result is a computable value\n (the grammar and syntax of the formula is application-specific and not\n checked by the lpOD API (it''s stored as flat text and not\n interpreted);\n\n * \"protect\": boolean (default \"FALSE\"), tells the applications that the\n cell can''t be edited.\n\n If \"set_currency\" is 
used with a non-null value, then the \"type\" of the\n cell is automatically set to ''currency''. If \"set_type\" forces a type that\n is not ''currency'', then the cell currency is unset.\n\n A cell may be annotated using \"set_annotation()\". The cell-based version\n of this method works like the paragraph-based version, described in\n ODF::lpOD::TextElement, but the positioning options are ignored. A cell\n annotation is not linked to a text position and may be attached to an\n empty cell. A \"display\" boolean option (whose default is \"FALSE\") may be\n provided in order to make the annotation automatically visible in the\n sheet.\n\n It''s possible to remove all the content and the properties of a cell but\n its style, including any possible formula, annotation, and so on, with the\n \"clear()\" method. In addition, \"clear()\" removes any multi-row or multi-\n column span.\n\n Note that it''s possible to clear the content of all the cells of a row, a\n column, a row group, a column group, or a table, with the respective\n \"clear()\" methods of these objects. These methods don''t remove the cells\n themselves. However, remember that the column and column group based\n versions of \"clear()\" are very slow.\n\n The cell coordinates may be retrieved using \"get_position()\". In scalar\n context, this method returns the local position in the row. In array\n context, it returns the table name, the row number and the column number.\n In addition, \"get_parent_table()\" returns the table object itself, while\n \"get_parent_row()\" returns the including row.\n\nSpecial cell value extractors\n A few access methods are available to directly get the value(s) of one ore\n more specified cells, without explicit access to the cell objects. 
These\n accessors are not syntactic sugar only; they may allow better performances\n in some situations.\n\n Individual cell value extraction\n An application may directly get the value of a specified cell without\n previous selection of the cell object. As an example, the two following\n instructions produce the same result:\n\n $value = $table->get_cell($row, $column)->get_value;\n $value = $table->get_cell_value($row, $column);\n\n Data set extraction or aggregate computation\n Alongside the \"get_cells()\" method, a \"get_cell_values()\" method allows\n the user to get either value lists or basic value aggregates. This method\n requires a regular cell data type as its first argument, followed by a\n cell range specification according to the same logic as \"get_cells()\". The\n cells whose data type is not the given type are ignored. As an example,\n the following example creates a value list whose content comes from all\n the \"currency\" cells of the \"E2:G10\" range:\n\n @values = $table->get_cell_values(''currency'', ''E2:G10'');\n\n The allowed types are \"string\", \"date\", \"time\", \"float\", \"currency\", and\n \"boolean\". However, a special \"all\" indicator may be used as first\n argument instead of a regular data type; if so, all the non-empty cells\n are selected.\n\n In the resulting 2D list, \"undef\" values occupy the places of non-matching\n or empty cells, in order to provide a consistent mapping of the\n corresponding table area.\n\n \"get_cell_values()\" may be used as a *row* or *column* method. The most\n efficient one is the row-based version. Both return a one-dimension list,\n without null value (the non-matching and empty cells are ignored). 
So the\n instruction below produces a list of all the \"currency\" amounts found\n between (and including) the 3rd and the 8th cells of the 4th row of a\n table:\n\n $row = $table->get_row(3);\n @amounts = $row->get_cell_values(''currency'', ''C:H'');\n\n \"get_cell_values()\", when used in scalar context, returns a small array\n ref whose item 0 is the number of non-empty cells matching the given type\n in the range, and whose the following items depend on the data type. The\n two following positions are the min and the max values for every type but\n \"boolean\"; for booleans, they respectively contain the number of true\n values and the number of false values. For the \"string\" type, the min and\n the max are selected by default according to the standard Perl \"cmp\"\n string comparison function (that is not always convenient for\n international character sets), but the user may provide a custom function\n (whose external behavior must comply with \"cmp\", i.e. whose possible\n results are -1, 0, 1). An additional item, containing the arithmetic sum,\n is provided at the last position for the \"float\", \"currency\" and\n \"percentage\" types only. As an example, the following code displays the\n count, the min, the max and the sum of the \"float\" cells in the \"E2:G10\"\n range:\n\n $r = $table->get_cell_values(''float'', ''E2:G10'');\n say \"I found $r->[0] values\";\n say \"...from $r->[1] to $r->[2]\";\n say \"...and the grand total is $r->[3]\";\n\n Flat text export\n A special \"get_text()\" method is provide with tables or row groups.\n Knowing that a table shouldn''t directly contain text (the text content, if\n any, belong to cells), this method returns the concatenated contents of\n all the cells as a flat string. It''s useful only to allow the applications\n to quickly check if at least one cell contains something, or if a\n particular substring is present somewhere in the table. 
Note that the\n returned text doesn''t always reflect the visible content of the cells: for\n non-string cells, the exported content is the *value*, not its formatted\n representation.\n\nCell span expansion\n A cell may be expanded in so it covers one or more adjacent columns and/or\n rows. The cell-based \"set_span()\" method allows the user to control this\n expansion. It takes \"rows\" and \"columns\" as parameters, specifying the\n number of rows and the number of columns covered. The following example\n selects the \"B4\" cell then expands it over 4 columns and 3 rows:\n\n $cell = $table->get_cell(''B4'');\n $cell->set_span(rows => 3, columns => 4);\n\n The existing span of a cell may be get using \"get_span()\", which returns\n the \"rows\" and \"columns\" values.\n\n This method changes the previous span of the cell. The default value for\n each parameter is 1, so a \"set_span()\" without argument reduces the cell\n at its minimal span.\n\n When a cell is covered due to the span of another cell, it remains present\n and holds its content and properties. However, it''s possible to know at\n any time if a given cell is covered or not through the boolean\n \"is_covered()\" cell method. In addition, the span values of a covered cell\n are automatically set to 1, and \"set_span()\" is forbidden with covered\n cells.\n\n Note that a cell that spreads over multiple rows and/or columns is reduced\n to the minimal size by \"clear()\".\n\nPerformance issues\n The table-oriented access methods perform relatively well against tables\n including up to thousands, if not tens of thousands of cells. So there is\n no performance issue with tables belonging to text documents. On the other\n hand, spreadsheet documents may contain tables whose size in potentially\n unlimited. 
As soon as you are faced with poor response times and overloaded\n CPUs, you may consider using the following workarounds, which can\n (sometimes) improve performance, possibly at the cost of a reduced\n functionality.\n\n Accessing cells from rows\n Remember that cells belong to rows and rows belong to tables. As a\n consequence, accessing a cell is faster from the row than from the table.\n So, each time you need to get several cells belonging to the same row, you\n should first get the row then use it as the context for subsequent cell\n accesses. As an illustration, each of the two following code snippets\n scans a whole table and loads the text of every cell in a list, but the\n second one is faster:\n\n # table scan, way 1\n my @text = ();\n my ($l, $w) = $table->get_size;\n for (my $i = 0 ; $i < $l ; $i++) {\n for (my $j = 0 ; $j < $w ; $j++) {\n push @text, $table->get_cell($i, $j)->get_text;\n }\n }\n\n # table scan, way 2\n my @text = ();\n my ($l, $w) = $table->get_size;\n for (my $i = 0 ; $i < $l ; $i++) {\n my $row = $table->get_row($i);\n for (my $j = 0 ; $j < $w ; $j++) {\n push @text, $row->get_cell($j)->get_text;\n }\n }\n\n At a higher level but for the same reasons, \"get_cell()\" and \"get_cells()\"\n are slower as column group methods than as table or row group methods. 
In\n other words, when a cell belongs to the intersection of a row group and a\n column group, it may be accessed faster from the table or the row group\n than from the column group.\n\n Selecting cell values instead of cells\n Each time an application needs to get cells in order to extract their\n values without update, the special \"get_cell_value()\" and\n \"get_cell_values()\" methods should be preferred.\n\n As an example, the two following instructions produce the same result but\n the second one is more efficient in a large table:\n\n $value = $table->get_cell($row, $column)->get_value;\n $value = $table->get_cell_value($row, $column);\n\n Similarly, the two following snippets produce the same result set but the\n second one is more efficient (and not only code-saving) than the first one\n in a large spreadsheet:\n\n # first form\n @values = ();\n push @values, scalar $_->get_value\n for $row->get_cells($start, $end);\n \n # second form\n @values = $row->get_cell_values($start, $end);\n\n Mapping ODF tables with Perl lists\n Thanks to \"get_cells()\", you can easily associate a Perl table to a\n selected area in a document table. As an example, the following\n instruction produces a 2D Perl list that maps the \"B4:Z50\" area in a given\n table:\n\n my @cells = $table->get_cells(\"B4:Z50\");\n\n While \"get_cells()\" is a costly method, it provides an array of\n pre-selected cells. Beware that \"get_cells()\" returns the cells\n themselves, not copies, so, after the instruction above, $cells[0][0] is\n the \"B4\" cell of the ODF table, while $cells[-1][-1] is the \"Z50\" cell,\n and so on. 
As a consequence, the 2 instructions below are functionally\n equivalent, but the second one is much faster because there is no need to\n look for the cell in the XML data structure:\n\n $text = $table->get_cell(\"C5\")->get_text;\n $text = $cells[1][1]->get_text;\n\n Using such a mapping doesn''t significantly improve the overall\n performance, but it allows the applications to execute the slow job once\n and for all, then provide a good interactivity. However, be careful about very\n large areas: using \"get_cells()\" to load hundreds of thousands of cells is\n too slow to be practical. In addition, the mapping is no longer accurate\n as soon as the structure of the underlying table is changed due to\n row/column insertions or deletions. For read-only access, have a look at\n the \"read optimized\" option (introduced below) that could help.\n\n Working area limitation\n The global size of a typical spreadsheet table is by far larger than the\n size of the really used part. As an example, your spreadsheet processor\n may silently store a 65536x1024 table while the last really used cell is,\n say, Z50, so the size of the useful part is 50x26. In such a situation,\n lpOD can''t automatically decide what is the useful size, so it processes\n the full size. The first result is a huge time and resource waste. As soon\n as you know the useful size of a table, you can instruct the \"odf_table\"\n instance to ignore the extra area, thanks to \"set_working_area()\". The\n instruction below tells that, for the current session, the table will be\n processed as if its size was 500x100:\n\n my $table = $doc->get_body->get_table(\"Sheet1\");\n $table->set_working_area(500, 100);\n\n Note that this operational restriction has no effect if the real size is\n smaller than the given size. 
On the other hand, \"set_working_area()\"\n doesn''t destroy the table content that resides out of the working area, if\n any; it just prevents you from accessing any object beyond your declared\n limits through the official table-oriented methods, namely \"get_row()\",\n \"get_cell()\", and \"get_column()\". However, the \"hidden\" area remains\n available for low-level hacking with basic element handling methods (for\n example, if you issue a \"get_paragraphs()\" from the table object, it will\n look for all the paragraphs belonging to all the real cells of the calling\n table).\n\n The working area restriction doesn''t produce any persistent effect when\n the document is saved.\n\n Note that the \"get_size()\" method itself is affected by\n \"set_working_area()\"; it returns the declared size, unless the real size\n is smaller.\n\n You can change the working area according to your current needs.\n Successive calls of \"set_working_area()\" are allowed, so the working area\n may be enlarged or reduced at will.\n\n The working area restriction may be removed using \"set_working_area()\"\n without argument.\n\n Read-optimization\n As soon as an object is selected using any official table component\n selector (such as \"get_cell()\", \"get_row()\", and so on), lpOD acts by\n default as if this object could be updated or deleted, and as if something\n (a row, a column or a cell) could be inserted before or after it. As a\n consequence, the internal data structure of the spreadsheet may be\n changed, resulting in useless processing in case of read-only access.\n However, lpOD allows the applications to use tables in \"read optimized\"\n mode, so it may avoid any update preparation, allowing better response\n times. 
To activate this mode, the user must set the \"read optimized\" flag\n to \"TRUE\" using \"read_optimize()\" like that:\n\n my $table = $doc->get_body->get_table(\"Sheet1\");\n $table->read_optimize(TRUE);\n\n Caution: \"read_optimize()\" means that you *assume* that you will not make\n updates; it doesn''t *prevent* you from updating cells, deleting rows, and\n so on. So, be careful: you can corrupt the table and get very strange and\n unpredictable results as soon as you make updates in read optimized mode.\n\n This optimization option is useful for large table area scans,\n particularly with very sparse tables (i.e. tables where significant cells\n are separated by large empty areas). On the other hand, it''s not\n efficient, and at worst may increase the response time, for individual\n access to a cell. In addition, it''s completely useless with small tables\n as well as with *dense* tables (i.e. tables without large empty areas and\n without large sequences of identical objects). So don''t use it without\n testing. In some cases, the read-optimize mode inhibits the column-based\n cell retrieval methods, while it may improve the response times of table-\n and row-based retrieval methods.\n\n Note that you can switch this mode off and restore the default behavior at\n any time. You just have to recall \"read_optimize()\" with \"FALSE\" as\n argument. Like \"set_working_area()\", \"read_optimize()\" doesn''t produce any\n persistent effect.\n\n However, there is a possible trap, illustrated by the next (wrong)\n example:\n\n $table->read_optimize(TRUE);\n $cell = $table->get_cell(\"Z26\");\n $table->read_optimize(FALSE);\n $cell->set_value(1234);\n\n In this sequence, we selected a cell while the table was in \"read\n optimized\" mode, then we canceled this mode and executed an update. The\n result is not predictable (it will be sometimes right, sometimes wrong).\n The general principle is: avoid updating an object selected in read\n optimized mode. 
However, there is an important exception: a cell that was\n selected in read-optimized mode may be safely updated if (and only if)\n it differs from the two neighbour cells and if it belongs to a row that\n differs (by at least one cell) from the two neighbour rows. These\n conditions are almost always met with tables in which one of the columns\n contains identifiers and each of the other columns displays data of\n various types and formats.\n\n On the other hand, the read optimize flag is useless with methods that\n return values and not objects (i.e. \"get_cell_value\", \"get_cell_values\").\n\n Compacting empty areas\n Cells, rows, row groups, tables, columns and column groups own a \"clear()\"\n method. When the calling context is a *table*, a *row*, or a *row group*,\n a \"compact\" boolean option, whose default is \"FALSE\", is allowed. If this\n option is set to \"TRUE\", the execution of \"clear()\" is faster and the\n physical storage of the processed cells is compacted.\n\n This option is recommended in spreadsheet documents only.\n\n Caution: the benefits of the \"compact\" option are not effective if the\n cleared area is immediately used as the target of a lot of individual cell\n accesses using \"get_cell()\", knowing that in such case lpOD will have to\n un-compact a lot of cells in the area. As a consequence, this option is\n not recommended when \"clear()\" is used to prepare a massive table update.\n\nAUTHOR/COPYRIGHT\n Developer/Maintainer: Jean-Marie Gouarne\n <http://jean.marie.gouarne.online.fr> Contact: [email protected]\n\n Copyright (c) 2010 Ars Aperta, Itaapy, Pierlis, Talend. Copyright (c) 2011\n Jean-Marie Gouarne.\n\n This work was sponsored by the Agence Nationale de la Recherche\n (<http://www.agence-nationale-recherche.fr>).\n\n License: GPL v3, Apache v2.0 (see LICENSE).\n\nPOD ERRORS\n Hey! 
The above document had some coding errors, which are explained below:\n\n Around line 46:\n You can''t have =items (as at line 52) unless the first thing after the\n =over is an =item\n\n"}';
curl -XPOST http://localhost:9200/test/foo/_search -d '{"query":{"query_string":{"query":"get_column","default_operator":"AND"}},"fields":["pod"],"highlight":{"fields":{"message":{}}}}
';