Friday, October 24, 2014

how to collect statistic on Netezza tables

nz_genstats  [-info]  [-full|-express|-basic]  <database>  [table ...]

Purpose:  Generate statistics on those tables that don't have up-to-date statistics.

          The optimizer uses statistics to guide its decisions on how best to
          execute a query.  The more reliable and up-to-date the statistics are,
          the more accurate the optimizer's decisions are likely to be.

          You can easily "GENERATE STATISTICS;" for an entire database;

          Or you could issue a  "GENERATE STATISTICS ON <table>;"
                          or a  "GENERATE EXPRESS STATISTICS ON <table>;".
          In this case, you must issue the sql statement on a table-by-table basis.

          But what if a given table already has up-to-date statistics?  Then
          regenerating statistics for it would be a waste of your time and the
          system's resources.

          This script first checks the state of the statistics for each table,
          and then only regenerates statistics on those tables that have outdated
          statistics.  (Statistics are either 100% up-to-date+accurate+reliable,
          or they aren't at 100%.  There is no other statistical measure that
          measures statistics).

          Note: If you wish to get/display the current statistic values for any
          given table you can use the script "nz_get"

Inputs:   The database name is required.

          The table name is optional.  If not specified then all tables in the
          database will be checked, and the statistics refreshed as appropriate.

          Or, you can specify one (or many) table names to be checked.

          If you have a file that contains a list of tablenames to be checked,
          it can be used via the following syntax:

                 nz_genstats  DBNAME  `cat /tmp/the_list_of_tables`


          -info     An optional switch

          If included, then statistics won't actually be generated on any table.
          Instead, this script will simply report information about what tables
          have up-to-date statistics and what tables need to have statistics
          generated against them.


          [-full|-express|-basic]     An optional switch

          You can control whether FULL, EXPRESS, or BASIC stats are generated for
          the tables.  (For an explanation of the difference between the three
          choices, see "nz_get -help" ).  If not specified, then this script will
          adhere to whatever type of stats each individual table currently has
          associated with it.

          Full statistics take the longest to generate -- but provide more accurate
          dispersion information (the # of unique values) for each column.

          Express statistics use a faster (but less accurate) mathematical formula
          to determine the dispersion.

          Basic statistics can be generated the fastest -- because they skip the
          dispersion calculation entirely.  (This is new as of NPS Version 4.6).

          Full statistics are recommended for smaller dimension tables ... which
          usually have fewer rows ... where the accuracy of the dispersion value
          tends to have more of an impact on query planning.

          For larger fact tables, express statistics or basic statistics are
          typically used because of the time savings in generating the statistics.

          [-min <nnn>]     Optional  switches     (default is -min 0)
          [-max <nnn>]                            (default is -max 99999999999999999)

          These switches allow you to control which tables are to have a GENSTATS
          run against them ... based on their table rowcount being between the -min
          and -max values specifed.  The default is for all tables to be included.
          Example: Specifying  '-max 1000000'  will result in only tables with
          <= 1M rows having a GENSTATS performed against them (and then only if
          necessary ... if the statistics aren't already up-to-date).

Outputs:  Sample output.  The first run included the "-info" switch.  No changes
          to the database were made.  The script simply lists what it WOULD have
          done if you let it.

          The second run actually invoked GENSTATS on three of the tables.  The
          elapsed runtime for the individual operations is included.

No comments:

Post a Comment