Friday, October 24, 2014

how to check statistic value of table in NPS

use nz_get to get the value of statistic status.

Usage:    nz_get  [ database [ table ]]

Purpose:  Get (and display) the statistics for a user table or a system table.

          Statistics are used by the NPS optimizer in order to plan the best
          way to run each SQL query (for queries that are run against user data,
          and also for queries that are run against the system catalogs).

          Some of these statistics are automatically maintained by the system
          (i.e., 'always-there' statistics).  Other values are dynamically
          calculated+saved when a "GENERATE [EXPRESS] STATISTICS" command is
          issued.

          This script will dump out all of the known statistics for a table.

Inputs:   The database and table names are optional.  If not specified, then
          all tables in all databases will be reported upon.

          In lieu of a table name, you may instead specify a synonym name or
          materialized view name -- in which case the statistics for the
          underlying table will be displayed.

          The table name may also be any one of the "SYSTEM TABLE" or
          "MANAGEMENT TABLE" names.

          o     For a list of these names, see 'nz_get_sysmgmt_table_names'

          o     By default, normal (non-ADMIN) users do not have access
                to any of these SYSTEM/MANAGEMENT tables.  They are only
                allowed to access them indirectly (thru SYSTEM/MANAGEMENT
                views).

          o     Some of these tables are global in nature, so it does not
                matter which database name you specify on the command line.
                The results will always be the same.  e.g. _VT_SPU_ZMAP_INFO

          o     Some of these tables are local in nature (database specific)
                so the database you specify is relevant.  e.g. _T_ATTRIBUTE

          o     If you wanted to look at the statistics for all of the
                SYSTEM/MANAGEMENT tables, you could issue a command such as this:

                for TABLE in `nz_get_sysmgmt_table_names system`; do nz_get system $TABLE; done

Outputs:  The output of this script will include the following information

            Database:     name, objid

               Table:     name, objid, distribution clause, row count(statistic)

          Per-Column:     attnum              the logical column number (from 1-1600)
                          Column Name
                          Statistics Status   the validity/freshness of the statistics
                          Minimum Value
                          Maximum Value
                          # of Unique Values  also known as the dispersion value
                          # of NULLs
                          MaxLen              The MaxLen+AvgLen are only collected for
                          AvgLen              columns of type VARCHAR, NCHAR, NVARCHAR


          Regarding the "Statistic Status" ... the following indicates that statistics
          have been explicitly generated against this table/column (via a GENSTATs
          command) and that they are 100% up-to-date.  The ONLY difference between the
          three is in how the "# of Unique Values" (the dispersion) is generated.
          The disperion value is the most complex statistic to gather, in terms of time +
          memory + cpu usage.

          Full        Similar to doing a "COUNT(DISTINCT columname)".  It generates the
                      most accurate dispersion value, but has the most overhead associated
                      with it.
          Express     It uses math to estimate the dispersion value (a hash is generated
                      for each column value, and then the number of unique hash keys is
                      added up).  Much faster, but less precise.
          Basic       The dispersion calculation is skipped entirely.  All of the other
                      "basic" statistics are still collected.

          So, statistics are either 100% correct and up-to-date, or they're not.  Any change
          to a table ... even if it involves just 1 row being inserted/updated/deleted  ...
          results in the statistics no longer being 100% up-to-date.  In other words, they
          would be outdated.  This doesn't mean that they would be bad or wrong.  Just outdated.

          When rows are nzload'ed/INSERT'ed into a table, NPS automatically compares
          each column value against the MIN/MAX statistic for the column, and updates the
          table's statistics accordingly.  This is also referred to as "always there
          statistics".  This applies to all columns/all datatypes -- EXCEPT for text
          columns (CHAR, NCHAR, VARCHAR, NVARCHAR).  So that means that the MIN/MAX values
          for the column are still OK -- they are still up-to-date (at the same time, the
          table's rowcount statistic is also updated).  But the other statistics values
          (the dispersion and # of nulls) are not up-to-date as they can only be recomputed
          via an explicit GENSTATS.  In this case, the "Statistics Status" will show

          Full    Min/Max OK
          Express Min/Max OK
          Basic   Min/Max OK

          As mentioned, this doesn't apply to text columns.  So all of the statistics that
          are maintained for text columns would be outdated ... and can only be refreshed
          via an explicit GENSTATS.  In this case, the "Statistics Status" will show

          Full    Outdated
          Express Outdated
          Basic   Outdated

          If you never ever bothered to do a GENSTATS on a particular table, then the only
          per-column statistics that would be available are the "always there statistics"
          that NPS automatically collects.  So this means you will have MIN/MAX statistics
          (and only MIN/MAX statistics) for those columns.  Which will be displayed as

          Min/Max Only

          As mentioned, "always there statistics" aren't collected for text columns.  So
          if you've never done an explicit GENSTATS ... and if NPS has never automatically
          collected any statistics ... then there will be no statistics at all available
          to the optimizer.  In this case the "Statistics Status" will show

          Unavailable

          Other situations in which "Unavailable" will be used
          o   if the table is empty
          o   if you've loaded a compressed external file into an empty table

          We don't attempt to collect statistics on very wide text columns, which are
               CHAR/VARCHAR   columns with a defined length >= 24565
               NCHAR/NVARCHAR columns with a defined length >=  6142
          The "Statistics Status" for those columns will be displayed as

          not maintained


          Regarding "always there statistics" ...

          when rows are nzload'ed/INSERT'ed into a table we can easily compare the column
          values against the MIN+MAX statistics, and update them accordingly (if needed)

          when rows are DELETE'd from a table, even though you might be deleting a row
          that matches a particular MIN value or a particular MAX value, there is no
          way to answer the following
               a)  is this the only row in the table with that particular value ?
               b)  and if so, what is the next MIN value or the next MAX value ?
          The only way to determine those answers is via a GENSTATS (which processes
          every row in the table).  Thus, a DELETE operation never shrinks the
          MIN/MAX values.  But we still know that we can use+trust those statistics ...
          that there is no value in the table column that is outside the current
          MIN/MAX range.

          when rows are UPDATE'd, we actually do an INSERT+DELETE.  See above.


          Regarding GENERATE [EXPRESS] STATISTICS ...

          as of 4.6, there is really only one GENERATE STATISTICS command now.  Both
          GENERATE STATISTICS and GENERATE EXPRESS STATISTICS do the same thing.
          GENERATE EXPRESS STATISTICS will someday be retired.

          the only difference between the two has been in how they calculate the
          "# of Unique Values" ... the dispersion value ... for a column.  That
          still occurs, but now NPS decides for itself which method to use.

          If       the table's rowcount is <= 10 * SPU_COUNT
          Then     FULL stats will be collected
                   where we actually try to do a COUNT(DISTINCT on_each_column)
                   # Of Columns Processed Per Scan:  set STATS_COL_GRP_LIMIT = 10;

          If       the table's rowcount is <= JIT_DISP_MIN_ROWS (the default is 500,000,000)
          Then     EXPRESS stats will be collected
                   where we hash the column values to approximate the dispersion value
                   # Of Columns Processed Per Scan:  set SAMPLED_STATS_COL_GRP_LIMIT = 50;

          Else     we skip the dispersion calculation alltogether (all other statistics
                   for the column will still be gathered)
                   # Of Columns Processed Per Scan:  currently unbounded

No comments:

Post a Comment