Discussion:
BUG #7853: Incorrect statistics in table with many dead rows.
j***@seagate.com
2013-02-05 18:41:06 UTC
The following bug has been logged on the website:

Bug reference: 7853
Logged by: James Skaggs
Email address: ***@seagate.com
PostgreSQL version: 8.4.14
Operating system: RHEL6
Description:

After "analyze verbose", the table shows 158 million rows. A select count(1)
yields 13.8 million rows.
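
Roughly, the comparison was the stored estimate versus an actual count,
along these lines:

ANALYZE VERBOSE stream_file;
SELECT count(1) FROM stream_file;               -- about 13.8 million
SELECT reltuples::bigint AS estimated_rows
  FROM pg_class WHERE relname = 'stream_file';  -- shows about 158 million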

INFO: analyzing "public.stream_file"
INFO: "stream_file": scanned 30000 of 2123642 pages, containing 184517 live
rows and 2115512 dead rows; 30000 rows in sample, 158702435 estimated total
rows

Here are the table statistics.

Sequential Scans 81853
Sequential Tuples Read 578848425234
Index Scans 1976513672
Index Tuples Fetched 2183339860
Tuples Inserted 65122575
Tuples Updated 308883671
Tuples Deleted 51238760
Tuples HOT Updated 2242897
Live Tuples 163981972
Dead Tuples 7056493
Heap Blocks Read 43483331819
Heap Blocks Hit 43121456487
Index Blocks Read 134539277
Index Blocks Hit 13606451182
Toast Blocks Read
Toast Blocks Hit
Toast Index Blocks Read
Toast Index Blocks Hit
Last Vacuum 2013-02-04 10:06:44.058743-07
Last Autovacuum 2013-02-04 16:11:34.289823-07
Last Analyze 2013-02-04 14:22:27.848547-07
Last Autoanalyze 2013-02-01 17:37:29.855553-07
Table Size 17 GB
Toast Table Size none
Indexes Size 34 GB
Query returned successfully with no result in 4094 ms.
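
(These counters look like the pgAdmin statistics panel; the same table-level
numbers can be read with something like the query below. The block I/O
counters live in pg_statio_user_tables, and the sizes come from
pg_relation_size.)

SELECT seq_scan, seq_tup_read, idx_scan, idx_tup_fetch,
       n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd,
       n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
 WHERE relname = 'stream_file';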


Bad statistics led to a bad plan. We will cluster the table today to see if
that fixes it, but I think statistics should be correct regardless of the
state of a table. BTW, the Coverity product requires 8.x, and we'll upgrade to
8.4.15 today. I didn't see anything about better statistics in the 8.4.15
changelog.
Jeff Janes
2013-02-10 20:10:57 UTC
Post by j***@seagate.com
INFO: analyzing "public.stream_file"
INFO: "stream_file": scanned 30000 of 2123642 pages, containing
184517 live rows and 2115512 dead rows; 30000 rows in sample,
158702435 estimated total rows
184517 live rows in 30000 randomly sampled pages out of 2123642
total pages, means that the statistics predict that a select
count(*) will find about 13 million live rows to count.
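
For reference, that back-of-the-envelope figure is easy to check (a quick
sanity query, not anything ANALYZE itself runs):

-- (184517 live rows / 30000 sampled pages) * 2123642 total pages
SELECT round(184517::numeric / 30000 * 2123642) AS implied_live_rows;
-- about 13.06 million, in the same ballpark as the count(1) result
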
Post by j***@seagate.com
After "analyze verbose", the table shows 158 million rows. A
select count(1) yields 13.8 million rows.
OK, the estimate was 13 million and there were actually 13.8
million, but it is a random sample used to generate estimates.
That seems worse than average, but close enough to be useful.
The 158.7 million total rows includes dead rows, which must be
visited to determine visibility, but will not be counted because
they are not visible to the counting transaction.
To clarify here, the 158.7 million estimate does not *intentionally*
include dead rows. As you say, the ANALYZE did get a very good
instantaneous estimate of the number of live rows. However, ANALYZE
doesn't over-write the old estimate, it averages its estimate into the
old one. After the table shape changes dramatically, the ANALYZE
needs to be run repeatedly before the estimate will converge to the
new reality. (Of course a cluster or vacuum full will blow away the
old statistics, so the next analyze after that will solely determine
the new statistics.)
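
To illustrate (just a sketch; the exact 8.4 code path may differ in detail),
the incremental update behaves like a density-weighted moving average, which
is why a single ANALYZE that samples only about 1.4% of the pages barely
moves the stored estimate:

-- assumed prior density of ~75.7 rows/page, chosen purely for illustration;
-- the sample says ~6.15 live rows/page, but only 30000 of 2123642 pages
-- were sampled, so the estimate moves only ~1.4% of the way toward that
SELECT (old_density
        + (184517.0 / 30000 - old_density) * (30000.0 / 2123642))
       * 2123642 AS updated_reltuples
  FROM (SELECT 75.7 AS old_density) prev;
-- comes out near 158.7 million, matching the "estimated total rows" above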

I agree, not a bug.

Cheers,

Jeff
James R Skaggs
2013-02-22 23:41:55 UTC
Okay, I have some more info.

Some background info: this one table gets so many changes that I CLUSTER it
each night. However, after I do this, the statistics still appear to be
incorrect, even after I do a "select pg_stat_reset();" followed by three
ANALYZE runs with default_statistics_target set to 1, 10, and 100.

select relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_del,
n_tup_hot_upd
from pg_stat_all_tables
('stream_file', 109061143L, 103913868L, 0L, 14201L, 0L)

Is it possible that there are still dead tuples after a CLUSTER?

Explain Analyze thinks we have 112M rows (in fact there are 10M), so it
needs to do a disk sort, even with work_mem=1GB.

"Aggregate (cost=23622814.39..23622814.40 rows=1 width=12) (actual
time=57512.462..57512.462 rows=1 loops=1)"
" -> GroupAggregate (cost=21536989.70..23425506.64 rows=11274728
width=53) (actual time=50825.396..57457.421 rows=202394 loops=1)"
" Output: max(sf.current_code_line_count),
max(sf.current_comment_line_count), max(sf.current_blank_line_count)"
" -> Sort (cost=21536989.70..21818857.90 rows=112747280 width=53)
(actual time=50825.386..55004.916 rows=7683730 loops=1)"
" Output: sf.current_code_line_count,
sf.current_comment_line_count, sf.current_blank_line_count,
sf.file_path_id, sf.current_source_md5"
" Sort Key: sf.file_path_id, sf.current_source_md5"
" Sort Method: external merge Disk: 502288kB"
" -> Seq Scan on stream_file sf (cost=0.00..2604208.80
rows=112747280 width=53) (actual time=0.033..27922.485 rows=7683730
loops=1)"
" Output: sf.current_code_line_count,
sf.current_comment_line_count, sf.current_blank_line_count,
sf.file_path_id, sf.current_source_md5"
"Total runtime: 57693.835 ms"

Now we do many, many ANALYZE VERBOSE, and converge on the correct value,
which is known to be about 10M rows.

INFO: analyzing "public.stream_file"
INFO: "stream_file": scanned 30000 of 1476736 pages, containing 158846
live rows and 2175512 dead rows; 30000 rows in sample, 112747282 estimated
total rows
Query returned successfully with no result in 9172 ms.

... about 200 ANALYZE iterations later ...

INFO: analyzing "public.stream_file"
INFO: "stream_file": scanned 30000 of 1480611 pages, containing 158776
live rows and 2170410 dead rows; 30000 rows in sample, 9769236 estimated
total rows
Query returned successfully with no result in 441 ms.
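
In case it's useful, the stored estimate the planner works from can be read
straight out of pg_class between runs, e.g.:

SELECT relname, relpages, reltuples::bigint AS estimated_rows
  FROM pg_class
 WHERE relname = 'stream_file';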

Now, the optimizer thinks we can do a hash aggregate in memory and we get
better performance.

"Aggregate (cost=1734729.12..1734729.14 rows=1 width=12) (actual
time=33816.049..33816.049 rows=1 loops=1)"
" -> HashAggregate (cost=1700534.50..1717631.81 rows=976989 width=53)
(actual time=33535.083..33712.787 rows=202404 loops=1)"
" Output: max(sf.current_code_line_count),
max(sf.current_comment_line_count), max(sf.current_blank_line_count)"
" -> Seq Scan on stream_file sf (cost=0.00..1578410.89
rows=9769889 width=53) (actual time=392.435..26278.143 rows=7710223
loops=1)"
" Output: sf.id, sf.current_blank_line_count,
sf.current_code_line_count, sf.current_comment_line_count,
sf.current_source_md5, sf.component_id, sf.current_file_instance_id,
sf.current_file_state_id, sf.file_path_id, sf.stream_element_id"
"Total runtime: 33822.707 ms"

But later in the day, the statistics revert to the 100M number! Any
ideas? Is there some kind of cache that is remembering the old statistics?


Thanks and Regards,

James Skaggs
IT/FIS Longmont
SeaTel: 8 684 1048
General: +1 612 367 6224
Post by Jeff Janes
Post by j***@seagate.com
INFO: analyzing "public.stream_file"
INFO: "stream_file": scanned 30000 of 2123642 pages, containing
184517 live rows and 2115512 dead rows; 30000 rows in sample,
158702435 estimated total rows
184517 live rows in 30000 randomly sampled pages out of 2123642
total pages, means that the statistics predict that a select
count(*) will find about 13 million live rows to count.
Post by j***@seagate.com
After "analyze verbose", the table shows 158 million rows. A
select count(1) yields 13.8 million rows.
OK, the estimate was 13 million and there were actually 13.8
million, but it is a random sample used to generate estimates.
That seems worse than average, but close enough to be useful.
The 158.7 million total rows includes dead rows, which must be
visited to determine visibility, but will not be counted because
they are not visible to the counting transaction.
To clarify here, the 158.7 million estimate does not *intentionally*
include dead rows. As you say, the ANALYZE did get a very good
instantaneous estimate of the number of live rows. However, ANALYZE
doesn't over-write the old estimate, it averages its estimate into the
old one. After the table shape changes dramatically, the ANALYZE
needs to be run repeatedly before the estimate will converge to the
new reality. (Of course a cluster or vacuum full will blow away the
old statistics, so the next analyze after that will solely determine
the new statistics.)
I agree, not a bug.
Cheers,
Jeff
Jeff Janes
2013-03-02 19:19:05 UTC
On Fri, Feb 22, 2013 at 3:41 PM, James R Skaggs
Post by James R Skaggs
Okay, I have some more info.
Some background info: this one table gets so many changes that I CLUSTER it
each night. However, after I do this, the statistics still appear to be
incorrect, even after I do a "select pg_stat_reset();" followed by three
ANALYZE runs with default_statistics_target set to 1, 10, and 100.
select relname, n_live_tup, n_dead_tup, n_tup_ins, n_tup_del,
n_tup_hot_upd
from pg_stat_all_tables
('stream_file', 109061143L, 103913868L, 0L, 14201L, 0L)
Is it possible that there are still dead tuples after a CLUSTER?
Yes. A cluster must bring along any tuples which are possibly visible to
any open transaction. Your root problem seems to be that you have
long-open transactions which are preventing vacuum from doing its thing,
which leads you to try clustering, but the long-open transaction prevents
that from doing its thing effectively as well.
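
For what it's worth, a query along these lines (using the 8.4 column names,
procpid and current_query) should show the oldest open transactions, and
forgotten two-phase transactions would show up in pg_prepared_xacts:

-- sessions with the oldest open transactions
SELECT procpid, usename, xact_start,
       now() - xact_start AS xact_age, current_query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start
 LIMIT 10;

-- forgotten prepared transactions also hold back cleanup
SELECT * FROM pg_prepared_xacts;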

Perhaps PG could deal with this situation more gracefully, but
fundamentally you have to figure out why you have these ancient transactions
lying around, and fix them or kill them.

Cheers,

Jeff

Jeff Janes
2013-02-24 00:17:24 UTC
Post by Jeff Janes
OK, the estimate was 13 million and there were actually 13.8
million, but it is a random sample used to generate estimates.
That seems worse than average, but close enough to be useful.
The 158.7 million total rows includes dead rows, which must be
visited to determine visibility, but will not be counted because
they are not visible to the counting transaction.
To clarify here, the 158.7 million estimate does not *intentionally*
include dead rows. As you say, the ANALYZE did get a very good
instantaneous estimate of the number of live rows. However, ANALYZE
doesn't over-write the old estimate, it averages its estimate into the
old one. After the table shape changes dramatically, the ANALYZE
needs to be run repeatedly before the estimate will converge to the
new reality. (Of course a cluster or vacuum full will blow away the
old statistics, so the next analyze after that will solely determine
the new statistics.)
I was incredibly wrong here. The cluster or vacuum do not blow away the
stats so that the next analyze gets to solely determine them. Rather, they
impose their own idea of live tuples, and then analyze can only update that
incrementally as it averages itself into the old value.

Worse, the two methods have very different ideas of what constitutes a live
tuple. ANALYZE thinks tuples that are visible to a current/recent snapshot
are live, while CLUSTER and VACUUM think tuples that are possibly visible
to anyone are live.

I would say that this is a bug, or at least approaching being one. It
is not obvious whether reltuples and n_live_tuples should count the
"recently dead", but it should either be one way or the other and not an
unholy mixture of the two.

As it is now, a cluster or simple vacuum will snap n_live_tuples so that it
counts the recently dead, then analyze will slowly converge it to exclude the
recently dead, and then the next vacuum will snap it back again.

Of course, all of this only comes into play in the presence of very
long-lived transactions that prevent tuples from going away. Otherwise the
number recently dead is small enough not to matter.


create table foo as select (random()*1000000)::integer as val
from generate_series(1,50000000);

In a different session, open a transaction and leave it open:

begin; create temp table adlfkj (x serial);

Back in the main session:

delete from foo where val > 100;

Run this repeatedly and watch the rows estimate slowly decay:

ANALYZE VERBOSE foo; explain select count(*) from foo;

Then run this and watch it instantly spring back:

VACUUM VERBOSE foo; explain select count(*) from foo;

Cheers,

Jeff
jimbob
2013-02-25 19:00:10 UTC
So, I have some observations. Is this what you are seeing as well?

So when we CLUSTER a heavily-updated table:

CLUSTER does appear to reset *n_dead_tup*, *n_tup_ins*, *n_tup_del*, and
*n_tup_hot_upd*, but NOT *n_live_tup*.

pg_stat_reset() truly clears out all the statistics counters. I tried this
because *n_live_tup* is not correct.

A subsequent ANALYZE will update *n_dead_tup* and *n_live_tup* to some
values that could not possibly be based on the newly CLUSTERed table.

So, how do we get correct statistics for a heavily updated table? In my
experience, we only need the estimate to be correct to within an order of
magnitude, but we're not even getting that.
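
One way to compare the three row counts side by side (the count(*) is of
course slow on a table this size):

SELECT c.reltuples::bigint AS planner_estimate,
       s.n_live_tup, s.n_dead_tup,
       (SELECT count(*) FROM stream_file) AS actual_live_rows
  FROM pg_class c
  JOIN pg_stat_user_tables s ON s.relid = c.oid
 WHERE c.relname = 'stream_file';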

BTW, I've upgraded to 8.4.15.



jimbob
2013-02-28 22:54:59 UTC
BTW "jimbob" and "James.R.Skaggs" are the same person. I just didn't want to
use my "work" email for this....


