Discussion:
BUG #8048: Text Search
(too old to reply)
l***@gmail.com
2013-04-09 09:47:17 UTC
Permalink
The following bug has been logged on the website:

Bug reference: 8048
Logged by: Luigi
Email address: ***@gmail.com
PostgreSQL version: 9.2.0
Operating system: Windows 7
Description:

I've configured 2 table like this

CREATE TABLE "User_Full_Text_Search" (
"Email" varchar(50),
"UserId" varchar(50),
"Full_Text_Search" varchar(4096)
)
WITH (OIDS=FALSE)
;

ALTER TABLE "User_Full_Text_Search" OWNER TO "postgres";

CREATE INDEX IX_FullText ON "User_Full_Text_Search" USING
gin(to_tsvector('italian', "Full_Text_Search"));


CREATE TABLE "User_Full_Text_Search_2" (
"Email" varchar(50),
"UserId" varchar(50),
"Full_Text_Search" varchar(4096),
"tsv" varchar(4096)
)
WITH (OIDS=FALSE)
;

ALTER TABLE "User_Full_Text_Search_2" OWNER TO "postgres";

CREATE INDEX IX_FullText_2 ON "User_Full_Text_Search_2" USING
gin("tsv");
CREATE TRIGGER "tsvectorupdate" BEFORE INSERT OR UPDATE ON
"User_Full_Text_Search_2"
FOR EACH ROW
EXECUTE PROCEDURE "tsvector_update_trigger"('tsv', 'pg_catalog.italian',
'Full_Text_Search');

Column Full_Text_Search (table User_Full_Text_Search) is just a single word
or max 2 words separeted by space " " (ex: test tester), and tsv (table
User_Full_Text_Search_2) is populate by materializing column with a
ts_vector of Full_Text_Search.

Now if i perform those 2 queries

select "UserId","Email" from "User_Full_Text_Search"
where to_tsvector('italian',"Full_Text_Search") @@ to_tsquery('italian',
'test|developer')
GROUP BY "UserId","Email"

select "UserId","Email" from "User_Full_Text_Search_2"
where "tsv" @@ to_tsquery('italian', 'test|developer')
GROUP BY "UserId","Email"

Records on Tables (are same) like 10 milion.

Execution time of 1st query is 120 seconds (result set like 750.000)
Execution time of 2st query is 270 seconds (result set like 750.000) same
records

I don't understand why a materialized column is more slow than a calculeted
one...
--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Kevin Grittner
2013-04-09 15:20:10 UTC
Permalink
         CREATE TABLE "User_Full_Text_Search_2" (
         "Email" varchar(50),
         "UserId" varchar(50),
         "Full_Text_Search" varchar(4096),
         "tsv" varchar(4096)
         )
         WITH (OIDS=FALSE)
         ;
         CREATE INDEX IX_FullText_2 ON "User_Full_Text_Search_2"
           USING gin("tsv");
I don't understand why a materialized column is more slow than a
calculeted one...
Try materializing it as a tsvector instead of a varchar.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Tom Lane
2013-04-09 15:23:59 UTC
Permalink
Post by l***@gmail.com
CREATE TABLE "User_Full_Text_Search_2" (
"Email" varchar(50),
"UserId" varchar(50),
"Full_Text_Search" varchar(4096),
"tsv" varchar(4096)
)
WITH (OIDS=FALSE)
;
You declared tsv as a plain varchar column, not a tsvector, so text
searches on it aren't optimized. I'm surprised the system even let
you build a GIN index on it --- maybe you have btree_gin installed?
Anyway that index isn't useful for answering a full-text-search
query, as you'll see if you compare EXPLAIN results.

regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Loading...