How to reproduce?
1) During an upgrade from 220.127.116.11 to 4.2.1, the upgrade script is run.
The aim of the script is to remove the node_deleted column from the 4.0 alf_node table and to encode that information in the qname column instead.
To do this, the current (simplified) algorithm used is:
a) create a new table
b) select all columns except node_deleted from the old table into the new table, altering the qname value when the node is deleted
c) rebuild all the constraints and indexes
2) new proposed algorithm:
a) do an in-place update of qname column only for the deleted nodes
b) drop the node_deleted column.
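The two algorithms can be sketched side by side on a small SQLite database (a stand-in for MySQL; table and column names follow the test setup below, the row values are made up, and the DROP COLUMN step needs SQLite >= 3.35):

```python
import sqlite3

def make_db(rows=1000):
    # Test table mirroring the MySQL setup below; every 10th node is "deleted".
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (id int, isdeleted bool, qname int, other text)")
    con.executemany("insert into mytable values (?,?,?,?)",
                    [(i, 1 if i % 10 == 0 else 0, 1, "other%d" % i) for i in range(rows)])
    return con

# Old algorithm: copy everything but isdeleted into a new table,
# rewriting qname on the fly for deleted nodes.
old = make_db()
old.execute("create table mytable2 (id int, qname int, other text)")
old.execute("insert into mytable2 (id, qname, other) "
            "select id, case when isdeleted then 2 else qname end, other from mytable")

# New algorithm: in-place update of qname for deleted nodes only,
# then drop the now-redundant column.
new = make_db()
new.execute("update mytable set qname = 2 where isdeleted")
try:
    new.execute("alter table mytable drop column isdeleted")
except sqlite3.OperationalError:
    pass  # DROP COLUMN requires SQLite >= 3.35
```

Both end up with the same qname values; the difference is that the old algorithm rewrites all rows while the new one touches only the deleted 10%.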
To measure the algorithms' speeds I preferred to create a test MySQL database rather than a full Alfresco system.
a) create data files to inject 50000000 rows into MySQL (as you did not answer my question about how many deleted nodes you have, I ran the test with 10% deleted nodes):
To do this I ran the python script:
filenb, perfile = 50, 1000000  # 50 files x 1M rows = 50M rows
for j in range(0, filenb):
    with open('/tmp/file%d.sql' % j, 'w') as f:
        for i in range(0, perfile):
            row = j * perfile + i
            f.write('%d\t%d\t1\tother%d\n' % (row, 1 if i % 10 == 0 else 0, row))  # id, isdeleted, qname, other
(I had to split the data into 50 files as MySQL seems to have problems injecting larger files)
b) create a test table in a test database:
create table mytable (id int, isdeleted boolean, qname int, other varchar(255));
c) inject the data into mysql running a series of:
LOAD DATA INFILE '/tmp/file0.sql' INTO TABLE mytable;
...
LOAD DATA INFILE '/tmp/file49.sql' INTO TABLE mytable;
d) create indexes:
create index idx_id on mytable (id);
create index idx_isdeleted on mytable (isdeleted);
e) then test the speed of the algorithms:
current algorithm (copy into a new table):
create table mytable2 (id int, qname int,other varchar(255));
insert into mytable2 (id,qname,other) (select id,(case when isdeleted then 2 else qname end),other from mytable);
Query OK, 50000000 rows affected (15 min 50.77 sec)
You will then need to create indexes on the new table, which takes approx. 5 minutes per column.
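For example (index names assumed, mirroring the indexes created on the original test table):

```sql
create index idx2_id on mytable2 (id);
create index idx2_qname on mytable2 (qname);
```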
proposed algorithm (in place update):
update mytable set qname=2 where isdeleted;
Query OK, 5000000 rows affected (5 min 9.92 sec)
The new algorithm is at least 4 times faster than the old one (about 5 times once index recreation is included).
1) The customer has 55+ million nodes and the upgrade time exceeds the maximum downtime allowed for their application. Dividing that time by 4 or 5 would make it acceptable.
2) see also
3) The customer raised a ticket with MySQL Support, who proposed another way to do the in-place update:
"...in short dropping or adding an FK constraint is an on-line inplace operation, but you need to disable foreign_key_checks in order to get that benefit, see: http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_foreign_key_checks
I recommend that you alter their SQL script and disable foreign_key_checks for the duration of the script, then re-enable them at the end, this will reduce the number of table copies and should greatly reduce the time it takes to run your upgrade script."
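A minimal sketch of the wrapper MySQL Support describes (statement placement is my assumption; the upgrade script's own statements are elided):

```sql
SET foreign_key_checks = 0;
-- ... run the upgrade script's ALTER/UPDATE statements here ...
SET foreign_key_checks = 1;
```

Note that foreign_key_checks is a session variable, so both SET statements must run on the same connection as the upgrade statements.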