Cinthiya(my sister) called me and said that the Annual Thanks Giving Service at Karunya is being hosted live on the web. Watching it gave me a nostalgic feeling of the good old days in Karunya where i was part of the Mega Play Team. I managed to record part of the stream though the video quality is not great. The below video is an extract containing the final choreography in which my sister was part of.
Every hour, we saw the diskspace on our root partition go up from 45 % to 90 % and come back to 45 %. Initially thought mysql was processing some temp tables and in the next hour we kept watching the temp location. In terms of files / diskspace usage using “du / ls” nothing changed. But lsof provided an interesting output.
Output 1 :
mysqld 31719 mysql 6u REG 8,1 0 149240 /var/tmp/ibOLLuRa (deleted)
mysqld 31719 mysql 7u REG 8,1 102 149243 /var/tmp/ibNuSvI9 (deleted)
mysqld 31719 mysql 8u REG 8,1 0 149245 /var/tmp/ibqq2wz8 (deleted)
mysqld 31719 mysql 9u REG 8,1 0 149255 /var/tmp/ibVN66Rh (deleted)
mysqld 31719 mysql 13u REG 8,1 0 149256 /var/tmp/ibslSJJA (deleted)
mysqld 31719 mysql 15u unix 0×000001085e445680 19044143 /tmp/mysql.sock
mysqld 31719 mysql 18u REG 8,1 1128267776 146656 /var/tmp/MLHbDVOq (deleted)
Output 2 :
mysqld 31719 mysql 6u REG 8,1 0 149240 /var/tmp/ibOLLuRa (deleted)
mysqld 31719 mysql 7u REG 8,1 102 149243 /var/tmp/ibNuSvI9 (deleted)
mysqld 31719 mysql 8u REG 8,1 0 149245 /var/tmp/ibqq2wz8 (deleted)
mysqld 31719 mysql 9u REG 8,1 0 149255 /var/tmp/ibVN66Rh (deleted)
mysqld 31719 mysql 13u REG 8,1 0 149256 /var/tmp/ibslSJJA (deleted)
mysqld 31719 mysql 15u unix 0×000001085e445680 19044143 /tmp/mysql.sock
mysqld 31719 mysql 18u REG 8,1 1396703232 146656 /var/tmp/MLHbDVOq (deleted)
The size occupied by a file marked deleted kept increasing Strange but it seems to be happening on every dataload. The size change in this file corroborates perfectly with the space usage reported by “df” but not with “du”.
We are setting up the tmp location to a filesystem with more space for now before we investigate further.
Problem Statement : 386 GB of MySQL 5.0 Compressed MyISAM tables (Merge engine used) need to be migrated to MySQL 5.1 Compressed/Partitioned tables. Also we had to add a few new columns and change the datatype of few existing columns.
Initial plan : Setup the MySQL 5.1 database with the schema with partitioning to which the data needs to be migrated and then run scripts to copy the .MYD ,.MYI and .frm files to the 5.1 database. Disable the keys using “Alter table <> disable keys“. Then migrate the data using the insert into newtable select * from old table. Then to compress the tables run myisampack and myisamchk.
As per plan we started the migration process. The plan worked perfectly though a bit on the slower side for most of the tables. For few of the tables, the import process kept dragging on and on. It was well past a week since the import was started and the system was writing at a speed of 1 – 1.5 Gig per hour on to disk. There was heavy contention on the key buffer and the behavior was seen on tables which had keys greater than the available buffer and those with where clause in the “insert into <> select from” statments. Then we started getting constrained on time, was looking for avenues to optimize this whole process.
After quite a number of tests and trials , we found the following 2 methods to be faster.
Method 1:
1. Dump the data using select into outfile in the primary key order in small chunks if the table is huge.
2. Alter table <> disable keys.
3. Load the data using load data infile statment.
4. Run myisampack and myisamchk to compress the data and rebuild the indexes respectively.
Method 1, was faster than the original plan.
Method 2 :
Method 2 would be a slight variation of method 1. Instead of “Alter table <> disable keys“, do the following :
For each partition on the table run myisamchk –keys-used=0 -rq <tablename>. This would disable all keys on the table. Now proceed from step 3.
With the second approach we, got a throughput of 1GB data written to disk in 1 minute approx. Will follow up this post with more detailed commands and steps. Whew ! What a week it has been so far !
One of our peer group developers contacted me today stating that they were inserting a value of 10 in one of the columns in their tables. But after the insert, when they query the table they were getting the value as 0.
Logged into the box to check what was happening. Luckily binlog was turned on that mysql instance. Ran the mysqlbinlog utility on the current binlog and greped for the particular primary key value.
It listed an insert statement and an update statement. There was the culprit, the value of 10 was getting inserted but soon the application was following it up with an update to value 0. Yeah it is always not the database
The slave failed with the error that the relay binlog is corrupt. It had copied close to 12 binlogs from the master and they were yet to be applied. Unfortunately those binlogs have been purged on the master. Now to sync up cleanly we might have to refresh data from the master which can be costly since it was a 290 GB database. We had the option of shutting down the server. We thought we can try our luck with a crazy hack. We shutdown the server. Tried reading the binlog using mysqlbinlog utility from the corrupt position. It failed as expected. Then we tried reading from the next immediate position and it went through fine. Now we had a proof that our hack might work. We opened the relay-log.info and incremented the second row by a value of one. Then we started the server. Boom, the slave started running and we were saved from a great pain of resyncing the slave.
PS : We might have missed one transaction in this hack, but that was ok for our use case.
Pass the start date + time and end date + time to the following script. It looks at the alert log file and obtains the no of archive logs that have been applied during media recovery in that time interval.
#!/usr/bin/perl -w
BEGIN { push @INC,"/home/user/TimeDate-1.16/lib/" }
use Date::Parse;
$start=str2time(@ARGV[0]);
chomp(@ARGV[1]);
$end=str2time(@ARGV[1]);
$log='path to the log file';
open(ALERTLOGFILE,$log) or die "Not able to open ".$log;
$progress=-1;
$count=0;
while(<ALERTLOGFILE>){
if(/^[A-Z][a-z]{2}\s[A-Z][a-z]{2}/){
$time=str2time($_);
if(($time >= $start) && ($time <= $end)){
$progress=1;
}else{
$progress=-1;
}
}
if(/^Media Recovery Log/){
if($progress == 1){
$count+=1;
}
}
}
print $count ."\n";