July 31, 2008

15% faster JPEG decoding on Windows x64 with IJG's JPEG library

Mozilla's JPEG library is based on IJG's libjpeg 6b, but it adds native SSE2-optimized DCT code derived from an Intel sample (AP-945; the original document has been removed from Intel's server, but a Japanese version is still available on the Intel Japan site). However, this code in Mozilla is for MSVC on x86 only. (Even if a Linux version were written, most distributors build Firefox against the system's libjpeg 6b, so the effort might be wasted.)

How much faster would decoding be if this optimization were implemented for the AMD64 platform? SSE2 on AMD CPUs (Turion, Athlon 64, and Opteron) is slower than on Intel CPUs (except the Pentium M), but since the MMX assembler code doesn't work with MSVC for AMD64, I decided to try it.
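To illustrate the kind of code involved: MSVC's x64 compiler does not accept inline `__asm` blocks, so SIMD code there is usually written with compiler intrinsics. The following is only a minimal sketch, not Mozilla's or Intel's actual DCT code. It shows the 16-bit multiply-accumulate step (`pmaddwd`, exposed as `_mm_madd_epi16`) that DCT routines of this style lean on. The function names `dot8_scalar` and `dot8_sse2` are made up for this example, and the scalar fallback is there only so the sketch builds on non-SSE2 targets.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>  /* SSE2 intrinsics */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Scalar reference: sum of a[i] * b[i] over 8 signed 16-bit values. */
static int32_t dot8_scalar(const int16_t *a, const int16_t *b)
{
    int32_t sum = 0;
    int i;
    for (i = 0; i < 8; ++i)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}

/* Hypothetical SSE2 version of the same dot product. */
static int32_t dot8_sse2(const int16_t *a, const int16_t *b)
{
#if HAVE_SSE2
    /* Unaligned loads of 8 x int16 each. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* pmaddwd: multiply 16-bit pairs and add adjacent products,
       giving 4 x int32 partial sums. */
    __m128i prod = _mm_madd_epi16(va, vb);
    /* Horizontal add of the 4 lanes into lane 0. */
    prod = _mm_add_epi32(prod, _mm_srli_si128(prod, 8));
    prod = _mm_add_epi32(prod, _mm_srli_si128(prod, 4));
    return _mm_cvtsi128_si32(prod);
#else
    /* No SSE2 on this target: fall back to the scalar loop. */
    return dot8_scalar(a, b);
#endif
}
```

Because intrinsics compile with both the x86 and x64 MSVC compilers, one source file can serve both targets, which inline assembler cannot do.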


It is 15% faster than the original!

Benchmark program

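/* Set up error handling, then create the decompressor (required before any other call). */
cinfo.err = jpeg_std_error(&jerr);
jpeg_create_decompress(&cinfo);

jpeg_stdio_src(&cinfo, input_file);
jpeg_read_header(&cinfo, TRUE);

/* jpeg_read_header() resets the decompression parameters to defaults, so set the DCT method after it. */
cinfo.dct_method = JDCT_ISLOW;

/* output_width and output_height are valid only after jpeg_start_decompress(). */
jpeg_start_decompress(&cinfo);

width = cinfo.output_width;
height = cinfo.output_height;

/* One row pointer per scanline; each row holds 3 samples (RGB) per pixel. */
img = (JSAMPARRAY)malloc(sizeof(JSAMPROW) * height);

for (i = 0; i < height; ++i)
	img[i] = (JSAMPROW)calloc(3 * width, sizeof(JSAMPLE));

/* Decode the whole image; jpeg_read_scanlines() may return fewer rows than requested. */
while (cinfo.output_scanline < cinfo.output_height)
	jpeg_read_scanlines(&cinfo, img + cinfo.output_scanline, cinfo.output_height - cinfo.output_scanline);

jpeg_finish_decompress(&cinfo);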
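The snippet above shows only the decode path, not how it was timed. As a rough sketch, assuming the measurement is CPU time from `clock()` and that "15% faster" means a 15% reduction in elapsed time (both are my assumptions, not stated in the post), a harness could look like this. `time_runs`, `percent_faster`, and `count_call` are hypothetical names; `count_call` is just a stand-in for a function that decodes one image.

```c
#include <time.h>

/* Run `work(arg)` `iterations` times and return the elapsed CPU time in seconds. */
static double time_runs(void (*work)(void *), void *arg, int iterations)
{
    clock_t start = clock();
    int i;
    for (i = 0; i < iterations; ++i)
        work(arg);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Percent reduction in time relative to the baseline.
   A positive result means the optimized build is faster. */
static double percent_faster(double baseline_sec, double optimized_sec)
{
    return (baseline_sec - optimized_sec) / baseline_sec * 100.0;
}

/* Stand-in workload: increments a counter. In a real benchmark this
   would wrap the decode loop for one input file. */
static void count_call(void *arg)
{
    ++*(int *)arg;
}
```

Decoding the same file many times per run and comparing the totals from the original and the SSE2 builds keeps the `clock()` resolution from dominating the result.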


The source code is also available at http://hg.mozilla-x86-64.com/firefox-win64/; see the jpeg directory.
