Faster csvwrite/dlmwrite

Matlab’s builtin functions for exporting (saving) data to output files are quite sub-optimal (as in slowwwwww…). I wrote a few posts about this in the past (how to improve fwrite performance, and save performance). Today I extend the series by showing how we can improve the performance of delimited text output, for example comma-separated (CSV) or tab-separated (TSV/TXT) files.

The basic problem is that Matlab’s dlmwrite function, which can either be used directly, or via the csvwrite function which calls it internally, is extremely inefficient: It processes each input data value separately, in a non-vectorized loop. In the general (completely non-vectorized) case, each data value is separately converted into a string, and is separately sent to disk (using fprintf). In the specific case of real data values with simple delimiters and formatting, row values are vectorized, but in any case the rows are processed in a non-vectorized loop: A newline character is separately exported at the end of each row, using a separate fprintf call, and this has the effect of flushing the I/O to disk each and every row separately, which is of course disastrous for performance. The output file is indeed originally opened in buffered mode (as I explained in my fprintf performance post), but this only helps for outputs done within the row – the newline output at the end of each row forces an I/O flush regardless of how the file was opened. In general, when you read the short source-code of dlmwrite.m you’ll get the distinct feeling that it was written for correctness and maintainability, and some focus on performance (e.g., the vectorization edge-case). But much more could be done for performance it would seem.

This is where Alex Nazarovsky comes to the rescue.

Alex was so bothered by the slow performance of csvwrite and dlmwrite that he created a C++ (MEX) version that runs about enormously faster (30 times faster on my system). He explains the idea in his blog, and posted it as an open-source utility (mex-writematrix) on GitHub.

Usage of Alex’s utility is very easy:

mex_WriteMatrix(filename, dataMatrix, textFormat, delimiter, writeMode);

where the input arguments are:

filename – full path name for file to export
dataMatrix – matrix of numeric values to be exported
textFormat – format of output text (sprintf format), e.g. '%10.6f'
delimiter – delimiter, for example ',' or ';' or char(9) (=tab)
writeMode – 'w+' for rewriting file; 'a+' for appending (note the lowercase: uppercase will crash Matlab!)

Here is a sample run on my system, writing a simple CSV file containing 1K-by-1K data values (1M elements, ~12MB text files):

>> data = rand(1000, 1000);  % 1M data values, 8MB in memory, ~12MB on disk
 
>> tic, dlmwrite('temp1.csv', data, 'delimiter',',', 'precision','%10.10f'); toc
Elapsed time is 28.724937 seconds.
 
>> tic, mex_WriteMatrix('temp2.csv', data, '%10.10f', ',', 'w+'); toc   % 30 times faster!
Elapsed time is 0.957256 seconds.

Alex’s mex_WriteMatrix function is faster even in the edge case of simple formatting where dlmwrite uses vectorized mode (in that case, the file is exported in ~1.2 secs by dlmwrite and ~0.9 secs by mex_WriteMatrix, on my system).

Trapping Ctrl-C interrupts

Alex’s mex_WriteMatrix code includes another undocumented trick that could help anyone else who uses a long-running MEX function, namely the ability to stop the MEX execution using Ctrl-C. Using Ctrl-C is normally ignored in MEX code, but Wotao Yin showed how we can use the undocumented utIsInterruptPending() MEX function to monitor for user interrupts using Ctrl-C. For easy reference, here is a copy of Wotao Yin’s usage example (read his webpage for additional details):

/* A demo of Ctrl-C detection in mex-file by Wotao Yin. Jan 29, 2010. */
 
#include "mex.h"
 
#if defined (_WIN32)
    #include <windows.h>
#elif defined (__linux__)
    #include <unistd.h>
#endif
 
#ifdef __cplusplus 
    extern "C" bool utIsInterruptPending();
#else
    extern bool utIsInterruptPending();
#endif
 
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    int count = 0;    
    while(1) {
        #if defined(_WIN32)
            Sleep(1000);        /* Sleep one second */
        #elif defined(__linux__)
            usleep(1000*1000);  /* Sleep one second */
        #endif
 
        mexPrintf("Count = %d\n", count++);  /* print count and increase it by 1 */
        mexEvalString("drawnow;");           /* flush screen output */
 
        if (utIsInterruptPending()) {        /* check for a Ctrl-C event */
            mexPrintf("Ctrl-C Detected. END\n\n");
            return;
        }
        if (count == 10) {
            mexPrintf("Count Reached 10. END\n\n");
            return;
        }
    }
}

Matlab performance webinars

Next week I will present live online webinars about numerous other ways to improve Matlab’s run-time performance:

Oct 9, 2017 (Mon) – Matlab performance tuning part 1 – $295 (buy)
Oct 10, 2017 (Tue) – Matlab performance tuning part 2 – $295 (buy)
==> or buy both Matlab performance tuning webinars for only $495 (buy)

These live webinars will be 3.5 hours long, starting at 10am EDT (7am PDT, 3pm UK, 4pm CET, 7:30pm IST, time in your local timezone), with a short break in the middle. The presentations content will be based on onsite training courses that I presented at multiple client locations (details). A recording of the webinars will be available for anyone who cannot join the live events.

Additional Matlab performance tips can be found under the Performance tag in this blog, as well as in my book “Accelerating MATLAB Performance“.

Email me if you would like additional information on the webinars, or an onsite training course, or about my consulting.

Parsing mlint (Code Analyzer) output – The Matlab Code Analyzer (mlint) has a lot of undocumented functionality just waiting to be used. ...
Explicit multi-threading in Matlab part 3 – Matlab performance can be improved by employing POSIX threads in C/C++ code. ...
Plot LimInclude properties – The plot objects' XLimInclude, YLimInclude, ZLimInclude, ALimInclude and CLimInclude properties are an important feature, that has both functional and performance implications....
Preallocation performance – Preallocation is a standard Matlab speedup technique. Still, it has several undocumented aspects. ...

Faster csvwrite/dlmwrite

Trapping Ctrl-C interrupts

Matlab performance webinars

Trending Articles

Benson Boone – Sorry I’m Here For Someone Else – Single [iTunes Plus M4A]

NCERT Solutions for Class 9th Sanskrit Chapter 3 पाथेयम्

Font Brazil World Cup 2004 kits

REQ: ReFX Nexus 5 Latin House Expansion

Practice Sheet of Right form of verbs for HSC Students

Moondru Mudichu 31-08-2017 – Polimer tv Serial

Chai Status, Funny Tea Quotes in Hindi, चाय पर शायरी

sunstar exam scanner pdf

SOFT COPY ZA NGAIZA CHEMISTRY

Shale Hill Secrets (Love-Joint) (ENG+RUS) [L] [10.56GB]

[GET] Dickie Bush and Nicholas Cole – Ghostwriter GPT ($350.00)

Tinny — Dzormo (Prod by Hammer)

Notorious Naushad of Ippa gang nabbed

Brunei reaffirms healthcare commitment

मुख मैथुन से उठाएं सेक्स का भरपूर मज़ा, जानें क्या है इसका सही तरीकामुख मैथुन...

Aaron Powers

Mp3 Download: Mr Raw - Adamma ft. Flavour & Harry B

collect2: error: ld returned 1 exit status while compiling openssl

Building Instruments With Logic Pro Samplers TUTORiAL-HiDERA

GTA 5 PPSSPP Zip File Download For Android Mediafire 382 MB