Channel: Undocumented Matlab

Explicit multi-threading in Matlab – part 1


One widely recognized limitation of Matlab is that it does not give users direct access to threads without the PCT (Parallel Computing Toolbox). Such access would let us, for example, run expensive computations or I/O in the background without freezing the main application. Instead, Matlab offers either implicit multiprocessing, which relies on built-in threading support in some MATLAB functions, or explicit multiprocessing using PCT (note: PCT workers use heavyweight processes, not lightweight threads). So the only way to achieve true multi-threading in Matlab is via MEX, Java or .Net, or by spawning external standalone processes.

Note that we do not save any CPU cycles by running tasks in parallel. Overall, we actually increase the amount of CPU processing, due to the multi-threading overhead. However, in the vast majority of cases we are more interested in the responsiveness of Matlab’s main processing thread (known as the Main Thread, Matlab Thread, or simply MT) than in reducing the computer’s total energy consumption. In such cases, offloading work to asynchronous C++, Java or .Net threads can remove bottlenecks from Matlab’s main thread, achieving a significant speedup.

Today’s article is derived from a much larger section on explicit multi-threading in Matlab that will be included in my upcoming book MATLAB Performance Tuning, which will be published later this year.

Sample problem

In the following example, we compute some data, save it to a file on a relatively slow USB/network disk, and then proceed with another calculation. We start with a simple synchronous implementation in plain Matlab:

tic
data = rand(5e6,1);  % pre-processing, 5M elements, ~40MB
fid = fopen('F:\test.data','w');
fwrite(fid,data,'double');
fclose(fid);
data = fft(data);  % post-processing
toc
 
Elapsed time is 9.922366 seconds.

~10 seconds happens to be too slow for our specific needs. We could perhaps improve it a bit with some fancy tricks for save or fwrite. But let’s take a different approach today, using multi-threading:

Using Java threads

Matlab uses Java for numerous tasks, including networking, data-processing algorithms and the graphical user interface (GUI). In fact, under the hood, even Matlab timers employ Java threads for their internal triggering mechanism. In order to use Java, Matlab launches its own dedicated JVM (Java Virtual Machine) when it starts (unless it’s started with the -nojvm startup option). Once started, Java can be directly used within Matlab as a natural extension of the Matlab language. Today I will only discuss Java multithreading and its potential benefits for Matlab users. Readers are assumed to know how to program in Java and how to compile Java classes.

To use Java threads in Matlab, first create a class that implements the Runnable interface or extends java.lang.Thread. In either case we need to implement at least the run() method, which runs the thread’s processing core.
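
To make the two options concrete, here is a minimal sketch of both. The class names (SumTask, SumThread, RunnableDemo) and the summation task are made up for illustration and are not part of the example used in this article. The code sticks to Java 7 syntax (no lambdas), so that it could also be compiled for older Matlab JVMs:

```java
// Option 1: implement Runnable (hypothetical example class)
class SumTask implements Runnable
{
    private final double[] data;
    private double result;   // read by the caller only after join()

    SumTask(double[] data) { this.data = data; }

    @Override
    public void run()   // the thread's processing core
    {
        double sum = 0;
        for (int i = 0; i < data.length; i++) sum += data[i];
        result = sum;
    }

    double getResult() { return result; }
}

// Option 2: extend java.lang.Thread and override run() directly
class SumThread extends Thread
{
    private final SumTask task;

    SumThread(double[] data) { task = new SumTask(data); }

    @Override
    public void run() { task.run(); }

    double getResult() { return task.getResult(); }
}

class RunnableDemo
{
    static double sumWithRunnable(double[] data)
    {
        SumTask task = new SumTask(data);
        Thread worker = new Thread(task);   // wrap the Runnable in a Thread
        worker.start();                     // returns immediately
        waitFor(worker);
        return task.getResult();
    }

    static double sumWithThreadSubclass(double[] data)
    {
        SumThread worker = new SumThread(data);
        worker.start();
        waitFor(worker);
        return worker.getResult();
    }

    // join() blocks until the thread ends and makes its writes visible
    private static void waitFor(Thread t)
    {
        try {
            t.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args)
    {
        double[] data = {1, 2, 3};
        System.out.println(sumWithRunnable(data));
        System.out.println(sumWithThreadSubclass(data));
    }
}
```

Implementing Runnable is usually the more flexible choice, since the class remains free to extend something else. Extending Thread saves one wrapping step, which is convenient when the object is created directly from Matlab.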

Now let us replace the serial I/O with a very simple dedicated Java thread. Our second calculation (fft) will not need to wait for the I/O to complete, enabling much faster responsiveness on Matlab’s MT. Measured on the MT, we get a 58x (!) speedup, although the file write itself continues in the background after toc returns:

tic
data = rand(5e6,1);  % pre-processing (5M elements, ~40MB)
javaaddpath 'C:\Yair\Code\'  % path to MyJavaThread.class
start(MyJavaThread('F:\test.data',data));  % start running in parallel
data = fft(data);  % post-processing (Java I/O runs in parallel)
toc
 
Elapsed time is 0.170722 seconds.   % 58x speedup !!!

Note that the call to javaaddpath only needs to be done once in the entire Matlab session, not repeatedly. The definition of our Java thread class is very simple (real-life classes would not be as simplistic, but the purpose here is to show the basic concept, not to teach Java threading):

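import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
public class MyJavaThread extends Thread
{
    private final String filename;
    private final double[] doubleData;
    public MyJavaThread(String filename, double[] data)
    {
        this.filename = filename;
        this.doubleData = data;
    }
    @Override
    public void run()
    {
        // try-with-resources (Java 7+) closes the file even if a write fails;
        // buffering avoids a separate low-level write for each 8-byte value
        try (DataOutputStream out = new DataOutputStream(
                 new BufferedOutputStream(new FileOutputStream(filename))))
        {
            // Note: writeDouble() writes big-endian data, whereas Matlab's
            // fwrite defaults to the native byte order (little-endian on x86)
            for (int i=0; i < doubleData.length; i++)
            {
                out.writeDouble(doubleData[i]);
            }
        } catch (Exception ex) {
            // Errors are only reported to the console, not to the caller
            System.out.println(ex.toString());
        }
    }
}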
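
The class above only prints I/O errors, and gives the caller no way of knowing when the write has finished or whether it succeeded. Here is a hedged sketch of one way to address this: a hypothetical variant (CheckedWriterThread, a name I made up) that stores any exception, plus a small demo that waits for the thread with join() and reads the data back. The temporary file and the demo class are illustration-only assumptions:

```java
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical variant of MyJavaThread that remembers any I/O failure
class CheckedWriterThread extends Thread
{
    private final String filename;
    private final double[] doubleData;
    private volatile Exception error;   // stays null if the write succeeded

    CheckedWriterThread(String filename, double[] data)
    {
        this.filename = filename;
        this.doubleData = data;
    }

    @Override
    public void run()
    {
        try (DataOutputStream out = new DataOutputStream(
                 new BufferedOutputStream(new FileOutputStream(filename))))
        {
            for (int i = 0; i < doubleData.length; i++)
                out.writeDouble(doubleData[i]);
        } catch (Exception ex) {
            error = ex;   // keep it for the caller instead of just printing
        }
    }

    Exception getError() { return error; }
}

class CheckedWriterDemo
{
    // Writes data on a background thread, waits for it, then reads it back.
    // Returns true only if the file content matches the input exactly.
    static boolean writeAndVerify(double[] data)
    {
        try {
            File tmp = File.createTempFile("threadDemo", ".data");
            tmp.deleteOnExit();
            CheckedWriterThread t = new CheckedWriterThread(tmp.getPath(), data);
            t.start();
            // ... the caller is free to do other work here ...
            t.join();   // block until the background write is done
            if (t.getError() != null) return false;
            if (tmp.length() != 8L * data.length) return false;
            try (DataInputStream in =
                     new DataInputStream(new FileInputStream(tmp)))
            {
                for (int i = 0; i < data.length; i++)
                    if (in.readDouble() != data[i]) return false;
            }
            return true;
        } catch (IOException ex) {
            return false;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(writeAndVerify(new double[] {1.5, -2.0, 3.25}));
    }
}
```

The same idea applies from the Matlab side: keep a reference to the thread object, continue with other work, and call its join() and getError() methods only at the point where the file is actually needed.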

Note: when compiling a Java class that should be used within Matlab, as above, ensure that you are compiling for a JVM version that is equal to, or lower than Matlab’s JVM, as reported by Matlab’s version function:

% Matlab R2013b uses JVM 1.7, so we can use JVMs up to 7, but not 8
>> version -java
ans =
Java 1.7.0_11-b21 ...
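
If you are unsure which JVM version a compiled class targets, the class file itself tells you: it starts with the magic number 0xCAFEBABE, followed by the minor and major version numbers, where major version 51 corresponds to Java 7 and 52 to Java 8. The following is a minimal sketch of such a check. The class and method names (ClassVersionCheck, majorVersion, loadableBy) are my own hypothetical helpers, not an existing tool. The target version itself is typically set when compiling, using javac's -source and -target options.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassVersionCheck
{
    // Returns the class-file major version (51 = Java 7, 52 = Java 8),
    // or -1 if the stream does not look like a class file
    static int majorVersion(InputStream classBytes)
    {
        try (DataInputStream in = new DataInputStream(classBytes))
        {
            if (in.readInt() != 0xCAFEBABE) return -1;   // magic number
            in.readUnsignedShort();                      // minor version
            return in.readUnsignedShort();               // major version
        } catch (IOException ex) {
            return -1;
        }
    }

    // A JVM can load classes compiled for its own version or lower;
    // major version = Java feature version + 44 (e.g., Java 7 -> 51)
    static boolean loadableBy(int classMajor, int jvmFeatureVersion)
    {
        return classMajor >= 45 && classMajor <= jvmFeatureVersion + 44;
    }

    public static void main(String[] args)
    {
        // Inspect this class's own compiled bytecode as a demonstration
        InputStream self =
            ClassVersionCheck.class.getResourceAsStream("ClassVersionCheck.class");
        if (self == null)
        {
            System.out.println("class file not found");
            return;
        }
        int major = majorVersion(self);
        System.out.println("major version " + major
            + ", loadable by a Java 7 JVM: " + loadableBy(major, 7));
    }
}
```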

Matlab synchronization

Java (and C++/.Net) threads are very effective when they can run entirely independently from Matlab’s main thread. But what if we need to synchronize the other thread with Matlab's MT? For example, what if the Java code needs to run some Matlab function, or access some Matlab data? In MEX this could be done using the dedicated and documented MEX functions; in Java this can be done using the undocumented/unsupported JMI (Java-Matlab Interface) package. Note that using standard Java Threads without Matlab synchronization is fully supported; it is only the JMI package that is undocumented and unsupported.

Here is the relevant code snippet for evaluating Matlab code within a Java thread:

import com.mathworks.jmi.Matlab;  //in %matlabroot%/java/jar/jmi.jar
...
Matlab matlabEngine = new Matlab();
...
Matlab.whenMatlabReady(runnableClass);

Where runnableClass is a class whose run() method includes calls to com.mathworks.jmi.Matlab methods such as:

matlabEngine.mtEval("plot(data)");
Object value = matlabEngine.mtFeval("min", new Object[]{a,b}, 1); //2 inputs 1 output; cast the result as needed

Unfortunately, we cannot call matlabEngine's methods directly from our Java thread: to ensure synchronization, Matlab only allows these methods to be called from the MT, which is the reason for the runnableClass. Indeed, synchronizing Java code with MATLAB can be quite tricky, and can easily deadlock MATLAB. To alleviate some of the risk, I advise against using the JMI class directly. Instead, use Joshua Kaplan's MatlabControl class, a user-friendly JMI wrapper.

Note that Java's native invokeAndWait() method, which targets Java's Event Dispatch Thread, cannot be used to synchronize with Matlab. M-code executes as a single uninterrupted thread (MT). Events are simply queued by Matlab's interpreter and processed when we relinquish control by requesting drawnow, pause, wait, waitfor etc. Matlab synchronization is robust and predictable, yet it forces us to use the whenMatlabReady(runnableClass) mechanism to add to the event queue. The next time drawnow etc. is called in M-code, the event queue is flushed and our submitted code is processed by Matlab's interpreter.
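
The queuing behavior described above can be illustrated with a toy model in plain Java. To be clear, this is only an analogy under my own assumptions, not Matlab's actual implementation and not the JMI API: the names ToyInterpreter, whenReady and yieldControl are hypothetical stand-ins for the interpreter, whenMatlabReady() and drawnow/pause respectively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy model of a single-threaded interpreter that only runs queued
// callbacks when its main thread explicitly yields control
class ToyInterpreter
{
    private final ConcurrentLinkedQueue<Runnable> pending =
        new ConcurrentLinkedQueue<Runnable>();

    // Stand-in for whenMatlabReady(): may be called from any thread
    void whenReady(Runnable callback) { pending.add(callback); }

    // Stand-in for drawnow/pause: called on the main thread only.
    // Runs all queued callbacks and returns how many were processed.
    int yieldControl()
    {
        int processed = 0;
        Runnable r;
        while ((r = pending.poll()) != null)
        {
            r.run();
            processed++;
        }
        return processed;
    }
}

class ToyInterpreterDemo
{
    static List<String> run()
    {
        final ToyInterpreter interp = new ToyInterpreter();
        // The log is only ever touched by the main thread
        final List<String> log = new ArrayList<String>();

        // Background thread: cannot run "interpreter" code itself,
        // so it submits a callback to the queue instead
        Thread worker = new Thread(new Runnable() {
            @Override public void run() {
                interp.whenReady(new Runnable() {
                    @Override public void run() { log.add("callback"); }
                });
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }

        log.add("main code");    // callback is queued but has not run yet
        interp.yieldControl();   // callback runs here, on the main thread
        log.add("after yield");
        return log;
    }

    public static void main(String[] args)
    {
        System.out.println(run());
    }
}
```

The point of the model is the ordering: the callback submitted by the background thread runs only once the main thread yields, and it runs on the main thread, so it never executes concurrently with the main code.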

Java threading can be quite tricky even without the Matlab synchronization complexity. Deadlock, starvation and race conditions are frequent problems with Java threads. Basic Java synchronization is relatively easy, using the synchronized keyword. But getting the synchronization to work correctly is much more difficult and requires expertise that many Java programmers lack. In fact, many Java programmers who use threads are not even aware that their thread synchronization is buggy and that their code is not thread-safe.
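
As a small illustration of both the synchronized keyword and of how easily a race condition creeps in, consider the following sketch. The Counter and CounterDemo classes are hypothetical examples of mine. Several threads increment two counters: one through a plain method and one through a synchronized method.

```java
class Counter
{
    private int unsafeCount = 0;
    private int safeCount = 0;

    // Read-modify-write without locking: concurrent calls can lose updates
    void incrementUnsafe() { unsafeCount++; }

    // Only one thread at a time can execute a synchronized method
    // of the same object, so no update is lost
    synchronized void incrementSafe() { safeCount++; }

    int getUnsafe() { return unsafeCount; }
    synchronized int getSafe() { return safeCount; }
}

class CounterDemo
{
    // Starts numThreads threads that each increment both counters
    // incrementsPerThread times, then waits for all of them to finish
    static Counter hammer(int numThreads, final int incrementsPerThread)
    {
        final Counter counter = new Counter();
        Thread[] threads = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++)
        {
            threads[t] = new Thread(new Runnable() {
                @Override public void run() {
                    for (int i = 0; i < incrementsPerThread; i++)
                    {
                        counter.incrementUnsafe();
                        counter.incrementSafe();
                    }
                }
            });
            threads[t].start();
        }
        for (int t = 0; t < numThreads; t++)
        {
            try {
                threads[t].join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }
        return counter;
    }

    public static void main(String[] args)
    {
        Counter c = hammer(4, 100000);
        System.out.println("expected:     " + (4 * 100000));
        System.out.println("synchronized: " + c.getSafe());
        System.out.println("unprotected:  " + c.getUnsafe());
    }
}
```

The synchronized counter always reaches the expected total. The unprotected counter may or may not, depending on timing, which is exactly what makes such bugs hard to notice: the code can appear to work on one run or machine and fail on another.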

My general advice is to use Java threads just for simple independent tasks that require minimal interaction with other threads, the Matlab engine, and/or shared resources.

Additional alternatives

In addition to Java threads, we can use other technologies for multi-threading in Matlab. Next week's article will explore Dot-Net (C#) threads and timers, and that will be followed by a variety of options for C++ threads and IPC with spawned processes. So don't let anyone complain any longer about not having multi-threading in Matlab. It's not trivial, but it's also not rocket science, and there are plenty of alternatives out there.

Still, admittedly MT's current single-threaded implementation is a pain-in-the-so-and-so, a relic of a decades-old design. A likely future improvement to the Matlab M-code interpreter would be to make it thread-safe. This would enable automatic conversion of for loops into multiple threads running on multiple local CPUs/cores, significantly improving Matlab's standard performance and essentially eliminating the need for a separate parfor in PCT (imagine me drooling here). Then again, this might reduce PCT sales...

Advanced Matlab Programming course – London 10-11 March, 2014

If Matlab performance interests you, consider joining my Advanced Matlab Programming course in London on 10-11 March, 2014. In this course/seminar I will explore numerous other ways by which we can improve Matlab's performance and create professional code. This is a unique opportunity to take your Matlab skills to a higher level within a couple of days. Registration closes this Friday, so don't wait too long.

 
