Quantcast
Viewing all articles
Browse latest Browse all 219

Serializing/deserializing Matlab data

Last year I wrote an article on improving the performance of the save function. The article discussed various ways by which we can store Matlab data on disk. However, in many cases we are interested in a byte-stream serialization, in order to transmit information to external processes.

Image may be NSFW.
Clik here to view.
The request to get a serialized byte-stream of Matlab data has been around for many years (example), but MathWorks has never released a documented way of serializing and unserializing data, except by storing onto a disk file and later loading it from file. Naturally, using a disk file significantly degrades performance. We could always use a RAM-disk or flash memory for improved performance, but in any case this seems like a major overkill to such a simple requirement.

In last year’s article, I presented a File Exchange utility for such generic serialization/deserialization. However, that utility is limited in the types of data that it supports, and while it is relatively fast, there is a much better, more generic and faster solution.

The solution appears to use the undocumented built-in functions getByteStreamFromArray and getArrayFromByteStream, which are apparently used internally by the save and load functions. The usage is very simple:

byteStream = getByteStreamFromArray(anyData);  % 1xN uint8 array
anyData = getArrayFromByteStream(byteStream);

Many Matlab functions, documented and undocumented alike, are defined in XML files within the %matlabroot%/bin/registry/ folder; our specific functions can be found in %matlabroot%/bin/registry/hgbuiltins.xml. While other functions include information about their location and number of input/output args, these functions do not. Their only XML attribute is type = ":all:", which seems to indicate that they accept all data types as input. Despite the fact that the functions are defined in hgbuiltins.xml, they are not limited to HG objects – we can serialize basically any Matlab data: structs, class objects, numeric/cell arrays, sparse data, Java handles, timers, etc. For example:

% Simple Matlab data
>> byteStream = getByteStreamFromArray(pi)  % 1x72 uint8 array
byteStream =
  Columns 1 through 19
    0    1   73   77    0    0    0    0   14    0    0    0   56    0    0    0    6    0    0
  Columns 20 through 38
    0    8    0    0    0    6    0    0    0    0    0    0    0    5    0    0    0    8    0
  Columns 39 through 57
    0    0    1    0    0    0    1    0    0    0    1    0    0    0    0    0    0    0    9
  Columns 58 through 72
    0    0    0    8    0    0    0   24   45   68   84  251   33    9   64
 
>> getArrayFromByteStream(byteStream)
ans =
          3.14159265358979
 
% A cell array of several data types
>> byteStream = getByteStreamFromArray({pi, 'abc', struct('a',5)});  % 1x312 uint8 array
>> getArrayFromByteStream(byteStream)
ans = 
    [3.14159265358979]    'abc'    [1x1 struct]
 
% A Java object
>> byteStream = getByteStreamFromArray(java.awt.Color.red);  % 1x408 uint8 array
>> getArrayFromByteStream(byteStream)
ans =
java.awt.Color[r=255,g=0,b=0]
 
% A Matlab timer
>> byteStream = getByteStreamFromArray(timer);  % 1x2160 uint8 array
>> getArrayFromByteStream(byteStream)
 
   Timer Object: timer-2
 
   Timer Settings
      ExecutionMode: singleShot
             Period: 1
           BusyMode: drop
            Running: off
 
   Callbacks
           TimerFcn: ''
           ErrorFcn: ''
           StartFcn: ''
            StopFcn: ''
 
% A Matlab class object
>> byteStream = getByteStreamFromArray(matlab.System);  % 1x1760 uint8 array
>> getArrayFromByteStream(byteStream)
ans = 
  System: matlab.System

Serializing HG objects

Of course, we can also serialize/deserialize also HG controls, plots/axes and even entire figures. When doing so, it is important to serialize the handle of the object, rather than its numeric handle, since we are interested in serializing the graphic object, not the scalar numeric value of the handle:

% Serializing a simple figure with toolbar and menubar takes almost 0.5 MB !
>> hFig = handle(figure);  % a new default Matlab figure
>> length(getByteStreamFromArray(hFig))
ans =
      479128
 
% Removing the menubar and toolbar removes much of this amount:
>> set(hFig, 'menuBar','none', 'toolbar','none')
>> length(getByteStreamFromArray(hFig))
ans =
       11848   %!!!
 
% Plot lines are not nearly as "expensive" as the toolbar/menubar
>> x=0:.01:5; hp=plot(x,sin(x));
>> byteStream = getByteStreamFromArray(hFig);
>> length(byteStream)
ans =
       33088
 
>> delete(hFig);
>> hFig2 = getArrayFromByteStream(byteStream)
hFig2 =
	figure

The interesting thing here is that when we deserialize a byte-stream of an HG object, it is automatically rendered onscreen. This could be very useful for persistence mechanisms of GUI applications. For example, we can save the figure handles in file so that if the application crashes and relaunches, it simply loads the file and we get exactly the same GUI state, complete with graphs and what-not, just as before the crash. Although the figure was deleted in the last example, deserializing the data caused the figure to reappear.

We do not need to serialize the entire figure. Instead, we could choose to serialize only a specific plot line or axes. For example:

>> x=0:0.01:5; hp=plot(x,sin(x));
>> byteStream = getByteStreamFromArray(handle(hp));  % 1x13080 uint8 array
>> hLine = getArrayFromByteStream(byteStream)
ans =
	graph2d.lineseries

This could also be used to easily clone (copy) any figure or other HG object, by simply calling getArrayFromByteStream (note the corresponding copyobj function, which I bet uses the same underlying mechanism).

Also note that unlike HG objects, deserialized timers are NOT automatically restarted; perhaps the Running property is labeled transient or dependent. Properties defined with these attributes are apparently not serialized.

Performance aspects

Using the builtin getByteStreamFromArray and getArrayFromByteStream functions can provide significant performance speedups when caching Matlab data. In fact, it could be used to store otherwise unsupported objects using the save -v6 or savefast alternatives, which I discussed in my save performance article. Robin Ince has shown how this can be used to reduce the combined caching/uncaching run-time from 115 secs with plain-vanilla save, to just 11 secs using savefast. Robin hasn’t tested this in his post, but since the serialized data is a simple uint8 array, it is intrinsically supported by the save -v6 option, which is the fastest alternative of all:

>> byteStream = getByteStreamFromArray(hFig);
>> tic, save('test.mat','-v6','byteStream'); toc
Elapsed time is 0.001924 seconds.
 
>> load('test.mat')
>> data = load('test.mat')
data = 
    byteStream: [1x33256 uint8]
>> getArrayFromByteStream(data.byteStream)
ans =
	figure

Finally, note that as built-in functions, these functions could change without prior notice on any future Matlab release.

MEX interface – mxSerialize/mxDeserialize

To complete the picture, MEX includes a couple of undocumented functions mxSerialize and mxDeserialize, which correspond to the above functions. I assume that getByteStreamFromArray and getArrayFromByteStream call them internally. Back in 2007, Brad Phelan wrote a MEX wrapper that could be used directly in Matlab (mxSerialize.c, mxDeserialize.c). The C interface was very simple, and so was the usage:

#include "mex.h"
 
EXTERN_C mxArray* mxSerialize(mxArray const *);
EXTERN_C mxArray* mxDeserialize(const void *, size_t);
 
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nlhs && nrhs) {
          plhs[0] = (mxArray *) mxSerialize(prhs[0]);
        //plhs[0] = (mxArray *) mxDeserialize(mxGetData(prhs[0]), mxGetNumberOfElements(prhs[0]));
    }
}

Unfortunately, MathWorks has removed the C interface for these functions from libmx in R2014a, keeping only their C++ interfaces (If you have ever used these C++ interfaces, please leave a comment below):

mxArray* matrix::detail::noninlined::mx_array_api::mxSerialize(mxArray const *anyData)
mxArray* matrix::detail::noninlined::mx_array_api::mxDeserialize(void const *byteStream, unsigned __int64 numberOfBytes)
mxArray* matrix::detail::noninlined::mx_array_api::mxDeserializeWithTag(void const *byteStream, unsigned __int64 numberOfBytes, char const* *tagName)

The roundabout alternative is of course to use mexCallMATLAB to invoke getByteStreamFromArray and getArrayFromByteStream. This is actually rather silly, but it works…

These are not the only MEX functions that were removed from libmx in R2014a. Hundreds of other C functions were also removed with them, some of them quite important. A few hundred new C++ functions were added in their place, but I fear that these are not readily accessible to MEX users. libmx has always changed between Matlab releases, but not so drastically for many years. If you rely on any undocumented MEX functions in your code, now would be a good time to recheck it, before R2014a is officially released.

p.s. – Happy 30th anniversary, MathWorks!

 
Related posts:
  1. Controlling plot data-tips Data-tips are an extremely useful plotting tool that can easily be controlled programmatically....
  2. Draggable plot data-tips Matlab's standard plot data-tips can be customized to enable dragging, without being limitted to be adjacent to their data-point. ...
  3. Accessing plot brushed data Plot data brushing can be accessed programmatically using very simple pure-Matlab code...
  4. Undocumented Matlab MEX API Matlab's MEX API contains numerous undocumented functions, that can be extremely useful. ...
 

Viewing all articles
Browse latest Browse all 219

Trending Articles