<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>asyndetic.com &#187; embedded</title>
	<atom:link href="http://www.asyndetic.com/blog/tag/embedded/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.asyndetic.com/blog</link>
	<description></description>
	<lastBuildDate>Tue, 17 Jan 2012 06:49:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Visualizing the Heap on Embedded Systems – Part II</title>
		<link>http://www.asyndetic.com/blog/2010/07/30/visualizing-the-heap-on-embedded-systems-part-ii/</link>
		<comments>http://www.asyndetic.com/blog/2010/07/30/visualizing-the-heap-on-embedded-systems-part-ii/#comments</comments>
		<pubDate>Fri, 30 Jul 2010 06:13:52 +0000</pubDate>
		<dc:creator>Dan Savilonis</dc:creator>
				<category><![CDATA[embedded]]></category>
		<category><![CDATA[heap]]></category>
		<category><![CDATA[malloc]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://asyndetic.com/blog/?p=55</guid>
		<description><![CDATA[In the last article, I described a method to acquire heap allocation data from an embedded system. Next, I&#8217;ll describe how to visualize the data. First, though, to make things easier, I&#8217;ll acquire some real data from a regular Linux application. For simplicity, I profiled a Linux application built with gcc, but the same principle [...]]]></description>
			<content:encoded><![CDATA[<p>In the last article, I described a method to acquire heap allocation data from an embedded system. Next, I&#8217;ll describe how to visualize the data. First, though, to make things easier, I&#8217;ll acquire some real data from a regular Linux application.</p>
<p>For simplicity, I profiled a Linux application built with gcc, but the same principle applies to an embedded application. With gcc, there is a useful shortcut for wrapping the allocation calls that doesn&#8217;t require editing the code or modifying the objects directly. The linker ld provides a built-in option, <em>--wrap</em>, which resolves undefined references to a symbol to <em>__wrap_symbol</em> instead; the wrapper can in turn call <em>__real_symbol</em> to reach the original function. You can pass this option through gcc to the linker as appropriate:</p>
<pre>CFLAGS = -g -O0 -Wall -Wl,--wrap,malloc -Wl,--wrap,free -Wl,--wrap,calloc -Wl,--wrap,realloc</pre>
<p>Next, just define your wrappers in a fashion similar to the previous article:</p>
<pre class="brush: cpp; title: ; notranslate">
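#include &lt;stdio.h&gt;

/* Resolved by the linker to the original functions when --wrap is used. */
void *__real_malloc(size_t size);
void __real_free(void *ptr);
void *__real_realloc(void *ptr, size_t size);
void *__real_calloc(size_t nmemb, size_t size);

/* Each wrapper logs one CSV record: type, size in bytes, address. */
void *__wrap_malloc(size_t size)
{
	void *ret = __real_malloc(size);

	fprintf(stderr, &quot;m,%lu,0x%8.8lx\n&quot;, (unsigned long)size, (unsigned long)ret);
	return ret;
}

void __wrap_free(void *ptr)
{
	__real_free(ptr);
	fprintf(stderr, &quot;f,,0x%8.8lx\n&quot;, (unsigned long)ptr);
}

/* realloc is logged as a free of the old block followed by a malloc of the new one. */
void *__wrap_realloc(void *ptr, size_t size)
{
	/* Save the old address first; ptr must not be used after a successful realloc. */
	unsigned long old = (unsigned long)ptr;
	void *ret = __real_realloc(ptr, size);

	fprintf(stderr, &quot;f,,0x%8.8lx\n&quot;, old);
	fprintf(stderr, &quot;m,%lu,0x%8.8lx\n&quot;, (unsigned long)size, (unsigned long)ret);
	return ret;
}

void *__wrap_calloc(size_t nmemb, size_t size)
{
	void *ret = __real_calloc(nmemb, size);

	fprintf(stderr, &quot;c,%lu,0x%8.8lx\n&quot;, (unsigned long)(size * nmemb), (unsigned long)ret);
	return ret;
}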
</pre>
<p>Now, I just ran the newly compiled program and directed stderr to a file, and then ran the malloc_analyze.py program as before.</p>
<p>Finally, we can produce something useful from this data to analyze our heap usage. First, it&#8217;s good to take a look at the overall heap usage over time (or in this case, over number of allocation calls). You can do this easily enough in a pylab session:</p>
<pre class="brush: python; title: ; notranslate">
x = csv2rec('fetchorigin_analyzed.txt', names=('type','size','addr','total','count'))
plot(x.total)
xlabel('Number of allocations')
ylabel('Heap Allocated (bytes)')
</pre>
<div id="attachment_77" class="wp-caption aligncenter" style="width: 310px"><a href="http://asyndetic.com/blog/wp-content/uploads/2010/07/heap_total.png"><img class="size-medium wp-image-77 colorbox-55" title="Total Heap Usage" src="http://asyndetic.com/blog/wp-content/uploads/2010/07/heap_total-300x219.png" alt="" width="300" height="219" /></a><p class="wp-caption-text">Total Heap Usage</p></div>
<p>With this graph, you can now choose some points at which you&#8217;d like to visualize the heap usage and fragmentation. For this example, I chose three points: 400, 800 and 900. I wrote a short script to take the analyzed data and plot it as a bar, color-coded based on how much space is in use at a given point in the memory space. The general idea is:</p>
<ol>
<li>Find the first (lowest) address where heap memory is allocated.</li>
<li>Map each addressable byte in the memory region to a 1 or 0 depending on whether it is in use.</li>
<li>Bin the data into reasonably sized chunks (e.g. 512 bytes) and calculate how much of each chunk is in use.</li>
<li>Produce a graph with a colored box for each bin, coded by how full it is.</li>
</ol>
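<p>The first three steps can be sketched in a few lines of plain Python, without any plotting. This is a minimal illustration rather than the script used for the figures: the function name <em>bin_occupancy</em>, its parameters and the tiny event list are made up for the example, and frees are assumed to carry a negative size as in the analyzed data.</p>
<pre class="brush: python; title: ; notranslate">
def bin_occupancy(events, base, region_size, binsize):
    '''Return the in-use fraction of each bin (hypothetical helper).'''
    # Step 2: one flag per byte, 1 if allocated and 0 if free
    used = [0] * region_size
    for kind, size, addr in events:
        start = addr - base
        flag = 0 if kind == 'f' else 1
        for loc in range(start, start + abs(size)):
            used[loc] = flag
    # Step 3: sum the flags in each bin and normalize by the bin size
    return [sum(used[i:i + binsize]) / float(binsize)
            for i in range(0, region_size, binsize)]


# Step 1: the lowest allocated address becomes the base of the map
events = [('m', 12, 0x50000000),
          ('m', 10, 0x5000000C),
          ('f', -12, 0x50000000)]
base = min(addr for kind, size, addr in events)

print(bin_occupancy(events, base, 32, 8))
</pre>
<p>In this example the first 12 bytes have been freed again, so the first 8-byte bin is empty while the next two are half and three-quarters full. Step 4 then only has to map each fraction to a color.</p>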
<p>The script below does this. It&#8217;s certainly not very efficient, but it works just fine for the small data sets I was dealing with. The start address and the points in time at which to produce a graph are hard-coded and need to be adjusted for your data.</p>
<pre class="brush: python; title: ; notranslate">
import csv
import sys
import numpy as N
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

# Usage: pass the analyzed CSV file and the bin size in bytes
binsize = int(sys.argv[2])

# Lowest heap address in the data; adjust for your own trace
base = 0x12c0010

# Read the analyzed data once: type, size, addr, total, count
with open(sys.argv[1]) as f:
    mdata = list(csv.reader(f, delimiter=','))

fig = plt.figure()

for p, end in enumerate([400, 800, 900], 1):
    ax = fig.add_subplot(9, 1, p)

    # One entry per byte of a 160 KB region: 1 if in use, 0 if free
    mmap = [0] * (1024 * 160)

    for row in mdata[0:end]:
        # Frees carry a negative size after backfilling, hence abs()
        size = abs(int(row[1]))
        addr = int(row[2], 16) - base

        if row[0] == 'm' or row[0] == 'c':
            for loc in range(addr, addr + size):
                mmap[loc] = 1
        elif row[0] == 'f':
            for loc in range(addr, addr + size):
                mmap[loc] = 0

    print(len(mdata[0:end]))

    # Count the bytes in use within each bin
    mmapbin = [sum(mmap[i:i + binsize]) for i in range(0, len(mmap), binsize)]

    patches = ax.bar(N.arange(len(mmapbin)), [1] * len(mmapbin), linewidth=0, width=1)

    fracs = [float(x) / binsize for x in mmapbin]
    norm = colors.Normalize(0, max(fracs))

    # Color each bar by how full its bin is
    for thisfrac, thispatch in zip(fracs, patches):
        thispatch.set_facecolor(cm.jet(norm(thisfrac)))

    ax.set_yticks([0, 1])
    ax.set_ylabel(str(end))

# Show the figure once all subplots are drawn
plt.show()
</pre>
<p>And the result you get is a nice picture that can give you an intuitive view of how your heap is fragmented.</p>
<div id="attachment_78" class="wp-caption aligncenter" style="width: 310px"><a href="http://asyndetic.com/blog/wp-content/uploads/2010/07/heap_view.png"><img class="size-medium wp-image-78 colorbox-55" title="Heap Visualization" src="http://asyndetic.com/blog/wp-content/uploads/2010/07/heap_view-300x214.png" alt="" width="300" height="214" /></a><p class="wp-caption-text">Visualization of the heap at three points in time</p></div>
<p>The graph could be cleaned up, but you can get a very good idea of what the heap looks like just from this graph. The x-axis units are in bin sizes, and could be converted to memory addresses to be more useful.</p>
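<p>Converting a bin index back to an address is a single multiplication and offset. A small sketch, assuming the start address hard-coded in the script above and a bin size of 512 bytes as in the earlier example (the helper name is made up for illustration):</p>
<pre class="brush: python; title: ; notranslate">
def bin_to_address(index, base, binsize):
    '''Return the address of the first byte covered by a bin (hypothetical helper).'''
    return base + index * binsize


base = 0x12c0010   # start address hard-coded in the script above
binsize = 512      # assumed bin size

for index in (0, 1, 3):
    print(index, hex(bin_to_address(index, base, binsize)))
</pre>
<p>Values like these could then be used as tick labels on the x-axis in place of the raw bin numbers.</p>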
]]></content:encoded>
			<wfw:commentRss>http://www.asyndetic.com/blog/2010/07/30/visualizing-the-heap-on-embedded-systems-part-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visualizing the Heap on Embedded Systems</title>
		<link>http://www.asyndetic.com/blog/2010/02/21/visualizing-the-heap-on-embedded-systems/</link>
		<comments>http://www.asyndetic.com/blog/2010/02/21/visualizing-the-heap-on-embedded-systems/#comments</comments>
		<pubDate>Sun, 21 Feb 2010 23:35:57 +0000</pubDate>
		<dc:creator>Dan Savilonis</dc:creator>
				<category><![CDATA[embedded]]></category>
		<category><![CDATA[heap]]></category>
		<category><![CDATA[malloc]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://asyndetic.com/blog/?p=13</guid>
		<description><![CDATA[Debugging a memory leak can seem trivial compared to debugging fragmentation. Faced with such a problem recently, I decided I really needed to visualize what the heap looked like to determine how to fix the problem. Many embedded systems avoid using dynamic memory allocation entirely to avoid just this kind of problem, among others. But [...]]]></description>
			<content:encoded><![CDATA[<p>Debugging a memory leak can seem trivial compared to debugging fragmentation. Faced with such a problem recently, I decided I really needed to visualize what the heap looked like to determine how to fix the problem.</p>
<p>Many embedded systems avoid using dynamic memory allocation entirely to avoid just this kind of problem, among others. But the larger your system is, the more likely you are to want it to make life easier and more efficient. If your system is large enough, you may end up running embedded Linux and have all the power that entails. But if you&#8217;re stuck in the middle, you have an embedded system running some kind of RTOS using malloc and not nearly enough memory to do much of any useful debugging on target. In my case, we do have some tools for memory tracking and enough memory to theoretically use them, but they didn&#8217;t do what I wanted.</p>
<p>I was left uninspired after a number of Google searches, so I decided to blaze my own path and develop my own tool. There are three primary steps to visualizing heap data in this approach:</p>
<ol>
<li>Instrument ([mc]|re)alloc and free calls</li>
<li>Process raw data</li>
<li>Visualize processed data</li>
</ol>
<p>I had previously worked on the first two steps, and saved my work, but it was really a one-off hack. This time I decided to automate the entire process.</p>
<h2>Instrument ([mc]|re)alloc and free calls</h2>
<p>The first step involves modifying your code to run on target with sufficient instrumentation to record memory allocation.</p>
<h3>Instrument memory allocation calls</h3>
<p>The first thing you need to do is modify ([mc]|re)alloc and free calls so that they can do extra processing. If you&#8217;ve got the source code to your malloc and free, then you&#8217;re done; you can modify it directly. The same applies if you consistently use a malloc wrapper in your code. In my case, neither was true, so I had to do more work.</p>
<p>There are at least two options:</p>
<ol>
<li>Replace all calls with wrapped calls</li>
<li>Link in replacement versions of ([mc]|re)alloc and free</li>
</ol>
<p>The second option seemed more difficult than what I wanted to deal with since these functions are not weakly linked, and this would require some linker magic. So, I opted for option 1.</p>
<p>I wrote a simple Python script to replace all calls to malloc, calloc and free with instrumented versions: imalloc, icalloc and ifree. realloc is not used in our codebase. The script is not a lexer, but a few simple regular expressions were sufficient to catch all instances of these calls (and the ones inside comments, for good measure).</p>
<pre class="brush: python; title: ; notranslate">
import fileinput
import os
import fnmatch
import re
import sys

# Match calls such as malloc(...); \b keeps names like my_malloc untouched
malloc_re = re.compile(r'\bmalloc[(](.*?)[)]')
calloc_re = re.compile(r'\bcalloc[(](.*?)[)]')
free_re = re.compile(r'\bfree[(](.*?)[)]')

for root, dirs, files in os.walk('.'):
    candidates = fnmatch.filter(files, '*.c')
    for name in candidates:
        print(name)

        # With inplace=1, anything written to stdout replaces the file contents
        for line in fileinput.input(os.path.join(root, name), inplace=1):

            line = re.sub(malloc_re, r'imalloc(\1)', line)
            line = re.sub(calloc_re, r'icalloc(\1)', line)
            line = re.sub(free_re, r'ifree(\1)', line)

            sys.stdout.write(line)
</pre>
<p>My litmus test for this script was whether the code linked. It did.</p>
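<p>To see what the substitutions change and what they leave alone, here is a small check on a made-up line of C. The sample string and the <em>instrument</em> helper are invented for illustration; the patterns are the ones from the script above.</p>
<pre class="brush: python; title: ; notranslate">
import re

malloc_re = re.compile(r'\bmalloc[(](.*?)[)]')
calloc_re = re.compile(r'\bcalloc[(](.*?)[)]')
free_re = re.compile(r'\bfree[(](.*?)[)]')


def instrument(line):
    '''Apply the three substitutions to one line of source.'''
    line = malloc_re.sub(r'imalloc(\1)', line)
    line = calloc_re.sub(r'icalloc(\1)', line)
    line = free_re.sub(r'ifree(\1)', line)
    return line


sample = 'p = malloc(sizeof(struct foo)); my_free(q); free(p);'
once = instrument(sample)
print(once)

# A second pass changes nothing, since \b does not match inside imalloc or ifree
print(instrument(once) == once)
</pre>
<p>The call to my_free is left untouched because the underscore counts as a word character, and running the script twice over the same tree should be harmless for the same reason.</p>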
<h3>Writing imalloc</h3>
<p>The next step is highly platform-dependent and will vary given your needs and restrictions. The basic idea is to write wrapper functions that record the parameters and return value of each call. So, first, just call the appropriate real function inside the wrappers. Next, decide how you want to collect your data. You could store it in memory, but that&#8217;s almost certainly not practical given the constraints mentioned in this article. I decided to push the data out through a UART running at 115200 baud. You can potentially use any external interface, though choosing one that requires use of malloc would likely be a bad idea. I chose the UART simply because it was the first thing that came to mind, but it in fact has a number of desirable features:</p>
<ol>
<li>Relatively fast (compared to some other choices, like the JTAG debugger messaging interface)</li>
<li>Extremely simple (doesn&#8217;t use malloc, simple interface requires little stack use)</li>
<li>Already bound to stdio in this application</li>
</ol>
<p>The second point is important to keep in mind. You may call malloc all over the place in your code, in the context of different threads. In my case, some of the threads are huge with 8K of stack space, but others are tiny (a couple hundred bytes) with little room to spare. It is safest to keep the instrumentation as lightweight as possible so you don&#8217;t blow the stack. I was overly optimistic the first time I tried this and just called printf directly. I ended up settling on an implementation that seems to work well enough:</p>
<pre class="brush: cpp; title: ; notranslate">
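static char str[64];
...
/* Format into the static buffer, then write it out with the lightweight puts(). */
sprintf(str, &quot;m,%lu,0x%8.8lx&quot;, (unsigned long)size, (unsigned long)ptr);
puts(str);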
</pre>
<p>This generates a line of CSV, with the type of call (m for malloc), the size allocated, and the address of the allocation. A similar implementation for free does the same thing, but leaves the size field blank since it is unknown. My very first attempt put str on the stack, but that blew up in some of the tighter threads. Hopefully you&#8217;ll spot the big mistake I made here more quickly than I did. In my meager defense, when I initially did this, I was only interested in the initial mallocs at startup.</p>
<p>Being lazy, I stuck this code in an existing C file and turned off the annoying stop-on-warnings compiler flag we had enabled so the new wrapper functions would link. I could have been more thorough and either modified stdlib.h or included a new header file, but I didn&#8217;t.</p>
<p>This ran fine until multiple threads started calling malloc and my analysis script started reporting errors. Of course, my wrapper function is missing the global lock that malloc uses for thread-safety. I added global lock calls around the wrappers to create the final code:</p>
<pre class="brush: cpp; title: ; notranslate">
void *imalloc(size_t size) {
    void *ptr;
    static char str[64];

    __global_lock_acquire();
    ptr = malloc(size);
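    /* The static buffer is only safe because the global lock is held here. */
    sprintf(str, &quot;m,%lu,0x%8.8lx&quot;, (unsigned long)size, (unsigned long)ptr);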
    puts(str);
    __global_lock_release();

    return ptr;
}
</pre>
<p>Running the instrumented code on target produces a nice long list of CSV which I can record indefinitely with a terminal program attached to the UART:</p>
<pre class="brush: plain; title: ; notranslate">
m,12,0x50000000
m,10,0x5000000C
c,40,0x50000034
f,,0x50000000
f,,0x5000000C
m,12,0x50000000
</pre>
<p>We&#8217;ve now acquired all the data we need from the target. By itself, it&#8217;s not very insightful, but once it is processed we can learn a great deal from it.</p>
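<p>If the terminal log also contains other output, the trace records can be picked out by their shape. A minimal sketch, assuming records look exactly like the sample above (a type of m, c or f, an optional decimal size, and a hexadecimal address); the pattern, the helper name and the log lines are illustrative only:</p>
<pre class="brush: python; title: ; notranslate">
import re

# type, optional decimal size, hexadecimal address
record_re = re.compile(r'^[mcf],[0-9]*,0x[0-9A-Fa-f]+$')


def filter_records(lines):
    '''Keep only lines that look like allocation records (hypothetical helper).'''
    return [line.strip() for line in lines if record_re.match(line.strip())]


log = ['boot: uart ready',
       'm,12,0x50000000',
       'task 3 started',
       'f,,0x50000000',
       'm,1']

print(filter_records(log))
</pre>
<p>Truncated records such as the last line are dropped along with the unrelated output, which is worth knowing about because a dropped record will show up later as a mismatch.</p>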
<h2>Processing the Raw Data</h2>
<p>The raw data contains only the information known from the parameters of the functions called. Thus, we don&#8217;t have the size of the allocation freed in the free lines. This can be taken care of easily enough with a small script and linear search backward through the data. For example, given</p>
<p>f,,0x50000000</p>
<p>We don&#8217;t know how much was allocated at 0x50000000. However, if we step backward through the list, we can look for the first allocation line we find with the same address:</p>
<p>m,12,0x50000000</p>
<p>We now know the allocation size (12) and can fill in the missing value and move on to the next entry.</p>
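<p>The backward search just described can be written down directly. A minimal sketch on the sample records from above; the function name is made up for the example, and it assumes the trace is complete and in order:</p>
<pre class="brush: python; title: ; notranslate">
def find_alloc_size(rows, index):
    '''Search backward from a free record for the allocation it releases.'''
    addr = rows[index][2]
    for kind, size, prev_addr in reversed(rows[:index]):
        if kind in ('m', 'c') and prev_addr == addr:
            return int(size)
    return None  # no matching allocation was recorded


raw = ['m,12,0x50000000',
       'm,10,0x5000000C',
       'c,40,0x50000034',
       'f,,0x50000000',
       'f,,0x5000000C']
rows = [line.split(',') for line in raw]

for i, row in enumerate(rows):
    if row[0] == 'f':
        print(row[2], find_alloc_size(rows, i))
</pre>
<p>Each free costs a scan over the earlier records, which is fine for short traces but grows quickly for long ones.</p>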
<p>My data was interleaved with other debug trace output, so I passed it through a line filter first. Then, I wrote another script to backfill the sizes in the free entries:</p>
<pre class="brush: python; title: ; notranslate">
import csv
import sys

mem = csv.reader(open(sys.argv[1]), delimiter=',')

mallocd = {}  # live allocations, keyed by address string
total = 0     # bytes currently allocated
num = 0       # index of the current record

for row in mem:
    if row[0] == 'm' or row[0] == 'c':
        row[1] = str(int(row[1]))
        # An address that is already live at this point indicates missing
        # or corrupt data; this is a good place for debugging checks.
        mallocd[row[2]] = row
        total += int(row[1])
        row.append(str(total))
        row.append(str(num))
        print(','.join(row))
    elif row[0] == 'f':
        try:
            # Backfill the size as a negative number and retire the allocation
            row[1] = str(-int(mallocd[row[2]][1]))
            del mallocd[row[2]]
        except KeyError:
            # No matching allocation was recorded
            row[1] = str(0)
        total += int(row[1])
        row.append(str(total))
        row.append(str(num))
        print(','.join(row))
    else:
        print(row)
        raise Exception
    num += 1
</pre>
<p>The script parses the raw data as CSV and keeps a dictionary of live allocations keyed by address, which finds the same match as a reverse linear search for the most recent malloc or calloc at that address. If all goes well, a backfilled version of the CSV file is printed to stdout. However, it is important to ensure that your data are valid, and the script above does very little validation. I added some additional debugging statements initially to ensure that everything matched up. There are a number of problems that can occur (and did occur for me):</p>
<ul>
<li><strong>Not all allocations are recorded</strong>. This could happen if your IO device (a UART in this case) is not initialized before the first allocation occurs.</li>
<li><strong>Allocations don&#8217;t match deallocations</strong>. This can happen because of missing output (IO not functional during certain periods) or if you get overzealous trying to save memory like I did. When I allocated the string array as a static function variable, it worked great until the RTOS was started and threads started overwriting the data. Don&#8217;t forget to acquire a lock!</li>
</ul>
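<p>Both problems show up as inconsistencies in the trace, so they can be detected before backfilling. A rough sketch of such a check, assuming the record format shown earlier; the function name, the messages and the sample trace are invented for the example:</p>
<pre class="brush: python; title: ; notranslate">
def check_trace(lines):
    '''Return a list of inconsistencies found in a raw trace (hypothetical helper).'''
    live = set()
    problems = []
    for num, line in enumerate(lines):
        kind, size, addr = line.strip().split(',')
        if kind in ('m', 'c'):
            if addr in live:
                # Same address handed out twice: a free record is missing
                problems.append('%d: %s allocated while still live' % (num, addr))
            live.add(addr)
        elif kind == 'f':
            if addr not in live:
                # Free without an allocation: an earlier record is missing
                problems.append('%d: %s freed but never allocated' % (num, addr))
            live.discard(addr)
    return problems


trace = ['m,12,0x50000000',
         'f,,0x5000000C',
         'm,12,0x50000000']

for problem in check_trace(trace):
    print(problem)
</pre>
<p>An empty result does not prove the trace is complete, but any reported line is a good place to start looking for dropped or corrupted output.</p>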
<h2>What&#8217;s Next</h2>
<p>In a subsequent article, I&#8217;ll describe how to visualize the processed data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.asyndetic.com/blog/2010/02/21/visualizing-the-heap-on-embedded-systems/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
