Block Core Affinity

I just pushed an update to master/next of GNU Radio that allows us to specifically set the affinity of a block to a particular core. We can do this pretty easily because each block is its own thread in the Thread-Per-Block scheduler, so really what we are doing is setting the thread affinity.

The reason to introduce this behavior is to help us start developing an understanding of GNU Radio on many-core machines. For example, we've been contacted by the Parallella project, and this kind of functionality might be crucial to working well on that platform. On most current systems, we can talk about multi-core, which are a small (up to a dozen or two) number of the same types of processors with shared memory and local cache. The OS scheduler probably does a fantastic job with scheduling the threads in this situations, and kernels like Linux have alternative scheduling strategies we can use for specific purposes. But in a many-core system where we have heterogeneous cores and maybe different access to memory transfers or DMA to IO devices, the ability to selectively choose which cores to run on may be more important. Think about a situation where you want your UHD source block to run on the core that has the closest/fastest access to the GigE system. I'm not saying that this will actually work or help in these scenarios, but it's worth it to make this kind of research and experimentation available.

The API for this is pretty straight-forward, and block affinity can be set before and during the flowgraph run. On a given block, you just call the function 'set_processor_affinity()'.

Notice that the function takes a list (a Python list or a C++ vector of ints), so you can set the affinity to a group of cores if you want. There's an example of how to use this in gnuradio-core/src/examples/mp-sched/affinity_set.py. When you run it, watch your CPU usage monitor(s). This runs two gigantic filters that are designed to take up most of a core, one is set to core 0 and the other is set to either core 0 or 1. As long as at least 1 of the cores specified is present, the system will just use that, so this example will run even on single-core machines (but I have not tested this behavior in Windows). On systems with 2 or more cores, core 0 will be largely taken up by the first filter. The second filter could use core 0, but will likely be scheduled to use core 1.

One thing to note: this is not fully cross-platform. Setting thread affinity seems to be a delicate subject and not fully supported. In fact, the pthread library call pthread_setaffinity_np has that suffix 'np' to specify that it is implicitly non-portable. The GNU Radio Utilities and Etc. Library (GRUEL) has been updated to provide abstracted calls for the different OSes that we support for this. Right now, we have Linux and Windows support. I tried to get OSX support, but their API isn't exactly what we are looking for (they are interested in sharing cache lines, not specifically tying a thread to a core). So until things change, our GRUEL thread affinity calls in OSX are nops.

In the Doxygen manual, look for the Related Pages page named 'Block Thread Affinity' for more details.