Blog

  • Some notes on matplotlib

    Matplotlib has two interfaces: the ‘implicit’ pyplot one and the ‘explicit’ object-oriented one. Most code I come across uses the ‘implicit’ interface, which is to say, it starts with

    import matplotlib.pyplot as plt
    

    Now, plt.figure is nice, but the Axes object is created implicitly, and it’s kind of hard to get that object back. You often want the Axes, for example to get the extent of the plot in data coordinates. So instead of using figure, use subplots:

    fig, ax = plt.subplots(figsize=(10, 8))
    
    ax.bar(['a', 'b', 'c'], [1, 3, 2])  # plot some stuff here
    
    ax.viewLim.height # => the extent of the plot, in data units.
    

    I needed this to reposition some labels offset from a bar chart:

    # Add the text 'up' to days when close >= open, and vice versa
    # ax.viewLim returns bounding box in data units. This allows us to get a normalized padding
    padding = 0.01 * ax.viewLim.height
    for p in up_bars.patches:
        plt.text(p.get_x(), p.get_y() + p.get_height() + padding, 'up',
                 color='green',
                 fontsize='large',
                 horizontalalignment='center',
                 verticalalignment='bottom')
    

    It turns out this tomfoolery is not actually needed. matplotlib’s Annotation (what plt.annotate creates) allows us to specify, in addition to an xy, an xytext and a textcoords, which is the ‘coordinate system that xytext is given in’. This allows us to do:

    for p in up_bars.patches:
        plt.annotate('up',
                     (p.get_x(), p.get_y() + p.get_height()),
                     xytext=(0,5),
                     textcoords='offset points',
                     color='green',
                     fontsize='large',
                     horizontalalignment='center',
                     verticalalignment='bottom')
    

    Now, this is nice. xytext is specified in ‘offset points’, so no crazy conversion needed. But notice that this function is matplotlib.pyplot.annotate, not matplotlib.pyplot.text!

    And the signature is completely different! matplotlib.pyplot.text is (x, y, s, **kwargs) and matplotlib.pyplot.annotate is (text, xy, **kwargs). Go figure.

    Also, it turns out that matplotlib 3.4 added Axes.bar_label, so depending on the version you’re on, you may be able to just use that. In any case, the deep dive into the source was fun.
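    For completeness, here is roughly what bar_label looks like in use. This is a minimal sketch: the days and values are made up, and the Agg backend is only there so it renders off-screen.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar(["Mon", "Tue", "Wed"], [3, -1, 2])

# bar_label places one label at the end of each bar; `labels` overrides the
# default numeric text, and `padding` is in points, like 'offset points'.
labels = ["up" if b.get_height() >= 0 else "down" for b in bars.patches]
texts = ax.bar_label(bars, labels=labels, padding=5)
```

    No viewLim arithmetic, no annotate: the offset handling from earlier is all done for you.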


  • A little bit about gamma

    Gamma is a kind of compression algorithm. It exploits the fact that our perception of brightness, as with many other sensory stimuli, follows a power law. Our eyes are much better at detecting the difference between two values that are darker than between two values that are brighter. From an evolutionary standpoint, this makes sense: it is much more advantageous to be able to see in the dark than to accurately distinguish two really bright shades.

    Gamma encoding exploits this so that each bit in the image data is used as efficiently as possible. Instead of storing the camera sensor data linearly, the data is encoded so that more bit space is given to darker ranges, and less to lighter ranges, mimicking our perception. This is why gamma encoded images are said to be perceptually linear.

    At its very simplest, gamma encoding uses a single value, gamma, which is typically 2.2. I cannot find any good explanation for where this value comes from, though my MacBook Pro monitor shows a gamma setting of 2.4. To gamma encode, we simply take pow(input, 1/2.2). And to decode, we take pow(input, 2.2).

    Why is this important?

    Image algorithms are made to work on linear values, not perceptually linear values.

    Say we wanted the average of two pixels, one black, one white. If we simply average their RGB values, we get 128. But an encoded value of 128 decodes to only about 22% of the linear brightness of white, which is not what we expect: we want 50% brightness. So the correct way of doing this is to gamma decode the values first, do the average, then re-encode the result.

    These two Python functions show how one might encode and decode an 8-bit value.

    import math
    
    def encode(x):
        # linear 0-255 -> gamma-encoded 0-255
        return math.pow(x / 255.0, 1 / 2.2) * 255
    
    def decode(x):
        # gamma-encoded 0-255 -> linear 0-255
        return math.pow(x / 255.0, 2.2) * 255
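    Plugging the black-and-white example into these functions (repeated here so the snippet stands alone) confirms the numbers above:

```python
import math

def encode(x):
    return math.pow(x / 255.0, 1 / 2.2) * 255

def decode(x):
    return math.pow(x / 255.0, 2.2) * 255

# Naively averaging black (0) and white (255) in encoded space:
naive = (0 + 255) / 2                # 127.5, stored as ~128
# ...but that decodes to only ~22% of linear white:
print(decode(naive) / 255)           # ~0.22

# Decode to linear light, average, then re-encode:
correct = encode((decode(0) + decode(255)) / 2)
print(correct)                       # ~186, which displays as 50% grey
```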
    

    Photography

    As a bonus, if you are a photography nerd and have messed around with tonal curves, then you already understand gamma intuitively. A higher slope in the tonal curve causes the values in the input range to be redistributed to a wider output range, resulting in more contrast. Gamma encoding is like a tonal curve applied to the entire input range.

    Helpful links:

    http://blog.johnnovak.net/2016/09/21/what-every-coder-should-know-about-gamma

    https://www.cambridgeincolour.com/tutorials/gamma-correction.htm


  • Concurrency vs Parallelism in Ruby Apps

    A thing that came up some weeks ago and confused me is whether languages like Python and Ruby are multithreaded. This is my attempt to explain to myself how it works in Ruby, and I hope it helps you too. Firstly, we need to distinguish concurrency from parallelism, which are often conflated with multithreading but are not the same. Concurrency can be thought of as interleaving: if two jobs are switched back and forth very quickly, there is a sense that both are being done ‘at the same time’, but they are merely being done concurrently. For example, you may be eating food and drinking beer. Take a bite, then a sip, then a bite, then a sip: you are concurrently drinking beer and eating food. But you are not literally drinking beer and eating food at the same time; that would require you to have both the cup and the fork to your lips at once (in parallel), which is not possible. To do so you’d need two mouths.

    Second, Ruby has several implementations, the most popular of which is MRI (Matz’s Ruby Implementation), named after Ruby’s creator Yukihiro Matsumoto. This is the canonical ‘Ruby’ that everyone refers to when they say ‘Ruby, the programming language’. MRI has something called the GIL (Global Interpreter Lock) that ensures only one thread is ever running Ruby code at once. Why the GIL is there in the first place is a rabbit hole for another time. This means that when you call Thread.new in Ruby and schedule a job on it, it isn’t really running in parallel, because the GIL is locking Ruby code. There are other Ruby implementations, like JRuby, that do not have a GIL, and on those implementations true parallelism is possible. On the Python side, the story is the same, with CPython (the default implementation) having a GIL.

    However, Ruby threads are also native threads (only true as of Ruby 1.9). This means that every Ruby thread is backed by an OS thread. When a Ruby application blocks on I/O, the Ruby runtime can switch to allow another thread to continue running, because the blocking thread releases the GIL while it waits. For example, if your Ruby application makes a network request and is waiting for the network to respond, it releases the GIL and allows another thread to serve an incoming request. When the network contents are fetched, the OS wakes the blocked thread and allows it to resume. So it can be said that some amount of parallelism is happening here! However, this only happens for I/O operations. To contrast: if you had two threads handling incoming web requests, and two requests came in at the same time, then whether the first thread handles both requests or each is handled by a different thread, the GIL will ensure that only one thread is handling a request at any given time. In short: no parallelism during compute-only operations. This is good news for Ruby and web applications, since web applications are by nature I/O bound. Most of the time, a Ruby application is blocked waiting on the database or the network. As discussed above, this blocking happens with the GIL released, so if a request arrives while the current thread is blocked on I/O, Ruby can execute the thread that serves that request.
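    Since CPython behaves the same way, the effect is easy to see in a few lines of Python: time.sleep stands in for a blocking network call and, like real blocking I/O, is performed with the GIL released, so two ‘waiting’ threads overlap almost completely.

```python
import threading
import time

def fake_io():
    # time.sleep stands in for a blocking network call; like real
    # blocking I/O, the wait happens with the GIL released.
    time.sleep(0.2)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# The two 0.2 s waits overlap: total wall time is ~0.2 s, not ~0.4 s.
```

    Swap the sleep for a pure-Python busy loop and the overlap disappears, because the GIL is held while computing.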

    Now, what about Puma? Doesn’t that enable parallelism? Yes, parallelism happens with Puma, but through a different mechanism. Puma forks multiple OS processes, creating multiple copies of your app in memory (multiprocessing). As a reminder, a process provides the resources needed to execute a program; processes are isolated from each other by the OS, and each has its own virtual address space, executable code, environment variables, process identifier, and at least one thread of execution (the main thread). A thread, on the other hand, is an entity within a process that can be scheduled for execution, but shares the process’s virtual address space and system resources.

    So when Puma starts 5 worker processes, there are 5 copies (processes) of the Rails app running, isolated from each other by the OS. These copies live in memory (and thus take up RAM), have their own DB connection pools, and so on. However, if there are fewer than 5 CPU cores on the machine, our 5 workers will not be able to achieve full (compute) parallelism under peak load. Each worker also transparently schedules additional threads to serve the Ruby application, so that the application doesn’t need to think about it.
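    The worker-process idea isn’t Puma-specific. Here is a rough Python sketch of the same mechanism using the standard library’s multiprocessing module: each pool worker is a separate OS process with its own interpreter (and its own GIL), so CPU-bound work really does run in parallel.

```python
from multiprocessing import Pool

def busy(n):
    # Purely CPU-bound work: with threads under a GIL, two of these could
    # never run at once; separate processes each have their own GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=2) as pool:  # two "workers", like Puma's
        results = pool.map(busy, [100_000, 100_000])
    print(results[0] == results[1])  # both workers computed the same sum
```

    The trade-off is the one described above: each process is a full copy of the program, so memory usage scales with the worker count.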

    To come back to the original question: does Ruby support multithreading? If we simply define multithreading as having a thread primitive, then Ruby and Python are definitely multithreaded. But that doesn’t mean those threads are running in parallel. Even so, multithreading in Ruby speeds up web applications, because they are I/O heavy!

    Thanks to Tom Clark for reviewing drafts of this.