Blog

  1. Why Spark Suffers on Wide Iceberg Tables

    At my previous job, we struggled mightily to read and write wide Iceberg tables with Spark. After watching this Future Data Systems talk by Russell Spitzer, I think I finally understand why. He mentions it almost as a footnote, in response to a question about the Iceberg REST catalog:

    One of the most common problems that people have with really wide tables…the way Parquet is constructed, even though they have columnar representations of your data, you have to keep all of the column vectors for the same row group in the same file…your files are very wide and your columns end up being very short to end up in the same file
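To put rough numbers on the quote, here is a back-of-the-envelope sketch. The 128 MiB row group size and the column counts are assumptions chosen for illustration, `avgColumnChunkBytes` is a hypothetical helper, and compression and encoding are ignored.

```javascript
const MiB = 1024 * 1024;

// Average bytes available to each column chunk when every column of a
// row group has to share the same fixed-size budget.
function avgColumnChunkBytes(rowGroupBytes, columnCount) {
  return rowGroupBytes / columnCount;
}

const rowGroupBytes = 128 * MiB; // assumed row group size, not a measured value

for (const columns of [10, 100, 1000]) {
  const kib = avgColumnChunkBytes(rowGroupBytes, columns) / 1024;
  console.log(`${columns} columns -> ~${kib.toFixed(0)} KiB per column chunk`);
}
```

Under these assumptions, going from 10 to 1,000 columns shrinks the average column chunk from roughly 13 MiB to roughly 131 KiB, which is the "very short columns" effect the quote describes.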

    Read More

  2. p5.js and SSR

    I thought I would share some of my experience getting p5 to run in an SSR (server-side rendering) environment. I would like to explain SSR in a separate blog post, but for now: you might want to do this if you build with React and want to render parts of your website on the server, while still including a p5.js sketch.

    The issue is that p5.js references the ‘window’ object on import, so any attempt to include the library fails on the server, even if the p5 module is never used there. Below is a bare-bones component definition that has worked for me.
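The component itself is in the full post. Separately, as a generic sketch of one common way around this kind of problem (not necessarily the approach used there), the import can be deferred until a `window` actually exists. `isBrowser`, `loadSketchLibrary`, and the `importer` parameter are hypothetical names; in a real app the importer would be something like `() => import("p5")`.

```javascript
// True only where a DOM exists (a browser); false during server-side rendering.
const isBrowser = () => typeof window !== "undefined";

// `importer` is injected so this sketch runs without the real library.
function loadSketchLibrary(importer) {
  if (!isBrowser()) {
    // On the server: never evaluate the module that touches `window`.
    return Promise.resolve(null);
  }
  // In the browser: load lazily and unwrap a default export if there is one.
  return importer().then((mod) => mod.default ?? mod);
}

// Stand-in importer that counts how often it is invoked.
let importerCalls = 0;
const fakeImporter = () => {
  importerCalls += 1;
  return Promise.resolve({ default: function FakeP5() {} });
};

const pending = loadSketchLibrary(fakeImporter);
pending.then((lib) => console.log(lib === null ? "skipped on server" : "loaded"));
```

In a React component, a call like this would typically sit inside `useEffect`, which does not run during server rendering.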

    Read More

  3. Profiling and improving load times of static assets

    In my project on visualizing waste basket locations in NYC, I had been importing CSV files in JavaScript, which were converted to modules through d3-dsv. While these typed imports, together with manually provided type definitions, let us work with more confidence when the modules are imported, they come at the cost of large module sizes:

    Note that the JavaScript asset is 2MB, uncompressed.

    This is because the CSV is serialized into objects, which then get packaged with the module. Looking into this JS asset, we see that the object is serialized like so.
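The actual serialized output is shown in the full post. As a rough illustration of why object serialization inflates the bundle, the sketch below compares the size of some CSV text with the same rows serialized as an array of objects, where every key is repeated on every row. The sample rows are made up for the example, and `parseCsv` is a minimal hypothetical stand-in for a real CSV parser (no quoting support).

```javascript
// Hypothetical sample rows; not the project's real data.
const header = "latitude,longitude,borough";
const rows = Array.from({ length: 1000 }, (_, i) =>
  `${(40.7 + i * 1e-4).toFixed(4)},${(-74.0 + i * 1e-4).toFixed(4)},Brooklyn`
);
const csvText = [header, ...rows].join("\n");

// Minimal CSV parser: one object per row, keyed by the header names.
function parseCsv(text) {
  const [head, ...lines] = text.split("\n");
  const keys = head.split(",");
  return lines.map((line) => {
    const values = line.split(",");
    return Object.fromEntries(keys.map((key, i) => [key, values[i]]));
  });
}

// Serializing objects repeats "latitude", "longitude", "borough" per row.
const asObjects = JSON.stringify(parseCsv(csvText));

console.log("csv characters: ", csvText.length);
console.log("json characters:", asObjects.length);
```

The column names appear once in the CSV but once per row in the serialized objects, so the gap widens as rows are added.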

    Read More

  4. Smoother scrolling with densely rendered map points

    In a previous blog post, I tried to speed up rendering of Leaflet markers using an image. While I think this is still an interesting approach for rendering dense map points, I didn’t think 50k points should have caused a slowdown in the first place. Furthermore, there were two other issues:

    1. When zooming in closely, you can see the pixelation from the image. The generated image needs to balance size and generation speed.
    2. The native Leaflet library does not scroll smoothly. There seem to be plugins that handle this, but I wanted something that would work out of the box.

    This video shows the issue:

    Read More

  5. Speeding up Leaflet Markers

    I recently published a map of the collection of wastebaskets in NYC. Initially, I was curious whether I could show that the density and availability of wastebaskets differ vastly depending on which borough you are in. The project helped me understand how to work with geospatial data: everything from downloading OSM layers and objects, to querying PostGIS, to converting between different map projections. But right now I’d just like to discuss the performance of map markers on Leaflet.

    Read More

  6. Animating leaflet points

    A naive way of animating points in Leaflet.
    Read More

  7. Control flow of promises vs Rx-Observables

    Coming from years of iOS development, I’ve grown pretty fond of Rx streams and observables as a way of taming state changes in a frontend/mobile environment. JavaScript promises predate Rx frameworks like ReactiveCocoa and RxSwift, so I was a little surprised to learn that promises don’t offer the lazy evaluation that I’ve come to associate with reactive-style programming.

    For example:

    const p = new Promise((resolve) => { // (1)
      console.log("within promise");
      setTimeout(() => {
        resolve("hello world"); // (2)
        console.log("promise resolved");
      }, 3000);
    });
    
    console.log("test begins");
    console.log(await p); // top-level await, so this must run as an ES module
    

    The order of the resulting logs is:
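Run as an ES module, the snippet above should log "within promise", then "test begins", and about three seconds later "promise resolved" followed by "hello world": the executor runs as soon as the promise is constructed, not when it is awaited. The sketch below records that ordering instead of relying on the console, and contrasts it with a lazy variant. The `log` helper, the `events` array, the shortened 10 ms timer, and the `lazy` thunk are all additions for illustration.

```javascript
// Record every message so the ordering can be inspected afterwards.
const events = [];
const log = (msg) => {
  events.push(msg);
  console.log(msg);
};

// Eager: the executor runs synchronously inside `new Promise(...)`.
const eager = new Promise((resolve) => {
  log("within promise");
  setTimeout(() => {
    resolve("hello world");
    log("promise resolved");
  }, 10); // shortened from 3000 ms for this sketch
});
log("test begins");

// Lazy: wrapping the constructor in a function defers the work until called.
const lazy = () =>
  new Promise((resolve) => {
    log("within lazy promise");
    resolve("hello again");
  });
log("lazy defined, not yet started");

// Snapshot of what has been logged before anything asynchronous has run.
const syncOrder = [...events];

lazy().then(log); // the lazy executor only runs now
eager.then(log);  // logs "hello world" once the timer fires
```

Wrapping the constructor in a function is the simplest way to get observable-like laziness from a promise: nothing happens until the thunk is called, although each call then starts the work afresh.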

    Read More