Blog
-
Profiling and improving load times of static assets
In my project on visualizing waste basket locations in NYC, I had been importing csv files in javascript, which were converted to modules through d3-dsv. While these typed imports, together with manually provided type definitions, allow us to work with more confidence when the modules are imported, they come at the cost of large module sizes:
Note that the javascript asset is 2MB, uncompressed.
This is because the csv is serialized into objects, which then get packaged with the module. Looking into this js asset, we see that the object is serialized like so.
Can we improve this? Let’s see.
Downloading csvs as a separate asset
The first thing I tried was to not parse the csv file into a module, but to fetch and parse it at runtime.
This results in much better file sizes. Additionally, I split out the parsing of the csv into a worker thread so as to not block the main thread. However, upon some profiling, I realized that the parse was happening very quickly (for a dataset of 20k points), so the complication of spinning up a web worker was not really worth it. The web worker code, however, was simple enough.
```typescript
// main.ts
const worker = new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' });
const inventoryPromise: Promise<DSNYBasket[]> = new Promise((resolve, reject) => {
    worker.onmessage = (e: MessageEvent<DSNYBasket[]>) => {
        resolve(e.data)
    }
    worker.onmessageerror = e => {
        reject(e)
    }
})

// worker.ts
export async function csvParse(url: string): Promise<any[]> {
    const res = await fetch(url);
    const text = await res.text();
    const lines = text.split('\n');
    const headers = lines[0].split(',');
    const arr = []
    for (let i = 1; i < lines.length; i++) {
        const row = lines[i].split(',');
        if (row.length !== headers.length) continue;
        // make sure CSV header names line up with DSNYBasket type
        const obj = Object.fromEntries(
            headers.map((h, j) => [h.trim(), row[j]?.trim()])
        )
        arr.push(obj)
    }
    return arr
}

const res = await csvParse(inventoryURL);
self.postMessage(res);
```
This means that the worker executes immediately upon load, downloading the csv asset and parsing it, then posting a message to the main process when done. The response is wrapped in a promise in main.ts so that the rest of the script can continue executing while the csv loads and parses. I think this is the right approach when data sizes get really large, but that threshold is going to be project-dependent.
At this point I also started asking myself whether the download time of the js asset was really the most important factor. After all, once compressed, the js + data asset is under 1MB.
Inspecting network traffic during map load
I’m using the Maptiler API on the free trial to show vector maps. The map styles are beautiful and I would highly recommend checking it out for mapping projects. The setup instructions are to create a MapLibre map with the style url pointed at something like
https://api.maptiler.com/maps/777daf37-50e3-4c3c-a645-c13a66e712e3/style.json?...
When maplibre loads, it downloads these from maptiler. The latency for these is hard to measure, but I consistently see that this file (and a related tiles.json file) takes 100-150ms to download. Furthermore, these only load after maplibre itself has loaded, at around 500ms. The first map tiles only start downloading around 680ms, which means that the user will only start seeing content around that time.
Serving style.json and tiles.json assets
By inspecting the time it takes to serve my js assets, I was confident that these could be served up even quicker. For one, by bundling the style.json file as a module, I could include it as part of the js bundle and avoid initiating a separate fetch for it. I could not do the same for the tiles.json file, as inlining it fails a runtime check in maplibregl.js.
I also use vite for development and publishing, and while it supposedly generates module preload directives, it doesn’t seem to do so properly. I realized that if I served the tiles.json myself, I could also preload the asset, so that it doesn’t need to be downloaded at the moment it’s used.
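As a sketch, the manual preload can look like this in index.html (the href is illustrative, standing in for wherever the tiles.json is actually served from):

```html
<!-- hypothetical index.html snippet: hint the browser to fetch tiles.json
     early, in parallel with the js bundle -->
<link rel="preload" href="/assets/tiles.json" as="fetch" crossorigin="anonymous">
```

The crossorigin attribute matters here: fetch()-initiated requests use CORS mode, and a preload whose request mode doesn't match won't be reused by the browser.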
With this approach, the first tile fetch now happens at 390ms. Without an easy way to profile the GL draw cycle on canvas, I’m not sure at exactly what moment in the load cycle the screen is filled. However, the lower bound of the load time was dramatically reduced.
Takeaways
- I initially thought that reducing large asset sizes would lead to faster load times, but through profiling I found that making sure I was serving my assets most efficiently delivered the largest time-to-first-draw gains.
- For smallish datasets, embedding data directly into js results in a more consistent load time. In any case, all of the load times are dwarfed by proper caching implemented at a CDN layer, which is the case with my static website.
- preload directives can significantly speed up page load time. In this case, I knew I wanted to load the json files, so I could include a preload manually.
Next steps
This project includes the script as a module: <script src="..." type="module">. According to MDN, this defers the execution of the script. Between script download and kicking off the tile download, there is about a 100ms delay. This may be script parsing, or it may be Maplibre loading. If it is the latter, then perhaps inlining the map initialization somewhere on the page could result in even faster load times.
Check out the latest iteration of the project here: DSNY-Baskets
-
Smoother scrolling with densely rendered map points
In a previous blog post, I had tried to speed up rendering of leaflet markers using an image. While I think this still presents an interesting approach for rendering dense map points, I didn’t think the 50k points should have caused a slowdown. Furthermore, there were two other issues:
- When zooming in closely, you can see the pixelation from the image. The generated image needs to balance size and generation speed.
- The leaflet native library does not scroll smoothly - there seem to be plugins that handle this, but I wanted something that would work out of the box.
This video shows the issue:
The MaplibreGL project has matured a lot since I last looked at it, and seems to take care of the issue. Scrolling is much smoother now:
I used MapLibre’s sources and layers primitives, which seem to draw GL meshes directly, since panning and scrolling are smooth. However, a 3px map point is too dense when zoomed out, so it would be better if the map points could resize dynamically according to map zoom. I think this is more than possible, but for now, I had luck simply specifying layers to be drawn at different zoom levels:
```javascript
map.addLayer({
    'id': 'baskets',
    'type': 'circle',
    'source': 'baskets',
    'paint': {
        'circle-color': DSNY_COLOR,
        'circle-radius': 1.2,
        'circle-opacity': 0.5
    }
});
map.addLayer({
    'id': 'baskets-zoomed-1',
    'type': 'circle',
    'source': 'baskets',
    'minzoom': 13,
    'paint': {
        'circle-color': DSNY_COLOR,
        'circle-radius': 2,
        'circle-opacity': 0.5
    }
});
map.addLayer({
    'id': 'baskets-zoomed-2',
    'type': 'circle',
    'source': 'baskets',
    'minzoom': 14,
    'paint': {
        'circle-color': DSNY_COLOR,
        'circle-radius': 2.5,
        'circle-opacity': 0.8
    }
});
```
This ensures that when zoomed out, the map looks less cluttered, but points are still visible when zoomed in.
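As a sketch of the dynamic-resize idea mentioned above, MapLibre style expressions can interpolate circle-radius on zoom, collapsing the three layers into one. The DSNY_COLOR value below is a placeholder, and the zoom stops mirror the thresholds from the stacked-layer version:

```javascript
// Placeholder for the project's color constant
const DSNY_COLOR = '#1f6feb';

// One layer whose radius and opacity scale with zoom via "interpolate"
// expressions, instead of three stacked layers.
function basketLayer() {
    return {
        'id': 'baskets',
        'type': 'circle',
        'source': 'baskets',
        'paint': {
            'circle-color': DSNY_COLOR,
            // 1.2px when zoomed out, growing to 2.5px by zoom 14
            'circle-radius': ['interpolate', ['linear'], ['zoom'], 12, 1.2, 13, 2, 14, 2.5],
            'circle-opacity': ['interpolate', ['linear'], ['zoom'], 13, 0.5, 14, 0.8]
        }
    };
}

// usage: map.addLayer(basketLayer());
```

This keeps a single layer in the style, so there is one draw call per source rather than three.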
-
Speeding up Leaflet Markers
I recently published a map of the collection of wastebaskets in NYC. Initially, I was curious about whether I could show that the density and availability of wastebaskets vastly differ depending on which borough you are in. It was a project that helped me understand how to work with geospatial data: everything from downloading OSM layers and objects, to querying PostGIS, to understanding the conversion between different map projections. But right now I’d just like to discuss the performance of map markers in Leaflet.
The dataset I was displaying contained over 20k points, and exhibited terrible drag performance when added in the canonical way:
```javascript
datapoints.forEach(({lat, lng}) => {
    L.circle([lat, lng], {
        color: 'blue',
        radius: 0.001
    }).addTo(map);
});
```
Notice how the image lags behind the pointer by several seconds. After digging through the source code in vain, I realized that the issue lay in a ‘style flush’. Looking at the profiler more closely, we see that the style flush was caused by adding a class in the _onMove handler of the layer.
This profiling was done retrospectively - while I was working on my visualization, I just wanted to speed up the interactivity of the map. A tried and true way to do this is to reduce the number of points shown on the map. I tried heatmaps, clustering, and even a novel approach of showing the roads that were ‘serviced’ by wastebaskets. I considered sampling, but it would need to be fairly intelligent, since I would only want to reduce the density of points in already dense parts of the map. Nothing felt as clear as displaying the points directly on the map. So what I ended up doing instead was rendering a raster image of the points.
Rendering map points
To do this, we first calculate the extent of our displayed data. This gives us a lat-lng rect that we can then use to compute the aspect ratio of the image we want to generate:
```typescript
import { bounds, Map, LatLngBounds, LatLngLiteral } from 'leaflet';

const latLngToContainerBounds = function(map: Map, b: LatLngBounds) {
    return bounds(
        map.latLngToContainerPoint(b.getNorthWest()),
        map.latLngToContainerPoint(b.getSouthEast())
    );
}

const getExtent = function(arr: LatLngLiteral[]) {
    // constructed with `new` so the class isn't shadowed by the
    // `latLngBounds` variable declared below
    let initialBounds = new LatLngBounds(arr[0], arr[1]!);
    return arr.reduce((bounds, basket) => bounds.extend(basket), initialBounds);
}

const targetWidth = 10000; // how large of a raster image we want to generate
const radiusInM = 50;      // how big we want to draw each marker

const latLngBounds = getExtent(arr);
const containerBounds = latLngToContainerBounds(map, latLngBounds);
const origin = containerBounds.getTopLeft();
const { x, y } = containerBounds.getSize();
const width = x;
const height = y;
const aspectRatio = height / width;
const scaleFactor = targetWidth / width;
```
We then work out how many meters one screen pixel covers, by projecting two adjacent container points:
```typescript
const p1 = map.containerPointToLatLng([0, 0]);
const p2 = map.containerPointToLatLng([0, 1]);
const pxInMeter = map.distance(p1, p2);
const radiusInPx = Math.max(Math.ceil(radiusInM / pxInMeter), 1);
```
The variable pxInMeter is the distance in meters, in the map’s CRS, between two adjacent pixels. This acts as a scale factor, which we use to determine the size (in pixels) of each marker.
```typescript
const canvas = document.createElement('canvas');
canvas.width = targetWidth;
canvas.height = Math.floor(targetWidth * aspectRatio);
const ctx = canvas.getContext('2d')!;

for (const latlng of datapoints) {
    const pt = map.latLngToContainerPoint(latlng)
        .subtract(origin)
        .multiplyBy(scaleFactor);
    ctx.beginPath();
    ctx.arc(pt.x, pt.y, radiusInPx, 0, 2 * Math.PI);
    ctx.fill();
    ctx.closePath();
}
```
And last but not least, we render the image using the canvas.toBlob function, turn the blob into a URL, and display it on the map at the extent we calculated earlier:
```typescript
canvas.toBlob(b => {
    if (b == null) {
        reject('error generating image');
        return;
    }
    let url = URL.createObjectURL(b)
    imageOverlay(url, latLngBounds).addTo(map);
});
```
This technique works pretty well, managing to generate and draw the image within 3s, with no drag performance issues:
Try it here. Now this obviously introduces some other issues, namely that we now have to hit test for any interaction, and that since the marker layer is a raster, we will see pixelation as we zoom in. In the future, I think combining this with Leaflet.LayerGroup.Conditional could allow us to show actual markers when zoomed in far enough that the number of visible markers is drastically smaller; alternatively, we could render only the points that are onscreen at any given moment, updating as the map bounds change.
-
Animating leaflet points
This is a naive way of animating points in leaflet.
In a css file, add the keyframes and details of the css animation:
```css
@keyframes fadeInOpacity {
    0% { opacity: 0; }
    100% { opacity: 1; }
}

.fadein {
    animation-name: fadeInOpacity;
    animation-iteration-count: 1;
    animation-timing-function: ease-in;
    /* animation-duration is set in code to a random number */
}
```
Then, where you create the Leaflet circle:
```typescript
let c = L.circle([0, 10]);
c.addTo(map);

// add animation to circles
let e = c.getElement() as HTMLElement
e.classList.add("fadein");
e.style.animationDuration = `${Math.random() * 2}s`
```
-
Control flow of promises vs Rx-Observables
Coming from years of iOS development, I’ve come to be pretty fond of Rx streams and observables as a way of taming state changes in a mobile frontend environment. Javascript promises predate Rx frameworks like ReactiveCocoa and RxSwift, so I was a little surprised to learn that promises don’t offer the lazy evaluation that I’ve come to associate with reactive-style programming.
For example:
```javascript
const p = new Promise((resolve) => { // (1)
    console.log("within promise");
    setTimeout(() => {
        resolve("hello world"); // (2)
        console.log("promise resolved");
    }, 3000);
});

console.log("test begins");
console.log(await p)
```
The resulting logs, in order:

```
within promise
test begins
promise resolved
hello world
```
Several points were surprising at first:
- Immediately after the const p is created, the body of the promise closure is evaluated, which means that one cannot create multiple promises and ’then’ only one of them, since all of them will be evaluated. This also means that a long-running task in the closure must use one of the built-in IO or timer methods that can escape Javascript’s single-threadedness.
- The ‘resolve’ call only records the result; control does not return to the ’then’ closure until the body of the promise has completed. This is not because the promise was unresolved by the time ‘resolve’ was called, but because javascript has not finished executing this block of code, and has no way of ‘yielding’ control back to the caller. Another modification makes this clearer:
```javascript
const p = new Promise((resolve) => {
    console.log("within promise");
    setTimeout(() => {
        resolve("hello world");
        setTimeout(() => console.log("promise resolved"), 1);
    }, 3000);
});

console.log("test begins");
console.log(await p)
```
Now the order of logs is:

```
within promise
test begins
hello world
promise resolved
```
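Since a promise body runs eagerly at construction, one way to recover Observable-style lazy evaluation is to wrap the construction in a function (a thunk), so nothing executes until it is called. A minimal sketch, with hypothetical names:

```javascript
// The executor runs only when lazyHello() is invoked, not at definition time.
const lazyHello = () => new Promise((resolve) => {
    console.log("work starts now");
    setTimeout(() => resolve("hello world"), 100);
});

console.log("nothing has run yet");
const greeting = await lazyHello(); // the executor body runs here
console.log(greeting);
```

Libraries like RxJS rely on the same idea: the subscription function runs only when something subscribes.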
-
Interacting with Simple Contracts on Ethereum
The truffle project helps one get started writing contracts, but after bootstrapping and deploying a contract to the testnet, I realized there was less guidance on how to interact with the contract. Alchemy has a good example in their docs, but I wanted to use the web3.js library maintained by ChainSafe.
The key parts were:
- Getting the compiled json abi from trufflesuite and creating a web3.eth.Contract instance
- Creating an account with web3.eth.accounts.privateKeyToAccount
- Creating a transaction with the function call encoded by encodeABI
- And finally, signing the transaction before sending it.
```javascript
import Web3 from "web3";
import SimpleStorage from './build/ethereum-contracts/SimpleStorage.json' assert { type: 'json' }

const yourApiKey = "..."
const web3 = new Web3("https://eth-sepolia.g.alchemy.com/v2/" + yourApiKey);

const privateKey = 'yourPrivateKey'
const senderAccount = web3.eth.accounts.privateKeyToAccount(privateKey);

// a deployed contract on Sepolia
const contractAddress = "0xd76E31314D760b51493278C72c22d280C6ba6C4b"
const contract = new web3.eth.Contract(SimpleStorage.abi, contractAddress)

const paymeTransaction = contract.methods.payme();
const transactionObject = {
    from: senderAccount.address,
    to: contractAddress,
    data: paymeTransaction.encodeABI(),
    value: Web3.utils.toWei('0.001', 'ether'),
    gas: 200000,
    maxPriorityFeePerGas: 100000,
    maxFeePerGas: 200000
};

senderAccount.signTransaction(transactionObject)
    .then((signedTx) => {
        return web3.eth.sendSignedTransaction(signedTx.rawTransaction);
    })
    .then((receipt) => {
        console.log('Transaction successful:', receipt);
    })
    .catch((error) => {
        console.error('Transaction failed:', error);
    });
```
In the browser environment, somehow the privateKey would be part of a default list of accounts, and the higher-level transaction.call() can be used, but I’m unfamiliar with that flow currently.
-
Timezones in Postgres
There are two representations for timestamps in postgres: timestamp and timestamptz. A timestamptz is stored in the db as UTC and uses the session timezone to format the value for display, while a timestamp stores the value exactly as given, with no timezone attached. I think this distinction obscures some slightly unintuitive behavior when working with timestamps in postgres.
Converting between timestamp and timestamptz
Consider the following query:
```sql
postgres=# select pg_typeof(timestamp '2022-01-01' at time zone 'Pacific/Honolulu'),
           pg_typeof(timestamptz '2022-01-01' at time zone 'Pacific/Honolulu');
        pg_typeof         |          pg_typeof
--------------------------+-----------------------------
 timestamp with time zone | timestamp without time zone
```
I think it makes sense that a timestamp at time zone becomes a timestamptz, but not so much that a timestamptz at time zone would be converted back to a timestamp! This has implications for functions like date or date_part, because these operate on the ‘display’ value. For example:
```sql
postgres=# select date(timestamptz '2022-01-01T00:00+00:00');
    date
------------
 2021-12-31
```
The reason that the date of this timestamp is December 31st, as opposed to January 1st, is that when converted to the default timezone on my postgres db, the date is December 31st. We can verify this:
```sql
postgres=# select timestamptz '2022-01-01T00:00+00:00';
      timestamptz
------------------------
 2021-12-31 19:00:00-05
(1 row)
```
This means that the default timezone has a lot of bearing on the output of date, or date_part! Because of this, it’s best not to use timestamptz when dealing with methods like these, as they lead to unexpected outputs.
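If you do need a calendar date from a timestamptz, one way to make the result independent of the session timezone is to convert to an explicit zone first. A sketch, using UTC (the zone choice depends on the use case):

```sql
-- convert to an explicit zone before truncating to a date
select date(timestamptz '2022-01-01T00:00+00:00' at time zone 'UTC');
-- returns 2022-01-01 regardless of the session TimeZone setting
```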
Implicit coercion between timestamp and timestamptz
Let’s create a table with one timestamp field and one timestamptz field.
```sql
postgres=# create table if not exists tt (
    timestamp_col timestamp,
    timestamptz_col timestamptz
);
```
Let’s insert a value without a time zone into both of these fields.
```sql
postgres=# insert into tt(timestamp_col, timestamptz_col)
           values ('2022-01-01T00:00', '2022-01-01T00:00') returning *;
    timestamp_col    |    timestamptz_col
---------------------+------------------------
 2022-01-01 00:00:00 | 2022-01-01 00:00:00-05
```
In the timestamp column, midnight of January 1st is recorded verbatim, which makes sense. In the timestamptz column, however, the value of UTC 5am is actually recorded, because the inserted timestamp is assumed to be in the default time zone of the db.
```sql
postgres=# show timezone;
     TimeZone
------------------
 America/New_York
(1 row)
```
Now let’s insert a timestamp with a time zone in both of the fields:
```sql
postgres=# insert into tt(timestamp_col, timestamptz_col)
           values ('2022-01-01T00:00+01:00', '2022-01-01T00:00+01:00') returning *;
    timestamp_col    |    timestamptz_col
---------------------+------------------------
 2022-01-01 00:00:00 | 2021-12-31 18:00:00-05
```
In the timestamp column, the time zone part of the input is simply dropped, so midnight is saved. In the timestamptz column, we see that the field is set to exactly the same point in time as the input, i.e. 2022-01-01T00:00+01:00 == 2021-12-31 18:00:00-05.