M-Lab's Murakami Tool - Supporting Structured Research Data Collection from the User Perspective
Many people know M-Lab and our TCP performance test, NDT, from running it in a web browser. Perhaps the largest single source of NDT tests comes from its integration by the Google Search team. While M-Lab is known for the large volume of crowdsourced test data resulting from people running our tests, over the past few years we’ve developed new ways to run our tests and open source Internet measurement tests from other platforms using a tool we’ve called Murakami.
Murakami is a software program that runs automated measurements from a computer attached to a network. Developed as part of our involvement in “Measuring Library Broadband Networks for the National Digital Platform,” a grant (award #LG-71-18-0110-18) from the Institute of Museum and Library Services (IMLS) National Leadership Grant for Libraries program, Murakami enacts the best practice used by multiple national regulators, academic Internet researchers, and commercial enterprises, of running Internet measurements from a dedicated device, placed on premise in the network that needs to be measured 1 2 3.
Murakami offers three key features:
Standardized data. By running measurements from a standard premise device, we eliminate issues of self-selection bias that can plague analyses of solely crowdsourced data. Additionally, each supported test has a defined output specification providing a simplified JSON file containing the most common measurement fields of interest, and metadata fields that can be defined by the researcher. This feature standardizes Murakami test output independent of potential future changes in the output formats and available fields provided by third party test developers.
Automatic, recurring measurements. Seeing measurements like upload and download speeds, latency, etc. collected over time is also more informative than a one time speed test, and Murakami enables that data collection. Test results can be saved locally on the device, pushed to a central archive on a self-maintained server on-site or in the cloud, or sent to a storage location in Google’s Cloud Storage service.
Multiple measurement methodologies. Of course, M-Lab measurement services aren’t the only way to measure Internet service, nor should they be. A diversity of measurements from different instruments, measurements of different aspects of a connection, or different segments of the network path between service locations and the Internet are all relevant. This is one reason why while M-Lab has led the technical development of Murakami, the software doesn’t only include M-Lab tests, currently including both M-Lab NDT protocol tests, as well as a single and multi-stream Ookla test.
Murakami follows the model of other open source Internet measurement initiatives such as RIPE Atlas, as well as dedicated devices and native applications such as SamKnows, Eero, Roku, AppleTV, and others. By enabling multiple measurements to run from the same locations and not being exclusive to the M-Lab platform tests, we’ve focused not just on measurement but on improving the general understanding of what different tests tell us about different aspects of an Internet connection. Each measurement tool is orchestrated or controlled by a “test runner” script that is registered in the software and made available for users to configure. This allows a user to enable only the specific tests they’re interested in running, and configure their options according to specific measurement use cases. For example one set of Murakami devices might be configured to run only Ookla multi-stream tests and another to run only NDT tests. The modular test runner framework also can enable researchers and developers to envision other measurement tools to be included in the software as future test runners.
Supporting Internet Research
Building on the success of the MLBN program, in fall 2020 M-Lab began supporting researchers with the Measuring Internet Resilience in Africa project (MIRA), “a joint initiative between [the] African Network Information Centre (AFRINIC) and the Internet Society.” Part of Internet Society’s (ISOC) Measuring the Internet project, researchers are studying the resilience of the Internet in Africa using a combination of measurement tools including Murakami and RIPE Atlas, as well as other data sources and tools. You can learn more about the MIRA project and their methodology in their whitepaper, or register to attend their upcoming webinar.
Supporting the MIRA team’s use of Murakami has also helped drive new features, integration with other open source M-Lab software, and contributions from the MIRA team have filled in some gaps in Murakami’s data collection workflow and system monitoring.
Supporting Measurements to Multiple Servers, Target Countries or Regions
When an NDT test is run, the software (client) by default contacts the M-Lab Locate API service which provides the client a list of servers that are geographically closest to the client 4. The Locate API is a service running in Google Appengine, and uses latitude and longitude values in the HTTP request headers provided by Appengine, which are derived using the client’s IP address. If Appengine does not identify the client location, the Locate API falls back to the most current version of the Maxmind database. While this is useful for a general default, the Locate service also supports testing to available servers in a target country or ISO 3166-2 region using a URL parameter. Internet researchers generally find that measuring to multiple locations on the Internet is helpful, for example to see the difference between measuring to a server within the same country as the client and to a server further away. Murakami’s default is to conduct enabled tests a specific number of times per day, randomizing the time when each group of tests is run. But originally Murakami only supported one NDT test to the nearest server identified by the Locate service. To support MIRA’s interests, we added support to Murakami tests using the M-Lab Locate API to use the country and region code URL parameters. A custom configuration file allows a researcher to define server groups for each run of enabled tests. The result is that one group of servers could be identified by an array of country codes or region codes, and each time Murakami runs NDT instead of one test a series of tests is conducted to each country/region in the array.
Supporting Both M-Lab’s NDT Servers, and Self-provisioned NDT Servers
MIRA researchers also wanted to set up and maintain their own ndt-server instances in countries where M-Lab did not have its own production server. This would enable them to test to both a server in the same country as the client, and to one or more servers in other countries.
Running tests to a self-provisioned ndt-server was already possible in the ndt7-client-go and ndt5-client-go libraries with the
--server flag, but Murakami’s wrapper around these libraries did not support that flag. Along with adding support for multiple tests per run to different groups of servers, support was added for
--server to allow including both M-Lab and non-M-Lab NDT servers.
Automate Publishing Collected Measurements to BigQuery
Murakami does a nice job of posting the test results it collects to either your own SSH server or to a Google Cloud Storage bucket. From there test results can be loaded into a database for analysis, but this step wasn’t automated in our previous work with public libraries. The MIRA team developed a script that uses the
bq tool in the Google Cloud SDK to post collected results to a BigQuery table. The script can be run regularly using standard scheduling tools like cron, Google Cloud Scheduler, or Windows. A daily scheduled job for example, can automatically post the previous day’s tests to a BigQuery table. MIRA is doing just this, using a Linux bash script using the bq tool within the Google Cloud SDK, similar to the example below:
#!/bin/bash export GOOGLE_APPLICATION_CREDENTIALS=<path to keyfile> today=`date +"%Y-%m-%d"` for f in `gsutil ls gs://<GOOGLE CLOUD STORAGE BUCKET PATH>/ndt7*$today*` do bq load --source_format=NEWLINE_DELIMITED_JSON <BIGQUERY DATASET>.<BIGQUERY TABLE> $f TestName:STRING,TestStartTime:TIMESTAMP,TestEndTime:TIMESTAMP,MurakamiLocation:STRING,MurakamiConnectionType:STRING,MurakamiNetworkType:STRING,MurakamiDeviceID:STRING,ServerName:STRING,ServerIP:STRING,ClientIP:STRING,DownloadUUID:STRING,DownloadValue:FLOAT,DownloadUnit:STRING,DownloadError:STRING,UploadValue:FLOAT,UploadUnit:STRING,UploadError:STRING,DownloadRetransValue:FLOAT,DownloadRetransUnit:STRING,MinRTTUnit:STRING,MinRTTValue:FLOAT,RTTValue:FLOAT,RTTUnit:STRING done
MIRA is then using Google’s DataStudio tool to visualize the data, but other researchers could use any visualization tool of choice.
Adding Murakami Monitoring Support with the Balena CLI Tool
In the MLBN program, and in the MIRA project, the Balena.io platform was used to remotely manage and update Murakami devices in the field. The Balena CLI provides an API for interacting with devices managed in your projects, which the MIRA team is using to pull data about the status of each measurement device in the field. Using the API, the MIRA team will automate the monitoring of the devices and will alert the host when a device is found offline.
Add RIPE Atlas Software Probe to Murakami
One of the most interesting ideas to come from our collaboration with the MIRA team is the use of measurements from RIPE Atlas probes with performance measurements from NDT and Ookla. While not yet completed, the MIRA and M-Lab teams have begun testing the addition of the RIPE Atlas Software Probe to Murakami, allowing it to be an endpoint for RIPE measurements as well.
It’s been great to support the MIRA team’s research by developing additional features and tools to support premise-based measurements with Murakami. We’re looking forward to seeing the results of their research, and continuing to support the ecosystem of open source tools to collect and analyze measurements from various platforms and tests.
Future Development & Collaboration
The development of Murakami was made possible by the support of the Institute of Museum Libraries and Services, as part of our involvement in “Measuring Library Broadband Networks for the National Digital Platform,” a grant (award #LG-71-18-0110-18). If you are interested in using Murakami for your own research and want to ensure the continued development of the tool, please consider contributing to the repository or making a donation to the M-Lab project. If you are interested in a collaboration similar to the one with the MIRA team, please reach out to email@example.com.
Like all M-Lab tools and services, Murakami is built using open source code. If you are interested in contributing to the future of Murakami, we welcome proposals and donations.
Federal Communications Commission. Measuring Fixed Broadband - Measuring Broadband America Program. https://www.fcc.gov/general/measuring-broadband-america-measuring-fixed-broadband ↩
Ofcom. UK home broadband performance, measurement period November 2020 - Technical Annex. https://www.ofcom.org.uk/research-and-data/telecoms-research/broadband-research/broadband-speeds/uk-home-broadband-performance-nov-2020 ↩
Sundaresan, Srikanth; et al. Broadband Internet Performance: A View From the Gateway. ACM SIGCOMM Computer Communication Review, Volume 41 Issue 4 August 2011 pp 134–145. https://doi.org/10.1145/2043164.2018452 ↩
Measurement Lab. Locate API Usage. https://github.com/m-lab/locate/blob/master/USAGE.md ↩