In this second “analysis recommendations in context” post, we explore the refined research questions from the first post, which came out of our discussion of how to design specific questions with an understanding of the available data source(s) and the context of what each contains. We emphasized the importance of selecting a data source that matches the goal of the research question. This is critical for analyses of broadband measurement data, particularly when the research goal is to compare results to one another, to national broadband standards or specific funding requirements, or to the advertised terms of ISP service.
If you typically use the measurement-lab.ndt.unified_downloads views, then nothing will change. We are updating the ndt5, switch, and tcpinfo schemas, removing obsolete views, and renaming some views in preparation for improving ease of use and documentation.
A while back, our team published some analysis recommendations for anyone working with our data from the Network Diagnostic Tool (NDT), comparing it to other Internet measurement data sets, and drawing conclusions or inferences about the data. These recommendations are intended to provide guidance about analyzing crowdsourced data, because we know that it’s easy for analyses to end up with what looks like a striking comparison or finding, but that may not actually be supported by the underlying measurements or data. But because recommendations are only that, we’re now beginning a series of posts to unpack those recommendations with some context and examples. First, we’ll recap our previous recommendations post with more context, and finish with an example that we’ll continue working with in subsequent posts.
When thinking about broadband in the United States, the first thing people likely think about is whether their connection is fast enough: are they getting the speeds they need to do business, go to school, etc.? The dominance of “speed” in assessing broadband service goes all the way to the top: the FCC defines broadband according to the achievable download and upload speeds to the Internet. But generic speed test measurements only go so far in observing a connection’s performance, and M-Lab and the research community are working to expand the concept of broadband measurement beyond basic speeds.
At TPRC 2021, Dave Clark and Sara Wedeman presented “Measurement, Meaning and Purpose: Exploring the NDT Dataset,” which raises relevant and timely questions about M-Lab’s NDT dataset and its potential applications. Please join us Wednesday, December 15, 2021 from 11:00 am to 12:00 pm Eastern for a presentation from the authors and a discussion with the M-Lab community.
The National Telecommunications and Information Administration (NTIA) recently released a new public map, the Indicators of Broadband Need. Pulling together different sources of data in this excellent, publicly available resource is helpful to communities as they plan how and where to improve broadband services for their residents. Historically, many factors have made it difficult for communities in the US to address digital inequities through federal subsidies, notably the well-publicized inaccuracies of federal data sources on broadband deployment from the FCC. This process is changing and hopefully improving at the FCC. But the landscape of assessing or measuring who does and doesn’t receive quality and affordable Internet service is also complicated by the conflation of multiple measurement data sources covering different aspects of Internet connectivity and user experience. The different data layers in the Indicators of Broadband Need provide a chance to step back and examine all currently available sources, understand what they are measuring, how they differ, and what aspects of Internet service are not yet being measured, but should be. The Internet is a complex system, and the reality is that no one measurement methodology or data source is sufficient to measure its performance.
Many people know M-Lab and our TCP performance test, NDT, from running it in a web browser. Perhaps the largest single source of NDT tests comes from its integration by the Google Search team. While M-Lab is known for the large volume of crowdsourced test data resulting from people running our tests, over the past few years we’ve developed new ways to run our tests and open source Internet measurement tests from other platforms using a tool we’ve called Murakami.
Performing a measurement with ndt7 on the M-Lab platform now requires an access token issued by the Locate API v2.
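As a rough illustration of what the token requirement means for a client, the sketch below picks an ndt7 download URL out of a Locate API v2 style response and checks that it carries an access token. The response shape, the `wss:///ndt/v7/download` key, the host names, and the helper `pick_download_url` are assumptions made for this example, not a definitive description of the API; consult the Locate API documentation before relying on them.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical, abbreviated response in the style of the Locate API v2.
# Host names and the token value are placeholders, not real M-Lab data.
SAMPLE_RESPONSE = {
    "results": [
        {
            "machine": "mlab1-abc01.example.net",
            "urls": {
                "wss:///ndt/v7/download": (
                    "wss://ndt-mlab1-abc01.example.net/ndt/v7/download"
                    "?access_token=EXAMPLE_TOKEN"
                ),
                "wss:///ndt/v7/upload": (
                    "wss://ndt-mlab1-abc01.example.net/ndt/v7/upload"
                    "?access_token=EXAMPLE_TOKEN"
                ),
            },
        }
    ]
}


def pick_download_url(locate_response: dict, scheme: str = "wss") -> str:
    """Return the first download URL that includes an access_token parameter."""
    key = f"{scheme}:///ndt/v7/download"  # assumed key format
    for result in locate_response.get("results", []):
        url = result.get("urls", {}).get(key)
        if url and "access_token" in parse_qs(urlparse(url).query):
            return url
    raise LookupError("no download URL with an access token was found")


url = pick_download_url(SAMPLE_RESPONSE)
print(url)
```

The point of the sketch is that the client does not build the measurement URL itself: it uses the URL handed back by the locate step as-is, token included.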
Baltimore Data Day is an annual conference bringing together “community leaders, nonprofit organizations, government and civic-minded technologists to explore trends in community-based data and learn how other groups are using data to support and advance constructive change.” This year the 11th annual event expanded to become Baltimore Data Week, celebrating the 20th anniversary of the conference’s host organization, the Baltimore Neighborhood Indicators Alliance (BNIA). As a Baltimorean myself, I was honored to be invited to give a talk about the M-Lab platform and our open data, on the conference’s “Digital Inclusion Day.”
Starting October 7th, 2020, the ndt7 server on the M-Lab platform will require access tokens issued by the Locate API v2 to run a measurement.
Over the past month, M-Lab has published a series of blog posts about ndt7. As of Thursday, August 13th, 2020 roughly 90% of NDT clients using secure websockets have completed the migration from ndt5 to ndt7.
NDT measures “bulk transport capacity”: the maximum data rate at which TCP can reliably deliver data using the unreliable IP protocol over an end-to-end Internet path. TCP’s goal is to send data at exactly the optimal rate for the network, containing just the right mix of new data and retransmissions such that the receiver gets exactly one copy of everything. Since its creation, TCP has been steadily improved in the way it accomplishes this task. Consequently, NDT has also changed incrementally to reflect these improvements. The most recent improvements, including support for TCP BBR, are available in ndt7. On July 24th, we announced the start of migration of NDT clients to the latest protocol version. As of today, approximately 50% of clients are using ndt7. As ndt7 measurements become the majority of the NDT dataset, the M-Lab team is considering what we do and do not know about whether and how changes to the NDT protocol have affected M-Lab’s longitudinal NDT dataset over time.
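To make the idea of a delivered data rate concrete, here is a minimal arithmetic sketch: bytes reliably delivered, converted to bits, divided by elapsed time. The function name and parameters are hypothetical, and this is a simplification for illustration rather than how the NDT server computes its results.

```python
def mean_throughput_mbps(bytes_delivered: int, elapsed_seconds: float) -> float:
    """Mean rate of reliably delivered data, in megabits per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    bits_delivered = bytes_delivered * 8
    return bits_delivered / elapsed_seconds / 1_000_000


# 125 MB delivered in 10 seconds works out to 100 Mbit/s.
print(mean_throughput_mbps(125_000_000, 10.0))
```

Retransmitted bytes are deliberately left out of `bytes_delivered` here: the quantity of interest is what the receiver ends up with exactly once, not everything that crossed the wire.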
Following the general availability of the ndt7 protocol, we will be working with NDT client integrators to support their migration to ndt7. As they do, the NDT dataset will shift from predominantly ndt5 to predominantly ndt7. As part of assessing our readiness for this larger effort, a pilot was started on July 8.
The new ndt7 protocol for the Network Diagnostic Tool (NDT) is now generally available on the M-Lab platform. Since 2009, NDT has been the premier TCP performance measurement service hosted by M-Lab. During its history on the platform, NDT has produced the largest test volume to date, spanning the longest history. Since late 2018, M-Lab has worked with researcher Simone Basso to develop the ndt7 protocol and archival data format.
In November 2019, M-Lab reached a milestone after upgrading the operating system, virtualization, and TCP measurement instrumentation running on our servers worldwide. The upgrade also included a completely re-written ndt-server, providing backward compatibility to old clients, as well as the new ndt7 protocol. With the change in system architecture and the changes to ndt-server, our team wanted to provide unified, longitudinal views of the data in BigQuery that embed the provenance for all tests.
If you’ve been following our blog over the last few months, you know M-Lab has been working toward a complete server platform upgrade. As of November 20, 2019, all M-Lab servers are now managed by Kubernetes, running Docker container services for all experiments. This transition has greatly improved our platform management. This post addresses the short-term impact on downstream data users and applications, and outlines a temporary solution and our longer-term plan for new NDT tables/views.