T3 Webinar Question and Answer Transcript

Learn from the Experts: Open Data Policy Guidelines for Transit – Maximizing Real Time and Schedule Data Use and Investments
(December 5, 2013)

Q. Are there other participants from Europe and Switzerland?
A.

Kevin Webb: I'm not sure about participants on the webinar today, but if you're referring to folks using GTFS and the kind of tools I mentioned, absolutely. One of the things I mentioned early on was Open Trip Planner and the work that's been done there, which obviously came from the U.S. It was something that Portland TriMet really takes credit for getting going and building up, but we're actually seeing quite a bit of replication abroad, and most recently in Europe there was an E.U.-funded initiative to make Open Trip Planner the routing platform for all the Netherlands. It's actually been quite actively developed with a large consortium of developers in the Netherlands and a couple of other neighboring European countries played an active role. And I think you're starting to see that sort of platform become very popular within the E.U., where there's substantial investment in transit information systems just kind of as part of the operational capacity they provide. And if there's an interest about specific countries, I'm happy to follow up with folks individually and see if I can connect you to developers or agencies that are using it there, because the list is quite long.

Q. I'm going to ask the Volpe Center, too. Are there any participants in the audience from Europe or Switzerland? If you find out, let us know.
A.

Bob Marville: Yes, we can't tell that right at this moment, but we can check over our registrants and attendees and get back to you.

Q. How many third-party service providers does the agency depend on? What happens when the service goes down?
A.

Dave Barker: We have—I don't have an exact count because it's always fluctuating, but we have several dozen. We have over 50 applications out there that are third-party applications using the data, so around 50 third-party developers. How many do we depend on? All of them and none of them, because if an individual application is not supported, then users migrate to another one. So we don't have any sort of critical points of failure in any one particular third-party application. And we find that the—so the free market really does allow for them to cover for each other and fill in, and as Tim was saying, users are quite good at gravitating towards what are the applications that are supported and that work well. Tim, I don't know if you had anything that you wanted to add to the answer.

Timothy Moore: No, that sounds great. You covered it.

Q. How do you reach developers to get them to use data? Do you do targeted marketing?
A.

Timothy Moore: So yeah, that's a great question. I think there's a few elements to that. The first would be trying to eliminate as many hurdles as possible, and you start by making your license agreement as simple and human-readable as possible. Ours is about 250 words. It's at bart.gov/dla. We don't require signatures or reporting requirements—that's the first thing. If you can provide multiple paths into your data, that always helps, too. Like for a casual developer, for example, if you can create some kind of an embedded iframe-type situation. The “at” map is something that we provide. It's a Creative Commons map of the BART system. Intermediate-level paths in, RSS feeds, the ETD information through the API, and then advanced stuff—the geospatial, GTFS Real-Time—creating multiple paths in is important because you can then start bringing developers up that value chain. The next thing would be to make your data free to anybody—no fees, reusable, for commercial and non-commercial use. And I think it's more important just to get the data out than worry about waiting for some standard that's coming down the line and adjust a little bit later. We've had to do some of that. And also not really messing in the marketplace, not playing favorites, letting people compete. Finally, I think promotion is important. We use car cards and electronic displays in the system to promote not only the fact that data is available, but also that apps are available. We're trying to connect those two. I think maybe in San Francisco the geek per capita factor might be high enough that the fact that we promote data works. One other quick thing—where the information is posted on your website, bring it up into a really common area. Make sure it's front-and-center so that people can find it. And then research giving developers a little bit—I may be going on too long—but giving developers information to connect your customer needs with their skills—that's also important. I'm going to stop now.

Q. What is the website for knowledge exchange? We can find what that website is and get back.
A.

Martin Catala: Yeah, we can. There are several sites when it comes to the GTFS and GTFS Real-Time elements. There's a Google group that handles both of those, and there's also a transit developers listserv that talks about all of those subjects at once. But they have two specific ones that are designed for GTFS and GTFS Real-Time, if I'm correct. We'd be happy to share it with the participants after the fact, just sort of the list of things.

Q. How can a smaller agency that doesn't have a developer base of Boston/Oakland use publicly-created apps for their service?
A.

Dave Barker: I think for smaller agencies, it's probably more important to use existing standards. If you develop—if you come out with sort of your own protocol and a unique way of accessing data, then your likelihood isn't necessarily great, if you don't really have a nearby developer community, that someone is going to work with it. But if you tap into an existing standard, then people can just add a few lines of code; so an application that they've already written for five other agencies will now work with six, will now work with your data as well. So that would be my advice, is look for the most popular standards and be sure to use those.

Timothy Moore: I agree completely.

Q. We find wide variety in the level of adherence to the GTFS de facto standard from place to place, which makes it difficult to work with different agencies' GTFS feeds. How do you ensure that GTFS feeds for large complex systems adhere to the standard? So GTFS is a de facto standard.
A.

Kevin Webb: I can jump on that just in terms of the variety of things that we've encountered. I think the first really is that there's—there's kind of a core set of principles in GTFS that really are fundamental to delivering a valid feed, and there are tools out there that make sure that any feed you're consuming actually meets the kind of level of expectation of what's valid at the end there. And those are quite simple. Then there's a lot of additional fields and additional layers of data that can be optional. And I think that's where you start to begin to see variety. And I think one of the things that really makes GTFS work though, and it's really one of the kind of brilliant aspects of it as a way of approaching the data problem, is that it doesn't mandate the inclusion of all that data. In a lot of cases, transit agencies don't necessarily have it, or they don't have the organizational capacity to maintain it, so they can fall back to that kind of minimum viable set of data that's required to communicate about services, and then other agencies that do have the additional information—for example, a great one of that is disability access on buses, knowing which buses offer handicap-accessible, wheelchair-accessible boarding and riding opportunities. That's something that agencies that do have that data can then include in the feed and make available. And that's just a natural variety of what agencies are operationally capable of delivering. And I think by allowing that variability, you actually end up with more data made available, and it's up to you as a consumer then to know—to check for what's there and make sure it actually serves your need. An example I showed of the TCIP report—part of that validation was really looking at the variety of what fields were used by different agencies at different locations. And if you're interested in that, I definitely recommend that paper as a way to kind of get a sense of how different agencies have implemented or adopted the feed in their [inaudible] publicly. That said, if it's a valid data set and it meets the minimum specification, it works quite well in most applications. For example, in journey planning, we see it readily deployed, and for any agency that has a valid feed, we're able to consume the data and make use of it without any modification or cleaning on our part.
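As an illustration of the consumer-side check described in this answer, here is a minimal Python sketch. It is not one of the validation tools mentioned, and it is not a full validator. It assumes the feed is a zip archive with its files at the top level, and it treats agency.txt, stops.txt, routes.txt, trips.txt, and stop_times.txt as the required core, plus at least one of calendar.txt or calendar_dates.txt. The function name summarize_feed is hypothetical. The optional fields it looks for, wheelchair_accessible in trips.txt and wheelchair_boarding in stops.txt, are the GTFS accessibility fields.

```python
import csv
import io
import zipfile

# Core files treated as required in this sketch (an assumption of the example).
REQUIRED_FILES = {"agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"}
# At least one of these is needed to define service days.
CALENDAR_FILES = {"calendar.txt", "calendar_dates.txt"}
# Optional accessibility fields, keyed by the file they appear in.
OPTIONAL_FIELDS = {"trips.txt": "wheelchair_accessible", "stops.txt": "wheelchair_boarding"}


def summarize_feed(feed):
    """Report missing core files and which optional fields carry real values.

    `feed` is a path or file-like object for a GTFS zip archive.
    """
    with zipfile.ZipFile(feed) as archive:
        names = set(archive.namelist())
        missing = sorted(REQUIRED_FILES - names)
        if not (CALENDAR_FILES & names):
            missing.append("calendar.txt or calendar_dates.txt")

        populated = {}
        for filename, field in OPTIONAL_FIELDS.items():
            if filename not in names:
                populated[field] = False
                continue
            with archive.open(filename) as raw:
                reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8-sig"))
                # Blank and "0" both mean "no information", so neither counts as populated.
                populated[field] = any(
                    (row.get(field) or "").strip() not in ("", "0") for row in reader
                )
    return {"missing_files": missing, "optional_fields": populated}
```

A consumer could run a check like this on each agency's feed before deciding whether a feature such as accessible trip planning can be offered for that agency.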

Q. We are a mid-sized agency in Raleigh, North Carolina. Our ridership is approximately 500 thousand per month. Our demographics are mainly low-income riders that may or may not have access to mobile devices. How would it benefit our market?
A.

Martin Catala: I think there's a lot of agencies that are similar to the Raleigh, North Carolina example that we hear in the question. And I think there's two things that we can sort of count on, is that the use of mobile devices is growing and the use of landline telephones is decreasing. So there are probably more than you realize in your community, and that's going to continue to change as more and more people are not having landlines and are actually having more mobile lines. And if they don't have a smartphone—actually some of these—I can't speak for Dave and Tim's applications—but some of these also can support texting. I know both of them mentioned SMS, which is a form of texting, but I think that would be something—it would still continue to benefit that market. And I would also argue that then you would be—this is becoming an amenity for commuters, so this can actually open up your markets to people that are just outside—not just within the transit-dependent community, but actually within the choice riders. These are—I would characterize them as an amenity for customers. Just like you print out your schedules and put out benches, data is an amenity that helps customers make decisions. That's just sort of my take and my observation on it. But this is a very common concern, something that we want to make sure we address with our guidebook.

Timothy Moore: I think, yeah—Martin, I think those are some great points. And I would also—if I was in a market other than the Bay Area, I would look very close at the penetration of text services on mobile devices. I would bet you a lot that they're pretty high. And that also points to another obligation I think for transit agencies. Not a lot of developers are going to get into the SMS space in an effective way simply because of the cost to entry. Boston was talking a little bit about how you go email-to-text, and there's a little bit of latency there introduced. There are costs involved in going to a straight SMS gateway with your messages, and what we found was that not—a lot of developers are not willing to take those costs on. So that's a channel that we feel like it's important to be in. The second thing is accessibility. When you think about some of the policies that you're looking at, you can't just turn over data and walk away and think that all your responsibilities are going to be taken care of as an agency. It's important to us to reach as many customers as possible, and we create accessible mobile tools, like at m.bart.gov, so that we can fill that space. Because I got to tell you, not a lot of developers are considering the accessibility issues when they're creating their mobile apps, regardless of the platform.

Q. We had fascinating examples. However, the examples of open data uses for traveler information in North America are mostly from the cities listed by Tim—Bay Area, Portland, Boston, etc., or college towns. Are we starting to see deployments in other cities?
A.

Timothy Moore: You know, I can barely keep up with what's happening in my city. I would imagine that as—and I've seen this occur, where transit third-party developers have created platforms in which they can go from market to market and integrate standardized feeds like GTFS and expand into different markets. I'm seeing more agencies issue data in GTFS format. I don't have any specifics though; it's all kind of anecdotal.

Martin Catala: Yeah, we are starting to see it spread. In terms of the total agencies—I mean, there are so many agencies that are in a really small category. So only about a little bit more than a third of the agencies that report to the National Transit Database are participating in Google Transit, which is just one component of this. Even fewer are participating in the Google Transit Real-Time, or real-time data sharing. But in terms of the total passenger trips made via transit, it's upwards of 80 percent of all ridership. So we do see that it's being used in the big cities, just as the person asking the question—but we're starting to see it's spreading to smaller markets. And we believe, we hope, that sort of we can allay some of the concerns of both technology as well as sort of the liability concerns and fears that we might get more agencies looking to see this as an opportunity.

Q. This question is for Dave and Tim: Do you do outreach specifically to local universities to attract developers that are students?
A.

Dave Barker: No, we have no—we've made universities part of our outreach a little bit if we're—when we had a developers conference that involved flyers, and we did do some flyering at universities. So a very, very little. But so, not much specific outreach to universities.

Timothy Moore: No, we have not done specifically for that, but I can tell you, at the end of that laundry list I was talking about—about kind of developer advocacy and developing your developer group, community is a big part of that. Starting discussions, generating ideas—a lot of agencies have had good luck—although we've never done one—but a lot of agencies have had good luck with hack-a-thons or sort of prize-based incentives for getting apps built. But we've had a lot of success just having meet-ups and doing one-on-ones, and we have a Google group reaching out in social media and that sort of thing. So I think it's—you could go to like the local Drupal group or user groups and announce that you have data. And it's maybe a little bit of shoe leather, but I think that's the best way to build that community and build those relationships.

Q. Any post-application development QA/QC of third-party apps? Is there a formal process to rescind transit system data access?
A.

Timothy Moore: We don't do any QA or QC on any of these apps. We make it very clear at our app center, which is bart.gov/apps that these are third-party applications and that customers need to look at the reviews and do an evaluation on their own. I think the best formal process is the market forces that come to bear on these apps, because customers will find very quickly which ones are good and which ones are not so good.

Q. How do you work with vendors/partners to ensure that IDs, keys for data elements—example TRIPS—match across many different feeds? For example, is this specified as a requirement in a contract?
A.

Dave Barker: I can say that for the development that we did for MBTA Real-Time, we did not specify anything [inaudible] specific key level when we specified what it would do. We did specify that it would be based on GTFS and that keys would match where possible. And other than that, it's a very case-by-case, system-by-system basis, is all I'm really able to say. In some cases, we needed to do workarounds in order to get data to match, but it's been a very case-by-case basis.

Timothy Moore: I think—we issued data early on, like our ETD feed, for example, which—and we just turned out the feed that we were using ourselves. And I think once GTFS-RT came along, we were making data available in a little bit more formal way, where keys and IDs were matching on both the schedule side and the real-time side. I think developers have been used to working with data in different ways and doing transforms on their own. Like for example, we're not providing JSON outputs, and developers seem willing to do that. So I think we're doing the best we can with the data that we have. Putting it out I think is more important than waiting for a standard or standardization to come down the line.

Q. What are the barriers to implementing GTFS-based transport data platforms?
A.

Kevin Webb: I think actually the question that was asked early about standardization of IDs and data across different systems is actually one of the most fundamental. Some of these already got talked a little bit about. I think it's a really important thing to recognize, but this isn't really a technical exercise as much as an organizational one and about getting various people to agree on the naming of things, so you can actually integrate different data sets together. It's particularly valuable when you start moving into integration with real-time information and integration with other operational data feeds, which are often managed by different vendors and different systems that in many operational contexts don't all line up. One of the things I think is really valuable to understand is that the ideas behind GTFS actually can start to help provide a very lightweight framework for connecting these different systems together and finding some common identifiers that things can hang off of. But again, it's not as much a technical concern as it's more of an organizational one, and making sure the different vendors involved and different platforms involved are in agreement.
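To make the identifier question concrete, here is a minimal Python sketch of one check a consumer or agency might run: do the trip IDs reported by a real-time source appear in the schedule's trips.txt? The function names and the sample IDs are hypothetical, and the real-time IDs are passed in as a plain list rather than decoded from any particular vendor feed.

```python
import csv
import io


def load_trip_ids(trips_csv_text):
    """Collect the trip_id values from the text of a GTFS trips.txt file."""
    reader = csv.DictReader(io.StringIO(trips_csv_text))
    return {row["trip_id"] for row in reader}


def unmatched_trip_ids(schedule_trip_ids, realtime_trip_ids):
    """Return real-time trip IDs with no counterpart in the schedule, sorted."""
    return sorted(set(realtime_trip_ids) - set(schedule_trip_ids))


# Hypothetical sample data: two scheduled trips, one real-time ID in a vendor's own format.
trips_txt = "route_id,service_id,trip_id\nRed,WK,T100\nRed,WK,T101\n"
scheduled = load_trip_ids(trips_txt)
print(unmatched_trip_ids(scheduled, ["T100", "0100-vendor"]))
```

An empty result means the two systems agree on naming for the trips checked. Anything left over marks a place where a workaround or an agreement with the vendor is needed.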

Q. Have you had any negative legal experiences due to open data? For example, a developer threatening to sue over incorrect or missing data.
A.

Timothy Moore: In a word, no. I think our developer license agreement pretty much covers it. We're providing this data as-is and make it as-available. Yeah, so I don't think—yeah, that has never happened.

Dave Barker: That has never happened with us either.

Kevin Webb: One thing I just wanted to add on as more of a consumer—one of the things—I've heard this question come up a lot, and I think it's always interesting to realize that printed feeds and printed data sets have been being generated long before open data formats in the digital realm came along, also had mistakes in them, and were also being corrected and changed over time. What's really fascinating about digital data platforms and open data frameworks like what we're talking about here is it allows you to more readily communicate those changes and make corrections.

Q. You mentioned that BART puts out everything it has in terms of schedule, real-time data, geolocation, etc. What's the next thing? What are you working on next?
A.

Timothy Moore: I think what I'm starting to see is that developers are looking to kind of fill in data a little bit. One of the things that we recently introduced into our API is load factor, or how crowded trains are at specific times, as a way to help customers—we basically have a certain amount of inventory that's running out on the line. If a customer can see where the crowding is and maybe make a decision to take a later train or an earlier train and actually get a ride or get a seat, that was kind of the motivation behind providing load factor. So I think you're going to start seeing that type of data being requested. Some of the research and visualizations that are occurring now with origin destination data are fascinating to me, and we've been asked a lot to provide OD data for like a month or a specific day, like when the Giants won the World Series—that sort of thing—and there was a big parade downtown. And I think you're also going to start seeing other big mapping services looking for deeper information, more granular data, so that they can differentiate their services from other services that are out there. So I think some of those partnerships might be developing, and that's kind of the direction that I see things going in.

Q. If you had a standardized data feed based on GTFS, schedule-based, and TCIP Real-Time, in your experience, what demand is there for public agencies to provide this data in all the other formats available?
A.

Timothy Moore: I can only speak to—we've released our data in GTFS and GTFS Real-Time and in an API. And I can say that we haven't had people coming to us asking for it to also be in TCIP. I think the environment is changing pretty quickly. I think that it's very worthwhile for people to keep their eye on GTFS Real-Time as a standard. The fact that Google is behind it is giving it a big boost, and it's very efficient. It can also be a little difficult to work with. So I think the most important thing is to really watch the market and watch what's taking off, and make sure that those are the ones that you support.

Q. GTFS is still considered to be a de facto standard and we can provide you at the end of this webinar a list of transit agencies that have implemented TCIP that may benefit the results of this question. We did a study and there have been—we have a list of the agencies that have it in place. For example, LINKS is one of the agencies. Does anybody have anything to add to that question?
A.

Kevin Webb: The only thing I'd add is just—is less on the production side of data and more on the consumption. I think it's important to look at what's consuming the data more than what's being produced in terms of what standard to choose. But I think that difference between GTFS and TCIP is probably most to sync there. But that aside, I think that what's really important here is that data should be released in a bulk format that it's freely available without any sort of limiting factors around it. And that's one of the biggest challenges we run into in terms of accessing data feeds, is that agencies will sometimes release the schedule data through GTFS, which is a bulk format. However, they'll restrict access to the real-time—or put it behind a semi-proprietary API, which makes it very hard to get bulk access to the data. And the kind of brilliance of what GTFS-RT has done and the efficiency that was just mentioned in terms of the way it's been framed, is it makes it very easy to get a full snapshot of the system, which many, many applications depend on that. And I think that's one of the most fundamental things you have to recognize if you're building this from the agency side, is that all the data you have should be available at any given point in time to anyone that wants to consume it. Otherwise, it really restricts what developers can do.

Charlene Wilder: [inaudible] for example, TCIP gives more exact information like for bus schedules.

Q. Has anyone been able to create apps on loaded bike racks or know how to pull data for bike racks on the front of buses?
A.

Dave Barker: We have not.

Timothy Moore: No.

Kevin Webb: I'd say this is a great example of the kind of extension that can be built on. But again, it kind of comes down to an operational capacity question. I think it's something—like load factor that was mentioned earlier, which is a very valuable piece of information but not one that every agency could actually provide. And there's certainly an opportunity to provide information about [inaudible] bikes, and then there's another opportunity to provide dynamic information about loading. But it really does come down to a question about whether there's a framework to get that through. The data standards like GTFS are extensible in that sense and could provide it. It's really more of connecting the kind of end-to-end information and making it actually accurate, is probably more important than the data standard itself.

Q. Isn't the purpose of TCIP primarily for internal operations and to and from op centers, and not for public release? It has features that really aren't for public consumption. Any comments on this?
A.

Kevin Webb: I can add—I mean, I think that it's really—it's unfortunate I think that we've gotten bogged down in that conversation about really specifics about the standard. I think that it really comes down to creating a framework where the data is made available and open, and I think that really what you have to look at is what enables that to happen readily. And I know that there are certainly internal operational examples where TCIP is used and is valuable, but it's not gotten 1200 cities around the world to all agree on the way to do really a large number of really valuable things in the way that GTFS has. And I think you have to start to ask is it worth getting into a debate about standards, or is it worth getting in a debate about where can we make the most impact in delivering public information? And I think in that regard, it's kind of a foregone conclusion at this point. It's unfortunate that it still seems to be kind of a primary driver of the debate, at least on the Federal side. I'm working with local agencies and people internationally. I don't think it's even relevant to the conversation anymore.

Charlene Wilder: Hopefully, TCIP and GTFS can work together. [inaudible] did do an internal TCIP recently on their CAD/AVL buses. They spent over 30 million dollars doing this, and it's been very successful.

Q. How can you tell how much information customers really want?
A.

Dave Barker: Well, it varies very much according to the customer, and also customers themselves aren't really sure how much information they really want. They tend to come in saying, “Oh, give me everything, give me everything.” And then you give them everything and it turns out to be too much. And also driving those decisions are, well, how much information do we want to give customers? We tend to err on the side, for example, of pushing out alerts because we—essentially we get less complaints if we are warning people—if we err on the side of warning people. So I think that the key is to try to give customers tools that will let them filter out themselves what they do want and what they don't want. And we're trying to do that within our API, and I know that GTFS Real-Time has a number of different keys and things to be used for filtering so that it can be used there as well.
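The filtering idea in this answer can be sketched in a few lines of Python. This is an illustration only, not the MBTA API. It assumes alerts have already been decoded into plain dictionaries whose informed_entity entries carry route_id or stop_id, mirroring the selectors GTFS Real-Time attaches to alerts. The function names and the sample alerts are hypothetical.

```python
def alert_matches(alert, routes, stops):
    """True if any entity the alert names is one of the rider's routes or stops."""
    for entity in alert.get("informed_entity", []):
        if entity.get("route_id") in routes or entity.get("stop_id") in stops:
            return True
    return False


def filter_alerts(alerts, routes=(), stops=()):
    """Keep only the alerts relevant to the routes and stops a rider selected."""
    routes, stops = set(routes), set(stops)
    return [alert for alert in alerts if alert_matches(alert, routes, stops)]


# Hypothetical alerts: the agency publishes everything, the rider keeps what applies.
alerts = [
    {"header": "Red Line delay", "informed_entity": [{"route_id": "Red"}]},
    {"header": "Elevator out", "informed_entity": [{"stop_id": "S9"}]},
    {"header": "Bus 1 detour", "informed_entity": [{"route_id": "1"}]},
]
for alert in filter_alerts(alerts, routes=["Red"], stops=["S9"]):
    print(alert["header"])
```

The design choice here matches the answer: the agency errs on the side of publishing every alert, and the filtering happens on the customer's side.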

Q. If you had to do your open data effort all over again, what would you do differently based on lessons learned from the first time?
A.

Timothy Moore: I'd probably go straight to the API. We turned out data kind of however we had it, and there were a ton of apps that were built on it, but we just recently made a migration of our site to the open source Drupal framework, and some of that stuff we just had to kind of walk away from because we were no longer using it in that format and it didn't make sense to continue to support it when we had GTFS and the API there. So hindsight is 20/20. I wish we had just jumped in with both feet and built the API from the get-go. And I think a lot of agencies who kind of followed our path in just releasing what they had are going to have these transitions to make. Luckily, developers were really gracious and a lot of folks were moving over to the GTFS-RT side anyway.

Q. How do you balance investing in your own notification service with encouraging third-party developers to deliver notifications? Are you competing with them?
A.

Dave Barker: We are in a sense, and it is a hard line to walk. I think we had—I mentioned in the presentation that no one's really jumped in to try to compete with our T-Alert service, and it would be great if someone would, because I'd love for there to be an alternative to that that customers can turn to if there's something they don't like about ours. But ultimately, I think you need to pick what is it that's important for the agency to provide directly. What is it that you need to have your name on? And then do the best possible job you can at that. And if there's something that you're interested in dabbling in and just sort of releasing something small, stay away from that. Let the third-party developers tackle that and focus your resources on what services you do want to provide and making those as good as possible.
