Engelberg Center Live!

A Bird in the Hand is Worth Two in the Dataset

Episode Summary

Birders and machine learning researchers have forged an unlikely friendship: birders rely on advancements in technology to bring them closer to nature, while machine learning experts rely on the slow, methodical research that birders love to do. But what has this alliance meant for the relationship between birders and their environments, and for the birds themselves, who are so stubbornly resistant to becoming data? In this episode, we explore the ways that the natural environment relates to generative AI, the ways it pushes back, and how we can find hope in both the pace of technology and in unhurried birdsong.

Episode Notes

Music Used:

The Blue Dot Sessions, “Dirty Wallpaper,” “Valentis,” “Pulse,” “Mill Wyrm,” “Cloud Line,” “Pall Canyon,” “A Common Pause,” “Dialtone 11”

Citations:

Rachel Carson, "Silent Spring"

John Keats, "La Belle Dame sans Merci," read by Michael Sheen

Siegfried Sassoon, “Everyone Sang,” read by Garrison Keillor

Episode Transcription

Tamar Avishai: This is the sound of nothing. It’s audio recorded by University of Salford researchers in the abandoned village of Borisovka, located in the Chernobyl Exclusion Zone in Ukraine. But really, it's the sound of absence. The sound of an abandoned place. Nothing remains. Except birds. The sound of birdsong in the air in our daily lives is so present that we stop hearing it, and it's why its absence can be so jarring, and yet why it's so hard to put your finger on what's missing. Forget hoofbeats: no birds is the true sound of the apocalypse, and a reason why the return of birds to the Chernobyl site, that Three Mile Island that resulted when a nuclear plant overreacted in 1986, is such a tangible sound of hope. And so is today's story.

 

Tamar Avishai: So what I'm hearing here is that this is actually going to be the most heartwarming episode of the series.

 

Jer Thorp: Oh yeah, we're the feel-good team. There's no question about that.

 

Tamar Avishai: From the Engelberg Center on Innovation, Law and Policy at NYU School of Law and USC's Annenberg School for Communication and Journalism, this is Knowing Machines, a podcast and research project about how we train AI systems to interpret the world. Supported by the Alfred P. Sloan Foundation. And today we're telling the story of birds, more specifically birders, and how this [00:02:00] die-hard, nerdy, binocular-wearing subculture has helped us to understand some of the ways that AI and machine learning can create a deeper connection between ourselves and the natural world, and also the ways that that world can push back.

 

Jer Thorp: Yeah. So birds have been implicated with machine learning throughout pretty much its entire history.

 

Tamar Avishai: This is Jer Thorp. He's a data artist, adjunct faculty at NYU's Tisch School of the Arts, a Knowing Machines team member, and, naturally, a birder.

 

Jer Thorp: And there's a few reasons for that. And one is that birds are a ubiquitous object. So we go back to look at some of the early experiments and work with machine learning and specifically computer vision. They were looking at classes of objects that they could find commonly in photographs, so they would look for things like cars and bicycles and desk lamps and of course, birds, because birds are everywhere. But the other one is that they serve this very specific purpose that was a challenge for machine learning and continues to be, which is this idea of fine-grained classification.

 

Tamar Avishai: Fine-grained classification is a key thing to understand. It's one thing to say that an image contains a car or a motorcycle. We're all familiar with what CAPTCHA wants us to find to prove that we're not robots online, but it's quite another to say that an image contains a 1957 Chevy coupe or a black-crowned night heron. No one would get into a website ever again, but for some communities, God is in the details.

 

Jer Thorp: So we have this huge community of hundreds of millions of birders all around the world who are doing this act. And so for the computer vision community, this was like catnip. There was this problem that was hard, but it was also a problem that there was a lot of data that had like, solved examples of it. And the reason for that is because [00:04:00] birders like to identify birds. And so one of the reasons why this became such a big thing was that at some point, these computer vision researchers who were kind of used to working in their own labs and their own kind of little rarified environment where they didn't like to talk to other people, realized, oh, wait a minute, there's these gigantic communities that are based around this problem. What if instead of scraping data from Flickr, we were to work directly with these communities to try to, like, build these machine learning systems that actually involve those people?

 

Tamar Avishai: These machine learning systems evolved into apps like Merlin, a bird identification project that uses sound and images to identify birds.

 

Jer Thorp: This is a project that's actually a real, meaningful collaboration between a community and these researchers.

 

Tamar Avishai: Birders, both amateur and professional, although the former certainly dwarfed the latter, have been around for centuries. And what's particularly interesting is that the history of birding is also the history of technology. As long as people have wanted to study birds, they've wanted better, more precise instruments for doing it. For all the prevalence of birdsong in the air, the birds themselves are both small and far away. So anything that brings us closer to the actual animal is a crucial step in the evolution of the discipline. Ironically, that often meant destroying the thing that we wanted to study in pursuit of studying it.

 

Jer Thorp: So binoculars, for example, were like a really useful technology because before binoculars existed, the way that you would get a close look at a bird is you would shoot it, and then you would look at its body. Wow.

 

Tamar Avishai: So enter the development of the [00:06:00] app Merlin and this meaningful collaboration between the birding community and machine learning researchers.

 

Hamsini Sridharan: It's like two communities coming together, right?

 

Tamar Avishai: This is Hamsini Sridharan. She's a doctoral student at the University of Southern California and Jer's partner in the Knowing Machines project.

 

Hamsini Sridharan: This sort of more common approach to machine learning data, right, has been to say, like, what can we scrape off the internet in bulk, and what's the cheapest way to get that annotated so that you have this like bigger, better, faster approach to AI? Basically of like if we just feed it as much crap as is available, that'll end up with a more sophisticated model.

 

Tamar Avishai: Birders, of course, are used to moving at a slower, more methodical pace. And so this initial partnership wasn't without its kinks. Scraping pictures of birds off of Flickr and using Mechanical Turkers to label them didn't impress the ornithologists at the Cornell Lab of Ornithology, the Mecca of bird science and where the initial dataset creators were invited to show their project.

 

Jer Thorp: And the ornithologists were like, this is junk. There's no way we could use this. It's just not good enough. And it sort of spoke to the idea that, you know, birding is certainly this community of people who are very meticulous. You know, the whole process and practice of birding is really to, you know, we call it sort of puzzling out an identification and trying to figure out sort of exactly what you're looking at. And if the machine can't do that to the level that an average birder could do, then it's not a useful tool for birders.

 

Tamar Avishai: So developers realized that the only way to make their output, in this case their app, actually useful was to scrape the best data possible by engaging the community who deeply cares about understanding the fine-grained details and getting things right. So they asked the birders for help and Merlin ends up [00:08:00] being an app that is both for amateur birders and made by them. Birders around the globe input information into this app. Pictures. Sounds. Location. Merlin is one of the most comprehensive repositories for birders in the world.

 

Jer Thorp: I think that was really exciting as a realization for this group was that, hey, we can work with these experts such that we can make tools that are really good and that will suit the needs of these people, but also have these much wider applications.

 

Tamar Avishai: Okay, put a pin in these wider applications. We will definitely come back to them. But for now, we've set the stage. At its outset, this relationship between birders and machine learning researchers feels like a quieter, lesser-told narrative about AI, one that pushes back against the big, faceless tech controlling everything from above. It's a relationship that feels symbiotic. I mean, why is it good for the birders? Well, because as we mentioned earlier, birders benefit and always have from the evolution of technology. They learn from one another, and Merlin and apps like it are incredible ways of accomplishing this across the globe. It allows them to tackle environmental concerns, conservation, research, all with a tap. And so why is it good for the machine learning researchers? Because by asking birders to lend them their slow, methodical expertise, they have an opportunity to make datasets that are really functional, that are stuffed to the gills with high-quality data, which, unlike, say, ChatGPT or similar generative AI models, is actually full of a bounded set of knowable knowledge. Furthermore, when AI is built from the [00:10:00] bottom up, not the top down, it's less vulnerable to the sticky issues of consent that we looked at with Clearview in episode three. Meanwhile, experts, often on a volunteer basis, are constantly removing objectionable content. This seems like it should be a straightforward and yes, even heartwarming relationship. But... And you knew there was going to be a but.

 

Jer Thorp: How are these systems changing the way that we know birds? And then also, how are they changing the way that we act as birders or as other appreciators of nature?

 

Hamsini Sridharan: And then you also have on top of that or below that, I guess, the birds themselves and how they fuck with the process of becoming data. Right. Like they're not always like amenable to that.

 

Tamar Avishai: Exactly. For the relationship between birders and data scientists to work, birds have to become data. And plainly put, they don't want to. Birds challenge classification. Birdsong is never actually isolated from its natural environment. Ducks intermate, apparently. Life, in the words of cinema's wisest chaotician, finds a way. Okay, but we're getting ahead of ourselves. Let's go back to the real point at which everything starts to fall apart, which is this idea that birds are data, which, ironically, is also why so many people become birders in the first place. It's a powerful thing to observe something in your own intimate, familiar space, like your backyard or your park, and be able to put a name to it to give it a context and understand its role in our larger world. It makes you feel like you're part of this larger world.

 

Jer Thorp: I mean, I joined [00:12:00] the project 100% because I was very interested in this idea of how birds become data. So, you know, I'm out in the world and I see a bird. There's no data being created in that moment, but as soon as I identify it, there is that piece of data that's in my brain. And what birders tend to do is they tend to record those things. And, you know, with the advent of databases, birders were some of the first people to start using databases to store and share their data.

 

Hamsini Sridharan: You know, take any bird like outside your window. In my case, it would be sparrows, right?

 

Tamar Avishai: Sure. I'll take her word for it.

 

Hamsini Sridharan: And you can just start with a sparrow and kind of follow it through some of these technological systems and, like, tell the story of what happens to it. But there is also, you know, abstracting from that a level, like the story of the people who were involved with it. And this, like, story of, like, birding is basically, you know, what we call like citizen science, right? Or community science, I think is another term that gets used a lot.

 

Tamar Avishai: And of course, if you're a citizen scientist, which in this case is a fancy way of saying an amateur birder, you want to be part of a community and you're really interested in speaking that community's language. You want to collect data points and moreover, you want to learn from them. I mean, take me, for example. I literally couldn't recognize a sparrow call or really what it looks like. I mean, how could I ever learn to tell the difference between a house sparrow or an American robin or a European starling, all of which, according to Merlin, live in my backyard, without the explicit understanding that someone before me has identified and classified all of these birds? Machine learning is wholly dependent on that classification. The thing is, nature on the whole, and birds in particular, are hell-bent on resisting it.

 

Hamsini Sridharan: And [00:14:00] so when you look at what the ducks are doing, they don't care about what we think a species is. They're just having sex with who and what they want to have sex with, and making little hybrid duck babies that are so completely, like, resistant to classification because a hybrid is, you know, neither species of its parents, right?

 

Jer Thorp: I have a bird in the park that I look out onto, which is a pretty common bird in North America called the northern mockingbird. And the northern mockingbird is really well known because it mimics other birds' songs. They have tons and tons and tons and tons and tons and tons of recordings of northern mockingbirds, but they're really hard to pin down because they can make the sounds of all these other birds. And so that's one example of the ways that no matter how well we build these systems, birds are going to mess with them and they're going to like question the ways that we sort of make rigid classifications.

 

Tamar Avishai: And this is also why people love becoming birders, because there is no endpoint to this knowledge. There actually aren't any boundaries. Duck species are continually interbreeding. A mockingbird call is, by definition, the call of other birds. And this pursuit of boundless knowledge, which can seem like a fun, never-ending Rubik's Cube of observation and categorization, can be just as appealing to birders as the fixed nature of classification. I mean, what can I say? Like all human beings, birders contain multitudes. And so do birds, or at least what birds have represented to our culture. So often they've been seen as, forgive me, the canary in the coal mine, a bellwether. They're the first thing you notice when they're missing. And like with the Chernobyl Exclusion Zone [00:16:00] clip that you heard at the top there, the first thing that speaks to a sense of hope and regeneration when they return. And this is another really interesting way of thinking about birds, or as Hamsini put it to me, thinking with birds.

 

Hamsini Sridharan: There's a really long history there. But, you know, let's take, for example.

 

Audiobook narrator: Silent Spring, by Rachel Carson.

 

Hamsini Sridharan: Carson and Silent Spring.

 

Audiobook narrator: There was a strange stillness. The birds, for example. Where had they gone? It was a spring without voices.

 

Hamsini Sridharan: She sort of started to notice the absence of birds, and that was sort of her entry into thinking about the environmental impact of pesticides. Right?

 

Audiobook narrator: On the mornings that had once throbbed with the dawn chorus of robins, catbirds, doves, jays, wrens and scores of other bird voices, there was now no sound.

 

Tamar Avishai: When you start to look, you see the importance of birds or the importance of their absence in culture everywhere, both today, when you're talking about climate change, habitat loss, nuclear proliferation, and throughout history: literature, poetry. Whether it's the Romantic poet John Keats musing on death.

 

Poet: And this is why I sojourn here, alone and palely loitering, though the sedge has withered from the lake, and no birds sing.

 

Tamar Avishai: Or the World War One soldier and poet Siegfried Sassoon euphorically describing news of Armistice Day.

 

Poet: Oh, but everyone was a bird and the song was wordless. The singing will never be done.

 

Tamar Avishai: All of this only confirms how birds defy scientific, knowable spreadsheet classification. As much as birds are the tiny, chittering dinosaurs that fill the sky, they're also an idea. [00:18:00] And it's absurd to think that a machine learning program could claim to be able to accurately and comprehensively classify them in their totality. Except what other tools do we have besides classification to give nature any sense of structure, to learn it, to teach it, to get new cohorts of citizen scientists excited about it, and then to give them a language to describe their experiences of it? And then, of course, to implement the tools we need to save it. I mean, the way a bird tells the existential and logistical story of climate change is by us counting it, which necessarily means identifying it. And this brings us to the question, which we visit in every episode of Knowing Machines, of bias.

 

Hamsini Sridharan: We hear a lot about bias in artificial intelligence datasets, right? You'll see various kinds of biases crop up in these naturalist communities as well. You know, you can start to get into which birds and which birders, right, are feeding into this.

 

Tamar Avishai: Because someone had to classify these birds in the first place. And all of those someones were human beings. And human beings are attracted to shiny objects, or in this case, male birds who tend to have flashier plumage, which leads to the females of the species naturally being underrepresented.

 

Hamsini Sridharan: But then on top of that, you have the layers of who the birders are, right, and who has access to birding in their environment. Who's taught that this is like an interesting way to engage with the world? Who has access to smartphones? Like, all of that builds in these layers of interest, access, bias. And these are communities that are aware of that. [00:20:00] Right. But like, for example, there's like a heavy bias towards North American birds and European birds because people in other parts of the world, you know, aren't as engaged with this. Right. And then there's also like birders tend to be white, you know, birding while black, like, that's a hashtag. Like that's a thing that it's becoming a movement to expand like interest and access in this way of connecting with the world. But there's still like questions of who and what is represented that then get reflected in these datasets.

 

Tamar Avishai: These communities are largely independent and self-governed, led by a group of reviewers who determine whether someone's bird ID data is valid and who also happen to be overwhelmingly white, male and Western. In the meantime, 25% of the world's birds are in Africa. And yet most of these apps don't work on African birds. So you have a biased sample set making the discoveries, working within limited geographic constraints. And then they have to give them names.

 

Jer Thorp: The root sort of truth of all birds' names are Latin names, right? So we have common names and then we have the Latin name. But the Latin name is considered to be like what the bird is, which is of course, ridiculous.

 

Hamsini Sridharan: It's a Western scientific tradition, down to sort of like the binomial nomenclature of genus-species naming.

 

Tamar Avishai: The house sparrow, for example, is the Passer domesticus. You know, if you're feeling fancy.

 

Hamsini Sridharan: Yeah. And then there's also just like groups and communities that choose to know birds in the natural world very, very differently. If you look at like indigenous cultures around the world, for example. Right?

 

Jer Thorp: There's indigenous names for these birds that go back thousands of years before these names and somehow, like this kind of manufactured Latin name kind of gains precedence over it.

 

Tamar Avishai: So we have these tools that allow [00:22:00] this work to work, that allow us to understand and interpret the world around us, from the sound of birds to the much larger implications if they're missing. And even more, these tools invite so many more people into that work than would ever have been able to before. We can pull out our phones, download an app, and ironically become closer to nature. We are helping that data become more comprehensive. That's one way to think about all this, and it's not a bad one. So long as we recognize that every single one of these tools was created by people. And people are the products of their environments, their context, their history, their blind spots. They, like birders, contain multitudes. And look, clearly, the reality here is that we have a real paradox on our hands. What happens when something big, unwieldy, living, mostly but not entirely observable, and constantly changing gets a name, gets a structure? And this paradox is, of course, not limited to the story of AI and the birding community. Historians deal with this all the time. So do linguists. The structure can never properly or comprehensively contain or explain its parts. The structure itself is a series of human choices and compromises and biases. But again, without it, what do we have?

 

Jer Thorp: You're actually kind of allowed to spend some time with the bird as a bird.

 

Tamar Avishai: Some birders, like Jer, have talked about how liberating it can be [00:24:00] to step away from identification. Not only because you are throwing off the yoke of a problematic system, but because you get this really lovely experience communing with nature.

 

Jer Thorp: It's like, wait a minute, birds are not a thing on a shelf that our job is to count. Like, they are these miraculous creatures.

 

Tamar Avishai: I love this. I'm sold. Buy me some binoculars for my birthday, I want in. But I also recognize that Jer is an experienced birder who has already done enough legwork to put that database down and let the already identified birdsong just wash over him. For me, I can't get in without being able to identify the birds around me, without knowing what a sparrow actually sounds like, which, as previously stated, I do not. Not even a little bit. Or at least not that I know of. And Jer and Hamsini, they totally get this.

 

Jer Thorp: We kind of find the value we want from these tools, but also maintain this kind of sense of wonder and this sense of like this kind of interspecies connection that happens when you really have an experience with an animal. It's a very magical experience, Tamar. You should do this right now. It's a perfect time in Cleveland. Install the Merlin app. Go press the, like, listen button and it'll tell you all the birds that you're hearing.

 

Tamar Avishai: Okay, so opening Merlin Sound ID, listening for birds. Oh my God. American robin. European starling. I don't know if these are, like, lame Midwestern birds, but they're certainly birds in my backyard. [00:26:00] That was just a robin. House sparrow. I think what's so amazing is that you hear birdsong as kind of one blanket thing, you know, like you just hear a chorus of birds and you never really think about what individual kind of instruments contribute to that orchestra. And I'm looking at this app right now and it's already listed five different birds. It's like five different instruments, except it's not an instrument that's like meant to be part of an orchestra. These are all different individual animals. And now I know that that little kind of doo doo doo. Like that's a robin. I never would have known that before, but I hear that one bird all the time. And that's the one that's pinging the most on this app in my backyard.

 

Tamar Avishai: So what happens when everyone has Merlin on their phones? Well, we become more connected to and more curious about the sounds in the sky and our backyards than we ever thought we'd be. And realistically, we're also becoming listening stations for data scientists logging birdsong into an app around the world 24 hours a day. And we are also counting the birds for everyone.

 

Jer Thorp: And so, in a sad way, all of us citizen scientists are going to be out of a job if the job as described is counting birds. But I actually love that, right? I love the idea because it sort of [00:28:00] frees us, and it kind of asks us a question about like, how does a guy like, you know, it's the opposite of like taking our jobs? Well, it is taking our job, but it's like a job that we're only doing because we are the labor that can do it right now. But if, I don't know, if these, like, every park had these little listening stations, that would mean that I could, like, be freed from this burden of having to count and could go and like, just watch, you know, just watch the birds.

 

Tamar Avishai: Calvin, can you hear the birds? Come here.

 

Calvin: Yeah.

 

Tamar Avishai: Do you hear the birds?

 

Calvin: Yeah. What's that sound?

 

Tamar Avishai: That's a crow. That's an American Crow. And they're talking to each other. Can you make that sound?

 

Calvin: Row, row, row.

 

Tamar Avishai: Next time on Knowing Machines, we dig into the question of AI and the media. Journalism is, of course, just one of the many vocations wrestling with AI. But in a world so concerned with the truth, whatever that means, and with facts, whatever those are, generative AI feels like a particularly uneasy wrinkle.

 

Mike Ananny: I think the big motivation for this project is thinking about what the heck does it mean to live in worlds where news might be coming from a mix of humans and machines?

 

Tamar Avishai: We'll see you then.