Martha Tsutsui Billins (host): Hello, and welcome to Field Notes, a podcast about linguistic fieldwork. I’m Martha Tsutsui Billins, and today’s episode is with Richard T. Griscom. Richard T. Griscom is a post-doctoral researcher at Leiden University. He obtained his bachelor’s and PhD degrees from the University of Oregon. Richard’s research focuses on language documentation, fieldwork methodology, and functional-typological linguistic description and theory, with a special emphasis on languages of East Africa. Over the past five years, he has been working with the Asimjeeg Datooga and the Hadzabe, both endangered minority language communities of northern Tanzania. He is a recipient of two grants from the Endangered Languages Documentation Programme and is a depositor of the Endangered Languages Archive.
In today’s episode, Richard and I talked about how researchers can utilize mobile technology to make recordings and do analysis over great distances in communities that maybe don’t have much electricity. Another thing we discuss is how digitizing fieldwork workflows can increase efficiency and accuracy in the field, as well as what community collaboration has looked like for Richard’s projects.
MTB: Welcome, Richard. Thank you for coming on to the show.
Richard T. Griscom: Oh, thank you very much for having me.
MTB: So to start, can you take us through your fieldwork biography, so where you’ve done fieldwork?
Richard T. Griscom: Yeah, so most of my fieldwork has been in northern Tanzania. Tanzania is located in East Africa. So, I’ve been primarily working with the Asimjeeg Datooga for the past four years, and that’s through projects funded by the Endangered Languages Documentation Programme, and then also the Firebird Foundation for Anthropological Research. And then, to a lesser extent, I’ve also worked with Hadza speakers, who also live in the same region of Tanzania. And then I’ve also worked briefly with the last speakers of an unclassified language known by some as Omaiyo, also in the same region of Tanzania. And I’m now starting a two-year documentation project with the Hadza funded by ELDP.
MTB: Briefly, can you go into what you mean by “unclassified”? Like unrecognized, or…?
Richard T. Griscom: Well, there’s so little information on the language and there are so few proficient speakers left that it’s not exactly clear which family the language would be classified as belonging to. There is evidence of contact with some other neighbouring groups, but the lexicon and a small bit of what appears to be morphology that we have documentation of doesn’t really correspond to any of the families in the region, so as of now, we’ve kind of left it as unclassified. It could be an isolate, perhaps, but we don’t really have enough information to say.
MTB: That’s really interesting. So, can you briefly describe the languages you research in more detail?
Richard T. Griscom: Yeah, so Asimjeeg, or it’s also known as Isimjeega, it’s a variety of Datooga, and Datooga is considered by some to be a dialect cluster, and it’s considered by some others to be a group of closely related languages, but either way you view it, they are language varieties that are members of the Southern Nilotic subfamily, which is one of three primary branches of the Nilotic family, which consists of a number of languages spoken throughout East Africa. And the number of Asimjeeg Datooga speakers is estimated to be no more than 3,000, and all of them reside in northern Tanzania.
And then Hadza is a language isolate. It is spoken by around 1,000 people around Lake Eyasi in northern Tanzania, and they are a traditionally foraging society. So, a few hundred members of the community continue to practice a nomadic hunting and gathering lifestyle in the bush, while others have started to adopt a more sedentary lifestyle on the periphery of the bush in villages that are typically occupied by members of other ethnic groups, such as the Datooga, but also the Iraqw, or the Ihanzu, or Sukuma.
MTB: Can you talk a little bit about your main research interests?
Richard T. Griscom: Yeah, so generally I’m interested in language documentation, and then also more specifically fieldwork methodology. Also, functional-typological approaches to linguistic description, especially with a focus on morphosyntax, and more recently language contact and variation.
MTB: And going back to Datooga, because that’s the language you’ve worked with most extensively at this point, right?
Richard T. Griscom: Yeah.
MTB: Can you tell us a bit more about the sociolinguistic context? Is there language shift happening? What is the — you said there’s only 3,000 speakers left, so are they all elderly? What’s the situation?
Richard T. Griscom: Right. Well, speaking generally about the Datooga, they’re relatively new to Tanzania. So, they arrived in Tanzania about two or three hundred years ago to the region just north of what is now the Serengeti National Park, and only within the past 100 years, they have now spread throughout different regions of the country. So, they’re relatively new to the regions in which they now live, and for that reason, they’re kind of set apart from many of the Bantu agriculturalist communities of Tanzania that have resided in those regions for the past 2,000 years or so. Now, due to a number of pressures, many of the Datooga have now adopted a mostly sedentary agropastoralist lifestyle, so they’re now practising some agriculture, but they’re typically seen as being somewhat distinct from the Bantu ethnic groups of Tanzania. And for that reason, they are typically a marginalized community within Tanzania, especially those that continue to practice traditional lifestyles, and then among the Datooga, the Asimjeeg are additionally marginalized, so they are described by some as a slave group of sorts, so they were subservient to some of the larger Datooga ethnic groups or Datooga subtribes. And for that reason, there are a number of Datooga who could not intermarry with the Asimjeeg. They would not reside together with the Asimjeeg, so they were largely a marginalized group even among the Datooga.
So their existence outside of areas where Datooga reside is essentially unknown, so even among a marginalized kind of macrogroup, they’re even more marginalized. And because of that, there’s a lot of language shift that’s occurring. It varies from community to community. So in the most remote community that I visited, Asimjeeg Datooga continued to be used by children, and in most of the more urban communities that I visited, it was used by elders. There were some children who were continuing to use the language, but only in certain neighbourhoods. It’s actually quite interesting that [in] the village that I lived in the most, those that were in the centre of town where there was contact with other ethnic groups, the children would be speaking Swahili, but those on the edge of town more near the bush where there’s less contact with these other ethnic groups, the children would be speaking Asimjeeg. So it really varies quite a bit, but in some areas, there is significant language shift, especially north of the Serengeti, and I would say quite confidently that the language is in danger.
MTB: Is there any relation to Gorwaa, the language that Andrew Harvey works with [who] we had on the show earlier?
Richard T. Griscom: So there are connections to Gorwaa and Iraqw through contact, but they’re not related genetically, so to speak. So Asimjeeg Datooga is a Nilotic language, whereas Gorwaa is a Cushitic language, but having said that, historically, these language groups have co-resided in the same region for many years. And there are what have been described as areal features that all of these languages share, even though they come from totally different families or different phyla, even. And there was a book chapter specifically on this topic in a book called The Linguistic Geography of Africa, and this chapter was written by Derek Nurse, and Maarten Mous, and Roland Kießling, and it’s about what they described as the Tanzania Rift Valley Area, and in this area, a number of genetically distinct languages all shared similar features. And so that’s something that Andrew and I have been looking at, and so we’ve been doing these kind of larger documentation projects with these language communities that speak languages that aren’t related but then happen to have these similar features.
MTB: Okay. Yeah. That’s interesting. Can we talk a bit about how you’ve utilized mobile technology to make recordings and conduct analysis over great distances with minimal electricity?
Richard T. Griscom: Yeah, so mobile technology, and also mobile money systems, they’ve created some new ways for fieldworkers to engage remote speech communities, so now you can do many aspects of fieldwork while you’re not even in the field, and there are a number of prerequisites for doing this kind of work. One key is that you really need to build strong relationships with the community members that you plan to work with remotely. You also need to maintain regular contact with those community members, and you need to provide some specialized training for this remote work. So, developing the strong relationships, those are important for maintaining good contact. It’s good to share information about your life when you’re not in the field to give people an idea of what kind of issues you might run into while you’re working remotely, so if you have a tight schedule when you’re at university which you might not have when you’re in the field, then you can tell people about that so they know in advance that when you say you need to meet at a specific time or you’d like to get something by a certain time that then they understand why. Also, any issues with budget concerns, for example. Then you can be kind of upfront about those so the expectations are clear while you’re in the field so there’s not a misunderstanding when you then return home and try to engage in this kind of work. It’s also important just to continue regular contact with community members that you’re working with, so this will keep you aware of any issues that they are facing, and it also demonstrates to them that you value your relationship with them and gives some sort of continuity to these kinds of longer-term projects that involve time both in the field and out of the field.
Also, in terms of specialized training, you’ll have to offer some training that is separate from any training or in addition to any training that you offer for doing work in the field, because some of the tasks involved in remote fieldwork are a little bit different from those that you do while you’re in the field. So you can try to, for example, recreate the context of being out of the field by working from a different town. So you could then, for example, do like a Skype or WhatsApp call from a different town and kind of pretend that you’re out of the country and try and do some of this remote work and then just see what worked and what didn’t work and then get back together and kind of review what happened. But there are a number of tasks that you can do remotely, so you can monitor the progress of community data collection, so if you train community members to do their own data collection, or if they’re doing their data collection on their own to begin with, you can check in with them about what data they’ve collected. You might even be able to get data directly from community members. You can conduct elicitation directly over voice, voice over IP, or indirectly by providing community members with data to be recorded, so of course that requires some specialized training. Voice over IP, it really depends on the connection that you have, if it’s a good connection. So that’s like a Skype call or WhatsApp call, so if you have a good connection and you can get a good recording out of it, then it might be worth your while. Generally, though, I found if you want a nice recording, it’s best to provide them with the data, so, again, a spreadsheet or a text file of some kind, and then give them specialized training.
So say, “If I give you this list of words,” like say if you just want a list of nouns, and say, “I want these nouns in singular and plural,” and you kind of train them on how you’re going to notate that, then they can read that list and then translate that to their own native tongue and then kind of do a self-elicitation of sorts. Or they can also elicit that from another speaker, so you can kind of coordinate these sorts of activities remotely, and then you can actually get elicited data while you’re not in the field, which can be very useful. You can also prepare natural speech data to be transcribed or translated by community members depending on your workflow. So previously, I’ve trained community members to use mobile phones to do transcriptions and translations. In order to effectively use that workflow, I had to prepare some files and then post them on Dropbox, and then community members would download them, and then they would do the transcriptions and translations and then re-upload them, and they would be communicating over WhatsApp to kind of coordinate all of this stuff. And then once you have those initial transcriptions and translations, you can also check in with them to revise the work that they’ve done.
MTB: So are they doing the translations and the transcriptions on the phones, or are they using… Are they using ELAN, or how is it working for that in that way?
Richard T. Griscom: Yeah, so you can do it either way. In my experience, doing transcriptions and translations on the phone is possible and in some contexts is maybe the only way it can be done, so if you’re working with a very remote community that has no electricity, then using a computer might not really be practical unless you set up like a solar system of some kind. Now, having said that, it’s not really ideal. It takes a lot of extra work, so with that sort of workflow, you have to kind of prepare files, so you have to, say, take a text that you’ve recorded, and you have to segment the audio. Then you have to export each segment of audio as a separate file and then probably convert those to MP3, and then you have to put those in a compressed folder like a ZIP file and then send that to the community members. Then they download that, and then they extract it. Then they create a spreadsheet. And then what is kind of cool about it is, they can then enter the transcriptions and translations directly into the spreadsheet, and each row of the spreadsheet corresponds to each segment of the audio, so then when they upload that spreadsheet, then you can take that data and, with some kind of data wrangling, put that into a file in ELAN so that the transcriptions and translations from the spreadsheet then go into different tiers in the ELAN file. So it’s kind of cool how you can do that, but it’s definitely, it’s a process that relies on the fieldworker doing a lot of manual processing of the data.
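The file preparation described above (segment the audio, export each segment as its own file, bundle everything into a ZIP) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the script used in the project: the function name `cut_segments`, the `segment_001.wav` naming pattern, and the millisecond boundaries are all assumptions made for the example. The MP3 conversion step is left out, since it needs an external encoder.

```python
import io
import wave
import zipfile


def cut_segments(wav_bytes: bytes, bounds_ms: list[tuple[int, int]]) -> bytes:
    """Cut one WAV recording into per-segment WAV files and return a ZIP archive.

    bounds_ms holds (begin, end) pairs in milliseconds, e.g. exported from
    the segmentation of a recorded text. File names are hypothetical.
    """
    archive = io.BytesIO()
    with wave.open(io.BytesIO(wav_bytes), "rb") as src, \
            zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        rate = src.getframerate()
        for number, (begin_ms, end_ms) in enumerate(bounds_ms, start=1):
            # Convert millisecond boundaries to frame positions.
            first = begin_ms * rate // 1000
            count = end_ms * rate // 1000 - first
            src.setpos(first)
            frames = src.readframes(count)

            # Write the slice as a standalone WAV with the same parameters.
            piece = io.BytesIO()
            with wave.open(piece, "wb") as dst:
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(rate)
                dst.writeframes(frames)
            zf.writestr(f"segment_{number:03d}.wav", piece.getvalue())
    return archive.getvalue()
```

Numbering the files in segment order is what lets row 1 of the spreadsheet line up with segment 1 of the audio later on.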
What is more ideal is to train community members to use laptops with ELAN, if you’re using ELAN for your workflow, and that way, they can just enter the transcriptions and translations directly, and you don’t have to do all this data wrangling. That saves you a lot of time. So that is one issue that I ran into in the past is, it became a… It took a lot of time for me to process this data to use that particular method so that community members in very remote communities could continue to work on transcriptions and translations. It definitely is possible, but it’s challenging, so you have to devote a lot of time to that sort of thing when you might be busy teaching or taking courses, doing other research.
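As a rough idea of the data wrangling involved in the phone-based workflow, the sketch below joins the returned spreadsheet (one row per audio segment) with the segment timings and produces one tab-delimited row per annotation, labelled by tier. The column names (`begin_ms`, `end_ms`, `transcription`, `translation`) and the function name are assumptions for illustration. A table of this shape should be suitable for ELAN's tab-delimited text import, where the columns are mapped to tiers and times in the import dialog, but the exact import settings depend on the project.

```python
import csv
import io


def rows_to_tier_table(segments_tsv: str, sheet_tsv: str) -> str:
    """Join segment timings with spreadsheet rows, one output row per annotation.

    segments_tsv: tab-delimited text with columns begin_ms, end_ms.
    sheet_tsv: tab-delimited text with columns transcription, translation,
    in the same order as the audio segments (assumed column names).
    """
    segments = list(csv.DictReader(io.StringIO(segments_tsv), delimiter="\t"))
    sheet = list(csv.DictReader(io.StringIO(sheet_tsv), delimiter="\t"))
    if len(segments) != len(sheet):
        # A mismatch means a row was skipped or added on the phone.
        raise ValueError("spreadsheet rows and audio segments do not line up")

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["tier", "begin_ms", "end_ms", "annotation"])
    for seg, row in zip(segments, sheet):
        for tier in ("transcription", "translation"):
            writer.writerow([tier, seg["begin_ms"], seg["end_ms"], row[tier]])
    return out.getvalue()
```

The row-count check is the part worth keeping in any version of this: because rows are matched to segments purely by position, a single missing row would silently shift every later annotation.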
MTB: So even in the most remote scenario with the most remote communities, there’s always the option to use the phones, though, to translate and transcribe.
Richard T. Griscom: Yeah, so as long as they have intermittent access to mobile data, so not necessarily even where they live, but in a town nearby, they can be working on a smartphone and doing transcriptions and translations, or doing other kinds of things, and then when they go to town, then they can upload that data. And especially if they’re only uploading text data, then it’s really quite easy. If they’re creating recordings, then you definitely need at least like a 3G connection. Nowadays in Tanzania, there’s 3G all over the place, so it’s actually quite easy, but video, I think, we’re not quite at that point yet where we can easily upload video in these sorts of remote communities, but there is a lot of work you can do if you plan ahead and you provide the proper training.
Another aspect of this, at least in East Africa, is that there are now mobile money systems. M-Pesa is how it’s generally referred to in East Africa. So that means that with services such as WorldRemit, which is a website online that I’ve been using, you can send funds directly from your account to members of a speech community. Even those who are living in areas with no mobile data, with no electricity, they can get funds directly from you. And usually, the services cost like a few dollars each time that you use them, so if I was sending perhaps like $100, it would cost between two and four dollars to send that money. Then usually you need to keep a work log so you’re tracking how much each person is working, and then once they’ve worked a certain amount, then you can send the funds, and then it will go to them directly, and then they can take cash out, even in the village. So that enables this kind of remote work as well. And then, of course, if you are able to set up a bank account in the country where the community is, then you can also do that remotely so you can do bank transfers and things like that.
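The work log mentioned above can be as simple as a list of (person, hours) entries that is tallied before each transfer. The sketch below is only an illustration: the function names, the entry format, and the idea of a fixed hour threshold are assumptions, and no real rates or names are implied.

```python
from collections import defaultdict


def hours_by_person(log):
    """Sum logged hours per person from (person, hours) entries."""
    totals = defaultdict(float)
    for person, hours in log:
        totals[person] += hours
    return dict(totals)


def ready_for_payment(log, threshold_hours):
    """Return the people whose logged hours have reached the agreed threshold."""
    totals = hours_by_person(log)
    return sorted(p for p, h in totals.items() if h >= threshold_hours)
```

For example, with the entries `[("Worker A", 3.0), ("Worker B", 1.5), ("Worker A", 2.5)]` and a threshold of 5 hours, only "Worker A" would be listed.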
MTB: Yeah. That’s cool.
Richard T. Griscom: It’s something that’s totally new to us. This was not possible at all even 10 years ago, so we’re kind of just discovering it, and we’re just trying to figure out what the possibilities are, what works, what doesn’t work, what’s easy, what’s hard. I kind of find that fascinating, and I feel that we have a lot to gain from trying out these new technologies, because we can continue to do a lot of this work and maintain these connections that we have with communities when we’re not in the field, and I think that we can improve our connections with speech communities a lot through these new technologies.
MTB: Yeah. Yeah, definitely. Can we discuss how we can assess what technological tools are most useful for communities and appropriate for different fieldwork contexts?
Richard T. Griscom: Yeah, so there’s a lot to consider when you’re assessing the technologies that you’re using. I’d say that generally it’s a good idea to try to assess the technology that you’re using every time that you plan to use it when you go to the field. Well, every time you’re going to the field, you should be assessing: Which technologies are you taking with you? What are you planning to use? And there are a lot of factors to consider when you’re making those assessments. So you want to be thinking about whether or not the technology will still be available for the entire duration of the project, so if you start with a certain kind of technology and then a year later or two years later it’s not around anymore, then that could be problematic. You also want to use reliable technology, so you want something that’s robust, something that will last even in remote field conditions. It’s nice to have redundant technologies, so if you have a technology that has some advantages, that could be good, but if it fails, you need to have a backup. So like just thinking of a laptop computer, if your laptop fails, well, you should have a notebook somewhere so you can still write things down if you need to.
Some other criteria that you might consider include things like scalability and interoperability. Scalability has to do with the technology being equally useful for small teams and large teams or even individuals in these larger teams, then also small and large data sets. So you want something that is just as useful for one person working on their own as well as these large teams who are kind of working together. So if you have something that enables you to easily share your data with other people and work on data together, then that’s a plus.
Then for interoperability, it’s also good to choose technologies that can be easily utilized with other technologies, so if the output of your camera or your audio recorder or whatever software you’re using can’t be used by anything else, then obviously that’s kind of a bad idea, so it’s best to try to produce outputs that can be used by as many technologies as possible. So that will make it easier for you, but it would also make it easier for anyone else who wants to use the outputs that you’re creating, so community members who want access to recordings, or other researchers who might want access to your text data.
Also, just thinking about the outputs in terms of quality. You, of course, want to choose technologies that produce as high quality as possible, so in language documentation we have certain standards for audio and video quality, for example. And you also want to be choosing technologies that produce an appropriate format. And then, of course, finally, you also want to think about the cost of the technology, so looking at your budget. How much can you afford? What’s the best technology that you can afford, given the money that you have?
And then thinking not only about the technology itself, but about who’s going to be using the technology, there’s some other criteria you can consider when you’re making these sorts of assessments, so is the technology easy to use not only for you, but for anyone else who might be using it? So if you have a really nice video camera, and you go into the field, and you’re shooting video, and then you get sick and you have to kind of lay low for a week or two, but there are other community members who could be using that camera to shoot video, but then the camera’s too complex and they can’t figure it out, well, then that doesn’t work out very well for you. It would be nice to have a more simple camera that some other people could use, but also when it comes to audio recorders and software and things like that, you want to be considering ease of use.
And then also, especially when you’re working with teams of speakers, you want to consider the value that the technology holds for the speakers and for the community. So if… Going back to the topic of smartphones. In East Africa, smartphones are very popular, so even if they aren’t always the most effective tool, they are highly valued. It’s sort of a status symbol, so if you give a smartphone to someone, then they will greatly appreciate that, and that will kind of increase the incentives that they have to contribute to the project and to participate. So that’s something to consider, this kind of the social value of the technology.
But then also when you think about kind of the longer-term trajectory of the community when we provide technologies that enable community members to get access to more job opportunities, for example, then that’s always a good thing, so computers are really great for that. So if you can provide training on just how to use a computer, how do you create text documents, how do you surf the web — things like that — then it can be a very powerful experience for community members and can give them access to opportunities that they otherwise would not have access to. And that is generally a good thing for the community and good for you, too. So those are things you want to consider when you’re also choosing the technologies that you employ in the field.
And just a couple of things I would add to that is, when you’re doing assessments, you should always reassess whenever you go back to the field, so if you’ve used the technology before and it went well, well, that’s good, but you want to think about it again before you use that same technology when you go into the field another time. So if there’s a newer technology that has some advantages over the technology you’ve been using, then you might want to consider that, or if there are changes in terms of the use of that technology in the field, then you want to consider that. So when we were talking about mobile phones, again, the availability of mobile data in the field is changing rapidly around the world, so that’s something that you have to kind of update in your mind as you go. “Okay, where is there mobile data available now in my field site?” So what does that mean in terms of which community members can utilize mobile phones or can utilize a computer together with a mobile phone to upload data? So you have to kind of continually be reassessing the conditions.
And generally, I think it’s a good idea just to experiment with different technologies. There are a few reasons for that. One reason is that there always will be something out there that you didn’t know about that might be really useful, but also, technologies are developing rapidly, so there are always going to be new things. Every year, there are new technologies, and we should be trying those out, because there might be new possibilities that we haven’t explored yet, but also, when we utilize new technologies, we then kind of support the development of technologies that are kind of catered towards our work, so when you utilize software that’s designed for linguistics, like ELAN, and you support that software, then that supports its development, and then that results in the improvement of that software. So we make it easier for our discipline when we explore these new tools and their applicability to the kind of work that we’re doing.
Also, these days, fieldworkers have to think a lot about archiving, so most fieldworkers are expected to archive. They’re also expected to make their research reproducible, and that means making it accessible through an archive or some other repository and using things like persistent identifiers and other identifiers so that audio segments can be identified within kind of a longer text. And these things require different types of data processing, this kind of data processing that is not traditionally taught in field methods courses. So now, young and upcoming fieldworkers, they have a lot of work to do, actually, to figure these things out on their own, and that’s something I’ve been kind of trying to share more widely is how we can start to tackle these issues of archiving and reproducibility and incorporate them into our fieldwork methodologies. So you need to think about the end result. What does it actually mean to archive? What metadata is required to archive? What are the formats that you need to archive? And then work backwards. And this kind of backward design can strongly inform your workflow so that you know what to start with. If you start thinking, “Well, I’m just going to go record whatever, and I’m going to record certain kinds of metadata that I’m familiar with, and I’ll figure out that archiving stuff later,” well, you could easily run into some big problems. If you know exactly which metadata categories you need to enter into the archive, then you can collect that metadata when you’re creating the data in the first place, and that will make your life much easier. So that’s also very important when you are assessing the kind of technologies that you’re using and especially the software that you’re using, the data formats.
MTB: Yeah. Yeah. That is so true. When I was working at the Endangered Languages Archive, we had some older collections come to us, and the metadata was in the CMDI format, which at the time was not a format that we could accept, and there’s no kind of clean way to convert CMDI to IMDI, which is what we use at the ELAR, and it was just a nightmare, and it took ages, and it was a lot of work for the depositors and for us to try to get the deposit into good shape to be archived, even though maybe the deposit was perfectly fine, but the metadata was not in the correct, or not correct, but in a format that we could take.
Richard T. Griscom: Right. No, yeah. That’s definitely a huge issue, and I would definitely suggest for anyone who’s considering starting a career in linguistics fieldwork that it definitely helps to do a little bit of coding. And this is something that I picked up just quite recently when I was depositing at ELAR as well, and I found that it’s very helpful if you’re able to convert data from one format to another. And oftentimes, the software that we use enables us to do that, so if you use a certain text editor, you can usually export or save as different formats, but these kind of lesser-used formats like IMDI and CMDI, we don’t really have the tools for converting from those formats, from one format to the other, so sometimes it’s necessary to do that manually. And that’s one reason why I strongly suggest that people focus on outputs that are interoperable, so if you create an output like just a spreadsheet, even, or a tab-delimited text file, then it’s actually very easy to convert that format to any other format, so it’s something that’s kind of not intuitive to most people that actually plain text is oftentimes the best output, because that’s the easiest to convert to a different format, whereas something like a Word document, well, that could be really hard to convert. You know? So that’s something that I think people are now starting to think more about and I think will result in increased efficiency in our workflows in the future.
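To illustrate why plain tab-delimited text is so easy to convert, here is a small sketch that reads a tab-delimited table and re-emits it as JSON or as comma-separated values using only the Python standard library. The function names and the example column names are assumptions for illustration.

```python
import csv
import io
import json


def tsv_to_records(tsv_text: str) -> list[dict]:
    """Read tab-delimited text (first line = column names) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def records_to_json(records: list[dict]) -> str:
    """Serialize records as JSON, keeping non-ASCII characters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def records_to_csv(records: list[dict]) -> str:
    """Serialize records as comma-separated values with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Once the data is in a list of records, each further output format is a few lines of code, which is much harder to say of a word-processor document.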
MTB: Yeah. What do you use to make your metadata? Do you use Arbil, or…
Richard T. Griscom: Well, that’s a good question. So because I’m primarily depositing at ELAR and the ELAR depositing workflow is changing, my colleague, Andrew Harvey, and I are working on sort of a project-specific workflow that involves creating a Python script, which is really quite basic, and just takes speaker metadata from a tab-delimited text file or a spreadsheet and session metadata from another tab-delimited file or spreadsheet and puts it into the IMDI format. So we use that based on a template that ELAR provided us with so we know, say, if you have a speaker filling a certain participant role in a session, that we know how to code that in the text file, and we just create a script to kind of automatically create that based on the information that we have in the spreadsheets. We’re hoping to create that, actually, in the next two weeks so that at the time that we’re ready to deposit our data, we can essentially take our metadata, run the script, and it will automatically create all of the IMDI files, and then we can deposit, which should make it much more efficient. And then of course, it also helps to have the metadata in a digital format to begin with, and then everything just goes a lot faster.
So that’s what we’re working on right now. And I do hope that at some point in the future, we have some sort of tool that’s specifically designed for this kind of depositing workflow with archives such as ELAR so that it’s a lot easier and it’s much more efficient than the workflows that we’ve had, because it can be very time-consuming to convert metadata and to also convert the primary data. So it would be good to find a workflow that enables us to easily produce those outputs based on some sort of standardized format that’s easy to access, like a plain text file.
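The general shape of a script like the one described above, reading speaker metadata and session metadata from two tab-delimited tables and writing one XML document per session, might look like the sketch below. It is not the project's script and it does not produce valid IMDI: the column names (`speaker_id`, `name`, `session_id`, `date`, `participants`), the `ID:Role` notation for participants, and the XML element names are all placeholders. A real version would follow the element structure of the template supplied by the archive.

```python
import csv
import io
import xml.etree.ElementTree as ET


def build_session_xml(speakers_tsv: str, sessions_tsv: str) -> dict[str, str]:
    """Return {session_id: xml_string} built from two tab-delimited tables.

    Element names are illustrative placeholders, not the IMDI schema.
    The participants column is assumed to look like "S1:Speaker;S2:Interviewer".
    """
    speaker_rows = csv.DictReader(io.StringIO(speakers_tsv), delimiter="\t")
    speakers = {row["speaker_id"]: row for row in speaker_rows}

    documents = {}
    for row in csv.DictReader(io.StringIO(sessions_tsv), delimiter="\t"):
        session = ET.Element("Session")
        ET.SubElement(session, "Name").text = row["session_id"]
        ET.SubElement(session, "Date").text = row["date"]
        actors = ET.SubElement(session, "Actors")
        for entry in row["participants"].split(";"):
            speaker_id, role = entry.split(":")
            actor = ET.SubElement(actors, "Actor")
            ET.SubElement(actor, "Role").text = role
            # Look up the speaker table so names are entered only once.
            ET.SubElement(actor, "Name").text = speakers[speaker_id]["name"]
        documents[row["session_id"]] = ET.tostring(session, encoding="unicode")
    return documents
```

Keeping speakers and sessions in separate tables means each speaker's details are typed once and reused in every session they take part in, and a misspelled speaker ID fails loudly with a `KeyError` instead of producing an incomplete record.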
MTB: Yeah, for sure. Can we talk more about digitizing fieldwork workflows for increased efficiency and accuracy? So digital data entry instead of handwritten notes?
Richard T. Griscom: Right. Yeah, so that’s one of the big points. If you think back 50 years ago, what was linguistic fieldwork like? Well, it involved a lot of non-digital outputs, of course, so things like handwritten fieldwork notebooks, analog recordings on like reel-to-reel tape or cassette tapes, printed photographs, and other things of that nature, but also secondary outputs like publications, they weren’t digital, so the means of sharing your fieldwork data was primarily through publications, so we weren’t actually sharing our primary data directly.
Now, today, of course, the context is very different. So we have the ability to create more or less infinite data sets, and we can share them with anyone around the world, and the communities that we work with, they also often have access to many of the same technologies that we do. However, our fieldwork methods, they have not changed that much yet. They’re starting to change, I think, but they’re still mostly rooted in the context of the past. So when we talk about digitizing fieldwork, we’re talking about kind of a reassessment of the outcomes of fieldwork. So what are we actually trying to produce? Because outputs are changing. So we’re creating larger databases. We’re creating data that is accessible and citable, but we’re also using different technologies in the field, and our methods are starting to change as a result. So this issue of finding the balance between producing larger data sets and then, at the same time, conducting what we call reproducible research — that is, research that can be recreated, reproduced by another researcher — we actually are lucky within the discipline of linguistic fieldwork because we have two solutions available to us to solve this kind of imbalance between the two, or the kind of competing interests of those two.
So we can increase the efficiency or scalability of our methods by using different technologies or different methods. We can also involve community members in as many aspects of fieldwork as possible, kind of the crowdsourcing method. In terms of increased efficiency, as you said, one of the biggest things is just, minimize digitization tasks. So even when I was studying as a PhD student in my field methods course, we were writing notes in a notebook. And there are many benefits to writing by hand in a notebook, because you can draw pictures. You can colour things differently. You can do essentially whatever you want. However, when you create that non-digital record, if you want to produce a digital output, you have to digitize it at some point, and that is going to be done manually, especially if it’s handwritten, and that can take a lot of time, so if you have books and books and books that you’ve written in the field, it could take you weeks to digitize those transcriptions and translations and other notes.
So one easy solution to that is to create a method for entering your data directly into a computer or a mobile phone or a tablet so that the data is digital to begin with. So this means you don’t have to digitize anything, but also, you can create data that is specially designed so that you can then process it easily later. So for example, when you’re creating a recording, oftentimes when I was first starting as a student, I would conduct elicitation sessions with the audio recorder on, and I would be asking follow-up questions, and I would be taking notes and maybe just kind of conversing with the speaker I was working with, and all of that was being recorded. And then after the session, I would look at my notes. I would go, “Okay, I took good notes,” and then I would look at my audio recorder and be like, “Well, huh. What am I going to do with that recording? I just recorded an-hour-and-a-half-long conversation. What am I going to do with that?” And it would just sit there, and I would never do anything with it, because it takes so much time to find each individual production of each individual word if you want to get access to those recordings. And that largely stems out of, again, this historical context that in the past, there was a primary focus on the notebook. So the notebook was kind of the ultimate record, and if you wanted to check the notebook, okay, then you could refer to the recording, but nobody would regularly listen through all the recordings and then try and pull out each of the productions and make those accessible in a database or anything like that.
So now that we have all these technologies available to us, we can change our methods to take advantage of them as best as possible. So we have some software, including software such as Praat, which can automatically segment audio based on certain variables. So the automatic segmentation of Praat is fairly simple, but it’s easy for people to use, and a lot of people are familiar with Praat. So what you can do is, you can create an elicitation session that can then be easily automatically processed by Praat.
The way that you could do that is, if you’re eliciting words that you’ve never elicited before, then you can first do the sort of traditional elicitation session (with the audio recorder running, if you want), and you’re taking notes, and you’re kind of checking your transcriptions and talking, you’re communicating with the speaker. Then after that, you create a recording that is specifically designed to be automatically segmented by Praat. So then you create a list of elicited items that you plan to record, and you have the speaker produce those items in that order, and you could have them do three repetitions, for example, and you do them kind of at a specific pace. And then Praat can very easily automatically segment all of the productions, and then you can easily align that with the text data that you’ve entered into the spreadsheet while you were conducting the first part of the elicitation session.
And this is what my colleague, Manuel Otero, and I call the Digital Notebook Method. So the Digital Notebook Method essentially enables you to conduct an elicitation session and then within half an hour or so, you have the recording together with time-aligned transcriptions and translations. So no longer do you have to wait a week or so until you have time in your schedule to go back through your recording and segment it manually and digitize your notes and then put all of that together, you can do all this automatically, but only if you set it up and plan ahead, so you have to be thinking about these things before you even press the record button. Right? So it kind of takes some forethought. So that’s one of the things I’ve been focusing on right now. My colleague, Manuel, and I, we’re working on a paper for the Journal of Language Documentation and Conservation specifically about this Digital Notebook Method.
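The alignment step described above, where automatically segmented productions are matched to the rows already typed into a spreadsheet, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Digital Notebook Method itself: it assumes the segmentation has already been exported as a list of start and end times, that the speaker produced every item in list order with a fixed number of repetitions, and that the function name, field names, and placeholder items are hypothetical.

```python
def align_items(intervals, items, repetitions=3):
    """Pair detected speech intervals with elicited items in recording order.

    intervals: list of (start_sec, end_sec) tuples, one per detected production.
    items: list of dicts with 'transcription' and 'translation' keys.
    """
    expected = len(items) * repetitions
    if len(intervals) != expected:
        # A mismatch usually means a missed or extra segment, so stop early.
        raise ValueError(
            f"expected {expected} intervals, found {len(intervals)}; "
            "check the recording or the segmentation settings"
        )
    aligned = []
    for index, (start, end) in enumerate(sorted(intervals)):
        item = items[index // repetitions]
        aligned.append({
            "start": start,
            "end": end,
            "repetition": index % repetitions + 1,
            "transcription": item["transcription"],
            "translation": item["translation"],
        })
    return aligned


# Placeholder items and times, invented purely for illustration.
items = [
    {"transcription": "word_a", "translation": "gloss_a"},
    {"transcription": "word_b", "translation": "gloss_b"},
]
intervals = [
    (0.5, 1.1), (2.0, 2.6), (3.5, 4.1),   # three repetitions of word_a
    (5.0, 5.7), (6.5, 7.2), (8.0, 8.7),   # three repetitions of word_b
]
aligned = align_items(intervals, items)
```

The count check is the part that depends on planning the recording ahead of time: if the number of detected segments does not equal items times repetitions, the alignment cannot be trusted and it is safer to inspect the recording than to guess.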
MTB: I want to learn more about that.
Richard T. Griscom: I do have, I think, one or two video recordings that you can watch as well where I go through some of the steps of the method so that anyone who’s interested could go online and try it out. And there’s links to that on my website.
MTB: Okay, and I’ll put them in the show notes, too, and on the Field Notes website, so they can find it.
Richard T. Griscom: Okay, great.
MTB: Can you talk a bit more about how community collaboration has impacted your project or improved your project?
Richard T. Griscom: Yeah, so to give a bit of historical context, my colleague, Andrew Harvey, and I, we both received fellowships from the Firebird Foundation last summer, actually, to train members of the Gorwaa and Asimjeeg Datooga communities to collect recordings, and these fellowships, they strongly influenced the development of our approach to community collaboration. So the Firebird Foundation has a strong emphasis on the training of indigenous communities to collect their oral traditions, collect recordings of their oral traditions, and so that emphasis has kind of influenced our entire approach to fieldwork. So now, rather than going in as sort of the lone wolf linguists, we’re going in as sort of project managers or facilitators, so we go in and we start with a training session. We invite members of the community who are interested and have the kind of appropriate background or experience to be trained in language documentation, and this sort of… The training, it builds capacity within the community, so as we were saying earlier, community members who have training in use of audiovisual equipment and then also computers, they have increased prospects for job opportunities, so whenever possible when we do these trainings, we try to think not only about our project goals, but also community goals, so how can this training help the community in other ways?
Community training also changes the relationship between the fieldworker and the community, so when the community collects their own data, then they’re better able to achieve their own goals for language maintenance, so that will strengthen the value of the project to the community. And this is really important, because it creates a sort of feedback loop where community members have an incentive to contribute to the project because they get some sort of value out of it, and then that brings more and more people in, and then you get a lot of community buy-in.
But also, again, getting back to the crowdsourcing method, so you want to involve as many community members as possible. So if you provide the proper training, a community member can do essentially anything that a linguist can do, so a community member can collect data; a community member can collect metadata; they can transcribe; they can translate; they can prepare data for archiving. They can do pretty much anything. And also I would say that this not only contributes to kind of the efficiency of your project, but it works towards the decolonization of linguistics, because traditionally linguistics has prioritized the needs of the academic community over those of the speech community, and in this way, you empower members of the community to kind of make their own decisions and to kind of guide the documentation project as it goes on.
Andrew, he’s gone another step, and he has worked with the Gorwaa to establish an advisory committee that meets every month, and that advisory committee monitors the progress of the data collection that is being conducted by Gorwaa community members. And this is a really effective way of increasing community engagement, and it also involves members of the community who might want to be involved in the project in some way, but they might not have the background that enables them to contribute through data collection, so especially like elders in the community who are not very tech-savvy, they can contribute to a language documentation or maintenance project by joining a committee of some kind and then helping to kind of offer guidance on data collection.
It’s also really important for the community to establish some long-term goals for language maintenance and language development. So if you’re working with a community that has no orthography, for example, what goals do they have for orthographic development? Do they have any goals for creating printed materials? Do they have other goals, like some sort of online presence? Do they want to create a cultural centre of some kind? But also, speaking kind of more abstractly, do they have goals in terms of language vitality? So if children are not speaking the language, do they have a goal, at some point in the future, of all the children in their community starting to speak that language? And I think things of that sort of nature involve community leadership and kind of setting objectives for the community to kind of organize together specifically around this issue of language endangerment.
MTB: Yeah. Yeah, for sure. Thank you so much, Richard. This was really great, really interesting. We didn’t have a chance to go over your equipment, but I’ll link all the models you sent me in the show notes so if people are interested in what kind of equipment you use, they can find all that. Where can people find you online and read more about your work, if they’re interested?
Richard T. Griscom: Yeah. Well, I have a website. It’s richardtgriscom.wordpress.com, and you can find some of my publications as well as presentations there, and then also on the Rift Valley Network website, which is riftvalleynetwork.weebly.com.
MTB: Okay. Cool. Thank you.
Richard T. Griscom: Yeah, thank you so much.
You’ve been listening to Field Notes, a podcast about linguistic fieldwork. This podcast is hosted and produced by Martha Tsutsui Billins with production help from Laura Tsutsui. Our music is by Lobo Loco, and our logo is by E.Vill Designs. If you have a question or a fieldwork experience to share, you can email us at firstname.lastname@example.org. You can also follow us on Twitter and Instagram @lingfieldnotes. If you’ve enjoyed this episode, please leave us an Apple Podcast review. Thanks for listening!