Collecting qualitative data from online communities
There seems to be a tendency among new online researchers to assume that you need a way of capturing all the things in order to get an understanding of what’s happening in an online community. This is a post looking at some reasons why that might not be the case.
Imagine that you’re setting out to produce an ethnography of the head office of a multinational firm. You select your target company, negotiate access, and go in for your first day on site. When you arrive, you’re shown into a room that is utterly overflowing with paperwork: huge piles of interview transcripts, reports and meeting minutes tower over the small desk. As you look through them, realisation (and a growing horror) dawns on you: you now have a ‘perfect’ image of everything you had intended to study. Every word of every conversation has been recorded, every document and email ever produced has been dragged out of archives, and all you need to do is make sense of it all. Easy, right? Well, no. Of course not.
A more typical approach would be to find ways of immersing yourself in the day-to-day operations of the company, spending some long stretches embedded in the organisation, finding and reading through relevant documents, and interviewing people of interest. You’d try to make sure that you saw a variety of activity: don’t only visit on casual-clothes Fridays, or you’ll draw some odd conclusions about the place. Visit in the quiet periods, but also spend time there during deadline season: the end of the financial year, that sort of thing. Your job is to see enough to understand how and why things work the way they do, filter that through the set of experiences and biases that make you into the researcher you are, and communicate what you saw to those who read your work.
Some of those traditions are born out of convenience (or physical limitations). You have a finite amount of time that you can spend in the company. You can’t be in three different meetings at once. You won’t hear every conversation at the water coolers. So, you pick your opportunities as they arise, based on your (emerging) understanding of what’s important to the story that you are going to tell. This meeting is important, as it will help you see who opposes X. That conversation is not important, because you’ve already heard enough about Y.
And while you do it, you write notes. Copious amounts of lovely, lovely field notes. Writing them helps shape your understanding of what you’re observing, and aids recall later on.
So, should online communities be handled differently? You often won’t be bound by the same limitations: you can read conversations even if you weren’t online to witness them at the time they were posted. You can pull an archive of hundreds of simultaneous conversations. In our hypothetical scenario, you really can walk into that room and get buried alive under a pile of archive documents.
Last year, I spoke with another grad student who was looking at a group of online forum communities. She had written some software that automated the search-and-archive part of her work, and was using it to gather vast amounts of qualitative data: in the order of several hundred thousand conversations, and millions of words. The software was working fine, and the disk space was filling up. Her biggest question was “what happens now?” Ultimately, you need to analyse whatever you collect, and finish your dissertation before you die of old age…
I don’t think that “I can do X” should mean “I must do X.” Many of the same research decisions that are made regarding face-to-face ethnography are still just as valid when the interactions you’re observing take place online. Qualitative data still needs to get filtered through one or more humans, with their skills, training, biases and limitations. Magnifying the volume of data going in won’t solve your problem, unless your problem requires that volume – and if it does, there is a good chance that you should re-think your choice of methodology.
More data does not always mean that you’ll produce better answers to your research questions. Going overboard on data collection is easy: once started, it’s fairly simple to replicate, and it feels very productive! You’re not a data-collecting machine, though: you’re here to analyse and interpret that data, and then do something with the results. Making decisions about what to cut (and why) can be hard, but it gives you a project that can actually be finished.