Wednesday, February 6, 2013

Rapes and the Indian Justice System: An experimental data visualisation

This weekend, I attended an excellent big data visualisation workshop organised by Hacks/Hackers Delhi. The idea was to get journalists and techies to collaborate on investigative, data-driven stories and tell them in intuitive ways. Numbers dull the reader's mind. A well-designed infographic can convey complex ideas in a single frame.

The group I was a part of included journalists Nasr ul Hadi, Rajan Zaveri, Aayush Soni and my colleague at ITG, Rohan Venkat. Our 'techies', who did much of the heavy lifting, were Piyush Kumar and Konark Modi. We were also joined by Yuan Lei, a journalism student from Shantou University in Guangdong. Since it's been such an important story in recent weeks, we looked at how rape cases in India are treated by 'the system'.

Before I get to our findings, I want to add a short note. We had just three hours to find data, organise it, clean it, 'query' it, and generate the visualisations. Not a lot of time. I'm sure a team working with more resources (particularly time) will draw more impactful conclusions. Our intention - since this was not a formal editorial process - was to start the conversation. We focused on three questions based on the data we had immediately available.

1. Adjusted for population, which states have the highest incidence of rape? 
For brevity's sake, we called this 'rape probability'. In other words, the number of reported rapes per thousand people. (Total reported rape cases / State's population x 1000).

Some states such as Mizoram appear to have an unusually high 'rape probability'. This may simply be because more rapes are reported, and not necessarily because women are more at risk. The national average was about 0.03 rapes per 1000 people.
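For anyone who wants to reproduce the calculation, here is a minimal Python sketch of the formula above. The function name and the figures are made up for illustration; they are not taken from our dataset.

```python
def rate_per_thousand(reported_cases, population):
    """Reported cases per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * reported_cases / population


# Hypothetical figures, not NCRB or Census data.
example_rate = rate_per_thousand(reported_cases=300, population=10_000_000)
print(round(example_rate, 3))  # 0.03
```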

2. If a rape case is reported in a state, how often does it result in a formal chargesheet? 
We called this 'chargesheet probability'. (Number of cases where charges are framed / Total reported cases x 100).

The clear outlier here is Manipur, with just 9% of reported rapes ending in chargesheets. Is that only because of AFSPA? We cannot draw that conclusion until we know who the suspects are in each reported case. I also noticed an oddity. Three states - Andaman & Nicobar Islands, Tripura and Goa - have 'chargesheet probabilities' higher than 100%. We didn't have time to find out why, so if someone out there could help in explaining that, I'd be grateful. The national average here was about 80%.

Update: As Twitter user Pramurto Mukhopadhyay explains here, one likely reason the 'chargesheet probability' for Andaman & Nicobar Islands, Tripura and Goa crosses 100% is that there may be more than one accused per case. This is possible in the case of gang rape, or if the main accused had accomplices.
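To see how more than one accused per case could push the figure past 100%, here is a small Python sketch. It assumes, purely for illustration, that the numerator ends up counting chargesheeted persons while the denominator counts reported cases. The numbers and names are hypothetical, and I am not claiming this is how the NCRB tables are actually compiled.

```python
def percent_of_reported(count, reported_cases):
    """Express `count` as a percentage of reported cases."""
    if reported_cases <= 0:
        raise ValueError("reported_cases must be positive")
    return 100 * count / reported_cases


# Hypothetical figures for one small state.
reported_cases = 10
chargesheeted_cases = 9      # counting each case once
chargesheeted_accused = 12   # counting each accused person, e.g. a gang rape

by_case = percent_of_reported(chargesheeted_cases, reported_cases)
by_accused = percent_of_reported(chargesheeted_accused, reported_cases)
print(by_case, by_accused)  # 90.0 120.0
```

Counting by case can never exceed 100% in this toy example; counting by accused can.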

3. Finally, of the total reported cases, how many result in convictions? 
We called this 'conviction probability'. (No. of cases ending in convictions / No. of reported rapes x 100).

The outliers here are Nagaland and Sikkim, with convictions secured in nearly 70% of cases that went to trial. Kerala surprised me with just a 2.7% conviction rate. The national average was about 18%.
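A conviction percentage depends heavily on the denominator: dividing by reported cases and dividing by cases that went to trial can give very different answers from the same conviction count. The Python sketch below shows the gap with made-up numbers; the function and variable names are illustrative only, and none of the figures come from our data.

```python
def percent(part, whole):
    """Express `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


# Hypothetical figures for a single state.
reported = 200          # cases reported to the police
trials_completed = 50   # cases in which a trial concluded
convictions = 35

of_reported = percent(convictions, reported)
of_trials = percent(convictions, trials_completed)
print(of_reported, of_trials)  # 17.5 70.0
```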

Data sources: 2011 National Crime Records Bureau, National Census data

P.S. Even though we had the data, Google Fusion Tables would not generate visualisations for Jammu and Kashmir. It automatically marked the territory as 'disputed'. Oddly, while it marks Arunachal Pradesh with similar diagonal lines, we still get the data represented on a map. I have contacted the Help Team about this and will post an update if I get a reply.

P.P.S. Several states and UTs show zero in their data fields. That's mostly because data was either unavailable or could not be reliably 'cleaned'.
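One way to keep missing data from looking like a genuine zero is to carry it through as an explicit "no value". This is a minimal Python sketch of that idea, not what we did at the workshop; the state names, figures and function name are all hypothetical.

```python
def rate_per_thousand_or_none(reported_cases, population):
    """Cases per 1,000 people, or None when either input is missing."""
    if reported_cases is None or population is None or population <= 0:
        return None
    return 1000 * reported_cases / population


# Hypothetical rows: (name, reported cases, population).
rows = [
    ("State A", 300, 10_000_000),
    ("State B", None, 2_000_000),  # crime data unavailable
]

rates = {name: rate_per_thousand_or_none(cases, pop) for name, cases, pop in rows}
for name, rate in rates.items():
    print(name, "no data" if rate is None else round(rate, 3))
```

A map or table can then show "no data" for State B instead of a misleading 0.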


  1. Interesting stuff. What was the source of the data?

  2. Hi Sid,

    Sorry for the late reply. (I really need to turn on alerts!)

    We took the population data to normalise our crime stats from the National Census website ( They've got more detailed data sets but you'd need to register for that. It's quite painless so I'd recommend it.

    The crime stats were from the NCRB website (

  3. The word 'probability' is really misleading and open to misinterpretation because there is already a concept of statistical probability. By using the word you're evoking that; however, your data has nothing to do with probabilities!

