On social media, people are eager to share messages like the one shown below, which purport to show that blacks target whites with their crimes, whereas whites are relatively nice to blacks. In this post, I want to look at where the data comes from and how to interpret and visualize the numbers correctly.

Even if we assume that the data reported in the plot is correct, its conclusion is misleading and not supported by the data. In short, the reasoning is as follows: the US population has more white people than black people, and therefore whites will more often be the victim of a crime than black people. One therefore cannot directly compare the ‘black on white’ and the ‘white on black’ bars and conclude that blacks disproportionately target whites when they commit a crime.

A first simple example

Assume I am walking on Michigan Avenue in Chicago with many shoppers around me. For simplicity, we assume there are \(100\) shoppers in my direct vicinity and they can be divided into two groups. There are 60 people belonging to group \(A\) and the remaining 40 belong to group \(B\).

I am in a bad mood and plan to pick someone from the street to give him or her a knuckle sandwich. If I pick my victim at random from these 100 people around me, there is a \(60\%\) probability that my victim belongs to group \(A\) and a \(40\%\) probability that the victim belongs to group \(B\). If I were to repeat this bad behavior 100 times, I would have, on average, 60 victims from group \(A\) and 40 from group \(B\). However, this would not mean that I am targeting group \(A\), or that I am more violent towards group \(A\).
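A quick simulation confirms that random picking alone produces the 60/40 split. The following is a minimal Python sketch. The helper name `simulate_random_victims` and the fixed seed are illustrative choices and do not come from the original post. Each pick is drawn uniformly, with replacement, from the 100 shoppers.

```python
import random
from collections import Counter


def simulate_random_victims(n_picks, group_sizes, seed=0):
    """Draw n_picks victims uniformly at random (with replacement) from a crowd.

    group_sizes maps a group label to the number of people of that group in
    the crowd. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    # One entry per person, labelled with that person's group.
    crowd = [label for label, size in group_sizes.items() for _ in range(size)]
    return Counter(rng.choice(crowd) for _ in range(n_picks))


n_picks = 100_000
counts = simulate_random_victims(n_picks, {"A": 60, "B": 40})
for label in sorted(counts):
    print(label, counts[label] / n_picks)  # close to 0.6 for A and 0.4 for B
```

The picker has no preference for either group, yet group \(A\) ends up with more victims simply because it is larger.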

Another example

Assume there is a group of 100 offenders. Each of them picks a victim from the group of 100 shoppers on Michigan Avenue. The group of offenders has 50 people from group \(A\) and 50 from group \(B\). As such, group \(B\) is over-represented among the offenders (50%) relative to the potential victims, i.e. the shoppers (40%). However, we assume that each offender picks his or her victim completely at random: offenders belonging to one group are not targeting the other group. Moreover, a shopper on Michigan Avenue can, unfortunately, be the victim in multiple incidents.

Since we assume independence between the offenders and the victims, the probability that the offender belongs to group \(B\) and the victim to group \(A\) is \((0.5)\times(0.6) = 0.3\). Similarly, the probability that the offender belongs to group \(A\) and the victim to group \(B\) is \((0.5)\times(0.4) = 0.2\). Therefore, out of 100 incidents, we will on average have 20 incidents labeled ‘\(A\) on \(B\)’ and 30 incidents labeled ‘\(B\) on \(A\)’. However, this does not mean that group \(A\) is a target for group \(B\).
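The independence calculation can be written out for all four offender/victim combinations. This is a small Python sketch that uses the shares from the example (offenders 50% \(A\) and 50% \(B\), shoppers 60% \(A\) and 40% \(B\)). The helper `joint_under_independence` is a hypothetical name introduced for illustration.

```python
from itertools import product

# Shares from the example: offenders are 50% A / 50% B,
# shoppers (the potential victims) are 60% A / 40% B.
offender_share = {"A": 0.5, "B": 0.5}
victim_share = {"A": 0.6, "B": 0.4}


def joint_under_independence(p_offender, p_victim):
    """Return P[O=o, V=v] = P[O=o] * P[V=v] for every pair (o, v).

    Hypothetical helper: offenders pick victims at random, so the offender's
    group tells us nothing about the victim's group.
    """
    return {
        (o, v): p_offender[o] * p_victim[v]
        for o, v in product(p_offender, p_victim)
    }


joint = joint_under_independence(offender_share, victim_share)
n_incidents = 100
for (o, v), p in sorted(joint.items()):
    print(f"{o} on {v}: expected {n_incidents * p:.0f} of {n_incidents} incidents")
```

The script prints expected counts of 30, 20, 30 and 20 for ‘A on A’, ‘A on B’, ‘B on A’ and ‘B on B’. The gap between ‘B on A’ and ‘A on B’ comes purely from the group sizes, because no one in this model targets anyone.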

The data

The data used in the plot on interracial violence comes from the U.S. Department of Justice report ‘Criminal Victimization, 2018’ (see https://www.bjs.gov/content/pub/pdf/cv18.pdf). We start with part of the data in Table 12 on page 12 of this report. The Population column of the table below shows the distribution of the US population by race. Assume you are walking on a crowded street in Chicago on Memorial Day and you bump into a person. The probability that this person is black is not the same as the probability that this person is white. Indeed, if the people on this street are representative of the US population, there will be more whites, so the probability of bumping into someone who is white is also larger. From the table, there is roughly a 62.3% probability that the person you just bumped into is white, whereas there is only a 12% probability that this person is black.

Parts of Table 12 in Criminal Victimization, 2018, Bureau of Justice Statistics, U.S. Department of Justice.

Assume I commit a violent crime in Chicago on Memorial Day. If I pick my victim at random from a crowded street in Chicago, the probability that my victim is white is 62.3%, whereas the probability that my victim is black is 12%. However, crime data suggests that victims are not randomly sampled from the population. The Victim column of the table shows a distribution of victims that differs from the population distribution.
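The difference between the Victim and Population columns can be made concrete with a representation ratio: a group's share among victims divided by its share in the population. This is a Python sketch that uses only the White and Black shares quoted in the text. The ratio and the helper `representation_ratio` are illustrative and are not taken from the BJS report.

```python
# White and Black shares quoted in the text from Table 12
# (Criminal Victimization, 2018); other rows are not reproduced here.
population_share = {"White": 0.623, "Black": 0.120}
victim_share = {"White": 0.665, "Black": 0.108}


def representation_ratio(victims, population):
    """Share of each group among victims divided by its population share.

    If victims were sampled at random from the population, this ratio would
    be close to 1; above 1 means over-represented among victims, below 1
    under-represented. Hypothetical helper for illustration.
    """
    return {race: victims[race] / population[race] for race in victims}


for race, ratio in representation_ratio(victim_share, population_share).items():
    print(f"{race}: {ratio:.2f}")  # White: 1.07, Black: 0.90
```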

Before we know anything about a particular crime, the race of its victim is unknown, so we treat it as a random variable, denoted by \(V\). Then we have, for example, that:

\[\mathbb{P}\left[V=\text{White} \right]= 66.5\%\]


\[\mathbb{P}\left[V=\text{Black} \right] = 10.8\%.\]

In this notation, we have that \(\mathbb{P} \) stands for probability and inside the brackets we have the event that the victim \(V\) has a given race.

Guessing with and without information

The other table used in the plot is Table 14 on page 13 of the Criminal Victimization, 2018 report.

Table 14 in Criminal Victimization, 2018, Bureau of Justice Statistics: Percent of violent incidents, by victim and offender race or ethnicity, 2018. Rows: victim race; columns: offender race (White, Black, Hispanic, Asian, Other).

Assume you are told that a violent incident happened and you have to guess the race of the offender, without any information about the specific incident. Instead of guessing blindly, you can use the available national data, i.e. Tables 12 and 14, to make a more scientific guess. Using Table 12 alone, your best guess is that the offender is white, since that guess has a 50% probability of being right.

Now assume some extra information is revealed to you: the victim is black. In this situation you can use Table 14, and you will see that you had better change your guess. Given (or, in mathematical terms, ‘conditional on the event’) that the victim is black, it is much more likely that the offender is black than that the offender is white. Indeed, Table 14 gives the distribution of the offender's race, which we denote by the variable \(O\), given the race of the victim \(V\). For example, the table states that:

\[\mathbb{P}\left[O=\text{White}|\ V=\text{Black} \right] = 10.6\%, \]

and similarly we have that

\[\mathbb{P}\left[O=\text{Black}|\ V=\text{Black} \right] = 70.3\%. \]

So by changing the guess from white to black, we guess the race of the offender correctly in 70.3% of the cases.
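The text quotes only two entries of the black-victim row of Table 14. That is already enough to settle the best guess, because each missing category (Hispanic, Asian, Other) can have at most the leftover probability. This is a minimal Python sketch of that argument. The helper `best_guess` is hypothetical.

```python
# Conditional offender shares for a black victim, as quoted from Table 14.
# The Hispanic, Asian and Other columns are not quoted in the text, so they
# are left out; together they make up whatever probability mass is left.
p_offender_given_black_victim = {"White": 0.106, "Black": 0.703}


def best_guess(partial_dist):
    """Return the most likely listed category, its probability, and whether
    it is guaranteed to be the mode of the full distribution.

    Any category missing from partial_dist has probability at most the
    leftover mass 1 - sum(partial_dist.values()). If the best listed
    category beats that leftover, no missing category can overtake it.
    Hypothetical helper for illustration.
    """
    leftover = 1.0 - sum(partial_dist.values())
    guess = max(partial_dist, key=partial_dist.get)
    return guess, partial_dist[guess], partial_dist[guess] > leftover


guess, p_correct, certain = best_guess(p_offender_given_black_victim)
print(guess, p_correct, certain)  # Black 0.703 True
```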

Who is targeting who?

Assume a violent crime happened and you need to guess the race of the victim. We already know from Table 12 that, without any information about the offender, the probability that the victim is white is 66.5%. Therefore, if you want to make a scientific guess for the race of the victim, your best choice is to bet on white. It will give you the right answer in 66.5% of the cases.

The question we want to answer is the following. Assume someone reveals new information to you: the offender is black. You still have to guess the race of the victim, but you can now use the fact that the offender is black. There are two questions:

  1. Will you change your bet?
  2. If you do not change it, will the probability of guessing correctly increase or decrease?

If we want to answer these questions, we first need to look at the following conditional probability:

\[\mathbb{P}\left[V=\text{White}|\ O=\text{Black} \right].\]

Indeed, if we already know that the offender is black, how likely is it that the offender picks a white person as a victim? Using basic probability theory (more precisely, Bayes' rule), we can write:

\[\mathbb{P}\left[V=\text{White}|\ O=\text{Black} \right] = \frac{\mathbb{P}\left[O=\text{Black}|\ V=\text{White} \right] \mathbb{P}\left[ V= \text{White}\right]}{\mathbb{P}\left[ O= \text{Black}\right]}.\]

Bayes' rule lets us swap the conditioning from the offender to the victim. The conditional probability \( \mathbb{P}\left[O=\text{Black}|\ V=\text{White} \right]\) can be found in Table 14. The probabilities \( \mathbb{P}\left[ V= \text{White}\right]\) and \( \mathbb{P}\left[ O= \text{Black}\right]\) can be found in Table 12. Combining these probabilities gives the following conditional probability for a black offender:

\[\mathbb{P}\left[V=\text{White}|\ O=\text{Black} \right] = \frac{(0.153)\times(0.665)}{0.217}\approx 46.9\%.\]
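The same calculation can be scripted. This minimal Python sketch applies Bayes' rule to the White entry and also to the Black entry quoted earlier from Table 14 (70.3%). It then checks how much probability is left for the three remaining victim categories. The helper `bayes` is a hypothetical name.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' rule: P[A | B] = P[B | A] * P[A] / P[B]."""
    return p_b_given_a * p_a / p_b


# Values quoted in the text (Criminal Victimization, 2018).
p_o_black = 0.217                                      # P[O=Black], Table 12
p_v = {"White": 0.665, "Black": 0.108}                 # P[V=v], Table 12
p_o_black_given_v = {"White": 0.153, "Black": 0.703}   # P[O=Black | V=v], Table 14

p_v_given_o_black = {
    v: bayes(p_o_black_given_v[v], p_v[v], p_o_black) for v in p_v
}

# Hispanic, Asian and Other victims share the remaining probability mass,
# so none of those three categories can exceed it.
leftover = 1.0 - sum(p_v_given_o_black.values())

for v, p in p_v_given_o_black.items():
    print(f"P[V={v} | O=Black] = {p:.1%}")
print(f"Upper bound for any other victim category: {leftover:.1%}")
```

With these inputs the sketch prints 46.9% for a white victim, 35.0% for a black victim, and an upper bound of 18.1% for any other category, so white stays the most likely victim category given a black offender. Because the tables are rounded survey estimates, these derived numbers are approximate.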

Going back to our two questions, we can formulate the following answer: ‘if we have to guess the race of the victim and we know that the offender is black, then our best choice is still to bet on white.’ Indeed, there are 4 other categories for the victim, and using the same reasoning you can determine the probability that a black offender picks a victim from each of these categories. You will find that none of these probabilities is larger than 46.9%. Therefore, your best bet is still to predict that the victim is white. However, this does not mean that blacks target whites: if we do not know the race of the offender, our best bet would also be white. Therefore, we also look at how the probability changes once we learn the race of the offender, via the ratio:

\[\frac{\mathbb{P}\left[V=\text{White}|\ O=\text{Black} \right]}{\mathbb{P}\left[V=\text{White} \right]}=\frac{46.9\%}{66.5\%}\approx 0.705.\]

This ratio of roughly 70% means that, although white remains your best bet for the race of the victim when you know that the offender is black, the probability that this bet is right drops by roughly 30% in relative terms, from 66.5% to 46.9%.
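Dividing both sides of Bayes' rule by \(\mathbb{P}[V=\text{White}]\) shows that this ratio can also be read directly off the tables:

\[\frac{\mathbb{P}[V=\text{White}|\ O=\text{Black}]}{\mathbb{P}[V=\text{White}]} = \frac{\mathbb{P}[O=\text{Black}|\ V=\text{White}]}{\mathbb{P}[O=\text{Black}]}.\]

This short Python check computes the ratio both ways using the quoted numbers.

```python
# Values quoted in the text.
p_o_black_given_v_white = 0.153  # P[O=Black | V=White], Table 14
p_v_white = 0.665                # P[V=White], Table 12
p_o_black = 0.217                # P[O=Black], Table 12

# Route 1: compute the conditional probability first, then divide.
p_v_white_given_o_black = p_o_black_given_v_white * p_v_white / p_o_black
ratio_via_bayes = p_v_white_given_o_black / p_v_white

# Route 2: P[V=White] cancels, so the same ratio is
# P[O=Black | V=White] / P[O=Black].
ratio_direct = p_o_black_given_v_white / p_o_black

print(f"{ratio_via_bayes:.3f} {ratio_direct:.3f}")  # 0.705 0.705
```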

The distribution of the victim

If we replace white by black, Asian, Hispanic and other, we obtain the so-called conditional distribution of the victim's race given a black offender. This distribution is shown with the blue bars in the plot below: each blue bar is the probability that the victim is of a certain race, given that the offender is black. Similarly, we can derive the conditional distribution for a white offender, shown in orange. Finally, the grey bars represent the unconditional probabilities, that is, the probability that the victim is of a certain race if no information about the offender is revealed.

The conditional distribution for the victim, given the race of the offender.
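The construction of a full conditional distribution can be sketched generically. The function below is a hypothetical helper and is not taken from the post. It takes a Table-14-style table of \(\mathbb{P}[O=o|V=v]\) and the victim distribution \(\mathbb{P}[V=v]\), computes \(\mathbb{P}[O=o]\) by the law of total probability, and then applies Bayes' rule. The post does not list the full Table 14, so the usage example reuses the Michigan Avenue example with random picking. There, the conditional distribution for either offender group should equal the unconditional 60/40 split, which is the ‘no targeting’ benchmark that the grey bars provide in the plot.

```python
def victim_given_offender(p_o_given_v, p_v):
    """Return {o: {v: P[V=v | O=o]}} for every offender category o.

    p_o_given_v[v][o] is P[O=o | V=v] (one row per victim category, as in
    Table 14) and p_v[v] is P[V=v]. P[O=o] is computed with the law of total
    probability, so every returned distribution sums to 1.
    Hypothetical helper for illustration.
    """
    # Joint probabilities P[V=v, O=o] = P[O=o | V=v] * P[V=v].
    joint = {
        (v, o): p * p_v[v]
        for v, row in p_o_given_v.items()
        for o, p in row.items()
    }
    offenders = {o for (_, o) in joint}
    # Marginal P[O=o]: sum the joint over all victim categories.
    p_o = {o: sum(p for (_, oo), p in joint.items() if oo == o) for o in offenders}
    # Bayes' rule: P[V=v | O=o] = P[V=v, O=o] / P[O=o].
    return {o: {v: joint[(v, o)] / p_o[o] for v in p_v} for o in offenders}


# Michigan Avenue example: shoppers are 60% A / 40% B, and offenders
# (50% A / 50% B) pick victims at random, so P[O=o | V=v] = 0.5 in every row.
p_v = {"A": 0.6, "B": 0.4}
p_o_given_v = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}

conditional = victim_given_offender(p_o_given_v, p_v)
print(conditional["B"])  # {'A': 0.6, 'B': 0.4}
```

With real survey tables, \(\mathbb{P}[O=o]\) computed this way can differ slightly from the offender column of Table 12 because of rounding. Deriving it from the same table keeps each conditional distribution summing to 1.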

This graph shows that, for a black offender, the distribution of the victim's race differs from the unconditional distribution (grey bars). If you know that the offender is black, the probability that the victim is white decreases and the probability that the victim is black increases compared to the unconditional distribution. For a white offender, the opposite is true: the probability that the victim is white increases when we go from no information about the offender to knowing that the offender is white, and the probability of a black victim decreases. Therefore, we can conclude that neither white nor black offenders are targeting the other race! Based on these numbers, there is no evidence of disproportionate interracial violence between black and white people.
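The comparison between the coloured and grey bars can be summarised by the ratio \(\mathbb{P}[V=v|O=o]/\mathbb{P}[V=v]\). By Bayes' rule this equals \(\mathbb{P}[O=o|V=v]/\mathbb{P}[O=o]\). A value above 1 means that knowing the offender's race makes that victim category more likely, and a value below 1 means it makes it less likely. This Python sketch covers the three offender/victim pairs whose numbers are quoted in the text. The white-offender share is taken as the approximate 50% mentioned earlier. The white-offender/white-victim pair is left out because its Table 14 entry is not quoted here.

```python
# P[O=o | V=v] quoted from Table 14, keyed by (offender, victim).
p_o_given_v = {
    ("Black", "White"): 0.153,
    ("Black", "Black"): 0.703,
    ("White", "Black"): 0.106,
}
# P[O=o] from Table 12: 21.7% for black offenders as quoted; the 50% for
# white offenders is the approximate figure given in the text (assumption).
p_o = {"Black": 0.217, "White": 0.50}

# Ratio P[V=v | O=o] / P[V=v], computed as P[O=o | V=v] / P[O=o].
lifts = {(o, v): p / p_o[o] for (o, v), p in p_o_given_v.items()}

for (o, v), lift in lifts.items():
    change = "increases" if lift > 1 else "decreases"
    print(f"{o} offender, {v} victim: ratio {lift:.3f}, probability {change}")
```

The ratios come out at about 0.705, 3.240 and 0.212, which match the directions read off the plot for these three pairs.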

