What does sperm count count?
By Marion Boulicault
How to Count Sperm
One, two, three, four… Counting is usually assumed to be a pretty simple activity; it is, after all, one of the very first things that we learn at school. This assumption is also widely held within the field of semen analysis, the field that assesses and quantifies the properties of semen samples. Sperm count is considered to be a relatively straightforward measure, especially in contrast to more controversial semen measures like sperm morphology (sperm cell shape) and motility (sperm cell movement). In this blog post, I examine the ways in which sperm count is less straightforward than it might appear, and consider what this means for scientific research on global sperm count decline.
Let’s start with the basics: how is sperm counted? The first thing you need to do is obtain a sample to count. In most cases, samples will come from hospitals, fertility clinics or research laboratories, where individuals provide semen samples by masturbation in a private collection room. Once you have the sample, you need to prepare it for counting. There are multiple ways to do this, depending on the counting method and measuring instrument you’re using.
Let’s suppose that you’re using a hemocytometer with a Neubauer-improved counting chamber, a measuring instrument composed of a thick glass microscope slide etched with a grid-like pattern. First created in 1879 to count blood cells, the hemocytometer is widely used in andrology laboratories today. To use a hemocytometer, you first dilute the semen sample with water, which immobilizes the sperm cells and allows individual cells to be distinguished from one another under a microscope. For human semen, a dilution of 1.0 mL of water to 19 µL of human semen is conventional. Then, it’s time to transfer the diluted sample into the hemocytometer. You place a glass coverslip over the chamber and use a micropipette to insert 10 µL of your diluted sample into a small opening (see here for detailed instructions). The next step is to place the loaded chamber onto the microscope stage and fix it in place. Focus the microscope so you can see a sharp image of the sperm cells on the grid.
Now it’s time to count. Start in one square of the counting chamber, count the number of sperm cells in it, and note this number down. Then move on to the next square, and so on.
The final step is calculation. To calculate sperm concentration, you take the number of cells counted and divide that by the volume of your sample (and be sure to factor in the dilution!) to return the sperm concentration, e.g. 20 million cells/mL. To calculate the total sperm count (i.e. the number of sperm cells in an individual semen sample), multiply the sperm concentration by the total volume of the sample. That’s how you count sperm.
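The arithmetic in this final step can be sketched in a few lines of Python. This is a minimal illustration, not a laboratory protocol: the function names are my own, and the example numbers (a 50 µL + 950 µL dilution, 100 cells counted in one 100 nL large square, a 3 mL sample) are assumptions chosen so that the result matches the 20 million cells/mL example above.

```python
NL_PER_ML = 1_000_000  # 1 mL = 1,000,000 nL


def dilution_factor(semen_ul: float, diluent_ul: float) -> float:
    """How many times the semen was diluted (total volume / semen volume)."""
    if semen_ul <= 0 or diluent_ul < 0:
        raise ValueError("volumes must be positive")
    return (semen_ul + diluent_ul) / semen_ul


def concentration_per_ml(cells_counted: int, volume_counted_nl: float,
                         dilution: float) -> float:
    """Sperm concentration of the ORIGINAL sample, in cells per mL.

    cells_counted / volume_counted_nl is the concentration of the diluted
    sample; multiplying by the dilution factor undoes the dilution.
    """
    if volume_counted_nl <= 0:
        raise ValueError("counted volume must be positive")
    return cells_counted / volume_counted_nl * NL_PER_ML * dilution


def total_sperm_count(conc_per_ml: float, semen_volume_ml: float) -> float:
    """Total number of sperm cells in the whole semen sample."""
    return conc_per_ml * semen_volume_ml


# Illustrative numbers only (assumptions, not measurements):
factor = dilution_factor(semen_ul=50, diluent_ul=950)   # 20-fold dilution
# Assumed chamber geometry: one 1 mm x 1 mm square, 0.1 mm deep = 100 nL.
conc = concentration_per_ml(cells_counted=100, volume_counted_nl=100,
                            dilution=factor)
total = total_sperm_count(conc, semen_volume_ml=3.0)
print(f"{conc / 1e6:.0f} million cells/mL, {total / 1e6:.0f} million cells in total")
```

Note how much rides on the two volumes: a small error in either the dilution or the counted volume is multiplied straight through to the final number.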
Well, kind of. In reality, there are many more steps: calibrating, adjusting, and maintaining the hemocytometer and microscope; deciding how to count sperm cells that lie across two squares, or whether to count a sperm cell that is unusual or damaged; and carrying out quality control and assurance procedures. This complexity is evidenced by the fact that, since 1980, the World Health Organization has published a 286-page laboratory manual (updated every 10 years) for the examination and processing of human semen. The manual offers highly detailed standards for sperm counting.
Yet even with this manual, many procedural questions and issues remain unsettled. We’ll talk more about these complexities in just a minute, but first I want to consider a deceptively simple question about sperm counting that reveals how counting sperm is more controversial than it might appear: what exactly are we counting when we count sperm?
What are we counting when we count sperm?
This question might seem strange in that it appears to contain its own answer. But in fact we can be more specific and informative: when we count sperm, we are technically counting the number of sperm in a particular sample obtained from a particular individual at a particular moment in time that results from following a particular counting procedure using particular measuring instruments. But if that were all we were counting when we count sperm, it would probably not be worth counting in the first place. We are interested in sperm count because we believe that it measures more than just the number of sperm in a particular sample. Sperm count is a measure of far more significant things: fertility, male health and the possibility for continued human life.
The idea that sperm count measures male fertility and health is what fuels the panic over recent scientific studies and books showing that average sperm count among “Western” men decreased by 50% between 1970 and 2013. In interviews with the media, Hagai Levine, the lead author of a hugely influential 2017 sperm decline study, describes his results as “very profound, and even shocking.” The BBC pronounced that declining sperm count “could make humans extinct” and GQ agreed, suggesting that declining sperm counts mean that “we're on track... to void the species entirely.”
Measuring our measures: The validity and reliability of sperm count as a measure of fertility
As we show in our paper, “The Future of Sperm,” claims about a reproductive apocalypse depend on many assumptions, among them the assumption that sperm count measures fertility. Is this assumption true? To answer this question, it helps to know what makes a good measure more generally.
There are two central criteria used to judge a measure: validity and reliability. In measurement speak, the thing that we’re trying to measure -- in this case, fertility -- is called the construct. A measure is valid if it measures the intended construct, which in practice means that the measure predictably correlates with the construct. So for sperm count to be a valid measure of fertility, sperm count would have to predictably correlate with fertility. And a measure is reliable if it correlates with the construct across multiple contexts, i.e. if it is a valid measure of the construct when used at different times, in different places and by different individuals. So sperm count is a reliable measure of fertility if it can be consistently used to measure different individuals’ fertility in different locations at different times, regardless of who does the measuring.
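To make these two definitions concrete, here is a small Python sketch. Everything in it is an assumption made for illustration: the function names, the arbitrary correlation cut-off of 0.5, and above all the numbers, which are invented and are not real sperm count or fertility data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def looks_valid(measure, construct, min_r=0.5):
    """Validity in the sense used here: the measure tracks the construct."""
    return pearson(measure, construct) >= min_r


def looks_reliable(contexts, min_r=0.5):
    """Reliability in the sense used here: validity holds in EVERY context
    (different times, places, or people doing the measuring)."""
    return all(looks_valid(m, c, min_r) for m, c in contexts.values())


# Invented numbers: (measure values, construct values) per context.
contexts = {
    "lab A, 1975": ([1, 2, 3, 4], [2, 4, 6, 8]),   # measure tracks construct
    "lab B, 2015": ([1, 2, 3, 4], [5, 3, 6, 2]),   # measure does not
}
for name, (measure, construct) in contexts.items():
    print(name, "valid here?", looks_valid(measure, construct))
print("reliable across contexts?", looks_reliable(contexts))
```

The point of the sketch is only the structure of the two checks: a measure can track its construct in one setting and still fail as a reliable measure if that relationship does not hold elsewhere.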
Is sperm count a valid measure of fertility?
In his article on Levine et al.’s 2017 study, GQ journalist Daniel Halpern warns his readers that “we are producing half the sperm our grandfathers did” and that we are thus “half as fertile.” Though it seems plausible, Halpern’s inference is in fact too simplistic: it turns out that, even if we assume that we are producing half the sperm our grandfathers did, it doesn’t mean that we are half as fertile. Here are some reasons why.
First, the relationship between sperm count and fertility is not continuous. Above a critical threshold (around 40 million/mL), sperm count no longer tracks fertility, since a higher sperm count does not necessarily mean greater fertility. And even below 40 million/mL, the relationship isn’t straightforward: even holding other variables fixed, some men with lower sperm counts are more fertile than those with higher sperm counts, for reasons that aren’t clear. Dr Bradley Anawalt, male infertility specialist at the University of Washington, explains, “I always say normal is 15 million sperm per mL . . . But why is it that some men can conceive and other guys with the same numbers don’t conceive? It’s a great mystery what divides those two groups of men.”
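A toy model shows why the threshold alone is enough to break the “half the sperm, half as fertile” inference. The sketch below is deliberately crude and is not an empirical model: the only number taken from the text is the rough 40 million/mL threshold, and the straight-line relationship below the threshold is a placeholder assumption (as noted above, the real relationship below the threshold is not straightforward either).

```python
THRESHOLD_MILLION_PER_ML = 40.0  # rough threshold mentioned in the text


def toy_relative_fertility(conc_million_per_ml: float) -> float:
    """Toy index between 0 and 1.

    Above the threshold the index stays flat at 1.0, because extra sperm
    no longer track fertility. Below it, the index falls in a straight
    line; that linear shape is purely an illustrative assumption.
    """
    if conc_million_per_ml < 0:
        raise ValueError("concentration cannot be negative")
    return min(conc_million_per_ml / THRESHOLD_MILLION_PER_ML, 1.0)


# Halve the concentration from three different starting points.
for start in (100.0, 60.0, 40.0):
    before = toy_relative_fertility(start)
    after = toy_relative_fertility(start / 2)
    print(f"{start:.0f} -> {start / 2:.0f} million/mL: "
          f"index {before:.2f} -> {after:.2f}")
```

In this toy model, halving a concentration of 100 million/mL changes nothing, halving 60 million/mL costs a quarter of the index, and only a man starting exactly at the threshold ends up “half as fertile.” Which of these cases applies depends on where a man (or a population average) starts, which is exactly the information the headline inference leaves out.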
The connection between sperm count and fertility is made more complicated by the fact that sperm count is highly sensitive to local conditions. As a New York Times article on sperm measures summarizes, “It turns out that sperm counts . . . are among the more mysterious measures of bodily functions, varying as much as tenfold, depending on such things as whether a man has ejaculated or had the flu recently and whether he is a hot-tub habitue.” Other factors, such as the degree of arousal pre-ejaculation and abstinence time, can also affect sperm count. Dr. Richard Sherins, the director of andrology research at the Genetics and IVF Institute, says that he’s seen men's sperm concentrations increase from 10 to 100 million/mL of semen "based on a few weeks of abstinence.”
To be clear, the fact that there exist limitations and complications concerning the validity of sperm count as a measure of fertility is not in itself a problem. No measure is ever perfectly valid, and sperm count can, when carefully considered in tandem with other measures, provide valuable information about fertility. Rather, the problem arises when these limitations aren’t sufficiently taken into account or clearly communicated in scientific research and in the media. It is irresponsible to move directly from evidence of a trend in average population sperm counts to claims of an impending fertility apocalypse without carefully considering and communicating the complexities of the relationship between sperm count and fertility. It’s irresponsible to claim that half the number of sperm means that men today are half as fertile as their grandfathers.
Is sperm count a reliable measure of fertility?
Is sperm count a reliable measure of fertility? There are many different dimensions of reliability (see here for more on reliability), but here I focus on the dimension of reliability across time, which is particularly salient to sperm decline research. Imagine a semen sample; let’s call it Sample A. If researchers in the 1970s were to count the number of sperm in Sample A, would they produce the same answer as researchers today? This question cuts to the heart of the empirical soundness of sperm decline claims: are we warranted in comparing previous sperm count results to today’s sperm counts? If it were the case that earlier sperm counting methods systematically overestimated sperm counts while those of today did not, this would give us reason to question whether sperm count is declining at all.
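The logic of this worry can be put as simple arithmetic. In the sketch below, the 50% apparent decline is the headline figure discussed in this post. The inflation factors are hypothetical values I chose for illustration, not estimates from any study, and the sketch assumes that present-day counts are unbiased.

```python
def true_decline(apparent_decline: float, past_inflation: float) -> float:
    """Decline that remains once past overestimation is corrected for.

    apparent_decline: measured decline as a fraction (0.5 means 50%).
    past_inflation:   hypothetical factor by which past methods overstated
                      the true count (1.0 means no overestimation).
    """
    if not 0 <= apparent_decline <= 1:
        raise ValueError("apparent_decline must be between 0 and 1")
    if past_inflation <= 0:
        raise ValueError("past_inflation must be positive")
    measured_ratio = 1 - apparent_decline          # present / past, as measured
    true_ratio = measured_ratio * past_inflation   # present / true past
    return 1 - true_ratio


# Hypothetical inflation factors, for illustration only.
for k in (1.0, 1.25, 2.0):
    print(f"past counts inflated x{k}: true decline = {true_decline(0.5, k):.1%}")
```

With no past overestimation the 50% decline stands. If past counts ran 25% high, the true decline would be 37.5%, and if they ran at double the true value there would be no decline at all.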
In their 2017 study, Levine et al. went to great lengths to increase general reliability. For example, they limited their meta-analysis to studies that used similar sperm count methods, namely the hemocytometer, which is identified as the gold standard in the WHO’s semen processing manual. Prominent andrologists largely agreed that the Levine et al. study was a significant improvement over earlier sperm decline studies; Prof. Richard Sharpe describes it as a “well-designed and statistically rigorous study that has taken on board criticisms/limitations of earlier analyses.” Given this, are there reasons to be concerned about reliability?
In his article, “Are sperm counts declining? Or did we just change our spectacles?”, Prof. Allan Pacey, Chairman of the British Fertility Society, offers some such reasons. Although Pacey’s paper was published prior to Levine et al.’s study, many of his points still apply. First, one might think that the existence of the WHO manual on semen processing would ensure reliability over time: if all studies included in Levine et al.’s meta-analysis adhered to the WHO manual, then they would all count sperm in the same way, guaranteeing that results from the 1970s can legitimately be compared to those of today. But the first edition of the WHO manual for semen processing wasn’t even published until 1980, 7 years into the period analyzed by Levine et al. Further, as reported by Dr. Mathew Tomlinson, even the decision to include only hemocytometer-based studies doesn’t solve all reliability issues: “differences may occur in [haemocytometer] manufacturing standards, age and quality of chambers as well as in how chambers are used from application of the cover glass, to dilution and mathematics. Moreover, the haemocytometer is just as susceptible to errors associated with a lack of homogeneity [in the semen sample] (viscosity, aggregation, agglutination) as any other method, a fact which cannot not be dismissed lightly.”
Second, training and quality assurance for conducting sperm counts have undergone changes over the years. As Pacey explains, training courses are a relatively recent development: “The courses established by the European Society for Human Reproduction and Embryology first took place in 1994… Before this time, it is unclear how training was provided in many parts of the world, if at all.” Quality assurance and control schemes in the UK, for example, weren’t introduced until 1994, and the first three editions (1980, 1987 and 1992) of the WHO manual made almost no reference to the need for quality control. Pacey summarizes this point aptly as follows: “In 2012, a randomly selected andrology laboratory anywhere in the world is more likely to have appropriately trained staff, be following WHO methods, and have effective QA and QC procedures in place compared to an equivalent (or even the same) laboratory only 10 years ago.” In other words, given changes in standardized procedures, training courses and quality control schemes, there are good reasons to at least consider the possibility that sperm counting results in a different number today than it would have in the past.
Pacey further suggests there are reasons to believe the decreased access to training and quality control in the past might not only have led past scientists to produce different numbers, but systematically higher numbers. As Pacey notes, “semen, as a fluid, is not as straightforward to deal with as blood or urine”: its viscosity and non-homogeneity can make it difficult to manipulate in the laboratory. One effect is that those with less training may take longer to perform the counting procedure, meaning that the sample will be left in the counting chamber for longer. This could allow the sample to dry out, thereby concentrating it and leading to the calculation of a higher sperm count. Another way in which less training may lead to overestimation concerns pipetting techniques. To get an accurate count, it’s important to wipe the outside of the pipette to remove any extra sample. Untrained practitioners could easily miss this small step, leading them to deposit extra sample in the counting chamber, thereby resulting in a higher sperm count. If this were the case, it would challenge at the very least the magnitude, if not the very existence, of sperm count decline.
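The drying-out mechanism is easy to quantify in principle. If some fraction of the fluid evaporates while the cells stay put, the same cells sit in less liquid, so the measured concentration goes up. The Python sketch below shows that relationship. The evaporated fractions are hypothetical values for illustration; I am not aware of measured figures for how much evaporation actually occurred in past laboratories.

```python
def apparent_concentration(true_conc: float, evaporated_fraction: float) -> float:
    """Concentration measured after part of the fluid has evaporated.

    The number of cells is unchanged, but the remaining volume is
    (1 - evaporated_fraction) of the original, so the concentration rises.
    """
    if not 0 <= evaporated_fraction < 1:
        raise ValueError("evaporated_fraction must be in [0, 1)")
    return true_conc / (1 - evaporated_fraction)


def overestimate_percent(evaporated_fraction: float) -> float:
    """Percentage by which the count is overstated."""
    return (apparent_concentration(1.0, evaporated_fraction) - 1.0) * 100


# Hypothetical evaporation levels, for illustration only.
for fraction in (0.05, 0.10, 0.20):
    print(f"{fraction:.0%} of fluid lost -> count overstated by "
          f"{overestimate_percent(fraction):.1f}%")
```

The overstatement grows faster than the evaporation itself: losing a fifth of the fluid inflates the count by a quarter.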
The possibility that sperm counts were overestimated in the past is, at this stage, merely a hypothesis. As Pacey notes in his article, more research is needed on the effects of training and quality control on sperm count results, and on other different sources of error (and, in particular, whether these might lead to systematic over- or underestimation). But, as was the case with validity, the key problem with Levine et al.’s study is not that these issues with reliability might exist. Any retrospective meta-analysis will inevitably face questions about temporal reliability. Rather, significant problems arise when scientists and the media fail to consider these potential sources of unreliability in the interpretation of study results and the conclusions drawn. And this is especially true when making high-impact and panic-inducing claims about the end of our species.
What else are we counting when we count sperm?
Thus far, I’ve considered the ways in which, contrary to assumptions made in the sperm decline literature, we may not actually be measuring fertility when we count sperm. In this final section, I focus not on what sperm count doesn’t measure, but on what it does.
It is likely no surprise to most of us that sperm count is associated with masculinity in the public imagination. As an article in the San Francisco Chronicle put it, "Reduce our sperm count? Why in no time we'll be a nation of pallid, Jell-O-spined wimps, watching Wheel of Fortune rather than Monday Night Football and asking strangers for directions" (cited here). These associations have also made their way deep into scientific research. In her groundbreaking paper “The Egg and the Sperm” (1991), anthropologist Emily Martin describes how scientists’ use of gendered metaphors in biology textbooks portrays sperm cells in the stereotypically masculine role of a strong, aggressive, competitive pursuer that embarks on a “perilous journey” to penetrate the passive, feminine egg cell, thereby perpetuating “some of the hoariest old stereotypes about weak damsels in distress and their strong male rescuers.”
Sociologist Cynthia Daniels builds on Martin’s work by analyzing the role that these kinds of gendered metaphors play in sperm decline research and media representations. Like Martin, Daniels finds evidence that sperm cells are used to represent masculinity, and she shows that the sperm count decline crisis is frequently cast as a masculinity crisis: for a very direct example, consider the fact that the high-impact science journal Nature published a paper on the causes of sperm decline entitled "Masculinity at Risk.” Daniels also makes a further argument: not only is sperm decline cast as a masculinity crisis, it’s a masculinity crisis resulting in part from perceived feminization (see the accompanying blog post for further discussion of this point). She observes that one of the central hypotheses for the cause of sperm decline -- namely, the presence of endocrine-disrupting chemicals -- is described as a feminizing force that imperils masculinity: “The most disturbing effect of exposure to estrogens was often said to be . . . [that] as men became ‘more like women,’ the dissolution of the boundaries between them produced disease and ‘weakness.’ It was this presumed feminization of men that had produced testicular cancer, lower sperm counts, and increased rates of ‘abnormal’ development in men.” Sperm decline research embeds not only stereotypical notions of masculinity, but also the idea of masculinity as in binary opposition to (and therefore threatened by) femininity.
The work of scholars like Martin and Daniels can help us answer questions about what sperm count really counts. That is, it helps us understand the meaning and significance of what happens when we obtain a semen sample, follow a procedure, use a measuring instrument and produce a number. Sperm count has limited and complex validity and reliability as a measure of male fertility. What sperm count does predictably measure, at least in certain socio-cultural contexts and time periods, is social anxieties about threatened white masculinity, particularly as a result of perceived increasing feminization and the blurring of strict boundaries between men and women.
How to Count Better
It’s been 30 years since Martin’s groundbreaking paper, and 13 since Daniels published her sperm decline analysis. Yet harmful gendered metaphors continue to permeate sperm decline research. Just this year, Shanna Swan, one of the authors of the high-impact 2017 Levine et al. study, published Count Down, a book on sperm decline. Some of Swan’s language could be lifted directly from the examples in Martin’s paper: "Even the healthiest, best-shaped sperm don't pause to ask for directions . . . . The reality is, sperm tend to live fast and die young . . . . Despite being microscopic in size, sperm are mighty and resilient swimmers." Swan’s book focuses on the hypothesis that endocrine disrupting chemicals (EDCs) are to blame for sperm decline, describing this as a case of “environmental emasculation.” After warning readers of the impending reproductive apocalypse resulting from the spread of EDCs (“some of what we’ve been thinking of as fiction from stories such as The Handmaid’s Tale and Children of Men is rapidly becoming a reality”), she goes on to echo Halpern’s misleading line from GQ, comparing current men’s fertility to that of their grandfathers: “It’s startling and chilling when you realize that the number of children you may be capable of having is slightly less than half of that your grandparents could conceive.”
We can do better. It is true that sperm count as a measure of fertility is complicated, limited and has historically been saturated with harmful and misleading metaphors and rhetoric. Yet this doesn’t necessarily mean we should stop counting sperm. Sperm count can provide important information about an individual’s fertility, and possibly, about individuals’ and populations’ exposure to harmful toxins. Understanding male fertility and reproductive health are crucial scientific goals, with incredibly significant implications for individual men’s lives. To do justice to these goals, we need better measures.
First, scientists and science journalists must be clear about the limitations and complications of measures like sperm count. Despite its seeming simplicity, sperm count (especially when compared diachronically) is complex, depends on a number of non-trivial assumptions, and can be prone to systematic errors. It is irresponsible for research to proclaim that “the current state of reproductive affairs can’t continue much longer without threatening human survival” without taking these complexities and limitations into account.
Second, scientists have a responsibility to consider the power of quantitative measures like sperm count, and the social work that these measures perform. Science studies theorist Rebecca Jordan-Young describes measures as assumption containers, i.e. “as vehicles through which assumptions travel in studies without being tested.” As Martin and Daniels have shown, and as we demonstrate in our paper, some of the assumptions pervading sperm count metrics are based on harmful stereotypes about rigid binary gender roles, on social anxieties about how these roles are changing, and on intersecting worries about differential reproduction rates across races, ethnicities, and nations. To count better, we must expose and work to counteract the role that these social anxieties play in the structure, practice and interpretation of a process that begins with the collection of a semen sample and ends in a number.
Suggested Citation
Boulicault, M. “What does sperm count count?” GenderSci Blog. 2021 May 4, genderscilab.org/blog/what-does-sperm-count-count
Statement of Intellectual Labor
Marion Boulicault conceptualized and wrote the blog post. Other GenderSci Lab members, especially Sarah S. Richardson, Jonathan Galka, Annika Gompers, Alex Borsa and Kelsey Ichikawa, assisted with editing.