Data mining Erowid to study drugs
Erowid has thousands of experience reports covering hundreds of different drugs. It should therefore be easy to scrape these reports and apply various text analysis methods to them, to do things like:
Clustering of drugs by similarity of experiences
The clusters should replicate the pharmacological classifications, e.g. deliriants should cluster together
Ratings of overall pleasantness of drugs
These can be compared to ratings of the experiences made by human raters
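As a rough illustration of the similarity idea above, here is a minimal sketch that pools the reports for each drug, builds TF-IDF vectors, and compares drugs by cosine similarity. The drug names and report snippets are invented placeholders, not Erowid data, and this is not the method used in the papers cited in this post; a real analysis would feed the similarity matrix into a proper clustering routine.

```python
import math
from collections import Counter

# Invented placeholder snippets for illustration only, NOT real Erowid reports.
reports = {
    "drug_a": ["saw bright colors and shifting patterns on the walls",
               "colors and patterns everywhere with my eyes closed"],
    "drug_b": ["patterns and colors moved across the ceiling"],
    "drug_c": ["felt sleepy and calm then slept for hours",
               "calm and sleepy all evening"],
}

def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]

def tfidf_vectors(docs):
    """docs maps name -> token list; returns name -> {term: tf-idf weight}."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: one count per drug
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vectors[name] = {
            # Smoothed idf so that terms present in every document keep weight 1.
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        }
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(name, vectors):
    """Return the other drug whose pooled reports are closest to `name`."""
    others = [d for d in vectors if d != name]
    return max(others, key=lambda d: cosine(vectors[name], vectors[d]))

# Pool all reports for a drug into one token list, then vectorize.
docs = {drug: tokenize(" ".join(texts)) for drug, texts in reports.items()}
vectors = tfidf_vectors(docs)

for drug in vectors:
    print(drug, "is closest to", most_similar(drug, vectors))
```

With real data, the interesting check would be whether the nearest neighbours line up with pharmacological class.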
It seems that there has been some progress on this recently. In a blog post, Roberto Rocha produces a crude ordering of drugs by counting the positive, neutral and negative descriptions provided by Erowid itself. Unhelpfully, he posted the results as Tableau tables, which seem tricky to scrape: the data isn't in the HTML source and appears to be rendered client-side by JavaScript. It is probably faster to just redo the Erowid scraping.
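A crude ordering of this kind is simple to reproduce once the counts are in hand. The sketch below uses one possible scoring, (positive - negative) / total, which is my assumption and not necessarily what the blog post did; the drug names and counts are invented for illustration.

```python
# Invented counts for illustration only; not Rocha's or Erowid's data.
counts = {
    "drug_a": {"positive": 30, "neutral": 10, "negative": 10},
    "drug_b": {"positive": 10, "neutral": 20, "negative": 20},
    "drug_c": {"positive": 20, "neutral": 25, "negative": 5},
}

def net_score(c):
    """Share of positive minus share of negative descriptions (assumed metric)."""
    total = sum(c.values())
    return (c["positive"] - c["negative"]) / total if total else 0.0

# Sort drugs from most to least favourable under the assumed metric.
ranking = sorted(counts, key=lambda d: net_score(counts[d]), reverse=True)

for drug in ranking:
    print(f"{drug}: {net_score(counts[drug]):+.2f}")
```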
Sanz, C., Zamberlan, F., Erowid, E., & Tagliazucchi, E. (2018). The experience elicited by hallucinogens presents the highest similarity to dreaming within a large database of psychoactive substance reports. Frontiers in Neuroscience, 12, 7.
Ever since the modern rediscovery of psychedelic substances by Western society, several authors have independently proposed that their effects bear a high resemblance to the dreams and dreamlike experiences occurring naturally during the sleep-wake cycle. Recent studies in humans have provided neurophysiological evidence supporting this hypothesis. However, a rigorous comparative analysis of the phenomenology (“what it feels like” to experience these states) is currently lacking. We investigated the semantic similarity between a large number of subjective reports of psychoactive substances and reports of high/low lucidity dreams, and found that the highest-ranking substance in terms of the similarity to high lucidity dreams was the serotonergic psychedelic lysergic acid diethylamide (LSD), whereas the highest-ranking in terms of the similarity to dreams of low lucidity were plants of the Datura genus, rich in deliriant tropane alkaloids. Conversely, sedatives, stimulants, antipsychotics, and antidepressants comprised most of the lowest-ranking substances. An analysis of the most frequent words in the subjective reports of dreams and hallucinogens revealed that terms associated with perception (“see,” “visual,” “face,” “reality,” “color”), emotion (“fear”), setting (“outside,” “inside,” “street,” “front,” “behind”) and relatives (“mom,” “dad,” “brother,” “parent,” “family”) were the most prevalent across both experiences. In summary, we applied novel quantitative analyses to a large volume of empirical data to confirm the hypothesis that, among all psychoactive substances, hallucinogen drugs elicit experiences with the highest semantic similarity to those of dreams. Our results and the associated methodological developments open the way to study the comparative phenomenology of different altered states of consciousness and its relationship with non-invasive measurements of brain physiology.
I also chuckled at the author names: "Earth Erowid" and "Fire Erowid" are apparently good enough for Frontiers.
Martial, C., Cassol, H., Charland-Verville, V., Pallavicini, C., Sanz, C., Zamberlan, F., ... & Tagliazucchi, E. (2019). Neurochemical models of near-death experiences: A large-scale study based on the semantic similarity of written reports. Consciousness and Cognition, 69, 52-69.
The real or perceived proximity to death often results in a non-ordinary state of consciousness characterized by phenomenological features such as the perception of leaving the body boundaries, feelings of peace, bliss and timelessness, life review, the sensation of traveling through a tunnel and an irreversible threshold. Near-death experiences (NDEs) are comparable among individuals of different cultures, suggesting an underlying neurobiological mechanism. Anecdotal accounts of the similarity between NDEs and certain drug-induced altered states of consciousness prompted us to perform a large-scale comparative analysis of these experiences. After assessing the semantic similarity between 15,000 reports linked to the use of 165 psychoactive substances and 625 NDE narratives, we determined that the N-methyl-D-aspartate (NMDA) receptor antagonist ketamine consistently resulted in reports most similar to those associated with NDEs. Ketamine was followed by Salvia divinorum (a plant containing a potent and selective κ receptor agonist) and a series of serotonergic psychedelics, including the endogenous serotonin 2A receptor agonist N,N-Dimethyltryptamine (DMT). This similarity was driven by semantic concepts related to consciousness of the self and the environment, but also by those associated with the therapeutic, ceremonial and religious aspects of drug use. Our analysis sheds light on the long-standing link between certain drugs and the experience of “dying”, suggests that ketamine could be used as a safe and reversible experimental model for NDE phenomenology, and supports the speculation that endogenous NMDA antagonists with neuroprotective properties may be released in the proximity of death.
So, going by the second abstract, there are ~15k user reports on Erowid. That should be plenty for detailed analysis. Both papers are fairly rudimentary, and much more could be done. Both are from 2018-2019, so presumably others are already working on similar analyses. Other things of note:
https://mvuorre.github.io/tmasc/articles/erowid/erowid.html An example of an Erowid dataset in R (in a data package, no less). It seems someone has already scraped the reports and made them available. Many thanks to Matti Vuorre.
https://chemicalyouth.org/visualising-erowid/ Another simple analysis of the experiences, looking mainly at the co-use network.
Edited 2019-11-01: Fire Erowid actually saw my tweets and reached out to say:
Great. I don't know if you think it's a good idea, but we'd love to have you add a little note (if possible) that we're happy to help researchers and analysts understand the data if they contact us. The confounds can be a little more complicated than people understand.