I recently recalled the very first time I searched for something on Google. I was an engineering student in a remote part of India with temperamental internet connectivity. Disgruntled with the ultra-crowded Yahoo! search page, I stumbled upon the simple interface of Google. So began my love affair with navigating the world’s knowledge — a decade of “e-curiosity” in which I broke up with books (barring JK Rowling’s, of course), blew off lectures, and got introduced to the life sciences.
Fast forward four years: as a graduate student in Cambridge, MA, I was greeted by a decidedly more steadfast internet connection. Slowly, and blissfully imperceptibly, that innocuous-looking site called Google had taken over the mantle of my mind. Whether it was travel, climate, recipes, browser, calendar, or email, my early dalliance with Google had evolved into a trusted, fruitful and solemn marriage. So intertwined was Google with my mind that classmates — who swore by PubMed — were outmatched at lectures, problem sets and ideation marathons. To those pure-bred biologists, who rightfully bemoaned my utter lack of any discernible life-science knowledge, it was a mystery how I managed to retrieve the most relevant articles. Google single-handedly resuscitated my attempt at transforming from a wandering engineer into a wannabe biologist.
However, as beneficial as Google was, and continues to be, for retrieving information, a chilling phenomenon has engulfed the world wide web that the site crawls. Unstructured text is accessible to all those fortunate enough to have internet access, but the veracity of any one underlying source is highly subjective. This is especially true of the life sciences, where exponential knowledge growth coexists with vast troves of irreproducible research. Compounded by the mercurial nature of modern attention spans, we are all susceptible to the “instant gratification curse”: building misplaced core convictions based on the first few document hits. Unknowingly, hypotheses are formed, reshaped or tossed out based on little more than the sparse phrases highlighted in bold among the search results. Only the rare skeptical brain avoids such search-and-navigation behavioral pitfalls. This is a colossal challenge facing the coming generations of students, scientists and analysts.
It would seem that researchers trained as computational biologists, bioinformaticians or clinical biostatisticians are fortunate, for their structured databases generally sidestep the foggy realms of unstructured information. However, each of their domains is often siloed and involves highly specialized vocabulary. These factors encumber communication with researchers who may have even slightly different training. Compounding this is the utter lack of intuitive user interfaces that pollinate cross-talk across the biomedical community; there is no connective fabric for these increasingly distinct research efforts. Without technologies that help overcome the fundamental behavioral pitfalls of searching unstructured information, data scientists are thus handicapped in their research quests as well.
These challenges are amplified within large institutions, where inference from exponentially growing knowledge bases is an existential requirement. Some of these organizations spend billions of dollars each year across their operations. Garnering competitive intelligence from public data and proprietary real-world evidence factors heavily into these investment decisions. The search-and-navigation hurdles encountered in the public domain apply equally to proprietary institutional knowledge. In fact, there is unprecedented demand for technologies that help synthesize unstructured information. Whether it is amplifying the odds of pharmaceutical teams making therapeutic discoveries, or ensuring that clinical care pathways are optimized for each patient’s unique biological constitution, we are undeniably entering the “Golden Era of Data Sciences”.
Better fact-checking of knowledge by individuals and institutions requires sophisticated triangulation software that serves as the connective fabric across all available data, toward establishing concordance (or the lack thereof). Software that supports such facile triangulation may offer a “dogma-agnostic hypothesis generation” mechanism for scientific inquiry. The path toward such a solution is rife with technological obstacles. For example, beyond the recent renaissance in deep-learning neural networks, which is contributing meaningfully to knowledge synthesis, absurdly simple user interfaces are essential to mask data complexity and support visual triangulation (lest such software end up as a Yahoo! instead of the next Google).
What if such a living kernel of knowledge synthesis could be leveraged broadly by universities and industry to capture the diverse workflows scientists develop? Imagine the number of lives that could be made significantly better, and the breadth of discoveries that would give birth to novel medicines. Technology is already creating such a future, and greatly influencing the present, in which scientists and students can advance workflows that leverage collective intelligence. In the near future, individual workflows for navigating hypotheses across de-identified data will be methodically synthesized by a universal software platform. With security, bioethics, compliance and patent protection ensured, such a platform could revolutionize how the world generates, validates and understands scientific research.
The natural conclusion is tantalizing to imagine — a curious machine that learns from the synthesis of tens of millions of scientific workflows. Such “research playlists” will power machines that serve learners of all ages as the ultimate brain extension — one that does not blindly accept any document’s veracity. These platforms will perform increasingly robust, secure and adaptable knowledge synthesis. They will fact-check the verdict of every data set via rigorous triangulation, aggregate these insights into profoundly intuitive user interfaces, and support complex investment decisions.
And when the scientific counterpart retires to bed at night, perhaps the curious machine will conjure up ideas and questions to pique the human brain by dawn…