Bioinformatics: Dr. Pierre Baldi on AI Applications in Biotechnology

Jul 26, 2024

Dr. Pierre Baldi, a bioinformatics scientist and a trailblazer in the field that fuses AI and biology, shared cutting-edge research insights on The AI Purity Podcast. He is currently the director of the Institute for Genomics and Bioinformatics at the University of California, Irvine. Over the years, Dr. Baldi’s pioneering work in deep learning and AI has opened new frontiers in understanding complex biological systems.

In this episode of The AI Purity Podcast, Dr. Baldi talks about his career journey, gives us insight into bioinformatics applications and AI integration, and discusses the unique challenges in applying AI or machine learning to understanding genomic data.

Learn about the ethical implications of using AI technology in biology, what bioinformatics is used for, recent research applications from Dr. Pierre Baldi, and more about AI Purity’s elite AI text detection capabilities.

A Look Into Dr. Pierre Baldi’s Career Journey

Dr. Pierre Baldi transitioned into a career in computer science and machine learning because of his keen interest in understanding intelligence and the human brain. This interest was coupled with an affinity for mathematics, which he studied as an undergraduate along with psychology. “I didn’t know exactly in which direction to go or how to approach the questions that interested me,” Dr. Baldi says of why he pursued that dual degree.

After completing his Diplôme d’études approfondies (DEA) in Paris, he moved to the United States for his PhD at Caltech. While there, he encountered an influential professor, John Hopfield, who was developing ideas around neural networks and machine learning. “I immediately understood that was the area of research I wanted to focus on,” Dr. Baldi says, and he has worked in neural networks, deep learning, and AI ever since.

To the untrained eye, there might seem to be a dichotomy between computer science and biology, but Dr. Pierre says, “Biological systems are all about computing. You can view a single cell as in many ways a computer.” He continues, “a single cell is already a very sophisticated computer that computes at every second of time.” Although computing and biology may seem like an unusual hybrid domain, Dr. Pierre points out that the influence runs in both directions: deep learning and AI draw their inspiration from the human brain. After all, artificial intelligence emulates human intelligence as best it can. In turn, Dr. Pierre says, the more powerful AI becomes, the better it helps scientists understand biological systems and analyze biological data.

What Is Bioinformatics?

Compared to other fields of scientific research, bioinformatics is “relatively new” and still an “evolving discipline,” according to the Genomics Education Programme. At its core, it combines computer science and biology to help scientists analyze and interpret biological data. By leveraging computational tools, scientists can efficiently analyze vast amounts of biological data that would take far too long to process manually. One healthcare application is drug discovery, a process that usually takes a decade or longer and has become more efficient thanks to these computational tools.

Bioinformatics became an indispensable tool in modern biological research during the 1990s, when the Human Genome Project began. Scientists were able to uncover information hidden in DNA sequences, identify genes, and even compare genomes across different species. Dr. Pierre Baldi explains that the Human Genome Project set out to sequence the entire genome of a human being, an effort that took roughly ten years. The complete sequence of a human’s DNA is very long: around 3 billion letters drawn from an alphabet of only four, A, C, G, and T. “It looks very cryptic,” Dr. Pierre says, “it’s not something that you can read as a human.” This is where machine learning algorithms, deep learning, and statistics come in: they make sense of these sequences and help answer questions such as how genes have evolved over time and across species.
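To make the idea of a genome as a long string over a four-letter alphabet concrete, here is a minimal Python sketch. It is only an illustration and not a method described in the episode: the function names and the short toy sequence are made up for this example.

```python
from collections import Counter


def base_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each DNA letter (A, C, G, T) in a sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    counts = Counter(sequence)
    unexpected = set(counts) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected letters: {sorted(unexpected)}")
    # Counter returns 0 for letters that never appear.
    return {base: counts[base] / len(sequence) for base in "ACGT"}


def gc_content(sequence: str) -> float:
    """Fraction of the sequence made up of G and C, a common summary statistic."""
    composition = base_composition(sequence)
    return composition["G"] + composition["C"]


# Hypothetical toy sequence; a real human genome has roughly 3 billion letters.
toy_sequence = "ATGCGCGATTACAGGCTTAA"
print(base_composition(toy_sequence))
print(gc_content(toy_sequence))
```

Simple counts like these are only a starting point; finding genes or comparing species requires the statistical and machine learning methods Dr. Pierre describes.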

This field of study has extended far beyond genomics. If you’re wondering, “What is bioinformatics used for?”, besides analyzing genomic and other biological data, it can also be used to analyze protein structures, study gene expression, and aid in drug discovery and personalized medicine.

Deep Learning, Machine Learning, and NLP in Biology

Dr. Pierre shares an example of how deep learning techniques are integrated into biological research: his group has been studying circadian rhythms. A person’s circadian rhythm is fundamentally linked to their biology. Dr. Pierre says, “Every cell in your body is oscillating on a 24-hour basis.” Based on this biological clock, certain chemicals and genes are activated. Early organisms like cyanobacteria, which, similarly to plants, use sunlight for photosynthesis, also operate on a circadian schedule. According to Dr. Pierre, this trait is found in essentially every living cell.

Along with his team of bioinformatics specialists, Dr. Pierre has developed methods and databases to analyze these biological oscillations. They create databases that are analyzed using deep learning and determine if some specific genes or proteins exhibit circadian rhythms or behaviors. According to Dr. Pierre, the circadian rhythm is highly adaptable. “If you change your time, if you change your diet, how much you exercise…Anything you change in your behavior, or in your environment, or chemically will change the oscillations somewhere in your body.”
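As a rough illustration of what it means to test whether a gene is oscillating, the sketch below uses a classical stand-in rather than deep learning. It is not Dr. Pierre's method: it measures the strength of a 24-hour component in a time series and uses a permutation test to report a p-value. The function names, sampling schedule, and expression values are all hypothetical.

```python
import math
import random


def rhythm_amplitude(times_h, values, period_h=24.0):
    """Strength of the `period_h` component in a time series.

    Projects the mean-centred values onto a cosine and a sine of the given
    period. For samples evenly spaced over whole periods this equals the
    amplitude of the best-fitting cosine.
    """
    n = len(values)
    mean = sum(values) / n
    omega = 2.0 * math.pi / period_h
    cos_part = sum((v - mean) * math.cos(omega * t) for t, v in zip(times_h, values))
    sin_part = sum((v - mean) * math.sin(omega * t) for t, v in zip(times_h, values))
    return 2.0 * math.hypot(cos_part, sin_part) / n


def circadian_p_value(times_h, values, n_permutations=999, seed=0):
    """Permutation p-value: how often does shuffled data look at least as rhythmic?"""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = rhythm_amplitude(times_h, values)
    shuffled = list(values)
    at_least_as_strong = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if rhythm_amplitude(times_h, shuffled) >= observed:
            at_least_as_strong += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (at_least_as_strong + 1) / (n_permutations + 1)


# Hypothetical expression levels sampled every 2 hours for 48 hours.
times = list(range(0, 48, 2))
expression = [10.0 + 3.0 * math.cos(2.0 * math.pi * t / 24.0) for t in times]
print(rhythm_amplitude(times, expression))
print(circadian_p_value(times, expression))
```

A small p-value suggests the 24-hour pattern is unlikely to be a chance arrangement of the measurements. Real expression data is noisy and sparsely sampled, which is part of why learned models are used in practice.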

AI methods can now also be used to address complex biological problems. Recent advances in machine learning and natural language processing, such as the large language models behind GPT-4, can be applied to understanding biological data. These models are trained on vast amounts of text, and they can be adapted to analyze biological sequences such as genomes, proteins, and RNA, all of which can be represented as text. By training large language models on such sequences, scientists can try to design new proteins with specific functions.
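One common way to treat a biological sequence as text is to split it into overlapping "words" of fixed length (k-mers) and map each word to an integer, much as a language model maps word pieces to token IDs. The Python sketch below shows only this preprocessing idea. The function names and the toy sequence are assumptions for illustration, and actual biological language models may tokenize differently.

```python
def kmer_tokens(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a sequence into k-letter tokens, moving `stride` letters each step."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def build_vocabulary(tokens: list[str]) -> dict[str, int]:
    """Assign each distinct token an integer ID, in alphabetical order."""
    return {token: index for index, token in enumerate(sorted(set(tokens)))}


def encode(sequence: str, vocabulary: dict[str, int], k: int = 3, stride: int = 1) -> list[int]:
    """Turn a sequence into the list of integer IDs a model would consume."""
    return [vocabulary[token] for token in kmer_tokens(sequence, k, stride)]


# Hypothetical toy DNA fragment, not taken from any real genome.
toy_sequence = "ACGTACGGTAC"
tokens = kmer_tokens(toy_sequence, k=3)
vocabulary = build_vocabulary(tokens)
print(tokens)
print(encode(toy_sequence, vocabulary, k=3))
```

The same approach works for protein or RNA sequences, since only the alphabet changes.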

Dr. Pierre says, “AI is reshaping all technologies and all areas of science. So, it’s being applied everywhere, and biotechnology is no different.”

AI detectors like AI Purity, like large language models, use natural language processing, in their case to tell human-written text apart from AI-generated text. AI Purity has trained its model on a vast amount of data to provide the best AI text detection.

The Ethical Implications of Applying AI In Biology

When asked about the ethical implications of using AI in biological research and healthcare, Dr. Pierre says that there are broader implications to AI that extend beyond healthcare and biology.

He drew parallels between artificial intelligence and natural intelligence. “If you think about it, natural intelligence, our intelligence is actually very dangerous,” Dr. Pierre says, emphasizing the importance of putting many measures in place to counterbalance both human and artificial intelligence and prevent them from doing too much damage. He says one only has to look at human history to see how much damage humans can do, and because artificial intelligence is modeled after natural intelligence, the possibility of it doing similar damage should be taken into account. As humans, we have an ethical sense, learn from each other, and follow societal laws and commandments to maintain a safe society. For AI models, analogous measures are being developed, such as constitutional AI, reinforcement learning from human feedback, and legal frameworks to ensure AI complies with safety and ethical standards.

In building AI systems, it is necessary to conduct research especially when AI is being integrated into sensitive areas like healthcare where data privacy issues can arise. AI Purity prides itself on a platform that prioritizes responsible and ethical AI use above all. We have made our tool accessible and specially tailored for students and educators. As an AI tech company, we understand better than most the importance of using this technology responsibly which we will continue to pioneer.

Listen To The AI Purity Podcast

Learn more about bioinformatics and the application of AI systems in science from the expert himself, Dr. Pierre Baldi, on the latest episode of The AI Purity Podcast. He encourages everyone to understand AI and how the world is adapting to this new technology. It’s a powerful tool with potential for both good and harm, and it is up to people to educate themselves so they can make informed decisions.

For all your AI text detection needs, use AI Purity to get fast AI text detection with more premium features. For other ethical AI discussions, tune in to the next episodes of The AI Purity podcast or revisit our past episode, in Informatics, AI in Healthcare, & Digital Resilience in the AI Age with David Wild.

Listen Now

Pierre Baldi [00:00:00] Biological systems are all about computing. A single cell is already a very sophisticated computer. All the AI that you see today draws its inspiration from the brain. We can use them to better understand biological systems and analyze biological data.

Patricia [00:00:34] Welcome back to The AI Purity Podcast, the show where we explore the intersection of artificial intelligence and the pursuit of truth. I’m your host, Patricia, and today our guest is the director of the Institute for Genomics and Bioinformatics at the University of California, Irvine. He’s also one of their distinguished professors and the associate director of their Center for Machine Learning and Intelligent Systems. His expertise spans the realms of computer science and biological sciences, and he has made significant contributions to the theory of deep learning and its application to biological problems. His research focuses on understanding the intelligence in both biological systems and machines. With numerous awards and accolades to his name, our guest is a recognized authority in the integration of AI in biological sciences. Join us as we discuss AI, deep learning, and bioinformatics, and welcome to the show, Dr. Pierre Baldi! Hi, Dr. Pierre!

Pierre Baldi [00:01:24] Good to be here!

Patricia [00:01:25] Thank you for joining us today! Please walk us through how you got started in your journey into the fields of computer science and machine learning, and how you transitioned from mathematics and psychology to focusing on deep learning and artificial intelligence.

Pierre Baldi [00:01:38] In a nutshell, I was always interested in intelligence, and how the brain works, and how we can try to better understand intelligence and build intelligent machines. I always liked math, so that’s why when I was in, you know, an undergraduate, I studied both math and psychology. I didn’t know exactly, you know, in which direction to go or how to approach the questions that interested me. So, that’s the reason for the sort of dual degree that I followed. And then when I moved to the United States for my PhD, I went to Caltech somewhat by chance. And at the time, there was a professor there, John Hopfield, who was developing certain ideas around neural networks and machine learning, if you want. And when I saw that, I immediately understood that was the area of research I wanted to focus on. So, since my PhD, I have been working in neural networks, deep learning, AI.

Patricia [00:02:44] And how do you approach the interdisciplinary research at the interface of computer science and biology? What motivated you to pursue research in this hybrid domain?

Pierre Baldi [00:02:53] There are many reasons for doing that. One reason is that biology, biological systems, are all about computing. You can view a single cell as in many ways a computer, which measures the concentrations of a variety of chemicals and adjusts its own production of chemicals, you know, gene expression and so on in a continuous way, in an adaptive way. So, a single cell is already a very sophisticated computer that computes at every second of time. And of course, then cells can form assemblies, form tissues, form organs. So, you can view your liver as a fairly complex computer. And of course, the brain is the most complex computer in the universe that we know of, with billions of neurons and connections and so forth. So, biology is very much about computing in some ways. So, there is an analogy there, and we can get a lot of inspiration from biological systems. All the deep learning, all the AI that you see today draws its inspiration from the brain, and vice versa: as we build more powerful AI, better machine learning methods, better statistical methods, we can use them to try to better understand biological systems and analyze biological data, whether it’s, you know, genome sequences or gene expression, recordings from the brain and so on. So, you can work in both directions, and I’ve done a little bit of that throughout my career. Sometimes, you’re stuck on one side, you go to the other side, and you go back and forth.

Patricia [00:04:40] And given your extensive experience, could you share some insights into the specific areas of research within deep learning and artificial intelligence that you find most fascinating or impactful?

Pierre Baldi [00:04:50] Well, there are many, especially these days. I am very interested in the theory behind deep learning and behind artificial intelligence. But, you know, not many people are interested in the theory side. Most people are working on the applications. So, but a few of us are also thinking and working on theoretical questions. So, I like those. With my students, we work on a lot of applications, especially in the natural sciences. So, in physics, and chemistry, and biology, there are many problems to which you can apply deep learning methods. I can tell you why, you know, a single family of techniques can be applied so broadly. There are reasons for that, but we do that. Of course, today, a lot of the excitement is around the large language models, and so-called generative AI, and questions of whether, you know, we have already achieved or we are on the verge of achieving AGI, artificial general intelligence. Those are the questions that most people are excited about and talk about.

Patricia [00:06:01] Dr. Pierre, one of your most recent publications is the book Deep Learning in Science. What was the message or the insights that you hope to convey to your readers through this?

Pierre Baldi [00:06:10] Well, I wrote the book. So, “in science” has two meanings. One is the scientific study of deep learning, of AI. Deep learning and AI is essentially the same thing these days. There is not much difference. But in science, I meant two things. One is, of course, for science, the application of deep learning to the natural sciences, which is a lot of the applications we do. So, application of deep learning, of AI and deep learning, in chemistry, in physics, in biology, in medicine, and so on. So that’s one theme throughout the book, but it’s also in science in the sense of scientific study of AI. So, going back to the theory, what are the results that we can prove mathematically about deep learning systems?

Patricia [00:06:57] Dr. Pierre, as the director of the Institute for Genomics and Bioinformatics, could you provide an explanation of what bioinformatics is and its significance in modern biological research?

Pierre Baldi [00:07:07] Well, there are a number of words like computational biology, bioinformatics, etc. They are all somewhat fuzzy, but they refer to the area of research at the intersection of computer science and biology essentially, at least that’s the way I interpret them. Some people give these words more specific or narrow meanings, such as the application of, you know, certain algorithms to particular areas of biology. These words became somewhat popular in the 90s when the Human Genome Project was started. So, that was the project to sequence the entire genome of a human being, and it was achieved over a period of roughly ten years in the 90s. And so, when you get the complete sequence of DNA of a human being, it’s a very long sequence. It has 3 billion letters, roughly, letters being different building blocks of DNA. There are four letters, four building blocks, ACGT. And so, you get this very long sequence over an alphabet that has only four letters, and it looks very cryptic. It’s not something that you can read as a human. It looks like complete gibberish. So, you need all kinds of algorithms, including machine learning, deep learning, statistics, and so on and so forth to make sense of these sequences, in particular to find, let’s say, genes within those sequences, finding promoter regions and so forth. There are all kinds of important genetic landmarks along the genome, and you want to be able, of course, to compare genomes between humans, but also humans and other species, try to understand how they evolved and so on and so forth. So, there are many questions. This is an area of science that is, you know, that has developed a lot over the past few decades. So, bioinformatics initially was focused on applying computer science methods to sequence data, genomic data, protein data, RNA data and so forth. But now, it’s even broader than that, because we have other kinds of biological data such as gene expression. Now, we can do single cell gene expression and many other things that I don’t have the time to mention, but basically, that is my understanding or my definition of bioinformatics, application of informatics methods to biology.

Patricia [00:09:36] And as an expert in both bioinformatics and AI, can you discuss some other ways in which AI techniques are being integrated into bioinformatics to enhance data analysis, interpretation, and predictive modeling?

Pierre Baldi [00:09:48] Well, some of the most recent things are, for example, you take large language models. So, large language models are the sort of techniques that are behind things like GPT-4, ChatGPT, and all those advanced AI chat systems or bots, right? So, the same – these techniques are trained primarily on text data. And of course, as I told you, the genome is essentially a form of text data, and proteins can be viewed as text data, and RNA can be viewed also as text data. So, one idea is to train these models using biological text, using biological sequences, and then use them, for instance, to try to design new proteins with particular functionalities. That is just an example of the recent kind of application of cutting-edge AI methods to biological data.

Patricia [00:10:46] And what are some of the specific examples of how AI has revolutionized the field of bioinformatics, and what impact do you think these advancements will have on biological research and healthcare outcomes?

Pierre Baldi [00:10:58] There are many impacts, and another famous result is AlphaFold, right? Produced by DeepMind. This is an AI program or a, you know, deep learning system that is capable of predicting protein structures with reasonably good accuracy, at least for many proteins. And knowing the structure of a protein is very important to try to understand its function, because it basically tells you what are the molecules it can bind to. And this is very important, for instance, for drug discovery or drug design. Very often you’re trying to find a small molecule that can bind to a pocket in a protein. And by doing so, maybe block the protein or maybe enhance it, depending on the cases, right? So, that’s another example where artificial intelligence, deep learning, may help biology, medicine, drug design, and so on. And there are many examples we can find in the literature. So, it’s a very active area of research.

Patricia [00:12:07] Can you discuss some of the key projects or collaborations you’ve been involved in that exemplify the integration of deep learning techniques into biological research, particularly in fields such as genomics and bioinformatics?

Pierre Baldi [00:12:19] One example of the line of research that my group has been pushing is the study of circadian rhythms. So, for most people, circadian rhythms look like a sort of oddity that you notice only when you travel. You travel and suddenly you have jet lag, right? And you notice your jet lag and you think, oh, this is funny. It’s a little curiosity if you want. But when you look at it very carefully, in fact, you’ll see that circadian rhythms are absolutely fundamental to all of biology. So, every cell in your body is oscillating on a 24-hour basis. The chemicals, the genes that it activates and so on are oscillating, or some of them, are oscillating on a 24-hour basis. So, this rhythm is extremely important for all living systems. And it’s not too surprising, because if you think about evolution, the only thing that is sort of predictable in a stable way about the world around you is that tomorrow, the sun will rise again. That’s about the only thing that you are almost 100% sure. Anything else, you know, the weather tomorrow or how people around you will behave tomorrow are very uncertain, but the fact that there will be a new day tomorrow is very strong – is the only strong prediction that you can make really about the environment. So, during evolution, organisms had to pay attention to this extreme regularity of the world that comes from the fact that the Earth is rotating on its axis, right? And has been doing that for a trillion times since the origin of life. And the first organisms from which we are derived, for example, cyanobacteria, these organisms were using photosynthesis to produce energy, so they were converting light into energy, like plants are doing today. Now, if you’re using light, you know, the sunlight, to produce energy, you are entirely circadian by definition, because during the day you are receiving energy, at night you have to sort of shut down your batteries and then again the following day. 
And we are derived from those organisms. So, it’s not surprising that essentially every cell throughout, you know, the entire space of living organisms is oscillating on a 24-hour basis. And so, we have developed a method to analyze those oscillations. We have developed databases of data around those oscillations and so on and so forth, and use deep learning to assess whether a given gene, a given protein in the body is oscillating or not in a, you know, circadian manner. And these oscillations are very interesting, because they are also very plastic. If you change your time, it’s not just traveling. If you change your diet, if you change how much you exercise. Anything that you change in your behavior, or in your environment, or chemically will change the oscillations somewhere in your body. So, they are very plastic, very adaptive. And you know this intuitively. I mean, if one evening you want to watch a movie and you go to bed two hours later, nothing happens basically. Maybe you feel a little bit more tired in the morning, but there is no major change. It’s only when you fly, you know, you take a long flight, a 12-hour-long flight, that you really notice that difference. But little adjustments in your lifestyle every day, you almost don’t notice it. You don’t notice them. But the body, the oscillations within your cells are adjusting constantly to all these little changes. So, it’s a very interesting system.

Patricia [00:16:11] And in your opinion, what are some of the most pressing challenges or unanswered questions in the field of AI and biology? And how are you and your research group working to address these?

Pierre Baldi [00:16:21] The funny thing is that the AI we built today, for instance, these large language models, we don’t really understand how they work at all. We know how to build them. It’s relatively easy conceptually to build them. It’s not a very difficult algorithm, but what you obtain after you train the system is a very complex system. And we don’t understand these systems very well. So, basically, our theories, our understanding of AI is very shallow at the moment. And I think that’s a very important challenge, especially in light of issues of AI safety and so on.

Patricia [00:17:02] What are some of the potential limitations or challenges associated with using deep learning in natural sciences? How do you address these challenges in your own research or collaborations?

Pierre Baldi [00:17:12] You know, there are many challenges and many of them are fairly standard. First of all, if you apply to the natural sciences, you have to be willing to collaborate with scientists from the target disciplines. So, I work a lot with physicists, with chemists, with biologists. But you have to be willing to engage with them and to learn some of their language. And over time, they also learn, you know, the language of machine learning and deep learning. My physicist collaborators are very interested in machine learning and deep learning, and have learned the language, and very often, their students are beginning to use, you know, the deep learning methods. So, there is a challenge in the language. There is a challenge in the data. In some areas, there is a lot of data, but there are still areas of science where there is not a lot of data available or where there are impediments or obstacles to data sharing. So, obviously in the medical domain, this is obvious. But also even in physics, there are things like neutron stars, which are very exotic star objects for which we don’t have much data even today. So, usually deep learning requires, you know, training data. And depending on what problems you are considering, you may have a lot of training data or sometimes not so much. And then you have to think about how to adjust things to deal with data scarcity. And then of course, there are issues of, you know, deep learning architectures, which could be problem specific. So, you may have to design a deep learning system, a neural network, which has certain properties, certain symmetries, certain invariances, and you know, all of those are active areas of research.

Patricia [00:19:03] On the flip side, since we’re talking about challenges, could you provide some of the examples of successful applications of machine learning or deep learning in genomics?

Pierre Baldi [00:19:11] I don’t know about, you know, genomics per se. But if you think about, you know, biology and sequence data – I mentioned AlphaFold is probably one of the most famous applications. So, this is a program, again developed by DeepMind, to predict the structure, the 3D structure of the proteins, which are very important molecules for living systems, of course. So, that’s an example of success. A less famous example would be a deep learning system that we have developed that looks at the measurements of gene expression, for instance, and tells you whether a gene is being expressed in a circadian manner in a given cell or not and analyzes those oscillations.

Patricia [00:19:58] In your experience, what are some of the key advantages of using machine learning or deep learning approaches over traditional statistical methods in genomics research, and how do these advantages impact the research outcomes?

Pierre Baldi [00:20:10] Depending on what you mean by traditional methods, their limitations have been, in the past for instance – that they have been developed or they work only on very small data sets. Whereas in genomics, or other areas of natural sciences, we can have very large data sets, very high dimensional spaces, but it’s not the case that the, you know, traditional methods and deep learning are exclusive. Sometimes, we are combining statistical methods with deep learning methods in some way. And you can also view deep learning as a statistical method if you want to. But for example, when we are analyzing whether a gene is oscillating or not, we’re using deep learning methods. But at the end of the day, we’re spitting out something called a p value, which is a statistical measure of significance, which is very classical. So, we are combining classical statistical methods with machine learning, deep learning methods. Perhaps the most important thing is that deep learning methods can be applied to data without having a preconceived notion of what should be the right filters, the right ways of pre-processing the data. You’re just learning from the data itself. So, it’s a much more flexible and powerful method of adapting the analysis to the data itself without preconceived human filters or handcrafted filters if you want.

Patricia [00:21:46] I wanted to talk about now the ethical implications and future of AI in healthcare. Given your extensive experience, what are your thoughts on the ethical implications of using AI in biological research and healthcare?

Pierre Baldi [00:21:57] There are fundamental problems of AI ethics and AI safety that go beyond the domain of healthcare, which are very general, right? And there is a lot of research in that direction. What I like to say about that, one of the things I like to say is that, in the same way that artificial intelligence is inspired from natural intelligence, from biological intelligence, from the brain, a lot of what you see in AI safety is inspired from natural intelligence safety. Now, this may sound strange to you, but if you think about it, natural intelligence, our intelligence is actually very dangerous. You just have to look at human history to see a lot of, you know, wars, and concentration camps, and the invention of the new methods of torture, and weapons of mass destruction of increasing sophistication, right? So, human intelligence is extremely dangerous. And both nature and we as a society have put in place a lot of measures to try to balance, to counteract or to prevent human intelligence from doing too much damage. In some sense, it’s a miracle that we are still, as a species, we’re still flourishing, given the dangers that our daily intelligence poses. So, what are the mechanisms? First, you have mechanisms of evolution, the way that the brain is being structured, in particular the limbic system. So, inside of us, we have a sense of ethics. We have a sense of what is right and what is wrong to some extent. And we feel guilty if we do something wrong, we have a sort of innate feeling of being guilty. And so, this is something that we don’t know how to build within AI systems, but people are thinking about that and designing, you know, modular systems where maybe there are some modules that have a sense of morality and so on and so forth. So, that’s our first level of defense. And, I’m being very, you know, very coarse here. The second level of defense we have is learning by example. 
So, when you’re growing as a child, you know, you look at your parents, maybe you look at your teachers in school, and you try to imitate them. So, hopefully you were born in a nice family where your parents did not steal, did not kill, did not do things like that. And so, just by looking at them and imitating them, you become more ethical. If you want, you behave in a more ethical way. And this has an analogy in AI, there is something that we call Reinforcement Learning from Human Feedback, which is used even in these, you know, large language models like GPT-4 and so on, where the systems are trained by examples provided by other humans. So, the analogy there – there is a very precise analogy there between how we train the AI and how we learn from example as we grow up. Beyond that, we have very simple systems of rules like, for instance, the ten commandments in the Bible or any other system of rules, like a constitution that has maybe, you know, a few pages, not very long, with very basic principles… You shall not kill, you shall not lie. Whatever it is – you shall not steal. You know, those sort of things, and the equivalent is called Constitutional AI. So, you are trying to build very basic principles that any AI should obey in order to be ethical. And then, you can imagine building these principles into the AI systems. There could be, for instance, in the background of any prompt when you ask something to GPT-4, it could add these principles to your prompt in order to provide, you know, ethical and safe answers in some way. So, that’s the next level. The next level we have, because again, human intelligence is very dangerous, we have many walls – the next level, I think are the – it’s the legal systems. So, that’s a system of rules, but it’s not ten pages, it’s ten books, very thick books with all kinds of detailed rules depending on the area. 
You have rules for healthcare, you have rules for transportation, for aviation, for all areas of human engagement. And so, there you can imagine building such very complex systems of law for AI systems, maybe using AI to help you build very detailed sets of rules for different domains, in particular for healthcare, since that’s the one you’re interested in. And again, you could have those in the background of any query, that is, any prompt that is presented to an AI system, so that the system responds in a way that is legal and ethical with respect to the domain of the query, right? So, there are interesting questions there. How can you flexibly, you know, search and bring up the laws, the legislation that is pertinent for the question that is being asked to an AI at a given time? And even that is not enough to protect us from human intelligence. Beyond that, of course, we have police, we have incarceration, we have enforcement systems, right? Armies, at the international level. So, that also has an analogy in the world of AI. People are doing research on those things, but you can think, for instance, one simple idea would be to have kill switches on any robot that is using AI, so that, you know, should the robot begin to misbehave or behave in ways that are unethical or unsafe to humans, there is a switch that can always be activated and that will turn off the robot entirely. That’s one example of an idea at the last-resort or enforcement level of this hierarchy of measures that we have in order to protect us from human intelligence and, in a very analogous way, from artificial intelligence. And, of course, it’s possible that artificial intelligence is also different from human intelligence. And maybe we need to discover, you know, new ideas for containing artificial intelligence and to make sure that it is, you know, deployed in safe ways.
Now, when you go to the specific domain of healthcare, you have issues of, for instance, privacy that become, you know, very important. And so, you have to think about, you know, how do you preserve privacy when you apply AI systems to healthcare data, to electronic medical records and so forth. So, those are, you know, active areas of research. We have methods for de-identifying data and so on and so forth.
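
To make the "principles in the background of any prompt" idea concrete, here is a minimal sketch. It is not from the interview and does not describe how GPT-4 or any vendor actually works: the rule texts, the keyword lists, and the function names `relevant_domains` and `build_prompt` are all hypothetical, and the keyword match stands in for the much harder problem of retrieving the pertinent legislation for a query.

```python
# Hypothetical general principles (invented for this sketch).
BASE_PRINCIPLES = [
    "Do not help cause physical harm.",
    "Do not deceive the user.",
    "Do not assist with theft.",
]

# Hypothetical domain-specific rules, keyed by domain name.
DOMAIN_RULES = {
    "healthcare": ["Never reveal patient-identifying information."],
    "aviation": ["Defer to certified maintenance procedures."],
}

# Hypothetical keywords used to guess which domains a query touches.
DOMAIN_KEYWORDS = {
    "healthcare": {"patient", "medical", "diagnosis", "drug"},
    "aviation": {"aircraft", "flight", "pilot"},
}


def relevant_domains(query):
    """Return the domains whose keywords appear in the query (naive word match)."""
    words = {w.strip(".,?!").lower() for w in query.split()}
    return [d for d, keywords in DOMAIN_KEYWORDS.items() if words & keywords]


def build_prompt(query):
    """Prepend general principles and any pertinent domain rules to the query."""
    rules = list(BASE_PRINCIPLES)
    for domain in relevant_domains(query):
        rules.extend(DOMAIN_RULES[domain])
    header = "\n".join(f"- {rule}" for rule in rules)
    return f"Follow these rules:\n{header}\n\nUser question: {query}"


if __name__ == "__main__":
    print(build_prompt("Can you summarize this patient record?"))
```

A healthcare question picks up the healthcare rule in addition to the general principles, while a question that matches no domain gets only the general principles.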

Patricia [00:29:39] And how do you anticipate AI-driven technologies reshaping the landscape of biotechnology in the coming years?

Pierre Baldi [00:29:45] I think AI is reshaping all technologies and all areas of science. So, it’s being applied everywhere, and in that sense, you know, biotechnology is no different. But obviously, people are applying AI systems, you know, at all scales and in all areas. So, for instance, drug discovery, drug design, is an example of an area where people are applying AI. We’re applying AI to predicting chemical reactions, which in turn could be used when you’re thinking about the synthesis of new drugs. You know, how do you synthesize new drugs? What kind of reactions should you use? But those are just, you know, two examples out of, you know, thousands that you can get just by glancing through the literature.

Patricia [00:30:35] As someone who has made significant contributions to the theory of deep learning, can you explain to us what deep learning is and how it differs from traditional machine learning approaches?

Pierre Baldi [00:30:44] I wouldn’t say that deep learning differs from traditional machine learning approaches. It is a machine learning approach, but deep learning is the idea that you have a learning system that is roughly, in a very rough way, inspired by the brain. So, it’s made of little neurons, and the neurons are connected to each other by connections that have a weight, called the synaptic weight, right? So, not only do you have a connection between neuron A and neuron B, but you have a strength, a weight, on that connection. And then, you have data, for example, inputs and outputs. And machine learning or deep learning is the question of how do you adjust the weights. How do you adjust the strength of these connections between all these neurons in the network, so that the system, the overall network, behaves in the right way, and behaving in the right way depends on what task you want to do. So, for instance, if you have images, let’s say in pathology, you have images and you want to detect whether there is cancer or not, right? This could be pathology images, could be MRI images and so on. Then you build such a neural network, and you train it from your data. You have images with cancer. You have images without cancer. This is called a classification problem. And you train, you adjust all these synaptic weights, all these strengths of the connections, so that the network learns to give you the right answer as much as possible. So, when you give it an image, and it contains cancer, the answer should be yes. And if there is no cancer, if it’s a healthy image from a healthy subject, it should say no, right? So that’s the behavior you want the system to have. And there is basically one simple algorithm that tweaks all the connection strengths in the network, so that, at the end of the tweaking, which is what learning is, you end up with a system that has a good performance in this classification task.
That’s the essence of deep learning.
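
As a rough illustration of "adjusting the weights so the network gives the right answer", here is a minimal sketch, not taken from the interview. It trains a single artificial neuron, the smallest possible "network", on a made-up yes/no classification task. The two numeric features stand in for whatever would be extracted from an image, and the data, learning rate, and function names are assumptions chosen for the example.

```python
import math
import random


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Output of one neuron: weighted sum of the inputs passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def train(examples, learning_rate=0.1, epochs=100, seed=0):
    """Nudge the weights after every example to reduce the prediction error."""
    rng = random.Random(seed)
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    order = list(examples)
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in random order each pass
        for features, label in order:
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Toy data (invented): label 1 plays the role of "yes", label 0 of "no".
EXAMPLES = [
    ([2.0, 1.5], 1), ([1.5, 2.5], 1), ([2.5, 2.0], 1), ([1.0, 1.0], 1),
    ([-2.0, -1.5], 0), ([-1.5, -2.5], 0), ([-2.5, -2.0], 0), ([-1.0, -1.0], 0),
]

weights, bias = train(EXAMPLES)
correct = sum((predict(weights, bias, x) > 0.5) == bool(y) for x, y in EXAMPLES)
accuracy = correct / len(EXAMPLES)

if __name__ == "__main__":
    print("learned weights:", weights, "bias:", bias)
    print("training accuracy:", accuracy)
```

On this easily separable toy data the neuron ends up classifying all eight examples correctly. Real image classifiers use many layers of such units, but the training loop has the same shape.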

Patricia [00:33:01] And can you discuss some of the key advancements or breakthroughs in deep learning that have occurred in recent years? How have these advancements impacted our understanding of intelligence in both biological systems and machines?

Pierre Baldi [00:33:12] Well, deep learning is not a new idea. It goes back at least to the 80s, and you can trace it even to earlier dates. And so, you know, from a high level, there is really nothing new, not much new in very recent years, because the basic algorithm we’re using, which is essentially Stochastic Gradient Descent for learning, was already known in the 80s, which is quite surprising. There is a single algorithm that is very simple, that is behind all these incredible advances that you see today, and the algorithm was known many decades ago. Now, of course, if you zoom in, you can see that there have been a lot of incremental adjustments and progress and so forth. So, the deep learning methods we’re using today are a little bit more sophisticated than what was available in the 80s, but at least in my sense, not that much. What is very different is the computing power. So, today we have computers that are a million times more powerful. We can use clusters of graphics processing units, GPUs, to train these systems. And that’s what really makes the biggest difference. It’s the scale of what we can do today, and to a lesser extent, the fact that we have much more data available to us through the internet, social media, sensor databases, etc. We have much more data in general that is available to us to train these systems. So, I would say the most important is the computing power, the second most important is the data, and the third are the incremental adjustments or tweaks that have been produced in the last few decades, with respect to what was already available in the 1980s. But for instance, in the 1980s, we were already training, you know, deep learning systems to recognize fingerprints. So, this was already something that we were doing in the late 1980s, using essentially the same techniques that are being used today for processing images and so on.
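
For readers who want the rule itself, stochastic gradient descent is usually written as the update below. The notation is a standard textbook convention assumed here, not something spelled out in the interview: $w_t$ is the vector of weights at step $t$, $\eta$ is the learning rate, $f$ is the network, $\ell$ is the loss on a single example, and $(x_i, y_i)$ is one training example picked at random.

```latex
w_{t+1} = w_t - \eta \, \nabla_{w}\, \ell\bigl(f(x_i; w_t),\, y_i\bigr)
```

Each step moves the weights a small distance in the direction that most reduces the error on the sampled example; repeating this over many examples is the "tweaking" described above.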

Patricia [00:35:26] Earlier, you were talking about large language models like GPT-4, and I think today when people think of AI, they automatically think of generative AI. I wanted to ask, what is your personal perspective on the capabilities and the potential implications of using this technology, especially as a professor?

Pierre Baldi [00:35:43] Well, there is no question that I think GPT-4 was an important landmark. You probably have heard the term Turing test. This is the idea that was proposed by Turing a long time ago, many decades ago. The idea is that if you want to detect whether a system is intelligent or not, you let it converse with a human, maybe behind a curtain or through email, and the question is whether the human is able to detect whether he or she is corresponding or interacting with a machine or with another human. And up to maybe four years ago, people thought that problem was very difficult and still far from being solved and so forth. And then, with the advent of maybe even GPT-3.5, etc., but definitely with GPT-4, in my view, GPT-4 passes the Turing test. I think that it is capable of fooling a lot of human beings over a relatively short conversation. Many human beings wouldn’t be able to detect whether it is a computer or a human. This is controversial. Some of my colleagues think that it’s still unable to pass the Turing test, and I think this is not the point of this conversation, but there is no question that GPT-4 is amazing, absolutely amazing and very powerful. And so, that’s one of the biggest breakthroughs, I would say, in recent years on the AI side. And yes, most people identify GPT-4 as being the most cutting-edge form of AI that we have today. And I think it is true, notwithstanding that now you can get multimodal large language models that are capable of dealing with images, with video, with voice, and so forth. So, over time, they are going to be able to handle all these different modalities, not just text.

Patricia [00:37:45] What are the potential implications of using this technology, especially in the context of it being used in educational institutions?

Pierre Baldi [00:37:52] That is a very difficult question. Of course, they pose major challenges for universities and for education. And I think we don’t have good ideas yet on how to handle this. I mean, the most obvious level is the level of examinations, in the sense that, you know, there have been studies, we have conducted studies in my group, showing that these LLMs are capable of passing not only very basic exams like entrance exams into university but even very advanced exams in specialties in medicine. So, for instance, in anesthesiology or veterinary medicine. So, for advanced, very specialized exams, GPT-4 is capable of passing all the ones we have tried. It can pass the bar exam for lawyers and so forth. So, that’s a problem for education, but the other, perhaps more fundamental problem is, you know, what are the things that we should teach to the next generation, given that these systems already have, in some way, all the knowledge that has been produced by humans, right? So, what do you teach to the next generation? How do you assess the learning? Those are incredible challenges. At the same time, obviously, you have great opportunities for using AI to improve teaching, right? So, you could have your own personal tutor that is AI-based, that knows you fairly well, that understands, you know, where you’re at in your learning trajectory, and which is capable of presenting you questions and problems and exercises that are tailored to your personality and to where you are in your trajectory. I mean, we know that people function best when they are functioning at their frontiers. That is, when you’re trying to learn something that is not too hard, otherwise you are discouraged, but also not too easy, otherwise it’s boring, right? So, you want to keep the interactions, the exams, the exercises and so forth at that edge, which is very personal.
And so, you could imagine having AIs that are doing that for you, having a personal tutor and maybe having a different personal tutor for different disciplines, and you know, that is very exciting and will happen in time, but it’s not something that is yet widely available.

Patricia [00:40:36] What measures do you think educators and institutions should take to ensure that students are equipped with the necessary knowledge and skills to responsibly and ethically utilize generative AI technologies?

Pierre Baldi [00:40:48] Difficult questions… I think all universities are struggling with these sorts of issues. And, you know, we don’t yet have, you know, clear answers to all these questions, but those are, you know, questions most universities are working on, trying to find solutions or best ways to address the problems and to improve education. And it may be that the whole education system will have to change over time in light of these technologies. You know, universities were invented 1,000 years ago. They have to adapt. And maybe society will create new systems or evolve universities in different ways, so that we can take advantage of all the AI technologies to improve learning.

Patricia [00:41:37] Dr. Pierre, before I let you go, just one last question. What do you envision will be the most significant advancement or breakthrough in the fields of AI, deep learning, and bioinformatics over the next decade?

Pierre Baldi [00:41:48] It’s very difficult, and maybe almost useless, to try to make predictions. You know, no one, or almost no one, had predicted that GPT-4 would be possible, you know, two years before. So, it’s very difficult to predict what will happen in the next ten years. I think one of the key questions is the question of AGI, which is not a well-defined term, but again, this idea of artificial general intelligence… Will it be achieved within the next ten years? Many people think so, but no one knows exactly, right? So, that would be a major milestone. Or I could even say, you know, can we build intelligence that is ten times smarter than human intelligence? It could happen in the next ten years. Nobody knows for sure, right? It’s very difficult to predict, but I think that is the key question. You know, how smart can machines become? And to me, the most interesting application of that would be to science. Can we have a machine that suddenly can prove very difficult theorems at the level, or maybe even beyond the level, of a professional mathematician? No one knows if that can happen in the next ten years, but it’s a very interesting and important question. The same for the application of AI to fundamental questions of physics and so forth. So, there are a few fundamental questions in science that have been around for a long time where we don’t know the answer. We would love to know the answer, and it would be incredible if AI was able to help us find the answer to those questions in the next decade. You know, my gut feeling is that we won’t get there in the next decade for very difficult questions. Like, there’s a mathematical problem, the so-called P = NP problem, which is a very difficult problem. It would be amazing if we solved that problem in the next ten years. No one knows, but my gut feeling is that we won’t solve it in the next ten years, but I could be completely wrong.
And so, I wouldn’t bet anything on what is going to happen in the next ten years.

Patricia [00:44:11] Thank you for that, Dr. Pierre! And before we end the podcast, is there any message you would like to share with anyone listening to this podcast? Maybe some advice?

Pierre Baldi [00:44:19] I’m not in a position to give any generic advice. Just, I think, everyone should try to understand AI or what is happening in the world of AI as much as possible, because it is a very powerful technology with a lot of potential on the upside and also some potential on the downside. And so, it is important to be educated and to try to understand what is happening and where things are going in order to make informed decisions.

Patricia [00:44:57] Thank you so much, Dr. Pierre, for joining us on an episode of The AI Purity Podcast. And of course, thank you to everyone who might be listening to this episode. We hope you enjoyed uncovering the mysteries of AI, and please stay tuned for more in-depth discussions and exclusive insights into the world of artificial intelligence. Don’t forget to visit our website. That’s www.ai-purity.com, and share this podcast to spread the word about the remarkable possibilities that AI Purity offers. Until next time, keep exploring, keep innovating, and keep unmasking the AI. Goodbye, Dr. Pierre! Thank you so much for being here!

Pierre Baldi [00:45:28] Thank you!

Patricia [00:45:29] Thank you, goodbye!
