Learn how to use Python and machine learning to build a bioinformatics project for drug discovery.
✏️ Course developed by Chanin Nantasenamat (aka Data Professor). Check out his YouTube
channels for more bioinformatics and data science tutorials:
/dataprofessor/ and /codingprofessor/
🔗 And Medium blog posts for more data science tutorials https://data-professor.medium.com/
⭐️ Code ⭐️
💻 Parts 1-5 https://github.com/dataprofessor/bioinformatics_freecodecamp/
💻 Part 6 https://github.com/dataprofessor/bioactivity-prediction-app
⭐️ Course Contents ⭐️
⌨️ (0:00) Introduction
⌨️ (4:29) Part 1 - Data collection
⌨️ (26:57) Part 2 - Exploratory data analysis
⌨️ (49:41) Part 3 - Descriptor calculation
⌨️ (1:01:51) Part 4 - Model building
⌨️ (1:10:41) Part 5 - Model comparison
⌨️ (1:18:15) Part 6 - Model deployment
🎉 Thanks to our Champion and Sponsor supporters:
👾 Wong Voon jinq
👾 hexploitation
👾 Katia Moran
👾 BlckPhantom
👾 Nick Raker
👾 Otis Morgan
👾 DeezMaster
👾 Treehouse
--
Learn to code for free and get a developer job: https://www.freecodecamp.org
Read hundreds of articles on programming: https://freecodecamp.org/news
Comments - 322
@
@Alecor_studio3 years ago Never in my life would I think I would get a video for this. There's not many resources on bioinformatics on YouTube. This is a godsend! Thank you very much. 388
@
@louzewdie91042 years ago As someone with a bioinformatics minor who always wants to learn more, I cannot express how ecstatic and grateful I am to find your content. Thank you, good sir! 12
@
@helloworld20543 years ago I want to study bioinformatics and I'm currently learning Python. So grateful for this course! 138
@
@clodsire_3 years ago Wish this vid came out sooner. I'm about to graduate as a bioinformatics major and I remember struggling so much bc of the lack of online resources and m... 43
@
@donutsandpenguins28393 years ago Can't believe how lucky I am! I was just considering my options among the hundreds of computer science branches yesterday, and bioinformatics and computational... 38
@
@emmanuelonah45962 years ago Thank you, professor. It's so heartwarming to have this quality of information shared for free. The best I have seen on all my questions answered and opened my eyes to the many facets of computational drug modeling and simulation. Thank you once again, prof. 11
@
@omaralam79753 years ago I'm starting a job at AstraZeneca next month. Thank you, sir, for the course. I will use the knowledge to the best of my ability. 96
@
@miltonkambarami81293 years ago I am from the virology side of bioinformatics and it's wonderful to find a tutorial on a learning platform. I embrace more of the computational side to answer biological questions, and the two are starting to merge as one. Sooner or later programming is going to be a compulsory skill just like writing, but what will matter is the types of questions and problems you can solve using programming. 39
@
@shakuntalabaichoo79223 years ago This is the best bioinformatics video/tutorial on YouTube I have ever come across. Thank you so much for such a great. 3
@
@jaggyjut3 years ago This is pure gold. Thank you for creating this tutorial. 13
@
@stringpriest8 months ago Thanks so much for this video. I obtained an MSc in pharmacognosy and am taking another master's in regulatory science. Trying to go into tech, I started a...
@
@bassamtork76793 years ago Huge work was put into this video, and a wonderful method of delivering the course. Many thanks to Dr. Chanin Nantasenamat. 3
@
@apoorvasetu58403 years ago Hello Chanin, I am using EC50 standard_type for when labeling compounds. Is it the same as IC50 or is it different? 1
@
@khanmubeen3 years ago Respected professor! I'm quite amazed at how you stepped out to share and create such a knowledgeable course for all of us. I love that. 2
@
@MrViperHiggins3 years ago Love it, man, keep up the great work. Really appreciate people like yourself making data science accessible to those of us coming through the process of learning and working with data. 6
@
@leonardomontes29753 years ago You are always on top, guys! I really appreciate the way you share the knowledge! Thanks, Quincy! 8
@
@benjamintwumasi24806 months ago Thanks so much for this video, professor. You have saved a million lives of people who want to learn Python for drug discovery. Thank you once again.
@
@jannmikoingelrabagogamingc60122 years ago Oh hey! I am so grateful and excited for this one, man! I am an upcoming 3rd year BS Pharmacy student and planning to do my undergraduate thesis on this area specifically. 2
@
@palakgupta87283 years ago It's a great video, really helpful. At you are not able to see other types of standard type variable because you've already entered its filter as IC50, so it shows only IC50 entries. Once you remove filter.standard type[ic50], you'll see an array with inhibition, Ki, EC50, Kd, activity... 3
@
@LM-ch8rh3 years ago Hi. I'm trying to follow the course but what order do I watch them in? I started with this video but he then refers to a previous video where he showed us how to download data. Is there a place where I can get the recommended order of videos to watch? Thank you so much.
@
@Teflon20003 years ago I studied industrial chemistry in college. This video is perfect. 7
@
@0307ismail3 years ago Hi Data Professor, I am doing a PhD in bioinformatics and my research topic is the same as that of this video. Thanks a lot for this video. I think I will need more of your support and guidance for my research. 4
@
@azadjain85343 years ago I was searching for this content for a long time. Finally this arrived. Thank you so much for the invaluable content. 7
@
@gurudeebanselvaraj88883 years ago Excellent session. Hats off to you, @data professor. I really enjoyed the whole lecture. It is quite informative for budding researchers in. 3
@
@knockknockyoo58123 years ago I truly appreciate you sharing this video for free. 2
@
@brunobustos963 years ago Computational neuroscience next, please! Love your channel. 12
@
@joyrainbowdress3 years ago I'm a dentistry student but I love computers as much as I love biology. I also would love to explore the world of clinical research and I think I'd take up bioinformatics in my master's along with my clinical practice. It seems like the best intersection between healthcare and IT. 17
@
@sohithreddy4 months ago The descriptor_output.csv file used in is the same for anything we take, right? Like, I mean the PaDEL descriptor.
@
@christianscientist39633 years ago Since you used the ChEMBL database to look at targets and small molecules in your course, is there library code to extract the ZINC and PubChem drug databases to use their small molecules? 3
@
@prajyotprabhu8273 years ago Excellent! Much appreciated efforts you folks are putting in. Thank you very much. 2
@
@georgevoknerech2283 years ago Thank you so much. I never dreamed that we would get a lesson on. 1
@
@andrewchen23493 years ago So good to see Data Professor on FCC again! Thank you! 2
@
@OliverShey3 years ago Hi sir, thanks so much for the wonderful presentation. I have never watched such a clear video on the subject. Keep it up. I have another concern. I changed... 4
@
@deeptibhanot7622 years ago Hello professor, I want to work on parameter tuning of 3D bioprinters before they print, so it will take as input the tissue type/bone type that we want. Please help me find where I can get a dataset for 3D bioprinted organs as well as a dataset of 3D bioprinters. 1
@
@BerkshireHathawayCRE3 years ago I hope you guys can make more bioinformatics videos! Such an important application! 4
@
@e.m.26553 years ago Been hoping you would make this video! I have a background in biology and a very small bit of experience in informatics. 9
@
@curiousresearcher308last year Hello! I have a question. The bioactivity class is not showing for me. This is the error I am receiving: "'bioactivity_class' not in index"
@
@user-fe2oh8oj2u3 years ago Looking forward to "Python for Finance". 44
@
@kamalikabhattacharjee3363 years ago In part 5 for "compare ml the code is showing me cannot import name from". Any idea how to resolve that? 1
@
@iam1.last year Would it be recommended to split the test and train data differently? I would assume that if we had less to train and more to test on then the results would be more accurate, but if I'm wrong, please somebody explain.
@
@ccuny13 years ago Fantastic. Incredible material on this channel and this is no exception. Thank you. 3
@
@mahmoudal-bassam25077 months ago This is very helpful! Thank you so much. I tried something like this to compare the distribution of normal and -log10 data (importas plt fig, ax =nrows=2) ax=ax[0] x: 50, ax = ax[1]...
@
@lukecahalane18113 years ago When I try to display the dataframe in the data collection section in PyCharm I get a mixture of parentheses and not all the data that is actually present on ChEMBL. Any help would be appreciated.
@
@vpundir30243 years ago It's very useful for doctors who are learning coding. 7
@
@jondoe86583 years ago Thank you and to all the people who share their knowledge for free. 2
@
@LujoSey5 months ago Hi prof, I was following your session closely but I got lost on the mounting of Colab after creating the 3 dataframes. Would you please explain the Colab Jupyter interface for a layman like me? Thanks.
@
@kevindelgado29822 years ago Glad this information exists on YouTube. In my thesis I'll be using the MegaMolBART model from NVIDIA:
@
@user-yb6hm1jz6s9 months ago Thank you very much, Data Professor, for sharing the great learning resource!
@
@ayeshaafzal4884last year Loved the whole video. Recommendation for part 4 in Colab, to prevent an error: in the last scatterplot building chunk use: ax =y=y_pred, . 1
@
@WelingtonSilvaMusica3 years ago That's exactly what I was searching for, what rich material! Thank u so much!
@
@pacifio3 years ago Man be expert in biology and computer science while me just being dumb 24/7 epic style. 23
@
@crismo77533 years ago Thank you very much for this great tutorial! Sometimes I am asking myself how much you wish to have better equipment, but I don't dare to ask you that. I wish you great success and good health!
@
@paulynamagana75603 years ago I'm doing my PhD in pharmacy and I wish I had seen your videos before. I have discovered my love for too late to change my PhD tho. 6
@
@aashishkatyal3 years ago Dear professor, it is always awesome to watch and learn from your sessions. Can you make a video about docking using Python? 2
@
@dojaibi2753 years ago Am currently studying MSc. Thank you for this package. 10
@
@shenglinjing73502 years ago In part 5, running doesn't give any output and if I do it returns (0, 4). Do these 2 lines in the original =x_train, y_train, y_train) =x_test, y_train, y_test)...
@
@chennakeshvapodila53672 years ago Hey, are we taking a drug from the ChEMBL database and predicting the best drug, or are we taking a disease-causing protein and finding a cure for it? Please explain.
@
@GCKteamKrispy2 years ago Just found a book about this and got excited about this field. It sounds interesting.
@
@MoonSahab2 years ago I am trying to save the molecule .smi file as txt to upload in the prediction app but it doesn't work out. Can someone please guide me to get the bioactivity? I have implemented the complete code but it results out as csv and conversion doesn't work for me. Thanks.
@
@hoyingan65772 years ago Dear Data Professor, how could I use other bioinformatics DB webpages, like RCSB PDB or KEGG data, and adapt them to your codelab?
@
@exons-codingforthebest82952 years ago Dear sir, model building ended up in an error - input contains NaN, infinity or a value too large for=y_train) r2 =y_test) r2). Can you please help in this regard? 1
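The error quoted above is raised when the feature matrix or the target passed to a model contains NaN or infinite values. A minimal sketch of one way to screen such samples out before fitting is shown here. It uses plain Python lists, and the names `drop_non_finite`, `X`, and `Y` are hypothetical and not taken from the course notebooks, which work with pandas dataframes.

```python
import math


def drop_non_finite(rows, targets):
    """Keep only samples whose features and target are all finite numbers."""
    kept_rows, kept_targets = [], []
    for row, y in zip(rows, targets):
        if all(math.isfinite(v) for v in row) and math.isfinite(y):
            kept_rows.append(row)
            kept_targets.append(y)
    return kept_rows, kept_targets


# Made-up data: the second sample has a NaN feature, the third an infinite target.
X = [[1.0, 0.0], [float("nan"), 1.0], [0.0, 1.0]]
Y = [5.2, 6.1, float("inf")]

X_clean, Y_clean = drop_non_finite(X, Y)
print(len(X_clean), "of", len(X), "samples kept")
```

Whether dropping samples is appropriate depends on why the values are missing. It is an assumption of this sketch, not advice from the video.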
@
@lllll72602 years ago Does the coronavirus target protein in the beginning mean a protein that binds to coronavirus? I am confused by that.
@
@VyshnavieRSarma-rb7ur3 years ago Wow, thank you professor. Really enjoy learning a lot from your videos. 2
@
@sinakoohbour3 years ago I am not able to install the libraries. It shows "in [0]" next to every item and I am not able to get past that part. Any ideas would be appreciated. Thanks.
@
@lo88852 years ago Beginning part 4, big clap for all of us. Hope to find the thesis subject.
@
@moca3513 years ago What modules do I need to get the basics for this course?
@
@user-uj8up5jd6j3 years ago Somehow when I follow along, from I get "NaN" for items 128-132 in the bioactivity class column instead of "inactive". This throws off my results for the rest of the procedures. Does anyone know how I can fix this?
@
@scign2 years ago In the deployed app, you should highlight any significant differences in the removed columns since those parts of the fingerprint could be important for... Also, what was the purpose of calculating the Lipinski descriptors if only the PubChem fingerprint was going to be used in the model? 1
@
@ranggawrnt3 years ago Thank you so much for your wonderful video presentation, this is really helpful!
@
@DavidRamirezdmramirezs3 years ago Wow, great video tutorial. I will try to implement it in my lab. Thanks! 1
@
@ayeshaafzal48849 months ago Can this whole process be regarded as a QSAR model? 1
@
@OfficialBunnE2 years ago What microphone and camera do you use?
@
@DrNoureddinSadawi3 years ago This is a fantastic tutorial, many thanks!
@
@militant_dilettante6 months ago Thank you for such a wonderful lesson! So much useful information, and so tightly packed! I have questions, though. In the beginning of both notebooks is this step specifically for calculation of the Lipinski's descriptors? Because later we concatenate the descriptors dataframe with the original dataframe. Also, I am doing that lesson more than 2 years after the video was posted, and in the data ChEMBL fed me at least one value of 0. What do you recommend replacing it with: 1 or NaN?
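The zero value asked about above matters because pIC50 is the negative base-10 logarithm of the IC50 in molar units, and the logarithm of 0 is undefined. A minimal sketch of one possible treatment follows, assuming the values are IC50 measurements in nanomolar. It marks non-positive values as missing rather than substituting a number. The function name `pic50_from_nm` and the sample values are hypothetical, and this is not presented as the course's own handling.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar).

    Returns None for missing, zero, or negative inputs, where log10 is
    undefined (an assumption of this sketch, not the course's rule).
    """
    if ic50_nm is None or ic50_nm <= 0:
        return None
    return -math.log10(ic50_nm * 1e-9)


values_nm = [1000.0, 0.0, 50.0]  # made-up example values
print([pic50_from_nm(v) for v in values_nm])
```

Replacing 0 with 1 would instead assert a potency of 1 nM (pIC50 of about 9) for that compound, which is a strong claim to attach to a value that may simply be a data-entry artifact.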
@
@molecule_mindslast year How to use our created models? Like, I created a model for... how to use it like this app?
@
@noshintasnia43026 months ago Hello Data Professor, how can I use your bioactivity app without creating your code? I mean, if I download your code file, then how could I run it, and from which software could I run it?
@
@dumbkiddo31892 years ago This channel should receive an academic award in pedagogy. What you guys are doing is amazing!
@
@shivanipawar22963 years ago Thank you so much, sir, for this useful information. It's very difficult to get any information about bioinformatics on YouTube. I'm currently studying in BSc( looking forward to more information. 2
@
@tejiyo3 years ago In part 3, why did you convert the df to a .smi file and use the command line to apply the PaDEL function? Can you explain the line "! bash" please? 1
@
@DooDooDaddyTV3 years ago I have my bachelor's in biology and have worked in a microbiology lab for the last 4 years. I'm currently learning Python and R through sites like code academy. Do...
@
@miracleuche5862 years ago Thank you so much for this amazing tutorial.
@
@molecule_mindslast year In part 4 I'm getting an error. In the scatter plot it is showing an error like "regplot takes from 0 to 1 positional arguments but 2 positional arguments were given".
@
@febaelsamathew93483 years ago This was a topic I've been searching for for several months. I'm from a pharma background and started learning Python, but don't have an idea how this will work for job applications to the pharmaceutical industry.
@
@valentinaortiz99356 months ago So, correct me if I am wrong, but what you just taught us is how to potentially find a hit through a computational screening?
@
@michaelfrimpong63443 years ago Dear Data Professor. Thanks for the wonderful videos. Please, I have been able to follow along with the video up to part 3, where I am struggling. The command... 1
@
@tinacole14503 years ago Hey. Having trouble saving a fastq file and/or txt file. It prints well in my editor (VS) but I only get 1 line of data in the file when I save it.
The file (which is part of a bigger context manager)
for fastq_obj in fastqfile:
    fastq_obj[0]
    sequence = fastq_obj[1][0:5]
    bartrim = fastq_obj[1][5:] # to trim sequence barcode
    data = clinical_data.loc[clinical_data.Barcode==sequence] # we loop thru fastq
first method:
with open('data','w+') as m:
    print(data, file=m, end='')
    m.close()
next method:
csv_data = data.to_csv(path_or_buf = '\ndata.csv', index = True)
- If I type print(data) there are tons of lines which print out, but saving only has 1 line...
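One possible cause of the single-line file described above, assuming the `open('data','w+')` call runs once per loop iteration: the "w+" mode truncates the file every time it is opened, so only the last record survives. A minimal sketch of opening the file once, outside the loop, is shown here. The `records` list and the temporary output path are stand-ins for illustration, not the commenter's actual data.

```python
import os
import tempfile

# Stand-ins for the per-read `data` values produced inside the loop.
records = ["row-1", "row-2", "row-3"]

out_path = os.path.join(tempfile.mkdtemp(), "data.txt")

# Open the file once, outside the loop. Reopening it with "w" or "w+" on
# every iteration would truncate it and leave only the last record.
with open(out_path, "w") as handle:
    for record in records:
        # The with-block closes the file on exit, so no close() call is needed.
        print(record, file=handle)

with open(out_path) as handle:
    lines = handle.read().splitlines()
print(len(lines))
```

If the file has to be opened inside the loop for some reason, append mode ("a") avoids the truncation, at the cost of keeping content from earlier runs.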
@
@theaieducator15953 years ago Bioinformatics is for the helpful videos.
@
@kazishahjalal685210 months ago Can anyone be kind enough to install the libraries? I have been trying to figure them out for about 2.5 hours and still import does not seem to work. Whenever I write in PyCharm it shows an error.
@
@indumatisharma36463 years ago Thanks for this video. Can the model be deployed in R Shiny? 1
@
@MolecularMatt08217 months ago The GitHub items are not coming up for me, would you mind maybe uploading them? Other than that, these videos look amazing! My thesis is bioinformatics... 1
@
@michaelmoore75682 years ago How do you say that the best way to learn data science is to do data science? Does it have anything to do with abs rule, for example? Also our bio informatics...
@
@aleksandonov84133 years ago Back in the day I was reading Perl in bioinformatics and it was mostly regular expressions. I guess these days it is Python and machine learning. 1
@
@kyleerb94733 years ago When writing the Mann-Whitney function, why not put it in a for loop to allow for multiple descriptors to be added for each call of the function? This reduced...
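The loop suggested above could look roughly like this. It is a minimal sketch that computes only the Mann-Whitney U statistic in plain Python (no p-value) and applies it to every descriptor in one pass. The function `mann_whitney_u`, the descriptor names, and all values are illustrative assumptions and are not taken from the course code.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the Mann-Whitney U statistic for sample_a versus sample_b.

    Counts the pairs in which a value from sample_a exceeds one from
    sample_b, with ties counted as one half. No p-value is computed.
    """
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Made-up descriptor values per bioactivity class, for illustration only.
active = {"MW": [300.0, 320.0, 350.0], "LogP": [2.0, 2.5, 3.0]}
inactive = {"MW": [400.0, 420.0, 380.0], "LogP": [2.5, 3.5, 4.0]}

# One pass over all descriptors instead of one function call per descriptor.
results = {name: mann_whitney_u(active[name], inactive[name]) for name in active}
print(results)
```

For a significance test rather than the bare statistic, a statistics library routine would normally replace the hand-written function.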
@
@jaggyjut3 years ago Would you be able to do a session on HIPAA-compliant application design? It will be interesting to know how to design the.