An End-to-end Data Science Project with Anaconda Assistant
Nanette George
Sophia Yang
Anaconda has recently announced Anaconda Assistant, an AI-powered Jupyter Notebook extension in Anaconda Notebooks. Created to enhance your coding experience with Python, Anaconda Assistant offers an intuitive chat interface designed to streamline your data science projects. In this blog post, we’ll walk through the key stages of a data science project using Anaconda Assistant, including loading, understanding, and visualizing data, then applying machine learning techniques for classification.
Get started with Anaconda Notebooks and Anaconda Assistant
Anaconda Notebooks is a cloud notebook service that allows anyone, anywhere to start coding and begin their data science journey. Start by visiting https://nb.anaconda.cloud/, where you’ll see a familiar Jupyter Notebook interface. Anaconda Notebooks provides pre-built environments for you to use right away. Click on a tile in the Notebook section to open an empty notebook, then launch the Assistant by clicking the Anaconda Assistant icon.
Load Data
Let’s start our project by loading some data. Not sure where to find data and which data to use? No problem! Anaconda Assistant can help you find and load a dataset. Click “Load a DataFrame” to see a list of datasets to choose from. Select a DataFrame (for example, “Load the Palmer penguins dataset”), and Anaconda Assistant will automatically create the code to load this dataset for you. Finally, click “Run code in Notebook” to run the code.
Now we have successfully loaded the penguins dataset into a Pandas DataFrame df in Anaconda Notebooks.
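The code the Assistant generates will vary, so here is only a minimal sketch of what loading tabular data into a DataFrame can look like. It assumes a CSV with the column names used in this post; the inline text stands in for a real file, and its rows are made-up placeholder values, not actual penguin measurements.

```python
import io

import pandas as pd

# Hypothetical stand-in for a penguins CSV file. The column names follow
# the ones used in this post; the rows are made-up placeholder values.
csv_text = """Species,Culmen Length (mm),Culmen Depth (mm),Flipper Length (mm),Body Mass (g),Sex
Adelie,39.0,18.5,181,3750,MALE
Adelie,38.5,17.5,186,3500,FEMALE
Chinstrap,49.0,19.0,196,3800,MALE
Gentoo,46.5,14.0,215,5000,FEMALE
Gentoo,50.0,,220,5500,MALE
"""

# pd.read_csv accepts a file path, a URL, or a file-like object.
# Empty fields (like the missing culmen depth above) become NaN.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df.head())
```

With a real file, you would pass its path or URL to pd.read_csv instead of the StringIO object.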
You can also write your own prompt to load a dataset. For example, here is a prompt you can use:
I want to learn about data science and machine learning. Can you help me find and load a dataset into a Pandas DataFrame?
Using this prompt, Anaconda Assistant loaded the Iris dataset. Feel free to experiment with the prompt to discover other datasets. Throughout the remainder of this blog post, we’ll be using the penguins dataset as our illustrative example.
Understanding the Data
Before delving into analysis, it’s essential to understand the data. To begin, let’s attach the penguins DataFrame to the chat by clicking “Attach to chat.” This step will send the DataFrame information, including its columns and datatypes, to Anaconda Assistant.
Then we can use various prompts to ask Anaconda Assistant to help us understand the data. For example, we can ask Anaconda Assistant to “help me understand this data” directly. The results show detailed information about each column:
Prompt: help me understand this data
We can also ask about the distribution of the variables. The response includes descriptive statistics for each column: count, mean, standard deviation, min, and max, as well as the 25th, 50th, and 75th percentiles.
Prompt: Distribution of the variables?
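These statistics match what pandas reports from DataFrame.describe(), so a prompt like this likely leads to code along the following lines. This is a sketch on a tiny DataFrame of made-up placeholder values, not the Assistant’s actual output.

```python
import pandas as pd

# Made-up placeholder rows, not real penguin measurements.
df = pd.DataFrame({
    "Flipper Length (mm)": [181, 186, 196, 215, 220],
    "Body Mass (g)": [3750, 3500, 3800, 5000, 5500],
})

# describe() reports count, mean, std, min, the 25th/50th/75th
# percentiles, and max for every numeric column.
summary = df.describe()
print(summary)
```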
Curious about the correlation among variables? Just ask in the prompt. Culmen Length (mm) seems positively correlated with Flipper Length (mm) and Body Mass (g). Flipper Length (mm) shows the highest correlation with Body Mass (g).
Prompt: What are the correlations among variables?
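If you want to compute the correlations yourself, one possible approach is to keep only the numeric columns and call corr(), which uses Pearson correlation by default. The DataFrame below holds made-up placeholder values, so its coefficients are not the ones reported for the real dataset.

```python
import pandas as pd

# Made-up placeholder rows, not real penguin measurements.
df = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo"],
    "Culmen Length (mm)": [39.0, 38.5, 49.0, 46.5, 50.0],
    "Flipper Length (mm)": [181, 186, 196, 215, 220],
    "Body Mass (g)": [3750, 3500, 3800, 5000, 5500],
})

# Drop non-numeric columns such as Species, then compute pairwise
# Pearson correlations between the remaining columns.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
```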
In data science projects, we often need to deal with missing values. Luckily, this DataFrame has very few missing values, so we don’t need to worry about them. Otherwise, we can ask Anaconda Assistant to help us impute the missing values when needed.
Prompt: Are there missing values in the DataFrame?
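As a rough sketch of what checking and imputing missing values can look like in pandas, the example below counts missing entries per column and fills one numeric column with its median. The data is made up for illustration, and median imputation is just one simple option that we picked here, not necessarily what the Assistant would suggest.

```python
import pandas as pd

# Made-up placeholder rows with a couple of deliberately missing entries.
df = pd.DataFrame({
    "Culmen Depth (mm)": [18.5, 17.5, None, 14.0, 15.0],
    "Body Mass (g)": [3750, 3500, 3800, 5000, 5500],
    "Sex": ["MALE", "FEMALE", "MALE", None, "MALE"],
})

# Count missing values in each column.
missing_counts = df.isna().sum()
print(missing_counts)

# One simple imputation option: replace missing numeric values
# with the column median (computed from the non-missing values).
depth_median = df["Culmen Depth (mm)"].median()
df_imputed = df.copy()
df_imputed["Culmen Depth (mm)"] = df_imputed["Culmen Depth (mm)"].fillna(depth_median)
print(df_imputed)
```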
There are several categorical variables in the dataset. For example, we can ask Anaconda Assistant to encode a categorical variable as a numeric one.
Prompt: Help me code the sex variable to a numeric variable
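A common way to do this encoding by hand is an explicit mapping. The sketch below assumes the column holds the labels “MALE” and “FEMALE”; both the labels and the new column name Sex_numeric are assumptions for illustration, so check the actual values with df["Sex"].unique() first.

```python
import pandas as pd

# Made-up example rows. The label spellings are an assumption.
df = pd.DataFrame({"Sex": ["MALE", "FEMALE", "MALE", "FEMALE"]})

# Explicit mapping from category label to number. Any label that is
# not in the mapping (including missing values) becomes NaN.
sex_codes = {"FEMALE": 0, "MALE": 1}
df["Sex_numeric"] = df["Sex"].map(sex_codes)  # hypothetical column name

print(df)
```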
Data Visualization
In a similar fashion, we can explore the data visually using different prompts. Your prompts can range from general to specific, and Anaconda Assistant will help you visualize the data accordingly.
Prompt: Help me understand this data visually
Prompt: Help me understand the data visually for each species
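The plots the Assistant produces depend on the prompt. As one hedged illustration of a per-species view, here is a scatter plot of flipper length against body mass with one color per species, drawn with matplotlib on made-up placeholder values.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder rows, not real penguin measurements.
df = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "Flipper Length (mm)": [181, 186, 196, 198, 215, 220],
    "Body Mass (g)": [3750, 3500, 3800, 3700, 5000, 5500],
})

fig, ax = plt.subplots()

# Draw one scatter series per species so that each gets its own
# color and its own legend entry.
for species, group in df.groupby("Species"):
    ax.scatter(group["Flipper Length (mm)"], group["Body Mass (g)"], label=species)

ax.set_xlabel("Flipper Length (mm)")
ax.set_ylabel("Body Mass (g)")
ax.set_title("Body mass vs. flipper length by species")
ax.legend(title="Species")
```

In a notebook, the figure is displayed when the cell finishes running.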
Machine Learning
Once we have a better grasp of the data, we can move on to building models. For instance, we could develop classification models for predicting penguin species and sex. Just as we’ve seen earlier, we can rely on Anaconda Assistant to guide us through the model-building process. During this step, it’s valuable to incorporate the following details into your prompt:
- The DataFrame, such as ‘df’
- The type of model, like a classification model
- The algorithms you’re considering, such as Random Forest
- The features you’re using, such as Culmen Length, Culmen Depth, Flipper Length, and Body Mass
- The target variable, like Species
By specifying these elements, Anaconda Assistant can better understand your project goal and assist you effectively in building the desired models. You can even ask Anaconda Assistant to employ various model algorithms and compare their performance. This is what we did in the example below.
Species Classification
Prompt: With this DataFrame df, I'd like to try classification models using features such as Culmen Length, Culmen Depth, Flipper Length, and Body Mass to predict the Species. Could you please run LogisticRegression, RandomForest, and SVC models? And could you please tell me which model works the best?
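For readers who want to see the mechanics, here is a sketch of how such a comparison could be written with scikit-learn. It is not the Assistant’s generated code. The data is synthetic (arbitrary per-species means plus Gaussian noise), and the scaling step, the 75/25 split, and the hyperparameters are our own assumptions. The model names follow the ones used in this post.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

features = [
    "Culmen Length (mm)",
    "Culmen Depth (mm)",
    "Flipper Length (mm)",
    "Body Mass (g)",
]

# Synthetic stand-in data: arbitrary per-species means plus Gaussian
# noise. These are NOT real penguin measurements.
rng = np.random.default_rng(0)
centers = {
    "Adelie": [39, 18, 190, 3700],
    "Chinstrap": [49, 18, 196, 3700],
    "Gentoo": [47, 15, 217, 5000],
}
spread = [2.5, 1.0, 6.0, 400.0]
frames = []
for species, center in centers.items():
    values = rng.normal(loc=center, scale=spread, size=(60, 4))
    frame = pd.DataFrame(values, columns=features)
    frame["Species"] = species
    frames.append(frame)
df = pd.concat(frames, ignore_index=True)

X = df[features]
y = df["Species"]

# Hold out 25% of the rows for evaluation, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Logistic regression and SVC are sensitive to feature scale, so they
# are wrapped in a pipeline that standardizes the features first.
models = {
    "lr_model": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "rf_model": RandomForestClassifier(n_estimators=100, random_state=0),
    "svc_model": make_pipeline(StandardScaler(), SVC()),
}

# Fit each model and record its accuracy on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")

best_name = max(scores, key=scores.get)
print("Best on this split:", best_name)
```

Because the data here is synthetic, the ranking it prints says nothing about real penguins; it only shows how the three models can be fit and compared on the same split.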
The Assistant’s results show that Random Forest performs best on this dataset. We can further investigate model performance through a confusion matrix:
Prompt: Could you show me the confusion matrix for these three models lr_model, rf_model, and svc_model?
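To make the output easier to read, here is a small sketch of how a confusion matrix is built and interpreted with scikit-learn. The true and predicted labels below are hand-written for illustration. In practice, the predictions would come from a fitted model, for example rf_model.predict(X_test), assuming those names exist in your notebook.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

labels = ["Adelie", "Chinstrap", "Gentoo"]

# Hand-written labels for illustration only, not real model output.
y_true = ["Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap",
          "Gentoo", "Gentoo", "Gentoo"]
y_pred = ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Chinstrap",
          "Gentoo", "Gentoo", "Adelie"]

# Rows are the true species and columns are the predicted species,
# in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Wrap the array in a DataFrame so the rows and columns are labeled.
cm_df = pd.DataFrame(
    cm,
    index=pd.Index(labels, name="True"),
    columns=pd.Index(labels, name="Predicted"),
)
print(cm_df)
```

The diagonal counts the correct predictions. In this toy example, one Adelie is misclassified as Chinstrap and one Gentoo as Adelie.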
Gender Classification
With a similar prompt, we can build classification models that predict penguin sex:
Prompt: Using this dataframe df, I'd like to try classification models using features such as Culmen Length, Culmen Depth, Flipper Length, and Body Mass to predict the sex. Could you please run LogisticRegression, RandomForest, and SVC models? And could you please tell me which model works the best?
Conclusion
Anaconda Assistant is a powerful AI tool that empowers data scientists to carry out end-to-end data science projects seamlessly. From loading and understanding data to visualizing insights and applying machine learning algorithms, Anaconda Assistant streamlines the process and enables efficient data analysis. Whether you’re a beginner or an experienced data scientist, Anaconda Assistant can simplify and enhance your data science workflow, making it easier to transform simple prompts into valuable insights. Visit Anaconda Notebooks to try out Anaconda Assistant today!