Simple Example Usage
The tidyversetopandas package brings the familiar syntax of R’s tidyverse to Python’s pandas library, which makes it useful for data scientists and analysts moving from R to Python. This page walks through a simple example of its key functions: mutate, filter, select, and arrange.
Getting Started
First, make sure tidyversetopandas is installed, then import it alongside pandas:
```python
# Import the necessary packages
import pandas as pd
from tidyversetopandas import tidyversetopandas as ttp
```
Example Data
For illustration, we use a small sample of sales data. Start by creating a pandas DataFrame:
```python
# Sample data: one row per product
data = {
    "ProductID": [101, 102, 103, 104],
    "Sales": [250, 150, 300, 200],
    "Region": ["East", "West", "East", "South"],
}
df = pd.DataFrame(data)

# Display the initial DataFrame
print("Initial DataFrame:")
print(df)
```
Initial DataFrame:
|   | ProductID | Sales | Region |
|---|---|---|---|
| 0 | 101 | 250 | East |
| 1 | 102 | 150 | West |
| 2 | 103 | 300 | East |
| 3 | 104 | 200 | South |
Using tidyversetopandas
1. Mutate: Adding and Modifying Columns
Suppose we want to calculate the VAT (Value Added Tax) for each sale, assuming a flat rate of 15%. We can use the mutate function to add this new column:
```python
# Add a VAT column computed from Sales (flat 15% rate)
df = ttp.mutate(df, "VAT = Sales * 0.15")
print("\nDataFrame after applying 'mutate':")
print(df)
```
DataFrame after applying 'mutate':
|   | ProductID | Sales | Region | VAT |
|---|---|---|---|---|
| 0 | 101 | 250 | East | 37.5 |
| 1 | 102 | 150 | West | 22.5 |
| 2 | 103 | 300 | East | 45.0 |
| 3 | 104 | 200 | South | 30.0 |
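To see what this step amounts to in plain pandas, the sketch below builds the same VAT column two ways. It assumes ttp.mutate evaluates its expression string roughly the way pandas' own `DataFrame.eval` does; that is an assumption made for comparison, not a description of the package's internals.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    "ProductID": [101, 102, 103, 104],
    "Sales": [250, 150, 300, 200],
    "Region": ["East", "West", "East", "South"],
})

# Option 1: an assignment expression string, similar in spirit to
# ttp.mutate(df, "VAT = Sales * 0.15"); eval returns a new DataFrame
# and leaves df unchanged
with_vat = df.eval("VAT = Sales * 0.15")

# Option 2: the same column via assign, without a string expression
with_vat_assign = df.assign(VAT=df["Sales"] * 0.15)

print(with_vat)
```

Both calls return a new DataFrame, so the original `df` keeps its three columns unless you reassign it, as the walkthrough does.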
2. Filter: Row-wise Filtering
To keep only the rows where sales are greater than 200, pass a condition string to filter. The comparison is strict, so product 104, with sales of exactly 200, is dropped:
```python
# Keep rows where Sales is greater than 200
df = ttp.filter(df, "Sales > 200")
print("\nDataFrame after applying 'filter':")
print(df)
```
DataFrame after applying 'filter':
|   | ProductID | Sales | Region | VAT |
|---|---|---|---|---|
| 0 | 101 | 250 | East | 37.5 |
| 2 | 103 | 300 | East | 45.0 |
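For comparison, plain pandas offers `DataFrame.query`, which also takes a condition string, and boolean masking. This sketch does not assume that ttp.filter uses either internally. Both approaches keep the original row labels (0 and 2), as in the output above.

```python
import pandas as pd

# Data after the mutate step: Sales plus the 15% VAT column
df = pd.DataFrame({
    "ProductID": [101, 102, 103, 104],
    "Sales": [250, 150, 300, 200],
    "Region": ["East", "West", "East", "South"],
})
df["VAT"] = df["Sales"] * 0.15

# Condition given as a string, as with ttp.filter(df, "Sales > 200")
high_sales = df.query("Sales > 200")

# The same rows selected with a boolean mask
high_sales_mask = df[df["Sales"] > 200]

print(high_sales)
```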
3. Select: Choosing Specific Columns
To keep only specific columns, for instance ProductID and VAT, pass their names to select:
```python
# Keep only the ProductID and VAT columns
df = ttp.select(df, "ProductID", "VAT")
print("\nDataFrame after applying 'select':")
print(df)
```
DataFrame after applying 'select':
|   | ProductID | VAT |
|---|---|---|
| 0 | 101 | 37.5 |
| 2 | 103 | 45.0 |
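In plain pandas, the same selection is a list of column names inside square brackets. The sketch also shows `DataFrame.drop` for the complementary case of keeping everything except some columns; this is plain pandas only and says nothing about which selection forms ttp.select accepts.

```python
import pandas as pd

# Data after the mutate and filter steps (original row labels kept)
df = pd.DataFrame(
    {"ProductID": [101, 103], "Sales": [250, 300],
     "Region": ["East", "East"], "VAT": [37.5, 45.0]},
    index=[0, 2],
)

# Keep only the named columns, in the order given
# (comparable to ttp.select(df, "ProductID", "VAT"))
selected = df[["ProductID", "VAT"]]

# The complement: keep every column except the named ones
dropped = df.drop(columns=["Sales", "Region"])

print(selected)
```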
4. Arrange: Sorting Data
Finally, use arrange to sort the data by ProductID in descending order:
```python
# Sort by ProductID; passing False gives descending order
df = ttp.arrange(df, False, "ProductID")
print("\nDataFrame after applying 'arrange':")
print(df)
```
DataFrame after applying 'arrange':
|   | ProductID | VAT |
|---|---|---|
| 2 | 103 | 45.0 |
| 0 | 101 | 37.5 |
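The plain-pandas counterpart of this step is `DataFrame.sort_values` with `ascending=False`. Since each step returns a new DataFrame, the whole walkthrough can also be written as one pandas method chain, which reads much like a tidyverse pipeline joined with `%>%`. This is a sketch for comparison, not how tidyversetopandas is implemented.

```python
import pandas as pd

df = pd.DataFrame({
    "ProductID": [101, 102, 103, 104],
    "Sales": [250, 150, 300, 200],
    "Region": ["East", "West", "East", "South"],
})

# mutate -> filter -> select -> arrange, written as one chain
result = (
    df.eval("VAT = Sales * 0.15")                  # add the 15% VAT column
    .query("Sales > 200")                          # keep sales above 200
    .loc[:, ["ProductID", "VAT"]]                  # keep two columns
    .sort_values("ProductID", ascending=False)     # descending ProductID
)

print(result)
```

The chain yields the same two rows as the step-by-step version: ProductID 103 first, then 101.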
Conclusion
With these four functions, tidyversetopandas lets R users carry familiar tidyverse verbs over to pandas, which can ease the move from R to Python. Whether you are manipulating data, analyzing it, or preparing it for visualization, the package offers a familiar workflow for R users working in Python.
Note that the package is still in development, so expect further features and improvements. For more detail, see the full documentation and the GitHub repository.