Simple Example Usage

The tidyversetopandas package brings the familiar syntax of R’s tidyverse to Python’s pandas library, which is useful for data scientists and analysts moving from R to Python. This page walks through a worked example on a small sales dataset using the package’s key functions: mutate, filter, select, and arrange.

Getting Started

First, ensure that tidyversetopandas is installed and imported alongside pandas:

# Import necessary packages
import pandas as pd
from tidyversetopandas import tidyversetopandas as ttp

Example Data

We will use a sample dataset representing sales data for illustration. Let’s create a pandas DataFrame:

# Sample data
data = {
    "ProductID": [101, 102, 103, 104],
    "Sales": [250, 150, 300, 200],
    "Region": ["East", "West", "East", "South"],
}
df = pd.DataFrame(data)

# Display the initial DataFrame
print("Initial DataFrame:")
df
Initial DataFrame:
   ProductID  Sales Region
0        101    250   East
1        102    150   West
2        103    300   East
3        104    200  South

Using tidyversetopandas

1. Mutate: Adding and Modifying Columns

Suppose we want to calculate the VAT (Value Added Tax) for each sale, assuming a flat rate of 15%. We can use the mutate function to add this new column:

# Adding a new column for VAT
df = ttp.mutate(df, "VAT = Sales * 0.15")

print("\nDataFrame after applying 'mutate':")
df
DataFrame after applying 'mutate':
   ProductID  Sales Region   VAT
0        101    250   East  37.5
1        102    150   West  22.5
2        103    300   East  45.0
3        104    200  South  30.0
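
For readers who want to see what this step corresponds to in plain pandas, here is a minimal sketch. It is a comparison only, not a description of how tidyversetopandas implements mutate internally; the variable names with_vat and with_vat_eval are illustrative.

```python
import pandas as pd

# Same sample data as above, rebuilt so the snippet runs on its own
df = pd.DataFrame({
    "ProductID": [101, 102, 103, 104],
    "Sales": [250, 150, 300, 200],
    "Region": ["East", "West", "East", "South"],
})

# assign returns a new DataFrame with the extra column; df itself is unchanged
with_vat = df.assign(VAT=df["Sales"] * 0.15)

# DataFrame.eval also accepts an assignment written as a string expression
with_vat_eval = df.eval("VAT = Sales * 0.15")

print(with_vat)
```

Both calls leave the original df untouched and return a copy, which is why the result is bound to a new name here.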

2. Filter: Row-wise Filtering

To keep only the rows with sales greater than $200, use filter. The comparison is strict, so the row with sales of exactly 200 is dropped:

# Filtering rows where sales are greater than 200
df = ttp.filter(df, "Sales > 200")
print("\nDataFrame after applying 'filter':")
df
DataFrame after applying 'filter':
   ProductID  Sales Region   VAT
0        101    250   East  37.5
2        103    300   East  45.0
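
As a point of comparison, the same row filter can be written in plain pandas. This is a sketch of the equivalent pandas idioms, not the package's own implementation; high_sales and high_sales_mask are illustrative names.

```python
import pandas as pd

# Sample data with the VAT column already added
df = pd.DataFrame({
    "ProductID": [101, 102, 103, 104],
    "Sales": [250, 150, 300, 200],
    "Region": ["East", "West", "East", "South"],
    "VAT": [37.5, 22.5, 45.0, 30.0],
})

# query takes the condition as a string, much like the call above
high_sales = df.query("Sales > 200")

# The same filter expressed as a boolean mask
high_sales_mask = df[df["Sales"] > 200]

print(high_sales)
```

Note that the surviving rows keep their original index labels (0 and 2), which matches the output shown above.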

3. Select: Choosing Specific Columns

To focus on specific columns, for instance, ProductID and VAT, use the select function:

# Selecting specific columns
df = ttp.select(df, "ProductID", "VAT")
print("\nDataFrame after applying 'select':")
df
DataFrame after applying 'select':
   ProductID   VAT
0        101  37.5
2        103  45.0
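
For comparison, column selection in plain pandas can be sketched as follows. The snippet rebuilds the filtered frame by hand so that it runs on its own, and the variable names are illustrative. It also highlights a naming trap for R users: pandas' own DataFrame.filter method selects by label rather than by row values.

```python
import pandas as pd

# Filtered frame from the previous step, rebuilt with its original index labels
df = pd.DataFrame(
    {
        "ProductID": [101, 103],
        "Sales": [250, 300],
        "Region": ["East", "East"],
        "VAT": [37.5, 45.0],
    },
    index=[0, 2],
)

# A list of labels inside the brackets returns a DataFrame with just those columns
subset = df[["ProductID", "VAT"]]

# Equivalent label-based form using loc
subset_loc = df.loc[:, ["ProductID", "VAT"]]

# pandas' DataFrame.filter picks columns by label; it does not filter rows by value
subset_filter = df.filter(items=["ProductID", "VAT"])

print(subset)
```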

4. Arrange: Sorting Data

Finally, use arrange to sort the data by ProductID in descending order:

# Sorting the DataFrame
df = ttp.arrange(df, False, "ProductID")
print("\nDataFrame after applying 'arrange':")
df
DataFrame after applying 'arrange':
   ProductID   VAT
2        103  45.0
0        101  37.5
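
The plain-pandas counterpart of a descending sort is sort_values with ascending=False. The sketch below is for comparison only and makes no claim about how arrange is implemented; sorted_df and renumbered are illustrative names.

```python
import pandas as pd

# Selected columns from the previous step, rebuilt with the original index labels
df = pd.DataFrame({"ProductID": [101, 103], "VAT": [37.5, 45.0]}, index=[0, 2])

# Sort by ProductID from largest to smallest; index labels travel with their rows
sorted_df = df.sort_values("ProductID", ascending=False)

# Optionally renumber the rows 0..n-1 once the old labels are no longer needed
renumbered = sorted_df.reset_index(drop=True)

print(sorted_df)
```

The index labels 2 and 0 in the output above are carried over from the original DataFrame; reset_index(drop=True) is one way to replace them with a fresh 0-based index.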

Conclusion

With these functions, tidyversetopandas lets R users keep a familiar tidyverse-style workflow while working with pandas DataFrames, whether for data manipulation, analysis, or preparation for visualization.

Note that this package is still in development, so more features and improvements are expected. For more detailed information, refer to the full documentation and the repository on GitHub.