PyCharm 2025.1 Help

Work with Data Wrangler

Data Wrangler is a no-code tool that simplifies data cleaning and preparation.

It offers an interactive user interface that allows you to view and analyze the data, displays column statistics and visualizations, and automatically generates Python code.

Open Data Wrangler

  1. Open a Jupyter notebook.

  2. Run code cell to create a pandas dataframe. For example, run cell with the following code:

    import pandas as pd # Data data = { 'Name': ['John', 'Anna', 'Peter', 'Linda', 'Dina', 'Kate', 'Tom', 'Emily'], 'Age': [22, 78, 22, 30, 45, 30, 35, 40], 'Gender': ['Male', 'Female', 'Male', 'Female', 'Female', 'Female', 'Male', 'Female'], 'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego'], 'Occupation': ['Engineer', 'Doctor', 'Teacher', 'Nurse', 'Architect', 'Lawyer', 'Accountant', 'Scientist'] } # Create a DataFrame df = pd.DataFrame(data) # Display the DataFrame df
  3. In the upper-right corner of the output cell, click Edit in Data Wrangler.

    Click Edit in Data Wrangler
  4. Data Wrangler will open in a new tab:

    Data Wrangler tab opened

Use Data Wrangler transformations

Data Wrangler transformations in the Data Wrangler tab

Transformation

Description

Find and replace

Find and replace

Replaces cells with a specified matching pattern from a selected column

Sort and filter

Filter

Filters rows in a selected column based on a specified condition and value

Clean and remove

Drop column

Removes a selected column from a table

Remove duplicates

Removes all rows that have duplicate values from a selected column

Drop missing values

Removes all rows with missing values from a selected column

Remove rows with NaN values

Removes rows that contain empty values from a table

Drop rows

Removes selected rows from a table

Create additional column

Transform column with string

Transforms strings in a selected column. You can select one of the following transformations:

  • Capitalize first character

  • Convert text to lowercase

  • Convert text to uppercase

One-hot encoding categorical variables

Splits categorical data from a selected column into a new column for each category

Normalize and scale

Min-Max scaling

Rescales a selected numerical column between a minimum and maximum value

Z-Score normalization

Transforms the data from a selected column into a distribution with a mean of 0 and a standard deviation of 1

Handling outliers or skewed distributions

Outlier detection with IQR

Detects outliers in a selected column using Interquartile Range

Reduce skewness

Reduces skewness by applying logarithmic or square root transformation to the data in a selected colum

Outlier detection with MAD

Detects outliers in a selected column using Median Absolute Deviation

Outlier detection with Euclidean distance

Detects outliers in a selected column using Euclidean Distance

Other

Fill missing

Replaces cells with missing values with a new value in a selected column

Round numerical

Rounds numbers in a selected column to the specified number of decimal places:

  • Round: rounds a number to the nearest integer.

    If the fraction of the number is 0.5 or higher, it rounds up. If it's less than 0.5, it rounds down.

  • Floor: rounds a number down to the nearest integer

  • Ceil: rounds a number up to the nearest integer

Split column

Splits a selected column into several columns based on a user defined delimiter

Change a type of column

Changes the data type of the selected column

Manage transformed data

You can create a new cell in your Jupyter Notebook with the generated data transformation code, copy the code to your clipboard, or save the transformed dataset as a new file.

  1. Click Export in the upper-right corner of the Steps pane.

    In the pane you can view the history of changes applied to your data.

    Click Export
  2. Select the option from the dropdown menu that opens.

    Select the option

Example: remove duplicate entries

One of the common data cleaning tasks is to remove duplicate entries to prevent biased results from your analysis.

You can use Data Wrangler to transform your data through the interface. Data Wrangler will automatically generate the Python code required for the removing of duplicates.

  1. Open Data Wrangler.

  2. Select Remove duplicates from the list of Transformations.

  3. Select the column from the Column drop-down list.

  4. Check the generated code.

    Check code generated by Data Wrangler
  5. Click Apply.

  6. Click Export if you want to add a new code cell with generated code to your notebook, copy your code to the clipboard, or save transformed data as a file.

Last modified: 15 April 2025