Spark DataFrame coding assistance

The Spark plugin provides coding assistance for Apache Spark DataFrames in your Python code.

Completion for available columns

When you create a DataFrame or read one from a file, PyCharm suggests the available column names as you access its columns, for example, when selecting or filtering data.

[Figure: Column completion in PySpark]

Detecting unresolved columns

If you refer to a column that doesn't exist in the DataFrame, PyCharm highlights it and suggests replacing it with one of the available column names.

You can enable and disable this inspection in the IDE settings (Ctrl+Alt+S), under Editor | Inspections | Spark | Unresolved columns.
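One way to picture this inspection is as a comparison of the referenced names against the schema. The sketch below is a stdlib-only illustration, not PyCharm's implementation. `check_columns` is a hypothetical helper, and `difflib.get_close_matches` stands in for whatever similarity measure the IDE actually uses. Without such a check, a misspelled column in PySpark typically surfaces only when Spark analyzes the query and raises an `AnalysisException`.

```python
import difflib


def check_columns(referenced, available):
    """Return {unknown column: suggested replacement or None}.

    Hypothetical helper: every referenced name that is missing from the
    schema is reported together with the most similar available name,
    if difflib considers one close enough (default cutoff 0.6).
    """
    problems = {}
    for name in referenced:
        if name not in available:
            matches = difflib.get_close_matches(name, available, n=1)
            problems[name] = matches[0] if matches else None
    return problems


schema_columns = ["name", "value", "planet"]
print(check_columns(["name", "planett"], schema_columns))  # {'planett': 'planet'}
```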

Getting a schema

Column name completion and the unresolved-columns inspection work only if PyCharm can access the DataFrame schema. You can specify the schema in either of the following ways:

  • Columns and their types are specified directly in the read method:

    # spark is an existing SparkSession
    df = (
        spark.read
        .schema("name STRING, value BIGINT, planet STRING")
        .parquet("aliens.parquet")
    )
  • The schema is specified as a separate variable and then used in the read method:

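    from pyspark.sql.types import LongType, StringType, StructField, StructType

    schema = StructType([
        StructField("name", StringType(), False),
        StructField("value", LongType(), False),
        StructField("planet", StringType(), False),
    ])
    df = spark.read.schema(schema).parquet("aliens.parquet")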
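Both forms list the same three columns and types. To show how a schema string maps onto the column names that completion can offer, here is a minimal stdlib-only sketch that splits a DDL string into (name, type) pairs. `parse_ddl_schema` is a hypothetical helper written for this illustration, not a PySpark or PyCharm API, and it does not handle backquoted column names.

```python
from __future__ import annotations


def parse_ddl_schema(ddl: str) -> list[tuple[str, str]]:
    """Split a DDL schema string into (column name, type) pairs.

    Hypothetical helper: commas inside parentheses or angle brackets,
    as in DECIMAL(10, 2) or STRUCT<a: INT, b: STRING>, are not treated
    as column separators. Backquoted names are not supported.
    """
    fields, current, depth = [], [], 0
    for ch in ddl:
        if ch in "(<":
            depth += 1
        elif ch in ")>":
            depth -= 1
        if ch == "," and depth == 0:
            # Top-level comma: the current column definition is complete.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))

    pairs = []
    for field in fields:
        # The column name is everything before the first space.
        name, _, dtype = field.strip().partition(" ")
        pairs.append((name, dtype.strip()))
    return pairs


print(parse_ddl_schema("name STRING, value BIGINT, planet STRING"))
```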

If you have not specified the schema in either of these ways, you can use a dedicated inlay hint to infer it from a Parquet file. The file can be stored locally or in remote storage.

Infer schema from a file

  1. Use the read.parquet() method in your Spark code, for example:

    df = spark.read.parquet("/myfilepath")
  2. Click the Choose schema inlay hint.

    [Figure: Choose schema for DataFrame]
  3. In the window that opens, select a file from which the schema can be inferred.

    The schema inferred from the selected file is displayed as an inlay hint next to the method. Hover over the hint to preview the available columns and their types, or click it to insert the schema with the schema() method or to select a different file.

    [Figure: DataFrame schema]

You can enable and disable this inlay hint in the IDE settings (Ctrl+Alt+S), under Editor | Inlay Hints | Other | Python | DataFrame analysis.

