Programming Funda


Top 30 PySpark DataFrame Methods with Example 08/09/2024

✅ 30+ PySpark DataFrame Methods Crash Course for Data Engineers
==============================================

Hello PySpark developers! Here are some useful PySpark DataFrame methods that come up often in real-life PySpark applications.

Let's start! 👇

1. show(): The show() method displays the contents of the DataFrame. By default, it prints the first 20 rows and truncates values longer than 20 characters.

df.show()

2. select(): The select() method allows you to select specific columns from a DataFrame.

new_df = df.select("first_name", "last_name", "age")
new_df.show()

3. filter() or where(): The filter() method keeps only the rows that meet a given condition. where() is an alias for filter(), so the two snippets below produce the same result.

from pyspark.sql.functions import col
new_df = df.filter(col("age") > 25)
new_df.show()

from pyspark.sql.functions import col
new_df = df.where(col("age") > 25)
new_df.show()

4. groupBy() and agg(): The groupBy() method is used to group data based on one or more columns, and agg() allows you to perform aggregation functions on grouped data.

from pyspark.sql.functions import avg
new_df = df.groupBy("department").agg(avg("salary").alias("average_salary"))
new_df.show()

5. withColumn(): The withColumn() method is used to add a new column or replace an existing one in the DataFrame. The example below adds a new column, modified_age, that holds each employee’s age plus 5.

from pyspark.sql.functions import col
new_df = df.withColumn("modified_age", col("age") + 5).select(
    "first_name", "last_name", "modified_age"
)
new_df.show()

These are just a few of the methods; you can find all 30+ PySpark DataFrame methods in the tutorial below.

💯Access this tutorial: https://www.programmingfunda.com/top-30-pyspark-dataframe-methods-with-example/

Leave your suggestions in the comments 💬

💯Join Python | Big Data | Data Engineering | Data Science | Django | Programming for more Free Data Engineering and Data Analysis content.

Thanks

Happy Learning ... 🙏


A Comprehensive Guide to Pandas Data Structures 25/08/2024

If you are a beginner in Python Pandas, this tutorial should be very helpful: it explains the Pandas data structures with examples.


