Programming Funda
08/09/2024
✅ 30+ PySpark DataFrame Methods Crash Course for Data Engineers
==============================================
Hello PySpark developers! Here are some useful PySpark DataFrame methods that come up again and again in real-life PySpark applications.
Let's start! 👇
1. show()
The show() method displays the contents of the DataFrame. By default, it prints the first 20 rows and truncates values longer than 20 characters.
df.show()
2. select(): The select() method returns a new DataFrame that contains only the columns you specify.
new_df = df.select("first_name", "last_name", "age")
new_df.show()
3. filter() or where(): The filter() method keeps only the rows that meet a given condition. where() is an alias for filter(), so the two examples below are equivalent.
from pyspark.sql.functions import col
new_df = df.filter(col("age") > 25)
new_df.show()
from pyspark.sql.functions import col
new_df = df.where(col("age") > 25)
new_df.show()
4. groupBy() and agg(): The groupBy() method groups rows by one or more columns, and agg() applies one or more aggregate functions to each group.
from pyspark.sql.functions import avg
new_df = df.groupBy("department").agg(avg("salary").alias("average_salary"))
new_df.show()
5. withColumn(): The withColumn() method adds a new column or replaces an existing one with the same name. The example below adds 5 to each employee's age and stores the result in a new column, modified_age.
from pyspark.sql.functions import col
new_df = df.withColumn("modified_age", col("age") + 5).select(
"first_name", "last_name", "modified_age"
)
new_df.show()
These are just a few of the methods; you can find all 30+ PySpark DataFrame methods in the tutorial below.
💯 Access the tutorial: https://www.programmingfunda.com/top-30-pyspark-dataframe-methods-with-example/
Leave your suggestions in the comments 💬
💯Join Python | Big Data | Data Engineering | Data Science | Django | Programming for more Free Data Engineering and Data Analysis content.
Thanks
Happy Learning ... 🙏
25/08/2024
If you are a beginner in Python Pandas then this tutorial is going to be very helpful for you because throughout this article I have explained all about the Pandas data structures with examples.