Create a SparkSession to start PySpark
Create a DataFrame from a list or an RDD
Load a CSV file into a DataFrame
Execute SQL queries
spark.sql()
Register a DataFrame as a temporary view, then query it with SQL.
Register the DataFrame as a temporary view (scoped to the SparkSession, not a persistent table)
df.createOrReplaceTempView("people")
Run an SQL query against the view
spark.sql("SELECT * FROM people WHERE age > 25").show()
Display the rows of a DataFrame
Print the schema of a DataFrame
Select specific columns from a DataFrame