Here's one way to perform a null safe equality comparison: df.withColumn(. Connect and share knowledge within a single location that is structured and easy to search. With your data, this would be: But there is a simpler way: it turns out that the function countDistinct, when applied to a column with all NULL values, returns zero (0): UPDATE (after comments): It seems possible to avoid collect in the second solution; since df.agg returns a dataframe with only one row, replacing collect with take(1) will safely do the job: How about this? Sort the PySpark DataFrame columns by Ascending or Descending order, Natural Language Processing (NLP) Tutorial, Introduction to Heap - Data Structure and Algorithm Tutorials, Introduction to Segment Trees - Data Structure and Algorithm Tutorials. 566), Improving the copy in the close modal and post notices - 2023 edition, New blog post from our CEO Prashanth: Community is the future of AI. 2. Actually it is quite Pythonic. The take method returns the array of rows, so if the array size is equal to zero, there are no records in df. Column
pyspark check if column is null or empty
Share