Remove empty and duplicate lines in a file


How to remove empty and duplicate lines in a file?

This Python function reads a text file, removes empty lines and duplicate lines (keeping the first occurrence of each), and overwrites the file with the cleaned content.

Ready-to-use Python function to remove empty (blank) lines and duplicate lines in a file:

def remove_empty_duplicate_lines_text_file(txtFile):
    # Remove empty lines and duplicate lines from the file, keeping first occurrences
    output = ''
    lines_seen = []

    with open(txtFile, 'r', encoding="utf-8") as file:
        for line in file:
            stripped = line.replace("\n", "")  # compare lines without the trailing newline
            if not line.isspace() and stripped not in lines_seen:
                output += line
                lines_seen.append(stripped)
    # Remove the trailing newline, if any; endswith() is safe even when output is empty
    if output.endswith('\n'):
        output = output[:-1]

    # Overwrite the original file with the cleaned content
    file = open(txtFile, 'w+', encoding="utf-8")
    file.write(output)
    file.close()

Call the function from your main code, for example:

remove_empty_duplicate_lines_text_file("texts to remove blank and duplicate lines.txt")

The function prints nothing. It modifies the file in place, so open the text file after running the code to see the result.


Text file content before executing the function:

import requests

# Define headers
# Define the API endpoint
url = "https://api.kite.trade/instruments" 
# Define headers

# Define headers

Text file content after executing the function:

import requests
# Define headers
# Define the API endpoint
url = "https://api.kite.trade/instruments" 

How does the function work?

The Python function remove_empty_duplicate_lines_text_file takes one argument, txtFile, a string holding the path to a text file. It reads the file, drops empty lines and repeated lines, and overwrites the file with what remains.

Here is a breakdown of how the function works:

  1. The function initializes an empty string called output to hold the updated content of the text file.
  2. The function also initializes an empty list called lines_seen to keep track of the lines that have already been seen.
  3. The with statement is used to open the text file specified in txtFile and create a file object called file.
  4. The for loop iterates over each line in the file.
  5. The first part of the if condition uses the string method isspace() to skip lines that consist only of whitespace, including lines that contain nothing but a newline character.
  6. The second part of the condition checks that the line, with the newline character removed using the replace() method, is not in the lines_seen list. A line that passes both checks is appended to the output string and added to the lines_seen list, so only the first occurrence of each line is kept and the original order is preserved.
  7. After the loop has finished, the function checks if the last character of output is a newline character, and if so, removes it from the output string.
  8. The function reopens the file with the open() function in 'w+' mode, which allows writing and reading and truncates the existing content. The encoding is specified as 'utf-8'.
  9. The updated content of the file, which is stored in the output string, is written to the file object using the write() method.
  10. Finally, the function closes the file object with the close() method.
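The filtering steps above can also be sketched on their own, without any file access. The helper name dedupe_lines is hypothetical and not part of the original function. As an assumption for illustration, it swaps the lines_seen list for a set, which makes each membership check faster on large inputs while a separate list preserves the order. The sample input mirrors the file content shown earlier.

```python
def dedupe_lines(lines):
    """Return non-blank lines in first-seen order, without duplicates."""
    seen = set()   # hypothetical replacement for the lines_seen list
    kept = []      # keeps the surviving lines in their original order
    for line in lines:
        text = line.rstrip("\n")              # compare without the newline
        if text.strip() and text not in seen:  # skip blank lines and repeats
            seen.add(text)
            kept.append(text)
    return kept


sample = [
    "import requests\n",
    "\n",
    "# Define headers\n",
    "# Define the API endpoint\n",
    'url = "https://api.kite.trade/instruments" \n',
    "# Define headers\n",
    "\n",
    "# Define headers",
]

# Joining with "\n" leaves no trailing newline, like step 7 above
print("\n".join(dedupe_lines(sample)))
```

Whether the set is worth it depends on file size: for a few hundred lines the list in the original function is just as practical.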

In summary, the function reads a text file, drops empty lines and repeated lines while keeping the first occurrence of each, and overwrites the file with the result.
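To check the before and after contents shown earlier without touching a real file, the following sketch runs the function on a temporary file. It repeats the function so that the block runs on its own, in a lightly adapted copy: as assumptions for illustration, it uses endswith() so that an empty result does not raise an error, and it writes the file inside a with block. The temporary file is created with tempfile.mkstemp and deleted afterwards.

```python
import os
import tempfile


def remove_empty_duplicate_lines_text_file(txtFile):
    # Lightly adapted copy of the article's function (see the note above)
    output = ''
    lines_seen = []
    with open(txtFile, 'r', encoding="utf-8") as file:
        for line in file:
            if not line.isspace() and line.replace("\n", "") not in lines_seen:
                output += line
                lines_seen.append(line.replace("\n", ""))
    if output.endswith('\n'):
        output = output[:-1]
    with open(txtFile, 'w', encoding="utf-8") as file:
        file.write(output)


before = (
    "import requests\n"
    "\n"
    "# Define headers\n"
    "# Define the API endpoint\n"
    'url = "https://api.kite.trade/instruments" \n'
    "# Define headers\n"
    "\n"
    "# Define headers\n"
)

# Create a throwaway file so no real data is overwritten
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, 'w', encoding="utf-8") as f:
        f.write(before)
    remove_empty_duplicate_lines_text_file(path)
    with open(path, 'r', encoding="utf-8") as f:
        after = f.read()
finally:
    os.remove(path)  # clean up the temporary file

print(after)
```

The printed text should match the "after" listing: four lines, with no blank lines and a single "# Define headers" comment.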
