Script to remove a huge number of files in Linux

This is a quick entry: I'm going to share the script I used to remove more than 10 million files on a Linux system.

My problem was a directory with so many files that I couldn't list them, count them, or even get the size of the whole directory. Some processes use that directory on every execution, so it was impossible to get any information out of it.

On Stack Exchange I found some advice, but nothing worked for me. I tried the rsync command and the find command with recursive actions, but either the server took far too long to list the files or an error was shown.

So I had to write this script, which deletes the files in batches of 10000:

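#!/bin/bash
# Delete the contents of TARGET_PATH in batches of 10000 entries
TARGET_PATH=/path/to/dir
cd "$TARGET_PATH" || exit 1
# ls -U does not sort, so it starts printing names right away
while [ -n "$(ls -U | head -n 1)" ]
do
      # xargs -d '\n' (GNU xargs) keeps names with spaces in one piece
      ls -U | head -n 10000 | xargs -d '\n' rm -r --
      echo "executed deletion on: $(date)"
done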
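
Before pointing the loop at the real directory, it may be worth trying it on a throwaway one. The sketch below wraps the same batch idea in a function and runs it against a scratch directory. The name `delete_in_batches`, its two arguments and the scratch files are made up for this example. It assumes GNU `ls` and `xargs` and file names without newlines. Unlike the script above, it passes `-A` to `ls` so hidden entries are removed too, and `-f` to `rm` so it never waits for a confirmation.

```shell
#!/bin/bash
# Sketch only: delete_in_batches is a hypothetical helper, not part of the original script.
# Assumes GNU ls/xargs and file names without newlines.
delete_in_batches() {
    local target=$1 batch=${2:-10000}
    cd "$target" || return 1
    # -U: no sorting; -A: include hidden entries, but not . and ..
    while [ -n "$(ls -UA | head -n 1)" ]
    do
        ls -UA | head -n "$batch" | xargs -d '\n' rm -rf --
        echo "executed deletion on: $(date)"
    done
}

# Try it on a scratch directory with a few empty files, 10 per batch
scratch=$(mktemp -d)
touch "$scratch"/file_{1..25}
# Run in a subshell so the cd inside the function does not affect this shell
(delete_in_batches "$scratch" 10)
ls -A "$scratch" | wc -l   # 0 when everything was removed
rmdir "$scratch"
```

If an entry cannot be deleted (for example because of permissions), the loop will keep finding it and never finish, so keep an eye on the first runs.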


Making a git report of commits

Here is a quick tip. At work I had to create a report of how many commits we made in a given period, by repository and by person (committer), and show it in Excel with a chart.

I couldn't download the data from our git server because the REST API is disabled, so we had to do it manually. I created a script (a bat file) to extract the data, but it has to be run once per repository.


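@echo off
SETLOCAL
rem Use the name of the current folder as the repository name
for %%f in ("%CD%") do set "dirname=%%~nxf"
rem The value of TAB must be a literal tab character
set "TAB=	"
git fetch
rem One line per commit: repository, short hash, commit date, committer
git log --pretty=format:"%dirname%%TAB%%%C(yellow)%%h%TAB%%%C(cyan)%%ci%TAB%%%cn" --since="2020-01-01 00:00:00" --before="2020-06-30 23:59:59"
rem format: does not end the last line, so print a newline after it
echo.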

The script can live outside the repositories. Run it from inside each repository and append the results to one text file:


cd C:\repositories\repo1

"C:\scripts\getCommits.bat" >> "C:\Users\jaehoo\Desktop\out.txt"

cd C:\repositories\repo2

"C:\scripts\getCommits.bat" >> "C:\Users\jaehoo\Desktop\out.txt"
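
For anyone doing the same from a bash shell (Git Bash on Windows, or Linux), the per-repository calls could be replaced by a loop. This is only a sketch under a few assumptions: every repository is a direct subfolder of one parent directory, and `collect_commits` with its three arguments is a name invented for this example. It leaves out the `git fetch` step and the color codes, and uses `tformat:` with `%x09` so each line ends with a newline and the columns are separated by real tabs.

```shell
#!/bin/bash
# Sketch only: collect_commits and its arguments are hypothetical names.
# Assumes every repository is a direct subfolder of the directory given as $1.
collect_commits() {
    local repos_dir=$1 since=$2 before=$3
    local repo name
    for repo in "$repos_dir"/*/
    do
        # Skip folders that are not git working copies
        [ -e "${repo}.git" ] || continue
        name=$(basename "$repo")
        # tformat: ends every line with a newline; %x09 is a tab character
        git -C "$repo" log \
            --pretty=tformat:"${name}%x09%h%x09%ci%x09%cn" \
            --since="$since" --before="$before"
    done
}

# Example call (the paths are placeholders):
# collect_commits ~/repositories "2020-01-01 00:00:00" "2020-06-30 23:59:59" > out.txt
```

The columns are the same as in the bat file: repository, short hash, commit date and committer name.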

The result is tab-separated, so it can be opened in Excel:

[Screenshot: 2020-05-14 09_51_33-Libro3 - Excel]

A pivot table can then help to count and show the data.

[Screenshot: 2020-05-14 09_53_37-Libro3 - Excel]
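
When Excel is not at hand, roughly the same count can be taken straight from the text file. The sketch below assumes the four tab-separated columns described in this post (repository, hash, date, committer). `count_commits` is a hypothetical helper name. Lines with fewer than four columns, such as the empty line the bat file prints at the end, are skipped.

```shell
#!/bin/bash
# Sketch only: count_commits is a hypothetical helper.
# Expects tab-separated lines: repository, hash, date, committer.
count_commits() {
    # Keep repository and committer, then count identical pairs,
    # with the highest count first
    awk -F'\t' 'NF >= 4 { print $1 "\t" $4 }' "$1" | sort | uniq -c | sort -rn
}

# Example call (the file name is a placeholder):
# count_commits out.txt
```

Each output line is a commit count followed by the repository and the committer, which is the same grouping the pivot table gives.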

Here are some variations of the command that may be useful:


git log --pretty=format:"%C(yellow)%h %C(cyan)%ci %cn" --since="2020-01-01 00:00:00" --before="2020-06-30 23:59:59"

git log --graph --pretty=format:"%cn %m %cs %C(yellow)%h%x09%Creset%C(cyan)%C(bold)%ad%Creset %C(green)%Creset %s" --date=short

git log --pretty=format:"%C(yellow)%h %C(cyan)%ci %cn" ^
--since="2020-01-01 00:00:00" ^
--before="2020-06-30 23:59:59" ^
--author="jaehoo"

git shortlog --since="2020-01-01 00:00:00" --before="2020-06-30 23:59:59" --summary --numbered --email

Regards

References

https://mirrors.edge.kernel.org/pub/software/scm/git/docs/git-log.html#_pretty_formats