Quick Scan of a New Project

Recently we were asked to do a quick analysis of an existing code base. Since we knew nothing about the code base, we started by gathering some numbers.

What’s inside the root directory of the project?

$ ls -1
documentation
php-gui
php-parser
scripts
…

At first glance the primary programming language is PHP.

How many files are in the repository and its sub-directories?

$ find . -type f | wc -l
17780
$ for file in $(ls) ; do echo "$file" ; find "$file" -type f | wc -l ; done
documentation
     770
php-gui
   11834
php-parser
     140
scripts
     186

Ok. So the majority of the source files seem to be inside php-gui, with some in php-parser. We’ve also got some documentation and scripts.

The same query as a shell script, for better readability:

for file in $(ls)
do
  echo $file
  find $file -type f | wc -l
done
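
The $(ls) loop splits names on whitespace and also visits plain files in the root. A slightly more defensive variant could look like the sketch below. The function name count_files_per_dir is made up for this example; it only looks at immediate sub-directories and sorts them by file count.

```shell
# Print "<count> <directory>" for each immediate sub-directory, largest first.
# count_files_per_dir is a hypothetical helper name, not part of the original post.
# The body runs in a subshell ( ... ) so its variables do not leak.
count_files_per_dir() (
  root="${1:-.}"
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue                  # glob matched nothing
    count=$(find "$dir" -type f | wc -l)
    # $((count)) strips the padding some wc implementations add
    printf '%d %s\n' "$((count))" "$(basename "$dir")"
  done | sort -rn
)
```

Running count_files_per_dir . in the repository root should give the same numbers as the loop above, just ordered by size.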

How many different file types does the repository contain?

We are not the first to be interested in this kind of information. The following one-liner lists all unique extensions of the files contained in a directory, and wc -l counts them:

$ find . -type f | sed -E 's/.+[\./]([^/\.]+)/\1/' | sort -u | wc -l
240

Impressive, but not very useful information. The main language seems to be PHP. How many PHP source files are in php-parser and php-gui?

$ find . -type f -name "*.php" | wc -l
    4590

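More than 4500 PHP source files. Note that this command searches the whole repository from the root, so the number is an upper bound for php-gui and php-parser.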
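
To put that number into context, a per-extension histogram says more than the bare count of distinct extensions. The following is only a sketch: extension_histogram is a hypothetical helper name, and it assumes a sed that understands -E, as the one-liner above already does.

```shell
# Print the n most frequent file extensions below a directory,
# in "uniq -c" style: a (padded) count followed by the extension.
# extension_histogram is a hypothetical helper name. Files without a dot in
# their name are skipped; dotfiles such as .gitignore count as "gitignore".
extension_histogram() (
  root="${1:-.}"
  n="${2:-10}"
  find "$root" -type f -name '*.*' \
    | sed -E 's/.*\.([^./]*)$/\1/' \
    | sort | uniq -c | sort -rn | head -n "$n"
)
```

For example, extension_histogram . 20 would list the twenty most common extensions in the repository.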

Any PHPUnit tests in php-parser or php-gui?

$ find php-parser -type f -name "*php" -exec grep -H -n 'phpunit' {} \;

Negative. What about the Zend Framework powered php-gui?

$ find php-gui -type f -name "*php" -exec grep -H -n 'phpunit' {} \; | grep -v ZendFramework | wc -l
   3

Ok. Just three matching lines once everything containing ZendFramework is filtered out.
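
Two caveats about the searches above. Starting one grep per file via -exec … \; gets slow on thousands of files. And the pattern is case-sensitive, while PHPUnit test classes usually reference an upper-case PHPUnit (for example PHPUnit_Framework_TestCase in older versions), so real tests may be missed. A possible variant is sketched below. It assumes a grep that supports -r and --include (GNU and BSD grep both do), and phpunit_files is a hypothetical helper name.

```shell
# List PHP files below a directory that mention "phpunit" in any letter case,
# one path per line. phpunit_files is a hypothetical helper name.
#   -r  recurse into directories
#   -l  print only the names of matching files
#   -i  ignore case, so PHPUnit_Framework_TestCase matches too
phpunit_files() (
  root="${1:-.}"
  grep -rli --include='*.php' 'phpunit' "$root"
)
```

The Zend Framework filter from above still applies on top of it, e.g. phpunit_files php-gui | grep -v ZendFramework | wc -l, which then counts files rather than matching lines.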

How large are those source files (excluding the big input files)?

$ ls -lR **/*.php | grep '^-' | grep -v input | sort -k 5 -rn | head
…
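
The ** glob above needs zsh or bash with globstar enabled, and with several thousand matches the argument list may get too long for ls. As a sketch of an alternative that relies only on find and wc (largest_php_files is a hypothetical helper name), the sizes can be collected like this:

```shell
# Print the n largest PHP files below a directory as "<bytes> <path>",
# skipping every path that contains "input".
# largest_php_files is a hypothetical helper name. The body runs in a
# subshell, so the cd does not affect the calling shell.
largest_php_files() (
  cd "${1:-.}" || exit 1
  find . -type f -name '*.php' ! -path '*input*' -exec wc -c {} \; \
    | sort -rn | head -n "${2:-10}"
)
```

Replacing wc -c with wc -l would rank the files by number of lines instead of bytes, which is often the more telling measure for source code.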

I bet you’ve got other or more ideas for gathering first insights into an existing project. Just leave a comment below. Thanks!