It's the Hard Links

By: Ben Kent
Date: Nov 25, 2019 at 18:30
In Response to: It's the Hard Links (John Gruener)

> > I find that a partition holds 32GB+ of "Logged files" in
> > 24,255,205,376 "Used space".
> >
> > No compression is involved -- as far as I know...
> >
> > How to explain this?
>
> I'm surprised no one has questioned this before. Doesn't everyone see
> this discrepancy?
>
> I find that logging any entire Windows 10 system drive, with Junction
> logging disabled and running ZTree as Administrator, the Logged files
> will always exceed the Used space by at least 8 GB, and as much as 12 GB.
> And this is without logging the several directories to which Windows
> denies access. If run as non-Admin, some 60 or so directories will not be
> logged, reducing the difference by about 3 GB.
>
> So what's causing this? As Ben mentions, NTFS Compressed and Sparse
> files take up less space than what ZTree logs, so that accounts for some
> of it. But I think the existence of multiple Hard Links accounts for most
> of the difference. Windows 10 (and to a lesser extent previous versions
> of Windows) makes extensive use of additional hard links to the same
> file. About 80% of these have at least one link in C:\Windows\WinSxS
> subdirectories.
>
> Using Alt-Info in the WinSxS branch shows many files with two links,
> and I've found a few with as many as seven. This means that ZTree is
> logging those files seven times each. There is no practical way for ZTree
> to discover that it has already logged that same file in a different
> directory or by a different name.
>
> A useful tool to find all the Hard Links, Symbolic Links and Junction
> Points on a volume is NirSoft's NTFSLinksView:
> https://www.nirsoft.net/utils/ntfs_links_view.html
> As with most NirSoft utilities, it runs stand-alone without
> installation.
> Scanning C:\ with infinite depth can take 10 minutes or more, but will
> yield a complete list sortable by name, path, link type or created time.
> The bottom status line shows the count found.
>
> I've run this on several Windows 10 system drives, finding at least
> 90,000 and as many as 160,000 hard links counted on each. However, this
> count is a bit misleading since it lists all the hard links to any file
> that has more than one. So each file with one extra hard link shows two
> link lines, and those with multiple extra links have many more. For
> example, a file with four links will have twelve lines since each of the
> four will have a line entry pointing to the other three. ZTree will have
> logged such a file four times. For this reason it's difficult to
> determine just how many actual files these lines represent, and how many
> times ZTree will have logged them.
>
> Nevertheless, while that listing is greatly inflated, it does point out
> that there are tens of thousands of files logged more than once by ZTree.
> For a Windows 10 system drive I'm guessing there are usually at least
> 30,000 files logged more than once.
>
> Another tool that gives quick info about a Hard Link is SysInternals
> FindLinks:
> https://docs.microsoft.com/en-us/sysinternals/downloads/findlinks
> Place it in your F9 Menu script to quickly find all the links to a
> highlighted file. This is designed to list the additional links other
> than the one selected, so the number shown does not count the selected
> link. Also be aware it will truncate the results both left and right if
> the ZTree window is not wide enough for the path.
>
> I think the only way for a program to decipher the actual number of
> physical files on an NTFS drive and the space they take would be to track
> the actual cluster location of each file, then not count links that point
> to an already-located file.

John

Maybe log all the files along with their file ID (roughly the equivalent of a Unix inode number); directory entries with the same ID are then hard links to the same file.
I guess that is what NTFSLinksView does when it searches for hard links.
In other words, you need a way to log the files themselves and not just the directory entries.
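
To make the idea concrete, here is a rough Python sketch. It is only my illustration and not how ZTree or NTFSLinksView actually work. It assumes that os.lstat() fills in st_dev and st_ino (on NTFS, Python reports the volume serial number and the file index there), and the function name logged_vs_unique is made up for this example. It adds up sizes once per directory entry and once per file ID, so the gap between the two totals is the space that hard links cause to be counted more than once.

```python
import os
import stat
from collections import defaultdict


def logged_vs_unique(root):
    """Walk root and return (logged_bytes, unique_bytes, groups).

    logged_bytes counts every directory entry, the way a file manager would.
    unique_bytes counts each (st_dev, st_ino) pair only once.
    groups maps each file ID that was seen more than once to its paths.
    """
    size_by_id = {}                 # (st_dev, st_ino) -> size of that file
    paths_by_id = defaultdict(list)  # (st_dev, st_ino) -> all names found
    logged = 0

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat, so a symbolic link is not mistaken for its target.
                # (DirEntry.stat() from os.scandir leaves st_ino at 0 on
                # Windows, which is why a full lstat call is used here.)
                st = os.lstat(path)
            except OSError:
                continue            # access denied, file vanished, etc.
            if not stat.S_ISREG(st.st_mode):
                continue            # skip symlinks and other non-files
            file_id = (st.st_dev, st.st_ino)
            logged += st.st_size
            size_by_id.setdefault(file_id, st.st_size)
            paths_by_id[file_id].append(path)

    unique = sum(size_by_id.values())
    groups = {k: v for k, v in paths_by_id.items() if len(v) > 1}
    return logged, unique, groups
```

Calling something like logged_vs_unique(r"C:\Windows\WinSxS") would only see links that both live under that branch, so for a fair comparison with "Used space" you would have to walk the whole volume. Note that st_size is the logical size, so compressed and sparse files would still make the total differ from what is actually allocated on disk.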

Some hints at https://stackoverflow.com/questions/7162...windows-have-inode-numbers-like-linux

I think it works like this:
Files exist outside of the directory structure. They have metadata such as permissions and dates; whether that information is stored with the file or in the directory entry depends, I expect, on the file system. Each file has a reference count, so when a directory entry is deleted and the count reaches zero, the file itself is deleted. Directories are just a type of file. A directory entry stores some information and a pointer to the file object.
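
The reference count is easy to watch from a script. The following is a small Python sketch under the assumption that the temporary directory sits on a file system that supports hard links (NTFS does); the function name link_count_demo is just something I made up. It creates a file, adds a second directory entry with os.link(), deletes the first entry, and reports the link count at each stage.

```python
import os
import tempfile


def link_count_demo():
    """Return (before, during, same_file, after, content) for one file."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first.txt")
        second = os.path.join(d, "second.txt")

        with open(first, "w") as f:
            f.write("same data")
        before = os.stat(first).st_nlink      # one directory entry so far

        os.link(first, second)                # add a second name (hard link)
        during = os.stat(first).st_nlink
        same_file = os.path.samefile(first, second)  # compares device and ID

        os.remove(first)                      # drop one name, not the data
        after = os.stat(second).st_nlink
        with open(second) as f:
            content = f.read()                # still readable via other name

    return before, during, same_file, after, content
```

The link count should go from 1 to 2 and back to 1, and the data stays readable through the second name after the first one is removed. Only when the last name goes (here, when the temporary directory is cleaned up) is the file itself released.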

ZTree could be extended to log the ID, and then it could quickly show you the other hard links (directory entries) that point to the same file, maybe as an extension to Alt-Info. But few people would use the feature and everyone would pay for it in extra logging memory, so I expect it's best left to other tools. Also, I think (this needs to be confirmed) that at least some of the APIs need admin rights, and we shouldn't force people to run ZTree elevated.

I remembered the Everything tool (https://www.voidtools.com/faq/). That tool needs admin rights and, given the way it works, it probably knows about the IDs, but it doesn't seem to expose them in the GUI or in the *dupe: commands. The search "path:c:\* sizedupe: dmdupe:" gets you partly there, but my system has 400K matching objects and, looking at the results, many of them are copies in user profiles and not hard links.
Maybe someone could raise an Everything feature request to add the ID/inode and link count as optional details-view columns, plus a new iddupe: filter.
Ben