[Discuss] Detecting non-ansi characters ???

By: Ben Kent       
Date: Oct 16,2023 at 20:57
In Response to: [Discuss] Detecting non-ansi characters ??? (Bob Selby)

> Hi, I have a large bunch of textfiles (many very large) and
> somewhere in that lot is a non-ansi character that a program is
> complaining about and I am struggling to find it.
> Can anyone suggest a way to find the little b****r ??
> Typically the program gives no clue as to which file or position :-(
> I'm running under Win10.
> Bob

Bob

You don't define what you mean by "non-ansi character". Perhaps you mean non-ASCII, i.e. a character code greater than 127.

As you say some of the files are large, PowerShell isn't really an option; it would be too slow.

I just wrote a small program for this using Visual Studio Code:
http://www.ztw3.com/upfile/countchars.zip

The exe takes a file name on the command line and reports how often each character occurs, listed by character number (treating each byte as a single character).
Assuming there are only a few characters greater than 127, you could convert each character number to hex and do a hex search for it in ZTree.
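For anyone who can't run the exe, here is a rough Python sketch of the same idea. It is not the countchars code (that isn't shown here). It reads the file in binary, counts every byte value 0-255, and prints the ones above 127 in both decimal and hex, ready for a ZTree hex search. The function names `count_bytes` and `report_non_ascii` are made up for this example.

```python
import sys
from collections import Counter


def count_bytes(path):
    """Count how often each byte value (0-255) occurs in the file at `path`."""
    counts = Counter()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so very large files never have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            counts.update(chunk)  # iterating bytes yields ints 0-255
    return counts


def report_non_ascii(counts):
    """Return (decimal, hex, count) rows for byte values above 127, sorted by value."""
    return [(b, f"{b:02X}", n) for b, n in sorted(counts.items()) if b > 127]


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for dec, hx, n in report_non_ascii(count_bytes(path)):
            print(f"{path}: char {dec} (hex {hx}) occurs {n} times")
```

Usage would be something like `python countbytes.py file1.txt file2.txt`. Each hex value it prints (e.g. `E9` for char 233) is what you would search for in ZTree.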

The code could be tweaked to give a more specific report, depending on your requirements.

Another way to find the files (but not the specific characters) would be a regular expression search for "[^\x00-\x7f]", although I couldn't get that to work with the command-line RE tools I have.
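Python's `re` module accepts that same pattern on raw bytes. The sketch below walks a folder tree and uses it to report the file, the line, the (byte) column and the hex value of every match. That goes a step further than only listing files. It is an illustration, not a tested tool. `find_non_ascii` and `scan_tree` are hypothetical names, and reading line by line assumes the files have reasonably short lines.

```python
import os
import re
import sys

# Any byte outside 0x00-0x7F, i.e. anything that is not 7-bit ASCII.
NON_ASCII = re.compile(rb"[^\x00-\x7f]")


def find_non_ascii(path):
    """Yield (line_number, column, byte_value) for every non-ASCII byte in a file."""
    with open(path, "rb") as f:
        # Binary-mode iteration splits on b"\n", so line numbers match an editor's.
        for line_no, line in enumerate(f, start=1):
            for m in NON_ASCII.finditer(line):
                # Column is a 1-based byte offset within the line.
                yield line_no, m.start() + 1, line[m.start()]


def scan_tree(root):
    """Walk `root` and print file:line:column plus the hex value of each hit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            for line_no, col, value in find_non_ascii(path):
                print(f"{path}:{line_no}:{col}: 0x{value:02X}")


if __name__ == "__main__":
    scan_tree(sys.argv[1] if len(sys.argv) > 1 else ".")
```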

The nearest I could get to work, with the grep that comes with Visual Studio Code, is the following. It prints every line containing anything other than line endings, tab and the "printable" characters space to tilde:
type file | grep.exe -v -x "[\t -~]*"
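That grep line keeps a line only if the whole line does not match `[\t -~]*`. A hedged Python equivalent is sketched below; `suspicious_lines` is a made-up name. One thing to check with your own grep: in POSIX/GNU grep a backslash inside `[...]` is literal, so `\t` there may not mean tab, and lines containing tabs could be reported too. Python's `re` does treat `\t` as tab, so the sketch allows it explicitly. It strips a trailing CR/LF before testing, so both Windows and Unix line endings are accepted.

```python
import re
import sys

# A line is "clean" if it consists only of tab and printable ASCII (space to tilde).
CLEAN_LINE = re.compile(rb"[\t -~]*")


def suspicious_lines(path):
    """Yield (line_number, line) for lines that are not entirely tab/space-to-tilde."""
    with open(path, "rb") as f:
        for line_no, line in enumerate(f, start=1):
            body = line.rstrip(b"\r\n")  # allow CRLF or LF line endings
            if not CLEAN_LINE.fullmatch(body):
                yield line_no, body


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for line_no, body in suspicious_lines(path):
            # Show the raw bytes so control characters and high bytes are visible.
            print(f"{path}:{line_no}: {body!r}")
```

Unlike the grep version, this also prints the file name and line number, which answers the "which file or position" part of the question.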


Ben
