You can write a script that calls file, and use a case-statement to check for the cases you are interested in, though of course there may be many special cases which are of interest. Just checking strings on a copy of libmagic, I see about 200 cases, e.g., "Konqueror cookie text" and "Linux Software Map entry text (new format)". Some use the string "text" as part of a different type, e.g.,

SoftQuad troff Context intermediate
SoftQuad troff Context intermediate for AT&T 495 laser printer
SoftQuad troff Context intermediate for HP LaserJet

Likewise script could be part of a word, but I see no problems in this case. But a script should check for "text" as a word, not a substring.

As a reminder, file output does not use a precise description which would always have "script" or "text". A followup commented that --mime-type works while this approach would not. However, in a test I see these results for svg-files:

$ ls -l *.svg
Pumpkin_48x48.svg: SVG Scalable Vector Graphics image
Sink_48x48.svg: SVG Scalable Vector Graphics image
Vile-mini.svg: SVG Scalable Vector Graphics image
Vile_48x48.svg: SVG Scalable Vector Graphics image

which I selected after seeing a thousand files show only 6 with "text". Arguably, matching the "xml" on the end of the mime-type output could be more useful, say, than matching "SVG", but using a script to do that takes you back to the suggestion made here. The output of file requires some tuning in either scenario, and is not 100% reliable (it is confused by several of my Perl scripts, calling them "data").

There is more than one implementation of file. The one most commonly used does its work in libmagic, which can be used from different programs (perhaps not directly from zsh, though python can).

According to File test comparison table for shell, Perl, Ruby, and Python, Perl has a -T option which it can use to provide this information. But it lists no comparable feature for zsh.

The easiest way is to just convert the text file to utf-8 and pipe that to grep:

iconv -f utf-16 -t utf-8 file.txt | grep query

I tried to do the opposite (convert my query to utf-16), but it seems as though grep doesn't like that. I think it might have to do with endianness, but I'm not sure. It seems as though grep will convert a query that is utf-16 to utf-8/ascii. Here is what I tried:

grep "$(echo -n query | iconv -f utf-8 -t utf-16 | sed 's/..//')" test.txt

If test.txt is a utf-16 file this won't work, but it does work if test.txt is ascii. I can only conclude that grep is converting my query to ascii. (More likely the shell is responsible: command substitution cannot carry NUL bytes, so the zero bytes of the utf-16 query are dropped before grep ever sees them.)

EDIT: Here's a really really crazy one that kind of works but doesn't give you very much useful info:

hexdump -e '/1 "%02x"' test.txt | grep -P "$(echo -n Test | iconv -f utf-8 -t utf-16 | sed 's/..//' | hexdump -e '/1 "%02x"')"

How does it work? Well, it converts your file to hex (without any extra formatting that hexdump usually applies) and pipes that into grep. Grep is using a query that is constructed by echoing your query (without a newline) into iconv, which converts it to utf-16. This is then piped into sed to remove the BOM (the first two bytes of a utf-16 file, used to determine endianness). This is then piped into hexdump so that the query and the input are in the same form. Unfortunately I think this will end up printing out the ENTIRE file if there is a single match, because the hex dump is one long line. Also this won't work if the utf-16 in your binary file is stored in a different endianness than your machine.

EDIT2: Got it!!!!

grep -P "$(echo -n "Test" | iconv -f utf-8 -t utf-16 | sed 's/..//' | hexdump -e '/1 "x%02x"' | sed 's/x/\\x/g')" test.txt

This searches for the hex version of the string Test (in utf-16) in the file test.txt.

I use this one all the time after dumping the Windows registry, as its output is unicode. The file type is reported as: Little-endian **UTF-16 Unicode text**, with CRLF line terminators

I needed to do this recursively, and here's what I came up with:

find -type f | while read l; do iconv -s -f utf-16le -t utf-8 "$l" | nl -s "$l: " | cut -c7- | grep 'somestring'; done

This is absolutely horrible and very slow; I'm certain there's a better way and I hope someone can improve on it, but I was in a hurry :P

What the pieces do: find -type f gives a recursive list of filenames with paths relative to the current directory. while read l; do ... done is a Bash loop: for each line of the list of file paths, put the path into $l and do the thing in the loop. (Why I used a shell loop instead of xargs, which would've been much faster: I need to prefix each line of the output with the name of the current file.)
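The recursive search described above (convert each file from UTF-16 with iconv, then grep, prefixing every match with the file name) can be sketched a little more compactly. This is only a sketch under stated assumptions, not the original author's method. The function name search_utf16 is hypothetical, the files are assumed to be little-endian UTF-16, file names are assumed not to contain newlines, and grep is assumed to support --label (GNU and BSD grep do). It swaps the nl/cut trick for grep's own file-name prefix.

```bash
# Hypothetical helper: search_utf16 DIR PATTERN
# Recursively converts each file from UTF-16LE to UTF-8 and greps it,
# printing matches as "path:matching line".
search_utf16() (
  dir=$1
  pattern=$2
  find "$dir" -type f | while IFS= read -r f; do
    # -s keeps iconv quiet about invalid input; files that are not
    # really UTF-16LE may still convert to junk that simply won't match.
    iconv -s -f UTF-16LE -t UTF-8 "$f" 2>/dev/null |
      tr -d '\r' |                           # drop CR from CRLF line endings
      grep -H --label="$f" -e "$pattern"     # label stdin with the file name
  done
)

# Example call (assumed usage): search_utf16 . 'somestring'
```

It still runs one iconv and one grep per file, so it is not fast, but the output lines carry the file name without counting columns for cut.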