Extracts portions of the data from an mzML, featureXML or consensusXML file.
potential predecessor tools | FileFilter | potential successor tools
any tool yielding output in mzML, featureXML or consensusXML format | | any tool that profits from reduced input
With this tool it is possible to extract m/z, retention time and intensity ranges from an input file and to write all data that lies within the given ranges to an output file.
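The range extraction described above can be pictured with a minimal Python sketch. It is not the tool's implementation: the `Peak` class and the `filter_peaks` function are hypothetical names introduced for illustration, and treating both range bounds as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# (lower, upper); both bounds are treated as inclusive (an assumption)
Range = Tuple[float, float]


@dataclass(frozen=True)
class Peak:
    """Hypothetical stand-in for one data point; not an OpenMS class."""
    rt: float
    mz: float
    intensity: float


def in_range(value: float, bounds: Range) -> bool:
    """Return True if value lies within the closed interval bounds."""
    lower, upper = bounds
    return lower <= value <= upper


def filter_peaks(peaks: Iterable[Peak], rt: Range, mz: Range,
                 intensity: Range) -> List[Peak]:
    """Keep only the peaks that lie inside all three ranges."""
    return [
        p for p in peaks
        if in_range(p.rt, rt)
        and in_range(p.mz, mz)
        and in_range(p.intensity, intensity)
    ]


peaks = [
    Peak(rt=100.0, mz=400.2, intensity=1500.0),  # inside all ranges
    Peak(rt=250.0, mz=812.4, intensity=90.0),    # m/z and intensity outside
    Peak(rt=900.0, mz=405.7, intensity=3000.0),  # RT outside
]
kept = filter_peaks(
    peaks,
    rt=(0.0, 300.0),
    mz=(400.0, 500.0),
    intensity=(100.0, float("inf")),
)
```

A data point is written to the output only if it passes every range, so the three criteria combine as a logical AND.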
Depending on the input file type, additional specific operations are possible:
- mzML
- extract spectra of a certain MS level
- filter by signal-to-noise estimation
- filter by scan mode of the spectra
- featureXML
- filter by feature charge
- filter by feature size (number of subordinate features)
- filter by overall feature quality
- consensusXML
- filter by size (number of elements in consensus features)
- filter by consensus feature charge
- filter by map (extracts specified maps and re-evaluates consensus centroid)
e.g. FileFilter -map 2 3 5 -in file1.consensusXML -out file2.consensusXML
If a single map is specified, the feature itself can be extracted.
e.g. FileFilter -map 5 -in file1.consensusXML -out file2.featureXML
- featureXML / consensusXML:
- remove items with a certain meta value annotation, allowing for >, < and = comparisons. List types are compared by length, not content. Integer, Double and String values are compared using their built-in operators.
- filter sequences, e.g. "LYSNLVER" or the modification "(Phospho)"
e.g. FileFilter -id:sequences_whitelist Phospho -in file1.consensusXML -out file2.consensusXML
- filter accessions, e.g. "sp|P02662|CASA1_BOVIN"
- remove features with annotations
- remove features without annotations
- remove unassigned peptide identifications
- keep only the best-scoring identification for features with multiple peptide identifications
e.g. FileFilter -id:remove_unannotated_features -id:remove_unassigned_ids -id:keep_best_score_id -in file1.featureXML -out file2.featureXML
- remove features with id clashes (different sequences mapped to one feature)
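The meta value comparison rule from the list above (lists compared by length, other types by their built-in operators) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the `"lt"`/`"eq"`/`"gt"` operator keys are hypothetical, items are modeled as plain dictionaries, and items that lack the meta value are assumed to be kept.

```python
import operator
from typing import Any, Dict, List

# Hypothetical operator keys standing in for the <, = and > comparisons
COMPARATORS = {"lt": operator.lt, "eq": operator.eq, "gt": operator.gt}


def comparable(value: Any) -> Any:
    """Reduce list-like values to their length; leave other values as-is."""
    return len(value) if isinstance(value, (list, tuple)) else value


def matches(meta: Dict[str, Any], key: str, op: str, reference: Any) -> bool:
    """Return True if the item carries the meta value and the comparison holds."""
    if key not in meta:
        return False  # assumption: items without the annotation never match
    return COMPARATORS[op](comparable(meta[key]), comparable(reference))


def remove_matching(items: List[Dict[str, Any]], key: str, op: str,
                    reference: Any) -> List[Dict[str, Any]]:
    """Drop every item whose meta value satisfies the comparison."""
    return [m for m in items if not matches(m, key, op, reference)]


features = [
    {"id": "f1", "score": 0.9, "labels": ["a", "b", "c"]},
    {"id": "f2", "score": 0.4, "labels": ["x"]},
    {"id": "f3", "score": 0.7},
]

# Numeric values use the built-in operator: only f2 has score < 0.5
low_score_removed = remove_matching(features, "score", "lt", 0.5)

# Lists are compared by length only: f2 is removed because its list has
# one element, even though the content differs from the reference list
same_length_removed = remove_matching(features, "labels", "eq", ["other"])
```

The second call shows the practical consequence of the length-only rule: an equality test on a list annotation cannot distinguish two lists of the same size.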
The priority of the id flags is (in decreasing order): remove_annotated_features / remove_unannotated_features -> remove_clashes -> keep_best_score_id -> sequences_whitelist / accessions_whitelist
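Because the id flags are applied in a fixed order, combining them can give a different result than applying any one of them alone. The Python sketch below models that order as a sequence of stages. It is a simplified illustration, not the tool's code: `Feature`, `PeptideId` and `apply_id_flags` are hypothetical names, a higher score is assumed to be better, the whitelist is assumed to match by substring, and features without a matching identification are assumed to be dropped by the whitelist.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class PeptideId:
    """Hypothetical stand-in for a peptide identification."""
    sequence: str
    score: float  # assumption: higher is better


@dataclass
class Feature:
    """Hypothetical stand-in for a feature with attached identifications."""
    name: str
    ids: List[PeptideId] = field(default_factory=list)


def apply_id_flags(features: Sequence[Feature],
                   remove_annotated: bool = False,
                   remove_unannotated: bool = False,
                   remove_clashes: bool = False,
                   keep_best_score_id: bool = False,
                   sequences_whitelist: Optional[Sequence[str]] = None
                   ) -> List[Feature]:
    """Apply the id filters in the documented priority order."""
    result = list(features)
    # 1. remove_annotated_features / remove_unannotated_features
    if remove_annotated:
        result = [f for f in result if not f.ids]
    if remove_unannotated:
        result = [f for f in result if f.ids]
    # 2. remove_clashes: drop features with more than one distinct sequence
    if remove_clashes:
        result = [f for f in result
                  if len({i.sequence for i in f.ids}) <= 1]
    # 3. keep_best_score_id: reduce each feature to its best-scoring id
    if keep_best_score_id:
        result = [
            Feature(f.name,
                    [max(f.ids, key=lambda i: i.score)] if f.ids else [])
            for f in result
        ]
    # 4. sequences_whitelist: keep features with at least one matching id
    if sequences_whitelist:
        result = [
            f for f in result
            if any(w in i.sequence for i in f.ids for w in sequences_whitelist)
        ]
    return result


features = [
    Feature("A", [PeptideId("LYSNLVER", 0.9),
                  PeptideId("LYS(Phospho)NLVER", 0.6)]),
    Feature("B", [PeptideId("PEPT(Phospho)IDE", 0.8)]),
    Feature("C"),
]

whitelist_only = apply_id_flags(features, sequences_whitelist=["Phospho"])
best_then_whitelist = apply_id_flags(features, keep_best_score_id=True,
                                     sequences_whitelist=["Phospho"])
```

With the whitelist alone, feature A survives because one of its identifications contains "Phospho". When keep_best_score_id is also set, A is first reduced to its best-scoring identification, which is unmodified, so the whitelist then removes it.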
The command line parameters of this tool are:
For the parameters of the S/N algorithm section (sn), see the class documentation of that algorithm.
- Todo:
- add tests for selecting modes (port remove modes) (Andreas)
- Improvement:
- MS2 and higher-level spectra should be filtered according to precursor m/z and RT. MzMLFile, MzDataFile and MzXMLFile have to be changed for that (Hiwi). Currently, m/z or RT filters are also applied to spectra of MS level >= 2, which is usually not what you want. As a workaround, extract the spectra of MS level >= 2 beforehand, filter the MS1 spectra, and merge the two sets back together.
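The workaround described above (split by MS level, filter MS1 only, merge) can be sketched in Python. This models the data flow only and does not call the tool: the `Spectrum` class and `filter_ms1_only` function are hypothetical, range bounds are assumed to be inclusive, and ordering the merged result by retention time is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical stand-in for a spectrum; not an OpenMS class."""
    rt: float
    ms_level: int
    mzs: Tuple[float, ...]


def filter_ms1_only(spectra: Sequence[Spectrum],
                    rt_range: Tuple[float, float],
                    mz_range: Tuple[float, float]) -> List[Spectrum]:
    """Apply RT and m/z ranges to MS1 spectra and pass higher levels through."""
    rt_lo, rt_hi = rt_range
    mz_lo, mz_hi = mz_range

    # Step 1: split the input by MS level
    ms1 = [s for s in spectra if s.ms_level == 1]
    higher = [s for s in spectra if s.ms_level >= 2]

    # Step 2: filter MS1 spectra by RT, then their peaks by m/z
    filtered_ms1 = [
        Spectrum(s.rt, s.ms_level,
                 tuple(m for m in s.mzs if mz_lo <= m <= mz_hi))
        for s in ms1
        if rt_lo <= s.rt <= rt_hi
    ]

    # Step 3: merge both sets back together, ordered by retention time
    return sorted(filtered_ms1 + higher, key=lambda s: s.rt)


spectra = [
    Spectrum(10.0, 1, (300.0, 450.0, 900.0)),
    Spectrum(10.5, 2, (120.0, 250.0)),
    Spectrum(500.0, 1, (410.0,)),
    Spectrum(500.4, 2, (175.1,)),
]
merged = filter_ms1_only(spectra, rt_range=(0.0, 100.0),
                         mz_range=(400.0, 500.0))
```

In this sketch the fragment peaks of the MS2 spectra are left untouched, which is the point of the workaround. Note that the MS2 spectrum at RT 500.4 is also kept although its parent MS1 spectrum was removed, because only the MS1 set is filtered.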