The application of machine learning techniques to protein folding prediction has improved
dramatically with the development of AlphaFold 2, and the predicted models are now accurate enough to
be highly useful for determining protein structures.
The model files currently produced through AI-driven procedures such as AlphaFold 2 and RoseTTAFold
are often not directly compatible with current phasing programs, so the ‘Process Predicted Models’
interface is provided. It creates estimated B-factors for the models and also carries out some
limited model processing. The methods used for this task can be found in the CCTBX library,
which is used by i2.
The main input to the task is a model file, which will typically have been generated by AlphaFold 2
or RoseTTAFold from a given sequence file. After importing the model file, the user should set the
input column type appropriately (for standard files from AlphaFold 2 this is typically pLDDT, and for
RoseTTAFold files it will be rmsd). The user can also choose to use a file which already contains B-factors
in order to remove low-confidence regions and/or split the model. Users should note that if they have a
file with multiple models, the first model in the file (which by standard convention is typically the best
estimate) will be used.
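To decide which column type applies, it can help to look at the values stored in the B-factor column of the first model. The following is a minimal sketch that uses only the fixed-column PDB layout (temperature factor in columns 61-66); the function name `first_model_scores` is hypothetical and is not part of i2 or CCTBX.

```python
def first_model_scores(pdb_text):
    """Return {(chain, residue number): value} for the CA atoms of the first model.

    The value is whatever the B-factor column holds (pLDDT, rmsd or a B-factor).
    """
    scores = {}
    for line in pdb_text.splitlines():
        # Stop at the end of the first model, mirroring the rule described above.
        if line.startswith("ENDMDL"):
            break
        # PDB columns are fixed width: atom name 13-16, chain 22,
        # residue number 23-26, temperature factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            scores[key] = float(line[60:66])
    return scores
```

Values spread between roughly 0 and 100 suggest pLDDT, while small positive values in Angstrom suggest rmsd; this is only a rule of thumb and the file's origin remains the reliable guide.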
It is very important that the appropriate setting is used for the B-factor conversion; if this is set
incorrectly for the file, the conversion will not work. If you see an error reported, this may be the reason.
It is also possible to add additional files, which can provide more information on the model.
A PAE (Predicted Aligned Error) file can be given, which provides error estimates for residues relative
to the other residues in a structure. A distance model file may also be provided; if such a file is given,
the user is advised that weighting by C-alpha distance should be switched on in the
additional settings tab.
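As an illustration of what a PAE file contains, the sketch below reads a PAE matrix and averages it over a block of residues. It assumes the JSON layout used by current AlphaFold Database downloads (a list holding one object with a `predicted_aligned_error` matrix); other pipelines may write a different layout. The helper names `load_pae_matrix` and `mean_pae` are hypothetical and do not reflect how the task itself reads the file.

```python
import json


def load_pae_matrix(json_text):
    """Return the PAE matrix (list of rows, values in Angstrom) from JSON text.

    Assumes the key "predicted_aligned_error"; some files wrap the object in a list.
    """
    data = json.loads(json_text)
    if isinstance(data, list):
        data = data[0]
    return data["predicted_aligned_error"]


def mean_pae(matrix, rows, cols):
    """Average PAE over the block defined by zero-based row and column indices."""
    values = [matrix[i][j] for i in rows for j in cols]
    return sum(values) / len(values)
```

A low mean PAE between two stretches of residues indicates that their relative placement is predicted with confidence, whereas a high value suggests they may be better treated as separate regions.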
The user may also, if required, turn off the options which remove low-confidence regions of the model and
split the trimmed model into regions. The intended size of these regions can be tuned with
parameters found under the additional settings tab.
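The idea of trimming and splitting can be sketched as follows. This is a simplified illustration, not the CCTBX procedure: it drops residues below a confidence cutoff and then groups the survivors by sequence contiguity only, whereas the task groups residues into regions using structural information. The function name `trim_and_split` and the default values for `min_plddt` and `min_length` are hypothetical.

```python
def trim_and_split(plddt, min_plddt=70.0, min_length=3):
    """Drop low-confidence residues and return contiguous segments.

    plddt maps residue number -> pLDDT (0-100 scale).
    Returns a list of (first residue, last residue) tuples.
    """
    # Keep only residues at or above the confidence cutoff.
    kept = sorted(r for r, value in plddt.items() if value >= min_plddt)

    # Group the surviving residues into runs of consecutive numbers.
    segments, current = [], []
    for residue in kept:
        if current and residue != current[-1] + 1:
            segments.append(current)
            current = []
        current.append(residue)
    if current:
        segments.append(current)

    # Discard fragments shorter than the requested minimum length.
    return [(s[0], s[-1]) for s in segments if len(s) >= min_length]
```

For example, a ten-residue model with two short low-confidence stretches would come back as two segments, with any isolated confident residue removed by the length filter.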
In addition to the basic settings described above, it is also possible to fine-tune the treatment of the
resulting model files. Domain selection can be fine-tuned using these settings, as can
the confidence limits for resolution.
There are also two boxes which cover settings for when a PAE file is provided and for when a
distance model file is provided.
The output files are given as a set of PDB model files. Note that there can be numerous files, depending
on the settings and on how the model has been split up into domains.
The methodology used to convert the pLDDT and rmsd values (output by AlphaFold 2 and RoseTTAFold respectively)
into B-factors for use in MR is described in the RoseTTAFold paper:
Accurate prediction of protein structures and interactions using a three-track neural network, Baek M., et al., Science, Vol. 373, Issue 6557, pp. 871-876 (2021).
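A minimal sketch of such a conversion is given below. The relation B = 8π²·rmsd²/3 is the standard link between an isotropic B-factor and a three-dimensional rms positional error. The empirical pLDDT-to-rmsd expression and the upper limit `b_max` are assumptions made for illustration and should be checked against the paper and the CCTBX source; the function names are hypothetical.

```python
import math


def plddt_to_rmsd(plddt_percent):
    """Estimate rms error (Angstrom) from pLDDT on a 0-100 scale.

    Assumed empirical form: rmsd = 1.5 * exp(4 * (0.7 - pLDDT/100)).
    """
    fraction = plddt_percent / 100.0
    return 1.5 * math.exp(4.0 * (0.7 - fraction))


def rmsd_to_b(rmsd, b_max=999.0):
    """Convert an rms positional error (Angstrom) into an isotropic B-factor.

    B = 8 * pi^2 * rmsd^2 / 3; the cap b_max is an assumption chosen so the
    value still fits the fixed-width PDB B-factor column.
    """
    return min(b_max, 8.0 * math.pi ** 2 * rmsd ** 2 / 3.0)
```

Under these assumptions a residue with pLDDT 70 maps to an rmsd of 1.5 Å, and higher pLDDT values give smaller B-factors. This also shows why the column type matters: treating rmsd values as pLDDT, or the reverse, feeds the wrong quantity into the formula.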