Process Predicted Models

The application of machine learning techniques to protein folding prediction has improved dramatically with the development of AlphaFold 2, and the predicted models are now accurate enough to be highly useful for determining protein structures.

The model files currently produced through AI-driven procedures (AlphaFold 2 and RoseTTAFold) are often not directly compatible with current phasing programs. The ‘Process Predicted Models’ interface is therefore provided; it creates estimated b-factors for the models and also carries out some limited model processing. The methods used for this task can be found in the CCTBX library, which is used by i2.

Input Data

The main input to the task is a model file, which will typically have been generated by AlphaFold 2 or RoseTTAFold from a given sequence file. After importing the model file, the user should set the input column type appropriately (for standard files from AlphaFold 2 this is typically pLDDT, and for RoseTTAFold files it is rmsd). The user can also choose a file that already contains b-factors, in order to remove low confidence regions and/or split the model. Note that if a file contains multiple models, the first model in the file (which by standard convention is typically the best estimate) will be used.

Figure 1: Input data panel.

It is very important that the appropriate setting is used for the b-factor conversion: if it is set incorrectly for the file, the conversion will not work. If an error is reported, this may be the reason.
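The effect of the column type setting can be illustrated with a minimal Python sketch of the conversion. It assumes the empirical relations rmsd = 1.5 * exp(4 * (0.7 - lddt)), with lddt on a 0 to 1 scale, and B = (8 * pi^2 / 3) * rmsd^2; the function names are hypothetical and this is not the CCTBX code itself.

```python
import math


def plddt_to_rmsd(plddt, fractional=False):
    """Estimate the coordinate error (Angstrom) from a pLDDT score.

    Assumed empirical relation: rmsd = 1.5 * exp(4 * (0.7 - lddt)),
    where lddt is on a 0-1 scale. Scores on a 0-100 scale are rescaled.
    """
    lddt = plddt if fractional else plddt / 100.0
    if not 0.0 <= lddt <= 1.0:
        raise ValueError("pLDDT is outside the expected range: %r" % plddt)
    return 1.5 * math.exp(4.0 * (0.7 - lddt))


def rmsd_to_b_factor(rmsd):
    """Convert an rmsd estimate (Angstrom) to an isotropic b-factor."""
    return (8.0 * math.pi ** 2 / 3.0) * rmsd ** 2


def plddt_to_b_factor(plddt, fractional=False):
    """Chain the two steps: pLDDT -> rmsd estimate -> b-factor."""
    return rmsd_to_b_factor(plddt_to_rmsd(plddt, fractional))
```

Under these assumed formulas, a pLDDT of 70 corresponds to an estimated error of 1.5 Å and a b-factor of about 59 Å². A column of rmsd values read as pLDDT (or the reverse) goes through the wrong branch of this chain, which is why an incorrect setting gives unusable b-factors or an error.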

Additional files can also be supplied to provide more information on the model. A PAE (Predicted Aligned Error) file can be given, which provides error estimates for each residue relative to the other residues in the structure. A distance model file may also be provided; if such a file is given, the user is advised to switch on weighting by C alpha distance in the additional settings tab.
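To show what a PAE file contains, the following Python sketch reads a PAE matrix and averages the error between two groups of residues. It assumes the JSON layout used by the AlphaFold Database (a "predicted_aligned_error" square matrix, optionally wrapped in a one-element list); other programs may write a different layout, and the function names are hypothetical rather than part of the task.

```python
import json


def load_pae_matrix(json_text):
    """Return the PAE matrix (list of rows) from PAE JSON text.

    Assumes a "predicted_aligned_error" key holding a square matrix,
    either at the top level or inside a one-element list.
    """
    data = json.loads(json_text)
    if isinstance(data, list):
        data = data[0]
    matrix = data["predicted_aligned_error"]
    size = len(matrix)
    if any(len(row) != size for row in matrix):
        raise ValueError("PAE matrix is not square")
    return matrix


def mean_pae_between(matrix, group_a, group_b):
    """Mean PAE between two groups of zero-based residue indices.

    PAE is not symmetric, so both directions are included.
    """
    values = [matrix[i][j] for i in group_a for j in group_b]
    values += [matrix[j][i] for i in group_a for j in group_b]
    return sum(values) / len(values)
```

A low mean PAE between two groups of residues suggests that their relative placement is predicted with confidence, whereas a high value suggests that they may be better treated as separate regions.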

If required, the user may also turn off the options that remove low confidence regions of the model and that split the trimmed model into regions. The intended size of these regions can be tuned with parameters found under the additional settings tab.
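The trimming step can be pictured with a short Python sketch that drops residues below a confidence threshold and reports the contiguous stretches that remain. The threshold of 70 and the function names are illustrative assumptions. The task itself splits the trimmed model into regions using the CCTBX methods, not the simple sequence-contiguity rule shown here.

```python
def trim_low_confidence(plddt_by_residue, minimum_plddt=70.0):
    """Return sorted residue numbers whose pLDDT meets the threshold.

    plddt_by_residue maps residue number -> pLDDT on a 0-100 scale.
    The default threshold is an illustrative value only.
    """
    return [
        resno
        for resno, score in sorted(plddt_by_residue.items())
        if score >= minimum_plddt
    ]


def contiguous_segments(residue_numbers):
    """Group sorted residue numbers into (first, last) contiguous runs."""
    segments = []
    for resno in residue_numbers:
        if segments and resno == segments[-1][1] + 1:
            # Extend the current run by one residue.
            segments[-1] = (segments[-1][0], resno)
        else:
            # Start a new run after a gap left by trimming.
            segments.append((resno, resno))
    return segments


scores = {1: 90.0, 2: 85.0, 3: 40.0, 4: 30.0, 5: 75.0, 6: 92.0}
kept = trim_low_confidence(scores)
print(contiguous_segments(kept))  # prints [(1, 2), (5, 6)]
```

Turning off the removal of low confidence regions corresponds to keeping every residue, in which case this example would give a single run covering residues 1 to 6.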

Additional Settings

Figure 2: Additional Settings tab.

In addition to the basic settings described above, it is also possible to fine-tune the treatment of the resulting model files. These settings control the fine-tuning of domain selection, as well as the confidence limits for resolution.

There are also two boxes containing settings that apply when a PAE file is provided and when a distance model file is provided.

Results

The output is a set of PDB model files. Note that there can be numerous files, depending on the settings and on how the model has been split into domains.

Reference

The methodology used to convert the pLDDT and rmsd values (output by AlphaFold 2 and RoseTTAFold) into b-factors for use in molecular replacement (MR) is described in the RoseTTAFold paper.

  1. Accurate prediction of protein structures and interactions using a three-track neural network, Baek M., et al., Science, Vol. 373, Issue 6557, pp. 871-876 (2021).