Usage

The program can be run from terminal, with the command:

python -m tscode myinput.txt -n [custom_title]

A custom name for the run can be optionally provided with the -n flag, otherwise a time stamp will be used to name the output files.

It is also possible, for simple runs, to avoid creating an input file, and writing instruction in a string after the -cl/--command_line argument:

python -m tscode -cl "csearch> molecule.xyz"
python -m tscode --command_line "csearch> molecule.xyz"

In this case, an input file will be written for you by the program.

Input formatting

The input can be any text file. The extension is arbitrary but I suggest sticking with .txt.

Any blank line will be ignored
Any line starting with # will be ignored
Keywords, if present, need to be on first non-blank, non-comment line

Then, molecule files are specified. A molecule line is made up of these elements, in this order:

An operator (optional)
The molecule file name (required)
Indices (numbers) and pairings (letters) for the molecule (optional)

An example with all three is opt> butadiene.xyz 6a 8b.

TSCoDe can work with all molecular formats read by cclib, but best practice is using only the .xyz file format, particularly for multimolecular files containing different conformers of the same molecule. Molecule indices are counted starting from zero!

Operators

The first step of every run is the execution of the requested operators. See the operators page to see the full set of tools available.

Embedding runs

Then, if the input you provided is consistent with an embedding, one will be carried out. Embeddings can be of six kinds:

monomolecular - One molecule, two reactive atoms (i.e. Cope rearrangements)
dihedral - One molecule, four reactive atoms (i.e. racemization of BINOL)
string - Two molecules, one reactive atom each (i.e. SN2 reactions)
chelotropic - Two molecules, one with a single reactive atom and the other with two reactive atoms (i.e. epoxidations)
cyclical (bimolecular) - Two molecules, two reactive atoms each (i.e. Diels-Alder reactions)
cyclical (trimolecular) - Three molecules, two reactive atoms each (i.e. reactions where two partners are bridged by a carboxylic acid like the example above)

Reactive atoms supported include various hybridations of C, H, O, N, P, S, F, Cl, Br and I. Many common metals are also included (Li, Na, Mg, K, Ca, Ti, Rb, Sr, Cs, Ba, Zn), and it is easy to add more if you need them (from reactive_atoms_classes.py).

Embeds Infographic — *Colored dots represent the imposed atom pairings.*

Pairings

After each reactive index, it is possible to specify a pairing letter representing the “flag” of that atom. If provided, the program will only yield poses that respect these atom pairings. It is also possible to specify more than one flag per atom, useful for chelotropic embeds - i.e. the hydroxyl oxygen atom of a peracid, as 4ab.

Good practice and suggested options (work in progress)

When modeling a reaction through TSCoDe, I suggest following these guidelines. Not all of them apply to all embed types, but they will surely help in leveraging the program in the best way.

1) Assess that the reaction is supported by TSCoDe, and plan on what the input will look like. See Input formatting above for help.

1) Obtain molecular structures in .xyz format. If more conformers are provided, they must be in a multimolecular .xyz file, and atom ordering must be consistent throughout all structures. Otherwise, they will just be skipped by the module used to read molecular files (cclib).

2) If a given molecule is present in the transition state, but it is not strictly involved in bonds breaking/forming, then that molecule needs to be pre-complexed to the one with which it is interacting. That is, the bimolecular complex should be used. This can be the case for non-covalent catalysts. For example, if the reaction between a ketone and a metal enolate is catalyzed by a thiourea that activates the ketone carbonyl group, then the TSCoDe modelization of the reaction should be bimolecular. The first molecule is the ketone-thiourea interaction complex while the second one is the metal enolate.

3) Make sure to use the csearch> and/or csearch_hb> operators or provide conformational ensembles obtained with other software. Note that the CSearch implementation here is meant to be fast, scalable, and efficient, and is not able to sample ring conformations. The most effective conformational search tool (although usually slower) is mtd_search> which runs a metadynamic-based conformational search through CREST.

4) Understand what atoms are reacting for each structure and record their index (starting from 0!). If you are unsure of reactive atomic indices, you can run a test input without indices, and the program will ask you to manually specify them from the ASE GUI by clicking. This is not possible if you are running TSCoDe on STPs with no desktop access. When choosing this option of manually picking atoms, it is not possible to specify atom pairings. Therefore, I suggest using this option only to check the reactive atoms indices and then building a standard input file.

5) Optionally, after specifying reactive indices, the CHECK keyword can be used. A series of pop-up ASE GUI windows will be displayed, showing each molecule with a series of red dots around the reactive atoms chosen. This can be used to check “orbital” positions or conformer reading faults (scroll through conformers with page-up and down buttons). Program will terminate after the last visualization is closed.

6) TSCoDe default parameters are tentatively optimized to yield good results for the most common situations. However, if you have more information about your system, specifying details of the pairings and options for your system is likely to give better results. For example, embedding trimolecular TSs without imposed pairings generates about 8 times more structures than an embed with defined pairings. Also, if reactive atoms distances in the transition state is known, using the DIST keyword can yield structures that are quite close to ones obtained at higher levels of theory. If no pairing distances are provided, a distance guess is performed based on the atom type (defaults are editable in the parameters.py file).

7) If the reaction involves big molecules, or if a lot of conformations are to be used, a preliminar run using the NOOPT keyword may be a good idea to see how many structures are generated and would require optimization in a standard run.

8) If TSCoDe does not find any suitable candidate for the given reacion, this could be bacause of different factors. An option is to try the SHRINK keyword (see keywords section). Note that SHRINK optimization could be a little slower, as distance refinement through ASE will require more iterations to reach the target distances. Note that the use of RIGID, while speeding up cyclical embeds considerably, could jeopardize finding some transition state arrangements.

9) If you are not sure about what to do, or have any other questions I will be happy to have a chat with you. Send me an email here.