Chemists routinely synthesize huge numbers of different compounds by parallel synthesis and combinatorial chemistry, and every one of these compounds must be verified. Confirming chemical identity and molecular formula helps identify possible impurities. This is typically undertaken by a screening approach that gives rough estimates highlighting purity levels or incorrect identity.
It is often impossible to use all samples at the same time, so monitored storage is required. The multiple spectroscopic techniques used all demand manual inspection of the spectra, a very cost-intensive process that requires a breadth of human expertise in spectral interpretation that is hard to come by. The ideal solution would be automated and would integrate the different methods.
Complete molecular confidence (CMC) is a concept for a fully integrated NMR/LC-MS-based solution, optimally complemented by X-ray spectroscopy, that supports molecular formula determination and automated structure verification for small molecules and natural products. CMC assists the analysis of combined data from these information-rich spectroscopic techniques and outputs the molecular profile as a compact report that delivers a probability for the verification of the proposed molecular structure, approximate purity information and the quantity of the sample.
From Mass Spectrum to Molecular Formula
This signals a new dimension in compound verification. Molecular formulas can be calculated directly from the MS spectrum. Because mass spectra do not automatically convey elemental information, data analysis tools are necessary to extract the information inherent in MS spectra and provide molecular formula candidates. The software program typically combines measured mass information and expected mass accuracy with valence and electronic configurations to produce a list of potential candidates near the measured mass. Mass accuracy is an essential parameter to limit the number of potential candidates. Figure 1 shows the plot of the number of possible sum formulae generated by a generic formula generator using maximum composition (C200H400N50O50S5P5) against the compound mass.
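The effect of mass accuracy on the number of candidates can be illustrated with a minimal sketch. It assumes a brute-force search over C, H, N and O only, a neutral monoisotopic mass, and no valence or electronic-configuration filtering; the function name and the element limits are hypothetical and are not taken from any vendor software.

```python
from itertools import product

# Monoisotopic masses in u for the lightest stable isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELEMENTS = ("C", "H", "N", "O")

def candidate_formulae(measured_mass, ppm_tol, max_counts):
    """Return all (C, H, N, O) count tuples whose monoisotopic mass lies
    within ppm_tol of measured_mass. Assumption: neutral molecule and no
    chemical plausibility checks, so the list is deliberately over-complete."""
    window = measured_mass * ppm_tol * 1e-6
    ranges = [range(max_counts[e] + 1) for e in ELEMENTS]
    hits = []
    for counts in product(*ranges):
        mass = sum(n * MONO[e] for n, e in zip(counts, ELEMENTS))
        if abs(mass - measured_mass) <= window:
            hits.append(counts)
    return hits

# Hypothetical search limits, far smaller than the maximum composition in the text.
limits = {"C": 15, "H": 30, "N": 5, "O": 10}
narrow = candidate_formulae(180.06339, 5, limits)    # 5 ppm tolerance
wide = candidate_formulae(180.06339, 500, limits)    # 500 ppm tolerance
print(len(narrow), len(wide))
```

With the tight tolerance the list still contains C6H12O6, while the loose tolerance also admits unrelated compositions such as C9H8O4. A real formula generator would additionally apply the valence rules mentioned above.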
Unique ESI-TOF Technology
Ideally, an instrument that can provide both high mass accuracy and stable, true isotopic pattern (TIP) information could provide greater information content. This technical concept allows for a two-dimensional analytical method: a combination of accurate mass determination with the analysis of the isotopic distribution. Combining both complementary data is essential to find the correct formula for the elemental composition.
Maximum certainty in small molecule identification requires a resolution of typically 15,000–20,000, and a high acquisition speed of 20 spectra/s is mandatory to cope with ultra-fast chromatography systems. Mass resolution and mass accuracy have to be maintained in all scan modes and at all speeds, in MS as well as in MS/MS. The outstanding dynamic range is related to the fast repetition rate of the TOF (5,000–20,000 Hz) and the adequate analogue-to-digital-conversion (ADC) technique. This allows exact mass determination over the whole dynamic range, as it is not compromised by the dead-time effects found in the more common time-to-digital-conversion (TDC) technique.
The ADC technique combined with high resolution TOF-MS is also important for the accuracy of the relative intensities of the isotopic peaks. In MS/MS spectra generated with the micrOTOF-Q II, the isotopic pattern information and the accuracy are also retained in the fragment ions. Sum formula proposals can be made for the fragment ions in the same way as for MS spectra, which adds a third level of confidence.
The SmartFormula approach considers the isotopic pattern distribution for MS spectra. After generation of a list of all possible formulae for a selected mass of an LC-MS peak, the measured isotopic pattern is compared with the theoretical isotopic pattern, resulting in a statistical match factor, the sigma factor (σ). The sigma factor reported is simply the statistical variance between the measured and theoretical isotopic profile based on the intensity values of the peaks in the pattern. This comparison is done for all the generated molecular formulae. The sigma value is then used to rank the formulae.
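The pattern comparison can be sketched as follows. This is not the SmartFormula algorithm, whose exact definition is not given here; it assumes a simple root-mean-square deviation between normalised peak intensities as a stand-in for the sigma factor, and the theoretical patterns are illustrative values rather than calculated ones.

```python
import math

def normalise(intensities):
    """Scale a list of peak intensities so that they sum to 1."""
    total = sum(intensities)
    return [i / total for i in intensities]

def sigma_fit(measured, theoretical):
    """Assumed match factor: RMS deviation between two isotopic patterns
    of equal length after normalisation. Lower values mean a better match."""
    m, t = normalise(measured), normalise(theoretical)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(m, t)) / len(m))

def rank_by_sigma(measured, candidates):
    """candidates maps a formula label to its theoretical pattern.
    Returns (sigma, label) pairs sorted from best to worst match."""
    return sorted((sigma_fit(measured, p), name) for name, p in candidates.items())

# Illustrative relative intensities for M, M+1 and M+2 (not calculated values).
measured = [100.0, 6.9, 1.4]
candidates = {
    "C6H12O6": [100.0, 6.8, 1.4],
    "C9H8O4": [100.0, 10.0, 1.3],
}
ranking = rank_by_sigma(measured, candidates)
print(ranking[0][1])
```

Sorting by the match factor puts the candidate whose theoretical pattern is closest to the measurement at the top of the list.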
Enhanced Probability-Based Concept
An isotopic pattern is described by three characteristic properties: the mass position of the peaks in the pattern; the peak intensities; and the peak distances within the pattern. Because of the precise isotopic patterns delivered by the MS instrument, it is possible to combine the mass position, the distance between the isotope peaks and their intensity into an integrated scoring with higher confidence for the individual hits. This procedure results in a scoring value which can be used to reduce the list of hypothetical formulas based on a true quality criterion, reflecting the quality of the matching property of the true isotopic profile for the individual molecular formula.
In many cases there are several molecular formulae with well-matched properties. It is not sufficient to consider only the one hit with the best scoring factor, because a few hits may satisfy the overall criteria. The overall ranking of the formula candidates has to be extended into a probability-based scoring concept, modelling the distributions of mass accuracy and true isotopic pattern matching for a number of possible candidates. Considering all of the generated formula candidates using Bayesian statistical modelling of the deviations, a score value with a range of 0 to 100 can be derived.
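One way to picture such a probability-based score is the sketch below. It is an assumption-laden illustration, not the published model: the mass error and the pattern deviation are each treated as independent zero-mean Gaussian deviations, all candidates get equal prior weight, and the standard deviations are hypothetical defaults.

```python
import math

def gaussian_likelihood(deviation, sd):
    """Unnormalised Gaussian likelihood of a deviation from zero."""
    return math.exp(-0.5 * (deviation / sd) ** 2)

def probability_scores(candidates, mass_sd_ppm=2.0, pattern_sd=0.01):
    """candidates: list of (name, mass_error_ppm, pattern_deviation).
    Returns {name: score}; the scores of all candidates sum to 100.
    Assumptions: independent Gaussian errors and equal priors."""
    weights = {
        name: gaussian_likelihood(ppm, mass_sd_ppm)
        * gaussian_likelihood(dev, pattern_sd)
        for name, ppm, dev in candidates
    }
    total = sum(weights.values())
    return {name: 100.0 * w / total for name, w in weights.items()}

# Hypothetical deviations for three formula candidates.
scores = probability_scores([
    ("A", 0.5, 0.004),
    ("B", 1.5, 0.012),
    ("C", 3.0, 0.030),
])
print(scores)
```

Because the score is normalised over the whole candidate list, a candidate scores highly only when it fits clearly better than its competitors, which is the point of moving beyond a single best match factor.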
Precision in Formula Generation: True Isotopic Pattern of Fragments
A new approach uses the accurately measured mass together with the accurately measured isotopic pattern (see note). It can generate a confident list of formulae simultaneously for the precursor ion and all fragment ions. In contrast to previously known algorithms, both candidate lists are already drastically pruned, which reduces the time needed to evaluate all possible relationships between the potential precursor formulae and the related product ions.
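The relationship check between precursor and fragment candidates can be sketched with one simple rule, assumed here for illustration: a fragment formula is only plausible if it contains no more atoms of any element than the precursor formula. The function names and candidate lists are hypothetical.

```python
def is_subformula(fragment, precursor):
    """True if every element count in fragment is <= the count in precursor."""
    return all(n <= precursor.get(el, 0) for el, n in fragment.items())

def consistent_pairs(precursor_candidates, fragment_candidates):
    """Return all (precursor, fragment) pairs that pass the sub-formula rule."""
    return [
        (p, f)
        for p in precursor_candidates
        for f in fragment_candidates
        if is_subformula(f, p)
    ]

# Hypothetical pruned candidate lists, formulae given as element-count dicts.
precursors = [{"C": 6, "H": 12, "O": 6}, {"C": 9, "H": 8, "O": 4}]
fragments = [{"C": 6, "H": 10, "O": 5}, {"C": 7, "H": 4, "O": 3}]
pairs = consistent_pairs(precursors, fragments)
print(len(pairs))
```

The number of pairs to test is the product of the two list lengths, which is why pruning both lists beforehand by mass and isotopic pattern shortens the evaluation.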
From Molecular Formula Determination to Automated Structure Verification
A typical task for an analytical service laboratory in the pharmaceutical industry is to answer three questions:
- Did I really synthesise the correct compound?
- What is the compound purity level?
- What are the impurities?
These can be answered using an automatic compound verification concept. Using this technique, the chemist only needs to provide the molecular formula of the anticipated compound.
NMR is a complementary, fully quantitative technique that yields the concentration as well as an estimate of purity. The same principle can be used to determine the water content in DMSO samples that, for example, originate from typical liquid-store repositories in the pharmaceutical industry.
Using 100 mm long NMR tubes in well-plate format enables simple preparation by a commercial liquid handler, and an automated sample changer enables high throughput. The first step in the verification of a compound is the acquisition and evaluation of a 1H spectrum. For this analysis, a 1H spectrum is predicted for the proposed structure based on the supplied MOL or SDF file using a chemical shift and spectral prediction tool. In the subsequent automatic consistency analysis (ACA), the predicted and experimentally acquired spectra are compared and, in an iterative process, the predicted spectrum is adapted to the experimental data. The likelihood that the experimental data correspond to the proposed structure is reported. This is based on the number of changes that were applied to the predicted spectrum and the similarity of the iterated and experimental spectra with respect to peak position, integrals and coupling constants. Signals arising from the solvent are also taken into account.
This analysis also works for more complex spectra, as can be shown with the example of strychnine. However, the limited chemical shift range of 1H data for larger or more complex molecules can lead to an overlap of signals. Conclusions, therefore, become increasingly uncertain. For this purpose, an HSQC spectrum is typically acquired. This separates out the signals in a second dimension and, in addition, provides larger spectral dispersion through the 13C information. In contrast to traditional 13C spectra, an HSQC experiment provides quick access to proton-carrying 13C resonances at low sample concentration, making it the experiment of choice for screening methods.
The analysis of the 2D file is also automated. The predicted ranges and experimental data are overlaid and the detected signals are assigned to the predicted ranges. The result is a matching factor that describes the similarity of the experimental and theoretical data.
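The overlay and assignment step can be sketched as below. The actual matching factor is not defined in the text, so this sketch assumes one plausible definition: the fraction of predicted 1H/13C ranges that contain at least one detected peak. All names and shift values are hypothetical.

```python
def match_factor(predicted_ranges, peaks):
    """predicted_ranges: list of (h_min, h_max, c_min, c_max) in ppm.
    peaks: list of detected (h_ppm, c_ppm) cross peaks.
    Returns (factor, assignments), where assignments maps each range index
    to the peaks falling inside it and factor is the assumed matching
    factor: the fraction of ranges with at least one assigned peak."""
    assignments = {}
    for idx, (h_lo, h_hi, c_lo, c_hi) in enumerate(predicted_ranges):
        assignments[idx] = [
            (h, c) for h, c in peaks
            if h_lo <= h <= h_hi and c_lo <= c <= c_hi
        ]
    matched = sum(1 for hits in assignments.values() if hits)
    return matched / len(predicted_ranges), assignments

# Hypothetical predicted ranges and detected HSQC peaks.
ranges = [(7.0, 7.6, 125.0, 132.0), (3.5, 4.0, 50.0, 58.0), (1.0, 1.5, 12.0, 20.0)]
peaks = [(7.3, 128.5), (3.7, 55.1), (2.1, 30.0)]
factor, assignments = match_factor(ranges, peaks)
print(round(factor, 2))
```

In this example two of the three predicted ranges are matched, and the unassigned peak at (2.1, 30.0) would be a candidate impurity or a sign of an incorrect structure.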
Quantification and Purity
In contrast to LC-MS, the signal integral in an NMR spectrum correlates directly with the amount of detected compound, as long as a few simple experimental parameters are respected. The only additional sample information required is the number of protons that contribute to the NMR signal used for the quantification.
With a proposed structure, this information is available through the predicted NMR spectra from the PERCH routine, which automatically assigns each signal to a structure element. Even without a structure, the NMR spectrum by itself provides enough information in most cases to identify the structure element of at least one signal group. This is already sufficient for the quantification.
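The role of the proton count in quantification can be shown with a short sketch. It assumes that a reference signal of known concentration is available in the same spectrum, which the text does not specify; the function name and the numbers are hypothetical.

```python
def concentration(integral, n_protons, ref_integral, ref_n_protons, ref_conc_mM):
    """Concentration of the analyte in mM from one NMR signal.
    The integral per proton is proportional to concentration, so the
    analyte is scaled against a reference signal of known concentration
    (assumption: same spectrum, fully relaxed, comparable conditions)."""
    per_proton = integral / n_protons
    ref_per_proton = ref_integral / ref_n_protons
    return ref_conc_mM * per_proton / ref_per_proton

# Hypothetical values: a methyl signal (3 protons) with integral 3.0,
# measured against a 1-proton reference signal with integral 2.0 at 10 mM.
c = concentration(3.0, 3, 2.0, 1, 10.0)
print(c)
```

Only the proton count of the single signal used is needed, which is why identifying one signal group is sufficient even when the full structure is unknown.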
The CMC concept includes a solution package for computer-assisted compound quantification. This solution package includes all the necessary parameters for the setup, the acquisition, the processing and the analysis of the NMR spectra. It supports the use of samples of minute quantity stored in normal (that is, nondeuterated) DMSO, as is most commonly used in compound libraries in the pharmaceutical industry.
For data interpretation, an interpretation wizard based on human-logic emulation is supplied, which analyses a 1D NMR spectrum in much the same way as a human spectroscopist would. This wizard assists data interpretation for compound quantification by suggesting proton numbers for certain signals in the spectrum, and thus determines concentrations from parameter-free spectral interpretation without the need to know the chemical structure.
The analysis assistant's user interface allows one to browse through, check or modify these suggested results in a very simple way. At the end of such an analysis run, a comprehensive report is generated on the complete set of analyzed compounds. A by-product of the identification and quantification is the purity. From the prediction of the NMR spectrum, the integral regions of the NMR signals of the compound of interest are known. A relative comparison of this integral and the integral of the total NMR spectrum can be used as a measure of purity.
The addition of a fully automated benchtop X-ray diffractometer provides accurate and precise measurements of molecular dimensions, which are highly useful for new synthetic chemicals, catalysts, pharmaceuticals, natural products and many more.
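The relative integral comparison used as a purity measure can be sketched in a few lines. The sketch assumes that solvent signals have already been excluded and that the result is an integral fraction rather than a molar or mass purity; region labels and values are hypothetical.

```python
def relative_purity(integrals, compound_regions):
    """integrals: {region_label: integral} for the whole spectrum.
    compound_regions: labels of the regions predicted for the compound.
    Returns the compound's share of the total integral in percent
    (assumption: solvent regions are not included in integrals)."""
    total = sum(integrals.values())
    compound = sum(integrals[r] for r in compound_regions)
    return 100.0 * compound / total

# Hypothetical integrated regions of a 1H spectrum.
integrals = {"aromatic": 5.0, "OCH3": 3.0, "unassigned": 0.5}
purity = relative_purity(integrals, ["aromatic", "OCH3"])
print(round(purity, 1))
```

Any signal intensity that falls outside the predicted regions lowers the value, so the figure is best read as a screening estimate and not as a certified purity.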