Introduction
To predict such topologies, several algorithms have been developed. Quadparser (QP), developed by the Balasubramanian group, regards sequences matching the G≥nN≤mG≥nN≤mG≥nN≤mG≥n model as typical putative GQs, where each G-tract is at least n guanines long and each loop is between 1 and a given maximum of m nucleotides long (Huppert, 2005). This algorithm soon became widely accepted and was implemented in web-based tools designed specifically for GQs (Kikin et al., 2006, Menendez et al., 2012, Scaria et al., 2006). Another algorithm, from the Neidle group, scans for successive G-tracts within windows of a given size along the sequence (Todd et al., 2005). Although both algorithms are designed to find sequences fitting the conventionally accepted G-quadruplex-forming pattern, later studies showed that G-quadruplexes may also form from sequences following atypical patterns. For instance, a study by the Mergny group on the effect of loop length on G-quadruplex stability showed that GQs can tolerate a single loop of up to 30 nt, at least in vitro, provided that the G-tracts are sufficiently long (≥3) (Guédin et al., 2010). Extreme loops are not limited to artificially designed GQs; biologically relevant GQs have also been found to accommodate them. A recently discovered GQ in the BCL2 gene has been shown to accommodate a single loop of 12 nt (Onel et al., 2016). Although these studies suggest that extreme loops should be considered, QP applies the same maximum length to every loop. Another type of atypical GQ has unusual G-tracts. Several studies found that GQs may form in vitro even when G-tracts contain mismatches or are interrupted by bulging non-guanine residues (Mukundan and Phan, 2013, Tomasko et al., 2009, Kaluzhny et al., 2009). These studies indicated that G-tracts are more flexible than originally thought, tolerating a degree of such irregularity.
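To make the QP model concrete, the following is a minimal Python sketch that turns the G≥nN≤mG≥nN≤mG≥nN≤mG≥n model into a regular expression. It is not the Quadparser implementation: the function names `qp_like_pattern` and `find_pqs`, the defaults n = 3 and m = 7, and the restriction to non-overlapping matches on the given strand only are assumptions made for illustration.

```python
import re


def qp_like_pattern(n: int = 3, m: int = 7) -> re.Pattern:
    """Hypothetical regex for the G>=n N<=m G>=n N<=m G>=n N<=m G>=n model.

    n: minimum G-tract length; m: maximum loop length (each loop spans 1..m nt).
    """
    tract = f"G{{{n},}}"          # at least n consecutive guanines
    loop = f"[ACGTN]{{1,{m}}}"    # any 1..m nucleotides; the same limit for every loop
    return re.compile(tract + (loop + tract) * 3)


def find_pqs(seq: str, n: int = 3, m: int = 7) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping matches on the given strand only."""
    pattern = qp_like_pattern(n, m)
    return [match.span() for match in pattern.finditer(seq.upper())]
```

For example, `find_pqs("GGG" + "T" * 10 + "GGGTGGGTGGG")` returns an empty list with m = 7, whereas passing `m=10` returns the single span `(0, 24)`. Raising m to catch the one long loop, however, loosens the limit on all three loops at once, which is the limitation discussed above.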
To improve the identification of putative GQ-forming sequences (GQFS), including the atypical GQs mentioned above, other approaches have been developed. G4Hunter (G4H), from the Mergny group, and another algorithm score sequences according to guanine density and repetition instead of searching for a particular pattern (Beaudoin et al., 2014, Bedrat et al., 2016). Although G4H showed increased sensitivity over QP, its specificity is lower, leaving room for improvement. Another algorithm, ImGQfinder, scans for a QP-like pattern that allows a single mismatch or bulge but ignores the other atypical GQ feature, the extreme loop (Varizhuk et al., 2014). PQSFinder (PQSF) is an alternative algorithm, written in R, designed to include imperfect G-tracts and to achieve greater accuracy than the earlier algorithms (Hon et al., 2017). PQSF requires arbitrarily selected penalties for imperfect G-tracts to calculate an overall score, and it does not consider extremely long loops. Here we describe a novel algorithm, G4Catchall (G4C), which searches for patterns that incorporate atypical GQ features, and we show that G4C improves on other algorithms in specificity and sensitivity on a set of experimentally confirmed GQ-forming and non-GQ-forming sequences. The software is available as a Python package from http://github.com/odoluca/G4Catchall and as a PHP-based web tool at http://homes.ieu.edu.tr/odoluca/G4Catchall/.
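The density-based scoring used by G4H can be sketched as follows. This is an illustrative re-implementation, not the G4Hunter code. It assumes the published run-length scheme, in which each G in a run of k guanines scores min(k, 4), each C in a run of k cytosines scores -min(k, 4), and other bases score 0, with a window summarised by its mean score. The window size of 25 nt and the threshold of 1.2 are assumed defaults, and the function names are hypothetical.

```python
def g4hunter_scores(seq: str) -> list[int]:
    """Per-nucleotide scores: G runs of length k give min(k, 4), C runs give -min(k, 4)."""
    seq = seq.upper()
    scores: list[int] = []
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        while j < len(seq) and seq[j] == base:  # extend the homopolymer run
            j += 1
        run = j - i
        if base == "G":
            value = min(run, 4)
        elif base == "C":
            value = -min(run, 4)
        else:
            value = 0  # A, T and any other symbol contribute nothing
        scores.extend([value] * run)
        i = j
    return scores


def g4hunter_mean(seq: str) -> float:
    """Mean score of a whole sequence; positive values indicate G-richness."""
    scores = g4hunter_scores(seq)
    return sum(scores) / len(scores) if scores else 0.0


def g4hunter_windows(seq: str, window: int = 25,
                     threshold: float = 1.2) -> list[tuple[int, float]]:
    """Slide a fixed-size window; keep (start, mean) where |mean| >= threshold.

    The absolute value lets C-rich windows (G-rich on the other strand) pass too.
    """
    scores = g4hunter_scores(seq)
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if abs(mean) >= threshold:
            hits.append((start, mean))
    return hits
```

For instance, the 21-nt human telomeric repeat `GGGTTAGGGTTAGGGTTAGGG` has twelve guanines in runs of three, giving a mean score of 36/21 ≈ 1.71. No explicit pattern is checked at any point, which is the key difference from QP.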
Results
Combining the categorical parameters (minimum G-tract length, whether extreme loops are permitted, and the number and types of permitted atypical G-tracts) yields 30 distinct combinations, referred to as models (Table 2). Each model was tested against the reference set with combinations of a maximum extreme loop length between 2 and 45 and a maximum typical loop length between 2 and 15, yielding 90,370 parameter sets. The minimum loop length was set to 1 for both loop types.
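The separation between typical and extreme loop maxima can be illustrated with a hypothetical regex sketch. In this sketch, at most one of the three loops may reach the extreme loop maximum, the other two stay within the typical maximum, and all loop minima are set to 1. It is not G4C's implementation. The function names and default values are assumptions, and atypical G-tracts (mismatches or bulges) are not modelled.

```python
import re


def extreme_loop_pattern(min_tract: int = 3, typical_max: int = 7,
                         extreme_max: int = 30, loop_min: int = 1) -> re.Pattern:
    """Hypothetical: four G-tracts where at most one loop may exceed typical_max.

    Each alternative lets a different loop use the extreme limit; the other
    two loops stay within loop_min..typical_max.
    """
    tract = f"G{{{min_tract},}}"
    typical = f"[ACGTN]{{{loop_min},{typical_max}}}"
    extreme = f"[ACGTN]{{{loop_min},{extreme_max}}}"
    alternatives = []
    for k in range(3):  # k = index of the loop allowed to be extreme
        loops = [extreme if i == k else typical for i in range(3)]
        alternatives.append(tract + loops[0] + tract + loops[1] + tract + loops[2] + tract)
    return re.compile("|".join(f"(?:{alt})" for alt in alternatives))


def find_with_extreme_loop(seq: str, **params: int) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping matches on the given strand."""
    pattern = extreme_loop_pattern(**params)
    return [match.span() for match in pattern.finditer(seq.upper())]
```

With the defaults, a candidate with a single 20-nt loop (`"GGG" + "T" * 20 + "GGGTGGGTGGG"`) is reported as the span `(0, 34)`, while a candidate with two 20-nt loops is not. Lowering `extreme_max` below 20 removes the first hit as well. Unlike raising a single shared loop limit, only one loop at a time is allowed to grow.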