Pseudorandom Vs. Random Polymers - How to Improve the Efficiency of Lithography-based Synthesis

Phillip Stafford

doi:10.31021/scbj.20181102

Research Article

Manuscript ID: SCBJ-1-102

Volume / Issue: Volume 1, Issue 1

Journal: Systems and Computational Biology Journal

Received: December 07, 2018

Accepted: December 27, 2018

Published: December 29, 2018

Author Affiliations

VP Bioinformatics, Caris Life Sciences, Phoenix, Arizona, United States

Corresponding Author

Phillip Stafford, VP Bioinformatics, Caris Life Sciences, Arizona, United States

Citation

Stafford P. Pseudorandom Vs. Random Peptides-How to Improve the Efficiency of Lithography-based Synthesis of Peptide Microarrays. Syst Comput Biol J. 2018 Dec;1(1):102

© 2018 Stafford P. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

Computer lithography has been used for decades in the electronics industry. Biochemists have used light-directed lithography with photoacids and photobases to enable BOC- and FMOC-based peptide synthesis, making it possible to create high-density peptide microarrays. By reducing the size of the features, more peptides can be produced on industry-standard microscope slides. Phage display has set the standard for the utility of probing a high-density peptide library. However, a sufficiently high-density peptide microarray can produce similar results in a few hours vs. weeks of panning and selection.

For the specific case of random peptide microarrays, we examined a method to reduce manufacturing costs by replacing perfectly random peptides with pseudorandom ones. The simplest formula for predicting the number of synthesis steps is M = AA * L, where M is the number of steps, AA is the number of different amino acids and L is the peptide length. However, when M < AA*L, peptides can no longer be perfectly random, and a sequence bias will result. M affects synthesis time, while AA and L affect performance, given that performance is directly related to the lack of sequence bias. We isolated M as a variable and tested how different values of M affect the performance of immunosignatures. Immunosignatures require random peptides to detect patterns of antibody binding. As M decreases, the peptides gain sequence bias. The immunosignature assay is used to quantify the impact of M. We first measured the predicted properties of peptide libraries for M=35, 70, 140 and 272, then we synthesized actual peptides and tested how 51 different monoclonal antibodies bound to the non-random peptides. We then tested whether Coccidioides posadasii (Valley Fever) infections could be distinguished, a very simple task for a random peptide library. At low M the disorder in the peptide library decreases, and one sees a near-monotonic signal with low information content. There is a value of M, however, where synthesis steps can be reduced without unduly compromising the information content of the resulting immunosignature. This tunable parameter should be examined as a way to reduce synthesis time and manufacturing costs for random peptide microarrays.

Keywords

in situ peptide synthesis; Photolithography; Random peptide; Immunosignature

Introduction

Many peptide microarrays are in the form of tiled epitope arrays [1]. They are commonly used to map antibodies to a segment of a protein. However, there are other uses for peptide microarrays: cell binding [2], small-molecule binding [3], protein binding [4], kinase activity [5], even enzyme modulation [6] and a process that we use called ‘immunosignaturing’ [7]. Epitope arrays rely on antibodies recognizing and binding the exact sequence against which they were raised. Immunosignatures, on the other hand, use random sequence peptides and do not rely on perfect antibody-peptide matches. On an immunosignature peptide microarray, antibodies can bind to their cognate peptide, but they can also bind apparently unrelated peptides with equal or higher affinity [8,9]. This non-cognate binding may be to mimotopes of the eliciting antigen [10]. Another explanation is that off-target binding is an intrinsic property of an antibody, theoretically as important as on-target binding. Immunosignatures detect both properties. There is a great deal of information that can be extracted from an immunosignature assay, even to the point of diagnosing disease [11-14].

Epitope arrays can be very expensive to manufacture, limiting their use to smaller experiments. Immunosignature arrays can be produced in very high numbers, since they use manufacturing processes taken from the semiconductor industry. Increasing the number of arrays manufactured would actually reduce the cost per unit. We will describe a method to reduce manufacturing complexity and describe how simplifying manufacturing affects the randomness of a peptide library and how randomness affects immunosignatures.

Immunosignatures use random sequence peptides to gauge specificity of antibodies

Immunosignatures measure the binding between random peptides and antibodies, capturing both on-target and off-target binding of antibodies simultaneously [8]. For comparison, phage display (phage panning) probes extremely large libraries. The selection process leaves only strong binders. Those survivors are usually closely related to the actual epitope. In contrast, peptide microarrays do not utilize a selection process. All of the binding data are retained, and the approach is scalable. Phage libraries can be extremely large [15], which is to their benefit, while microarrays typically contain a few million peptides at best [16]. However, each peptide interaction can be more informative than the final pool of phage display peptides. In addition, even a relatively small random peptide library contains many unique epitopes [17-19] if longer peptides are used [1,14], while phage display often relies on 6-10mers [20]. Phage libraries may be diverse, but microarrays can extract low binders as well as high binders. By analyzing binding intensities one can identify residues at key positions that play a role in enhancing binding vs. those that prevent binding. When sera or an antibody is applied to a random peptide microarray, the resulting ‘immunosignature’ [7] can provide information about how many different peptides the antibody binds, how diverse the binding sequences are, where the critical residues lie in the epitope, and how amino acids bind to the paratope. The immunosignature can be so informative and reproducible that diseases can be diagnosed with high accuracy [11,13,14,21,22]. This diagnostic feature does not require information about peptide sequences but even so, the peptide sequences in a signature can still predict linear epitopes from infectious agents [17] or cancer antigens [19].

In situ peptide synthesis

There are many ways to manufacture peptide microarrays. For a review see Gao et al. [27,28]. Light-directed chemical synthesis has been quite successful, dating back to 1995 (Fodor et al. [29], US Patent 5405783A). Some synthesis methods direct light onto a photoacid [16] or photobase [30] generator, which removes amino acid protecting groups. Other methods use photoactivatable amino acids [29]. Some methods use shadow masks [16], some use digital micromirrors [31], some utilize a peptide ‘toner’ and a 20 ‘color’ laser printer [32], while still others use electrical circuits to alter the pH in a microwell [33]. Each of these synthesis methods relies on the systematic addition of amino acids to a growing peptide. For random peptide libraries, all of these methods can take advantage of reduced synthetic steps using pseudorandom peptides.

One of the highest throughput methods for the production of peptide microarrays is shadow-mask lithography [34]. Shadow masks are pre-made, so they are not as flexible as programmable light-directed arrays, and they require substantial up-front costs. However, by implementing high-resolution steppers, automated aligners, synthesizers and large wafers, mask-based peptide microarrays can be very inexpensive, high density, high precision and high throughput. These characteristics are important for high-volume, low-cost production.

Random vs. pseudorandom libraries

Many immunological experiments make use of random sequence peptides [35]. Phage display uses random sequence libraries [36] and has been used to identify therapeutics [37], epitopes [38], even enzyme modulators [39]. Panning large libraries can identify very specific biomarkers, but these methods are neither cheap nor high throughput. Immunosignatures use a fixed library of a given size for any sample. A diverse surface of 3D shapes produced by random sequence peptides has been very successful in many applications [11,13,21,40-42]. Surprisingly, 10,000 peptides were sufficient to distinguish over 15 different diseases simultaneously, with >95% accuracy [14]. However, 10,000 peptides likely had a finite level of specificity and sensitivity. It was decided to create at least 300,000 peptides for the next immunosignature iteration. When M=AA*L, synthesis of 300,000 17mers would take over 2 weeks of continuous synthesis. Altering the library from random to pseudorandom could drastically reduce manufacturing time.

How in-situ peptide synthesis is affected by amino acid selection

The simplest formula for the number of synthesis steps (or photomasks) required to produce a peptide of length L is M = AA * L, where M is the number of masks, AA is the number of amino acids used and L is the peptide length. For a 17mer using 20 amino acids, 20*17=340 synthetic steps are required. In mask-based peptide photolithography, 340 steps would take over 2 weeks to complete. For immunosignature peptides, M can be less than AA*L, but the resulting peptides become pseudorandom. Each mask costs several thousand dollars. Each synthesis step takes approximately 15 minutes. It is therefore beneficial to reduce M. Figure 1 illustrates the bias imposed by forcing M to be less than AA*L.
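
As a quick check of the step counts quoted in this article, the relationship M = AA * L can be evaluated directly. The short Python sketch below is illustrative only; the function and variable names are our own and are not taken from the supplemental script.

```python
def full_random_steps(num_amino_acids: int, length: int) -> int:
    """Synthesis steps (masks) needed for unbiased random peptides: M = AA * L."""
    return num_amino_acids * length


LENGTH = 17  # 17mer peptides, as used throughout the text

m_all20 = full_random_steps(20, LENGTH)   # all 20 amino acids
m_no_cys = full_random_steps(19, LENGTH)  # cysteine excluded
m_16aa = full_random_steps(16, LENGTH)    # C, T, M and I excluded

print(m_all20, m_no_cys, m_16aa)

for masks in (35, 70, 140):
    # Share of the 16-amino-acid full-random step count that is actually used
    print(f"M={masks}: {masks / m_16aa:.0%} of {m_16aa} steps")
```

The three totals correspond to the 340-step, M=323 and M=272 cases discussed in the text; any mask count below the relevant total implies a pseudorandom library.
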

Experimental design

We asked two questions about reducing synthesis steps. First, how do the chemical characteristics of the peptide library change as mask numbers are reduced? Second, how does reduced diversity alter the outcome of a biochemical assay? The first question was addressed using predicted characteristics of hypothetical peptides. For the second question, an immunosignature of monoclonal antibodies and an immunosignature of human sera were used to measure performance. We first established the starting parameters. Kuznetsov noted that some amino acids are preferentially used in immunosignatures [43]. To maintain consistency with the actual immunosignature arrays that will be manufactured, we eliminated methionine, threonine and isoleucine because they appear rarely in peptides that make up diagnostic immunosignatures [43]. Cysteine can form disulfide bonds under aqueous conditions, and was removed for that reason. Thus our virtual 17mer peptide library used 16 amino acids (no C, T, M, or I). We created two control sets: M=272 uses 16 amino acids and no other mask restrictions, while M=323 leaves out only cysteine. The M=323 case matches exactly the existing 10,000-peptide spotted library that has been used in several early publications (NCBI GEO accession GPL17600). We also created intermediate test sets where M=35, M=70, and M=140.

The first analysis was completed using 100,000 virtual peptides. For each library, an algorithm (see Supplemental Information) creates M masks with 100,000 features and ‘synthesizes’ peptides using these virtual masks. The resulting library is analyzed for chemical properties using ProtParam (http://web.expasy.org/protparam/).

The second analysis involved synthesizing a library of actual peptides corresponding to the same mask restrictions. 384 peptides were selected at random from each 100,000-peptide virtual library and sent to Sigma Genosys (St. Louis, MO) for synthesis. We already had several M=323 10,000-peptide microarrays made, so the human sera experiments for M=323 were done by randomly picking 384 peptides from the 10,000 existing peptides to simulate a 384-peptide library. Each of these 384-peptide sets (except M=323) was tested using 51 different commercial monoclonals (Supplemental Information Table 1). We previously published data on monoclonals using the 10,000-peptide microarray [7]. Human sera from 8 healthy controls and 8 patients diagnosed with coccidioidomycosis [11] were used for all libraries including M=323.

Materials and Methods

In silico peptide generation

We created an R script called “Peptide Generator” in order to simulate how peptides would be created when the number of synthetic steps is less than AA*L (Supplemental Information). The program uses the following variables:

P is the library size, number of peptides

M is the number of virtual masks

N is peptide length

DD is the number of amino acids used

aa is the one-letter amino acid code

peptide is the output of P peptides

The script can use one-letter code for D- or L-amino acids. The algorithm currently uses loops to cycle through the synthesis steps to allow non-coders to clearly see the steps, but for efficiency the loops should be replaced by ‘apply’ statements.

To compare the predicted chemical characteristics of peptides made by Peptide Generator, we created 100,000 virtual 17mer peptides using 16 different amino acids for each mask restriction scenario (M=35, 70, 140 and 272) as well as 100,000 completely random peptides using 19 amino acids (M=323). The peptide libraries are created with a greedy algorithm, where masks are made available as amino acids are cycled through one by one. The amino acids are added one step at a time to the growing peptide chain. The program cycles through each amino acid alphabetically and stops when every peptide is exactly 17 amino acids long and every mask is used once. The algorithm as written does not attempt to make the distribution of amino acids even during synthesis. As a result, some peptides tend to catch up in length as the masks run out by quickly adding any available amino acid(s) to the N-term of the synthesis. This creates a bias in sequence, driven by the order of the amino acid addition. This bias was deemed acceptable since each peptide library has the same bias, and performance comparisons were relative. Methods to artificially compensate for this bias introduced an unpredictable distribution that was more complex to decipher.
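
To make the idea of mask-restricted synthesis concrete, the following Python sketch simulates a simplified version of it. It is not the published Peptide Generator R script and does not reproduce its greedy catch-up behavior. It assumes instead that mask k couples the k-th amino acid in a repeating alphabetical cycle and that each feature is exposed under exactly 17 randomly chosen masks. The function name, parameters and seed are hypothetical.

```python
import random

# 16 residues: the 20 standard amino acids minus C, T, M and I
AMINO_ACIDS = "ADEFGHKLNPQRSVWY"


def generate_library(num_peptides, num_masks, length=17,
                     alphabet=AMINO_ACIDS, seed=0):
    """Simplified mask-restricted synthesis (illustrative assumption only).

    Mask k couples alphabet[k % len(alphabet)]. Each feature is exposed
    under exactly `length` of the `num_masks` masks, chosen at random,
    so every peptide ends at the requested length.
    """
    if num_masks < length:
        raise ValueError("need at least one mask per peptide position")
    rng = random.Random(seed)
    mask_residue = [alphabet[k % len(alphabet)] for k in range(num_masks)]
    library = []
    for _ in range(num_peptides):
        exposed = sorted(rng.sample(range(num_masks), length))
        # Residues are written in coupling order (first coupled residue first)
        library.append("".join(mask_residue[k] for k in exposed))
    return library


library_m35 = generate_library(1000, 35)
library_m272 = generate_library(1000, 272)
print(library_m35[:3])
print(library_m272[:3])
```

Even in this simplified form, a small mask count constrains which residue can follow which, because each peptide must be read off in order from a short, repeating sequence of couplings. In the extreme case of 17 masks for a 17mer, every feature receives the same peptide.
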

In order to examine the chemical characteristics of each library of 100,000 peptides, we used ProtParam (http://web.expasy.org/protparam/). A library of 100,000 peptides was deemed sufficient to summarize the trends for each mask set. ProtParam takes as input one-letter amino acid sequences and then predicts the chemical properties of each peptide. We examined molecular weight and theoretical pI to gauge how our generated peptides differed from completely random peptides, and we measured the number of repeated 2mer, 3mer, 4mer, 5mer and 6mer sequences to estimate library diversity. Each mask set is summarized by the average molecular characteristics of 100,000 peptides.
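
Counting repeated subsequences of this kind is straightforward. The Python sketch below shows one possible way to tally n-mers and to summarize redundancy. The definition of the unique fraction used here (occurrences of n-mers seen exactly once, divided by all n-mer occurrences) is our assumption and may differ from the measure plotted in Figure 3. The toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(peptides, k):
    """Count every length-k subsequence (k-mer) across a list of peptide strings."""
    counts = Counter()
    for seq in peptides:
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def unique_fraction(peptides, k):
    """Fraction of k-mer occurrences whose k-mer appears exactly once in the library."""
    counts = kmer_counts(peptides, k)
    total = sum(counts.values())
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / total if total else 0.0


toy_library = ["ADEFGHK", "ADEFGLN", "QRSVWYA"]  # hypothetical sequences
for k in (2, 3, 4, 5, 6):
    print(k, round(unique_fraction(toy_library, k), 3))
```
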

Peptide synthesis and array printing

The second part of the project is a biochemical test with actual peptides. Since 100,000 synthesized peptides would cost millions of dollars and take many months to synthesize, we chose to use a much smaller library and increased the number of antibodies tested on these arrays to 51 very disparate monoclonals. Small-scale peptide synthesis yielded enough peptide to print thousands of microarrays. 384 peptides per mask set were synthesized at 80% purity, 5 mM scale by Sigma Genosys (St. Louis, MO). Failed syntheses were repeated until Sigma deemed the synthesis successful using their quality metrics. MALDI mass spectra indicated that each peptide had a major peak at the expected mass, corresponding to an average purity of >92% according to Sigma Genosys. S1a and S1b Figures show example MALDI spectra of a partial truncation that was resynthesized (S1a Figure) and a properly synthesized peptide (S1b Figure). Peptides were synthesized C-term to N-term, with a terminal Glycine-Serine-Cysteine linker that enabled attachment of the free N-term cysteine, via a sulfo-SMCC linker (sulfosuccinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate, catalog number 22322, Pierce/Thermo Fisher Inc.), to a Schott A+ Nexterion aminosilane slide (catalog number 1064875, Schott, Jena, Germany). Truncated peptides were capped by the manufacturer and would not have a terminal cysteine, ensuring that they do not attach to the slide. Peptide stocks were made by diluting 2 mg of peptide in distilled water + 20% acetonitrile to a final concentration of 2 mg/ml, based on measurement of peptide quantity by Agilent BioAnalyzer 2100 in protein mode (Agilent Inc., Santa Clara, CA). Microarray spotting plates were prepared using 384-deepwell plates by diluting the stock 2 mg/ml peptides into HEPES pH 7.3 with 2% TCEP (tris(2-carboxyethyl)phosphine) to a final concentration of 0.5 mg/ml.

Printing was done at Applied Microarrays (Tempe, AZ) by piezo non-contact printing of approximately 150-200 pL per feature. Post-dispense, slides were stored at RT in 80% humidity to allow the maleimide conjugation reaction to occur. Prior to use, the arrays were washed with 30% ethanol, 500 mM EDTA pH 7.8, 40% acetonitrile to remove unbound peptides (Immunosignature effects paper). Every peptide was printed twice per assay. By duplicating the spots we ensured that any discrepancy in printing could be measured. The duplicate peptides were averaged for analysis. We used a 2-up assay system from Tecan (HS 4800 Hybridization Station, Tecan, Männedorf, Switzerland) to perform the antibody incubations. The automation of the HS 4800 reduced operator and day-to-day variability. For every monoclonal, two technical replicates were run on physically different slides to reduce systematic bias in the assay, and the replicates were averaged to account for possible discrepancies between assays. Technical replicates had to exceed a correlation coefficient of 0.85 to meet our quality threshold. Assays that failed would have been re-processed. However, during the experiment no replicate failed to meet our quality threshold. Post-assay measurements yielded a peptide-to-peptide correlation average >0.99, suggesting that peptide printing was an insignificant source of variation. The average correlation across technical replicates was 0.96, suggesting that day-to-day and operator variance were insignificant sources of variation. Each microarray was incubated with one of the 100 different commercially sourced monoclonals listed in Table 1.

Immunosignature assays-monoclonals

For the monoclonal assay, only M=35, M=70, M=140 and M=272 peptide libraries were used. M=323 was only tested against human serum. Each of the 51 commercial monoclonals was added to 800 µL incubation buffer (3% BSA, 1x PBS pH 7.2, 0.05% Tween 20) to a final concentration of 5 nM. Incubation buffer + monoclonal was added to each microarray of 384 peptides. The Tecan HS 4800 was programmed to incubate each slide with shaking for 1 hr at 37°C. The wash program included 3 washes of 5 min each with 1x Tris-buffered saline pH 7.3 followed by 3 washes of 5 min each using distilled water. When complete, the HS 4800 dried each slide with compressed nitrogen. Detection was by goat anti-mouse IgG (H+L) directly conjugated to Alexa Fluor 555 (Life Technologies/Thermo Fisher) incubated at 4 nM for 1 hr followed by identical wash steps. Images were recorded by an Agilent ‘C’ scanner. Slides were scanned at 100% PMT, 70% laser power at 10 μm resolution using the 545 nm laser. Arrays were aligned using GenePix 6.0. Data were analyzed in R (CRAN.org) and GeneSpring 7.3.1 (Agilent, Santa Clara, CA).

Immunosignature assays–human sera

Eight patients who were diagnosed with coccidioidomycosis [11] with a measured titer of 1:256 against coccidioidin were tested against eight otherwise healthy age- and gender-matched volunteers in a standard immunosignature assay [11]. All immunosignature assays were run in duplicate and averaged. T-tests between case and control were performed across all 384 peptides for each mask set, and the minimum p-value obtained for each mask set was recorded. For M=323, a 10,000 random peptide microarray was used (NCBI GEO platform accession GPL17600), and 384 peptides were selected at random for analysis.

Analysis methods

In order to compare how effectively different peptide libraries perform for immunosignatures, we first made the assumption that signatures for different monoclonal antibodies must be different [7]. We therefore calculated Pearson’s correlation coefficient for each set of 384 peptides across each of the 51 different monoclonal antibodies. If every monoclonal were mathematically distinguishable using immunosignatures, it follows that the correlation coefficients across the different antibodies would be low. Conversely, if the monoclonals could not be distinguished, or the information content per peptide library is low, it is likely that the correlation coefficients would be high. We did this for the 4 different mask sets from M=35 to M=272. These data are listed in Table 1.
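
The averaging of pairwise correlations described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis code used for Table 1: the antibody names and intensity values are hypothetical, and the function names are our own.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(profiles):
    """Average Pearson correlation over every pair of antibody profiles.

    `profiles` maps an antibody name to its list of peptide intensities.
    """
    pairs = list(combinations(profiles.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


toy = {
    "mAb_1": [1.0, 2.0, 3.0, 4.0],
    "mAb_2": [2.0, 4.0, 6.0, 8.0],  # same binding pattern as mAb_1
    "mAb_3": [4.0, 3.0, 2.0, 1.0],  # opposite pattern
}
print(mean_pairwise_correlation(toy))
```

A library on which all antibodies give nearly the same profile would push this average toward 1, which is the low-information situation the text associates with small values of M.
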

For the coccidioidomycosis set, we assayed eight different case and eight different control serum samples according to previous protocols [11]. We report the minimum p-value for each mask set, M=35, 70, 140, 272 and 323. For M=323, we performed the immunosignatures on the CIM 10,000 random peptide microarrays, and selected 384 peptides at random for the analysis. This avoided having to re-synthesize completely random (M=323) peptides, since completely random peptides already exist on the 10,000-peptide printed microarray. All peptide lists and data are publicly available at GEO (series accession numbers GSE50044, GSE50045).

In order to assess how peptide disorder affected immunosignature performance, we calculated Shannon’s entropy for each of the peptide libraries, since it is a direct measure of disorder [44]. We used Support Vector Machines (SVM) as the classifier to predict the identity of each monoclonal using a train/test paradigm. We mathematically removed 38 peptides from each 384-peptide library and trained the SVM classifier on the remainder. We then gave the classifier the left-out 38 peptides and asked it to predict the identity of each of the 51 different monoclonals. Thus, the classifier is completely naïve to the predictor peptides. This provided a way to directly assess peptide entropy vs. immunosignature performance. We used Shannon’s entropy formula:

$$H = -\sum_{i} p_i \log p_i$$

where $p_i$ is the relative frequency of the $i$-th symbol. Entropy for each peptide library was calculated per peptide and averaged.
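
A per-peptide entropy calculation of this kind can be sketched as follows. The sketch rests on assumptions the text does not state: that the symbols are the amino acids within a single peptide, that the probabilities are their relative frequencies in that peptide, and that the logarithm is base 2 (entropy in bits). The function names are hypothetical.

```python
from collections import Counter
from math import log2


def peptide_entropy(sequence):
    """Shannon entropy (bits) of the amino acid composition of one peptide."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def library_entropy(peptides):
    """Mean per-peptide entropy across a library."""
    return sum(peptide_entropy(p) for p in peptides) / len(peptides)


print(peptide_entropy("AAAAAAAAAAAAAAAAA"))  # a single repeated residue
print(peptide_entropy("ADEFGHKLNPQRSVWYA"))  # a compositionally diverse 17mer
```

Under these assumptions, a peptide built from one repeated residue has zero entropy, while a 17mer that uses many different residues approaches the maximum, so libraries with strong compositional bias score lower on average.
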

Table 1

Table 1: Test of immunosignaturing performance. This table lists values that were used to characterize immunosignature performance. From left to right: M represents the number of masks; Avg. CC Ab represents the average correlation coefficient calculated across every possible combination of the 51 monoclonal antibodies; Avg. CC serum is the average correlation coefficient calculated across every possible combination of control and infected sample; ANOVA p-val is the p-value from the F-test using every monoclonal as a class, and the technical replicates as the measurements; Avg. raw value Ab is the average of all 51 monoclonals for each of the tested mask libraries; Median raw value is the median of all 51 monoclonals for each of the tested mask libraries; CV is the Coefficient of Variation (standard deviation divided by the mean) for every peptide across every monoclonal for each mask set. Note that the M=323 library was only used for the serum experiments.

Figure 1: Analysis of amino acid prevalence across each peptide position using mask restrictions. 100,000 peptides were constructed in silico using a randomization method that takes into account the total number of shadow masks used in synthesis of 17mer peptides. Each graph shows the number of times each of the 16 amino acids is used at each of the 17 positions for a given number of masks. The X-axis for each panel represents the position in the peptide from C-terminus (position 1, facing the surface of the microarray) to the N-terminus (position 17, facing away from the surface). The Y-axis is the number of times each amino acid appears at that position. The legend associates graph colors with an amino acid. The lower middle graph is M=323, and is used as a baseline comparison. M=323 is unique as it uses 19 amino acids (excluding cysteine) and mimics a library of random peptides if synthesized on beads and cleaved/purified, with no mask number restrictions. M=323 should have no positional bias. M=35 (upper left) shows the highest positional bias, suggesting that diversity of sequence would be compromised using only 35 masks, even though the number of steps is nearly 10-fold less than fully random. M=70 (middle top), M=140 (right top), and M=272 (bottom left) show progressively decreasing bias trends, and smoother amino acid distributions. M=323 has the lowest variation in positional bias across the length of the peptide since each position was determined independently by a random number generator.

Figure 2: Left 5 panels: distribution of duplicated 5mers for M=35, M=70, M=140, M=272 and M=323. As the number of masks decreases, diversity in the resulting sequences is reduced, creating redundant sequences. These 5 panels illustrate the number of duplicate 5mers. The X-axis is the total count of duplicated 5mers in the 100,000 peptides, the Y-axis is the proportion of 5mers. The average for M=323 (far right, top) is 1. The average number of duplicated 5mers for M=35 is 2, suggesting that redundancy is a hallmark of reduced mask numbers. Right 5 panels: Analysis of predicted peptide characteristics as a function of mask number. As in the left panels, each mask set created a distribution of data. Here the data is molecular weight (MW). The trend seen here is similar for extinction coefficient, beta sheet tendency, alpha helix tendency, beta turn tendency, antigenicity, isoelectric point, aliphatic character, average residue volume, hydrophilicity, flexibility, sequence complexity, and accessibility (all taken from ProtParam). X-axis is MW, Y-axis is the proportion of the total library for each MW value. The lower mask numbers tend to create sets of peptides with very specific and extreme molecular weights. For M=35, weights ranged from 1856 to nearly 2600 while for M=323, weights ranged from 2380 to 2418. High mask numbers tend toward a more Gaussian distribution of molecular weights.

Figure 3: Display of unique 2mer, 3mer, 4mer, 5mer and 6mers contained within 100,000 peptides using the M=35, M=70, M=140, M=272 and M=323 mask restriction sets. The Y-axis is the proportion of unique nmers, and the X-axis is increasing length of subsequences.

Figure 4: From left to right Top: Four images from antibodies AAT, A10, 8A1 and 3A1, the same as those used in subsequent figures. Bottom: Same monoclonals on the M=272 library. These images show a promiscuous monoclonal that binds many peptides (AAT), two different moderate-specificity antibodies (A10 and 8A1) and a high specificity monoclonal that binds very few peptides but with high intensity (3A1). These four antibodies produce very different binding profiles on M=272 while M=35 produced very similar binding profiles.

Figure 5: Bar graphs of each mask set (M=35, red; M=70, yellow; M=140, cyan; M=272, blue) for all of the 51 tested monoclonals. Each dataset was median normalized, ensuring that the median signal is 1 for every monoclonal. The solid box is the first quartile, centered on the median. The dashed lines are the 95th percentile; outliers are shown as colored dots.

Figure 6: Heat map of four of the 51 total monoclonals across four of the mask libraries: 3A1, 8A1, A10 and AAT, all processed on M=35, M=70, M=140, and M=272. Each peptide was converted from a sequence to a number, from 1 to 384, and sorted from lowest to highest. The rows represent the rank-order of the intensity values for each of these four monoclonals. The vertical colored bars (red, yellow, cyan, blue, and purple) represent the k-means clustering of the values, with the purple group containing the highest intensities and the red group containing the lowest intensities.

Figure 7: Principal Components Map (PCA) of all 51 monoclonals (colored dots) for each of the four mask sets (colored by mask set). The first two principal components are plotted on the X and Y-axes, respectively. The least dispersed is M=35, the most M=272 (as it is spread across both component 1 and component 2).

Figure 8: Shannon’s Entropy for each of four peptide libraries vs. cross-validation accuracy of prediction of monoclonals. Shannon’s Entropy, a standard measure of disorder, was calculated for each of four different peptide libraries (Y axis). Each of 50 different monoclonals was predicted by Support Vector Machines using a randomly selected 1/10 of the total 384 peptides per library. This was done 10 times, and the average accuracy for each peptide library is plotted on the X axis. R² for a linear fit is 0.7029.

Conclusions

The data provided here can be applied to maskless peptide syntheses as well, by reducing the number of steps. In cases where amino acids are added simultaneously to growing chains of peptides in multiplexed reaction vessels, the number of cycles can be reduced. As the number of peptides simultaneously synthesized increases, the value of restricting the number of steps increases. Perhaps most importantly, it is not necessary to use the output of Peptide Generator as-is. A more useful method is to produce every possible peptide for a given mask restriction and then filter this large collection of peptides. The user can decide what filter is most important, whether reducing redundancy, reducing certain physical characteristics such as hydrophobicity, or picking random peptides with the most life-space motifs. Many of the biases described in this manuscript can be moderated by this post-hoc filter. Depending on the assay, performance of a restriction-set of peptides, if properly filtered, might be competitive with a fully random set. The assay drives the balance between reducing synthesis steps and retaining peptide diversity and disorder. One of the quickest methods for improving the utility of a random peptide library is reducing 3mer, 4mer and 5mer redundancy. Even without restriction, a random library groomed for minimal 5mer repeats is likely to contain more unique linear targets than an unfiltered set. That single change could enhance epitope identification by an order of magnitude or more. Judicious selection of a library of mask-restricted peptides would not only improve the performance of the resulting random peptide library, but the impact on synthesis time and cost would be substantial.

Although this manuscript focused on shadow-mask lithography, our results can be applied to many peptide synthesis methods. Random peptide libraries are quite useful for a number of immunological and biochemical techniques. As peptide microarrays become denser, the advantages of using random microarrays over phage display grow. Although there are many ways of making peptide microarrays, costs are still very high and production volume remains relatively low. In order to use these arrays for any sort of diagnosis, the costs must drop by orders of magnitude and production volume must scale up as well. Shadow mask synthesis can facilitate extremely high throughput production, low cost synthesis equipment, and scalable costs that trend the same way semiconductor manufacturing does. This enables commodity-level high quality peptide microarrays, facilitating new applications in medical care, diagnostics, and even therapeutic antibody quality control and assessment.

Data are provided at the Gene Expression Omnibus at NCBI. Platforms: GPL17490 and GPL17679; Series: GSE49217, GSE50044 and GSE50045, “In situ synthesis of peptide microarrays - shadow mask design”.

Acknowledgement

There were no funding agencies for this work. Data are freely available at the Gene Expression Omnibus at NCBI. Platforms: GPL17600, GPL17490, GPL17679, Experimental Series: GSE49217, GSE50044, GSE50045.

References

Buus S, Rockberg J, Forsström B, Nilsson P, Uhlen M, et al. High-resolution Mapping of Linear Antibody Epitopes Using Ultrahigh-density Peptide Microarrays. Mol Cell Proteomics. 2012 Dec;11(12):1790-800.
Domenyuk V, Loskutov A, Johnston SA, Diehnelt CW. A Technology for Developing Synbodies with Antibacterial Activity. PLoS ONE. 2013 Jan;8:e54162.
Boltz KW, Gonzalez-Moa MJ, Stafford P, Johnston SA, Svarovsky SA. Peptide microarrays for carbohydrate recognition. Analyst. 2009 Apr;134(4):650-652.
Sinzinger MD, Chung YD, Adjobo-Hermans MJW, Brock R. A microarray-based approach to evaluate the functional significance of protein-binding motifs. Anal Bioanal Chem. 2016 Feb;408:3177–3184.
Houseman BT, Huh JH, Kron SJ, Mrksich M. Peptide chips for the quantitative evaluation of protein kinase activity. Nat Biotechnol. 2002 Mar;20:270-274.
Fu J. Microarray Selection of Cooperative Peptides for Modulating Enzyme Activities. Microarrays. 2017 Jun;6(2):8.
Stafford P, Halperin R, Legutki JB, Magee DM, Galgiani J, et al. Physical characterization of the ‘Immunosignaturing Effect’. Mol Cell Proteomics. 2012 Apr;11(4):M111.011593.
Halperin RF, Stafford P, Legutki JB, Johnston SA. Exploring antibody recognition of sequence space through random-sequence peptide microarrays. Mol Cell Proteomics. 2011 Mar;10(3):M110.000786.
Geysen HM, Rodda SJ, Mason TJ, Tribbick G, Schoofs. Strategies for epitope analysis using peptide synthesis. J Immunol Methods. 1987 Sep;102(2):259-274.
Huang J, Ru B, Zhu P, Nie F, Yang J, et al. MimoDB 2.0: a mimotope database and beyond. Nucleic Acids Res. 2012 Jan;40:D271-D277.
Navalkar K, Magee DM, Galgiani J, Cichacz Z, Johnston SA, et al. Application of immunosignatures to diagnosis of Valley Fever. Clin Vaccine Immunol. 2014 Aug;21(8):1169-1177.
Restrepo L, Stafford P, Magee DM, Johnston SA. Application of immunosignatures to the assessment of Alzheimer’s disease. Ann Neurol. 2011 Aug;70(2):286-295.
Singh S, Stafford P, Schlauch KA, Tillett RR, Gollery M, et al. Humoral Immunity Profiling of Subjects with Myalgic Encephalomyelitis Using a Random Peptide Microarray Differentiates Cases from Controls with High Specificity and Sensitivity. Mol Neurobiol. 2018 Jan;55(1):633-641.
Stafford P, Cichacz Z, Woodbury NW, Johnston SA. Immunosignature system for diagnosis of cancer. PNAS. 2014 Jul;111(30):E3072-E3080.
Rodi DJ, Soares AS, Makowski L. Quantitative Assessment of Peptide Sequence Diversity in M13 Combinatorial Peptide Phage Display Libraries. J Mol Biol. 2002 Oct;322(5):1039-1052.
Legutki JB, Zhao ZG, Greving M, Woodbury N, Johnston SA, et al. Scalable high-density peptide arrays for comprehensive health monitoring. Nat Commun. 2014 Sep;5:4785.
Richer J, Johnston SA, Stafford P. Epitope Identification from Fixed-complexity Random-sequence Peptide Microarrays. Mol Cell Proteomics. 2015 Jan;14(1):136-147.
Halperin R, Stafford P, Johnston SA. GuiTope: an application for mapping random-sequence peptides to protein sequences. BMC Bioinformatics. 2012;13:1.
Donnell B, Maurer A, Papandreou-Suppappola A, Stafford P. Time-Frequency Analysis of Peptide Microarray Data: Application to Brain Cancer Immunosignatures. Cancer Inform. 2015 Jun;14(Suppl 2):219-233.
Zwick MB, Shen J, Scott JK. Phage-displayed peptide libraries. Curr Opin Biotechnol. 1998 Aug;9(4):427-436.
Restrepo L, Stafford P, Johnston SA. Feasibility of an early Alzheimer’s disease immunosignature diagnostic test. J Neuroimmunol. 2013 Jan;254(1-2):154-160.
Williams S, Stafford P, Hoffman S. Diagnosis and early detection of CNS-SLE in MRL/lpr mice using peptide microarrays. BMC Immunol. 2014 Jun;15:23.
Hughes A, Cichacz Z, Scheck AC, Coons SW, Johnston SA, et al. Immunosignaturing can detect products from molecular markers in brain cancer. PLoS ONE. 2012;7(7):e40201.
Stafford P, Wrapp D, Johnston SA. General Assessment of Humoral Activity in Healthy Humans. Mol Cell Proteomics. 2016 May;15(5):1610-1621.
Wang L, Whittemore K, Johnston SA, Stafford P. Entropy is a Simple Measure of the Antibody Profile and is an Indicator of Health Status: A Proof of Concept. Scientific Reports. 2017 Dec;7:18060.
Young JD, Huang AS, Ariel N, Bruins JB, Ng D, et al. Coupling Efficiencies of Amino Acids in the Solid Phase Synthesis of Peptides. Pept Res. 1990 Aug;3(4):194-200.
Gao X, Pellois JP, Na Y, Kim Y, Gulari E, et al. High density peptide microarrays. In situ synthesis and applications. Mol Divers. 2004;8(3):177-187.
Gao X, Gulari E, Zhou X. In situ synthesis of oligonucleotide microarrays. Biopolymers. 2004 Apr;73:579-596.
Fodor S, Read J, Pirrung M, Stryer L, Lu A, et al. Light-directed, spatially addressable parallel chemical synthesis. Science. 1991 Feb;251(4995):767-773.
Choung RS, Marietta EV, Van Dyke CT, Brantner TL, Rajasekaran J, et al. Determination of B-Cell Epitopes in Patients with Celiac Disease: Peptide Microarrays. PLoS ONE. 2016 Jan;11(1):e0147777.
Pellois JP, Zhou X, Srivannavit O, Zhou T, Gulari E, et al. Individually addressable parallel peptide synthesis on microchips. Nat Biotechnol. 2002 Sep;20(9):922-926.
Beyer M, Nesterov A, Block I, König K, Felgenhauer T, et al. Combinatorial synthesis of peptide arrays onto a microchip. Science. 2007 Dec;318(5858):1888.
Maurer K, McShea A, Strathmann M, Dill K. The removal of the t-BOC group by electrochemically generated acid and use of an addressable electrode array for peptide synthesis. J Comb Chem. 2005 Sep-Oct;7(5):637-640.
Agarwal PB, Pawar S, Reddy SM, Mishra P, Agarwal A. Reusable silicon shadow mask with sub-5 μm gap for low cost patterning. Sensors and Actuators A: Physical. 2016 May;242:67-72.
Kay BK, Adey NB, Yun-Sheng H, Manfredi JP, Mataragnon AH, et al. An M13 phage library displaying random 38-amino-acid peptides as a source of novel sequences with affinity to selected targets. Gene. 1993 Jun;128(1):59-65.
Noren KA, Noren CJ. Construction of High-Complexity Combinatorial Phage Display Peptide Libraries. Methods. 2001;23(2):169-178.
Cai C, Dai X, Zhu Y, Lian M, Xiao F, et al. A specific RAGE-binding peptide biopanning from phage display random peptide library that ameliorates symptoms in amyloid β peptide-mediated neuronal disorder. Appl Microbiol Biotechnol. 2016 Jan;100(2):825-835.
Ge M, Yan A, Luo W, Hu YF, Li R-C, et al. Epitope screening of the PCV2 Cap protein by use of a random peptide-displayed library and polyclonal antibody. Virus Res. 2013 Oct;177(1):103-107.
Chin CF, Tan SJ, Gan CY, Lim TS. Identification of Peptide Based Inhibitors for α-Amylase by Phage Display. Int J Pept Res Ther. 2015 Feb;21(3):237-242.
Kukreja M, Johnston SA, Stafford P. Immunosignaturing microarrays distinguish antibody profiles of related pancreatic diseases. J Proteomics Bioinform. 2012 Jan;S6:001.
Legutki JB, Johnston SA. Immunosignatures can predict vaccine efficacy. PNAS. 2013 Nov;110(46):18614-18619.
Boltz KW, Nagaraj V, Svarovsky S, Lake D, Stafford P, et al. High throughput technology for the identification and characterization of glycan binding peptides. Glycobiology. 2006;16(3):1155-1156.
Kuznetsov IB. Identification of non-random sequence properties in groups of signature peptides obtained in random sequence peptide microarray experiments. Biopolymers. 2016 May;106(3):318-329.
Giancarlo R, Scaturro D, Utro F. Textual data compression in computational biology: a synopsis. Bioinformatics. 2009;25(13):1575-1586.