Is there a source for raw SNP genotype frequency data?

On sites such as SNPedia, some pages list the frequency of the SNP in question across different populations, based on published research. I am trying to write a script that takes 23andMe data and compares it against SNP frequencies to find rare SNPs that a user carries. I think the only way to do this may be to scrape the frequencies from the SNP database. Do you know of anywhere this information is available in a more accessible form, ideally pre-formatted for parsing?


You may be able to get some raw SNP frequency data by batch-querying the dbSNP database. I have not used it myself, though.


This is one of the reasons the 1000 Genomes Project was created.


Have you looked at ALFRED (the ALlele FREquency Database)? The data are from 2011, but they appear comprehensive, and downloadable zip files are available at http://alfred.med.yale.edu/alfred/alfredDataDownload.asp
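The script described in the question could be sketched roughly as follows. This is a minimal illustration, not a tested pipeline: it assumes the 23andMe export is tab-separated with `#` comment lines and the columns rsid, chromosome, position, genotype, and it assumes a hypothetical frequency table with the columns `rsid`, `allele`, `frequency` that you would have to build yourself from a source such as dbSNP, 1000 Genomes, or ALFRED. The function names, the sample rsIDs, and the 1% threshold are all illustrative.

```python
import csv
import io


def parse_23andme(handle):
    """Yield (rsid, genotype) pairs from a 23andMe-style raw-data export."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        yield fields[0], fields[3]


def load_frequencies(handle):
    """Load a hypothetical table with columns rsid, allele, frequency."""
    freqs = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        freqs.setdefault(row["rsid"], {})[row["allele"]] = float(row["frequency"])
    return freqs


def rare_alleles(genotypes, freqs, threshold=0.01):
    """Return (rsid, allele, frequency) for carried alleles below threshold."""
    hits = []
    for rsid, genotype in genotypes:
        known = freqs.get(rsid)
        if known is None:
            continue  # no frequency data for this SNP
        for allele in sorted(set(genotype)):
            freq = known.get(allele)  # no-calls such as "--" are not in the table
            if freq is not None and freq < threshold:
                hits.append((rsid, allele, freq))
    return hits


# Made-up sample data; the rsIDs and frequencies are placeholders.
RAW = (
    "# example export\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t2\t2000\tCC\n"
    "rs0000003\tX\t3000\t--\n"
)
FREQ = (
    "rsid\tallele\tfrequency\n"
    "rs0000001\tA\t0.995\n"
    "rs0000001\tG\t0.005\n"
    "rs0000002\tC\t0.60\n"
    "rs0000002\tT\t0.40\n"
)

hits = rare_alleles(
    parse_23andme(io.StringIO(RAW)),
    load_frequencies(io.StringIO(FREQ)),
)
print(hits)  # [('rs0000001', 'G', 0.005)]
```

In practice the strand and the reference population of the frequency source would need to match the genotype file before the comparison means anything; the sketch ignores both.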


Toward fast and accurate SNP genotyping from whole genome sequencing data for bedside diagnostics

Motivation: Genotyping a set of variants from a database is an important step for identifying known genetic traits and disease-related variants within an individual. The growing size of variant databases and the high depth of sequencing data pose an efficiency challenge. In clinical applications, where time is crucial, alignment-based methods are often not fast enough. To fill the gap, Shajii et al. proposed LAVA, an alignment-free genotyping method that can genotype single nucleotide polymorphisms (SNPs) more quickly; however, there remains considerable room for improvement in running time and accuracy.

Results: We present VarGeno, a method for SNP genotyping from Illumina whole genome sequencing data. VarGeno builds upon LAVA by improving the speed of k-mer querying as well as the accuracy of the genotyping strategy. We evaluate VarGeno on several read datasets using different genotyping SNP lists. VarGeno runs 7–13 times faster than LAVA with similar memory usage, while improving accuracy.

Availability and implementation: VarGeno is freely available at: https://github.com/medvedevgroup/vargeno.

Supplementary information: Supplementary data are available at Bioinformatics online.
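To make the idea of alignment-free, k-mer-based genotyping concrete, here is a deliberately simplified toy sketch. It is not the LAVA or VarGeno algorithm: it only counts exact matches of one reference k-mer and one alternate k-mer centered on the SNP, looks at the forward strand only, and ignores sequencing errors. The function names, the flanking sequences, and the `min_count` threshold are illustrative assumptions.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k from seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def genotype_snp(reads, left_flank, right_flank, ref, alt, min_count=2):
    """Call a genotype from exact k-mer matches (toy model, forward strand only)."""
    k = len(left_flank) + 1 + len(right_flank)
    ref_kmer = left_flank + ref + right_flank
    alt_kmer = left_flank + alt + right_flank
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer == ref_kmer:
                counts["ref"] += 1
            elif kmer == alt_kmer:
                counts["alt"] += 1
    has_ref = counts["ref"] >= min_count
    has_alt = counts["alt"] >= min_count
    if has_ref and has_alt:
        return ref + alt  # heterozygous
    if has_alt:
        return alt + alt  # homozygous alternate
    if has_ref:
        return ref + ref  # homozygous reference
    return None  # not enough supporting k-mers to call


# Made-up reads around a hypothetical A/G SNP flanked by ACGT and TGCA.
reads = ["TTACGTATGCAGG", "ACGTATGCA", "CCACGTGTGCATT", "ACGTGTGCAAA"]
call = genotype_snp(reads, "ACGT", "TGCA", ref="A", alt="G")
print(call)  # AG
```

A real tool would index k-mers for a whole SNP list at once so that each read k-mer is looked up rather than compared against every SNP, which is where the query speed mentioned in the abstract matters.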


Background

Populus nigra is a major tree species of Eurasian riparian ecosystems and one of the 3 main parental species used in poplar breeding programs to develop highly productive interspecific cultivated hybrids. For these reasons, several initiatives have recently been set up to create genomic resources within this species as tools to improve conservation and breeding strategies [1, 2]. The main goal of such initiatives is to discover and type genomic variants such as single nucleotide polymorphisms (SNPs) for various applications, including identifying and quantifying introgressions from the cultivated compartment, studying population structure, and identifying variants associated with economically or ecologically important phenotypes through association genetics.

Early studies in P. nigra focused on resequencing specific candidate genes from the lignin pathway [3–5], but more recent work has widened the scope of analysis by developing a genotyping chip from SNPs discovered through whole-genome sequencing [1, 2]. This genotyping tool has been used successfully to study the structure of the species' genetic diversity [1] and to identify some genomic regions associated with economically important traits [6]. However, genotyping was limited to 7,903 SNPs located preferentially within particular candidate regions underlying some quantitative trait loci (QTLs) previously reported in biparental crosses. In addition, SNP frequencies within P. nigra populations appeared to be biased upward, restricting analyses to common variants [1]. Consequently, the use of this chip could be limited, particularly in association genetics, as highlighted by the small number of significant associations reported [6]. Indeed, given the rapid decay of linkage disequilibrium (LD) within this species and its genome size, an exhaustive genome-wide association study (GWAS) would require between 67,000 and 134,000 evenly spaced SNPs, which is between 8 and 16 times the number of SNPs available from the chip described above [7, 8].

Several options relying on next-generation sequencing would be available to access the large number of SNPs typically required for an exhaustive GWAS in P. nigra. While whole-genome sequencing still appears too expensive for a fairly large number of genotypes, reducing genome complexity before sequencing, for instance with restriction enzymes (GBS [9], RADseq [10]) or sequence capture (exome sequencing [11]), looks like a promising route to this goal. Indeed, sequence capture was recently used successfully to genotype around 350,000 SNPs in P. deltoides and to identify putative regulators of bioenergy traits [12]. RNA sequencing (RNAseq) is also a cost-effective way to reduce complexity while focusing on the expressed part of the genome [13]. To date, however, RNAseq has been used more often for SNP discovery than for direct genotyping of large populations. For example, Geraldes et al. [14] discovered around 500,000 SNPs through RNAseq of developing secondary xylem in P. trichocarpa, and a SNP chip was later developed partly from those previously discovered RNAseq SNPs [8] to carry out association scans [15, 16]. Nevertheless, recent studies have used RNAseq as a tool to both discover and genotype large numbers of SNPs in populations [17–21], underlining the interest of this approach for population and quantitative genomics studies. To our knowledge, however, no study so far has evaluated the accuracy of SNP genotyping from RNAseq data.

The aim of this study is to evaluate RNAseq as a tool for typing a sufficiently large number of SNPs in natural populations of P. nigra to carry out GWAS. To this end, we performed RNAseq on pools of young differentiating xylem and cambium collected on 2 biological replicates of 12 genotypes originating from 6 natural populations. We then developed a dedicated bioinformatics pipeline to discover and type SNPs within the sequences. The accuracy of the resulting RNAseq-based SNPs was also evaluated by (i) comparing their positions and alleles with those previously reported in candidate genes [3, 4], (ii) assessing their genotyping accuracy against the SNP chip [1], and (iii) assessing their between-year repeatability. Finally, the validated SNPs were used to perform basic genetic analyses to illustrate the usefulness of the released SNP dataset.


Introduction

High-throughput genotyping, which identifies large numbers of single nucleotide polymorphisms (SNPs), has boosted genome-wide association studies (GWAS) linking DNA variants to phenotypes of interest (Taranto et al., 2018). In plant species, GWAS has allowed the mapping of genomic loci associated with economically important traits, including yield, resistance to biotic and abiotic stresses, and quality (Boyles et al., 2016; Pavan et al., 2017; Hou et al., 2018; Liu et al., 2018; He et al., 2019). This information has further been used to implement marker-assisted selection (MAS) in breeding programs and to discover the genes underlying phenotypic variation (Liu and Yan, 2019).

Several genotyping methods are available (reviewed by Scheben et al., 2017), usually carried out by commercial providers upon receipt of DNA samples. For use in GWAS, the widely adopted genotyping options fall into three categories: whole-genome resequencing (WGR), reduced-representation sequencing (RRS), and SNP arrays. WGR and RRS rely on next-generation sequencing (NGS) technologies and bioinformatics pipelines that align reads to a reference genome and call both SNPs and genotypes (Nielsen et al., 2011). SNP arrays rely on allele-specific oligonucleotide (ASO) probes (which include the target SNP loci and their flanking regions) fixed on a solid support; these are used to interrogate complementary fragments from DNA samples and infer genotypes by interpreting the hybridization signal. Choosing the most appropriate (cost-effective) genotyping method for crop GWAS requires careful consideration of several aspects, namely the purpose and scale of the study, crop-specific genomic features, and the technical and economic matters associated with each genotyping method.

Raw SNP datasets resulting from genotyping experiments are usually inaccurate and incomplete. In addition, genes associated with phenotypes may have a small effect on genetic variance. In this scenario, quality control (QC) procedures are crucial to reduce false-positive and false-negative associations, referred to as type I and type II errors, respectively. QC includes filtering out poor-quality or suspected artifactual SNP loci; filtering individuals for missing data, anomalous genotype calls, and genetic synonymies; and characterizing ancestral relationships among individuals of the GWAS population. Excellent reviews have focused on QC of SNP data in humans (Turner et al., 2011; Marees et al., 2018), but the QC procedure can differ considerably for crop species. In this case, the variables to consider include the prevailing mating system of the crop (self- or open-pollination) and the breeding history of the specific GWAS population.

This review aims to provide recommendations on how to plan genotyping experiments and best practices on how to perform QC in plant species.


Introduction

Single nucleotide polymorphisms (SNPs) are variable single-base positions within a genome and represent the simplest and perhaps most common type of genetic variation. Accordingly, SNPs have emerged as a powerful tool for tracking inheritance and genetic variation and have become especially popular for genome-wide phenotype association studies (1, 2). The critical role of the laboratory mouse has led to numerous efforts to collect and analyze mouse SNPs on a large scale (3–7).

The Center for Genome Dynamics Single Nucleotide Polymorphism Database (CGDSNPdb) was designed to integrate multiple sources of mouse SNP data while checking them for accuracy and consistency across sources. CGDSNPdb is distinguished by the inclusion of two unique datasets:

The Imputed SNP Genotype Resource (IGR) (8), generated by a hidden Markov model (HMM) that assigns a probable genotype and associated confidence levels for more than 8 million SNPs across 74 mouse strains.

Data collected from more than 140 laboratory mouse strains (filtered to 72 inbred strains in the current release, version 1.3) with the Mouse Diversity Genotyping Array [MusDiv (9)], a high-density microarray with probes targeting 623,124 SNPs and more than 900,000 invariant genomic regions that target features such as exons and copy-number variation. The MusDiv SNP data will also be submitted to dbSNP after publication of the analysis manuscript (in preparation).

The CGDSNPdb search tool supports many different queries, including searches by chromosomal region, nearby gene annotations, or SNP identifiers. Results can be returned as dynamic HTML or in comma-separated values (CSV) format.

Annotations in CGDSNPdb include SNP characteristics (e.g. presence in a CpG dinucleotide, major/minor allele frequencies), together with functional characteristics of the protein-coding genes affected by the SNP (e.g. changes in the physical and chemical properties of amino acids, changes in codon usage, and overlapping or nearest neighboring genes). All annotations were generated using an automated analysis pipeline with subsequent quality controls, described below.

CGDSNPdb was designed primarily as a resource to support the imputation and Mouse Diversity Array projects, but it is also available as a somewhat reduced-size, high-confidence collection of mouse SNPs. Database updates will be driven by the availability of new or updated genome assemblies, updated releases of major external SNP datasets, new sources of SNP data, and maintenance. Future growth of the database will focus primarily on large-scale projects such as the 'Mouse Genomes Project' (http://www.sanger.ac.uk/resources/mouse/genomes/), as well as on datasets that can increase the diversity of strains represented. Minor releases of CGDSNPdb may also be created for improved data visualization or basic quality-control procedures. This manuscript provides a high-level overview of the main components of CGDSNPdb as of version 1.3 (January 2010).


Materials and methods

Sampling, DNA isolation and genotyping

A total of 240 cod specimens from 9 locations in 7 ICES (International Council for the Exploration of the Sea) subdivisions along a transect across the Baltic Sea, Kattegat and North Sea (Figure 10, Table 5) were collected between October 2012 and August 2013. Fin clips were stored in 70% ethanol at –70 °C. Genomic DNA was isolated using the Qiagen DNeasy 96 Blood & Tissue Kit according to the manufacturer's instructions and stored at –20 °C. DNA concentration was determined by UV-vis spectroscopy using an Epoch Microplate Spectrophotometer (BioTek Instruments, Inc., Winooski, USA). After normalization, samples were genotyped on a custom Gadus morhua SNP array (Illumina, USA) containing 10,923 SNP assays, developed by a Norwegian consortium of four research organizations: the Norwegian University of Life Sciences (NMBU), the University of Oslo (UiO), NOFIMA AS, and the Institute of Marine Research (IMR) (38, 68, 69). Samples were processed according to the manufacturer's instructions, and genotypes were obtained from Genome Studio (V2011.1). After filtering to remove poorly clustering SNPs (failed assays, multi-site variants), a total of 8,221 diploid SNPs remained. This dataset was further pruned to remove SNPs with a relatively high level of missing data (more than 20%; n = 15), monomorphic SNPs (n = 32), and SNPs with minor allele frequency (MAF) < 0.01 (n = 98). The final dataset comprised genotypes from 8,076 loci.
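The pruning steps above can be sketched as a small per-SNP filter. This is an illustration only, not the authors' pipeline: it assumes biallelic SNPs, genotypes stored as two-letter strings with `None` for missing calls, and at least one individual per SNP. The function names and example data are hypothetical, while the thresholds (more than 20% missing, monomorphic, MAF < 0.01) are the ones stated in the text.

```python
from collections import Counter


def allele_counts(genotypes):
    """Count alleles across all non-missing diploid genotypes."""
    counts = Counter()
    for genotype in genotypes:
        if genotype is not None:
            counts.update(genotype)  # "AG" adds one A and one G
    return counts


def qc_status(genotypes, max_missing=0.20, min_maf=0.01):
    """Classify one SNP as high_missing, monomorphic, low_maf, or pass."""
    missing = sum(g is None for g in genotypes) / len(genotypes)
    if missing > max_missing:
        return "high_missing"
    counts = allele_counts(genotypes)
    if len(counts) < 2:
        return "monomorphic"
    maf = min(counts.values()) / sum(counts.values())  # assumes biallelic SNPs
    if maf < min_maf:
        return "low_maf"
    return "pass"


# Made-up genotype calls for four hypothetical SNPs.
snps = {
    "snp_a": ["AA", "AG", "GG", "AG", None],
    "snp_b": ["CC", None, None, "CT", "CC"],
    "snp_c": ["TT"] * 5,
    "snp_d": ["AA"] * 99 + ["AG"],
}
status = {name: qc_status(calls) for name, calls in snps.items()}
print(status)

# Consistency check on the counts reported in the text.
remaining = 8221 - (15 + 32 + 98)
print(remaining)  # 8076
```

The filters are applied in the order given in the text, so each SNP is counted under only one removal reason; the reported removals (15 + 32 + 98 = 145) account exactly for the drop from 8,221 to 8,076 loci.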

Map showing the sampling locations and ICES subdivisions. Sample locations and codes are detailed in Table 5. Thin lines show the boundaries between ICES subdivisions.


Genetic Wild West: 23andMe Raw Data Contains 75 Alzheimer's Mutations

The spit kit arrives in the mail in a pretty box bearing the cheerful message "Welcome to You." The journey into your genome begins a few weeks later with an email inviting you, the 23andMe customer, to explore your DNA. You will learn fun facts about your ancestry, your Neanderthal remnants, and whether a single cocktail is likely to make you flush red. If you wish, however, the company will also reveal genetic health risks, and this is where things turn serious. 23andMe's health reports, approved by the FDA in April 2017, assess genetic risk for four diseases so far, including Parkinson's and Alzheimer's; the latter is based solely on carriage of the ApoE4 allele. Customers also learn whether they unknowingly carry any of 42 recessive alleles that pose no danger to them but could harm their children. Importantly, a few more clicks open a Pandora's box filled to the brim with your genotype at roughly 600,000 single nucleotide polymorphism (SNP) positions. You can download this nondescript list of A, C, T, and G combinations to your computer. At this point, you are on your own. 23andMe warns that these data are not validated. Some SNPs have spurious associations in the published literature. Others could pack quite a punch. The 23andMe chip reveals your genotype at 75 dominantly inherited mutations known to cause Alzheimer's disease. Once these data are on your computer, can you resist taking a peek?

The implications are exciting, because unfettered access to one's own genetic data holds great opportunities: in the case of ApoE4, it may motivate carriers to join an AD prevention trial or adopt a healthier lifestyle. People who discover in their raw data that they carry an autosomal dominant AD (ADAD) mutation can find their way to the Dominantly Inherited Alzheimer Network (DIAN). To the chagrin of AD researchers, however, 23andMe has not agreed to refer carriers of these mutations to such clinical studies.

The implications are also unsettling. The company told Alzforum that 2 million people have ordered a 23andMe kit since 2007, but would not say how many have learned their ApoE status or accessed their raw data. Even so, the overall growth of the direct-to-consumer (DTC) testing industry makes it likely that millions of people will soon be grappling with the meaning of the complex, unwieldy, and enormous dataset that is their genome. Healthcare providers and genetic counselors, who are trained to guide people toward informed decisions about whether to order genotyping, are increasingly being called upon to pick up the pieces afterward. That means crisis counseling for bewildered consumers caught off guard by their results. Others may not tell a soul, perhaps fearing discrimination by employers or insurers on the basis of their genes. Some consider the company's efforts to connect customers with genetic counselors, who could help them make sense of the surprises lurking in the raw data, to be insufficient.

More broadly, the rise of direct-to-consumer genetic testing, a field pioneered by 23andMe, raises scientific, ethical, and social questions that need to be addressed, said Carlos Cruchaga of Washington University in St. Louis. "Clearly, this is happening," he told Alzforum. "Human genetics is moving very fast. The data are easy to obtain, but understanding how to handle them has always been the challenge." Researchers disagree sharply about the ethics of selling raw genetic data to the general public.

After an Early Stumble, DTC Genetic Testing Takes Off
23andMe began offering its genetic testing services directly to consumers more than a decade ago, and at one point provided its customers with genetic risk estimates for 254 diseases, in addition to information about their ancestry and other physical traits. That changed in the United States in 2013, when the FDA shut down the genetic risk component of the company's service until it demonstrated that its tests and interpretations were valid (see Feb 2014 news). In April 2017, the FDA authorized the company to market so-called genetic health risk reports for 10 diseases, each validated for technical reproducibility and clinical significance based on the scientific literature. Besides AD and PD, the approved risk reports interpret variants associated with α-1 antitrypsin deficiency and hereditary thrombophilia. Reports for Gaucher disease type 1, celiac disease, early-onset primary dystonia, factor XI deficiency, glucose-6-phosphate dehydrogenase deficiency, and hereditary hemochromatosis have been approved but are not yet available. Carriage of the ApoE4 allele is used to estimate a customer's AD risk, while the G2019S mutation in LRRK2 and the N370S mutation in GBA are used for PD. Nowadays, customers can choose between two products: cheaper ancestry-only results, or ancestry plus health reports.

As part of its premarket authorization work with the FDA, 23andMe had to conduct a study showing that consumers can understand their results and what they mean. The company agreed to present customers with warnings and limitations before they unlock their genetic health risk reports. These include statements that genetic risk is only one component of overall risk, that the tests do not diagnose disease, and that the results may upset some people. The company also suggests that some customers could benefit from genetic counseling, either before or after receiving test results, and directs them to the National Society of Genetic Counselors (NSGC) to find a counselor.

Susan Hahn, an NSGC member who specializes in AD, told Alzforum that direct-to-consumer testing has shifted counselors' workload toward post-test rather than pre-test counseling. Most counselors work with healthcare providers and help clients decide whether genetic testing is the right choice for them. "A genetic counselor can ask the hard questions you hadn't thought to ask," she said. With the rise of DTC testing, Hahn said, counselors increasingly see clients only after test results are in. "At that point, you are often just doing damage control," she said, referring to clients who took the tests without fully considering the consequences.

Processing unexpected genotype information can be difficult. Jamie Tyrone, 57, of San Diego, found out that she carries two copies of the ApoE4 allele as a participant in a research study that explored attitudes toward genetic risk. She joined the study to learn more about the genetic basis of multiple sclerosis, but did not realize she would discover her ApoE genotype. She was not offered genetic counseling as part of the study, she told Alzforum. "If I had seen a counselor, I would have chosen not to participate," she said. Tyrone's father had suffered from AD, so she knew the severity of the disease. The result plunged her into a dark hole, she said, and she was diagnosed with post-traumatic stress disorder, a condition she said cost her $40,000 in counseling. Ironically, the study Tyrone participated in found that most volunteers did not suffer clinically significant levels of anxiety or distress (Boeldt et al., 2015).

These days, Tyrone participates in a longitudinal study sponsored by the Banner Alzheimer's Institute in Phoenix that aims to chart the preclinical course of AD. Once she turns 60, she hopes to qualify for a clinical trial. "The opportunity to participate in research is the only benefit I have received from learning my ApoE genotype," Tyrone told Alzforum.

23andMe warns its users about the implications of learning their ApoE4 status. Still, Tyrone told Alzforum that she finds those warnings inadequate to prepare people for the consequences.

Other ApoE4 carriers see it differently. Julie Gregory, 55, of Long Beach, Indiana, found out in 2012 through 23andMe that she carries two copies of ApoE4. At the time, the company did not yet offer genetic health risk reports to explain the results. Shaken and confused, Gregory began commiserating with hundreds of other carriers on 23andMe's customer forums. Those interactions motivated her to create the website ApoE4.Info, where ApoE4 carriers support one another and discuss the latest research. "Although I was traumatized at first, learning my status has been good for me," she said. "Knowledge is power."

Gregory told Alzforum that pre-test genetic counseling would be helpful, but she doubts that most people use it. “It’s great advice, but I’ve never seen anyone follow it,” she said. More important, she said, is that people have somewhere to turn after they learn their results, and that they feel motivated to make lifestyle changes to counter their genetic risk. Gregory added that although she knows people who suffered severe psychological harm after learning their genotype, including PTSD, they tend to be the exception.

Gregory’s observation has a basis in research. In a recent study, only 4 percent of DTC genetic testing customers sought counseling (Koeller et al., 2017). Scott Roberts of the University of Michigan in Ann Arbor led the study, which in 2012 surveyed more than 1,000 customers of 23andMe and Pathway Genomics, another genetic testing company, before 23andMe ran afoul of the FDA. The few participants who used counseling tended to have prior experience with genetic counselors and to be highly educated, wealthier, and younger. Far more people shared their genetic testing information with primary care providers than with genetic counselors. This deference to physicians could be a case of “first-stop shopping,” Roberts told Alzforum, or a consequence of the dearth of available genetic counselors. He added that many genetic counselors have long waiting lists; they also tend to give DTC customers low priority because they consider at-home tests less urgent than those ordered by a doctor.

Does 23andMe point its customers toward clinical studies they could join? Not in Alzheimer’s disease. As part of its genetic health risk reports, the company shares general research information with its customers, including risk statistics for each variant, other non-genetic risk factors (such as cardiovascular disease, education, and lifestyle), and data supporting the potential benefits of exercise and diet. However, the company does not point them toward specific studies geared to their genotype.

For ApoE4 homozygotes, an obvious choice would be the Generation program by Novartis and Banner. This set of two secondary prevention/early intervention trials is evaluating a BACE inhibitor and an Aβ vaccine from Novartis in asymptomatic ApoE4 homozygotes and heterozygotes (Sep 2016 conference news). These trials are ramping up, seeking a total of 3,340 participants. Based on the allele frequency of ApoE4, more than 100,000 people will have to be screened genetically to fill those trials alone. Researchers universally agree that recruiting asymptomatic, at-risk participants who are willing to learn their ApoE status represents a significant challenge for trial sponsors and participating sites in academia and industry.

23andMe could help with this challenge. After all, presumably many thousands of people have learned their ApoE genotype through the company. According to Jessica Langbaum at Banner, the institute has tried, but has not been able to come to such an agreement with 23andMe. Banner, Novartis, and 23andMe all declined to discuss the issue further with Alzforum, citing ongoing negotiations or legal concerns. In the past, 23andMe has monetized its genetic data, receiving $60 million from Genentech for access to its Parkinson’s data (Jan 2015 news). For now, absent a referral partnership with 23andMe, Langbaum told Alzforum that Banner is stepping up its own efforts to raise awareness about the Generation program, in hopes of catching the attention of the growing number of ApoE4 carriers who have learned their genotype.

Langbaum noted that several former 23andMe customers have found and joined the Generation program. Even though they entered the study already aware of their genetic status, these volunteers undergo the same extensive counseling as other participants in the trial, and then get genotyped again to confirm their status. Prior to joining a Generation trial, many of these participants had already done extensive private research about ApoE; alas, many of them only got a taste of one-on-one genetic counseling upon joining the study, said Langbaum. “For most people, this is their first chance to ask someone questions,” she said.

Your Data in the Raw: A Trip Down the Genomic Rabbit Hole?
If the impact of learning your ApoE genotype seems unpredictable, imagine opening a data file containing 600,000 genotypes. 23andMe gives customers access to their personal file, with the stipulation that variants other than the select few included in the genetic health risk reports are not validated for accuracy. The company tells customers that the raw data is only suitable for “research, educational, and informational use, and not for medical, diagnostic, or other use.” Still, inquiring minds may want to know.

Data in the Raw.

A text file of raw data from 23andMe lists the rsid number, chromosome position, and genotype associated with each of more than 600,000 polymorphisms.
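To make the file layout concrete, here is a minimal Python sketch of reading such a raw data file into a dictionary keyed by rsid. It assumes the commonly seen layout of tab-separated columns (rsid, chromosome, position, genotype) with comment lines starting with "#" and no-calls written as "--"; the function name `parse_raw_genotypes` and the sample rows are made up for illustration and are not real customer data.

```python
import csv
import io


def parse_raw_genotypes(handle):
    """Return {rsid: (chromosome, position, genotype)} from a raw data file.

    Assumes tab-separated columns and '#' comment lines (an assumption about
    the export format, not something guaranteed by the article).
    """
    genotypes = {}
    data_lines = (line for line in handle if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        rsid, chromosome, position, genotype = row
        genotypes[rsid] = (chromosome, int(position), genotype)
    return genotypes


# Hypothetical file content; identifiers and positions are invented.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t12345\tAG\n"
    "rs0000002\t19\t67890\tTT\n"
    "i0000003\tX\t13579\t--\n"
)

calls = parse_raw_genotypes(io.StringIO(SAMPLE))
print(calls["rs0000002"])
```

With the calls in a dictionary, looking up a variant of interest is a single `calls.get(rsid)`; any interpretation of the genotype still depends on an external annotation source and on the caveat that the raw calls are not validated.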

Customers can browse their raw genotype data—either by chromosome, gene, or SNP—on 23andMe’s secure website. They can also download it, and upload it to a third-party service for interpretation. For example, for $5 and with a simple click on a bright-green button on its website, Promethease, currently the most popular of these genome-interpretation services, will upload anyone’s 23andMe data. Less than 15 minutes later, the customer can browse their SNPs within Promethease. This site curates information about the potential meaning of DNA variations from SNPedia. For its part, SNPedia derives its data from a variety of sources, including the scientific literature, the NIH-supported database ClinVar, and even the Alzforum mutations database. The current 23andMe genotyping chip contains about 20,000 of the roughly 100,000 SNPs curated by SNPedia, according to SNPedia and Promethease co-founder Greg Lennon.

Perhaps most importantly, a 23andMe/Promethease customer can type in any of the genes known to harbor ADAD mutations—PSEN1, PSEN2, or APP—and voila, a list of potentially pathogenic mutations pops up, along with your predicted genotype, and whether that genotype is “good” or “bad,” and how bad on a scale of one to 10 (see example below).

Dominant Data. One of many ADAD mutations listed in Promethease, found by uploading a 23andMe raw data file and searching for PSEN1. “Magnitude” refers to the size of the variant’s effect. This sample person carries the T allele; if he or she carried the G allele in this location, “Repute” would indicate “Bad,” and “Magnitude” would likely indicate the maximum assigned for the pathogenic variant, in this case 7.

According to an analysis conducted by Cruchaga, the custom Illumina genotyping chip that 23andMe currently uses contains 75 familial AD mutations: 53 in PSEN1, nine in PSEN2, and 13 in APP. In addition, Alzforum found dozens of autosomal-dominant mutations for frontotemporal dementia (FTD) in the tau (MAPT), progranulin (GRN), and CHMP2B genes. A bevy of risk-associated SNPs are also represented, including many of the current top 10 GWAS hits listed for AD (see AlzGene), PD (PDGene), and ALS (ALSGene).

Outside of neurodegeneration, SNPs on the 23andMe chip have been linked to myriad other diseases or traits, including cancer, diabetes, cardiovascular disease, addictive behavior, even lack of empathy. Other SNPs were included on the chip to derive ancestry information.

Customers who open their raw data file can find out their genotype at any of the 600,000 SNP positions contained on the chip, far beyond the 18 SNPs that are used for the genetic health risk reports that 23andMe shares with FDA approval. The raw data is available to all customers, even those who purchase the less expensive ancestry-only product that does not come with genetic health risk reports or carrier status, Wu told Alzforum. Therefore, if an ancestry-only customer carrying an ApoE4 allele were to upload his or her raw data to Promethease, they could see their ApoE genotype (see below). 23andMe is not the only DTC genetic testing company that shares raw data with its customers. For example, Ancestry.com customers can also riffle through their raw data, which contain myriad clinically relevant SNPs as well.

Ready or Not: ApoE4. A carrier uploading his or her data to Promethease would face this entry. gs141 designates the ApoE3/4 genotype. Carriers are referred to Julie Gregory’s ApoE4.Info website for support.

Surprises in a person’s raw data raise questions about accuracy and once again highlight the importance of genetic counseling. Consider Summer Warner, a Midwestern U.S. woman in her early 20s. Warner was just curious about her ancestry, but got blindsided by an apparent increased risk for developing a deadly neurodegenerative disease. She received her 23andMe results back in 2010. Uploading her raw data to Promethease a few years later, Warner discovered that she had variants associated with the C9ORF72 hexanucleotide expansion that can cause amyotrophic lateral sclerosis or frontotemporal dementia. Terrified, she followed a link from 23andMe’s website to Informed DNA, a genetic counseling service. The company informed Warner that they would not discuss the result with her, Warner told Alzforum. Seeking advice, she reached out to 23andMe, which assured her that Informed DNA would indeed talk with her. She tried again, and received no response. When she reached out to genetic counselors at Washington University in St. Louis, the closest major city, they told her they would not discuss raw data from 23andMe, as they were not validated to clinical standards.

Nevertheless, Warner persisted, and ultimately found her way to Chris Shaw of King’s College London, a geneticist and co-author on the ALS research study mentioning her variant. Shaw relieved Warner’s worries over email. “From our analysis it appeared that the SNP data did not increase her risk of carrying the C9ORF72 disease allele,” Shaw told Alzforum separately. “Her haplotype itself is very common in Europeans and is in no way a proxy for the expansion mutation.” Warner said that while this information assuaged her anxiety, it would have been preferable to speak with a genetic counselor. “Instead, I had to go this crazy route,” Warner said.

Shaw declined to comment about 23andMe directly. He told Alzforum that he considers it irresponsible to share this kind of raw genetic information with people without counseling. Shaw added that complex genetic data, and its relationship with disease risk, is often misunderstood or disagreed upon even among geneticists, counselors, and clinicians, let alone the average person. Even supposedly solid risk factors such as ApoE4 have varied effects in different populations, he said. “Misinformation about genetic data can generate a lot of fear,” he said.

The amount and quality of the data backing up the alleged phenotypic meaning behind a given genotype varies greatly; it evolves along with the progress of human genetics research overall.

Adam Boxer of the University of California, San Francisco, said he would approach 23andMe’s raw data with a hefty dose of caution. In particular, Boxer questioned the ethics of handing over information about causal mutations outside of a clinical context, without counseling, and using unvalidated data. Boxer heads the ARTFL consortium, which conducts longitudinal studies on FTD (Nov 2014 conference news). Among ARTFL’s participants are asymptomatic carriers of autosomal-dominant mutations in tau, progranulin, and C9ORF72, some of which can be found on the 23andMe genotyping chip. Boxer said ARTFL adheres to strict protocols for disclosing information about causal mutations to its participants. While he stopped short of suggesting that people not be allowed to access their raw data files, he suggested some sort of warning system or firewall be put in place to alert carriers of severe mutations that they may be about to view potentially alarming information.

Bradley Boeve of the Mayo Clinic in Rochester, Minnesota, heads LEFFTDS, a subset of ARTFL that includes carriers of autosomal-dominant FTD mutations. Boeve told Alzforum that he would be “uncomfortable” with any DTC company offering testing for causal mutations. He cited the potential for psychological harm to people who learn the information in the wrong context. Echoing Shaw and Boxer, Boeve added that there are aspects of genetics that uninformed people would not understand without counseling, such as incomplete penetrance.

Some of the causal mutations genotyped on 23andMe’s chip, including most FTD mutations, require genetic know-how to find, as not all of them pop up readily via a third party like Promethease. Even so, in essence, 23andMe informally makes available hard-hitting genetic risk information far beyond the formal health reports sanctioned by the FDA.

Shirley Wu of 23andMe reiterated to Alzforum that the company does not recommend that its customers attempt to extract clinical information from the raw data, but added that they should be free to look through it. “We don’t want to be the gatekeepers of this information,” she said. “It is really the individual’s choice what they want to do with their data.”

According to Wu, the genetic counseling landscape is changing, as a growing cadre of counselors are beginning to specialize in the interpretation of direct-to-consumer genetic results. Brianne Kirkpatrick, owner of Watershed DNA in Crozet, Virginia, is such a counselor. She started her service after encountering anxious people in Warner’s situation, she told Alzforum. Kirkpatrick counsels people before and, more commonly nowadays, after they undergo direct-to-consumer genetic testing. She helps people interpret findings in their raw data files. Kirkpatrick told Alzforum that confusion and anxiety are common. For example, one recent client had a scare upon seeing the list of pathogenic Alzheimer’s mutations in Promethease, because she did not understand that she carried the normal, rather than pathogenic, variant of each one.

Kirkpatrick said she sometimes helps clients navigate the literature supporting disease-associated SNPs, making a point not to overinterpret weak or contradictory information. If a pathogenic mutation, such as an ADAD mutation, is reported, Kirkpatrick recommends that clients obtain a clinical-level genetic test to confirm it. She added that the disclaimers direct-to-consumer genetic testing companies use to caution clients against overinterpreting their raw data don’t work with most people. “Even with the disclaimer, people still believe the data is reliable,” Kirkpatrick said. “The disclaimers are not sufficient to educate the public.”

Case in point, Alzforum found several heart-stopping errors within a 23andMe raw data file obtained from a volunteer. Using rsid numbers associated with mutations in the AD and FTD Mutation Database, Alzforum found genotypes for 26 pathogenic progranulin mutations in the volunteer’s raw data file. According to the genotypes called by 23andMe, this person was homozygous for three separate dinucleotide deletions, each reported to cause FTD in an autosomal-dominant fashion. Homozygous progranulin mutations trigger the childhood lysosomal storage disease neuronal ceroid lipofuscinosis (NCL). However, this volunteer is a healthy adult with no family history of FTD or NCL.

Why the error? One possibility is that the mutations—which have been documented in the genetics literature in one to seven families each—are not actually pathogenic, or were mislabeled in the ADFTD mutation database. A more likely explanation in this case is that the volunteer’s genotypes were wrong, Jose Bras and Rita Guerreiro of University College London explained to Alzforum. Genotyping chips often interpret deletions and insertions incorrectly, they said. In fact, Bras and Guerreiro ran into this same issue with some progranulin mutations on the NeuroXChip—a genotyping chip for neurological disorders that they helped design. Bras said that Illumina, the company that manufactures both NeuroX and 23andMe’s custom chips, gives researchers an estimate of how likely a given mutation is to be called correctly. But in the end, the only way to know for sure is to try the chip out, they said. If it proves not to work for a given genotype, researchers know to ignore it. “Of course, unlike 23andMe, we are not giving this raw data to customers,” Bras said. The progranulin problem is a prime example of just how unvalidated the raw data can be.

People who have had their entire genomes sequenced—an undertaking that is becoming increasingly affordable—have run into similar mismatches between their genotypes and phenotypes. A recent study reported that 11 out of 50 people who had their genomes sequenced supposedly carried pathogenic variants for various disorders. However, only two of those people actually had outward signs of the disease (Vassy et al., 2017).

Given the validation the FDA required to approve the 10 genetic health risk reports 23andMe issues, how does the agency view the raw data? Nary a mention of the raw data file appears in FDA’s authorization letter to 23andMe. The letter does state, however, that the 23andMe personal genome service, which includes the approved genetic health risk reports, cannot be used for “assessing the presence of deterministic autosomal dominant variants.” In an email to Alzforum, the FDA wrote, “Excluded from this authorization are diagnostic tests that are often used as the sole basis for major treatment decisions. Tests providing diagnostic information would be a different intended use to the genetic health risk reports recently authorized by the FDA and require a separate FDA submission.”

Alberto Gutierrez, director of the FDA’s Office of In Vitro Diagnostics and Radiological Health, which approved 23andMe’s genetic health risk reports, told Alzforum that the agency regulates the interpretation of genetic data, not the data itself. “There is a very strong desire by some people to own what they think is their data,” Gutierrez told Alzforum, referring to the raw data files. “As long as 23andMe is not making medical claims about it, we’re allowing them to share it.” What about third-party companies, such as Promethease, that do offer data interpretation? Gutierrez said that the FDA is watching such companies closely, and plans to contact those that “cross the line,” although exactly what that means is unclear at the moment.

In a June 20 commentary about the FDA’s approval of genetic health risk reports published in the Annals of Internal Medicine, Julia Wynn and Wendy Chung from Columbia University in New York criticized the agency’s attempt to separate genetic information from medical claims. “To allow DTC genetic testing and not expect persons to use the information to inform medical decisions is disingenuous and irresponsible,” they wrote. “This ruling is confusing—in asymptomatic persons, the genetic test provides the only data to support the diagnosis of or increased risk for such conditions as Alzheimer or Parkinson disease, and could be the sole basis for a medical decision.”

Good examples of medically actionable information are the BRCA mutations known to drive risk for breast cancer, FDA spokesperson Tara Goodin told Alzforum, because carriers could seek preventive procedures such as a mastectomy. BRCA mutations are not included in the genetic health risk reports, though some are in the raw data file. But how about the 75 autosomal-dominant AD mutations lurking in the raw data? Goodin told Alzforum that as the data are not validated, nor interpreted by 23andMe, customers should not view them as diagnostic tests. If someone were to find they harbored such a mutation, the next step would be to confirm the result via clinical testing with a health care provider, she said.

Randall Bateman of Washington University in St. Louis partly agrees. However, he recommended that before seeing a doctor, people who discover one of these deterministic mutations in their 23andMe raw data first secure life and long-term care insurance, especially if they have a family history of AD. The reason is that, unlike workplace equality or access to medical insurance, access to life and long-term care insurance are not protected under the Genetic Information Nondiscrimination Act (GINA). Once a person has a validated clinical result on his or her medical record, he or she may be required to divulge it on insurance application forms. This issue has even come up for people who carry two copies of the ApoE4 allele (May 2017 New York Times article). After getting insurance, the next step would be to decide, with the guidance of a genetic counselor, whether to seek bona fide clinical testing for the mutation, Bateman said.

Bateman readily acknowledged that discovering a familial AD mutation in a raw data file could be disturbing. “Maybe you were just interested in finding out if you descended from Vikings, and then you find one of these mutations instead,” he said. However, he added that given the strong family history of autosomal-dominant Alzheimer’s disease, the revelation would come as little surprise for most carriers of ADAD mutations. People who discover such a mutation in their raw data file can contact the DIAN study through the DIAN Expanded Registry, where they will be guided toward counseling if they wish. In fact, a few participants discovered DIAN after perusing their 23andMe raw data, though they did so without referral help from the company.

Just as 23andMe has no partnership with Banner or Novartis to direct ApoE4 homozygotes to the Generation trials program, it also does not point carriers of ADAD mutations to DIAN. 23andMe spokesperson Andy Kill told Alzforum that as of now, the company does not keep tabs on how many of its customers carry ADAD mutations. However, Bateman said that directing these carriers to the DIAN registry would align with the company’s mission of sharing useful information with its customers. Like the Generation program, DIAN is expanding; its trials unit, DIAN-TU, in particular needs more participants to gain statistical power for its prevention studies.

Given the serious implications of carrying an ADAD mutation, Bateman raised the bar, suggesting that direct-to-consumer genetic testing companies have a responsibility to share this information with willing customers. After all, a person who has ordered this product has expressed an interest in genetic information, and arguably deserves follow-up in instances where there are concrete actions he or she can take in the face of distressing risk, such as join a prevention drug study. Bateman proposed a notification process by which the testing company would ask both carriers and non-carriers whether they would want to be notified if they did harbor a serious mutation. For those who answer yes, the company could perform clinical testing to confirm the result, and cover the cost of genetic counseling. Boxer would like to see a similar effort to refer carriers to registries that feed research studies such as ARTFL.

“These companies could develop a process to enable individuals, in a safe and ethical way, to find information that may change their lives and the lives of their families,” Bateman said. “As holders of this information, DTC companies are in the position to make a difference. The question is, what kind of difference do they want to make?”

Besides 23andMe, another venue for recruitment to prevention trials would be genetic-interpretation companies such as Promethease. People who find out their genotype there could be directed to trials such as the Generation program (for ApoE4 carriers) or the DIAN registry (for ADAD mutation carriers). Of course, this pool would be limited to customers who took the step of analyzing their data via Promethease.

Lennon, the co-founder of SNPedia and Promethease, said his company is willing to direct mutation carriers to prevention studies. “We could do this automatically, and easily,” he told Alzforum. “But every foundation or nonprofit we’ve gone to has ultimately said no.” While he declined to name the organizations, he said that they were wary of referrals based on unvalidated data, or that they asked for too much exclusivity at the expense of other foundations.

Lennon told Alzforum that Promethease tries to give carriers of pathogenic variants medically useful information as available. One source is the evidence-based summaries developed by ClinGen’s Actionability Working Group. This NIH-funded panel scores the “clinical actionability” of various pathogenic variants, and gives recommendations. Promethease presents this to carriers, as exemplified by the BRCA2 mutation below. Neurodegenerative diseases are largely absent from this ClinGen list, due to the lack of approved drugs, Lennon said.

A Call to Action? Promethease brings in recommendations from ClinGen to help mutation carriers take action to prevent or treat disease.

Even as researchers would like to see DTC companies step up their games on counseling and referral, 23andMe has contributed to research in other areas, especially Parkinson’s. Spurred in part by the discovery that Sergey Brin, the ex-husband of 23andMe founder Anne Wojcicki, carries a pathogenic LRRK2 mutation, 23andMe tackled the genetic architecture of PD. Partnering with the Michael J. Fox Foundation, 23andMe gathered a deeply genotyped and phenotyped cohort of PD patients who have contributed to GWAS, informed biomarker research, and whose DNA is currently being plumbed in search of rare pathogenic variants (Jul 2014 news and Nalls et al., 2015).—Jessica Shugart


Results

Minimal marker set

Up to week 18, the high-quality COG-UK sequence alignment comprised 14,277 sequences, as indicated in the accompanying metadata file. We found 41 SNPs meeting our criteria of a minimum minor allele frequency of 0.1%. Of these, our pipeline identified 22 as sufficient to provide the maximum possible discrimination between samples in the COG-UK dataset. Three SNPs were removed manually from this list as either their flanking sequences (for probe design) were overlapping or contained ambiguous bases (‘N’) close to the SNP of interest. Prior to wet-lab marker validation, we found that these 19 SNPs were capable of delineating 59 distinct variants from the COG-UK sequence alignment (S3 Table). To test the discriminatory power of the 19-marker set (hereafter named the test set), random pairs of haplotypes for our marker positions were sampled from the COG-UK sequence alignment without replacement. We found that 89.1% of 6,202 random sample pairs were distinct at one or more marker positions. The flanking sequences for the 19 selected SNPs of the test set (S1 Table) were sent to 3CR Biosciences for probe design.
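The selection steps described above can be sketched in Python as follows. The text does not say how the pipeline chooses its marker set, so the greedy strategy here is an assumption swapped in for illustration, as are the function names `variable_sites` and `greedy_marker_set`, the 0-based positions, and the toy alignment.

```python
from collections import Counter


def variable_sites(sequences, min_maf=0.001):
    """Return 0-based positions whose second-most-common base reaches min_maf."""
    sites = []
    for pos in range(len(sequences[0])):
        # Ignore ambiguous characters such as 'N' when counting alleles.
        counts = Counter(seq[pos] for seq in sequences if seq[pos] in "ACGT")
        if len(counts) < 2:
            continue
        minor = counts.most_common(2)[1][1]
        if minor / sum(counts.values()) >= min_maf:
            sites.append(pos)
    return sites


def greedy_marker_set(sequences, candidates):
    """Greedy sketch (an assumption, not the published pipeline): repeatedly add
    the site that most increases the number of distinct haplotypes."""
    chosen, best = [], 1
    while True:
        scored = []
        for pos in candidates:
            if pos in chosen:
                continue
            sites = chosen + [pos]
            n = len({tuple(seq[p] for p in sites) for seq in sequences})
            scored.append((n, -pos, pos))  # ties go to the lowest position
        if not scored:
            break
        n, _, pos = max(scored)
        if n <= best:
            break  # no remaining site adds discrimination
        chosen.append(pos)
        best = n
    return chosen, best


# Toy alignment: position 4 carries the same information as position 1.
toy = ["AAAAA", "AAATA", "ACATC", "ACAAC", "AAAAA"]
candidates = variable_sites(toy, min_maf=0.1)
markers, n_haplotypes = greedy_marker_set(toy, candidates)
print(candidates, markers, n_haplotypes)
```

In the toy alignment three sites pass the frequency filter, but the redundant one is never selected because it does not split any group that the first two markers leave together. A greedy pass is not guaranteed to find the smallest possible set; it is used here only because it is short and easy to follow.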

Synonymous and non-synonymous SNPs.

All nineteen SNP markers in the test set target SNPs located in coding sequences. With regard to the codons within the open reading frame (ORF) of these genes, five of the SNPs were at position 1, six at position 2 and eight at position 3. Twelve of the SNPs were non-synonymous and would result in changes to the amino acid at the given position (Table 2).

Evaluation of the test set.

Initial evaluation of the test set was performed using the two cell culture propagated SARS-CoV-2 isolates GBR/Liverpool_strain/2020 and hCoV-19/England/02/2020. The two virus genomes vary at ten nucleotide positions (Table 1) but have no differences in the wt spike gene sequences. However, in addition to the wt viral genome, the hCoV-19/England/02/2020 virus stock was known to contain a variant genome that arose during viral passage in tissue culture, which had a 24 nt in frame deletion in the spike gene sequence (BrisΔS, Table 1). Genotypes were obtained for all 19 markers (Table 3).

Concordance between genotyping and sequencing.

The two SARS-CoV-2 isolates GBR/Liverpool_strain/2020 and hCoV-19/England/02/2020 had been sequenced, enabling a comparison with our genotyping data (Table 3). All genotyping results were concordant with the sequence data. In two cases, it was possible to confirm SNPs (at nts 11083 and 28144) differentiating the two wt SARS-CoV-2 isolates with both sequence and genotyping data. We also compared these data with the available COG-UK sequences from the 2020-05-08 dataset (representing PCR positives samples circulating March–May 2020). This showed that the majority of genotype calls concord with the major allele found in the COG-UK database.

Genotyping clinical SARS-CoV-2 samples

To further evaluate the test set we genotyped 50 SARS-CoV-2 positive samples obtained from PHE (samples collected from the South West of England). For 41 of the 50 samples, results were obtained from at least 50% of the SNP markers in our panel; those that fell below this threshold were excluded from further analysis (S4 Table). For 22 of the remaining 41 samples, results were obtained for all 19 markers, and for a further 13 samples, results were obtained from at least 15 of the 20 markers.

We found that 11 of the 19 markers were polymorphic among the 50 PHE samples and could be used to assign them to 15 distinct groups (Fig 3 and S4 Table). To quantify the utility of our SNP panel in separating positive samples into distinct groups, we sampled random pairs of the 50 genotyped samples 1000 times and found that they were separated by at least one marker in 619 cases (61.9%).
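The random-pair estimate described above can be sketched in Python as below. Each sample is represented by its string of marker calls; how missing or mixed calls were treated in the original analysis is not stated, so this sketch simply compares the strings as given. The function names and toy haplotypes are hypothetical.

```python
import random
from collections import Counter


def pair_discrimination(haplotypes, n_pairs=1000, seed=0):
    """Estimate the fraction of random sample pairs that differ at one or more markers."""
    rng = random.Random(seed)
    distinct = 0
    for _ in range(n_pairs):
        a, b = rng.sample(haplotypes, 2)  # two different samples per draw
        if a != b:
            distinct += 1
    return distinct / n_pairs


def exact_pair_discrimination(haplotypes):
    """Closed form for the same quantity: 1 - sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    n = len(haplotypes)
    same = sum(c * (c - 1) for c in Counter(haplotypes).values())
    return 1 - same / (n * (n - 1))


# Toy marker haplotypes: two samples share a haplotype, two are unique.
toy_haplotypes = ["AG", "AG", "AT", "CT"]
print(exact_pair_discrimination(toy_haplotypes))
print(pair_discrimination(toy_haplotypes, n_pairs=1000))
```

For a small panel the closed form gives the value directly; the sampling version mirrors the procedure in the text and converges on the same number as the number of sampled pairs grows.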

SNPs with a single allele call per sample are marked in dark blue (major allele) or orange (minor allele). Mixed calls are shown in gold and missing data in light blue. Twelve out of 19 markers were polymorphic in our small test panel of PHE samples and cell lines (eleven out of 19 markers were polymorphic in PHE samples) and eight samples had mixed calls for one or more markers.

Marker fail rate in PHE samples.

The average fail rate by marker (that is, the marker produced no signal for some samples) was 19.4%, ranging from 4% (marker Bris_SARS-CoV-2_25429) to 32% (markers Bris_SARS-CoV-2_2558 and Bris_SARS-CoV-2_25350). The number of fails per sample ranged from 0% (22 of the samples) to 80% (2 of the samples); those samples with fewer than 10 calls (9 in total) were removed from further analysis (S4 Table).

An evolving target

The Microreact website [8] shows how SARS-CoV-2 lineage frequencies have changed during the outbreak, and similarly the SNPs we targeted in our panel also changed in frequency over time. To quantify the effect of alterations in SNP frequency over time on the discriminative power of the 19-SNP panel, it was tested bioinformatically against random pairs of samples drawn from week 19 through week 35 in the 2020-09-03 COG-UK data. The probability of the original marker set discriminating a random pair of samples decreased from 89.1% to 77.6%. There was, however, an anomaly in this analysis, as our G/T SNP at position 11,083, recorded as a variant in the 2020-05-08 COG-UK data and polymorphic in our genotyping results, is reported as the non-IUPAC character “?” in the 2020-09-03 COG alignment due to it exhibiting homoplasy in phylogenetic reconstruction (Andrew Rambaut, personal communication). The loss of data for this marker from the latest COG-UK alignment means we will have underestimated the discriminatory power of our panel on more recent samples. Nonetheless, we re-ran the SNP marker discovery pipeline on the week 19–35 sequences and found that the number of SNPs present at a frequency greater than 0.001 had increased from 41 to 97 (noting that the SNP at 11,083 has been masked out of that alignment) and that 51 markers were now required to discriminate all samples to the maximum amount possible. However, the majority of variants were extremely rare, such that just the first 24 markers (S5 Table) were capable of discriminating 95% of randomly selected sample pairs.


Promethease — a tool for anyone to understand genetic health risks

Promethease is a literature retrieval system that pulls its information from SNPedia, a vast wiki of research studies on how genes affect (predominantly) medical traits. The genetic information is then mapped against the genetic data you uploaded to generate a personal DNA report on genetic health risks. Medical information based on your raw data has been notably hard to come by in consumer DNA tests, owing to strict FDA oversight. Combine that with Promethease’s not-so-friendly user interface, and we have many people missing out on the benefits of a Promethease report — namely getting an idea of whether they have any medical risks to look out for.

That being said, there are clear limitations rooted in how Promethease functions. Details like the genetic variants referenced in a report not being corroborated by other research studies or having SNPs with contradicting results can really throw you off course. Many of these limitations can be controlled for by making use of the many filters they have.

So before we get into how to interpret your report, let me share my guidelines for pre-filtering the results. The main purpose is so that you don’t get unnecessarily overwhelmed when you first see your report, so feel free to adjust the specific numbers as you see fit. (Once you’re more familiar with the format, I encourage you to play around with the numerous filter settings.)

