Classification on Large Networks: A Quantitative Bound via Motifs and Graphons

Andreas Haupt, Thomas Schultz, Mohammad Khatami and Ngoc Tran
In: Advances in Mathematical Sciences, pages 107-126, Springer, July 2020
 

Abstract

When each data point is a large graph, graph statistics such as densities of certain subgraphs (motifs) can be used as feature vectors for machine learning. While intuitive, motif counts are expensive to compute and difficult to work with theoretically. Via graphon theory, we give an explicit quantitative bound for the ability of motif homomorphisms to distinguish large networks under both generative and sampling noise. Furthermore, we give similar bounds for the graph spectrum and connect it to homomorphism densities of cycles. This results in an easily computable classifier on graph data with a theoretical performance guarantee. Our method yields competitive results on classification tasks for the autoimmune disease Lupus Erythematosus.
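This is a minimal, stdlib-only Python sketch of the two ingredients named in the abstract. It illustrates the ideas and is not the authors' implementation. The first ingredient is sampling a graph from a graphon W (the W-random graph model). The second is turning that graph into a feature vector of homomorphism densities t(F, G) = hom(F, G) / n^|V(F)|. For cycles, hom(C_k, G) = tr(A^k) = sum_i lambda_i^k for k >= 3, where A is the adjacency matrix with eigenvalues lambda_i. This is the standard link between cycle densities and the graph spectrum. Several choices here are hypothetical: the function names, the two toy graphons, the motif set (the edge plus the cycles C_3 to C_5), and the graph size. The paper's actual motif set, noise model, and classifier are not reproduced.

```python
import random
from itertools import product


def sample_w_random_graph(n, graphon, seed=0):
    """Sample an n-vertex W-random graph: draw latent positions x_i ~ U[0, 1]
    and join each pair i < j independently with probability graphon(x_i, x_j)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < graphon(xs[i], xs[j]):
                adj[i][j] = adj[j][i] = 1
    return adj


def mat_mul(a, b):
    """Plain O(n^3) product of two square matrices given as lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def edge_density(adj):
    """t(K_2, G) = 2|E| / n^2: the homomorphism density of a single edge."""
    n = len(adj)
    return sum(map(sum, adj)) / n ** 2


def cycle_densities(adj, max_len):
    """t(C_k, G) = tr(A^k) / n^k for k = 3..max_len. Because tr(A^k) is the
    sum of the k-th powers of the eigenvalues of A, these features are
    power sums of the spectrum of A / n."""
    n = len(adj)
    power = mat_mul(adj, adj)  # A^2
    feats = []
    for k in range(3, max_len + 1):
        power = mat_mul(power, adj)  # A^k
        feats.append(sum(power[i][i] for i in range(n)) / n ** k)
    return feats


def hom_density_brute_force(motif_edges, k, adj):
    """Definition of t(F, G): the fraction of all maps V(F) -> V(G) sending
    every edge of F to an edge of G. Costs n^k, so tiny inputs only."""
    n = len(adj)
    hits = sum(
        all(adj[phi[u]][phi[v]] for u, v in motif_edges)
        for phi in product(range(n), repeat=k)
    )
    return hits / n ** k


def motif_features(adj, max_cycle=5):
    """Hypothetical feature vector: edge density, then densities of C_3..C_max_cycle."""
    return [edge_density(adj)] + cycle_densities(adj, max_cycle)


# Two hypothetical classes of networks, each generated by its own graphon.
def flat_graphon(x, y):
    return 0.5


def two_block_graphon(x, y):
    # Edges only inside the blocks [0, 1/2) and [1/2, 1].
    return 1.0 if (x < 0.5) == (y < 0.5) else 0.0


if __name__ == "__main__":
    for name, w in [("flat", flat_graphon), ("two-block", two_block_graphon)]:
        g = sample_w_random_graph(60, w, seed=1)
        print(name, [round(f, 4) for f in motif_features(g)])
```

In the limit, both toy graphons have edge density 1/2, so the edge feature alone cannot separate the two classes. The triangle density, however, is 1/8 for the constant graphon and 1/4 for the two-block one. The gap comes from the second eigenvalue (1/2) of the two-block graphon's integral operator. The brute-force function is included only to check the trace formula on small graphs.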

BibTeX

@INCOLLECTION{Haupt:AMS2020,
     author = {Haupt, Andreas and Schultz, Thomas and Khatami, Mohammad and Tran, Ngoc},
      pages = {107--126},
      title = {Classification on Large Networks: A Quantitative Bound via Motifs and Graphons},
  booktitle = {Advances in Mathematical Sciences},
       year = {2020},
      month = jul,
  publisher = {Springer},
   abstract = {When each data point is a large graph, graph statistics such as densities of certain subgraphs
               (motifs) can be used as feature vectors for machine learning. While intuitive, motif counts are
               expensive to compute and difficult to work with theoretically. Via graphon theory, we give an
               explicit quantitative bound for the ability of motif homomorphisms to distinguish large networks
               under both generative and sampling noise. Furthermore, we give similar bounds for the graph spectrum
               and connect it to homomorphism densities of cycles. This results in an easily computable classifier
               on graph data with a theoretical performance guarantee. Our method yields competitive results on
               classification tasks for the autoimmune disease Lupus Erythematosus.}
}