Statistical properties of the rooted-tree encoding of N / Contucci, Pierluigi; Giberti, Claudio; Osabutey, Godwin; Vernia, Cecilia. - In: PHYSICA. A. - ISSN 0378-4371. - 686:(2026), pp. 1-16. [10.1016/j.physa.2026.131361]

Statistical properties of the rooted-tree encoding of N

Contucci, Pierluigi; Giberti, Claudio; Osabutey, Godwin; Vernia, Cecilia
2026

Abstract

We prime-encode the natural numbers via recursive factorisation, iterated to the exponents, generating a corpus of planar rooted trees equivalently represented as Dyck words. This forms a deterministic text endowed with internal rules. Statistical analysis of the corpus reveals that the dictionary and the entropy grow sublinearly, that compression shows a non-monotonic trend, and that the rank-frequency curves assume a stable parabolic form deviating from Zipf's law. Correlation analysis using the mean-squared displacement reveals a transition from normal diffusion to superdiffusion in the associated walk. These findings characterise the tree-encoded sequence as a statistically structured text with long-range correlations grounded in its generative arithmetic law, providing an empirical basis for subsequent theoretical investigations and for empirical studies with large language models.
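As an illustration of the kind of encoding described in the abstract, the following Python sketch builds a Dyck word for a positive integer by recursive prime factorisation. The abstract does not state how zero exponents or the number 1 are represented, so the sketch adopts a hypothetical convention that need not match the paper's: writing n = 2^{a_1} 3^{a_2} ⋯ p_k^{a_k} with p_k the largest prime factor of n, the root of the tree of n has k ordered children; a zero exponent becomes a leaf, a positive exponent a becomes the tree of a, and 1 is the single-edge tree "()". The function names factor_exponents, dyck and is_dyck are illustrative and are not taken from the paper.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def is_prime(m: int) -> bool:
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def factor_exponents(n: int) -> list[int]:
    """Return [a_1, ..., a_k] with n = 2**a_1 * 3**a_2 * ... * p_k**a_k.

    Zero exponents of primes below the largest prime factor are kept, so
    position i always refers to the i-th prime; the list is empty for n == 1.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    exps: list[int] = []
    p = 2
    while n > 1:
        if is_prime(p):
            a = 0
            while n % p == 0:
                n //= p
                a += 1
            exps.append(a)
        p += 1
    return exps


@lru_cache(maxsize=None)
def dyck(n: int) -> str:
    """Dyck word of the planar rooted tree of n >= 1 (hypothetical convention).

    A tree is written as the concatenation, over the root's children in order,
    of "(" + word(child) + ")"; a leaf contributes the empty word.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        # Assumed base case: 1 is a root with a single leaf child.
        return "()"
    parts = []
    for a in factor_exponents(n):
        # Zero exponent -> leaf child; positive exponent a -> subtree of a.
        # Exponents satisfy a < n for n >= 2, so the recursion terminates.
        inner = "" if a == 0 else dyck(a)
        parts.append("(" + inner + ")")
    return "".join(parts)


def is_dyck(word: str) -> bool:
    """True if the parentheses are balanced and the depth never goes negative.

    Assumes word contains only '(' and ')'.
    """
    depth = 0
    for ch in word:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


if __name__ == "__main__":
    for n in range(1, 13):
        print(n, dyck(n))
```

Under this convention dyck(12) is "((()))(())", since 12 = 2^2 · 3 and the exponents 2 and 1 are themselves encoded recursively. Concatenating dyck(1), dyck(2), … yields one possible form of the deterministic text; how the paper segments its corpus into words for the dictionary and rank-frequency statistics is not reproduced here.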
Files in this record:
1-s2.0-S037843712600097X-main.pdf (Open access; type: VOR, version published by the publisher; license: Creative Commons; size: 3.84 MB; format: Adobe PDF)

Creative Commons License
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise stated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1395828