as Marshall College Professor Jones could have said: "This belongs in a database!" https://doi.org/10.1021/acsomega.4c10413
Well, the first pKa column and all arsonic acids are now in @wikidata :)
new BridgeDb Datasources release: https://github.com/bridgedb/datasources/releases/tag/20270728
"Uses UniProtKB as name, removes EcoGene, and multiple small updates"
BridgeDb Datasources is a dataset with metadata about data sources and organisms used by BridgeDb Java and downstream tools like @wikipathways, PathVisio, and others
Join Franciszek Job at EuroSciPy as he presents a scalable framework to unify chemical datasets from sources like PubChem, UniChem & COCONUT.
- Canonicalize with RDKit
- Scale via Dask
- Deduplicate with InChI keys
Ideal for ML pretraining, benchmarking, and chemical data analysis.
Schedule: https://lnkd.in/eaAxwUN2
Tickets: https://lnkd.in/end9aYzE
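A rough sketch of what "deduplicate with InChI keys" can look like. This is not the framework from the talk: it assumes every record already carries an InChIKey (in practice computed after canonicalization), and the record layout, the identifiers, and the `merge_by_inchikey` helper are made up for illustration.

```python
from collections import defaultdict


def merge_by_inchikey(records):
    """Group records by InChIKey and collect one identifier per source."""
    merged = defaultdict(dict)
    for record in records:
        # Records with the same InChIKey are treated as the same structure;
        # the first identifier seen for each source is kept.
        merged[record["inchikey"]].setdefault(record["source"], record["id"])
    return dict(merged)


CAFFEINE = "RYYVLZVUVIJVGH-UHFFFAOYSA-N"
ASPIRIN = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"

# Hypothetical input: the "id" values are placeholders, not real database entries.
records = [
    {"source": "pubchem", "id": "A1", "inchikey": CAFFEINE},
    {"source": "coconut", "id": "C7", "inchikey": CAFFEINE},
    {"source": "pubchem", "id": "A2", "inchikey": ASPIRIN},
    {"source": "pubchem", "id": "A9", "inchikey": CAFFEINE},  # duplicate within one source
]

unique = merge_by_inchikey(records)
for inchikey, ids in unique.items():
    print(inchikey, ids)
```

Keying on the InChIKey keeps the merge a plain dictionary lookup, which is also what makes it easy to split across workers.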
chemical compound identifiers in @wikidata: https://edu.nl/xb8y7
Most cheminformatics code that queries ChEMBL struggles with reproducibility.
chembl-downloader can help:
>>> import chembl_downloader as cd
>>> df = cd.query("""
... SELECT chembl_id, pref_name
... FROM molecule_dictionary
... WHERE pref_name IS NOT NULL
... """)
It's even sneaking its way into @wpwalters and @dr_greg_landrum blogs :)
Code/Docs: https://github.com/cthoyt/chembl-downloader
Preprint: https://arxiv.org/pdf/2507.17783
New Preprint Alert!
We're excited to share our latest work on #ChemRxiv! MARCUS (Molecular Annotation and Recognition for Curating Unravelled Structures) is a web-based platform for extracting chemical information from scientific papers.
Preprint: https://doi.org/10.26434/chemrxiv-2025-9p1q1
Try it out: https://marcus.decimer.ai
ha, I managed to convert a CXSMILES in @wikidata via the MDL V2000 molfile and the #inchi webtool into an #InChI (!B) and InChIKey :) https://www.wikidata.org/wiki/Q66421202#P117
heading to the 2nd day of the Technical InChI Meeting in Aachen/Germany
heading tomorrow to the InChI Technical Exchange Meeting Summer 2025 in Aachen/DE
Looking forward to it, and particularly talking about the InChI for inorganics and trying that in @wikidata :) See https://doi.org/10.26434/chemrxiv-2025-53n0w
And also the nano InChI, see https://doi.org/10.3390/nano10122493
updating some java #cheminformatics libraries... Euclid and CMLXOM, and possibly making a Bacting release... but need to continue working on the CDK Depiction book chapter too...
oh boy... finally getting there... so close to finalizing and releasing a new SMARTCyp?
See for context https://chem-bla-ics.linkedchemistry.info/2024/04/07/cdk2024.html and https://chem-bla-ics.linkedchemistry.info/2024/06/16/cdk2024-3.html #cdk2024
did you notice I started @Codeberg ?
# Qleverfile for PubChem, use with the
# QLever CLI (`pip install qlever`)
#
# qlever get-data # ~2 hours, ~120 GB, ~19 billion triples
# qlever index # ~6 hours, ~20 GB RAM, ~350 GB disk space (for the index)
# qlever start # a few seconds
Source: https://github.com/ad-freiburg/qlever-control/blob/main/src/qlever/Qleverfiles/Qleverfile.pubchem
I think I am going to try to recover a bit of #cheminformatics / #chemistry #history, and make the index of the Internet Journal of Chemistry (IJC) FAIR in @wikidata
While the journal no longer exists, many articles are cited quite a few times.
I did some exploration some time ago, and for some I found full text "self-archiving" versions online.
And, TIL that Web of Science has entries for the articles too, which I just added for the 9 articles already in #Wikidata: https://w.wiki/Eide
Anyone have views or references on the effectiveness of count/sparse #cheminformatics fingerprints compared to binary/dense fingerprints?
What about comparing methods to turn count/sparse fingerprints into binary ones? I know of several approaches, but nothing methodical or published.
I'm trying to understand how I might add sparse/count fingerprints to chemfp.
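To make the question concrete, here is a minimal sketch of two ways to turn a sparse count fingerprint into a binary one. It is only an illustration, not how chemfp or any other toolkit does it; the feature names, the 64-bit length, the thresholds, and the helpers `fold_presence` and `fold_thresholds` are all assumptions.

```python
import hashlib


def _bit(feature_id, salt, num_bits):
    """Deterministically map a (feature, salt) pair to a bit position."""
    digest = hashlib.sha256(f"{feature_id}:{salt}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_bits


def fold_presence(counts, num_bits=64):
    """Approach 1: drop the counts, set one bit per feature that is present."""
    return {_bit(feature, 0, num_bits) for feature, count in counts.items() if count > 0}


def fold_thresholds(counts, num_bits=64, thresholds=(1, 2, 4, 8)):
    """Approach 2: set one bit for every count threshold a feature reaches."""
    bits = set()
    for feature, count in counts.items():
        for threshold in thresholds:
            if count >= threshold:
                bits.add(_bit(feature, threshold, num_bits))
    return bits


# Hypothetical sparse count fingerprint: feature id -> occurrence count.
fp = {"C-C": 5, "C=O": 1, "c:c": 6}
presence_bits = fold_presence(fp)
threshold_bits = fold_thresholds(fp)
print(sorted(presence_bits))
print(sorted(threshold_bits))
```

The first approach loses all count information; the second keeps a coarse version of it at the cost of more bits set, and so more collisions at a fixed length.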
new blog post: "PFAS in the blood of the Dutch population" https://chem-bla-ics.linkedchemistry.info/2025/07/06/pfas-in-the-blood-of-the-dutch-population.html
About the recent @rivm report.
"So, what can I do to make this report more FAIR?"
and followed by CMLXOM 4.12: https://github.com/BlueObelisk/cmlxom/releases/tag/cmlxom-4.12
CMLXOM is a Java library for processing CML, implementing the XML object model (XOM) for the Chemical Markup Language (CML).
Euclid 2.11 is released: https://github.com/BlueObelisk/euclid/releases/tag/euclid-2.11
Euclid is a library of numeric, geometric and XML routines.
Version 5.0b1 of chemfp - the comprehensive package for binary #cheminformatics fingerprints - is out and ready for curious beta testers.
Here are the highlights. For more info and Linux install info see https://chemfp.com/chemfp-50b1-available-for-beta-testing.html
- shardsearch to search multiple files
- simhistogram for a histogram of pairwise Tanimoto comparisons
- the FPB file size limit increased from ~250M fingerprints to well over a billion
- new Klekota-Roth fingerprint type for RDKit and OpenEye
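For anyone unsure what a histogram of pairwise Tanimoto comparisons means, a tiny pure-Python sketch follows. It is not chemfp code and says nothing about how simhistogram is implemented; the fingerprints are toy 4-bit values stored as integers, and `tanimoto` and `similarity_histogram` are hypothetical helper names.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def similarity_histogram(fingerprints, num_bins=10):
    """Count all pairwise Tanimoto scores in equal-width bins over [0, 1]."""
    counts = [0] * num_bins
    for a, b in combinations(fingerprints, 2):
        score = tanimoto(a, b)
        # A score of exactly 1.0 is placed in the last bin.
        counts[min(int(score * num_bins), num_bins - 1)] += 1
    return counts


# Toy fingerprints, made up for illustration.
fps = [0b1111, 0b0111, 0b0011, 0b1100]
hist = similarity_histogram(fps, num_bins=4)
print(hist)  # [1, 1, 3, 1]
```

The loop is quadratic in the number of fingerprints, which is why dedicated tools matter once the dataset gets large.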
Ammar's PhD thesis can be found online at https://cris.maastrichtuniversity.nl/en/publications/enhancing-the-interoperability-and-reusability-of-nanosafety-data and his defense is live streamed, see the link at https://www.maastrichtuniversity.nl/events/phd-defence-ammar-ammar
It contains a lot of #openscience: https://social.edu.nl/@egonw/114766897696096032
And in the afternoon there is a minisymposium on the thesis topic: https://social.edu.nl/@egonw/114705860626722108