xyz2mol ? #121

Nick-Mul opened this issue Apr 20, 2025 · 4 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@Nick-Mul

I think this project is great, but something that could be really useful would be converting an XYZ file to a MolGraph. I had an unsuccessful attempt at it, and I was wondering if you could suggest a way to create a MolGraph; I was thinking via a dictionary.

Many thanks,
Nick

using LinearAlgebra  # provides norm, used in bond_distance_matrix

# Function to read an XYZ file and extract atomic symbols and coordinates
function read_xyz(filename::String)
    open(filename, "r") do file
        n_atoms = parse(Int, readline(file)) # First line: number of atoms
        readline(file)  # Second line: comment, skip it
        
        atoms = String[]
        coords = Vector{Float64}[]

        # Read the remaining lines: atomic symbols and coordinates
        for line in eachline(file)
            parts = split(line)
            length(parts) < 4 && continue  # Skip blank or malformed lines
            push!(atoms, parts[1])  # Atomic symbol
            x, y, z = parse.(Float64, parts[2:4])  # Coordinates
            push!(coords, [x, y, z])
        end
        
        return atoms, hcat(coords...)
    end
end

# Function to calculate pairwise bond distances
function bond_distance_matrix(coords::Matrix{Float64})
    n = size(coords, 2)
    distance_matrix = zeros(Float64, n, n)
    
    # Calculate pairwise distances
    for i in 1:n
        for j in i+1:n
            distance = norm(coords[:, i] - coords[:, j])
            distance_matrix[i, j] = distance
            distance_matrix[j, i] = distance  # Symmetric matrix
        end
    end
    
    return distance_matrix
end

# Assign bonds based on distance and covalent radii
# (covalent_radii and vdw_covalent_factor must be defined before this function is called)
function assign_bonds(atoms, dist_matrix::Matrix{Float64})
    bonds = []
    n = length(atoms)
    for i in 1:n, j in i+1:n
        ri = get(covalent_radii, atoms[i], 0.0)
        rj = get(covalent_radii, atoms[j], 0.0)
        if ri == 0.0 || rj == 0.0
            continue  # Unknown element, skip
        end
        threshold = (ri + rj) * vdw_covalent_factor
        if dist_matrix[i, j] < threshold
            push!(bonds, (i, j))
        end
    end
    return bonds
end
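
The snippet above refers to `covalent_radii` and `vdw_covalent_factor` without defining them. A minimal sketch of what they might look like follows, together with a compact, self-contained check on water. The radii are approximate single-bond covalent radii in Å as commonly tabulated, the factor 1.3 is an assumed tolerance rather than a value taken from this thread, and `guess_bonds` is a hypothetical helper name.

```julia
using LinearAlgebra  # norm

# Assumed values: approximate single-bond covalent radii in Å.
const covalent_radii = Dict(
    "H" => 0.31, "C" => 0.76, "N" => 0.71, "O" => 0.66,
    "F" => 0.57, "P" => 1.07, "S" => 1.05, "Cl" => 1.02,
)

# Assumed tolerance factor applied to the sum of the two radii.
const vdw_covalent_factor = 1.3

# Hypothetical helper: same distance criterion as assign_bonds, in one function.
# coords holds one atom per column, as returned by read_xyz.
function guess_bonds(atoms, coords; radii=covalent_radii, factor=vdw_covalent_factor)
    bonds = Tuple{Int,Int}[]
    n = length(atoms)
    for i in 1:n, j in i+1:n
        ri = get(radii, atoms[i], 0.0)
        rj = get(radii, atoms[j], 0.0)
        (ri == 0.0 || rj == 0.0) && continue  # unknown element, skip
        if norm(coords[:, i] - coords[:, j]) < (ri + rj) * factor
            push!(bonds, (i, j))
        end
    end
    return bonds
end

# Water geometry in Å (O, H, H).
atoms = ["O", "H", "H"]
coords = hcat([0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0])

bonds = guess_bonds(atoms, coords)
println(bonds)
```

With these assumed values, both O-H pairs (about 0.96 Å apart) fall under the threshold, and the H-H pair (about 1.51 Å) does not.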
@mojaie
Owner

mojaie commented Apr 20, 2025

Thank you very much!
I'm not familiar with the xyz format. Is there a specification of the file format?

I was working on the z-matrix format (src/geometry/internal.jl), but that work is not very active now.

@mojaie added the enhancement and help wanted labels Apr 20, 2025
@Nick-Mul
Author

XYZ files are used a lot in QM calculations. Typically the format is:

<number of atoms>
comment line
<element> <X> <Y> <Z>

The comment line can be anything, but it can be used to store molecular information such as charge and multiplicity.
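
To make the layout concrete, here is a small sketch that parses one XYZ frame from a string. `parse_xyz` is a hypothetical helper, not part of MolecularGraph.jl, and the `charge=0 multiplicity=1` comment is only an illustrative convention, since the format does not standardize the comment line.

```julia
# Hypothetical helper: parse a single XYZ frame from text.
function parse_xyz(text::AbstractString)
    lines = split(text, '\n')
    n_atoms = parse(Int, strip(lines[1]))   # line 1: number of atoms
    comment = String(strip(lines[2]))       # line 2: free-form comment
    symbols = String[]
    coords = zeros(Float64, 3, n_atoms)     # one atom per column
    for k in 1:n_atoms
        parts = split(lines[k + 2])         # <element> <X> <Y> <Z>
        push!(symbols, parts[1])
        coords[:, k] = parse.(Float64, parts[2:4])
    end
    return comment, symbols, coords
end

# Example file contents for water; the comment-line keys are made up.
water = """
3
charge=0 multiplicity=1
O   0.000   0.000   0.000
H   0.757   0.586   0.000
H  -0.757   0.586   0.000
"""

comment, symbols, coords = parse_xyz(water)
println(comment)
println(symbols)
```

Reading exactly `n_atoms` lines, rather than everything to the end of the file, also leaves room for multi-frame files later.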

https://en.wikipedia.org/wiki/XYZ_file_format
https://sites.google.com/site/orcainputlibrary/geometry-input#h.l5apadpu3hzw

FYI, I based my initial attempt on line 62 of https://github.com/TUHH-TVT/openCOSMO-RS_conformer_pipeline/blob/main/ConformerGenerator.py

@mojaie
Owner

mojaie commented Apr 21, 2025

This looks quite difficult to me. One possible starting point is a minimum spanning tree algorithm, but short interatomic distances do not always mean covalent bonds.
A good conformation would need to have been generated already. Even then, it will still be difficult to distinguish hydrogen bonds from covalent bonds (and being able to do so is not necessary in QM tasks).
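
As a rough illustration of the minimum spanning tree idea and one of its limits, here is a sketch of Prim's algorithm over interatomic distances. `mst_edges` is a hypothetical name, and the square "ring" is a made-up geometry. A spanning tree always has exactly n-1 edges, so it cannot close a ring, and it will also link separate fragments.

```julia
using LinearAlgebra  # norm

# Prim's algorithm on the complete graph of interatomic distances.
# coords holds one atom per column; returns n-1 edges as (i, j) tuples.
function mst_edges(coords::AbstractMatrix{<:Real})
    n = size(coords, 2)
    in_tree = falses(n)
    in_tree[1] = true
    best_dist = [norm(coords[:, 1] - coords[:, j]) for j in 1:n]
    best_from = ones(Int, n)
    edges = Tuple{Int,Int}[]
    for _ in 2:n
        # Pick the closest atom that is not yet in the tree.
        next, d = 0, Inf
        for j in 1:n
            if !in_tree[j] && best_dist[j] < d
                next, d = j, best_dist[j]
            end
        end
        in_tree[next] = true
        push!(edges, minmax(best_from[next], next))
        # Update the cheapest known connection for the remaining atoms.
        for j in 1:n
            dj = norm(coords[:, next] - coords[:, j])
            if !in_tree[j] && dj < best_dist[j]
                best_dist[j], best_from[j] = dj, next
            end
        end
    end
    return edges
end

# Four ring atoms on a 1.5 Å square (a crude cyclobutane-like skeleton).
ring = [0.0 1.5 1.5 0.0;
        0.0 0.0 1.5 1.5;
        0.0 0.0 0.0 0.0]

edges = mst_edges(ring)
println(length(edges))
```

The ring has four bonds, but the tree returns only three edges, so an MST would at best be a first pass that needs a distance-threshold step on top.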

@Nick-Mul
Author

Yes, it's a bit of a tricky problem, and it took a while for it to get into RDKit.

The Jensen group wrote xyz2mol in Python https://github.com/jensengroup/xyz2mol which was based on this publication DOI: 10.1002/bkcs.10334

This was then ported over to C++ https://greglandrum.github.io/rdkit-blog/posts/2022-12-18-introducing-rdDetermineBonds.html

I believe the code does assign connectivity with a distance matrix, but you're completely right that it does require somewhat sensible conformations.
