Friday, May 30, 2025

0xEA Refitting Peng-Robinson Calculation's with Pubchem API

 

Okay, this is an exciting challenge! Refitting the PengRobinsonEOS program to accept PubChem API input for compound analysis means that instead of hardcoding Tc, Pc, and omega, we'll fetch these properties from PubChem for a given compound name.

Important Considerations and Limitations:

  1. Critical Properties in PubChem:

    • PubChem is a chemical information database, not primarily a thermodynamic property database for engineering calculations. While it has a vast amount of data, critical temperature (Tc), critical pressure (Pc), and acentric factor (omega) are often not directly available in PubChem for most compounds.
    • You might find them if they've been specifically submitted by a contributor (e.g., in the "Depositor-Supplied Synonyms and Properties" section, or in a "Chemical and Physical Properties" section if curated).
    • If critical properties are not found, the program will not be able to perform the EOS calculation. This is a significant limitation. For robust thermodynamic modeling, specialized databases (like NIST Chemistry WebBook, DIPPR, or commercial databases) or property estimation methods are typically used.
    • I will implement a search for these properties, but be aware they might not always be found. We'll need robust error handling for this.
  2. pubchempy vs. Direct PUG-REST:

    • pubchempy is excellent for common queries (SMILES, InChIKey, general properties).
    • For less common properties like critical constants, we might need to use direct PUG-REST requests to query specific sections of a compound's record, as pubchempy's get_properties might not expose all data fields. I'll show how to do this.
  3. Units: Ensure consistency. PubChem data might be in various units, so conversion might be necessary (e.g., MPa to Pa, Celsius to Kelvin). We'll standardize to SI units (Pa, K) for the EOS.

Refitted Program Design:

  1. Input: The user will provide a compound name.
  2. PubChem Lookup:
    • Use pubchempy to get the CID from the compound name.
    • Use direct PUG-REST calls to try and find Tc, Pc, and omega for that CID. We'll search in a few common sections.
  3. Peng-Robinson Calculation: If the properties are found, proceed with the existing PengRobinsonEOS calculations.
  4. Error Handling: Gracefully handle cases where properties are not found or API calls fail.

Refitted Program Code


Double-click (or enter) to edit


import numpy as np
from scipy.optimize import fsolve
import pubchempy as pcp
import requests
import json
import time

class PengRobinsonEOS:
    """
    Peng-Robinson Equation of State for a pure component.
    Calculates volume/density and provides a heuristic phase state.
    """
    def __init__(selfTcPcomega):
        """
        Tc: Critical Temperature (K)
        Pc: Critical Pressure (Pa)
        omega: Acentric Factor
        """
        if not all([isinstance(val, (int, float)) and val > 0 for val in [Tc, Pc, omega]]):
            raise ValueError("Tc, Pc, and omega must be positive numeric values.")

        self.Tc = Tc
        self.Pc = Pc
        self.omega = omega
        self.R = 8.314462618  # Universal Gas Constant (J/(mol*K))

        # Peng-Robinson constants
        self.A = 0.45724
        self.B = 0.07780

        # Calculate a_c and b_c
        self.ac = self.A * (self.R**2 * self.Tc**2) / self.Pc
        self.b = self.B * (self.R * self.Tc) / self.Pc

    def alpha(selfT):
        """Alpha parameter dependent on temperature."""
        kappa = 0.37464 + 1.54226 * self.omega - 0.26992 * self.omega**2
        Tr = T / self.Tc  # Reduced Temperature
        return (1 + kappa * (1 - np.sqrt(Tr)))**2

    def a_parameter(selfT):
        """Temperature-dependent 'a' parameter."""
        return self.ac * self.alpha(T)

    def Z_cubic_equation(selfZTP):
        """
        Cubic equation for compressibility factor Z for PR EOS.
        Z^3 - (1 - B')Z^2 + (A' - 3B'^2 - 2B')Z - (A'B' - B'^2 - B'^3) = 0
        where A' = a*P / (R^2 * T^2) and B' = b*P / (R*T)
        """
        A_prime = self.a_parameter(T) * P / (self.R**2 * T**2)
        B_prime = self.b * P / (self.R * T)

        return (Z**3 - (1 - B_prime) * Z**2 +
                (A_prime - 3 * B_prime**2 - 2 * B_prime) * Z -
                (A_prime * B_prime - B_prime**2 - B_prime**3))

    def calculate_Z(selfTP):
        """
        Calculates the compressibility factor Z for given T and P.
        Returns all real roots. For subcritical conditions, can be 1 or 3 roots.
        For supercritical, typically 1 real root.
        """
        initial_guesses = [0.01, 0.5, 0.99, 1.1] # Added 1.1 for higher Z roots in some cases
        roots = []
        for guess in initial_guesses:
            try:
                # Use a smaller tolerance for fsolve to ensure better precision for roots
                root = fsolve(self.Z_cubic_equation, guess, args=(T, P), xtol=1e-8)[0]
                # Filter out complex roots and add unique real roots
                if np.isreal(root) and np.isclose(root.imag, 0):
                    roots.append(np.real(root))
            except Exception:
                pass # fsolve might fail for some guesses or non-real roots

        # Remove duplicates, round for floating point noise, and sort
        unique_roots = sorted(list(set(np.round(roots, 6))))
        return unique_roots

    def calculate_molar_volume(selfTP):
        """
        Calculates the molar volume (m^3/mol) using calculated Z.
        Returns a list of possible volumes.
        """
        Zs = self.calculate_Z(T, P)
        volumes = []
        for Z in Zs:
            if Z > 0: # Z must be positive
                volumes.append(Z * self.R * T / P)
        return volumes

    def get_phase_state(selfTP):
        """
        Determines a very simplistic 'phase state' based on T, P relative to critical point.
        This is a heuristic and not a rigorous phase equilibrium calculation.
        """
        if T >= self.Tc and P >= self.Pc:
            return "Supercritical"
        elif T < self.Tc and P < self.Pc:
            return "Subcritical (potentially liquid/vapor)"
        elif T >= self.Tc and P < self.Pc:
            return "Superheated Vapor (above critical T, below critical P)"
        elif T < self.Tc and P >= self.Pc:
            return "Compressed Liquid (below critical T, above critical P)"
        else:
            return "Undefined"


# --- PubChem Integration Functions ---

def get_compound_cid(compound_name):
    """Searches PubChem for a compound by name and returns its CID."""
    try:
        compounds = pcp.get_compounds(compound_name, 'name')
        if compounds:
            print(f"  Found '{compound_name}'. CID: {compounds[0].cid}")
            return compounds[0].cid
        else:
            print(f"  No compound found for '{compound_name}'.")
            return None
    except Exception as e:
        print(f"  Error getting CID for '{compound_name}': {e}")
        return None

def fetch_critical_properties_from_pubchem(cid):
    """
    Fetches Critical Temperature (Tc), Critical Pressure (Pc), and Acentric Factor (omega)
    from PubChem using PUG-REST. These properties are often *not* directly available
    or are found in less structured 'Depositor-Supplied' fields.
    """
    base_url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/"
    url = f"{base_url}cid/{cid}/JSON/"
    properties = {'Tc': None, 'Pc': None, 'omega': None}

    print(f"  Attempting to fetch detailed data for CID {cid} from PubChem...")
    try:
        response = requests.get(url, timeout=30) # 30-second timeout
        response.raise_for_status() # Raise an exception for HTTP errors (4xx or 5xx)
        data = response.json()

        # PubChem's PUG-View JSON structure is complex. We'll search common sections.
        # This is a heuristic search as critical properties aren't standardized.
        for record in data.get('Record', {}).get('Section', []):
            if record.get('TOCHeading') == 'Chemical and Physical Properties':
                for prop_section in record.get('Section', []):
                    # Look for Critical Temperature/Pressure
                    if 'Critical Temperature' in prop_section.get('TOCHeading', '') or \
                       'Critical Pressure' in prop_section.get('TOCHeading', ''):
                        for item in prop_section.get('Information', []):
                            for value_dict in item.get('Value', {}).get('StringWithMarkup', []):
                                text = value_dict.get('String')
                                if text:
                                    # Simple regex to find numbers followed by units
                                    # This is fragile and would need more robust parsing for production
                                    if 'Critical Temperature' in prop_section['TOCHeading'] and 'K' in text:
                                        try:
                                            # Assuming format like "273.15 K"
                                            val = float(text.split(' ')[0])
                                            properties['Tc'] = val
                                            print(f"    Found Critical Temperature: {val} K")
                                        except ValueError:
                                            pass
                                    if 'Critical Pressure' in prop_section['TOCHeading'] and 'Pa' in text:
                                        try:
                                            # Assuming format like "7.377e6 Pa"
                                            val = float(text.split(' ')[0])
                                            properties['Pc'] = val
                                            print(f"    Found Critical Pressure: {val} Pa")
                                        except ValueError:
                                            pass
                                    elif 'Critical Pressure' in prop_section['TOCHeading'] and 'MPa' in text:
                                        try:
                                            val = float(text.split(' ')[0]) * 1e6 # Convert MPa to Pa
                                            properties['Pc'] = val
                                            print(f"    Found Critical Pressure: {val/1e6} MPa (converted to Pa)")
                                        except ValueError:
                                            pass
                                    elif 'Critical Pressure' in prop_section['TOCHeading'] and 'bar' in text:
                                        try:
                                            val = float(text.split(' ')[0]) * 1e5 # Convert bar to Pa
                                            properties['Pc'] = val
                                            print(f"    Found Critical Pressure: {val/1e5} bar (converted to Pa)")
                                        except ValueError:
                                            pass

                    # Look for Acentric Factor (less common in general properties)
                    if 'Acentric Factor' in prop_section.get('TOCHeading', ''):
                        for item in prop_section.get('Information', []):
                            for value_dict in item.get('Value', {}).get('StringWithMarkup', []):
                                text = value_dict.get('String')
                                if text:
                                    try:
                                        val = float(text)
                                        properties['omega'] = val
                                        print(f"    Found Acentric Factor: {val}")
                                    except ValueError:
                                        pass

            # Also check Depositor-Supplied values, as these are often where such specific data lives
            if record.get('TOCHeading') == 'Depositor-Supplied Synonyms and Properties':
                 for section in record.get('Section', []):
                    for item in section.get('Information', []):
                        for value_dict in item.get('Value', {}).get('StringWithMarkup', []):
                            text = value_dict.get('String')
                            if text:
                                # Look for keywords and try to extract numbers
                                if 'critical temperature' in text.lower() and 'k' in text.lower() and properties['Tc'] is None:
                                    try:
                                        val = float(text.lower().split('critical temperature')[1].split('k')[0].strip())
                                        properties['Tc'] = val
                                        print(f"    Found Tc in depositor data: {val} K")
                                    except ValueError:
                                        pass
                                if 'critical pressure' in text.lower() and 'pa' in text.lower() and properties['Pc'] is None:
                                     try:
                                        val = float(text.lower().split('critical pressure')[1].split('pa')[0].strip())
                                        properties['Pc'] = val
                                        print(f"    Found Pc in depositor data: {val} Pa")
                                     except ValueError:
                                        pass
                                if 'acentric factor' in text.lower() and properties['omega'] is None:
                                     try:
                                        val = float(text.lower().split('acentric factor')[1].strip())
                                        properties['omega'] = val
                                        print(f"    Found Omega in depositor data: {val}")
                                     except ValueError:
                                        pass


        # Check if we found all properties
        if all(is not None for v in properties.values()):
            print("  Successfully found all critical properties.")
            return properties
        else:
            print("  Could not find all critical properties in PubChem for this compound.")
            print(f"  Found: Tc={properties['Tc']} K, Pc={properties['Pc']/1e6 if properties['Pc'] else None} MPa, Omega={properties['omega']}")
            return None

    except requests.exceptions.RequestException as e:
        print(f"  Network or API error while fetching PubChem data: {e}")
        return None
    except json.JSONDecodeError as e:
        print(f"  Error parsing JSON from PubChem: {e}")
        return None
    except Exception as e:
        print(f"  An unexpected error occurred: {e}")
        return None


# --- Main execution loop ---
if __name__ == "__main__":
    while True:
        compound_name_input = input("\nEnter compound name (e.g., carbon dioxide, methane, water, or 'exit' to quit): ").strip()
        if compound_name_input.lower() == 'exit':
            break

        print(f"\n--- Analyzing '{compound_name_input}' ---")
        cid = get_compound_cid(compound_name_input)

        if cid:
            # Add a small delay to respect PubChem's rate limits
            time.sleep(0.1) # 100 milliseconds
            critical_props = fetch_critical_properties_from_pubchem(cid)

            if critical_props:
                try:
                    compound_pr = PengRobinsonEOS(
                        critical_props['Tc'],
                        critical_props['Pc'],
                        critical_props['omega']
                    )

                    print(f"\n--- Peng-Robinson EOS Parameters for {compound_name_input} ---")
                    print(f"  Critical Temperature (Tc): {compound_pr.Tc:.2f} K")
                    print(f"  Critical Pressure (Pc): {compound_pr.Pc / 1e6:.4f} MPa")
                    print(f"  Acentric Factor (omega): {compound_pr.omega:.4f}")
                    print("-" * 40)

                    # --- Example calculations for this compound ---
                    # Test at a temperature just above Tc and pressure just above Pc
                    T_supercritical = compound_pr.Tc * 1.01
                    P_supercritical = compound_pr.Pc * 1.05

                    print(f"--- Test near Supercritical Point (T={T_supercritical:.2f} K, P={P_supercritical/1e6:.2f} MPa) ---")
                    volumes_sc = compound_pr.calculate_molar_volume(T_supercritical, P_supercritical)
                    if volumes_sc:
                        print(f"  Compressibility Factor Z: {volumes_sc[0] / (compound_pr.R * T_supercritical / P_supercritical):.4f}")
                        print(f"  Molar Volume (m^3/mol): {volumes_sc[0]:.6f}")
                    else:
                        print("  Could not calculate molar volume for these conditions.")
                    print(f"  Phase State (heuristic): {compound_pr.get_phase_state(T_supercritical, P_supercritical)}")
                    print("-" * 40)

                    # Test at subcritical conditions (e.g., 90% Tc, 80% Pc)
                    T_subcritical = compound_pr.Tc * 0.90
                    P_subcritical = compound_pr.Pc * 0.80

                    print(f"--- Test at Subcritical Conditions (T={T_subcritical:.2f} K, P={P_subcritical/1e6:.2f} MPa) ---")
                    volumes_sub = compound_pr.calculate_molar_volume(T_subcritical, P_subcritical)
                    if len(volumes_sub) > 1: # Expecting 3 roots for liquid/vapor
                        Zs_sub = compound_pr.calculate_Z(T_subcritical, P_subcritical)
                        print(f"  Roots for Z (typically vapor, liquid, unstable){[f'{z:.4f}' for z in Zs_sub]}")
                        # Take the minimum (liquid-like) and maximum (vapor-like) volumes
                        liquid_like_volume = min(volumes_sub)
                        vapor_like_volume = max(volumes_sub)
                        print(f"  Possible Molar Volumes (m^3/mol): Liquid-like={liquid_like_volume:.6f}, Vapor-like={vapor_like_volume:.6f}")
                    elif volumes_sub:
                        print(f"  Compressibility Factor Z: {volumes_sub[0] / (compound_pr.R * T_subcritical / P_subcritical):.4f}")
                        print(f"  Molar Volume (m^3/mol): {volumes_sub[0]:.6f}")
                    else:
                        print("  Could not calculate molar volume for these conditions.")
                    print(f"  Phase State (heuristic): {compound_pr.get_phase_state(T_subcritical, P_subcritical)}")
                    print("-" * 40)

                except ValueError as ve:
                    print(f"  Error initializing PengRobinsonEOS: {ve}")
                except Exception as ex:
                    print(f"  An error occurred during EOS calculation: {ex}")
            else:
                print(f"  Cannot perform EOS calculation for '{compound_name_input}' due to missing critical properties.")
        else:
            print(f"  Failed to find CID for '{compound_name_input}'.")


How to Run and Test:

  1. Save: Save the code as a Python file (e.g., pubchem_eos.py).
  2. Install Libraries: If you haven't already, install the necessary libraries:
    pip install numpy scipy pubchempy requests Pillow
    (Pillow isn't strictly needed for this code but is useful for general PubChem image functions).
  3. Execute: Run the script from your terminal:
    python pubchem_eos.py
  4. Enter Compound Names:
    • Try "carbon dioxide": You should find critical properties for CO2 as it's a very common substance for which this data is well-established and often appears in PubChem.
    • Try "water": Similarly, water's critical properties are usually available.
    • Try less common compounds: You will likely find that Tc, Pc, and omega are not available for many arbitrary compounds in PubChem. The program will report this.

Explanation of Changes:

  • get_compound_cid(compound_name): This function uses pubchempy to reliably get the PubChem CID for a given compound name.
  • fetch_critical_properties_from_pubchem(cid):
    • This is the most critical new function. It constructs a direct PUG-REST URL to get the full JSON view of a compound record (using pug_view).
    • It then iterates through the complex JSON structure to look for sections that might contain "Critical Temperature," "Critical Pressure," and "Acentric Factor."
    • String Parsing: It uses basic string splitting and type conversion to extract numeric values and converts common units (MPa, bar) to Pascals for consistency with the EOS. This part is inherently fragile due to the unstructured nature of much of PubChem's text-based data.
    • Return Value: Returns a dictionary {'Tc': ..., 'Pc': ..., 'omega': ...} if all are found, otherwise None.
  • Main Execution Loop (if __name__ == "__main__":)
    • Prompts the user for a compound name.
    • Calls get_compound_cid.
    • If a CID is found, it calls fetch_critical_properties_from_pubchem.
    • Conditional EOS Calculation: Only if critical_props are successfully retrieved will it instantiate PengRobinsonEOS and perform the calculations.
    • Error Handling: Includes try-except blocks for network issues, JSON parsing, and general exceptions during both PubChem interaction and EOS calculation.
    • Rate Limiting: Added time.sleep(0.1) after each PubChem API call to help comply with NCBI's usage policies.

This refitted program provides a practical example of integrating external data sources like PubChem with scientific calculation programs. Keep in mind the limitations regarding the availability of specific thermodynamic properties in PubChem.

No comments: